Inferring whole-genome histories in large population datasets: inferred tree sequences for 1000 Genomes

From MaRDI portal
Dataset:6682872



DOI: 10.5281/zenodo.3051855 | Zenodo: 3051855 | MaRDI QID: Q6682872

Dataset published at Zenodo repository.

Author name not available

Publication date: 20 May 2019

Copyright license: No records found.



Tree sequences inferred for the 1000 Genomes phase 3 autosomes using tsinfer version 0.1.4 and compressed using tszip. Tree sequences can be decompressed as follows:

    $ tsunzip 1kg_chr1.trees.tsz

Once decompressed, trees files can be loaded and processed using tskit.

    import tskit

    ts = tskit.load("1kg_chr1.trees")
    # ts is an instance of tskit.TreeSequence
    print("Chromosome 1 contains {} trees".format(ts.num_trees))

Metadata associated with individuals and populations was derived from the original source and converted to JSON form. For example, to access individual metadata we can use:

    import tskit
    import json

    ts = tskit.load("1kg_chr1.trees")
    ind = ts.individual(0)
    metadata_dict = json.loads(ind.metadata)

The metadata_dict variable will now contain all the metadata for the individual with ID 0 as a dictionary. Metadata associated with populations can be found in a similar way. Population IDs are associated with individuals via their constituent nodes. For example,

    pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
    ind_node = ts.node(ind.nodes[0])
    ind_pop_metadata = pop_metadata[ind_node.population]

After this, the ind_pop_metadata variable will contain the population level metadata for individual ID 0.

The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub.
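The individual-to-population lookup described above can be wrapped in a few small helpers. The following is a minimal sketch, not part of the published pipeline: the function names (decode_metadata, population_of_individual, individuals_per_population) are hypothetical, and it assumes that every node of an individual carries the same population ID, so only the first node is inspected. It relies only on the TreeSequence methods already used above (individual, node) plus individuals(), and makes no assumption about which keys the JSON metadata contains.

```python
import json
from collections import Counter


def decode_metadata(raw):
    """Return metadata as a dict, whether it is JSON bytes/str or already decoded."""
    if isinstance(raw, dict):
        return raw  # already decoded (e.g. by a metadata schema)
    if not raw:
        return {}  # empty bytes or string: no metadata recorded
    return json.loads(raw)


def population_of_individual(ts, individual_id):
    """Population ID of an individual, read from its first constituent node."""
    ind = ts.individual(individual_id)
    return ts.node(ind.nodes[0]).population


def individuals_per_population(ts):
    """Count how many individuals fall in each population ID."""
    counts = Counter()
    for ind in ts.individuals():
        if len(ind.nodes) == 0:
            continue  # skip individuals with no associated nodes
        counts[ts.node(ind.nodes[0]).population] += 1
    return counts
```

With a tree sequence loaded as ts = tskit.load("1kg_chr1.trees"), calling individuals_per_population(ts) would give a mapping from population ID to number of individuals, and decode_metadata(ts.population(pop_id).metadata) would give the metadata for any of those IDs.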





