Inferring whole-genome histories in large population datasets: inferred tree sequences for 1000 Genomes (Q6682872)

Dataset published at Zenodo repository.

Statements

instance of

data set

description

Tree sequences inferred for the 1000 Genomes phase 3 autosomes using tsinfer version 0.1.4 and compressed using tszip. Tree sequences can be decompressed as follows:

    $ tsunzip 1kg_chr1.trees.tsz

Once decompressed, trees files can be loaded and processed using tskit:

    import tskit

    ts = tskit.load("1kg_chr1.trees")
    # ts is an instance of tskit.TreeSequence
    print("Chromosome 1 contains {} trees".format(ts.num_trees))

Metadata associated with individuals and populations was derived from the original source and converted to JSON form. For example, to access individual metadata we can use:

    import json
    import tskit

    ts = tskit.load("1kg_chr1.trees")
    ind = ts.individual(0)
    # The metadata is stored as a JSON-encoded blob; decode it into a dict
    metadata_dict = json.loads(ind.metadata)

The metadata_dict variable will now contain all the metadata for the individual with ID 0 as a dictionary. Metadata associated with populations can be accessed in a similar way. Population IDs are associated with individuals via their constituent nodes. For example:

    # Decode the metadata of every population, indexed by population ID
    pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
    # Look up the population of the individual's first node
    ind_node = ts.node(ind.nodes[0])
    ind_pop_metadata = pop_metadata[ind_node.population]

After this, the ind_pop_metadata variable will contain the population-level metadata for individual ID 0. The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub.
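The individual-to-population lookup described above can be generalised from individual 0 to every individual in a tree sequence. The following is a minimal sketch, not part of tskit or the published pipeline: the helper names decode_metadata and individual_population_metadata are hypothetical. It assumes each individual has at least one node and that metadata is either a JSON blob (as in these files) or an already-decoded dict (as newer tskit versions return when a JSON metadata schema is attached).

```python
import json


def decode_metadata(raw):
    """Return metadata as a dict, whether it arrives as JSON bytes/str or pre-decoded."""
    if isinstance(raw, dict):
        # Already decoded (e.g. when a JSON metadata schema is attached).
        return raw
    if not raw:
        # Empty metadata blob: nothing to decode.
        return {}
    # json.loads accepts both str and UTF-8 encoded bytes in Python 3.6+.
    return json.loads(raw)


def individual_population_metadata(ts):
    """Map every individual ID to the metadata dict of its population (hypothetical helper).

    Follows the same path as the description: individual -> its first node ->
    that node's population ID -> population metadata. Individuals whose first
    node has no population (ID -1, i.e. tskit.NULL) map to None.
    """
    pop_metadata = [decode_metadata(pop.metadata) for pop in ts.populations()]
    result = {}
    for ind in ts.individuals():
        pop_id = ts.node(ind.nodes[0]).population
        # Guard against -1, which would otherwise silently index the last population.
        result[ind.id] = pop_metadata[pop_id] if pop_id >= 0 else None
    return result
```

For example, after ts = tskit.load("1kg_chr1.trees"), the entry individual_population_metadata(ts)[0] should match ind_pop_metadata from the snippet above. Decoding each population's metadata once up front avoids re-parsing the same JSON for every individual.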

publication date

20 May 2019
