Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements
DOI: 10.5281/zenodo.14029309 | Zenodo: 14029309 | MaRDI QID: Q6707411
Dataset published at Zenodo repository.
Author name not available.
This repository represents a data snapshot associated with the manuscript "Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements". The manuscript is currently available as a preprint on bioRxiv.

Under an Access and Benefit Sharing agreement, these data are made available on an open access basis for research use only. Any person who wishes to use these data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. For further information, contact the Access and Benefit-sharing National Focal Point (ABS NFP) for Malawi registered with CBD at https://www.cbd.int/information/nfp.shtml.

Data description

Provided in this repository are the FASTA files of six new genome assemblies:

troMau: Tropheops sp. mauve, PacBio Sequel II
aulStu: Aulonocara stuartgranti, PacBio Sequel II
rhaChi: Rhamphochromis sp. chillingali (male), PacBio Sequel II
otoArg: Otopharynx argyrosoma, R9 MinION
copChr: Copadichromis chrysonotus, R9 MinION
rhaChi2: Rhamphochromis sp. chillingali (female), R9 MinION

Two previously published genomes from Ensembl v103 were also included in the pangenome graph: Astatotilapia calliptera (fAstCal1.2, GCF_900246225.1) and Maylandia zebra (M_zebra_UMD2a, GCA_000238955.4).

Other files that are also included:

malawi_haplochromines-graph.gfa: pangenome graph in GFA format constructed using the minigraph software package
malawi_haplochromines-variants.xlsx: detected structural variants, as defined on the fAstCal1.2 reference coordinates
malawi_haplochromines-genelists.xlsx: genes that overlap and do not overlap with structural variants

Access information for raw reads

Raw reads used to generate the new assemblies are accessible on NCBI.
Sample   BioProject    Genome           Biosample       Run ID(s)
troMau   PRJEB80840    GCA_964274065.1  SAMEA11293786   ERR12954135
aulStu   PRJEB80765    GCA_964273965.1  SAMEA115846654  ERR13382500
rhaChi   PRJEB80761    GCA_964273455.1  SAMEA115846655  ERR13382499
otoArg   PRJNA1144831  GCA_046255105.1  SAMN43044617    SRR30633342
copChr   PRJNA1144838  -                SAMN43044710    SRR30633337, SRR30633338
rhaChi2  PRJNA1144843  -                SAMN43044956    SRR30633436, SRR30633437, SRR30633438

Notes

Some of the assemblies are in the process of being uploaded to NCBI, which has flagged a few contigs as part of its quality checks:

ctg00001557 in otoArg (mitochondria)
ctg00005350 in copChr (BLAST hits to amphibian and fish E3 SUMO-protein ligase)
ctg00002210 in rhaChi2 (worm contaminant)

It is very likely that these contigs will be removed from the final NCBI assemblies. However, none of these contigs are included in the pangenome graph, and therefore the findings from the paper remain unaltered. A mapping between the Zenodo contigs and their NCBI counterparts will be provided at a later stage to facilitate coordinate conversions.
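The statement that none of the flagged contigs are part of the pangenome graph can be spot-checked against malawi_haplochromines-graph.gfa. The following is a minimal sketch, not the authors' procedure. It assumes the file follows the rGFA conventions written by minigraph, in which each S (segment) line carries an SN:Z tag naming the source sequence, and that the flagged contig names appear in those tags exactly as spelled in the notes. Both points are assumptions about this particular file, and the function names stable_names and flagged_in_graph are hypothetical. The in-memory records at the end are made up for illustration and are not taken from the dataset.

```python
import io

# Contigs flagged by NCBI, as listed in the notes.
FLAGGED = {"ctg00001557", "ctg00005350", "ctg00002210"}


def stable_names(gfa_handle):
    """Collect the SN:Z (source sequence name) tag of every S line.

    Assumes rGFA-style optional tags; S lines without an SN tag are skipped.
    """
    names = set()
    for line in gfa_handle:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] != "S":
            continue  # only segment lines carry a source sequence name
        for tag in fields[3:]:
            if tag.startswith("SN:Z:"):
                names.add(tag[len("SN:Z:"):])
                break
    return names


def flagged_in_graph(gfa_handle, flagged=FLAGGED):
    """Return, sorted, the flagged contig names that occur as SN tags."""
    return sorted(set(flagged) & stable_names(gfa_handle))


# Tiny in-memory stand-in for a graph file (made-up records, not real data).
demo = io.StringIO(
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0\n"
    "S\ts2\tGG\tSN:Z:ctg00000001\tSO:i:10\tSR:i:1\n"
    "L\ts1\t+\ts2\t+\t0M\tSR:i:1\n"
)
demo_hits = flagged_in_graph(demo)
```

To run the check on the real graph, pass an open file handle for malawi_haplochromines-graph.gfa to flagged_in_graph. An empty result would be consistent with the note. Because contig names of the form ctgNNNNNNNN may not be unique across assemblies, any match would still need to be traced back to its assembly by hand.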