Data and code for 'Pseudogenes act as a neutral reference for detecting selection in prokaryotic pangenomes'

From MaRDI portal



DOI: 10.5281/zenodo.8326664
Zenodo: 8326664
MaRDI QID: Q6683175

Dataset published in the Zenodo repository.

Author name not available

Publication date: 7 September 2023

Copyright license: No records found.



This repository contains the code and files for reproducing the analyses and results reported in 'Pseudogenes act as a neutral reference for detecting selection in prokaryotic pangenomes' by Gavin M. Douglas and B. Jesse Shapiro (https://doi.org/10.1038/s41559-023-02268-6).

File organization and descriptions:

code/ - GitHub repository releases of the code used in the manuscript (the other folders contain data files only). The code is provided here as well as on GitHub to ensure long-term access.
  handy_pop_gen-1.1.0/ - Release v1.1.0 of the convenience repository (used for specific data processing and analysis steps referred to in the manuscript).
  pangenome_pseudogene_null-1.1.0/ - Main code repository for the manuscript.

broad_pangenome_analysis/
  element_info/element_counts.tsv.gz - Counts of (filtered) pseudogenes and intact genes called per genome accession.
  element_info/gene_sizes.tsv.gz - Gene sizes in base pairs.
  element_info/pseudogene_sizes.tsv.gz - Filtered pseudogene sizes in base pairs.
  element_info/element_percent_coverage/*tsv.gz - Tables of the percent genome coverage of genes and pseudogenes, with separate tables by accession and averaged over accessions per species.
  example_Mycoplasmopsis_bovis_panaroo_output.csv.gz - Panaroo output table for Mycoplasmopsis bovis, which was used as an example. Corresponds to the gene_presence_absence.csv file in the raw Panaroo output.
  focal_and_non.focal_full_to_short.tsv.gz - Mapping file from full to short (and unique) species IDs used in the analysis, mainly so that species IDs can be included in cluster names without making them unnecessarily long.
  genome_info/accessions.tsv.gz - Genome accessions used for the broad pangenome analysis. Not all genome accessions could be downloaded (those that could not were ignored); this is indicated in the "could_download" column.
  genome_info/genome_sizes.tsv.gz - Sizes of all genomes used for the broad pangenome analysis.
  metrics_additional_subsamples.tsv.gz - Contains columns also found in the pangenome_and_related_metrics.tsv.gz file below, but based on genome subsamplings of 3 and 20, rather than 9.
  model_output/pangenome_linear_models.rds - R Data Serialization file containing the output R linear model objects (generated by lm and provided as an R list). The list has separate elements for the mean number of genes, genomic fluidity, percentage of singletons (si), and si/sp.
  model_output/linear_model_coef.tsv.gz - Coefficient summary table for all linear models.
  pangenome_and_related_metrics.tsv.gz - Metrics used for the broad pangenome analysis across 670 prokaryotic species. Note that this table was filtered down to 668 species after excluding those with 9 genomes.
  pangenome_and_related_metrics_filt.tsv.gz - Filtered table, as described above.
  taxonomy.tsv.gz - Taxonomy for all species used in this analysis, taken from GTDB. Row names are species names.

indepth_10_species_analysis/
  cluster_breakdown_tables/ - Tables breaking down how clusters are distributed by element type, pangenome partition, and species, provided for easy plotting.
  cluster_COG_annot.tsv.gz - Mapping of cluster IDs to COG annotations.
  cluster_filt_lengths_and_additional.tsv.gz - Metadata on clusters, most pertinently the length of the representative sequence in each cluster (used to filter out clusters shorter than the cut-off below which pseudogenes could not be called).
  cluster_member_breakdown.tsv.gz - Table describing each element (called pseudogenes and intact genes), including which cluster it belongs to and which species and genome accession it was found in.
  cluster_types.rds - R Data Serialization file containing an R list that breaks all clusters down into categories (intact/pseudogene/mixed, where mixed means containing both pseudogene and intact elements).
  COG_enrichment_results/ultra.cloud-COG-gene-enrichments.tsv.gz - Enrichment test summaries for COG IDs in significant COG categories; this test was run for the ultra-cloud pangenome partition model only.
  element_glmm_input.tsv.gz - Table containing all information used for fitting generalized linear mixed models.
  focal_species.txt - Names of the species used for the in-depth analysis.
  genome_info/ - Genome accessions (and the corresponding genome sizes) for all ten analyzed species.
  glmm_output/ - R Data Serialization files containing the R objects output by fitting generalized linear mixed models (only the ultra-rare files are present, due to file size constraints).
  per_genome_element.type_percent_coverages.rds - R Data Serialization file containing an R list giving the percent coverage by intact genes vs. pseudogenes per accession (nested by species).
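The .tsv.gz tables above are gzip-compressed, tab-separated text, so they can be read without R. The following Python sketch loads one table into a list of dictionaries and keeps only the accessions flagged in the "could_download" column of genome_info/accessions.tsv.gz. Only the "could_download" column name comes from the description; the "accession" column name and the assumption that the flag is written as text such as TRUE/FALSE are hypothetical, so check the header of the actual file first.

```python
import csv
import gzip
import io


def read_tsv_gz(source):
    """Read a gzip-compressed, tab-separated table into a list of dicts.

    `source` may be a file path or the raw gzip bytes (e.g. a file fetched in memory).
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)  # gzip.open also accepts file objects
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def downloadable_accessions(rows, accession_col="accession"):
    """Return accessions whose could_download flag reads as true.

    ASSUMPTIONS: the accession column name is hypothetical, and the flag is
    assumed to be stored as text such as "TRUE"/"FALSE".
    """
    truthy = {"true", "t", "yes", "1"}
    return [
        row[accession_col]
        for row in rows
        # `or ""` guards against short rows, where DictReader fills in None
        if (row.get("could_download") or "").strip().lower() in truthy
    ]
```

For example, `rows = read_tsv_gz("broad_pangenome_analysis/genome_info/accessions.tsv.gz")` followed by `print(list(rows[0]))` shows the real header. If a table was written from R with row names (taxonomy.tsv.gz, for instance, uses species names as row names), its header line may have one fewer field than the data rows; csv.DictReader then pairs names with the wrong values and stores the surplus value under the key None, so inspect such files before trusting the column mapping.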

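Genomic fluidity, one of the metrics modelled in pangenome_linear_models.rds, is commonly defined (Kislyuk et al., 2011) as phi = 2/(N(N-1)) * sum over k<l of (U_k + U_l)/(M_k + M_l), where N is the number of genomes, U_k is the number of gene families in genome k that are absent from genome l, and M_k is the number of gene families in genome k. The sketch below implements that standard definition on sets of cluster IDs. It is not taken from the manuscript's code, which may differ in details such as how the 9-genome subsamples were drawn.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Standard genomic fluidity for a list of genomes.

    Each genome is a set of gene-family (cluster) IDs. The result is the mean,
    over all N(N-1)/2 genome pairs, of the fraction of gene families that are
    found in only one genome of the pair, which equals
    2/(N(N-1)) * sum_{k<l} (U_k + U_l) / (M_k + M_l).
    """
    if len(genomes) < 2:
        raise ValueError("fluidity needs at least two genomes")
    ratios = []
    for a, b in combinations(genomes, 2):
        unique = len(a - b) + len(b - a)  # U_k + U_l
        total = len(a) + len(b)           # M_k + M_l
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)
```

As a small check, `genomic_fluidity([{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}])` averages the pairwise ratios 2/6, 1/5 and 1/5, giving 11/45 (about 0.244). Identical genomes give a fluidity of 0.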

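The element_percent_coverage tables report the percentage of each genome covered by genes or pseudogenes. The description does not say how this was computed; one plausible approach, sketched here purely as an assumption, merges overlapping element coordinates so that no base is counted twice and divides the covered length by the genome size (as listed in genome_info/genome_sizes.tsv.gz). The (start, end) input format with 1-based, inclusive coordinates on a single sequence is hypothetical.

```python
def percent_coverage(intervals, genome_size):
    """Percent of a sequence covered by a set of elements.

    ASSUMPTION: intervals are (start, end) pairs, 1-based and inclusive, all on
    one coordinate system. Overlapping or adjacent intervals are merged first,
    so each base is counted at most once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Gap before this interval: close off the previous merged block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / genome_size
```

For instance, `percent_coverage([(1, 100), (51, 150), (301, 400)], 1000)` merges the first two elements into bases 1 to 150, adds bases 301 to 400, and returns 25.0. For a multi-contig assembly, one would merge intervals per contig, sum the covered bases, and divide by the total genome size.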

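cluster_types.rds classifies clusters as intact, pseudogene, or mixed, where mixed clusters contain both pseudogene and intact elements. The same rule can be sketched in Python from per-element rows such as those in cluster_member_breakdown.tsv.gz. The column names ("cluster", "element_type") and the element-type values ("intact", "pseudogene") are hypothetical placeholders; substitute whatever the real file uses.

```python
from collections import defaultdict


def classify_clusters(members, cluster_col="cluster", type_col="element_type"):
    """Label each cluster as 'intact', 'pseudogene', or 'mixed'.

    HYPOTHETICAL column names and values: each member row is assumed to carry
    a cluster ID and an element type of either "intact" or "pseudogene".
    """
    types_by_cluster = defaultdict(set)
    for row in members:
        types_by_cluster[row[cluster_col]].add(row[type_col])

    labels = {}
    for cluster, types in types_by_cluster.items():
        if types == {"intact"}:
            labels[cluster] = "intact"
        elif types == {"pseudogene"}:
            labels[cluster] = "pseudogene"
        elif types == {"intact", "pseudogene"}:
            labels[cluster] = "mixed"  # contains both element types
        else:
            raise ValueError(f"unexpected element types in {cluster}: {types}")
    return labels
```

Raising on unexpected values, rather than silently labelling them, makes it obvious when the placeholder names do not match the real table.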
