RDP Classifier training files for 16S rRNA sequences from GTDB (Q6683339)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | RDP Classifier training files for 16S rRNA sequences from GTDB | Dataset published at Zenodo repository. | |
Statements
16S rRNA gene sequences from the Genome Taxonomy Database (GTDB release 220) were used to retrain the RDP Classifier (version 2.13). Two sets of training files are provided:

- genus.zip: genus-level training files
- species.zip: species-level training files

The code in prepare_files.R was used to prepare the GTDB sequence and taxonomy files for retraining the RDP Classifier.

Notes:

- The steps to retrain the RDP Classifier are adapted from https://john-quensen.com/tutorials/training-the-rdp-classifier/
- The Python scripts lineage2taxTrain.py and addFullLineage.py are available at https://github.com/rdpstaff/classifier/issues/18
- The first 1000 training sequences (train_nodups_1000.fasta) are used to benchmark classification accuracy (see the results at the end of prepare_files.R).
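The preparation step, turning GTDB lineages into the taxonomy file and lineage-annotated FASTA that the RDP Classifier trains on, can be sketched in Python. This is not the code in prepare_files.R, lineage2taxTrain.py or addFullLineage.py. It is a hypothetical minimal sketch under three stated assumptions: (a) lineages are GTDB-style strings with `d__` to `s__` rank prefixes; (b) each RDP taxonomy line has the layout `taxid*name*parent_taxid*depth*rank` under a `Root` node; and (c) FASTA headers take the form `>id Root;...`. The exact formats expected by version 2.13 should be checked against the tutorial and scripts linked above. The function names and example lineages are illustrative only.

```python
"""Hypothetical sketch: GTDB-style lineages -> RDP Classifier training inputs.

Assumed formats (check against the linked tutorial before use):
  taxonomy line:  taxid*name*parent_taxid*depth*rank, rooted at "0*Root*-1*0*rootrank"
  FASTA header:   >sequence_id Root;domain;phylum;...
"""
from __future__ import annotations

# GTDB ranks in order; prefixes d__, p__, c__, o__, f__, g__, s__ map onto these.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str, level: str = "genus") -> list[str]:
    """Strip the 'x__' rank prefixes and keep ranks down to `level`."""
    names = [part.split("__", 1)[1] for part in lineage.strip().split(";")]
    return names[: RANKS.index(level) + 1]


def build_hierarchy(lineages: list[list[str]]) -> list[str]:
    """Return taxonomy lines, assigning taxids in first-seen order."""
    lines = ["0*Root*-1*0*rootrank"]
    taxids: dict[tuple[str, ...], int] = {(): 0}  # full path -> taxid; () is Root
    for names in lineages:
        for depth, name in enumerate(names, start=1):
            path = tuple(names[:depth])
            if path not in taxids:
                # len() is evaluated before the new key is inserted -> next unused id
                taxids[path] = len(taxids)
                parent = taxids[path[:-1]]
                lines.append(f"{taxids[path]}*{name}*{parent}*{depth}*{RANKS[depth - 1]}")
    return lines


def fasta_record(seq_id: str, names: list[str], sequence: str) -> str:
    """Format one training sequence with its full lineage in the header."""
    return f">{seq_id} Root;{';'.join(names)}\n{sequence}\n"


if __name__ == "__main__":
    # Illustrative records only; not taken from the dataset.
    records = [
        ("seq1", "d__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
                 "f__Bacillaceae;g__Bacillus;s__Bacillus subtilis", "ACGT"),
        ("seq2", "d__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
                 "f__Bacillaceae;g__Bacillus;s__Bacillus cereus", "ACGA"),
    ]
    parsed = [(sid, parse_lineage(lin, "species"), seq) for sid, lin, seq in records]
    print("\n".join(build_hierarchy([names for _, names, _ in parsed])))
    for sid, names, seq in parsed:
        print(fasta_record(sid, names, seq), end="")
```

Truncating at `"genus"` or `"species"` mirrors the two training sets (genus.zip and species.zip). Keying taxa by their full path rather than by name alone keeps a name that occurs under two different parents from being merged into a single node.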
10 July 2024
220.0