RDP Classifier training files for 16S rRNA sequences from GTDB
DOI: 10.5281/zenodo.12703477
Zenodo: 12703477
MaRDI QID: Q6683339
Dataset published in the Zenodo repository.
Author name not available
Publication date: 10 July 2024
Copyright license: No records found.
16S rRNA gene sequences from the Genome Taxonomy Database (GTDB release 220) were used to retrain the RDP Classifier (version 2.13).

Two sets of training files are provided:
genus.zip - Genus level
species.zip - Species level

The code in prepare_files.R was used to prepare the GTDB sequence and taxonomy files for retraining the RDP Classifier.

Notes:
Steps to retrain the RDP Classifier are adapted from https://john-quensen.com/tutorials/training-the-rdp-classifier/
Python scripts (lineage2taxTrain.py and addFullLineage.py) are available at https://github.com/rdpstaff/classifier/issues/18
The first 1000 training sequences (train_nodups_1000.fasta) are used for benchmarking the classification accuracy (see results at end of prepare_files.R).
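The benchmarking step described above can be illustrated with a minimal Python sketch. It is not the dataset's own code, which is in prepare_files.R. It only shows one way to take the first N records from a FASTA file and to score predicted taxa against expected taxa. The function names (read_fasta, first_n_records, accuracy) are hypothetical, and the sketch assumes that accuracy means the fraction of exact label matches, which the description does not state.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences wrapped over several lines are joined into one string.
    """
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(lines: Iterable[str], n: int = 1000) -> List[Tuple[str, str]]:
    """Return the first n FASTA records (hypothetical helper, n=1000 by default)."""
    return list(islice(read_fasta(lines), n))


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the expected one.

    Assumption: exact string match at a single rank (e.g. genus).
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)
```

To apply this to a real file, pass an open file handle, for example first_n_records(open("train_nodups.fasta"), 1000), where the input file name is a placeholder rather than a file confirmed to be in the dataset. The predicted labels would come from running the retrained classifier on those sequences, a step not shown here.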