Data Sets and Results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins" (Q6702189)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Data Sets and Results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins" | Dataset published at Zenodo repository. | |
Statements
Data sets and results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins".

The file dna_binding_protein_sequences.zip contains the training and testing sets from the paper:

RLL - random_train/test_full_1000.csv
RSL - random_train/test_40.csv
RSLL - random_train/test_40_1000.csv
RLL where included positive examples have verified DNA-binding activity - random_train/test_hq_1000.csv
The 10 RSLL data sets - random_train/test_40_1000.csv + random_train/test_40_1000_cv_0-8.csv

The results files are named similarly. See see_results.ipynb in the codebase that supplements these data sets.

The species data sets are derived from uniprot_data_bac.tab and uniprot_data_not_bac.tab; see the code for details.

The ESM embeddings used by the XGBoost model are in dna_binding_protein_esm.zip.
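The file names in the description use a compact shorthand. The sketch below shows one plausible way to expand it, under two assumptions that the description does not confirm: that "train/test" stands for a pair of files differing only in the word "train" or "test", and that "cv_0-8" stands for nine files numbered cv_0 through cv_8. The helper names (expand_train_test, expand_cv_range, rsll_pairs) are hypothetical and are not part of the published codebase.

```python
import re


def expand_train_test(shorthand):
    """Expand 'random_train/test_40.csv' into a (train, test) pair of names.

    Assumption: 'train/test' abbreviates two files that differ only in
    that one word. Check against the actual archive contents.
    """
    if "train/test" not in shorthand:
        raise ValueError(f"no 'train/test' marker in {shorthand!r}")
    return (
        shorthand.replace("train/test", "train"),
        shorthand.replace("train/test", "test"),
    )


def expand_cv_range(shorthand):
    """Expand a '_cv_A-B' range into one name per index from A to B inclusive.

    Assumption: 'cv_0-8' abbreviates the indices 0, 1, ..., 8.
    Names without such a range are returned unchanged in a one-item list.
    """
    match = re.search(r"_cv_(\d+)-(\d+)", shorthand)
    if match is None:
        return [shorthand]
    first, last = int(match.group(1)), int(match.group(2))
    prefix, suffix = shorthand[: match.start()], shorthand[match.end():]
    return [f"{prefix}_cv_{i}{suffix}" for i in range(first, last + 1)]


def rsll_pairs():
    """Return (train, test) name pairs for the RSLL data sets as listed."""
    names = ["random_train/test_40_1000.csv"]
    names += expand_cv_range("random_train/test_40_1000_cv_0-8.csv")
    return [expand_train_test(name) for name in names]


if __name__ == "__main__":
    for train_name, test_name in rsll_pairs():
        print(train_name, test_name)
```

Under this reading, the base pair plus the nine cv pairs gives ten pairs in total, which agrees with the "10 RSLL data sets" mentioned in the description. The real names should still be verified by listing the contents of dna_binding_protein_sequences.zip.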
2 August 2021
2