extHomFam 2: large-scale benchmark for protein multiple sequence alignments (Q6695332)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | extHomFam 2: large-scale benchmark for protein multiple sequence alignments | Dataset published at Zenodo repository. | |
Statements
extHomFam 2 was constructed by combining Homstrad reference alignments (March 2020) with complete Pfam 33.1 families (NCBI variant). Homstrad entries with fewer than 3 reference sequences, as well as those pointing to dead Pfam families, were discarded. The resulting benchmark was divided into subsets according to the family size N:

| Subset | N range | # families |
|---|---|---|
| small | [200, 10 000) | 86 |
| medium | [10 000, 40 000) | 95 |
| large | [40 000, 100 000) | 83 |
| xlarge | [100 000, 250 000) | 67 |
| huge | [250 000, 3 000 000) | 62 |

The directories in the archive correspond to the subset names, while the reference alignments are located in the `ref` folder.
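The filtering and subset assignment described above amount to a simple rule check followed by binning on the family size N over half-open ranges. The following is a minimal Python sketch, assuming N is the number of sequences in the complete Pfam family and that the Homstrad filters are applied before binning. The function names `keep_entry` and `subset_for_size` are hypothetical and are not shipped with the dataset.

```python
from typing import Optional

# Half-open size ranges [lower, upper) as listed for extHomFam 2.
SUBSETS = [
    ("small", 200, 10_000),
    ("medium", 10_000, 40_000),
    ("large", 40_000, 100_000),
    ("xlarge", 100_000, 250_000),
    ("huge", 250_000, 3_000_000),
]


def keep_entry(num_reference_seqs: int, pfam_family_alive: bool) -> bool:
    """Hypothetical filter: keep a Homstrad entry only if it has at least
    3 reference sequences and points to a Pfam family that still exists."""
    return num_reference_seqs >= 3 and pfam_family_alive


def subset_for_size(n: int) -> Optional[str]:
    """Return the subset whose half-open range [lower, upper) contains n,
    or None if n falls outside every range."""
    for name, lower, upper in SUBSETS:
        if lower <= n < upper:
            return name
    return None


if __name__ == "__main__":
    # Boundary values land in the upper subset because ranges are half-open.
    for n in (200, 9_999, 10_000, 250_000, 199, 3_000_000):
        print(n, subset_for_size(n))
```

Because the ranges are half-open, a family of exactly 10 000 sequences belongs to `medium`, not `small`, and families below 200 or at or above 3 000 000 sequences fall outside all subsets.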
6 May 2022
2.0