extHomFam 2: large-scale benchmark for protein multiple sequence alignments




DOI: 10.5281/zenodo.6524237; Zenodo: 6524237; MaRDI QID: Q6695332

Dataset published at Zenodo repository.

Author name not available

Publication date: 6 May 2022

Copyright license: No records found.



extHomFam 2 was constructed by combining Homstrad reference alignments (March 2020) with Pfam 33.1 complete families (NCBI variant). Homstrad entries with fewer than 3 reference sequences and those pointing to dead Pfam families were discarded. The resulting benchmark was divided into subsets depending on the family size N:

subset    N range                  # families
small     [200, 10 000)            86
medium    [10 000, 40 000)         95
large     [40 000, 100 000)        83
xlarge    [100 000, 250 000)       67
huge      [250 000, 3 000 000)     62

The directories in the archive correspond to the names of the subsets, while the reference alignments are located in the ref folder.
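The size ranges above are half-open intervals, so a family with exactly 10 000 sequences falls into medium rather than small. A minimal Python sketch of that mapping is given here for illustration; the function name subset_for_size is hypothetical and is not part of the dataset or of any tool distributed with it.

```python
from bisect import bisect_right

# Interval boundaries copied from the subset definitions above.
# Each subset covers [lower, upper): lower bound inclusive, upper exclusive.
BOUNDS = [200, 10_000, 40_000, 100_000, 250_000, 3_000_000]
NAMES = ["small", "medium", "large", "xlarge", "huge"]


def subset_for_size(n):
    """Return the subset name for a family of n sequences, or None if out of range.

    Hypothetical helper for illustration only.
    """
    if n < BOUNDS[0] or n >= BOUNDS[-1]:
        return None
    # bisect_right counts the boundaries that are <= n; subtracting one
    # gives the index of the interval whose lower bound is <= n.
    return NAMES[bisect_right(BOUNDS, n) - 1]


if __name__ == "__main__":
    for size in (150, 200, 9_999, 10_000, 120_000, 2_999_999, 3_000_000):
        print(size, subset_for_size(size))
```

Sizes below 200 or at or above 3 000 000 return None because no subset is defined for them.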





