Prop3D-20
DOI: 10.5281/zenodo.6873024; Zenodo: 6873024; MaRDI QID: Q6722254
Dataset published at Zenodo repository.
Author name not available
Publication date: 25 July 2022
Copyright license: No records found.
Prop3D-20 is a protein structure dataset that combines 3D atomic coordinates with biophysical and evolutionary properties for every atom in every cleaned domain structure from 20 CATH [1] Homologous Superfamilies. Domain structures are cleaned by adding missing residues with MODELLER [2], adding missing atoms with SCWRL4 [3], and protonating and energy minimizing (simple de-bump) with PDB2PQR [4]. We follow the CATH hierarchy in a hierarchical data format (HDF) file and include atomic-level features, residue-level features, residue-residue contacts, and pre-calculated train (~80%) / test (~10%) / validation (~10%) splits for each superfamily, derived from CATH's sequence identity clusters (e.g. S35 for 35% sequence identity).

This dataset was originally stored in the Highly Scalable Data Service (HSDS) and was exported into this raw .h5 file as a backup. We recommend loading this data into HSDS for use with h5pyd, but the .h5 file can be opened using h5py as well. Please see the README attached to this dataset to learn how to use the dataset and how it is organized. For more information on setting up HSDS and/or recreating this dataset, please see http://www.github.com/bouralab/Prop3D/README.md

References
1. Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, Pang CSM, Woodridge L, Rauer C, Sen N, Abbasian M, Le Cornu S, Lam SD, Berka K, Varekova IH, Svobodova R, Lees J, Orengo CA. CATH: increased structural coverage of functional space. Nucleic Acids Res. 2021 Jan 8;49(D1):D266-D273. doi: 10.1093/nar/gkaa1079. PMID: 33237325; PMCID: PMC7778904.
2. Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics. 2016 Jun 20;54:5.6.1-5.6.37. doi: 10.1002/cpbi.3. PMID: 27322406; PMCID: PMC5031415.
3. Krivov GG, Shapovalov MV, Dunbrack RL Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009 Dec;77(4):778-95. doi: 10.1002/prot.22488. PMID: 19603484; PMCID: PMC2885146.
4. Jurrus E, Engel D, Star K, Monson K, Brandi J, Felberg LE, Brookes DH, Wilson L, Chen J, Liles K, Chun M, Li P, Gohara DW, Dolinsky T, Konecny R, Koes DR, Nielsen JE, Head-Gordon T, Geng W, Krasny R, Wei GW, Holst MJ, McCammon JA, Baker NA. Improvements to the APBS biomolecular solvation software suite. Protein Sci. 2018 Jan;27(1):112-128. doi: 10.1002/pro.3280. Epub 2017 Oct 24. PMID: 28836357; PMCID: PMC5734301.
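Since the description says the exported .h5 file can be opened with h5py, a first look at its contents might be sketched as follows. This is a minimal illustration, not the workflow from the dataset's README: the helper names `walk` and `preview_h5` and the file name `Prop3D-20.h5` are hypothetical, and no group or dataset names are assumed because the internal layout is documented only in the attached README. The walker relies only on the fact that h5py groups are mapping-like (they expose `.items()`), so it also works on plain nested dictionaries.

```python
from itertools import islice
from typing import Any, Iterator, Tuple


def walk(group: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf) for every non-group node under a mapping-like group.

    Works with h5py.File / h5py.Group objects and with nested dicts,
    because it only requires an ``items()`` method on groups.
    """
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if hasattr(node, "items"):
            # Sub-group: descend one level further down the hierarchy.
            yield from walk(node, path)
        else:
            # Leaf: in an HDF5 file this would be a dataset.
            yield path, node


def preview_h5(h5_path: str, limit: int = 10) -> None:
    """Print path, shape and dtype of the first ``limit`` datasets in a file."""
    import h5py  # third-party dependency, imported only when actually used

    # Open read-only so the backup file cannot be modified by accident.
    with h5py.File(h5_path, "r") as handle:
        for path, dataset in islice(walk(handle), limit):
            print(path, dataset.shape, dataset.dtype)
```

Calling `preview_h5("Prop3D-20.h5")` (with the path replaced by wherever the downloaded file actually lives) would list the first few datasets and give a rough picture of how the CATH hierarchy is laid out. h5pyd aims to mirror the h5py high-level API, so the same walker may also work on a file opened through HSDS, though that is an untested assumption here.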