physiochemical_protein
OpenML dataset with id 44963
No author found.
Full work available at URL: https://api.openml.org/data/v1/download/22111827/physiochemical_protein.arff
Upload date: 22 December 2022
Copyright license: Creative Commons Attribution 4.0 International
Dataset Characteristics
Number of classes: 0
Number of features: 10 (numeric: 10, symbolic: 0, binary: 0)
Number of instances: 45,730
Number of instances with missing values: 0
Number of missing values: 0
Data Description
This is a data set of physicochemical properties of protein tertiary structure. The data are taken from CASP 5-9. There are 45,730 decoys, with sizes (RMSD) varying from 0 to 21 angstroms.
The goal of the dataset is to predict the size of the residue for a protein tertiary structure (a 3D protein structure). Once linked in the protein chain, an individual amino acid is called a residue. The target feature, RMSD, is the root mean square deviation of the residue.
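For readers unfamiliar with the target, RMSD between two structures is conventionally the square root of the mean squared distance between corresponding atom positions. The following is a minimal sketch of that formula, assuming the two coordinate sets are already superimposed and expressed in angstroms. The function name `rmsd` and the sample coordinates are hypothetical, and the dataset description does not state how its RMSD values were actually computed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Squared Euclidean distance for each pair of corresponding points.
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates (not taken from the dataset): the decoy is the
# reference structure shifted by 1 angstrom along the z axis.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(native, decoy))  # 1.0
```

A value of 0 means the two structures coincide, and larger values indicate a decoy further from the reference.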
Attribute Description
1. *RMSD* - size of the residue
2. *F1* - total surface area
3. *F2* - non-polar exposed area
4. *F3* - fractional area of exposed non-polar residue
5. *F4* - fractional area of exposed non-polar part of residue
6. *F5* - molecular mass weighted exposed area
7. *F6* - average deviation from standard exposed area of residue
8. *F7* - Euclidean distance
9. *F8* - secondary structure penalty
10. *F9* - Spatial distribution constraints (N,K value)
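The attribute list above can be used to separate the target from the nine predictors. The sketch below uses only the Python standard library, assuming a dense, all-numeric ARFF file whose attribute names match the list (RMSD, F1 to F9). The helper names `parse_numeric_arff` and `split_target` are hypothetical, the two data rows are made-up placeholders rather than real records, and the header of the actual file may differ.

```python
def parse_numeric_arff(text):
    """Parse a dense, all-numeric ARFF document into (attribute names, rows)."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([float(v) for v in line.split(",")])
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>"; names may be quoted
            names.append(line.split()[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def split_target(names, rows, target="RMSD"):
    """Separate the target column from the remaining feature columns."""
    idx = names.index(target)
    feature_names = names[:idx] + names[idx + 1:]
    X = [row[:idx] + row[idx + 1:] for row in rows]
    y = [row[idx] for row in rows]
    return feature_names, X, y


# Placeholder document: the column names follow the attribute list,
# the numeric values are invented for illustration only.
columns = ["RMSD"] + [f"F{i}" for i in range(1, 10)]
sample = (
    "@relation physiochemical_protein\n"
    + "".join(f"@attribute {c} numeric\n" for c in columns)
    + "@data\n"
    + "1.5,1,2,3,4,5,6,7,8,9\n"
    + "0.5,9,8,7,6,5,4,3,2,1\n"
)

names, rows = parse_numeric_arff(sample)
feature_names, X, y = split_target(names, rows)
print(feature_names)
print(y)  # [1.5, 0.5]
```

Because the dataset has no classes and a numeric target, `X` and `y` from a real download would feed a regression model rather than a classifier.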