Structure modeling and function annotation of the human proteome
This dataset provides the results of protein structure modeling (meta-TASSER) and function annotation (FINDSITE, FINDSITE-metal and EFICAz2) of the human proteome (Ensembl release 55, assembly GRCh37).
Protein sequences (50-600 residues in length) are available in gzipped FASTA format.
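After downloading the sequence file, you can check that every record falls within the stated 50-600 residue range by computing per-record lengths. The following is a minimal sketch. It assumes a gzipped FASTA file with one `>` header line per record. The function names `fasta_lengths` and `length_range` are hypothetical, and the sequence file's actual name is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<id><TAB><length>" for every record
# in a gzipped FASTA file (sequences may span several lines).
fasta_lengths() {
  gzip -dc "$1" | awk '
    /^>/ {
      if (id != "") print id "\t" len   # flush the previous record
      id = substr($1, 2)                # header word without the ">"
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
    END { if (id != "") print id "\t" len }
  '
}

# Hypothetical helper: print the shortest and longest sequence length.
length_range() {
  fasta_lengths "$1" | awk -F'\t' '
    { n = $2 + 0 }                      # force numeric comparison
    NR == 1 { min = n; max = n }
    { if (n < min) min = n; if (n > max) max = n }
    END { print min, max }
  '
}
```

For example, `length_range <downloaded FASTA file>` should print two numbers, both between 50 and 600, if the file matches the description above.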
The data formats are described in the manuals of the respective programs. Protein models are in PDB format. FINDSITE results include predicted ligand-binding sites, Gene Ontology terms, template-to-target structure alignments, binding ligands in SDF format and virtual screening rankings against the KEGG Compound, KEGG Drug and ZINC8 compound libraries. FINDSITE-metal results include predicted metal-binding sites, Gene Ontology terms, template-to-target structure alignments, binding metals and confidence estimates. EFICAz2 results include predicted EC numbers, the outputs of the individual EFICAz2 components and the estimated precision of each prediction.
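The models are plain PDB files and the FINDSITE ligands are SDF files, so standard command-line tools are enough for quick sanity checks. The following is a minimal sketch. `count_ca` counts C-alpha `ATOM` records, matching the atom name by position because PDB atom names occupy columns 13-16. The result equals the number of modeled residues only if the file holds a single model with no alternate locations. `count_sdf_records` counts molecules by their `$$$$` record terminators. Both function names are hypothetical, and the sketch makes no assumption about how the result archives are laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count C-alpha atoms in a PDB file.
# Record name is columns 1-6, atom name is columns 13-16 (" CA ").
count_ca() {
  awk 'substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " { n++ }
       END { print n + 0 }' "$1"
}

# Hypothetical helper: count molecules in an SDF file.
# Each SDF record ends with a line starting with "$$$$".
count_sdf_records() {
  awk '/^\$\$\$\$/ { n++ } END { print n + 0 }' "$1"
}
```

Using `awk` for both counts means a file with no matches prints `0` and the function still exits successfully, unlike `grep -c`, which returns a non-zero status in that case.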
Download instructions
To download the complete dataset with wget, save the file list h.sapiens-dataset.lst locally and run:
wget --base=http://cssb2.biology.gatech.edu/skolnick/files/proteomes/human/ -i h.sapiens-dataset.lst
To download an individual dataset, replace h.sapiens-dataset.lst in the wget command with one of the following file lists:
| File list | Size | Contents |
|---|---|---|
| h.sapiens-tasser.lst | 1.9G | 56,376 protein structures modeled by meta-TASSER |
| h.sapiens-findsite.lst | 303G | 46,088 ligand-binding proteins predicted by FINDSITE |
| h.sapiens-findsitemetal.lst | 24G | 34,808 metal-binding proteins predicted by FINDSITE-metal |
| h.sapiens-eficaz2.lst | 36M | 9,028 enzyme proteins predicted by EFICAz2 |
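Some of these downloads are large (303G for FINDSITE), so you may want to resume interrupted transfers and check afterwards for missing files. The following is a minimal sketch. `fetch_list` adds wget's `-c` (continue) option to the command given above. Resuming only works if the server supports partial requests. `missing_files` prints every list entry whose local copy is absent or empty. It assumes wget saved each file under its basename in the current directory, which is wget's default when `-x` is not given, and it cannot detect a truncated but non-empty file. The function names and the `WGET` override variable are hypothetical.

```shell
#!/usr/bin/env bash
BASE_URL="http://cssb2.biology.gatech.edu/skolnick/files/proteomes/human/"

# Hypothetical helper: download every file named in a list, resuming
# partially downloaded files (-c). WGET may point at a stub for a dry run.
fetch_list() {
  "${WGET:-wget}" -c --base="$BASE_URL" -i "$1"
}

# Hypothetical helper: print list entries whose local copy is missing or empty.
missing_files() {
  local entry name
  while IFS= read -r entry || [ -n "$entry" ]; do
    entry="${entry%$'\r'}"        # tolerate CRLF line endings
    if [ -z "$entry" ]; then      # skip blank lines
      continue
    fi
    name="${entry##*/}"           # basename, as wget names the saved file
    if [ ! -s "$name" ]; then
      printf '%s\n' "$entry"
    fi
  done < "$1"
}
```

For example, run `fetch_list h.sapiens-eficaz2.lst` and then `missing_files h.sapiens-eficaz2.lst`. If the second command prints nothing, every listed file is present. If it reports entries, rerunning `fetch_list` retries them.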
This service was created by Michal Brylinski and is maintained by Narendra Kumar.