Structure modeling and function annotation of the human proteome

This dataset provides the results of protein structure modeling (meta-TASSER) and function annotation (FINDSITE, FINDSITE-metal and EFICAz2) of the human proteome (assembly GRCh37, release 55 from the Ensembl database).

Protein sequences (50-600 residues in length) are available in FASTA format (gzipped).
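After downloading the gzipped FASTA file, a quick sanity check is to list each record's length and confirm that all records fall in the stated 50-600 residue range. The sketch below does this with only gzip and POSIX awk. The function names `fasta_lengths` and `check_length_range` are hypothetical, and the file name is whatever the download was saved as.

```shell
#!/usr/bin/env bash
# Print "<sequence-id> <length>" for each record in a gzipped FASTA file.
# The ID is the first word of the header line, without the leading ">".
fasta_lengths() {
  gzip -dc -- "$1" | awk '
    /^>/ {                                  # a header line starts a new record
      if (id != "") print id, len           # emit the previous record
      id = substr($1, 2); len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # count residues, ignoring whitespace
    END { if (id != "") print id, len }
  '
}

# Print any records outside the 50-600 residue range stated on this page.
# Returns 0 if every record is in range, 1 otherwise.
check_length_range() {
  fasta_lengths "$1" | awk '$2 < 50 || $2 > 600 { print; bad = 1 } END { exit (bad ? 1 : 0) }'
}
```

For example, `check_length_range sequences.fa.gz` (hypothetical file name) prints nothing and succeeds if every sequence length is between 50 and 600.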

Data formats are described in the manuals of the respective programs.

meta-TASSER: protein models in PDB format.
FINDSITE: predicted ligand-binding sites, Gene Ontology terms, template-to-target structure alignments, binding ligands in SDF format, and virtual screening rankings against the KEGG Compound, KEGG Drug and ZINC8 compound libraries.
FINDSITE-metal: predicted metal-binding sites, Gene Ontology terms, template-to-target structure alignments, binding metals and confidence estimates.
EFICAz2: predicted EC numbers, the results of the individual EFICAz2 components, and the precision of each prediction.
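Because the models are plain PDB files and the FINDSITE ligands are SDF files, the general conventions of those formats are enough for simple checks. The hypothetical helpers below count residues in a model (ATOM records whose atom name, in columns 13-16, is CA) and count molecules in an SDF file (each record ends with a line containing only `$$$$`). They make no assumptions about how FINDSITE, FINDSITE-metal or EFICAz2 organize their output files; those layouts are defined in the manuals.

```shell
#!/usr/bin/env bash
# Count residues in a PDB model by counting ATOM records whose atom name
# (columns 13-16) is " CA ", i.e. one C-alpha atom per residue.
count_ca_residues() {
  awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " { n++ }
       END { print n + 0 }' "$1"
}

# Count molecules in an SDF file: every record is terminated by a line
# consisting only of "$$$$".
count_sdf_molecules() {
  grep -cxF '$$$$' "$1"
  [ $? -le 1 ]   # grep exits 1 when the count is 0; only status 2 is an error
}
```

For a meta-TASSER model, `count_ca_residues` should match the length of the corresponding FASTA sequence if the model covers the full chain.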

Download instructions

Using wget:

wget --base=http://cssb2.biology.gatech.edu/skolnick/files/proteomes/human/ -i h.sapiens-dataset.lst
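A transfer of this size may be interrupted. As a hedged sketch, the helper below (a hypothetical name, not part of this service) reads a file list and prints the entries that have no file with the same basename in a download directory; its output can be saved to a new list and passed back to the same wget command with `-i`. It assumes wget's default behaviour of saving each URL under its basename in the current directory (no `-x` or `-P` options), so list entries that share a basename cannot be told apart.

```shell
#!/usr/bin/env bash
# Print the entries of a file list that have no matching file (by basename)
# in the given directory (default: current directory).
missing_from_list() {
  local list="$1" dir="${2:-.}" entry
  # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
  while IFS= read -r entry || [ -n "$entry" ]; do
    entry="${entry%$'\r'}"              # tolerate CRLF line endings
    [ -z "$entry" ] && continue         # skip blank lines
    [ -e "$dir/${entry##*/}" ] || printf '%s\n' "$entry"
  done < "$list"
}
```

For example, `missing_from_list h.sapiens-dataset.lst > missing.lst` followed by the wget command above with `-i missing.lst` would retry only the files that are not yet present.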

Use the following file lists to download individual datasets:

File list                     Size   Contents
h.sapiens-tasser.lst          1.9G   56,376 protein structures modeled by meta-TASSER
h.sapiens-findsite.lst        303G   46,088 ligand-binding proteins predicted by FINDSITE
h.sapiens-findsitemetal.lst   24G    34,808 metal-binding proteins predicted by FINDSITE-metal
h.sapiens-eficaz2.lst         36M    9,028 enzyme proteins predicted by EFICAz2
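Since the datasets differ in size by several orders of magnitude, it can help to fetch only the ones needed. The sketch below maps the short names in the file-list names (tasser, findsite, findsitemetal, eficaz2) to their lists and runs wget on each with the base URL from the command above. The function names are hypothetical; it assumes the `.lst` files are already in the current directory and contain paths relative to that base URL, and it adds wget's `--continue` option so an interrupted file resumes rather than restarting.

```shell
#!/usr/bin/env bash
# Base URL copied from the wget command on this page.
BASE_URL="http://cssb2.biology.gatech.edu/skolnick/files/proteomes/human/"

# Map a short dataset name to its file list from the table above.
list_for() {
  case "$1" in
    tasser|findsite|findsitemetal|eficaz2) echo "h.sapiens-$1.lst" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Download the named datasets one list at a time; stop at the first failure.
fetch_datasets() {
  local name list
  for name in "$@"; do
    list=$(list_for "$name") || return 1
    wget --continue --base="$BASE_URL" -i "$list" || return 1
  done
}
```

For example, `fetch_datasets tasser eficaz2` would fetch the meta-TASSER models (1.9G) and EFICAz2 predictions (36M) while skipping the 303G FINDSITE dataset.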

This service was created by Michal Brylinski and is maintained by Narendra Kumar.