EFICAz2: Enzyme Function Inference by a Combined Approach II


EFICAz2 (Enzyme Function Inference by a Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from six different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation-controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, (iv) recognition of multiple Prosite patterns of high specificity, (v) SVM evaluation of CHIEFc families, and (vi) SVM evaluation of Multiple Pfam enzyme families.

Details about the original EFICAz algorithm and its predictive components (i) to (iv) can be found in Tian W, Arakaki AK and Skolnick J. (2004) Nucleic Acids Res. 32:6226-6239. The reannotation of 245 genomes using EFICAz is described in Arakaki AK, Tian W and Skolnick J. (2006) BMC Genomics 7:315. EFICAz2 offers increased precision compared to EFICAz, especially at low maximal test to training sequence identity (MTTSI) levels, and a confidence index associated with each predicted EC number. These improvements result from the addition of two SVM-based predictive components (see above, components (v) and (vi)) and a new classification tree-based algorithm that decides the final EC number assignment(s). EFICAz2 is described in Arakaki AK, Huang Y and Skolnick J. (2009) BMC Bioinformatics 10:107.

The current release of EFICAz2 (version 13) is based on: (i) UniProtKB release 13.0 of February 26, 2008, including Swiss-Prot 55.0 and TrEMBL release 38.0, (ii) Pfam release 22.0 of July 10, 2007, and (iii) Prosite release 20.30 of March 18, 2008. EFICAz2 version 13 recognizes 2,354 four-field and 209 three-field EC numbers, up from the 2,061 four-field and 203 three-field EC numbers recognized by the previous release, EFICAz version 5.0.

EFICAz2 output consists of: 1) query sequence identifier, 2) predicted four-field or three-field EC number (if any), 3) EFICAz2 predictive components that recognized the EC number, 4) MTTSI bin associated with the query sequence, which ranges from 0 (0% < MTTSI <= 10%) to 9 (90% < MTTSI <= 100%), and, for high confidence predictions, 5) mean and standard deviation of the precision, obtained by averaging the precisions of all the EC number classifiers evaluated in our benchmark tests at the given MTTSI bin. Low confidence predictions (typically correct in less than half of the cases) are also reported, but the message "Caution: LOW CONFIDENCE prediction!" reminds the user that they are more speculative.
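
The MTTSI bin boundaries described above can be illustrated with a short Python sketch. This is not EFICAz2 code: the function name mttsi_bin is hypothetical, and the sketch assumes the MTTSI value is supplied as a percentage.

```python
import math


def mttsi_bin(mttsi_percent: float) -> int:
    """Map an MTTSI percentage to its bin index (0 to 9).

    Bin k covers the interval k*10% < MTTSI <= (k+1)*10%,
    so bin 0 is (0%, 10%] and bin 9 is (90%, 100%].
    """
    # Assumption: values outside (0, 100] are rejected rather than clamped.
    if not 0 < mttsi_percent <= 100:
        raise ValueError("MTTSI must satisfy 0 < MTTSI <= 100")
    # The upper boundary belongs to the lower bin, hence ceil() minus one.
    return math.ceil(mttsi_percent / 10) - 1
```

For example, an MTTSI of exactly 10% falls in bin 0, while any value just above 10% falls in bin 1.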


SUBMIT YOUR SEQUENCE HERE

The "Email address" and "Query protein sequence" fields are mandatory. The query protein sequence must be in plain text or FASTA format (only one sequence allowed) and use the 20 standard single-letter amino acid codes (ACDEFGHIKLMNPQRSTVWY). Maximum sequence length = 10,000 amino acids.
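
To check a query before submitting it, the rules above can be sketched in Python as follows. This is an illustrative sketch only and not the server's own validation code: the names parse_query, VALID_RESIDUES and MAX_LENGTH are hypothetical, and accepting lowercase letters and whitespace inside the sequence is an assumption, since the server may be stricter.

```python
from typing import Tuple

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MAX_LENGTH = 10_000  # maximum number of residues stated above


def parse_query(text: str) -> Tuple[str, str]:
    """Return (name, sequence) for a plain-text or single-record FASTA query."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty query")

    name = ""
    if lines[0].startswith(">"):
        # FASTA input: the header line carries the sequence name.
        name = lines[0][1:].strip()
        lines = lines[1:]
    if any(line.startswith(">") for line in lines):
        raise ValueError("only one sequence is allowed")

    # Assumption: lowercase letters and internal whitespace are tolerated.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    if not sequence:
        raise ValueError("no residues found")
    if len(sequence) > MAX_LENGTH:
        raise ValueError(f"sequence longer than {MAX_LENGTH} residues")

    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError("invalid residue codes: " + "".join(invalid))
    return name, sequence
```

A call such as parse_query(">my_enzyme\nACDE\nFGHI") returns the header text as the name and the joined lines as the sequence; ambiguity codes such as X or B are rejected because they are outside the 20-letter alphabet.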

Query sequence name (optional):

Email address:

Query protein sequence:



Contact person: Adrian K. Arakaki

Email: adrian.arakaki@gatech.edu