EFI-EST is a free tool for visualizing sequence–function relationships
As genome sequencing has become routine, the number of uncharacterized, unknown, or hypothetical proteins in the sequence databases has grown faster than the ability to assign their biological functions. Addressing this challenge requires tools to focus experimental efforts. A sequence similarity network (SSN) is one such tool: it enables facile visualization of sequence-function relationships in protein families, thereby focusing sequence-based functional annotation efforts.
The dendrograms and trees that have long dominated similarity analyses can become unwieldy when dealing with tens of thousands of protein sequences. In contrast, SSNs are easy to both calculate and manipulate, thereby allowing analyses of sequence-function relationships for even large protein families. SSNs are graphical representations in which each sequence is a node, and two nodes are connected by a line (an edge) if the sequences share a degree of similarity above a user-defined threshold.
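The node-and-edge definition above can be sketched in a few lines of Python. This is a minimal illustration, not EFI-EST's implementation: the function name `build_ssn`, the sequence labels, and the scores are all hypothetical, and the cutoff is treated as inclusive here as a simplifying assumption.

```python
def build_ssn(scores, threshold):
    """Return (nodes, edges) for a sequence similarity network.

    scores: dict mapping a pair of sequence IDs (a, b) to a similarity score.
    An edge is kept when the pair's score is at or above the threshold
    (inclusive cutoff is an assumption made for this sketch).
    """
    # Every sequence that appears in any pair becomes a node.
    nodes = sorted({seq for pair in scores for seq in pair})
    # Only sufficiently similar pairs become edges.
    edges = sorted(
        tuple(sorted(pair)) for pair, score in scores.items() if score >= threshold
    )
    return nodes, edges


# Hypothetical pairwise similarity scores for four sequences.
scores = {
    ("A", "B"): 80,
    ("A", "C"): 35,
    ("B", "C"): 40,
    ("C", "D"): 75,
}

nodes, edges = build_ssn(scores, threshold=50)
print(nodes)  # all four sequences remain as nodes
print(edges)  # only the pairs scoring 50 or more are connected
```

Raising the threshold removes edges but never nodes, which is why a stricter threshold breaks one large cluster into several smaller ones.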
The use of SSNs was pioneered by the Structure-Function Linkage Database (SFLD), an NIGMS‑supported resource at the University of California-San Francisco [1, 2]. The SFLD provides manually curated SSNs for a modest number of functionally diverse protein families that can be visualized using Cytoscape. To equip the biochemical community with tools to address the “annotation problem”, the Enzyme Function Initiative (U54GM093342) provides open access to SSN generation.
The EFI’s Enzyme Similarity Tool (EFI-EST) is a web-based resource developed and maintained at the Institute for Genomic Biology at the University of Illinois. EFI-EST allows a user to generate an SSN for any protein family. Sequences can be retrieved from the UniProt database by similarity to a user-supplied seed sequence; alternatively, all sequences in a Pfam or InterPro family can be used. EFI-EST first performs an all-by-all BLAST to determine the pairwise sequence similarities and then generates the SSN. Representative node (metanode) networks are provided for visualization of very large families. More than 30 node attributes, drawn from various annotation sources, are provided and can be used to gain insights into sequence-function relationships.
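As a rough illustration of the step from all-by-all BLAST results to pairwise similarities, the sketch below parses BLAST+ tabular output (the default 12-column `-outfmt 6` layout) and keeps one score per unordered sequence pair. Whether EFI-EST uses this output format is not stated here; using the bit score as the similarity measure, the function name `best_pairwise_scores`, and the example hits are assumptions made for this sketch.

```python
import csv
import io


def best_pairwise_scores(blast_tabular):
    """Keep the best bit score for each unordered pair of sequences.

    blast_tabular: text in BLAST+ default tabular layout, whose columns are
    qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query == subject:
            continue  # a self-hit carries no pairwise information
        # A-vs-B and B-vs-A are the same pair in an undirected network.
        pair = tuple(sorted((query, subject)))
        best[pair] = max(best.get(pair, 0.0), bitscore)
    return best


# Hypothetical all-by-all hits for three sequences.
example = "\n".join([
    "P1\tP1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "P1\tP2\t62.5\t296\t111\t0\t3\t298\t5\t300\t1e-120\t350",
    "P2\tP1\t62.5\t296\t111\t0\t5\t300\t3\t298\t2e-120\t349",
    "P2\tP3\t31.0\t250\t160\t4\t10\t255\t12\t260\t3e-20\t85.5",
])

pair_scores = best_pairwise_scores(example)
print(pair_scores)
```

Collapsing the two directions of each comparison into one entry gives exactly one candidate edge per pair of sequences, which is the form a network builder needs.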
For functional discovery, the SSN is most useful when filtered by a sequence identity threshold (alignment score) that achieves isofunctional fractionation – that is, all sequences within a cluster share the same function. At this level of clustering, the user can transfer function between sequences within the cluster. Although no universal alignment score exists to achieve isofunctional fractionation for all protein families, an SSN can be filtered using one or more node attributes associated with functional divergence. For example, identifying the nodes with experimentally characterized functions and choosing an alignment score that separates these functions can be used to fractionate the SSN into clusters with known and unknown functions.
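The threshold-selection idea described above can be sketched as follows: cluster the network at several candidate thresholds and keep the lowest one at which no cluster mixes two different experimentally characterized functions. This is an illustrative sketch only; the helper names, the scores, and the function labels are hypothetical. Separating the known functions is a necessary condition for isofunctional clusters, not proof of it.

```python
def clusters_at(nodes, scores, threshold):
    """Connected components of the network at a given threshold (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def separates_known_functions(clusters, known):
    """True if no cluster contains two different known functions."""
    return all(len({known[n] for n in c if n in known}) <= 1 for c in clusters)


def lowest_separating_threshold(nodes, scores, known, candidates):
    """Lowest candidate threshold that separates the known functions, or None."""
    for t in sorted(candidates):
        if separates_known_functions(clusters_at(nodes, scores, t), known):
            return t
    return None


# Hypothetical data: five sequences, two with characterized functions.
nodes = ["A", "B", "C", "D", "E"]
scores = {("A", "B"): 90, ("B", "C"): 45, ("C", "D"): 85, ("D", "E"): 60}
known = {"A": "function X", "C": "function Y"}

chosen = lowest_separating_threshold(nodes, scores, known, [30, 50, 70])
print(chosen)
print(clusters_at(nodes, scores, chosen))
```

Preferring the lowest separating threshold keeps clusters as large as possible, so each characterized sequence can inform as many uncharacterized neighbors as the data supports.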
The Enzyme Function Initiative – Enzyme Similarity Tool (EFI-EST) is available online (http://efi.igb.illinois.edu/efi-est/) and without charge for the generation of SSNs with 100,000 sequences or fewer. Generation of an SSN takes, on average, 6.5 hours. A detailed description of EFI-EST and a tutorial for its use, including example applications, were recently published in Biochimica et Biophysica Acta [3].
Users interested in generating larger networks are welcome to contact firstname.lastname@example.org for access to our local computing cluster.
1. H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS One 2009, 4, e4345. PMCID: PMC2631154
2. E. Akiva, S. Brown, D.E. Almonacid, A.E. Barber, 2nd, A.F. Custer, M.A. Hicks, C.C. Huang, F. Lauck, S.T. Mashiyama, E.C. Meng, D. Mischel, J.H. Morris, S. Ojha, A.M. Schnoes, D. Stryke, J.M. Yunes, T.E. Ferrin, G.L. Holliday, and P.C. Babbitt, The Structure-Function Linkage Database. Nucleic Acids Res 2014, 42, D521-30. PMCID: PMC3965090
3. J.A. Gerlt, J.T. Bouvier, D.B. Davidson, H.J. Imker, B. Sadkhin, D.R. Slater, and K.L. Whalen, Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim Biophys Acta 2015, 1854, 1019-1037. PMCID: PMC4457552