Predicting Protein Functions from 3D Structures
ECE Professor Deniz Erdogmus (co-PI) and affiliated BioE Professors Mary Jo Ondrechen (PI) and Penny Beuning (co-PI) were awarded a $600K NSF grant for “Mining for mechanistic information to predict protein function”.
Abstract Source: NSF
Proteins perform a variety of essential functions in a cell, including catalyzing chemical reactions as enzymes. With this award, the Chemistry of Life Processes Program in the Chemistry Division is funding Dr. Mary Jo Ondrechen, Dr. Penny Beuning and Dr. Deniz Erdogmus at Northeastern University to develop new ways to predict the function of a protein from its three-dimensional structure. This computational problem is a major challenge in genomics – the study of DNA sequences and their protein products. Research in genomics is opening the door to tremendous current and future innovations to benefit society, in areas as diverse as food production, energy, the economy, the environment, and health. In this project, chemical properties are computed and coupled with machine learning algorithms to identify the specific biochemical roles of the active amino acids in a protein structure, which then leads to the prediction of the protein's function. For selected cases, these predicted functions are tested experimentally by direct biochemical assays and by ligand-binding studies. Doctoral students and undergraduate research interns, including those from minority groups that are underrepresented in STEM fields, are being trained through this project to become highly qualified scientists in the areas of computational chemistry, informatics, machine learning, and biochemistry. These skills are vital to the regional high-tech economy of New England and to United States competitiveness in the global economy.
The computational prediction of the biochemical functional roles of individual amino acids in a protein structure is entirely new. The predictive power of properties obtained from computational chemistry is being enhanced by machine learning approaches, including Support Vector Machines (SVM) and Graph Convolutional Neural Networks (GCNN). Improved, experimentally tested methods for predicting protein function contribute significantly to the interpretation of the massive quantities of data from genome sequencing and Structural Genomics (SG) initiatives. A significant feature of this project is that it incorporates computed chemical properties of the amino acids in a protein structure into more conventional informatics methods to predict function, whereas most current methods are purely informatics-based. This project is unique in that it applies computed chemical reactivity and electrostatic features at the atomic scale to the protein function prediction problem to obtain residue-specific mechanistic information. Because it can match functional types across different structural folds, i.e., cases with neither sequence nor 3D structure similarity, the project substantially increases the ability to assign biochemical function reliably to SG proteins of unknown or uncertain function. This work also leads to a better understanding of how enzymes work and of how specific amino acid residues achieve their catalytic power.