Biomedical Data Mining for Information Retrieval. Группа авторовЧитать онлайн книгу.
folding which was time consuming. The discovery of new protein sequences has been accelerated by next-generation sequencing techniques due to these methods being rapid and economical. The computational prediction methods that can accurately classify unknown protein sequences into specific fold categories in the shortest time possible is today’s requirement. Therefore computational recognition of protein folds holds a lot of importance in bioinformatics and computational biology. A number of efforts have led to generation of a variety of computational prediction methods and Artificial intelligence (AI) and machine learning (ML) have shown to hold great promise. In this chapter, available AI and ML methods and features have been explored and novel methods based on reinforcement learning have been discussed. Prediction of protein structure happens at four levels that is
1 i) 1-D prediction of structural features which is the primary sequence of amino acids linked by peptide bond
2 ii) 2-D prediction of which is the spatial relationships between amino acids that is alpha helix, beta turn and beta turn facilitated by hydrogen bonds
3 iii) 3-D prediction of the tertiary structure of a protein that is fibrous or globular involving multiple bonds facilitated by hydrogen bonds, Van der Wal forces, hydrophobic interactions
4 iv) 4-D prediction of the quaternary structure of a multiprotein complex which is made up of more than one peptide chain involving formation of sulfur bridge.
Thus a model development which allows the flexibility of bond formation and helps to predict a stable and functional protein structure has been facilitated to a great deal by AI and ML.
Prediction of protein structure is a complex problem as it is associated with various levels of organization and is a multi-fold process. There is a need for smart computational techniques for such purpose. AI is a great tool which when used with computational biology facilitates such prediction. Apart from determining the structure AI also aids in predicting protein structure crucial for drug development as well as in understanding the biochemical effect and ultimately the function.
A protein can be broadly described as a polymer where the individual amino acid can be considered as the monomers or the building blocks arranged in a linear chain and joined together by peptide bonds. The primary structure as described earlier is represented by a sequence of letters which represent the amino acids. The chain of amino acids of a protein folds into local secondary structures including alpha helices, beta strands, and nonregular coils [35, 36] in its native environment. The secondary structure elements are further packed to form a tertiary structure depending on hydrophobic forces and side chain interactions, such as hydrogen bonding, between amino acids [37–39]. The tertiary structure is described by the x, y and z coordinates of all the atoms of a protein or, in a coarser description, by the coordinates of the backbone atoms (Figure 2.1). The quaternary structure is formed by more than one protein chains interacting or assembling together to form a complexes structure. Theses protein complexes proteins interact with each other and with other biological macromolecules such as DNA, RNA and certain metabolites in a cell. This kind of interaction is required to carry out various types of biological functions such as enzymatic catalysis (protein complex can interact with a metal or non-metal referred to as co-enzyme), to gene regulation (interaction of transcription factors with DNA sequences), control of growth and differentiation (protein– protein interaction where ligand binding to receptor triggers a signal cascade pathway) and transmission of nerve impulses [40]. A protein’s function is and its structure are dependent on each other [37, 38, 41, 42] therefore, determination or prediction of protein structure accurately holds the key for its function determination. The most effective methods for finding protein structure since the inception of this field have been Nuclear Magnetic Resonance and X-ray crystallography which have the disadvantage of being time consuming and expensive. The recent advancement has been the introduction of cryo-electron microscope (cryo-EM) which produces high-resolution large-scale molecular structures very efficiently. Cryo-EM density maps make use of machine learning and artificial intelligence for prediction [43–46]. For such experiments protein crystal is needed which is the most disadvantageous or complex part of these methods because there are many liquid proteins which do not crystalize. Artificial intelligence comes to our aid here as it is a possible better pathway for sequencing these proteins [47, 48] due to the fact that they have proved their efficacy and accuracy of successful application in different fields like business [49], image recognition to name and can accurately and efficiently predict thousands of possible structures in shortest time by analysing big data where other methods have failed to deliver accurate and useful information.
Figure 2.1 The different level of organization of protein.
Most of the models are inaccurate and do not produce predicted proteins that contain useful information so using artificial intelligence, programs are trained using many numerically represented atomic features from the models (such as bond lengths, bond angles, residue-residue interactions, physio-chemical properties, and potential energy properties). Then the comparison of the prediction models output to the known crystal structures helps to assess the quality of the model and find the most accurate model. Models for predictions and prediction analysis are compared each year in one main gathering called the Critical Assessment of Structure Prediction (CASP). Every two years researchers from around the world submit machine learning methods designed for protein structure prediction [50] where the latest advancement has been the help of protein contact distance prediction [51] and addition of quality assessment (QA) category in CASP7 (2006) [51, 52].
AI which is time and resource efficient allows for more accurate prognosis and diagnosis of structures because the computers can analyze data and have perfect calculations and deeply analyze the details. These accuracies while may be very close to that of traditional approaches are still slightly stronger allowing confidence in the results. AI would also help in cost reduction and would not be an agent to replace researchers but rather working in conjunction with them. Artificial Intelligence is an exciting field which offers solutions to issues in finding structures of proteins which is crucial to drug development and the understanding of biochemical effects. A protein’s function is determined by its structure [53–56] as the evidence is there in many biochemical reactions, therefore elucidating a protein’s structure as seen in Table 2.1 is key to understanding its function. Function determination in turn is essential for any related biological, biotechnological, medical, or pharmaceutical applications which is much needed in today’s time of increased anti-microbial resistance and threat by unknown biological agents.
Table 2.1 Summary of database sources of protein structure classification.
Database sources | Websites | References |
---|---|---|
PDB | http://www.rcsb.org/pdb/ | [57] |
UniProt | http://www.uniprot.org/ | [58] |
DSSP | http://swift.cmbi.ru.nl/gv/dssp/ | [59] |
SCOP |
http://scop.mrc-lmb.cam.ac.uk/
|