Matches in SemOpenAlex for { <https://semopenalex.org/work/W3100159300> ?p ?o ?g. }
- W3100159300 endingPage "100142" @default.
- W3100159300 startingPage "100142" @default.
- W3100159300 abstract "Proteins are linear polymers that fold into an incredible variety of three-dimensional structures that enable sophisticated functionality for biology. Computational modeling allows scientists to predict the three-dimensional structure of proteins from genomes, predict properties or behavior of a protein, and even modify or design new proteins for a desired function. Advances in machine learning, especially deep learning, are catalyzing a revolution in the paradigm of scientific research. In this review, we summarize recent work in applying deep learning techniques to tackle problems in protein structural modeling and design. Some deep learning-based approaches, especially in structure prediction, now outperform conventional methods, often in combination with higher-resolution physical modeling. Challenges remain in experimental validation, benchmarking, leveraging known physics and interpreting models, and extending to other biomolecules and contexts. Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the “sequence → structure → function” paradigm. 
This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques. Proteins are linear polymers that fold into various specific conformations to function. The incredible variety of three-dimensional (3D) structures, determined by the combination and order in which 20 amino acids thread the protein polymer chain (the sequence of the protein), enables the sophisticated functionality of proteins responsible for most biological activities. Hence, obtaining the structures of proteins is of paramount importance in both understanding the fundamental biology of health and disease and developing therapeutic molecules. 
While protein structure is primarily determined by sophisticated experimental techniques, such as X-ray crystallography [1], NMR spectroscopy [2] and, increasingly, cryoelectron microscopy [3], computational structure prediction from the genetically encoded amino acid sequence of a protein has been used as an alternative when experimental approaches are limited. Computational methods have been used to predict the structure of proteins [4], illustrate the mechanism of biological processes [5], and determine the properties of proteins [6]. Furthermore, all naturally occurring proteins are a result of an evolutionary process of random variants arising under various selective pressures. Through this process, nature has explored only a small subset of theoretically possible protein sequence space. 
To explore a broader sequence and structural space that potentially contains proteins with enhanced or novel properties, techniques such as de novo design can be used to generate new biological molecules with the potential to tackle many outstanding challenges in biomedicine and biotechnology [7,8]. While the application of machine learning and more general statistical methods in protein modeling can be traced back decades [9–13], recent advances in machine learning, especially in deep learning (DL)-related techniques [14], have opened up new avenues in many areas of protein modeling [15–18]. DL is a set of machine learning techniques based on stacked neural network layers that parameterize functions in terms of compositions of affine transformations and non-linear activation functions. Their ability to extract domain-specific features that are adaptively learned from data for a particular task often enables them to surpass the performance of more traditional methods. DL has made dramatic impacts on digital applications such as image classification [19], speech recognition [20], and game playing [21].
Success in these areas has inspired an increasing interest in more complex data types, including protein structures [22]. In the most recent Critical Assessment of Structure Prediction (CASP13, held in 2018) [4], a biennial community experiment to determine the state of the art in protein structure prediction, DL-based methods accomplished a striking improvement in model accuracy (see Figure 1), especially in the “difficult” target category, where comparative modeling (starting with a known, related structure) is ineffective. The CASP13 results show that the complex mapping from amino acid sequence to 3D protein structure can be successfully learned by a neural network and generalized to unseen cases. Concurrently, for the protein design problem, progress in the field of deep generative models has spawned a range of promising approaches [23–25]. In this review, we summarize the recent progress in applying DL techniques to the problem of protein modeling and discuss the potential pros and cons. We limit our scope to protein structure and function prediction, protein design with DL (see Figure 2), and a wide array of popular frameworks used in these applications. We discuss the importance of protein representation, and summarize the approaches to protein design based on DL for the first time. We also emphasize the central importance of protein structure, following the sequence → structure → function paradigm, and argue that approaches based on structures may be most fruitful. We refer the reader to other review papers for more information on applications of DL in biology and medicine [15,16], bioinformatics [27], structural biology [17], folding and dynamics [18,28], antibody modeling [29], and structural annotation and prediction of proteins [30,31]. Because DL is a fast-moving, interdisciplinary field, we chose to include preprints in this review. We caution the reader that these contributions have not been peer-reviewed, yet they are still worthy of attention for their ideas. In fact, in communities such as computer science, it is not uncommon for manuscripts to remain in this stage indefinitely, and some seminal contributions, such as Kingma and Welling's definitive paper on variational autoencoders [32], are only available as preprints. In addition, we urge caution with any protein design studies that are purely in silico, and we highlight those that include experimental validation as a sign of their trustworthiness. The prediction of protein 3D structure from amino acid sequence has been a grand challenge in computational biophysics for decades [33,34].
Folding of peptide chains is a fundamental concept in biophysics, and atomic-level structures of proteins and complexes are often the starting point to understand their function and to modulate or engineer them. Thanks to the recent advances in next-generation sequencing technology, there are now over 180 million protein sequences recorded in the UniProt database [35]. In contrast, only about 158,000 experimentally determined structures are available in the Protein Data Bank. Thus, computational structure prediction is a critical problem of both practical and theoretical interest. More recently, the advances in structure prediction have led to an increasing interest in the protein design problem. In design, the objective is to obtain a novel protein sequence that will fold into a desired structure or perform a specific function, such as catalysis. Naturally occurring proteins represent only an infinitesimal subset of all possible amino acid sequences, selected by the evolutionary process to perform specific biological functions [7]. Proteins with more robustness (higher thermal stability, resistance to degradation) or enhanced properties (faster catalysis, tighter binding) might lie in the space that has not been explored by nature but is potentially accessible by de novo design. The current approach to computational de novo design is based on physical and evolutionary principles and requires significant domain expertise. Some successful examples include novel folds [36], enzymes [37], vaccines [38], novel protein assemblies [39], ligand-binding proteins [40], and membrane proteins [41]. While some papers occasionally refer to redesign of naturally occurring proteins or interfaces as “de novo,” in this review we restrict that term to works where completely new folds or interfaces are created. The current methodology for computational protein structure prediction is largely based on Anfinsen's thermodynamic hypothesis [42], which states that the native structure of a protein must be the one with the lowest free energy, governed by the energy landscape of all possible conformations associated with its sequence. Finding the lowest-energy state is challenging because of the immense space of possible conformations available to a protein, also known as the “sampling problem” or Levinthal's paradox [43]. Furthermore, the approach requires accurate free energy functions to describe the protein energy landscape and rank different conformations based on their energy, referred to as the “scoring problem.” In light of these challenges, current computational techniques rely heavily on multiscale approaches. Low-resolution, coarse-grained energy functions are used to capture large-scale conformational sampling, such as hydrophobic burial and the formation of local secondary structural elements. Higher-resolution energy functions are used to explicitly model finer details, such as amino acid side-chain packing, hydrogen bonding, and salt bridges [44]. Protein design problems, sometimes known as the inverse of structure prediction problems, require a similar toolbox. Instead of sampling the conformational space, a protein design protocol samples the sequence space that folds into the desired topology. Past efforts can be broadly divided into two classes: modifying an existing protein with known sequence and properties, or generating novel proteins with sequences and/or folds unrelated to those found in nature. 
The former class evolves an existing protein's amino acid sequence (and, as a result, its structure and properties) and can be loosely referred to as protein engineering or protein redesign. The latter class of methods is called de novo protein design, a term originally coined in 1997 when Dahiyat and Mayo [45] designed the FSD-1 protein, a soluble protein with a completely new sequence that folded into the previously known structure of a zinc finger. Korendovych and DeGrado's recent retrospective [46] chronicles the development of de novo design. Originally, de novo design meant the creation of entirely new proteins from scratch exploiting a target structure but, especially in the DL era, many authors now use the term to include methods that ignore structure in creating new sequences, often using extensive training data from known proteins in a particular functional class. In this review, we split our discussion of methods according to whether they train directly between sequence and function (as certain natural language processing [NLP]-based DL paradigms allow), or whether they directly include protein structural data (like historical methods in rational protein design; see below in the section on “Protein Design”). Despite significant progress in the last several decades in the field of computational protein structure prediction and design [7,34], accurate structure prediction and reliable design both remain challenging. Conventional approaches rely heavily on the accuracy of the energy functions used to describe protein physics and on the efficiency of the sampling algorithms used to explore the immense protein sequence and structure space. Both protein engineering and de novo approaches are often combined with experimental directed evolution [8,47] to achieve the optimal final molecules [7]. In conventional computational approaches, predictions from data are made by means of physical equations and modeling. Machine learning puts forward a different paradigm in which algorithms automatically infer, or learn, a relationship between inputs and outputs from a set of hypotheses. Consider a collection of N training samples comprising features x in an input space X (e.g., amino acid sequences) and corresponding labels y in some output space Y (e.g., residue pairwise distances), where the pairs {(x_i, y_i)}_{i=1}^{N} are sampled independently and identically distributed from some joint distribution P. In addition, consider a function f: X → Y in some function class H, and a loss function ℓ: Y × Y → ℝ that measures how much f(x) deviates from the corresponding label y. The goal of supervised learning is to find a function f ∈ H that minimizes the expected loss, E[ℓ(f(x), y)], for (x, y) sampled from P. 
Since one does not have access to the true distribution but rather to N samples from it, the popular empirical risk minimization (ERM) approach seeks to minimize the loss over the training samples instead. In neural network models, in particular, the function class is parameterized by a collection of weights. Denoting these parameters collectively by θ, ERM boils down to an optimization problem of the form min_θ (1/N) ∑_{i=1}^{N} ℓ(f_θ(x_i), y_i). (Equation 1) The choice of the network determines how the hypothesis class is parameterized. Deep neural networks typically implement a non-linear function as the composition of affine maps W_l: ℝ^{n_l} → ℝ^{n_{l+1}}, where (with a slight abuse of notation) W_l(x) = W_l x + b_l, interleaved with non-linear activation functions σ(⋅). Rectified linear units and max-pooling are some of the most popular non-linear transformations applied in practice. The architecture of the model determines how these functions are composed, the most popular option being their sequential composition f(x) = W_L σ(W_{L−1} σ(W_{L−2} σ(… W_2 σ(W_1 x)))) for a network with L layers. Computing f(x) is typically referred to as the forward pass. We will not dwell on the details of the optimization problem in Equation 1, which is typically carried out via stochastic gradient descent algorithms or variations thereof, efficiently implemented via back-propagation (see instead, e.g., LeCun et al. [14], Sun [48], and Schmidhuber [49]). Rather, in this section we summarize some of the most popular models widely used in protein structural modeling, including how different approaches are best suited to particular data types or applications. 
High-level diagrams of the major architectures are shown in Figure 3 (Figure 4 illustrates the different types of representation schemes applied to a protein). Convolutional network architectures [50] are most commonly applied to image analysis and other problems where shift invariance or covariance is needed. Inspired by the fact that an object in an image can be shifted and still be the same object, convolutional neural networks (CNNs) adopt convolutional kernels for the layer-wise affine transformation to capture this translational invariance. A 2D convolutional kernel w applied to 2D image data x can be defined as S(i, j) = (x ∗ w)(i, j) = ∑_m ∑_n x(m, n) w(i − m, j − n), (Equation 2) where S(i, j) represents the output at position (i, j), x(m, n) is the value of the input x at position (m, n), w(i − m, j − n) is the parameter of kernel w at position (i − m, j − n), and the summation is over all possible positions. An important variant of the CNN is the residual network (ResNet) [51], which incorporates skip connections between layers. This modification has shown great advantages in practice, aiding the optimization of these typically huge models. CNNs, especially ResNets, have been widely used in protein structure prediction. An example is AlphaFold [22], which used ResNets to predict protein inter-residue distance maps from amino acid sequences (Figure 3A). Recurrent architectures are based on applying several iterations of the same function along a sequential input [52]. This can be seen as an unfolded architecture, and it has been widely used to process sequential data, such as time series and written text (i.e., NLP). With an initial hidden state h(0) and sequential data [x(1), x(2), …, x(n)], we can obtain the hidden states recursively: h(t) = g(t)(x(t), x(t−1), x(t−2), …, x(1)) = f(h(t−1), x(t); θ), (Equation 3) where f represents a function or transformation from one position to the next, and g(t) represents the accumulative transformation up to position t. The hidden stat" @default.
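The forward pass and the ERM objective (Equation 1) described in the abstract can be illustrated with a small NumPy sketch. The two-layer architecture, squared-error loss, and dimensions below are illustrative assumptions for this note, not taken from any method discussed in the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised-learning setup: N samples, d_in features, scalar labels.
N, d_in, d_h = 32, 8, 16
X = rng.normal(size=(N, d_in))
y = rng.normal(size=(N, 1))

# Weights of a two-layer network: f(x) = W2 sigma(W1 x + b1) + b2.
W1, b1 = 0.1 * rng.normal(size=(d_in, d_h)), np.zeros(d_h)
W2, b2 = 0.1 * rng.normal(size=(d_h, 1)), np.zeros(1)

def relu(z):
    # Rectified linear unit, one of the activations mentioned in the text.
    return np.maximum(z, 0.0)

def forward(X):
    # Sequential composition of affine maps and non-linearities (the forward pass).
    return relu(X @ W1 + b1) @ W2 + b2

def empirical_risk(X, y):
    # Equation 1 with a squared-error loss: (1/N) sum_i l(f(x_i), y_i).
    return float(np.mean((forward(X) - y) ** 2))

risk = empirical_risk(X, y)
```

In practice this objective would be minimized by stochastic gradient descent via back-propagation, as the text notes; the sketch only evaluates the loss.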
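The 2D convolution in Equation 2 can be made concrete with a direct (deliberately naive) NumPy implementation; restricting the output to the "valid" region, where the kernel fully overlaps the input, is a simplifying assumption here.

```python
import numpy as np

def conv2d(x, w):
    """2D convolution per Equation 2, restricted to the 'valid' region:
    S(i, j) = sum_m sum_n x(m, n) * w(i - m, j - n).

    Flipping the kernel is what distinguishes convolution from the
    cross-correlation that many DL libraries actually compute.
    """
    H, W = x.shape
    kh, kw = w.shape
    w_flipped = w[::-1, ::-1]
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w_flipped)
    return out
```

For example, applying conv2d to the 2×2 input [[1, 0], [0, 1]] with kernel [[1, 2], [3, 4]] gives the single value 1·4 + 1·1 = 5, matching the flipped-kernel sum in Equation 2.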
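The recurrence in Equation 3 can likewise be sketched: a single transition function f is applied repeatedly along the sequence. The tanh-of-affine-map parameterization and the dimensions below are common conventions assumed for illustration, not details from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

d_x, d_h, T = 4, 6, 10  # input size, hidden size, sequence length
W_xh = 0.1 * rng.normal(size=(d_x, d_h))
W_hh = 0.1 * rng.normal(size=(d_h, d_h))
b_h = np.zeros(d_h)

def step(h_prev, x_t):
    # One application of f(h(t-1), x(t); theta) from Equation 3.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def rnn(xs):
    # Unrolled recurrence: the same f (same weights) is reused at every
    # position t, so h(t) accumulates information from x(1)..x(t).
    h = np.zeros(d_h)  # h(0)
    states = []
    for x_t in xs:
        h = step(h, x_t)
        states.append(h)
    return np.stack(states)

H = rnn(rng.normal(size=(T, d_x)))  # one hidden state per position, shape (T, d_h)
```

Sharing the weights across positions is what lets a recurrent network process sequences of arbitrary length, such as protein sequences of varying size.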
- W3100159300 created "2020-11-23" @default.
- W3100159300 creator A5007754274 @default.
- W3100159300 creator A5009383312 @default.
- W3100159300 creator A5009895447 @default.
- W3100159300 creator A5086776097 @default.
- W3100159300 date "2020-12-01" @default.
- W3100159300 modified "2023-10-14" @default.
- W3100159300 title "Deep Learning in Protein Structural Modeling and Design" @default.
- W3100159300 cites W113688923 @default.
- W3100159300 cites W1485981043 @default.
- W3100159300 cites W1501531009 @default.
- W3100159300 cites W1938173378 @default.
- W3100159300 cites W1954297438 @default.
- W3100159300 cites W1967292572 @default.
- W3100159300 cites W1969644422 @default.
- W3100159300 cites W1970858396 @default.
- W3100159300 cites W1972728295 @default.
- W3100159300 cites W1974312616 @default.
- W3100159300 cites W1979046104 @default.
- W3100159300 cites W1981364167 @default.
- W3100159300 cites W1984794455 @default.
- W3100159300 cites W1984967071 @default.
- W3100159300 cites W1989777815 @default.
- W3100159300 cites W1996793152 @default.
- W3100159300 cites W2000093524 @default.
- W3100159300 cites W2006431107 @default.
- W3100159300 cites W2006927892 @default.
- W3100159300 cites W2008241299 @default.
- W3100159300 cites W2011658408 @default.
- W3100159300 cites W2014159272 @default.
- W3100159300 cites W2015526636 @default.
- W3100159300 cites W2017421343 @default.
- W3100159300 cites W2020610175 @default.
- W3100159300 cites W2025444507 @default.
- W3100159300 cites W2035755292 @default.
- W3100159300 cites W2039465881 @default.
- W3100159300 cites W2040299410 @default.
- W3100159300 cites W2043338013 @default.
- W3100159300 cites W2045777307 @default.
- W3100159300 cites W2049902088 @default.
- W3100159300 cites W2051210555 @default.
- W3100159300 cites W2052130353 @default.
- W3100159300 cites W2053536934 @default.
- W3100159300 cites W2058373514 @default.
- W3100159300 cites W2060872117 @default.
- W3100159300 cites W2061042699 @default.
- W3100159300 cites W2064675550 @default.
- W3100159300 cites W2073338313 @default.
- W3100159300 cites W2076063813 @default.
- W3100159300 cites W2085497102 @default.
- W3100159300 cites W2086361102 @default.
- W3100159300 cites W2092750499 @default.
- W3100159300 cites W2094403468 @default.
- W3100159300 cites W2095791446 @default.
- W3100159300 cites W2099438806 @default.
- W3100159300 cites W2104467962 @default.
- W3100159300 cites W2106648157 @default.
- W3100159300 cites W2108101947 @default.
- W3100159300 cites W2108598243 @default.
- W3100159300 cites W2110872379 @default.
- W3100159300 cites W2113178668 @default.
- W3100159300 cites W2114340287 @default.
- W3100159300 cites W2115185759 @default.
- W3100159300 cites W2125732073 @default.
- W3100159300 cites W2126396156 @default.
- W3100159300 cites W2135815512 @default.
- W3100159300 cites W2136799255 @default.
- W3100159300 cites W2142239596 @default.
- W3100159300 cites W2143035592 @default.
- W3100159300 cites W2152655599 @default.
- W3100159300 cites W2160784118 @default.
- W3100159300 cites W2163922914 @default.
- W3100159300 cites W2166701319 @default.
- W3100159300 cites W2176950688 @default.
- W3100159300 cites W2189911347 @default.
- W3100159300 cites W2194775991 @default.
- W3100159300 cites W2250334067 @default.
- W3100159300 cites W2252523470 @default.
- W3100159300 cites W2325521056 @default.
- W3100159300 cites W2342838938 @default.
- W3100159300 cites W2412714128 @default.
- W3100159300 cites W2413334978 @default.
- W3100159300 cites W2502949459 @default.
- W3100159300 cites W2519539312 @default.
- W3100159300 cites W2527189750 @default.
- W3100159300 cites W2541404351 @default.
- W3100159300 cites W2555451376 @default.
- W3100159300 cites W2558748708 @default.
- W3100159300 cites W2606439133 @default.
- W3100159300 cites W2742127985 @default.
- W3100159300 cites W2751052002 @default.
- W3100159300 cites W2764301816 @default.
- W3100159300 cites W2765744127 @default.
- W3100159300 cites W2766447205 @default.
- W3100159300 cites W2778051509 @default.
- W3100159300 cites W2779910604 @default.
- W3100159300 cites W2784883284 @default.