Matches in SemOpenAlex for { <https://semopenalex.org/work/W4296806433> ?p ?o ?g. }
- W4296806433 endingPage "4310" @default.
- W4296806433 startingPage "4301" @default.
- W4296806433 abstract "Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature Tm. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-Tm pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for a particular downstream task by utilizing sequence-to-Tm pairings, using a deep convolutional network to translate natural language processing BERT embeddings into required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R² values of test data (R² = 0.84 vs 0.79, respectively), but the ProtBERT is a much larger pretrained model that may not always be applicable for other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters compared to the much larger model but reaches almost the same accuracy for the test set. We compare the performance of both models against 71 newly published sequences for which Tm has been obtained as a validation set and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to our best knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems." @default.
- W4296806433 created "2022-09-24" @default.
- W4296806433 creator A5003646134 @default.
- W4296806433 creator A5025854647 @default.
- W4296806433 creator A5057871048 @default.
- W4296806433 creator A5059141710 @default.
- W4296806433 date "2022-09-23" @default.
- W4296806433 modified "2023-10-14" @default.
- W4296806433 title "CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach" @default.
- W4296806433 cites W1832142381 @default.
- W4296806433 cites W1965940146 @default.
- W4296806433 cites W1969489801 @default.
- W4296806433 cites W1992849341 @default.
- W4296806433 cites W1996075947 @default.
- W4296806433 cites W2006573345 @default.
- W4296806433 cites W2010127700 @default.
- W4296806433 cites W2013043496 @default.
- W4296806433 cites W2021911831 @default.
- W4296806433 cites W2023108847 @default.
- W4296806433 cites W2026958540 @default.
- W4296806433 cites W2027238053 @default.
- W4296806433 cites W2033103139 @default.
- W4296806433 cites W2035066110 @default.
- W4296806433 cites W2036354205 @default.
- W4296806433 cites W2039130480 @default.
- W4296806433 cites W2041844677 @default.
- W4296806433 cites W2043002857 @default.
- W4296806433 cites W2052717554 @default.
- W4296806433 cites W2053506111 @default.
- W4296806433 cites W2057852412 @default.
- W4296806433 cites W2070044747 @default.
- W4296806433 cites W2071100665 @default.
- W4296806433 cites W2087514098 @default.
- W4296806433 cites W2088418501 @default.
- W4296806433 cites W2097707959 @default.
- W4296806433 cites W2099885273 @default.
- W4296806433 cites W2102461176 @default.
- W4296806433 cites W2107302205 @default.
- W4296806433 cites W2121876863 @default.
- W4296806433 cites W2140737741 @default.
- W4296806433 cites W2143659399 @default.
- W4296806433 cites W2145165079 @default.
- W4296806433 cites W2145914528 @default.
- W4296806433 cites W2147834652 @default.
- W4296806433 cites W2159185781 @default.
- W4296806433 cites W2161162374 @default.
- W4296806433 cites W2270379082 @default.
- W4296806433 cites W2465481521 @default.
- W4296806433 cites W2785273668 @default.
- W4296806433 cites W2794367096 @default.
- W4296806433 cites W2800389754 @default.
- W4296806433 cites W2883482411 @default.
- W4296806433 cites W2963341956 @default.
- W4296806433 cites W2971837816 @default.
- W4296806433 cites W3008186720 @default.
- W4296806433 cites W3047183405 @default.
- W4296806433 cites W3047717040 @default.
- W4296806433 cites W3130438518 @default.
- W4296806433 cites W3154582384 @default.
- W4296806433 cites W3177500196 @default.
- W4296806433 cites W3177828909 @default.
- W4296806433 cites W3202237470 @default.
- W4296806433 cites W3207838037 @default.
- W4296806433 cites W3207839364 @default.
- W4296806433 cites W3209986690 @default.
- W4296806433 cites W4210738256 @default.
- W4296806433 cites W4224308101 @default.
- W4296806433 cites W4225758372 @default.
- W4296806433 cites W4237641822 @default.
- W4296806433 cites W4281620463 @default.
- W4296806433 cites W4287751334 @default.
- W4296806433 cites W4292265045 @default.
- W4296806433 cites W4292508613 @default.
- W4296806433 doi "https://doi.org/10.1021/acsbiomaterials.2c00737" @default.
- W4296806433 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36149671" @default.
- W4296806433 hasPublicationYear "2022" @default.
- W4296806433 type Work @default.
- W4296806433 citedByCount "7" @default.
- W4296806433 countsByYear W42968064332022 @default.
- W4296806433 countsByYear W42968064332023 @default.
- W4296806433 crossrefType "journal-article" @default.
- W4296806433 hasAuthorship W4296806433A5003646134 @default.
- W4296806433 hasAuthorship W4296806433A5025854647 @default.
- W4296806433 hasAuthorship W4296806433A5057871048 @default.
- W4296806433 hasAuthorship W4296806433A5059141710 @default.
- W4296806433 hasConcept C11413529 @default.
- W4296806433 hasConcept C119599485 @default.
- W4296806433 hasConcept C127413603 @default.
- W4296806433 hasConcept C153180895 @default.
- W4296806433 hasConcept C154945302 @default.
- W4296806433 hasConcept C165801399 @default.
- W4296806433 hasConcept C169903167 @default.
- W4296806433 hasConcept C186060115 @default.
- W4296806433 hasConcept C192562407 @default.
- W4296806433 hasConcept C41008148 @default.
- W4296806433 hasConcept C66322947 @default.
- W4296806433 hasConcept C86803240 @default.
- W4296806433 hasConceptScore W4296806433C11413529 @default.