Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387303685> ?p ?o ?g. }
- W4387303685 abstract "Large-scale protein language models (PLMs), such as the ESM family, have achieved remarkable performance in various downstream tasks related to protein structure and function by undergoing unsupervised training on residue sequences. They have become essential tools for researchers and practitioners in biology. However, a limitation of vanilla PLMs is their lack of explicit consideration for protein structure information, which suggests the potential for further improvement. Motivated by this, we introduce the concept of a “structure-aware vocabulary” that integrates residue tokens with structure tokens. The structure tokens are derived by encoding the 3D structure of proteins using Foldseek. We then propose SaProt, a large-scale general-purpose PLM trained on an extensive dataset comprising approximately 40 million protein sequences and structures. Through extensive evaluation, our SaProt model surpasses well-established and renowned baselines across 10 significant downstream tasks, demonstrating its exceptional capacity and broad applicability. We have made the code, pretrained model, and all relevant materials available at https://github.com/westlake-repl/SaProt." @default.
- W4387303685 created "2023-10-04" @default.
- W4387303685 creator A5008860077 @default.
- W4387303685 creator A5009528070 @default.
- W4387303685 creator A5027100437 @default.
- W4387303685 creator A5029409322 @default.
- W4387303685 creator A5073407872 @default.
- W4387303685 creator A5081665927 @default.
- W4387303685 date "2023-10-02" @default.
- W4387303685 modified "2023-10-10" @default.
- W4387303685 title "SaProt: Protein Language Modeling with Structure-aware Vocabulary" @default.
- W4387303685 cites W1926568554 @default.
- W4387303685 cites W2044523575 @default.
- W4387303685 cites W2730472814 @default.
- W4387303685 cites W2742834898 @default.
- W4387303685 cites W2770026599 @default.
- W4387303685 cites W2902353954 @default.
- W4387303685 cites W2919831875 @default.
- W4387303685 cites W2943495267 @default.
- W4387303685 cites W2980789587 @default.
- W4387303685 cites W2998496395 @default.
- W4387303685 cites W3034999214 @default.
- W4387303685 cites W3037888463 @default.
- W4387303685 cites W3133458480 @default.
- W4387303685 cites W3158236124 @default.
- W4387303685 cites W3164046276 @default.
- W4387303685 cites W3172620719 @default.
- W4387303685 cites W3177828909 @default.
- W4387303685 cites W3179485843 @default.
- W4387303685 cites W3209435229 @default.
- W4387303685 cites W3211795435 @default.
- W4387303685 cites W3213545574 @default.
- W4387303685 cites W4210840673 @default.
- W4387303685 cites W4213112325 @default.
- W4387303685 cites W4223581484 @default.
- W4387303685 cites W4281632800 @default.
- W4387303685 cites W4281961563 @default.
- W4387303685 cites W4283816281 @default.
- W4387303685 cites W4296032638 @default.
- W4387303685 cites W4300861364 @default.
- W4387303685 cites W4323304388 @default.
- W4387303685 cites W4365444089 @default.
- W4387303685 cites W4375858802 @default.
- W4387303685 cites W4385255463 @default.
- W4387303685 doi "https://doi.org/10.1101/2023.10.01.560349" @default.
- W4387303685 hasPublicationYear "2023" @default.
- W4387303685 type Work @default.
- W4387303685 citedByCount "0" @default.
- W4387303685 crossrefType "posted-content" @default.
- W4387303685 hasAuthorship W4387303685A5008860077 @default.
- W4387303685 hasAuthorship W4387303685A5009528070 @default.
- W4387303685 hasAuthorship W4387303685A5027100437 @default.
- W4387303685 hasAuthorship W4387303685A5029409322 @default.
- W4387303685 hasAuthorship W4387303685A5073407872 @default.
- W4387303685 hasAuthorship W4387303685A5081665927 @default.
- W4387303685 hasBestOaLocation W43873036851 @default.
- W4387303685 hasConcept C121332964 @default.
- W4387303685 hasConcept C125411270 @default.
- W4387303685 hasConcept C127413603 @default.
- W4387303685 hasConcept C137293760 @default.
- W4387303685 hasConcept C138885662 @default.
- W4387303685 hasConcept C154945302 @default.
- W4387303685 hasConcept C177264268 @default.
- W4387303685 hasConcept C199360897 @default.
- W4387303685 hasConcept C204321447 @default.
- W4387303685 hasConcept C21547014 @default.
- W4387303685 hasConcept C2776207758 @default.
- W4387303685 hasConcept C2776760102 @default.
- W4387303685 hasConcept C2777601683 @default.
- W4387303685 hasConcept C2778755073 @default.
- W4387303685 hasConcept C41008148 @default.
- W4387303685 hasConcept C41895202 @default.
- W4387303685 hasConcept C62520636 @default.
- W4387303685 hasConceptScore W4387303685C121332964 @default.
- W4387303685 hasConceptScore W4387303685C125411270 @default.
- W4387303685 hasConceptScore W4387303685C127413603 @default.
- W4387303685 hasConceptScore W4387303685C137293760 @default.
- W4387303685 hasConceptScore W4387303685C138885662 @default.
- W4387303685 hasConceptScore W4387303685C154945302 @default.
- W4387303685 hasConceptScore W4387303685C177264268 @default.
- W4387303685 hasConceptScore W4387303685C199360897 @default.
- W4387303685 hasConceptScore W4387303685C204321447 @default.
- W4387303685 hasConceptScore W4387303685C21547014 @default.
- W4387303685 hasConceptScore W4387303685C2776207758 @default.
- W4387303685 hasConceptScore W4387303685C2776760102 @default.
- W4387303685 hasConceptScore W4387303685C2777601683 @default.
- W4387303685 hasConceptScore W4387303685C2778755073 @default.
- W4387303685 hasConceptScore W4387303685C41008148 @default.
- W4387303685 hasConceptScore W4387303685C41895202 @default.
- W4387303685 hasConceptScore W4387303685C62520636 @default.
- W4387303685 hasLocation W43873036851 @default.
- W4387303685 hasOpenAccess W4387303685 @default.
- W4387303685 hasPrimaryLocation W43873036851 @default.
- W4387303685 hasRelatedWork W2057738282 @default.
- W4387303685 hasRelatedWork W2349021146 @default.
- W4387303685 hasRelatedWork W2436192316 @default.
- W4387303685 hasRelatedWork W2625997096 @default.
- W4387303685 hasRelatedWork W2970287324 @default.
- W4387303685 hasRelatedWork W3040203686 @default.
- W4387303685 hasRelatedWork W35583307 @default.
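
The abstract of W4387303685 describes a "structure-aware vocabulary" that integrates residue tokens with structure tokens. The following is a minimal Python sketch of that idea, not the authors' implementation: it assumes one structure letter per residue and builds the combined vocabulary as the Cartesian product of the two alphabets. The names RESIDUES, STRUCT_STATES, build_vocab and tokenize are hypothetical, the lowercase structure alphabet is a stand-in for whatever Foldseek actually emits, and special tokens (padding, masking) are omitted.

```python
from itertools import product

# Assumption: the 20 standard amino-acid letters serve as residue tokens.
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: 20 lowercase letters stand in for discrete structure states.
STRUCT_STATES = "acdefghiklmnpqrstvwy"


def build_vocab(residues=RESIDUES, states=STRUCT_STATES):
    """Map every (residue, structure-state) pair, e.g. "Ma", to an integer id."""
    return {r + s: i for i, (r, s) in enumerate(product(residues, states))}


def tokenize(sequence, structure, vocab):
    """Pair each residue with its structure letter and look up the combined id."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure strings must have equal length")
    return [vocab[r + s] for r, s in zip(sequence, structure)]


vocab = build_vocab()
ids = tokenize("MKV", "dpv", vocab)  # three combined tokens: "Md", "Kp", "Vv"
```

Under these assumptions each position carries both residue and structure information in a single token, so a sequence model sees one id per residue rather than two parallel streams.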