Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385571251> ?p ?o ?g. }
Showing items 1 to 73 of 73, with 100 items per page.
- W4385571251 abstract "Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence that suggests differences in the inductive biases of Transformers and recurrent models which may help explain Transformer’s effective generalization performance despite relatively limited expressiveness." @default.
- W4385571251 created "2023-08-05" @default.
- W4385571251 creator A5015473787 @default.
- W4385571251 creator A5036811894 @default.
- W4385571251 creator A5065655845 @default.
- W4385571251 creator A5067127146 @default.
- W4385571251 date "2023-01-01" @default.
- W4385571251 modified "2023-09-24" @default.
- W4385571251 title "Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions" @default.
- W4385571251 doi "https://doi.org/10.18653/v1/2023.acl-long.317" @default.
- W4385571251 hasPublicationYear "2023" @default.
- W4385571251 type Work @default.
- W4385571251 citedByCount "0" @default.
- W4385571251 crossrefType "proceedings-article" @default.
- W4385571251 hasAuthorship W4385571251A5015473787 @default.
- W4385571251 hasAuthorship W4385571251A5036811894 @default.
- W4385571251 hasAuthorship W4385571251A5065655845 @default.
- W4385571251 hasAuthorship W4385571251A5067127146 @default.
- W4385571251 hasBestOaLocation W43855712511 @default.
- W4385571251 hasConcept C11413529 @default.
- W4385571251 hasConcept C119599485 @default.
- W4385571251 hasConcept C119857082 @default.
- W4385571251 hasConcept C127413603 @default.
- W4385571251 hasConcept C134306372 @default.
- W4385571251 hasConcept C154945302 @default.
- W4385571251 hasConcept C162324750 @default.
- W4385571251 hasConcept C165801399 @default.
- W4385571251 hasConcept C177148314 @default.
- W4385571251 hasConcept C187455244 @default.
- W4385571251 hasConcept C187736073 @default.
- W4385571251 hasConcept C197352929 @default.
- W4385571251 hasConcept C22019652 @default.
- W4385571251 hasConcept C2780451532 @default.
- W4385571251 hasConcept C28006648 @default.
- W4385571251 hasConcept C33923547 @default.
- W4385571251 hasConcept C41008148 @default.
- W4385571251 hasConcept C50644808 @default.
- W4385571251 hasConcept C66322947 @default.
- W4385571251 hasConceptScore W4385571251C11413529 @default.
- W4385571251 hasConceptScore W4385571251C119599485 @default.
- W4385571251 hasConceptScore W4385571251C119857082 @default.
- W4385571251 hasConceptScore W4385571251C127413603 @default.
- W4385571251 hasConceptScore W4385571251C134306372 @default.
- W4385571251 hasConceptScore W4385571251C154945302 @default.
- W4385571251 hasConceptScore W4385571251C162324750 @default.
- W4385571251 hasConceptScore W4385571251C165801399 @default.
- W4385571251 hasConceptScore W4385571251C177148314 @default.
- W4385571251 hasConceptScore W4385571251C187455244 @default.
- W4385571251 hasConceptScore W4385571251C187736073 @default.
- W4385571251 hasConceptScore W4385571251C197352929 @default.
- W4385571251 hasConceptScore W4385571251C22019652 @default.
- W4385571251 hasConceptScore W4385571251C2780451532 @default.
- W4385571251 hasConceptScore W4385571251C28006648 @default.
- W4385571251 hasConceptScore W4385571251C33923547 @default.
- W4385571251 hasConceptScore W4385571251C41008148 @default.
- W4385571251 hasConceptScore W4385571251C50644808 @default.
- W4385571251 hasConceptScore W4385571251C66322947 @default.
- W4385571251 hasLocation W43855712511 @default.
- W4385571251 hasOpenAccess W4385571251 @default.
- W4385571251 hasPrimaryLocation W43855712511 @default.
- W4385571251 hasRelatedWork W1996541855 @default.
- W4385571251 hasRelatedWork W2940336242 @default.
- W4385571251 hasRelatedWork W2953328427 @default.
- W4385571251 hasRelatedWork W2989932438 @default.
- W4385571251 hasRelatedWork W3099765033 @default.
- W4385571251 hasRelatedWork W4210794429 @default.
- W4385571251 hasRelatedWork W4313159793 @default.
- W4385571251 hasRelatedWork W4327988962 @default.
- W4385571251 hasRelatedWork W4361732492 @default.
- W4385571251 hasRelatedWork W4362499066 @default.
- W4385571251 isParatext "false" @default.
- W4385571251 isRetracted "false" @default.
- W4385571251 workType "article" @default.
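
The rows above all follow one visible pattern: a leading dash, a subject, a predicate, an object (either a bare identifier or a double-quoted literal), and a graph label after `@`. A minimal sketch of how such rows could be read into tuples is shown below. The names `Quad`, `ROW`, and `parse_row` are hypothetical, and the row layout is inferred from this listing alone, not from any official SemOpenAlex export format.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One row of the listing: subject, predicate, object, graph label."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed row shape (inferred from the listing, not an official format):
#   - <subject> <predicate> <object> @<graph>.
# The object is either a double-quoted literal or a bare identifier.
ROW = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(?:"(.*)"|(\S+))\s+@(\S+?)\.$')


def parse_row(line: str) -> Optional[Quad]:
    """Return a Quad for a listing row, or None if the line is not a row."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    subject, predicate, literal, bare, graph = match.groups()
    # Exactly one of `literal` / `bare` is set, depending on the object form.
    obj = literal if literal is not None else bare
    return Quad(subject, predicate, obj, graph)


if __name__ == "__main__":
    sample = [
        '- W4385571251 created "2023-08-05" @default.',
        "- W4385571251 type Work @default.",
        "Showing items 1 to 73 of",
    ]
    for line in sample:
        print(parse_row(line))
```

Header and pagination lines do not start with a dash, so `parse_row` returns `None` for them and a caller can filter them out before grouping rows by predicate.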