Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387560839> ?p ?o ?g. }
- W4387560839 abstract "Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line, whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbers at various stages of the language modeling pipeline. These methods change either (1) the notation in which numbers are written (e.g., scientific vs. decimal), (2) the vocabulary used to represent numbers, or (3) the entire architecture of the underlying language model, to directly regress to a desired number. Previous work suggests that architectural change helps achieve state-of-the-art performance on number estimation, but we find an insightful ablation: changing the model's vocabulary instead (e.g., introducing a new token for numbers in the range 10-100) is a far better trade-off. In the context of masked number prediction, a carefully designed tokenization scheme is both the simplest to implement and sufficient, i.e., with similar performance to the state-of-the-art approach that requires making significant architectural changes. Finally, we report similar trends on the downstream task of numerical fact estimation (for Fermi Problems) and discuss reasons behind our findings." @default.
- W4387560839 created "2023-10-12" @default.
- W4387560839 creator A5008902336 @default.
- W4387560839 creator A5073443117 @default.
- W4387560839 creator A5081842804 @default.
- W4387560839 date "2023-10-09" @default.
- W4387560839 modified "2023-10-13" @default.
- W4387560839 title "Estimating Numbers without Regression" @default.
- W4387560839 doi "https://doi.org/10.48550/arxiv.2310.06204" @default.
- W4387560839 hasPublicationYear "2023" @default.
- W4387560839 type Work @default.
- W4387560839 citedByCount "0" @default.
- W4387560839 crossrefType "posted-content" @default.
- W4387560839 hasAuthorship W4387560839A5008902336 @default.
- W4387560839 hasAuthorship W4387560839A5073443117 @default.
- W4387560839 hasAuthorship W4387560839A5081842804 @default.
- W4387560839 hasBestOaLocation W43875608391 @default.
- W4387560839 hasConcept C138885662 @default.
- W4387560839 hasConcept C151730666 @default.
- W4387560839 hasConcept C154945302 @default.
- W4387560839 hasConcept C159985019 @default.
- W4387560839 hasConcept C176982825 @default.
- W4387560839 hasConcept C192562407 @default.
- W4387560839 hasConcept C199360897 @default.
- W4387560839 hasConcept C204323151 @default.
- W4387560839 hasConcept C2777601683 @default.
- W4387560839 hasConcept C2779343474 @default.
- W4387560839 hasConcept C33923547 @default.
- W4387560839 hasConcept C38652104 @default.
- W4387560839 hasConcept C41008148 @default.
- W4387560839 hasConcept C41895202 @default.
- W4387560839 hasConcept C43521106 @default.
- W4387560839 hasConcept C48145219 @default.
- W4387560839 hasConcept C65045869 @default.
- W4387560839 hasConcept C80444323 @default.
- W4387560839 hasConcept C86803240 @default.
- W4387560839 hasConcept C94375191 @default.
- W4387560839 hasConceptScore W4387560839C138885662 @default.
- W4387560839 hasConceptScore W4387560839C151730666 @default.
- W4387560839 hasConceptScore W4387560839C154945302 @default.
- W4387560839 hasConceptScore W4387560839C159985019 @default.
- W4387560839 hasConceptScore W4387560839C176982825 @default.
- W4387560839 hasConceptScore W4387560839C192562407 @default.
- W4387560839 hasConceptScore W4387560839C199360897 @default.
- W4387560839 hasConceptScore W4387560839C204323151 @default.
- W4387560839 hasConceptScore W4387560839C2777601683 @default.
- W4387560839 hasConceptScore W4387560839C2779343474 @default.
- W4387560839 hasConceptScore W4387560839C33923547 @default.
- W4387560839 hasConceptScore W4387560839C38652104 @default.
- W4387560839 hasConceptScore W4387560839C41008148 @default.
- W4387560839 hasConceptScore W4387560839C41895202 @default.
- W4387560839 hasConceptScore W4387560839C43521106 @default.
- W4387560839 hasConceptScore W4387560839C48145219 @default.
- W4387560839 hasConceptScore W4387560839C65045869 @default.
- W4387560839 hasConceptScore W4387560839C80444323 @default.
- W4387560839 hasConceptScore W4387560839C86803240 @default.
- W4387560839 hasConceptScore W4387560839C94375191 @default.
- W4387560839 hasLocation W43875608391 @default.
- W4387560839 hasOpenAccess W4387560839 @default.
- W4387560839 hasPrimaryLocation W43875608391 @default.
- W4387560839 hasRelatedWork W115994059 @default.
- W4387560839 hasRelatedWork W187312621 @default.
- W4387560839 hasRelatedWork W2323185448 @default.
- W4387560839 hasRelatedWork W2350482071 @default.
- W4387560839 hasRelatedWork W2890000101 @default.
- W4387560839 hasRelatedWork W2945402993 @default.
- W4387560839 hasRelatedWork W3093768914 @default.
- W4387560839 hasRelatedWork W3107113894 @default.
- W4387560839 hasRelatedWork W4298195702 @default.
- W4387560839 hasRelatedWork W4313403876 @default.
- W4387560839 isParatext "false" @default.
- W4387560839 isRetracted "false" @default.
- W4387560839 workType "article" @default.
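
The abstract above mentions changing the vocabulary, for example by introducing a single token for all numbers in the range 10-100. The sketch below illustrates that general idea only. This record does not describe the paper's actual tokenization scheme, so the token names (`[NUM_E1]`, `[NUM_ZERO]`), the function names, and the choice to bucket by order of magnitude are all assumptions made for illustration.

```python
import re
from decimal import Decimal

# Matches plain non-negative decimal literals such as "7", "324" or "0.5".
# Signs, thousands separators and scientific notation are deliberately ignored.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def magnitude_token(literal: str) -> str:
    """Map a numeric literal to a hypothetical order-of-magnitude token."""
    value = Decimal(literal)
    if value == 0:
        # Zero has no order of magnitude, so it gets its own (hypothetical) token.
        return "[NUM_ZERO]"
    # Decimal.adjusted() returns the exponent of the most significant digit,
    # so every value in [10, 100) maps to exponent 1.
    return f"[NUM_E{value.adjusted()}]"


def replace_numbers(text: str) -> str:
    """Replace each numeric literal in text with its magnitude token."""
    return NUMBER_RE.sub(lambda match: magnitude_token(match.group()), text)


if __name__ == "__main__":
    print(replace_numbers("The tower is 324 m tall and 7 m wide."))
```

With a scheme like this, a masked number position could be predicted as an ordinary vocabulary item, with no regression head needed. Whether that matches the paper's design would have to be checked against the paper itself.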