Matches in SemOpenAlex for { <https://semopenalex.org/work/W4323556896> ?p ?o ?g. }
Showing items 1 to 53 of 53 with 100 items per page.
- W4323556896 abstract "We explore the possibility of meta-learning for the language-independent unsupervised tokenization problem for English, Russian, and Chinese. We implement the meta-learning approach for automatic determination of hyper-parameters of the unsupervised tokenization model proposed in earlier works, relying on various human-independent fitness functions such as normalised anti-entropy, compression factor and cross-split F1 score, as well as additive and multiplicative composite combinations of the three metrics, testing them against the conventional F1 tokenization score. We find a fairly good correlation between the latter and the additive combination of the former three metrics for English and Russian. In case of Chinese, we find a significant correlation between the F1 score and the compression factor. Our results suggest the possibility of robust unsupervised tokenization of low-resource and dead languages and allow us to think about human languages in terms of the evolution of efficient symbolic communication codes with different structural optimisation schemes that have evolved in different human cultures." @default.
- W4323556896 created "2023-03-09" @default.
- W4323556896 creator A5014703807 @default.
- W4323556896 date "2023-03-04" @default.
- W4323556896 modified "2023-09-27" @default.
- W4323556896 title "Self-tuning hyper-parameters for unsupervised cross-lingual tokenization" @default.
- W4323556896 doi "https://doi.org/10.48550/arxiv.2303.02427" @default.
- W4323556896 hasPublicationYear "2023" @default.
- W4323556896 type Work @default.
- W4323556896 citedByCount "0" @default.
- W4323556896 crossrefType "posted-content" @default.
- W4323556896 hasAuthorship W4323556896A5014703807 @default.
- W4323556896 hasBestOaLocation W43235568961 @default.
- W4323556896 hasConcept C119857082 @default.
- W4323556896 hasConcept C134306372 @default.
- W4323556896 hasConcept C154945302 @default.
- W4323556896 hasConcept C176982825 @default.
- W4323556896 hasConcept C199360897 @default.
- W4323556896 hasConcept C204321447 @default.
- W4323556896 hasConcept C2524010 @default.
- W4323556896 hasConcept C2781039887 @default.
- W4323556896 hasConcept C33923547 @default.
- W4323556896 hasConcept C41008148 @default.
- W4323556896 hasConcept C42747912 @default.
- W4323556896 hasConcept C90805587 @default.
- W4323556896 hasConceptScore W4323556896C119857082 @default.
- W4323556896 hasConceptScore W4323556896C134306372 @default.
- W4323556896 hasConceptScore W4323556896C154945302 @default.
- W4323556896 hasConceptScore W4323556896C176982825 @default.
- W4323556896 hasConceptScore W4323556896C199360897 @default.
- W4323556896 hasConceptScore W4323556896C204321447 @default.
- W4323556896 hasConceptScore W4323556896C2524010 @default.
- W4323556896 hasConceptScore W4323556896C2781039887 @default.
- W4323556896 hasConceptScore W4323556896C33923547 @default.
- W4323556896 hasConceptScore W4323556896C41008148 @default.
- W4323556896 hasConceptScore W4323556896C42747912 @default.
- W4323556896 hasConceptScore W4323556896C90805587 @default.
- W4323556896 hasLocation W43235568961 @default.
- W4323556896 hasOpenAccess W4323556896 @default.
- W4323556896 hasPrimaryLocation W43235568961 @default.
- W4323556896 hasRelatedWork W1690232763 @default.
- W4323556896 hasRelatedWork W2220350356 @default.
- W4323556896 hasRelatedWork W2412793475 @default.
- W4323556896 hasRelatedWork W2577871717 @default.
- W4323556896 hasRelatedWork W2624106077 @default.
- W4323556896 hasRelatedWork W2743781660 @default.
- W4323556896 hasRelatedWork W3016376159 @default.
- W4323556896 hasRelatedWork W3100772908 @default.
- W4323556896 hasRelatedWork W3107474891 @default.
- W4323556896 hasRelatedWork W2121178211 @default.
- W4323556896 isParatext "false" @default.
- W4323556896 isRetracted "false" @default.
- W4323556896 workType "article" @default.