Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387060314> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W4387060314 endingPage "76" @default.
- W4387060314 startingPage "76" @default.
- W4387060314 abstract "The evaluation of similarities between natural languages often relies on prior knowledge of the languages being studied. We describe three methods for building phylogenetic trees and clustering languages without the use of language-specific information. The input to our methods is a set of document vectors trained on a corpus of parallel translations of the Bible into 22 Indo-European languages, representing 4 language families: Indo-Iranian, Slavic, Germanic, and Romance. This text corpus consists of a set of 532,092 Bible verses, with 24,186 identical verses translated into each language. The methods are (A) hierarchical clustering using distance between language vector centroids, (B) hierarchical clustering using a network-derived distance measure, and (C) Deep Embedded Clustering (DEC) of language vectors. We evaluate our methods using a ground-truth tree and language families derived from said tree. All three achieve clustering F-scores above 0.9 on the Indo-Iranian and Slavic families; most confusion is between the Germanic and Romance families. The mean F-scores across all families are 0.864 (centroid clustering), 0.953 (network partitioning), and 0.763 (DEC). This shows that document vectors can be used to capture and compare linguistic features of multilingual texts, and thus could help extend language similarity and other translation studies research." @default.
- W4387060314 created "2023-09-27" @default.
- W4387060314 creator A5078722503 @default.
- W4387060314 creator A5086831276 @default.
- W4387060314 date "2023-09-26" @default.
- W4387060314 modified "2023-09-27" @default.
- W4387060314 title "Analyzing Indo-European Language Similarities Using Document Vectors" @default.
- W4387060314 cites W1979618484 @default.
- W4387060314 cites W1987836480 @default.
- W4387060314 cites W2009506985 @default.
- W4387060314 cites W2016381774 @default.
- W4387060314 cites W2060425093 @default.
- W4387060314 cites W2095293504 @default.
- W4387060314 cites W2117801354 @default.
- W4387060314 cites W2131681506 @default.
- W4387060314 cites W2148374900 @default.
- W4387060314 cites W2171990733 @default.
- W4387060314 cites W2222512263 @default.
- W4387060314 cites W2566957588 @default.
- W4387060314 cites W2740656274 @default.
- W4387060314 cites W2777073510 @default.
- W4387060314 cites W2886643713 @default.
- W4387060314 cites W2963826397 @default.
- W4387060314 cites W2991388718 @default.
- W4387060314 cites W3017290615 @default.
- W4387060314 cites W3212926742 @default.
- W4387060314 doi "https://doi.org/10.3390/informatics10040076" @default.
- W4387060314 hasPublicationYear "2023" @default.
- W4387060314 type Work @default.
- W4387060314 citedByCount "0" @default.
- W4387060314 crossrefType "journal-article" @default.
- W4387060314 hasAuthorship W4387060314A5078722503 @default.
- W4387060314 hasAuthorship W4387060314A5086831276 @default.
- W4387060314 hasBestOaLocation W43870603141 @default.
- W4387060314 hasConcept C103278499 @default.
- W4387060314 hasConcept C113174947 @default.
- W4387060314 hasConcept C115961682 @default.
- W4387060314 hasConcept C121894898 @default.
- W4387060314 hasConcept C134306372 @default.
- W4387060314 hasConcept C138885662 @default.
- W4387060314 hasConcept C154945302 @default.
- W4387060314 hasConcept C177264268 @default.
- W4387060314 hasConcept C199360897 @default.
- W4387060314 hasConcept C204321447 @default.
- W4387060314 hasConcept C2780566098 @default.
- W4387060314 hasConcept C33923547 @default.
- W4387060314 hasConcept C41008148 @default.
- W4387060314 hasConcept C41895202 @default.
- W4387060314 hasConcept C73555534 @default.
- W4387060314 hasConcept C92835128 @default.
- W4387060314 hasConceptScore W4387060314C103278499 @default.
- W4387060314 hasConceptScore W4387060314C113174947 @default.
- W4387060314 hasConceptScore W4387060314C115961682 @default.
- W4387060314 hasConceptScore W4387060314C121894898 @default.
- W4387060314 hasConceptScore W4387060314C134306372 @default.
- W4387060314 hasConceptScore W4387060314C138885662 @default.
- W4387060314 hasConceptScore W4387060314C154945302 @default.
- W4387060314 hasConceptScore W4387060314C177264268 @default.
- W4387060314 hasConceptScore W4387060314C199360897 @default.
- W4387060314 hasConceptScore W4387060314C204321447 @default.
- W4387060314 hasConceptScore W4387060314C2780566098 @default.
- W4387060314 hasConceptScore W4387060314C33923547 @default.
- W4387060314 hasConceptScore W4387060314C41008148 @default.
- W4387060314 hasConceptScore W4387060314C41895202 @default.
- W4387060314 hasConceptScore W4387060314C73555534 @default.
- W4387060314 hasConceptScore W4387060314C92835128 @default.
- W4387060314 hasIssue "4" @default.
- W4387060314 hasLocation W43870603141 @default.
- W4387060314 hasOpenAccess W4387060314 @default.
- W4387060314 hasPrimaryLocation W43870603141 @default.
- W4387060314 hasRelatedWork W1969469788 @default.
- W4387060314 hasRelatedWork W2056256868 @default.
- W4387060314 hasRelatedWork W2150657319 @default.
- W4387060314 hasRelatedWork W2359107151 @default.
- W4387060314 hasRelatedWork W2390573116 @default.
- W4387060314 hasRelatedWork W2786195892 @default.
- W4387060314 hasRelatedWork W3116600026 @default.
- W4387060314 hasRelatedWork W3207101233 @default.
- W4387060314 hasRelatedWork W4206585807 @default.
- W4387060314 hasRelatedWork W4304204731 @default.
- W4387060314 hasVolume "10" @default.
- W4387060314 isParatext "false" @default.
- W4387060314 isRetracted "false" @default.
- W4387060314 workType "article" @default.
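
Method (A) in the abstract above, hierarchical clustering on distances between language vector centroids, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the work itself: the cosine distance, the average linkage, the function names, and the toy 2-D vectors for four hypothetical languages (`lang_a` to `lang_d`). The abstract does not state which distance or linkage the authors used.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerate(doc_vectors, n_clusters):
    """Merge languages bottom-up until n_clusters groups remain.

    doc_vectors maps a language name to its list of document vectors.
    Average linkage over centroid distances is an assumption of this sketch.
    """
    cents = {lang: centroid(vs) for lang, vs in doc_vectors.items()}
    clusters = [[lang] for lang in sorted(cents)]

    def linkage(a, b):
        # Mean pairwise centroid distance between two groups of languages.
        total = sum(cosine_distance(cents[x], cents[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: two document vectors per made-up language.
toy = {
    "lang_a": [[1.0, 0.1], [0.9, 0.0]],
    "lang_b": [[1.0, 0.2], [0.8, 0.1]],
    "lang_c": [[0.1, 1.0], [0.0, 0.9]],
    "lang_d": [[0.2, 1.0], [0.1, 0.8]],
}
result = agglomerate(toy, 2)
print(result)
```

Recording the order of merges instead of stopping at a fixed cluster count would yield a tree rather than flat groups, which is closer to the phylogenetic trees the abstract mentions.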