Matches in SemOpenAlex for { <https://semopenalex.org/work/W3100772908> ?p ?o ?g. }
- W3100772908 abstract "Cross-lingual word embedding (CWE) algorithms represent words in multiple languages in a unified vector space. Multi-Word Expressions (MWEs) are common in every language. When training word embeddings, each component word of an MWE gets its own separate embedding, and thus, MWEs are not translated by CWEs. We propose a simple method for word translation of MWEs to and from English in ten languages: we first compile lists of MWEs in each language and then tokenize the MWEs as single tokens before training word embeddings. CWEs are trained on a word-translation task using dictionaries that contain only single words. In order to evaluate MWE translation, we created bilingual word lists from multilingual WordNet that include single-token words and MWEs, and most importantly, include MWEs that correspond to single words in another language. We release these dictionaries to the research community. We show that the pre-tokenization of MWEs as single tokens performs better than averaging the embeddings of the individual tokens of the MWE. We can translate MWEs at a top-10 precision of 30-60%. The tokenization of MWEs makes the occurrences of single words in a training corpus more sparse, but we show that it does not negatively affect single-word translation." @default.
- W3100772908 created "2020-11-23" @default.
- W3100772908 creator A5008257790 @default.
- W3100772908 creator A5022129531 @default.
- W3100772908 creator A5068619322 @default.
- W3100772908 creator A5071109457 @default.
- W3100772908 creator A5074750562 @default.
- W3100772908 creator A5081417146 @default.
- W3100772908 date "2020-01-01" @default.
- W3100772908 modified "2023-09-23" @default.
- W3100772908 title "Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings" @default.
- W3100772908 cites W1549944563 @default.
- W3100772908 cites W1828724394 @default.
- W3100772908 cites W202122741 @default.
- W3100772908 cites W2126168798 @default.
- W3100772908 cites W2126725946 @default.
- W3100772908 cites W2160918952 @default.
- W3100772908 cites W2224490803 @default.
- W3100772908 cites W2250473257 @default.
- W3100772908 cites W2250599741 @default.
- W3100772908 cites W2250601048 @default.
- W3100772908 cites W2250621296 @default.
- W3100772908 cites W2251765408 @default.
- W3100772908 cites W2294774419 @default.
- W3100772908 cites W2467466606 @default.
- W3100772908 cites W2493916176 @default.
- W3100772908 cites W2573530704 @default.
- W3100772908 cites W2594021297 @default.
- W3100772908 cites W2664496537 @default.
- W3100772908 cites W2673116624 @default.
- W3100772908 cites W2740132093 @default.
- W3100772908 cites W2741602058 @default.
- W3100772908 cites W2757950651 @default.
- W3100772908 cites W2788353357 @default.
- W3100772908 cites W2882319491 @default.
- W3100772908 cites W2887838996 @default.
- W3100772908 cites W2898914189 @default.
- W3100772908 cites W2962712421 @default.
- W3100772908 cites W2963047628 @default.
- W3100772908 cites W2963118869 @default.
- W3100772908 cites W2963165489 @default.
- W3100772908 cites W2963626558 @default.
- W3100772908 cites W2963667932 @default.
- W3100772908 cites W2970095260 @default.
- W3100772908 cites W2970854517 @default.
- W3100772908 cites W3035537076 @default.
- W3100772908 cites W3087901827 @default.
- W3100772908 doi "https://doi.org/10.18653/v1/2020.emnlp-main.360" @default.
- W3100772908 hasPublicationYear "2020" @default.
- W3100772908 type Work @default.
- W3100772908 sameAs 3100772908 @default.
- W3100772908 citedByCount "1" @default.
- W3100772908 countsByYear W31007729082023 @default.
- W3100772908 crossrefType "proceedings-article" @default.
- W3100772908 hasAuthorship W3100772908A5008257790 @default.
- W3100772908 hasAuthorship W3100772908A5022129531 @default.
- W3100772908 hasAuthorship W3100772908A5068619322 @default.
- W3100772908 hasAuthorship W3100772908A5071109457 @default.
- W3100772908 hasAuthorship W3100772908A5074750562 @default.
- W3100772908 hasAuthorship W3100772908A5081417146 @default.
- W3100772908 hasBestOaLocation W31007729081 @default.
- W3100772908 hasConcept C104317684 @default.
- W3100772908 hasConcept C105580179 @default.
- W3100772908 hasConcept C149364088 @default.
- W3100772908 hasConcept C154945302 @default.
- W3100772908 hasConcept C157659113 @default.
- W3100772908 hasConcept C176982825 @default.
- W3100772908 hasConcept C185592680 @default.
- W3100772908 hasConcept C203005215 @default.
- W3100772908 hasConcept C204321447 @default.
- W3100772908 hasConcept C2524010 @default.
- W3100772908 hasConcept C2777462759 @default.
- W3100772908 hasConcept C2779235283 @default.
- W3100772908 hasConcept C28490314 @default.
- W3100772908 hasConcept C33923547 @default.
- W3100772908 hasConcept C38652104 @default.
- W3100772908 hasConcept C41008148 @default.
- W3100772908 hasConcept C41608201 @default.
- W3100772908 hasConcept C48145219 @default.
- W3100772908 hasConcept C55493867 @default.
- W3100772908 hasConcept C90805587 @default.
- W3100772908 hasConceptScore W3100772908C104317684 @default.
- W3100772908 hasConceptScore W3100772908C105580179 @default.
- W3100772908 hasConceptScore W3100772908C149364088 @default.
- W3100772908 hasConceptScore W3100772908C154945302 @default.
- W3100772908 hasConceptScore W3100772908C157659113 @default.
- W3100772908 hasConceptScore W3100772908C176982825 @default.
- W3100772908 hasConceptScore W3100772908C185592680 @default.
- W3100772908 hasConceptScore W3100772908C203005215 @default.
- W3100772908 hasConceptScore W3100772908C204321447 @default.
- W3100772908 hasConceptScore W3100772908C2524010 @default.
- W3100772908 hasConceptScore W3100772908C2777462759 @default.
- W3100772908 hasConceptScore W3100772908C2779235283 @default.
- W3100772908 hasConceptScore W3100772908C28490314 @default.
- W3100772908 hasConceptScore W3100772908C33923547 @default.
- W3100772908 hasConceptScore W3100772908C38652104 @default.
- W3100772908 hasConceptScore W3100772908C41008148 @default.
- W3100772908 hasConceptScore W3100772908C41608201 @default.
- W3100772908 hasConceptScore W3100772908C48145219 @default.
- W3100772908 hasConceptScore W3100772908C55493867 @default.
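
The abstract above contrasts two ways of handling an MWE: merging it into a single token before embedding training, and averaging the embeddings of its component words. The following is a minimal sketch of both ideas, not the authors' implementation. The function names, the underscore joiner, the greedy longest-match strategy, and the toy vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def pretokenize_mwes(
    tokens: Sequence[str],
    mwes: Set[Tuple[str, ...]],
    joiner: str = "_",  # assumed joiner; the abstract does not specify one
) -> List[str]:
    """Merge known MWEs into single tokens using greedy longest match."""
    max_len = max((len(m) for m in mwes), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in mwes:
                out.append(joiner.join(candidate))
                i += n
                merged = True
                break
        if not merged:
            out.append(tokens[i])
            i += 1
    return out


def average_embedding(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
) -> List[float]:
    """Baseline: represent an MWE as the mean of its component word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise KeyError("no component word has an embedding")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Hypothetical MWE list and toy two-dimensional vectors.
mwe_list = {("hot", "dog"), ("new", "york", "city")}
sentence = "i ate a hot dog in new york city".split()
merged_sentence = pretokenize_mwes(sentence, mwe_list)
baseline_vector = average_embedding(
    ["hot", "dog"], {"hot": [1.0, 0.0], "dog": [0.0, 1.0]}
)
```

In this sketch, the merged corpus would then be passed to any word-embedding trainer, so that `hot_dog` receives its own vector instead of being composed from `hot` and `dog`.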