Matches in SemOpenAlex for { <https://semopenalex.org/work/W2953405964> ?p ?o ?g. }
Showing items 1 to 65 of 65, with 100 items per page.
- W2953405964 abstract "The digital era floods us with an excessive amount of text data. To make sense of such data automatically, there is an increasing demand for accurate numerical word representations. The complexity of natural languages motivates representing words with high-dimensional vectors. However, learning in a high-dimensional space is challenging when training data is noisy and scarce. Additionally, lingual dependencies complicate learning, mostly because computational resources are limited and typically insufficient to account for all possible dependencies. This thesis addresses both challenges by following a probabilistic machine learning approach to find vectors, word embeddings, that perform well under the aforementioned limitations. An important finding of this thesis is that bounding the length of the vector that represents a word, as well as penalizing the discrepancy between vectors representing different words, makes a word embedding robust, which is especially beneficial when only noisy and little training data is available. This finding is important because it shows how current word embedding methods are sensitive to small variations in the training data. Although one might conclude from this finding that more data is no longer necessary, this thesis does show that training on multiple sources, such as dictionaries and thesauri, improves performance. However, each data source should be treated carefully, and it is important to weigh the informative parts of each data source appropriately. To deal with lingual dependencies, this thesis introduces statistical negative sampling, with which the learning objective of a word embedding can be approximated. There are many degrees of freedom in the approximated learning objective, and this thesis argues that current embedding strategies are based on weak heuristics to constrain these freedoms. Novel, more theoretically founded constraints, based on global statistics and maximum entropy, are proposed to constrain these approximations. Finally, many words in a natural language have multiple meanings, and current word embeddings do not address this because they are built on the common assumption that a single vector per word can capture all word meanings. This thesis shows that a representation based on multiple vectors per word easily overcomes this limitation by having different vectors represent the different meanings of a word. Taken together, this thesis proposes new insights and a more theoretical foundation for word embeddings, which are important to create more powerful models able to deal with the complexity of natural languages." @default.
- W2953405964 created "2019-07-12" @default.
- W2953405964 creator A5051348049 @default.
- W2953405964 date "2019-06-07" @default.
- W2953405964 modified "2023-09-27" @default.
- W2953405964 title "Exponential Word Embeddings: Models and Approximate Learning" @default.
- W2953405964 hasPublicationYear "2019" @default.
- W2953405964 type Work @default.
- W2953405964 sameAs 2953405964 @default.
- W2953405964 citedByCount "0" @default.
- W2953405964 crossrefType "journal-article" @default.
- W2953405964 hasAuthorship W2953405964A5051348049 @default.
- W2953405964 hasConcept C111919701 @default.
- W2953405964 hasConcept C119857082 @default.
- W2953405964 hasConcept C154945302 @default.
- W2953405964 hasConcept C204321447 @default.
- W2953405964 hasConcept C2524010 @default.
- W2953405964 hasConcept C2777462759 @default.
- W2953405964 hasConcept C2778572836 @default.
- W2953405964 hasConcept C33923547 @default.
- W2953405964 hasConcept C41008148 @default.
- W2953405964 hasConcept C41608201 @default.
- W2953405964 hasConcept C49937458 @default.
- W2953405964 hasConcept C63584917 @default.
- W2953405964 hasConcept C90805587 @default.
- W2953405964 hasConceptScore W2953405964C111919701 @default.
- W2953405964 hasConceptScore W2953405964C119857082 @default.
- W2953405964 hasConceptScore W2953405964C154945302 @default.
- W2953405964 hasConceptScore W2953405964C204321447 @default.
- W2953405964 hasConceptScore W2953405964C2524010 @default.
- W2953405964 hasConceptScore W2953405964C2777462759 @default.
- W2953405964 hasConceptScore W2953405964C2778572836 @default.
- W2953405964 hasConceptScore W2953405964C33923547 @default.
- W2953405964 hasConceptScore W2953405964C41008148 @default.
- W2953405964 hasConceptScore W2953405964C41608201 @default.
- W2953405964 hasConceptScore W2953405964C49937458 @default.
- W2953405964 hasConceptScore W2953405964C63584917 @default.
- W2953405964 hasConceptScore W2953405964C90805587 @default.
- W2953405964 hasLocation W29534059641 @default.
- W2953405964 hasOpenAccess W2953405964 @default.
- W2953405964 hasPrimaryLocation W29534059641 @default.
- W2953405964 hasRelatedWork W1937075317 @default.
- W2953405964 hasRelatedWork W1986321089 @default.
- W2953405964 hasRelatedWork W2563660874 @default.
- W2953405964 hasRelatedWork W2585927507 @default.
- W2953405964 hasRelatedWork W2587764909 @default.
- W2953405964 hasRelatedWork W2735548109 @default.
- W2953405964 hasRelatedWork W2738321088 @default.
- W2953405964 hasRelatedWork W2746855331 @default.
- W2953405964 hasRelatedWork W2804398514 @default.
- W2953405964 hasRelatedWork W2809234507 @default.
- W2953405964 hasRelatedWork W2921634120 @default.
- W2953405964 hasRelatedWork W2948607002 @default.
- W2953405964 hasRelatedWork W2949201587 @default.
- W2953405964 hasRelatedWork W2964474226 @default.
- W2953405964 hasRelatedWork W2976763303 @default.
- W2953405964 hasRelatedWork W2978411033 @default.
- W2953405964 hasRelatedWork W3028770966 @default.
- W2953405964 hasRelatedWork W3036683340 @default.
- W2953405964 hasRelatedWork W3116609822 @default.
- W2953405964 hasRelatedWork W3122121542 @default.
- W2953405964 isParatext "false" @default.
- W2953405964 isRetracted "false" @default.
- W2953405964 magId "2953405964" @default.
- W2953405964 workType "article" @default.
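The abstract above describes approximating a word-embedding learning objective via negative sampling. As a minimal sketch of that general idea only (not the thesis's specific "statistical negative sampling" variant), the following trains skip-gram-style vectors where each true (center, context) pair is pulled together and a few randomly sampled "negative" contexts are pushed apart; the vocabulary size, dimensionality, learning rate, and word indices are arbitrary assumptions for illustration.

```python
# Sketch: skip-gram with negative sampling (assumed setup, not the
# thesis's exact statistical variant).
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8                          # assumed toy sizes
W_in = rng.normal(0.0, 0.1, (VOCAB, DIM))   # center-word vectors
W_out = rng.normal(0.0, 0.1, (VOCAB, DIM))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, k=5, lr=0.1):
    """One SGD step: logistic loss on the true pair (label 1) and on
    k uniformly sampled negative contexts (label 0)."""
    pairs = [(context, 1.0)] + [(int(n), 0.0) for n in rng.integers(0, VOCAB, k)]
    grad_in = np.zeros(DIM)
    for idx, label in pairs:
        g = sigmoid(W_in[center] @ W_out[idx]) - label  # d(loss)/d(logit)
        grad_in += g * W_out[idx]
        W_out[idx] -= lr * g * W_in[center]
    W_in[center] -= lr * grad_in

def pair_score(c, o):
    """Model's probability that word o is a true context of word c."""
    return sigmoid(W_in[c] @ W_out[o])

before = pair_score(3, 7)
for _ in range(200):          # repeatedly observe (3, 7) as a true pair
    sgns_step(3, 7)
after = pair_score(3, 7)
```

After training, the score of the observed pair rises well above its near-chance starting value, which is the intended effect of the sampled approximation: the full softmax over the vocabulary is never computed.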