Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200430816> ?p ?o ?g. }
- W4200430816 endingPage "640" @default.
- W4200430816 startingPage "613" @default.
- W4200430816 abstract "In low-resource domains, it is challenging to achieve good performance using existing machine learning methods due to a lack of training data and mixed data types (numeric and categorical). In particular, categorical variables with high cardinality pose a challenge to machine learning tasks such as classification and regression because training requires sufficiently many data points for the possible values of each variable. Since interpolation is not possible, nothing can be learned for values not seen in the training set. This paper presents a method that uses prior knowledge of the application domain to support machine learning in cases with insufficient data. We propose to address this challenge by using embeddings for categorical variables that are based on an explicit representation of domain knowledge (KR), namely a hierarchy of concepts. Our approach is to 1. define a semantic similarity measure between categories, based on the hierarchy (we propose a purely hierarchy-based measure, but other similarity measures from the literature can be used) and 2. use that similarity measure to define a modified one-hot encoding. We propose two embedding schemes for single-valued and multi-valued categorical data. We perform experiments on three different use cases. We first compare existing similarity approaches with our approach on a word pair similarity use case. This is followed by creating word embeddings using different similarity approaches. A comparison with existing methods such as Google, Word2Vec and GloVe embeddings on several benchmarks shows better performance on concept categorisation tasks when using knowledge-based embeddings. The third use case uses a medical dataset to compare the performance of semantic-based embeddings and standard binary encodings. Significant improvement in performance of the downstream classification tasks is achieved by using semantic information." @default.
- W4200430816 created "2021-12-31" @default.
- W4200430816 creator A5034963323 @default.
- W4200430816 creator A5074502354 @default.
- W4200430816 date "2021-12-28" @default.
- W4200430816 modified "2023-09-26" @default.
- W4200430816 title "Hierarchy-based semantic embeddings for single-valued & multi-valued categorical variables" @default.
- W4200430816 cites W1516154272 @default.
- W4200430816 cites W1571447506 @default.
- W4200430816 cites W1578620296 @default.
- W4200430816 cites W1597533344 @default.
- W4200430816 cites W1659833910 @default.
- W4200430816 cites W1742842785 @default.
- W4200430816 cites W1923180085 @default.
- W4200430816 cites W1990836268 @default.
- W4200430816 cites W2047878524 @default.
- W4200430816 cites W2080100102 @default.
- W4200430816 cites W2084377579 @default.
- W4200430816 cites W2103318667 @default.
- W4200430816 cites W2111685427 @default.
- W4200430816 cites W2140480387 @default.
- W4200430816 cites W2140898374 @default.
- W4200430816 cites W2141628941 @default.
- W4200430816 cites W2148143831 @default.
- W4200430816 cites W2163952039 @default.
- W4200430816 cites W2250539671 @default.
- W4200430816 cites W2396881363 @default.
- W4200430816 cites W2509960255 @default.
- W4200430816 cites W2765449478 @default.
- W4200430816 cites W2783777501 @default.
- W4200430816 cites W2806031239 @default.
- W4200430816 cites W2811024573 @default.
- W4200430816 cites W2884238387 @default.
- W4200430816 cites W2889198823 @default.
- W4200430816 cites W2947817682 @default.
- W4200430816 cites W2954671964 @default.
- W4200430816 cites W2954699761 @default.
- W4200430816 cites W2956530858 @default.
- W4200430816 cites W2963105378 @default.
- W4200430816 cites W2982781344 @default.
- W4200430816 cites W3104846106 @default.
- W4200430816 cites W3216404684 @default.
- W4200430816 cites W4213009331 @default.
- W4200430816 cites W4292402161 @default.
- W4200430816 cites W89242458 @default.
- W4200430816 doi "https://doi.org/10.1007/s10844-021-00693-2" @default.
- W4200430816 hasPublicationYear "2021" @default.
- W4200430816 type Work @default.
- W4200430816 citedByCount "4" @default.
- W4200430816 countsByYear W42004308162022 @default.
- W4200430816 countsByYear W42004308162023 @default.
- W4200430816 crossrefType "journal-article" @default.
- W4200430816 hasAuthorship W4200430816A5034963323 @default.
- W4200430816 hasAuthorship W4200430816A5074502354 @default.
- W4200430816 hasBestOaLocation W42004308161 @default.
- W4200430816 hasConcept C103278499 @default.
- W4200430816 hasConcept C115961682 @default.
- W4200430816 hasConcept C119857082 @default.
- W4200430816 hasConcept C124101348 @default.
- W4200430816 hasConcept C130318100 @default.
- W4200430816 hasConcept C154945302 @default.
- W4200430816 hasConcept C162324750 @default.
- W4200430816 hasConcept C177264268 @default.
- W4200430816 hasConcept C199360897 @default.
- W4200430816 hasConcept C204321447 @default.
- W4200430816 hasConcept C2776461190 @default.
- W4200430816 hasConcept C2776517306 @default.
- W4200430816 hasConcept C2780009758 @default.
- W4200430816 hasConcept C31170391 @default.
- W4200430816 hasConcept C34447519 @default.
- W4200430816 hasConcept C41008148 @default.
- W4200430816 hasConcept C41608201 @default.
- W4200430816 hasConcept C5274069 @default.
- W4200430816 hasConcept C87117476 @default.
- W4200430816 hasConceptScore W4200430816C103278499 @default.
- W4200430816 hasConceptScore W4200430816C115961682 @default.
- W4200430816 hasConceptScore W4200430816C119857082 @default.
- W4200430816 hasConceptScore W4200430816C124101348 @default.
- W4200430816 hasConceptScore W4200430816C130318100 @default.
- W4200430816 hasConceptScore W4200430816C154945302 @default.
- W4200430816 hasConceptScore W4200430816C162324750 @default.
- W4200430816 hasConceptScore W4200430816C177264268 @default.
- W4200430816 hasConceptScore W4200430816C199360897 @default.
- W4200430816 hasConceptScore W4200430816C204321447 @default.
- W4200430816 hasConceptScore W4200430816C2776461190 @default.
- W4200430816 hasConceptScore W4200430816C2776517306 @default.
- W4200430816 hasConceptScore W4200430816C2780009758 @default.
- W4200430816 hasConceptScore W4200430816C31170391 @default.
- W4200430816 hasConceptScore W4200430816C34447519 @default.
- W4200430816 hasConceptScore W4200430816C41008148 @default.
- W4200430816 hasConceptScore W4200430816C41608201 @default.
- W4200430816 hasConceptScore W4200430816C5274069 @default.
- W4200430816 hasConceptScore W4200430816C87117476 @default.
- W4200430816 hasFunder F4320323260 @default.
- W4200430816 hasFunder F4320323299 @default.
- W4200430816 hasIssue "3" @default.
- W4200430816 hasLocation W42004308161 @default.
- W4200430816 hasOpenAccess W4200430816 @default.