Matches in SemOpenAlex for { <https://semopenalex.org/work/W2912656237> ?p ?o ?g. }
- W2912656237 abstract "In this work, we introduce the MOldavian and ROmanian Dialectal COrpus (MOROCO), which is freely available for download at this https URL. The corpus contains 33564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21719 samples for training, 5921 samples for validation and another 5924 samples for testing. For each sample, we provide corresponding dialectal and category labels. This allows us to perform empirical studies on several classification tasks such as (i) binary discrimination of Moldavian versus Romanian text samples, (ii) intra-dialect multi-class categorization by topic and (iii) cross-dialect multi-class categorization by topic. We perform experiments using a shallow approach based on string kernels, as well as a novel deep approach based on character-level convolutional neural networks containing Squeeze-and-Excitation blocks. We also present and analyze the most discriminative features of our best performing model, before and after named entity removal." @default.
- W2912656237 created "2019-02-21" @default.
- W2912656237 creator A5030278465 @default.
- W2912656237 creator A5081017623 @default.
- W2912656237 date "2019-01-19" @default.
- W2912656237 modified "2023-09-26" @default.
- W2912656237 title "MOROCO: The Moldavian and Romanian Dialectal Corpus" @default.
- W2912656237 cites W1480643256 @default.
- W2912656237 cites W1510073064 @default.
- W2912656237 cites W1560965982 @default.
- W2912656237 cites W1665214252 @default.
- W2912656237 cites W1832693441 @default.
- W2912656237 cites W1938755728 @default.
- W2912656237 cites W2101609803 @default.
- W2912656237 cites W2109704865 @default.
- W2912656237 cites W2112796928 @default.
- W2912656237 cites W2125446229 @default.
- W2912656237 cites W2127421665 @default.
- W2912656237 cites W2153579005 @default.
- W2912656237 cites W2163605009 @default.
- W2912656237 cites W2250272113 @default.
- W2912656237 cites W2250539671 @default.
- W2912656237 cites W2250698219 @default.
- W2912656237 cites W2251281667 @default.
- W2912656237 cites W2440817164 @default.
- W2912656237 cites W244375653 @default.
- W2912656237 cites W2552839021 @default.
- W2912656237 cites W2561747913 @default.
- W2912656237 cites W2573786697 @default.
- W2912656237 cites W2575880437 @default.
- W2912656237 cites W2620806258 @default.
- W2912656237 cites W2741287244 @default.
- W2912656237 cites W2741798456 @default.
- W2912656237 cites W2752530998 @default.
- W2912656237 cites W2785748711 @default.
- W2912656237 cites W2794998650 @default.
- W2912656237 cites W2806573136 @default.
- W2912656237 cites W2806962830 @default.
- W2912656237 cites W2888160543 @default.
- W2912656237 cites W2899166839 @default.
- W2912656237 cites W2914798463 @default.
- W2912656237 cites W2963012544 @default.
- W2912656237 cites W2963486098 @default.
- W2912656237 cites W2963499843 @default.
- W2912656237 cites W2963830885 @default.
- W2912656237 cites W2963842834 @default.
- W2912656237 cites W2963921497 @default.
- W2912656237 cites W2963970792 @default.
- W2912656237 cites W2964078312 @default.
- W2912656237 cites W2964121744 @default.
- W2912656237 cites W2964335814 @default.
- W2912656237 hasPublicationYear "2019" @default.
- W2912656237 type Work @default.
- W2912656237 sameAs 2912656237 @default.
- W2912656237 citedByCount "2" @default.
- W2912656237 countsByYear W29126562372019 @default.
- W2912656237 crossrefType "posted-content" @default.
- W2912656237 hasAuthorship W2912656237A5030278465 @default.
- W2912656237 hasAuthorship W2912656237A5081017623 @default.
- W2912656237 hasConcept C129400051 @default.
- W2912656237 hasConcept C138885662 @default.
- W2912656237 hasConcept C154945302 @default.
- W2912656237 hasConcept C157486923 @default.
- W2912656237 hasConcept C177264268 @default.
- W2912656237 hasConcept C185592680 @default.
- W2912656237 hasConcept C198531522 @default.
- W2912656237 hasConcept C199360897 @default.
- W2912656237 hasConcept C204321447 @default.
- W2912656237 hasConcept C2777212361 @default.
- W2912656237 hasConcept C33923547 @default.
- W2912656237 hasConcept C37914503 @default.
- W2912656237 hasConcept C41008148 @default.
- W2912656237 hasConcept C41895202 @default.
- W2912656237 hasConcept C43617362 @default.
- W2912656237 hasConcept C48372109 @default.
- W2912656237 hasConcept C81363708 @default.
- W2912656237 hasConcept C94124525 @default.
- W2912656237 hasConcept C94375191 @default.
- W2912656237 hasConcept C97931131 @default.
- W2912656237 hasConceptScore W2912656237C129400051 @default.
- W2912656237 hasConceptScore W2912656237C138885662 @default.
- W2912656237 hasConceptScore W2912656237C154945302 @default.
- W2912656237 hasConceptScore W2912656237C157486923 @default.
- W2912656237 hasConceptScore W2912656237C177264268 @default.
- W2912656237 hasConceptScore W2912656237C185592680 @default.
- W2912656237 hasConceptScore W2912656237C198531522 @default.
- W2912656237 hasConceptScore W2912656237C199360897 @default.
- W2912656237 hasConceptScore W2912656237C204321447 @default.
- W2912656237 hasConceptScore W2912656237C2777212361 @default.
- W2912656237 hasConceptScore W2912656237C33923547 @default.
- W2912656237 hasConceptScore W2912656237C37914503 @default.
- W2912656237 hasConceptScore W2912656237C41008148 @default.
- W2912656237 hasConceptScore W2912656237C41895202 @default.
- W2912656237 hasConceptScore W2912656237C43617362 @default.
- W2912656237 hasConceptScore W2912656237C48372109 @default.
- W2912656237 hasConceptScore W2912656237C81363708 @default.
- W2912656237 hasConceptScore W2912656237C94124525 @default.
- W2912656237 hasConceptScore W2912656237C94375191 @default.
- W2912656237 hasConceptScore W2912656237C97931131 @default.
- W2912656237 hasLocation W29126562371 @default.
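
The split sizes quoted in the abstract above can be checked against the stated corpus total with a few lines of arithmetic. This is a minimal sketch only: the variable names (`splits`, `total_reported`, `shares`) are illustrative and are not part of SemOpenAlex or of the MOROCO release, and the figures are copied from the abstract rather than read from the data set itself.

```python
# Split sizes as stated in the abstract; names here are illustrative only.
splits = {"training": 21719, "validation": 5921, "testing": 5924}
total_reported = 33564  # total number of samples stated in the abstract

# The three splits should account for every sample in the corpus.
total = sum(splits.values())
assert total == total_reported

# Share of each split, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(shares)
```

The check confirms that the three splits add up to the reported 33564 samples, with roughly two thirds of the corpus used for training.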