Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295867992> ?p ?o ?g. }
- W4295867992 endingPage "1130" @default.
- W4295867992 startingPage "1103" @default.
- W4295867992 abstract "In recent years, Cross-Lingual Text Reuse Detection (CLTRD) has attracted the attention of the research community because large digital repositories and efficient Machine Translation systems are readily and freely available, which makes it easy to reuse text across languages and very difficult to detect such reuse. Previous studies have explored CLTRD for the English-Urdu language pair at the sentence/passage and document levels, and benchmark corpora and methods have been developed for those levels. However, benchmark corpora and methods for English-Urdu CLTRD at the lexical, syntactical, and phrasal levels are lacking. To fill this research gap, this study presents three large benchmark corpora for detecting Cross-Lingual Text Reuse (CLTR) at three levels of rewrite: Wholly Derived (WD), Partially Derived (PD), and Non-Derived (ND). The CLEU-Lex, CLEU-Syn, and CLEU-Phr corpora contain 66,485 (WD = 22,236, PD = 20,315, and ND = 23,934), 60,267 (WD = 20,007, PD = 16,979, and ND = 23,281), and 60,106 (WD = 23,862, PD = 15,878, and ND = 20,366) CLTR pairs, respectively. As a second major contribution, we applied Cross-Lingual Word Embedding (CLWE), Cross-Lingual Semantic Tagger (CLST), and Cross-Lingual Sentence Transformer (CLSTR) based methods to the three proposed corpora for CLTRD. Our extensive experiments showed that, for the binary classification task, the best result on the CLEU-Lex corpus was obtained using the cross-lingual sentence transformer ($F_1$ = 0.80). For the CLEU-Syn and CLEU-Phr corpora, the best results were obtained using the cross-lingual sentence transformer and a combination of the CLWE, CLST, and CLSTR methods ($F_1$ = 0.92 on CLEU-Syn and $F_1$ = 0.94 on CLEU-Phr). For the ternary classification task, the best result on the CLEU-Lex corpus was obtained using the cross-lingual sentence transformer method ($F_1$ = 0.69). For the CLEU-Syn corpus, the best result was obtained using a combination of the CLWE, CLST, and CLSTR methods ($F_1$ = 0.82). For the CLEU-Phr corpus, the best results were obtained using the cross-lingual sentence transformer and a combination of the CLWE, CLST, and CLSTR methods ($F_1$ = 0.78). To foster and promote research in Urdu (a low-resourced language), all three proposed corpora are freely and publicly available for research purposes." @default.
- W4295867992 created "2022-09-15" @default.
- W4295867992 creator A5005027058 @default.
- W4295867992 creator A5063867136 @default.
- W4295867992 date "2022-09-09" @default.
- W4295867992 modified "2023-09-26" @default.
- W4295867992 title "Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels" @default.
- W4295867992 cites W1019309611 @default.
- W4295867992 cites W1496503889 @default.
- W4295867992 cites W1518684876 @default.
- W4295867992 cites W1542647698 @default.
- W4295867992 cites W1543039027 @default.
- W4295867992 cites W1574855751 @default.
- W4295867992 cites W1814992895 @default.
- W4295867992 cites W1986926120 @default.
- W4295867992 cites W2019102947 @default.
- W4295867992 cites W2028742638 @default.
- W4295867992 cites W2028776121 @default.
- W4295867992 cites W2077656994 @default.
- W4295867992 cites W2100022607 @default.
- W4295867992 cites W2109310450 @default.
- W4295867992 cites W2131752763 @default.
- W4295867992 cites W2250539671 @default.
- W4295867992 cites W2250879510 @default.
- W4295867992 cites W2288894564 @default.
- W4295867992 cites W2471561731 @default.
- W4295867992 cites W2472403012 @default.
- W4295867992 cites W2491664569 @default.
- W4295867992 cites W2497040301 @default.
- W4295867992 cites W2500036977 @default.
- W4295867992 cites W2508661403 @default.
- W4295867992 cites W2508865106 @default.
- W4295867992 cites W2510940142 @default.
- W4295867992 cites W2524168323 @default.
- W4295867992 cites W2568128436 @default.
- W4295867992 cites W2768029001 @default.
- W4295867992 cites W2768744233 @default.
- W4295867992 cites W2789536546 @default.
- W4295867992 cites W2794617941 @default.
- W4295867992 cites W2910818272 @default.
- W4295867992 cites W2921254041 @default.
- W4295867992 cites W2922047327 @default.
- W4295867992 cites W2947265974 @default.
- W4295867992 cites W2962795068 @default.
- W4295867992 cites W2962843304 @default.
- W4295867992 cites W2963918774 @default.
- W4295867992 cites W2970641574 @default.
- W4295867992 cites W2972812366 @default.
- W4295867992 cites W2974216175 @default.
- W4295867992 cites W2989622715 @default.
- W4295867992 cites W3008768290 @default.
- W4295867992 cites W3094038429 @default.
- W4295867992 cites W3100598124 @default.
- W4295867992 cites W3100806282 @default.
- W4295867992 cites W3117341716 @default.
- W4295867992 cites W3154065069 @default.
- W4295867992 cites W3174499303 @default.
- W4295867992 doi "https://doi.org/10.1007/s10579-022-09613-4" @default.
- W4295867992 hasPublicationYear "2022" @default.
- W4295867992 type Work @default.
- W4295867992 citedByCount "0" @default.
- W4295867992 crossrefType "journal-article" @default.
- W4295867992 hasAuthorship W4295867992A5005027058 @default.
- W4295867992 hasAuthorship W4295867992A5063867136 @default.
- W4295867992 hasBestOaLocation W42958679921 @default.
- W4295867992 hasConcept C121332964 @default.
- W4295867992 hasConcept C13280743 @default.
- W4295867992 hasConcept C138885662 @default.
- W4295867992 hasConcept C154945302 @default.
- W4295867992 hasConcept C165801399 @default.
- W4295867992 hasConcept C185798385 @default.
- W4295867992 hasConcept C203005215 @default.
- W4295867992 hasConcept C204321447 @default.
- W4295867992 hasConcept C205649164 @default.
- W4295867992 hasConcept C2777350258 @default.
- W4295867992 hasConcept C2777530160 @default.
- W4295867992 hasConcept C41008148 @default.
- W4295867992 hasConcept C41895202 @default.
- W4295867992 hasConcept C62520636 @default.
- W4295867992 hasConcept C66322947 @default.
- W4295867992 hasConceptScore W4295867992C121332964 @default.
- W4295867992 hasConceptScore W4295867992C13280743 @default.
- W4295867992 hasConceptScore W4295867992C138885662 @default.
- W4295867992 hasConceptScore W4295867992C154945302 @default.
- W4295867992 hasConceptScore W4295867992C165801399 @default.
- W4295867992 hasConceptScore W4295867992C185798385 @default.
- W4295867992 hasConceptScore W4295867992C203005215 @default.
- W4295867992 hasConceptScore W4295867992C204321447 @default.
- W4295867992 hasConceptScore W4295867992C205649164 @default.
- W4295867992 hasConceptScore W4295867992C2777350258 @default.
- W4295867992 hasConceptScore W4295867992C2777530160 @default.
- W4295867992 hasConceptScore W4295867992C41008148 @default.
- W4295867992 hasConceptScore W4295867992C41895202 @default.
- W4295867992 hasConceptScore W4295867992C62520636 @default.
- W4295867992 hasConceptScore W4295867992C66322947 @default.
- W4295867992 hasIssue "4" @default.
- W4295867992 hasLocation W42958679921 @default.
- W4295867992 hasOpenAccess W4295867992 @default.