Matches in SemOpenAlex for { <https://semopenalex.org/work/W162629228> ?p ?o ?g. }
- W162629228 abstract "The development of linguistic data, especially annotated corpora, is imperative for the human language technology enablement of any language. The annotation process is, however, often time-consuming and expensive. As such, various projects make use of several strategies to expedite the development of human language technology resources. For resource-scarce languages – those with limited resources, finances and expertise – the efficiency of these strategies has not been conclusively established. This study investigates the efficiency of some of these strategies in the development of resources for resource-scarce languages, in order to provide recommendations for future projects facing decisions regarding which strategies they should implement. For all experiments, Afrikaans is used as an example of a resource-scarce language. Two tasks, viz. lemmatisation of text data and orthographic transcription of audio data, are evaluated in terms of quality and in terms of the time required to perform the task. The main focus of the study is on the skill level of the annotators, software environments which aim to improve the quality and time needed to perform annotations, and whether it is beneficial to annotate more data, or to increase the quality of the data. We outline and conduct systematic experiments on each of the three focus areas in order to determine the efficiency of each. First, we investigated the influence of a respondent’s skill level on data annotation by using untrained, sourced respondents for annotation of linguistic data for Afrikaans. We compared data annotated by experts, novices and laymen. From the results it was evident that the experts outperformed the nonexperts on both tasks, and that the differences in performance were statistically significant.
Next, we investigated the effect of software environments on data annotation to determine the benefits of using tailor-made software as opposed to general-purpose or domain-specific software. The comparison showed that, for these two specific projects, it was beneficial in terms of time and quality to use tailor-made software rather than domain-specific or general-purpose software. However, in the context of linguistic annotation of data for resource-scarce languages, the additional time needed to develop tailor-made software is not justified by the savings in annotation time. Finally, we compared systems trained with data of varying levels of quality and quantity, to determine the impact of quality versus quantity on the performance of systems. When comparing systems trained with gold standard data to systems trained with more data containing a low level of errors, the systems" @default.
- W162629228 created "2016-06-24" @default.
- W162629228 creator A5049837150 @default.
- W162629228 date "2014-01-01" @default.
- W162629228 modified "2023-09-24" @default.
- W162629228 title "Efficient development of human language technology resources for resource-scarce languages" @default.
- W162629228 cites W103425580 @default.
- W162629228 cites W10477584 @default.
- W162629228 cites W104891186 @default.
- W162629228 cites W109203355 @default.
- W162629228 cites W110723894 @default.
- W162629228 cites W130097250 @default.
- W162629228 cites W130690827 @default.
- W162629228 cites W148348866 @default.
- W162629228 cites W1485955713 @default.
- W162629228 cites W1495975772 @default.
- W162629228 cites W1502599399 @default.
- W162629228 cites W1529524735 @default.
- W162629228 cites W1531318651 @default.
- W162629228 cites W1531393276 @default.
- W162629228 cites W1545419611 @default.
- W162629228 cites W1561665100 @default.
- W162629228 cites W1577841485 @default.
- W162629228 cites W1590411898 @default.
- W162629228 cites W1599116104 @default.
- W162629228 cites W1602773505 @default.
- W162629228 cites W1604134566 @default.
- W162629228 cites W164440181 @default.
- W162629228 cites W166137614 @default.
- W162629228 cites W1666146316 @default.
- W162629228 cites W1724528691 @default.
- W162629228 cites W1785101966 @default.
- W162629228 cites W1832601430 @default.
- W162629228 cites W1875231349 @default.
- W162629228 cites W1898031563 @default.
- W162629228 cites W1898139072 @default.
- W162629228 cites W1901790381 @default.
- W162629228 cites W1905966190 @default.
- W162629228 cites W19134732 @default.
- W162629228 cites W191422183 @default.
- W162629228 cites W1925341558 @default.
- W162629228 cites W19634895 @default.
- W162629228 cites W1970381522 @default.
- W162629228 cites W1972873825 @default.
- W162629228 cites W1973200191 @default.
- W162629228 cites W1973972371 @default.
- W162629228 cites W1985565920 @default.
- W162629228 cites W1990399826 @default.
- W162629228 cites W1994550352 @default.
- W162629228 cites W2003274120 @default.
- W162629228 cites W2010772122 @default.
- W162629228 cites W2024853524 @default.
- W162629228 cites W2028325794 @default.
- W162629228 cites W2034841618 @default.
- W162629228 cites W2039087642 @default.
- W162629228 cites W2040298461 @default.
- W162629228 cites W2043502271 @default.
- W162629228 cites W2043963840 @default.
- W162629228 cites W2044138293 @default.
- W162629228 cites W2049151043 @default.
- W162629228 cites W205373223 @default.
- W162629228 cites W2062256091 @default.
- W162629228 cites W2066154464 @default.
- W162629228 cites W2067098334 @default.
- W162629228 cites W2069485562 @default.
- W162629228 cites W2072271845 @default.
- W162629228 cites W2096335861 @default.
- W162629228 cites W2099170998 @default.
- W162629228 cites W2104225263 @default.
- W162629228 cites W2107031757 @default.
- W162629228 cites W2109439962 @default.
- W162629228 cites W2110764733 @default.
- W162629228 cites W2114269021 @default.
- W162629228 cites W2115199072 @default.
- W162629228 cites W2115915304 @default.
- W162629228 cites W2125943921 @default.
- W162629228 cites W2130155538 @default.
- W162629228 cites W2132232498 @default.
- W162629228 cites W2132903577 @default.
- W162629228 cites W2134899753 @default.
- W162629228 cites W2137228101 @default.
- W162629228 cites W2138445383 @default.
- W162629228 cites W2141573835 @default.
- W162629228 cites W2142279020 @default.
- W162629228 cites W2143539737 @default.
- W162629228 cites W2143562645 @default.
- W162629228 cites W2145111356 @default.
- W162629228 cites W2149816711 @default.
- W162629228 cites W2151739402 @default.
- W162629228 cites W2154624463 @default.
- W162629228 cites W2158847908 @default.
- W162629228 cites W2158880898 @default.
- W162629228 cites W2160322327 @default.
- W162629228 cites W2161964681 @default.
- W162629228 cites W2162545477 @default.
- W162629228 cites W2163037194 @default.
- W162629228 cites W2168842392 @default.
- W162629228 cites W2169463693 @default.
- W162629228 cites W2185195896 @default.
- W162629228 cites W2250734863 @default.