SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380136083> ?p ?o ?g. }

W4380136083 abstract "In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. The recent advancements in large language models (LLMs) can lower costs for CSS research by annotating documents cheaply at scale, but such surrogate labels are often imperfect and biased. We present a new algorithm for using outputs from LLMs for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of LLM-predicted surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80--90%. To address this, we build on debiased machine learning to propose the design-based semi-supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased, without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without statistical guarantees." @default.
W4380136083 created "2023-06-10" @default.
W4380136083 creator A5040129440 @default.
W4380136083 creator A5041445378 @default.
W4380136083 creator A5088390401 @default.
W4380136083 creator A5092130648 @default.
W4380136083 date "2023-06-07" @default.
W4380136083 modified "2023-09-25" @default.
W4380136083 title "Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning" @default.
W4380136083 doi "https://doi.org/10.48550/arxiv.2306.04746" @default.
W4380136083 hasPublicationYear "2023" @default.
W4380136083 type Work @default.
W4380136083 citedByCount "0" @default.
W4380136083 crossrefType "posted-content" @default.
W4380136083 hasAuthorship W4380136083A5040129440 @default.
W4380136083 hasAuthorship W4380136083A5041445378 @default.
W4380136083 hasAuthorship W4380136083A5088390401 @default.
W4380136083 hasAuthorship W4380136083A5092130648 @default.
W4380136083 hasBestOaLocation W43801360831 @default.
W4380136083 hasConcept C105795698 @default.
W4380136083 hasConcept C114289077 @default.
W4380136083 hasConcept C119857082 @default.
W4380136083 hasConcept C124101348 @default.
W4380136083 hasConcept C131675550 @default.
W4380136083 hasConcept C134261354 @default.
W4380136083 hasConcept C154945302 @default.
W4380136083 hasConcept C162324750 @default.
W4380136083 hasConcept C185429906 @default.
W4380136083 hasConcept C201374245 @default.
W4380136083 hasConcept C21547014 @default.
W4380136083 hasConcept C2776207758 @default.
W4380136083 hasConcept C2776214188 @default.
W4380136083 hasConcept C33923547 @default.
W4380136083 hasConcept C41008148 @default.
W4380136083 hasConcept C76155785 @default.
W4380136083 hasConceptScore W4380136083C105795698 @default.
W4380136083 hasConceptScore W4380136083C114289077 @default.
W4380136083 hasConceptScore W4380136083C119857082 @default.
W4380136083 hasConceptScore W4380136083C124101348 @default.
W4380136083 hasConceptScore W4380136083C131675550 @default.
W4380136083 hasConceptScore W4380136083C134261354 @default.
W4380136083 hasConceptScore W4380136083C154945302 @default.
W4380136083 hasConceptScore W4380136083C162324750 @default.
W4380136083 hasConceptScore W4380136083C185429906 @default.
W4380136083 hasConceptScore W4380136083C201374245 @default.
W4380136083 hasConceptScore W4380136083C21547014 @default.
W4380136083 hasConceptScore W4380136083C2776207758 @default.
W4380136083 hasConceptScore W4380136083C2776214188 @default.
W4380136083 hasConceptScore W4380136083C33923547 @default.
W4380136083 hasConceptScore W4380136083C41008148 @default.
W4380136083 hasConceptScore W4380136083C76155785 @default.
W4380136083 hasLocation W43801360831 @default.
W4380136083 hasOpenAccess W4380136083 @default.
W4380136083 hasPrimaryLocation W43801360831 @default.
W4380136083 hasRelatedWork W2949919985 @default.
W4380136083 hasRelatedWork W2963058055 @default.
W4380136083 hasRelatedWork W2981347089 @default.
W4380136083 hasRelatedWork W3001657888 @default.
W4380136083 hasRelatedWork W3159730769 @default.
W4380136083 hasRelatedWork W3183730129 @default.
W4380136083 hasRelatedWork W4236246625 @default.
W4380136083 hasRelatedWork W4287197217 @default.
W4380136083 hasRelatedWork W4323074878 @default.
W4380136083 hasRelatedWork W3123288520 @default.
W4380136083 isParatext "false" @default.
W4380136083 isRetracted "false" @default.
W4380136083 workType "article" @default.
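
The abstract recorded above describes combining LLM surrogate labels for all documents with gold-standard labels for a smaller subset that is sampled with a controlled probability. The sketch below illustrates that doubly robust idea for the simplest possible target, a mean, and is not the authors' DSL implementation. It assumes the raw surrogate label is used directly as the prediction (no fitted model or cross-fitting) and that every document is sampled with the same known probability. The names `design_based_mean` and `simulate`, and all simulated numbers, are hypothetical.

```python
import random


def design_based_mean(surrogate, gold, sampled, pi):
    """Average the pseudo-outcomes s_i + (r_i / pi) * (y_i - s_i).

    surrogate: LLM-predicted labels s_i, available for every document
    gold:      expert labels y_i (only read where sampled[i] is True)
    sampled:   r_i, True if document i was drawn for gold-standard labeling
    pi:        known probability with which each document was drawn
    """
    total = 0.0
    for s, y, r in zip(surrogate, gold, sampled):
        # The correction term is nonzero only for gold-labeled documents.
        correction = (y - s) / pi if r else 0.0
        total += s + correction
    return total / len(surrogate)


def simulate(n=20000, pi=0.1, seed=0):
    """Generate hypothetical data with a systematically biased surrogate."""
    rng = random.Random(seed)
    gold = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    # Some true 0s become 1s with probability 0.2, so the surrogate
    # overstates the mean while staying roughly 86% accurate in expectation.
    surrogate = [1 if (y == 1 or rng.random() < 0.2) else 0 for y in gold]
    sampled = [rng.random() < pi for _ in range(n)]
    return gold, surrogate, sampled


if __name__ == "__main__":
    gold, surrogate, sampled = simulate()
    print("true mean:     ", sum(gold) / len(gold))
    print("surrogate only:", sum(surrogate) / len(surrogate))
    print("design-based:  ", design_based_mean(surrogate, gold, sampled, pi=0.1))
```

Because the sampling probability is set by the researcher, the correction term has expectation equal to the surrogate's error, which is why the averaged pseudo-outcome stays centered on the true mean even when the surrogate is biased. The regression setting described in the abstract would apply the same kind of correction inside the estimating equations rather than to a simple mean.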