Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386839907> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4386839907 abstract "Large language models have astounded the world with fascinating new capabilities. However, they currently lack the ability to teach themselves new skills, relying instead on being trained on large amounts of human-generated data. We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can successfully teach themselves new skills using chain-of-thought reasoning. Inspired by previous work in both reinforcement learning (Silver et al., 2017) and human cognition (Kahneman, 2011), SECToR first uses chain-of-thought reasoning to slowly think its way through problems. SECToR then fine-tunes the model to generate those same answers, this time without using chain-of-thought reasoning. Language models trained via SECToR autonomously learn to add up to 29-digit numbers without any access to any ground truth examples beyond an initial supervised fine-tuning phase consisting only of numbers with 6 or fewer digits. Our central hypothesis is that chain-of-thought reasoning can act as a policy improvement operator, analogously to how Monte-Carlo Tree Search is used in AlphaZero. We hope that this research can lead to new directions in which language models can learn to teach themselves without the need for human demonstrations." @default.
- W4386839907 created "2023-09-19" @default.
- W4386839907 creator A5047201942 @default.
- W4386839907 creator A5086173064 @default.
- W4386839907 date "2023-09-15" @default.
- W4386839907 modified "2023-09-27" @default.
- W4386839907 title "Chain-of-Thought Reasoning is a Policy Improvement Operator" @default.
- W4386839907 doi "https://doi.org/10.48550/arxiv.2309.08589" @default.
- W4386839907 hasPublicationYear "2023" @default.
- W4386839907 type Work @default.
- W4386839907 citedByCount "0" @default.
- W4386839907 crossrefType "posted-content" @default.
- W4386839907 hasAuthorship W4386839907A5047201942 @default.
- W4386839907 hasAuthorship W4386839907A5086173064 @default.
- W4386839907 hasBestOaLocation W43868399071 @default.
- W4386839907 hasConcept C104317684 @default.
- W4386839907 hasConcept C105795698 @default.
- W4386839907 hasConcept C121332964 @default.
- W4386839907 hasConcept C127413603 @default.
- W4386839907 hasConcept C1276947 @default.
- W4386839907 hasConcept C154945302 @default.
- W4386839907 hasConcept C15744967 @default.
- W4386839907 hasConcept C158448853 @default.
- W4386839907 hasConcept C169760540 @default.
- W4386839907 hasConcept C169900460 @default.
- W4386839907 hasConcept C17020691 @default.
- W4386839907 hasConcept C185592680 @default.
- W4386839907 hasConcept C18762648 @default.
- W4386839907 hasConcept C188147891 @default.
- W4386839907 hasConcept C19499675 @default.
- W4386839907 hasConcept C199185054 @default.
- W4386839907 hasConcept C33923547 @default.
- W4386839907 hasConcept C41008148 @default.
- W4386839907 hasConcept C46149586 @default.
- W4386839907 hasConcept C55493867 @default.
- W4386839907 hasConcept C78519656 @default.
- W4386839907 hasConcept C86339819 @default.
- W4386839907 hasConceptScore W4386839907C104317684 @default.
- W4386839907 hasConceptScore W4386839907C105795698 @default.
- W4386839907 hasConceptScore W4386839907C121332964 @default.
- W4386839907 hasConceptScore W4386839907C127413603 @default.
- W4386839907 hasConceptScore W4386839907C1276947 @default.
- W4386839907 hasConceptScore W4386839907C154945302 @default.
- W4386839907 hasConceptScore W4386839907C15744967 @default.
- W4386839907 hasConceptScore W4386839907C158448853 @default.
- W4386839907 hasConceptScore W4386839907C169760540 @default.
- W4386839907 hasConceptScore W4386839907C169900460 @default.
- W4386839907 hasConceptScore W4386839907C17020691 @default.
- W4386839907 hasConceptScore W4386839907C185592680 @default.
- W4386839907 hasConceptScore W4386839907C18762648 @default.
- W4386839907 hasConceptScore W4386839907C188147891 @default.
- W4386839907 hasConceptScore W4386839907C19499675 @default.
- W4386839907 hasConceptScore W4386839907C199185054 @default.
- W4386839907 hasConceptScore W4386839907C33923547 @default.
- W4386839907 hasConceptScore W4386839907C41008148 @default.
- W4386839907 hasConceptScore W4386839907C46149586 @default.
- W4386839907 hasConceptScore W4386839907C55493867 @default.
- W4386839907 hasConceptScore W4386839907C78519656 @default.
- W4386839907 hasConceptScore W4386839907C86339819 @default.
- W4386839907 hasLocation W43868399071 @default.
- W4386839907 hasOpenAccess W4386839907 @default.
- W4386839907 hasPrimaryLocation W43868399071 @default.
- W4386839907 hasRelatedWork W2075742431 @default.
- W4386839907 hasRelatedWork W2081348319 @default.
- W4386839907 hasRelatedWork W2157280774 @default.
- W4386839907 hasRelatedWork W2218506202 @default.
- W4386839907 hasRelatedWork W2770256266 @default.
- W4386839907 hasRelatedWork W2808901550 @default.
- W4386839907 hasRelatedWork W2900848205 @default.
- W4386839907 hasRelatedWork W2954459523 @default.
- W4386839907 hasRelatedWork W3021042235 @default.
- W4386839907 hasRelatedWork W597958479 @default.
- W4386839907 isParatext "false" @default.
- W4386839907 isRetracted "false" @default.
- W4386839907 workType "article" @default.