Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387156632> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4387156632 abstract "Inference-time search algorithms such as Monte-Carlo Tree Search (MCTS) may seem unnecessary when generating natural language text based on state-of-the-art reinforcement learning such as Proximal Policy Optimization (PPO). In this paper, we demonstrate that it is possible to get extra mileage out of PPO by integrating MCTS on top. The key idea is not to throw out the value network, a byproduct of PPO training for evaluating partial output sequences, when decoding text out of the policy network. More concretely, we present a novel value-guided decoding algorithm called PPO-MCTS, which can integrate the value network from PPO to work closely with the policy network during inference-time generation. Compared to prior approaches based on MCTS for controlled text generation, the key strength of our approach is to reduce the fundamental mismatch of the scoring mechanisms of the partial outputs between training and test. Evaluation on four text generation tasks demonstrates that PPO-MCTS greatly improves the preferability of generated text compared to the standard practice of using only the PPO policy. Our results demonstrate the promise of search algorithms even on top of the aligned language models from PPO, and the under-explored benefit of the value network." @default.
- W4387156632 created "2023-09-30" @default.
- W4387156632 creator A5028127962 @default.
- W4387156632 creator A5030468199 @default.
- W4387156632 creator A5045464993 @default.
- W4387156632 creator A5049167382 @default.
- W4387156632 creator A5075564427 @default.
- W4387156632 creator A5082305994 @default.
- W4387156632 date "2023-09-26" @default.
- W4387156632 modified "2023-09-30" @default.
- W4387156632 title "Making PPO even better: Value-Guided Monte-Carlo Tree Search decoding" @default.
- W4387156632 doi "https://doi.org/10.48550/arxiv.2309.15028" @default.
- W4387156632 hasPublicationYear "2023" @default.
- W4387156632 type Work @default.
- W4387156632 citedByCount "0" @default.
- W4387156632 crossrefType "posted-content" @default.
- W4387156632 hasAuthorship W4387156632A5028127962 @default.
- W4387156632 hasAuthorship W4387156632A5030468199 @default.
- W4387156632 hasAuthorship W4387156632A5045464993 @default.
- W4387156632 hasAuthorship W4387156632A5049167382 @default.
- W4387156632 hasAuthorship W4387156632A5075564427 @default.
- W4387156632 hasAuthorship W4387156632A5082305994 @default.
- W4387156632 hasBestOaLocation W43871566321 @default.
- W4387156632 hasConcept C105795698 @default.
- W4387156632 hasConcept C113174947 @default.
- W4387156632 hasConcept C11413529 @default.
- W4387156632 hasConcept C119857082 @default.
- W4387156632 hasConcept C125583679 @default.
- W4387156632 hasConcept C134306372 @default.
- W4387156632 hasConcept C154945302 @default.
- W4387156632 hasConcept C19499675 @default.
- W4387156632 hasConcept C26517878 @default.
- W4387156632 hasConcept C2776214188 @default.
- W4387156632 hasConcept C2776291640 @default.
- W4387156632 hasConcept C33923547 @default.
- W4387156632 hasConcept C38652104 @default.
- W4387156632 hasConcept C41008148 @default.
- W4387156632 hasConcept C46149586 @default.
- W4387156632 hasConcept C57273362 @default.
- W4387156632 hasConcept C97541855 @default.
- W4387156632 hasConceptScore W4387156632C105795698 @default.
- W4387156632 hasConceptScore W4387156632C113174947 @default.
- W4387156632 hasConceptScore W4387156632C11413529 @default.
- W4387156632 hasConceptScore W4387156632C119857082 @default.
- W4387156632 hasConceptScore W4387156632C125583679 @default.
- W4387156632 hasConceptScore W4387156632C134306372 @default.
- W4387156632 hasConceptScore W4387156632C154945302 @default.
- W4387156632 hasConceptScore W4387156632C19499675 @default.
- W4387156632 hasConceptScore W4387156632C26517878 @default.
- W4387156632 hasConceptScore W4387156632C2776214188 @default.
- W4387156632 hasConceptScore W4387156632C2776291640 @default.
- W4387156632 hasConceptScore W4387156632C33923547 @default.
- W4387156632 hasConceptScore W4387156632C38652104 @default.
- W4387156632 hasConceptScore W4387156632C41008148 @default.
- W4387156632 hasConceptScore W4387156632C46149586 @default.
- W4387156632 hasConceptScore W4387156632C57273362 @default.
- W4387156632 hasConceptScore W4387156632C97541855 @default.
- W4387156632 hasLocation W43871566321 @default.
- W4387156632 hasOpenAccess W4387156632 @default.
- W4387156632 hasPrimaryLocation W43871566321 @default.
- W4387156632 hasRelatedWork W1785631772 @default.
- W4387156632 hasRelatedWork W1980980984 @default.
- W4387156632 hasRelatedWork W1999772761 @default.
- W4387156632 hasRelatedWork W2045617684 @default.
- W4387156632 hasRelatedWork W2303512055 @default.
- W4387156632 hasRelatedWork W3171665292 @default.
- W4387156632 hasRelatedWork W4293057751 @default.
- W4387156632 hasRelatedWork W4319083788 @default.
- W4387156632 hasRelatedWork W4380791871 @default.
- W4387156632 hasRelatedWork W4381571188 @default.
- W4387156632 isParatext "false" @default.
- W4387156632 isRetracted "false" @default.
- W4387156632 workType "article" @default.