SemOpenAlex |

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385644214> ?p ?o ?g. }

Showing items 1 to 79 of 79 with 100 items per page.

W4385644214 abstract "Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents enable iterative refinement through verbal feedback, they do not reason and plan in ways that are compatible with gradient-based learning from rewards. This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model, which automatically tunes the language agent prompts from environment feedback through policy gradient. Specifically, our proposed agent architecture learns from rewards across multiple environments and tasks to fine-tune a pre-trained language model that refines the language agent prompt by summarizing the root cause of prior failed attempts and proposing action plans. Experimental results on various tasks demonstrate that the language agents improve over time and that our approach considerably outperforms baselines that do not properly leverage gradients from the environment. This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performance over time." @default.
W4385644214 created "2023-08-08" @default.
W4385644214 creator A5000111662 @default.
W4385644214 creator A5002591086 @default.
W4385644214 creator A5009988205 @default.
W4385644214 creator A5018518655 @default.
W4385644214 creator A5026898193 @default.
W4385644214 creator A5032046813 @default.
W4385644214 creator A5034197490 @default.
W4385644214 creator A5042646536 @default.
W4385644214 creator A5051953837 @default.
W4385644214 creator A5056643573 @default.
W4385644214 creator A5062531175 @default.
W4385644214 creator A5062870769 @default.
W4385644214 creator A5063103006 @default.
W4385644214 creator A5067027007 @default.
W4385644214 creator A5090818171 @default.
W4385644214 date "2023-08-04" @default.
W4385644214 modified "2023-09-24" @default.
W4385644214 title "Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization" @default.
W4385644214 doi "https://doi.org/10.48550/arxiv.2308.02151" @default.
W4385644214 hasPublicationYear "2023" @default.
W4385644214 type Work @default.
W4385644214 citedByCount "0" @default.
W4385644214 crossrefType "posted-content" @default.
W4385644214 hasAuthorship W4385644214A5000111662 @default.
W4385644214 hasAuthorship W4385644214A5002591086 @default.
W4385644214 hasAuthorship W4385644214A5009988205 @default.
W4385644214 hasAuthorship W4385644214A5018518655 @default.
W4385644214 hasAuthorship W4385644214A5026898193 @default.
W4385644214 hasAuthorship W4385644214A5032046813 @default.
W4385644214 hasAuthorship W4385644214A5034197490 @default.
W4385644214 hasAuthorship W4385644214A5042646536 @default.
W4385644214 hasAuthorship W4385644214A5051953837 @default.
W4385644214 hasAuthorship W4385644214A5056643573 @default.
W4385644214 hasAuthorship W4385644214A5062531175 @default.
W4385644214 hasAuthorship W4385644214A5062870769 @default.
W4385644214 hasAuthorship W4385644214A5063103006 @default.
W4385644214 hasAuthorship W4385644214A5067027007 @default.
W4385644214 hasAuthorship W4385644214A5090818171 @default.
W4385644214 hasBestOaLocation W43856442141 @default.
W4385644214 hasConcept C119857082 @default.
W4385644214 hasConcept C123657996 @default.
W4385644214 hasConcept C137293760 @default.
W4385644214 hasConcept C142362112 @default.
W4385644214 hasConcept C153083717 @default.
W4385644214 hasConcept C153349607 @default.
W4385644214 hasConcept C154945302 @default.
W4385644214 hasConcept C166957645 @default.
W4385644214 hasConcept C2776505523 @default.
W4385644214 hasConcept C41008148 @default.
W4385644214 hasConcept C95457728 @default.
W4385644214 hasConceptScore W4385644214C119857082 @default.
W4385644214 hasConceptScore W4385644214C123657996 @default.
W4385644214 hasConceptScore W4385644214C137293760 @default.
W4385644214 hasConceptScore W4385644214C142362112 @default.
W4385644214 hasConceptScore W4385644214C153083717 @default.
W4385644214 hasConceptScore W4385644214C153349607 @default.
W4385644214 hasConceptScore W4385644214C154945302 @default.
W4385644214 hasConceptScore W4385644214C166957645 @default.
W4385644214 hasConceptScore W4385644214C2776505523 @default.
W4385644214 hasConceptScore W4385644214C41008148 @default.
W4385644214 hasConceptScore W4385644214C95457728 @default.
W4385644214 hasLocation W43856442141 @default.
W4385644214 hasOpenAccess W4385644214 @default.
W4385644214 hasPrimaryLocation W43856442141 @default.
W4385644214 hasRelatedWork W1989705153 @default.
W4385644214 hasRelatedWork W2961085424 @default.
W4385644214 hasRelatedWork W3046775127 @default.
W4385644214 hasRelatedWork W3170094116 @default.
W4385644214 hasRelatedWork W4205958290 @default.
W4385644214 hasRelatedWork W4285260836 @default.
W4385644214 hasRelatedWork W4286629047 @default.
W4385644214 hasRelatedWork W4306321456 @default.
W4385644214 hasRelatedWork W4306674287 @default.
W4385644214 hasRelatedWork W4224009465 @default.
W4385644214 isParatext "false" @default.
W4385644214 isRetracted "false" @default.
W4385644214 workType "article" @default.
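The abstract names policy-gradient optimization as the mechanism by which the retrospective model is tuned from environment rewards. The sketch below illustrates only the generic REINFORCE estimator behind that idea; it is not the paper's implementation, which fine-tunes a pre-trained language model. Everything in it is a hypothetical assumption for illustration: a softmax policy over three fixed candidate reflections, made-up deterministic rewards, the running-mean baseline, and the values of `lr` and `steps`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over candidate reflections with REINFORCE.

    Hypothetical setup: rewards[i] is the environment reward obtained when
    the agent's prompt is refined with candidate reflection i.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy logits, one per candidate
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(theta)
        # Sample which reflection to use from the current policy.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # For a softmax policy: d/dtheta_j log pi(action) = 1[j == action] - pi_j.
        for j in range(len(theta)):
            grad_log_pi = (1.0 if j == action else 0.0) - probs[j]
            theta[j] += lr * advantage * grad_log_pi
        # Exponential moving average of observed rewards.
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)


if __name__ == "__main__":
    # Hypothetical rewards for three candidate reflections.
    final_probs = reinforce([0.1, 0.9, 0.3])
    print([round(p, 3) for p in final_probs])
```

Subtracting a baseline leaves the expected gradient unchanged but lowers its variance, so over many sampled episodes the policy shifts probability mass toward the reflection that earns the highest reward.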