Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386269358> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4386269358 abstract "Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text. This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion. For example, by observing a partial stack of cubes, LLMs can predict the correct sequence in which the remaining cubes should be stacked by extrapolating the observed patterns (e.g., cube sizes, colors or other attributes) in the partial stack. In this work, we introduce LaGR (Language-Guided Reinforcement learning), which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent, in order to subsequently guide the latter's training. However, as RL training is generally not sample-efficient, deploying this approach would inherently imply that the LLM be repeatedly queried for solutions; a process that can be expensive and infeasible. To address this issue, we introduce SEQ (sample efficient querying), where we simultaneously train a secondary RL agent to decide when the LLM should be queried for solutions. Specifically, we use the quality of the solutions emanating from the LLM as the reward to train this agent. We show that our proposed framework LaGR-SEQ enables more efficient primary RL training, while simultaneously minimizing the number of queries to the LLM. We demonstrate our approach on a series of tasks and highlight the advantages of our approach, along with its limitations and potential future research directions." @default.
- W4386269358 created "2023-08-31" @default.
- W4386269358 creator A5011012522 @default.
- W4386269358 creator A5024215125 @default.
- W4386269358 creator A5043916805 @default.
- W4386269358 creator A5045540854 @default.
- W4386269358 creator A5078515449 @default.
- W4386269358 creator A5085471517 @default.
- W4386269358 creator A5090916970 @default.
- W4386269358 date "2023-08-20" @default.
- W4386269358 modified "2023-09-26" @default.
- W4386269358 title "LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying" @default.
- W4386269358 doi "https://doi.org/10.48550/arxiv.2308.13542" @default.
- W4386269358 hasPublicationYear "2023" @default.
- W4386269358 type Work @default.
- W4386269358 citedByCount "0" @default.
- W4386269358 crossrefType "posted-content" @default.
- W4386269358 hasAuthorship W4386269358A5011012522 @default.
- W4386269358 hasAuthorship W4386269358A5024215125 @default.
- W4386269358 hasAuthorship W4386269358A5043916805 @default.
- W4386269358 hasAuthorship W4386269358A5045540854 @default.
- W4386269358 hasAuthorship W4386269358A5078515449 @default.
- W4386269358 hasAuthorship W4386269358A5085471517 @default.
- W4386269358 hasAuthorship W4386269358A5090916970 @default.
- W4386269358 hasBestOaLocation W43862693581 @default.
- W4386269358 hasConcept C119857082 @default.
- W4386269358 hasConcept C151730666 @default.
- W4386269358 hasConcept C154945302 @default.
- W4386269358 hasConcept C185592680 @default.
- W4386269358 hasConcept C198531522 @default.
- W4386269358 hasConcept C199360897 @default.
- W4386269358 hasConcept C2778112365 @default.
- W4386269358 hasConcept C2779343474 @default.
- W4386269358 hasConcept C41008148 @default.
- W4386269358 hasConcept C43617362 @default.
- W4386269358 hasConcept C54355233 @default.
- W4386269358 hasConcept C86803240 @default.
- W4386269358 hasConcept C9395851 @default.
- W4386269358 hasConcept C97541855 @default.
- W4386269358 hasConcept C98045186 @default.
- W4386269358 hasConceptScore W4386269358C119857082 @default.
- W4386269358 hasConceptScore W4386269358C151730666 @default.
- W4386269358 hasConceptScore W4386269358C154945302 @default.
- W4386269358 hasConceptScore W4386269358C185592680 @default.
- W4386269358 hasConceptScore W4386269358C198531522 @default.
- W4386269358 hasConceptScore W4386269358C199360897 @default.
- W4386269358 hasConceptScore W4386269358C2778112365 @default.
- W4386269358 hasConceptScore W4386269358C2779343474 @default.
- W4386269358 hasConceptScore W4386269358C41008148 @default.
- W4386269358 hasConceptScore W4386269358C43617362 @default.
- W4386269358 hasConceptScore W4386269358C54355233 @default.
- W4386269358 hasConceptScore W4386269358C86803240 @default.
- W4386269358 hasConceptScore W4386269358C9395851 @default.
- W4386269358 hasConceptScore W4386269358C97541855 @default.
- W4386269358 hasConceptScore W4386269358C98045186 @default.
- W4386269358 hasLocation W43862693581 @default.
- W4386269358 hasOpenAccess W4386269358 @default.
- W4386269358 hasPrimaryLocation W43862693581 @default.
- W4386269358 hasRelatedWork W2348126836 @default.
- W4386269358 hasRelatedWork W260766989 @default.
- W4386269358 hasRelatedWork W2959276766 @default.
- W4386269358 hasRelatedWork W2961085424 @default.
- W4386269358 hasRelatedWork W3074294383 @default.
- W4386269358 hasRelatedWork W3139193008 @default.
- W4386269358 hasRelatedWork W4206669594 @default.
- W4386269358 hasRelatedWork W4295941380 @default.
- W4386269358 hasRelatedWork W4306674287 @default.
- W4386269358 hasRelatedWork W4319083788 @default.
- W4386269358 isParatext "false" @default.
- W4386269358 isRetracted "false" @default.
- W4386269358 workType "article" @default.
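
Each row in the listing above has the same shape: a subject, a predicate, an object (a quoted literal or a bare identifier), and a graph name such as `@default`. As a minimal sketch, assuming that shape holds for every row, the rows could be grouped by predicate in Python as follows. The names `ROW` and `parse_rows` are hypothetical helpers introduced for illustration and are not part of SemOpenAlex.

```python
import re
from collections import defaultdict

# Assumed row shape: "- <subject> <predicate> <object> @<graph>."
ROW = re.compile(r'^- (\S+) (\S+) (.+) @(\w+)\.$')


def parse_rows(lines):
    """Group objects by predicate; quoted literals lose their outer quotes."""
    grouped = defaultdict(list)
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip page headers or anything that is not a row
        _subject, predicate, obj, _graph = match.groups()
        # Strip the surrounding quotes from literal values such as dates.
        if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
            obj = obj[1:-1]
        grouped[predicate].append(obj)
    return dict(grouped)


# A few rows copied from the listing above.
sample = [
    '- W4386269358 created "2023-08-31" @default.',
    '- W4386269358 creator A5011012522 @default.',
    '- W4386269358 creator A5024215125 @default.',
    '- W4386269358 type Work @default.',
]
rows = parse_rows(sample)
print(rows["creator"])
```

Grouping by predicate keeps repeated properties such as `creator` or `hasConcept` together as lists, while single-valued properties such as `created` become one-element lists.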