Matches in SemOpenAlex for { <https://semopenalex.org/work/W4383989076> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4383989076 abstract "The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have only used offline frameworks, limiting their exploration of new sample spaces. Additionally, current approaches that utilize unit test signals are rather simple, not accounting for specific error locations within the code. To address these issues, we proposed RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code can be found at: https://github.com/Zyq-scut/RLTF." @default.
- W4383989076 created "2023-07-12" @default.
- W4383989076 creator A5013642582 @default.
- W4383989076 creator A5017615143 @default.
- W4383989076 creator A5023152887 @default.
- W4383989076 creator A5037539383 @default.
- W4383989076 creator A5054067619 @default.
- W4383989076 creator A5071548562 @default.
- W4383989076 creator A5073681676 @default.
- W4383989076 date "2023-07-10" @default.
- W4383989076 modified "2023-09-23" @default.
- W4383989076 title "RLTF: Reinforcement Learning from Unit Test Feedback" @default.
- W4383989076 doi "https://doi.org/10.48550/arxiv.2307.04349" @default.
- W4383989076 hasPublicationYear "2023" @default.
- W4383989076 type Work @default.
- W4383989076 citedByCount "0" @default.
- W4383989076 crossrefType "posted-content" @default.
- W4383989076 hasAuthorship W4383989076A5013642582 @default.
- W4383989076 hasAuthorship W4383989076A5017615143 @default.
- W4383989076 hasAuthorship W4383989076A5023152887 @default.
- W4383989076 hasAuthorship W4383989076A5037539383 @default.
- W4383989076 hasAuthorship W4383989076A5054067619 @default.
- W4383989076 hasAuthorship W4383989076A5071548562 @default.
- W4383989076 hasAuthorship W4383989076A5073681676 @default.
- W4383989076 hasBestOaLocation W43839890761 @default.
- W4383989076 hasConcept C111472728 @default.
- W4383989076 hasConcept C119857082 @default.
- W4383989076 hasConcept C122637931 @default.
- W4383989076 hasConcept C138885662 @default.
- W4383989076 hasConcept C145420912 @default.
- W4383989076 hasConcept C148027188 @default.
- W4383989076 hasConcept C154945302 @default.
- W4383989076 hasConcept C160145156 @default.
- W4383989076 hasConcept C177264268 @default.
- W4383989076 hasConcept C177774035 @default.
- W4383989076 hasConcept C199360897 @default.
- W4383989076 hasConcept C2776760102 @default.
- W4383989076 hasConcept C2777904410 @default.
- W4383989076 hasConcept C2779530757 @default.
- W4383989076 hasConcept C33923547 @default.
- W4383989076 hasConcept C41008148 @default.
- W4383989076 hasConcept C97541855 @default.
- W4383989076 hasConceptScore W4383989076C111472728 @default.
- W4383989076 hasConceptScore W4383989076C119857082 @default.
- W4383989076 hasConceptScore W4383989076C122637931 @default.
- W4383989076 hasConceptScore W4383989076C138885662 @default.
- W4383989076 hasConceptScore W4383989076C145420912 @default.
- W4383989076 hasConceptScore W4383989076C148027188 @default.
- W4383989076 hasConceptScore W4383989076C154945302 @default.
- W4383989076 hasConceptScore W4383989076C160145156 @default.
- W4383989076 hasConceptScore W4383989076C177264268 @default.
- W4383989076 hasConceptScore W4383989076C177774035 @default.
- W4383989076 hasConceptScore W4383989076C199360897 @default.
- W4383989076 hasConceptScore W4383989076C2776760102 @default.
- W4383989076 hasConceptScore W4383989076C2777904410 @default.
- W4383989076 hasConceptScore W4383989076C2779530757 @default.
- W4383989076 hasConceptScore W4383989076C33923547 @default.
- W4383989076 hasConceptScore W4383989076C41008148 @default.
- W4383989076 hasConceptScore W4383989076C97541855 @default.
- W4383989076 hasLocation W43839890761 @default.
- W4383989076 hasOpenAccess W4383989076 @default.
- W4383989076 hasPrimaryLocation W43839890761 @default.
- W4383989076 hasRelatedWork W1594844924 @default.
- W4383989076 hasRelatedWork W1969324738 @default.
- W4383989076 hasRelatedWork W2063468672 @default.
- W4383989076 hasRelatedWork W2909382770 @default.
- W4383989076 hasRelatedWork W3022038857 @default.
- W4383989076 hasRelatedWork W3214609260 @default.
- W4383989076 hasRelatedWork W4285742602 @default.
- W4383989076 hasRelatedWork W4319083788 @default.
- W4383989076 hasRelatedWork W2583313884 @default.
- W4383989076 hasRelatedWork W3089673996 @default.
- W4383989076 isParatext "false" @default.
- W4383989076 isRetracted "false" @default.
- W4383989076 workType "article" @default.