Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380714532> ?p ?o ?g. }
Showing items 1 to 51 of 51, with 100 items per page.
- W4380714532 abstract "When using adversarial training, it is common practice to train against the most egregious failures. However, this might imply using examples with sensitive information (such as leaked passwords or security vulnerabilities) as training data. One might assume that language models trained with gradient descent never generate text snippets which were only present in examples associated with the lowest possible reward. In this paper, we show that this assumption is wrong: in some situations, large language models do learn from such negatively-reinforced examples. We present a specific training setup that enables Pythia-160M to guess passwords 13% more often than it would by guessing randomly, despite only showing it these passwords on examples where the model is incentivized to not output these passwords. Our code is available at www.github.com/FabienRoger/Learning-From-Negative-Examples" @default.
- W4380714532 created "2023-06-15" @default.
- W4380714532 creator A5028042353 @default.
- W4380714532 date "2023-06-13" @default.
- W4380714532 modified "2023-10-16" @default.
- W4380714532 title "Large Language Models Sometimes Generate Purely Negatively-Reinforced Text" @default.
- W4380714532 doi "https://doi.org/10.48550/arxiv.2306.07567" @default.
- W4380714532 hasPublicationYear "2023" @default.
- W4380714532 type Work @default.
- W4380714532 citedByCount "0" @default.
- W4380714532 crossrefType "posted-content" @default.
- W4380714532 hasAuthorship W4380714532A5028042353 @default.
- W4380714532 hasBestOaLocation W43807145321 @default.
- W4380714532 hasConcept C109297577 @default.
- W4380714532 hasConcept C119857082 @default.
- W4380714532 hasConcept C137293760 @default.
- W4380714532 hasConcept C154945302 @default.
- W4380714532 hasConcept C177264268 @default.
- W4380714532 hasConcept C199360897 @default.
- W4380714532 hasConcept C204321447 @default.
- W4380714532 hasConcept C2776760102 @default.
- W4380714532 hasConcept C37736160 @default.
- W4380714532 hasConcept C38652104 @default.
- W4380714532 hasConcept C41008148 @default.
- W4380714532 hasConceptScore W4380714532C109297577 @default.
- W4380714532 hasConceptScore W4380714532C119857082 @default.
- W4380714532 hasConceptScore W4380714532C137293760 @default.
- W4380714532 hasConceptScore W4380714532C154945302 @default.
- W4380714532 hasConceptScore W4380714532C177264268 @default.
- W4380714532 hasConceptScore W4380714532C199360897 @default.
- W4380714532 hasConceptScore W4380714532C204321447 @default.
- W4380714532 hasConceptScore W4380714532C2776760102 @default.
- W4380714532 hasConceptScore W4380714532C37736160 @default.
- W4380714532 hasConceptScore W4380714532C38652104 @default.
- W4380714532 hasConceptScore W4380714532C41008148 @default.
- W4380714532 hasLocation W43807145321 @default.
- W4380714532 hasOpenAccess W4380714532 @default.
- W4380714532 hasPrimaryLocation W43807145321 @default.
- W4380714532 hasRelatedWork W142374489 @default.
- W4380714532 hasRelatedWork W1803932089 @default.
- W4380714532 hasRelatedWork W1985007624 @default.
- W4380714532 hasRelatedWork W2176369193 @default.
- W4380714532 hasRelatedWork W2359001871 @default.
- W4380714532 hasRelatedWork W2972280650 @default.
- W4380714532 hasRelatedWork W3107474891 @default.
- W4380714532 hasRelatedWork W4281395811 @default.
- W4380714532 hasRelatedWork W4379255972 @default.
- W4380714532 hasRelatedWork W2584532118 @default.
- W4380714532 isParatext "false" @default.
- W4380714532 isRetracted "false" @default.
- W4380714532 workType "article" @default.