Matches in SemOpenAlex for { <https://semopenalex.org/work/W3085121432> ?p ?o ?g. }
- W3085121432 abstract "In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of different objectives in the reward signal, or Lagrangian methods, including constraints in the loss function, have no guarantees that the agent satisfies the constraints at all points in time and can lead to undesired behavior. When a discrete policy is extracted from an action-value function, safe actions can be ensured by restricting the action space at maximization, but can lead to sub-optimal solutions among feasible alternatives. In this work, we propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy. In addition to single-step constraints referring only to the next action, we introduce a formulation for approximate multi-step constraints under the current target policy based on truncated value-functions. We analyze the advantages of Constrained Q-learning in the tabular case and compare Constrained DQN to reward shaping and Lagrangian methods in the application of high-level decision making in autonomous driving, considering constraints for safety, keeping right and comfort. We train our agent in the open-source simulator SUMO and on the real HighD data set." @default.
- W3085121432 created "2020-09-21" @default.
- W3085121432 creator A5027250190 @default.
- W3085121432 creator A5033350585 @default.
- W3085121432 creator A5038908529 @default.
- W3085121432 creator A5072387211 @default.
- W3085121432 date "2020-03-20" @default.
- W3085121432 modified "2023-09-26" @default.
- W3085121432 title "Deep Constrained Q-learning" @default.
- W3085121432 cites W1518931405 @default.
- W3085121432 cites W1965455100 @default.
- W3085121432 cites W2002305926 @default.
- W3085121432 cites W2070570138 @default.
- W3085121432 cites W2073314543 @default.
- W3085121432 cites W2117626647 @default.
- W3085121432 cites W2119567691 @default.
- W3085121432 cites W2145339207 @default.
- W3085121432 cites W2158782408 @default.
- W3085121432 cites W2257979135 @default.
- W3085121432 cites W2787196707 @default.
- W3085121432 cites W2798766386 @default.
- W3085121432 cites W2896642734 @default.
- W3085121432 cites W2904263972 @default.
- W3085121432 cites W2910781732 @default.
- W3085121432 cites W2962803570 @default.
- W3085121432 cites W2963430173 @default.
- W3085121432 cites W2963704132 @default.
- W3085121432 cites W2963758131 @default.
- W3085121432 cites W2964161785 @default.
- W3085121432 cites W2964222567 @default.
- W3085121432 cites W2966735560 @default.
- W3085121432 cites W2967193622 @default.
- W3085121432 cites W2968185997 @default.
- W3085121432 cites W2976035615 @default.
- W3085121432 cites W3005329984 @default.
- W3085121432 cites W3049143319 @default.
- W3085121432 cites W3091528360 @default.
- W3085121432 cites W3102797050 @default.
- W3085121432 cites W3103780890 @default.
- W3085121432 hasPublicationYear "2020" @default.
- W3085121432 type Work @default.
- W3085121432 sameAs 3085121432 @default.
- W3085121432 citedByCount "0" @default.
- W3085121432 crossrefType "posted-content" @default.
- W3085121432 hasAuthorship W3085121432A5027250190 @default.
- W3085121432 hasAuthorship W3085121432A5033350585 @default.
- W3085121432 hasAuthorship W3085121432A5038908529 @default.
- W3085121432 hasAuthorship W3085121432A5072387211 @default.
- W3085121432 hasConcept C111919701 @default.
- W3085121432 hasConcept C119857082 @default.
- W3085121432 hasConcept C121332964 @default.
- W3085121432 hasConcept C126255220 @default.
- W3085121432 hasConcept C14036430 @default.
- W3085121432 hasConcept C14646407 @default.
- W3085121432 hasConcept C154945302 @default.
- W3085121432 hasConcept C177264268 @default.
- W3085121432 hasConcept C188116033 @default.
- W3085121432 hasConcept C199360897 @default.
- W3085121432 hasConcept C2776291640 @default.
- W3085121432 hasConcept C2776330181 @default.
- W3085121432 hasConcept C2778572836 @default.
- W3085121432 hasConcept C2780791683 @default.
- W3085121432 hasConcept C33923547 @default.
- W3085121432 hasConcept C41008148 @default.
- W3085121432 hasConcept C62520636 @default.
- W3085121432 hasConcept C78458016 @default.
- W3085121432 hasConcept C86803240 @default.
- W3085121432 hasConcept C97541855 @default.
- W3085121432 hasConceptScore W3085121432C111919701 @default.
- W3085121432 hasConceptScore W3085121432C119857082 @default.
- W3085121432 hasConceptScore W3085121432C121332964 @default.
- W3085121432 hasConceptScore W3085121432C126255220 @default.
- W3085121432 hasConceptScore W3085121432C14036430 @default.
- W3085121432 hasConceptScore W3085121432C14646407 @default.
- W3085121432 hasConceptScore W3085121432C154945302 @default.
- W3085121432 hasConceptScore W3085121432C177264268 @default.
- W3085121432 hasConceptScore W3085121432C188116033 @default.
- W3085121432 hasConceptScore W3085121432C199360897 @default.
- W3085121432 hasConceptScore W3085121432C2776291640 @default.
- W3085121432 hasConceptScore W3085121432C2776330181 @default.
- W3085121432 hasConceptScore W3085121432C2778572836 @default.
- W3085121432 hasConceptScore W3085121432C2780791683 @default.
- W3085121432 hasConceptScore W3085121432C33923547 @default.
- W3085121432 hasConceptScore W3085121432C41008148 @default.
- W3085121432 hasConceptScore W3085121432C62520636 @default.
- W3085121432 hasConceptScore W3085121432C78458016 @default.
- W3085121432 hasConceptScore W3085121432C86803240 @default.
- W3085121432 hasConceptScore W3085121432C97541855 @default.
- W3085121432 hasLocation W30851214321 @default.
- W3085121432 hasOpenAccess W3085121432 @default.
- W3085121432 hasPrimaryLocation W30851214321 @default.
- W3085121432 hasRelatedWork W1545422271 @default.
- W3085121432 hasRelatedWork W22619879 @default.
- W3085121432 hasRelatedWork W2565610523 @default.
- W3085121432 hasRelatedWork W2798588334 @default.
- W3085121432 hasRelatedWork W2809317103 @default.
- W3085121432 hasRelatedWork W2889634868 @default.
- W3085121432 hasRelatedWork W2921611804 @default.
- W3085121432 hasRelatedWork W2972586385 @default.
- W3085121432 hasRelatedWork W2985692297 @default.
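
The abstract above contrasts restricting the action space only at maximization with restricting it directly in the Q-update. The following is a minimal tabular sketch of the second idea for single-step constraints, not the authors' implementation. The names `constrained_q_update`, `greedy_safe_action` and `safe_mask`, the boolean-mask representation of constraints, and the choice to bootstrap with 0 when a state has no safe action are all assumptions made for illustration. The multi-step constraints based on truncated value-functions are not covered.

```python
import numpy as np


def constrained_q_update(Q, s, a, r, s_next, safe_mask,
                         alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-update whose bootstrap max ranges over safe actions only.

    Q         : float array of shape (n_states, n_actions), modified in place
    safe_mask : bool array of the same shape; safe_mask[s, a] is True when
                action a is allowed in state s (hypothetical representation)
    """
    allowed = safe_mask[s_next]
    if done or not allowed.any():
        # Assumption: terminal states and states without any safe action
        # contribute no bootstrapped value.
        bootstrap = 0.0
    else:
        # The max is taken over the safe action set of s_next, not all actions.
        bootstrap = np.max(Q[s_next][allowed])
    target = r + gamma * bootstrap
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def greedy_safe_action(Q, s, safe_mask):
    """Return the highest-valued action among those marked safe in state s."""
    if not safe_mask[s].any():
        raise ValueError(f"no safe action available in state {s}")
    masked = np.where(safe_mask[s], Q[s], -np.inf)
    return int(np.argmax(masked))


# Two states, two actions; action 0 is unsafe in state 1.
Q = np.zeros((2, 2))
Q[1] = [5.0, 1.0]
safe_mask = np.array([[True, True], [False, True]])

constrained_q_update(Q, s=0, a=0, r=1.0, s_next=1, safe_mask=safe_mask,
                     alpha=0.5, gamma=1.0)
```

In this toy setup the unsafe action in state 1 has the larger value (5.0), but the update bootstraps from the safe action's value (1.0). An unconstrained update followed by masking only at action selection would instead propagate the value of an action the agent is never allowed to take.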