Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287118006> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4287118006 abstract "Recent development of Deep Reinforcement Learning (DRL) has demonstrated superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error (MSBE) function. Despite the great successes of DRL, the development of reliable and efficient numerical algorithms to minimise the MSBE is still of great scientific interest and practical demand. This challenge is partly due to the underlying optimisation problem being highly non-convex, or to the use of incomplete gradient information, as in Semi-Gradient algorithms. In this work, we analyse the MSBE from a smooth optimisation perspective and develop an efficient Approximate Newton's algorithm. First, we conduct a critical point analysis of the error function and provide technical insights on optimisation and design choices for neural networks. When the existence of global minima is assumed and the objective fulfils certain conditions, suboptimal local minima can be avoided when using over-parametrised neural networks. Based on this analysis, we construct a Gauss Newton Residual Gradient algorithm in two variations. The first variation applies to discrete state spaces and exact learning. We numerically confirm theoretical properties of this algorithm, such as local quadratic convergence to a global minimum. The second employs sampling and can be used in the continuous setting. We demonstrate the feasibility and generalisation capabilities of the proposed algorithm empirically using continuous control problems and provide a numerical verification of our critical point analysis. We outline the difficulties of combining Semi-Gradient approaches with Hessian information. To benefit from second-order information, complete derivatives of the MSBE must be considered during training." @default.
- W4287118006 created "2022-07-25" @default.
- W4287118006 creator A5020980906 @default.
- W4287118006 creator A5065309879 @default.
- W4287118006 creator A5067323219 @default.
- W4287118006 creator A5074346065 @default.
- W4287118006 date "2021-06-16" @default.
- W4287118006 modified "2023-09-23" @default.
- W4287118006 title "Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation" @default.
- W4287118006 doi "https://doi.org/10.48550/arxiv.2106.08774" @default.
- W4287118006 hasPublicationYear "2021" @default.
- W4287118006 type Work @default.
- W4287118006 citedByCount "0" @default.
- W4287118006 crossrefType "posted-content" @default.
- W4287118006 hasAuthorship W4287118006A5020980906 @default.
- W4287118006 hasAuthorship W4287118006A5065309879 @default.
- W4287118006 hasAuthorship W4287118006A5067323219 @default.
- W4287118006 hasAuthorship W4287118006A5074346065 @default.
- W4287118006 hasBestOaLocation W42871180061 @default.
- W4287118006 hasConcept C11413529 @default.
- W4287118006 hasConcept C115680565 @default.
- W4287118006 hasConcept C126255220 @default.
- W4287118006 hasConcept C134306372 @default.
- W4287118006 hasConcept C14036430 @default.
- W4287118006 hasConcept C153258448 @default.
- W4287118006 hasConcept C154945302 @default.
- W4287118006 hasConcept C155512373 @default.
- W4287118006 hasConcept C186633575 @default.
- W4287118006 hasConcept C33923547 @default.
- W4287118006 hasConcept C41008148 @default.
- W4287118006 hasConcept C50644808 @default.
- W4287118006 hasConcept C78458016 @default.
- W4287118006 hasConcept C86803240 @default.
- W4287118006 hasConcept C91873725 @default.
- W4287118006 hasConcept C97541855 @default.
- W4287118006 hasConceptScore W4287118006C11413529 @default.
- W4287118006 hasConceptScore W4287118006C115680565 @default.
- W4287118006 hasConceptScore W4287118006C126255220 @default.
- W4287118006 hasConceptScore W4287118006C134306372 @default.
- W4287118006 hasConceptScore W4287118006C14036430 @default.
- W4287118006 hasConceptScore W4287118006C153258448 @default.
- W4287118006 hasConceptScore W4287118006C154945302 @default.
- W4287118006 hasConceptScore W4287118006C155512373 @default.
- W4287118006 hasConceptScore W4287118006C186633575 @default.
- W4287118006 hasConceptScore W4287118006C33923547 @default.
- W4287118006 hasConceptScore W4287118006C41008148 @default.
- W4287118006 hasConceptScore W4287118006C50644808 @default.
- W4287118006 hasConceptScore W4287118006C78458016 @default.
- W4287118006 hasConceptScore W4287118006C86803240 @default.
- W4287118006 hasConceptScore W4287118006C91873725 @default.
- W4287118006 hasConceptScore W4287118006C97541855 @default.
- W4287118006 hasLocation W42871180061 @default.
- W4287118006 hasOpenAccess W4287118006 @default.
- W4287118006 hasPrimaryLocation W42871180061 @default.
- W4287118006 hasRelatedWork W1508245070 @default.
- W4287118006 hasRelatedWork W1530191702 @default.
- W4287118006 hasRelatedWork W1646707810 @default.
- W4287118006 hasRelatedWork W1999466943 @default.
- W4287118006 hasRelatedWork W2018500972 @default.
- W4287118006 hasRelatedWork W2113517874 @default.
- W4287118006 hasRelatedWork W2373152179 @default.
- W4287118006 hasRelatedWork W2391119003 @default.
- W4287118006 hasRelatedWork W3184324323 @default.
- W4287118006 hasRelatedWork W4287118006 @default.
- W4287118006 isParatext "false" @default.
- W4287118006 isRetracted "false" @default.
- W4287118006 workType "article" @default.
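The abstract describes a Gauss Newton Residual Gradient method for the discrete, exact-learning case. As a rough illustration only, the Python/NumPy sketch below minimises a uniformly weighted MSBE, 0.5·Σ_s δ(s)² with δ = r + γPV − V, for a fixed policy on a small hypothetical Markov chain, using a tiny over-parametrised tanh network (24 parameters for 5 states) as the value function. The transition matrix `P`, rewards `r`, network shape, uniform state weighting, and the adaptive damping (a Levenberg-Marquardt-style safeguard added here) are all assumptions, not details taken from the paper; this is a sketch of the general technique, not the authors' implementation.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a 5-state Markov chain under a
# fixed policy, with a tiny one-hidden-layer tanh network as V_theta.
N_STATES, HIDDEN, GAMMA = 5, 8, 0.9


def value_and_jacobian(theta, x):
    """Return V_theta(s) for every state and dV/dtheta (n_states x n_params)."""
    a, w, b = np.split(theta, 3)                        # output weights, input weights, biases
    h = np.tanh(np.outer(x, w) + b)                     # hidden activations, (n_states, HIDDEN)
    v = h @ a                                           # network output per state
    dh = 1.0 - h ** 2                                   # tanh'(z)
    jac = np.hstack([h, dh * a * x[:, None], dh * a])   # [dV/da, dV/dw, dV/db]
    return v, jac


def bellman_residual(theta, x, P, r, gamma):
    """Bellman residual delta = r + gamma*P@V - V and its full Jacobian."""
    v, jv = value_and_jacobian(theta, x)
    delta = r + gamma * P @ v - v
    # Residual-gradient Jacobian: the target gamma*P@V is differentiated too.
    # A semi-gradient method would keep only the -jv term.
    jd = gamma * P @ jv - jv
    return delta, jd


def gauss_newton_rg(theta, x, P, r, gamma, iters=200, lam=1e-3):
    """Damped Gauss-Newton on 0.5*||delta||^2 (uniform state weighting assumed)."""
    delta, jd = bellman_residual(theta, x, P, r, gamma)
    loss = 0.5 * delta @ delta
    for _ in range(iters):
        grad = jd.T @ delta                             # gradient of the MSBE
        gn = jd.T @ jd + lam * np.eye(theta.size)       # damped Gauss-Newton matrix
        cand = theta - np.linalg.solve(gn, grad)
        d_new, j_new = bellman_residual(cand, x, P, r, gamma)
        loss_new = 0.5 * d_new @ d_new
        if loss_new < loss:                             # accept step, reduce damping
            theta, delta, jd, loss = cand, d_new, j_new, loss_new
            lam = max(lam / 10.0, 1e-12)
        else:                                           # reject step, damp more strongly
            lam = min(lam * 10.0, 1e10)
    return theta, loss


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, N_STATES)                    # scalar feature per state
P = rng.random((N_STATES, N_STATES))
P /= P.sum(axis=1, keepdims=True)                       # row-stochastic transitions
r = rng.random(N_STATES)                                # expected reward per state
theta0 = rng.normal(size=3 * HIDDEN)

theta_fit, final_loss = gauss_newton_rg(theta0, x, P, r, GAMMA)
v_fit, _ = value_and_jacobian(theta_fit, x)
v_exact = np.linalg.solve(np.eye(N_STATES) - GAMMA * P, r)  # exact V for comparison
print(f"final MSBE: {final_loss:.3e}")
print("max |V_fit - V_exact|:", np.max(np.abs(v_fit - v_exact)))
```

Because `I - GAMMA * P` is invertible, a zero Bellman residual here coincides with the exact value function, so comparing `v_fit` against `v_exact` checks that the sketch reached a global minimum. Replacing `jd` with `-jv` would give the semi-gradient direction; the abstract outlines why such incomplete derivatives are hard to combine with second-order (Hessian) information, which is why this sketch keeps the full residual Jacobian.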