Matches in SemOpenAlex for { <https://semopenalex.org/work/W2891364703> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W2891364703 endingPage "4032" @default.
- W2891364703 startingPage "4020" @default.
- W2891364703 abstract "We consider the tasks of feature selection and policy evaluation based on linear value function approximation in reinforcement learning problems. High-dimension feature vectors and limited number of samples can easily cause over-fitting and computation expensive. To prevent this problem, ℓ₁-regularized method obtains sparse solutions and thus improves generalization performance. We propose an efficient ℓ₁-regularized recursive least squares-based online algorithm with O(n²) complexity per time-step, termed ℓ₁-RC. With the help of nested optimization decomposition, ℓ₁-RC solves a series of standard optimization problems and avoids minimizing mean squares projected Bellman error with ℓ₁-regularization directly. In ℓ₁-RC, we propose RC with iterative refinement to minimize the operator error, and we propose an alternating direction method of multipliers with proximal operator to minimize the fixed-point error. The convergence of ℓ₁-RC is established based on ordinary differential equation method and some extensions are also given. In empirical computations, some state-of-the-art ℓ₁-regularized methods are chosen as the baselines, and ℓ₁-RC are tested in both policy evaluation and learning control benchmarks. The empirical results show the effectiveness and advantages of ℓ₁-RC." @default.
- W2891364703 created "2018-09-27" @default.
- W2891364703 creator A5013008417 @default.
- W2891364703 creator A5028365881 @default.
- W2891364703 creator A5044835196 @default.
- W2891364703 creator A5064212043 @default.
- W2891364703 date "2020-11-01" @default.
- W2891364703 modified "2023-10-16" @default.
- W2891364703 title "Sparse Proximal Reinforcement Learning via Nested Optimization" @default.
- W2891364703 cites W13294968 @default.
- W2891364703 cites W166862392 @default.
- W2891364703 cites W1904705466 @default.
- W2891364703 cites W1967281400 @default.
- W2891364703 cites W1987725948 @default.
- W2891364703 cites W2008809493 @default.
- W2891364703 cites W2036103676 @default.
- W2891364703 cites W2048687352 @default.
- W2891364703 cites W2071983464 @default.
- W2891364703 cites W2072931156 @default.
- W2891364703 cites W2073384958 @default.
- W2891364703 cites W2075268401 @default.
- W2891364703 cites W2088542838 @default.
- W2891364703 cites W2100677568 @default.
- W2891364703 cites W2104753538 @default.
- W2891364703 cites W2108430473 @default.
- W2891364703 cites W2112264645 @default.
- W2891364703 cites W2113455337 @default.
- W2891364703 cites W2114901408 @default.
- W2891364703 cites W2118556122 @default.
- W2891364703 cites W2121863487 @default.
- W2891364703 cites W2122825543 @default.
- W2891364703 cites W2130005627 @default.
- W2891364703 cites W2134042548 @default.
- W2891364703 cites W2134882417 @default.
- W2891364703 cites W2139418546 @default.
- W2891364703 cites W2141022000 @default.
- W2891364703 cites W2158738729 @default.
- W2891364703 cites W2158883409 @default.
- W2891364703 cites W2177967097 @default.
- W2891364703 cites W2296319761 @default.
- W2891364703 cites W2338719424 @default.
- W2891364703 cites W2509062068 @default.
- W2891364703 cites W2526254636 @default.
- W2891364703 cites W2568019971 @default.
- W2891364703 cites W359568995 @default.
- W2891364703 cites W2110158343 @default.
- W2891364703 doi "https://doi.org/10.1109/tsmc.2018.2865505" @default.
- W2891364703 hasPublicationYear "2020" @default.
- W2891364703 type Work @default.
- W2891364703 sameAs 2891364703 @default.
- W2891364703 citedByCount "10" @default.
- W2891364703 countsByYear W28913647032019 @default.
- W2891364703 countsByYear W28913647032020 @default.
- W2891364703 countsByYear W28913647032022 @default.
- W2891364703 crossrefType "journal-article" @default.
- W2891364703 hasAuthorship W2891364703A5013008417 @default.
- W2891364703 hasAuthorship W2891364703A5028365881 @default.
- W2891364703 hasAuthorship W2891364703A5044835196 @default.
- W2891364703 hasAuthorship W2891364703A5064212043 @default.
- W2891364703 hasConcept C11413529 @default.
- W2891364703 hasConcept C154945302 @default.
- W2891364703 hasConcept C2776135515 @default.
- W2891364703 hasConcept C41008148 @default.
- W2891364703 hasConceptScore W2891364703C11413529 @default.
- W2891364703 hasConceptScore W2891364703C154945302 @default.
- W2891364703 hasConceptScore W2891364703C2776135515 @default.
- W2891364703 hasConceptScore W2891364703C41008148 @default.
- W2891364703 hasFunder F4320321001 @default.
- W2891364703 hasFunder F4320322919 @default.
- W2891364703 hasFunder F4320335787 @default.
- W2891364703 hasIssue "11" @default.
- W2891364703 hasLocation W28913647031 @default.
- W2891364703 hasOpenAccess W2891364703 @default.
- W2891364703 hasPrimaryLocation W28913647031 @default.
- W2891364703 hasRelatedWork W1504109401 @default.
- W2891364703 hasRelatedWork W2142892896 @default.
- W2891364703 hasRelatedWork W2365875115 @default.
- W2891364703 hasRelatedWork W2372025596 @default.
- W2891364703 hasRelatedWork W2386767533 @default.
- W2891364703 hasRelatedWork W2755289022 @default.
- W2891364703 hasRelatedWork W2767424169 @default.
- W2891364703 hasRelatedWork W4288267976 @default.
- W2891364703 hasRelatedWork W4293167047 @default.
- W2891364703 hasRelatedWork W4315706785 @default.
- W2891364703 hasVolume "50" @default.
- W2891364703 isParatext "false" @default.
- W2891364703 isRetracted "false" @default.
- W2891364703 magId "2891364703" @default.
- W2891364703 workType "article" @default.