Matches in SemOpenAlex for { <https://semopenalex.org/work/W3046918767> ?p ?o ?g. }
- W3046918767 abstract "We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms. While most existing works on actor-critic employ bi-level or two-timescale updates, we focus on the more practical single-timescale setting, where the actor and critic are updated simultaneously. Specifically, in each iteration, the critic update is obtained by applying the Bellman evaluation operator only once, while the actor is updated in the policy gradient direction computed using the critic. Moreover, we consider two function approximation settings, where both the actor and critic are represented by either linear functions or deep neural networks. For both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear $O(K^{-1/2})$ rate, where $K$ is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we prove for the first time that actor-critic with deep neural networks finds the globally optimal policy at a sublinear rate." @default.
- W3046918767 created "2020-08-07" @default.
- W3046918767 creator A5025567827 @default.
- W3046918767 creator A5048272675 @default.
- W3046918767 creator A5078210646 @default.
- W3046918767 date "2020-08-02" @default.
- W3046918767 modified "2023-10-17" @default.
- W3046918767 title "Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy" @default.
- W3046918767 cites W1503230535 @default.
- W3046918767 cites W1515851193 @default.
- W3046918767 cites W1938444229 @default.
- W3046918767 cites W2027184806 @default.
- W3046918767 cites W2047364871 @default.
- W3046918767 cites W2072931156 @default.
- W3046918767 cites W2094387729 @default.
- W3046918767 cites W2100677568 @default.
- W3046918767 cites W2104753538 @default.
- W3046918767 cites W2117355432 @default.
- W3046918767 cites W2119567691 @default.
- W3046918767 cites W2123917165 @default.
- W3046918767 cites W2125612430 @default.
- W3046918767 cites W2128812357 @default.
- W3046918767 cites W2130801532 @default.
- W3046918767 cites W2136602922 @default.
- W3046918767 cites W2141091023 @default.
- W3046918767 cites W2151416233 @default.
- W3046918767 cites W2154761920 @default.
- W3046918767 cites W2155027007 @default.
- W3046918767 cites W2156737235 @default.
- W3046918767 cites W2158738729 @default.
- W3046918767 cites W2161270100 @default.
- W3046918767 cites W2520501711 @default.
- W3046918767 cites W2557283755 @default.
- W3046918767 cites W2580175322 @default.
- W3046918767 cites W2736601468 @default.
- W3046918767 cites W2739559388 @default.
- W3046918767 cites W2766447205 @default.
- W3046918767 cites W2781726626 @default.
- W3046918767 cites W2788366696 @default.
- W3046918767 cites W2809090039 @default.
- W3046918767 cites W2899748887 @default.
- W3046918767 cites W2900103278 @default.
- W3046918767 cites W2900959181 @default.
- W3046918767 cites W2904838594 @default.
- W3046918767 cites W2907626093 @default.
- W3046918767 cites W2911867426 @default.
- W3046918767 cites W2914811257 @default.
- W3046918767 cites W2919115771 @default.
- W3046918767 cites W2945496654 @default.
- W3046918767 cites W2946840143 @default.
- W3046918767 cites W2948432982 @default.
- W3046918767 cites W2949099147 @default.
- W3046918767 cites W2949608212 @default.
- W3046918767 cites W2949804919 @default.
- W3046918767 cites W2951207584 @default.
- W3046918767 cites W2951576642 @default.
- W3046918767 cites W2951915386 @default.
- W3046918767 cites W2951990408 @default.
- W3046918767 cites W2952500758 @default.
- W3046918767 cites W2956068307 @default.
- W3046918767 cites W2958746411 @default.
- W3046918767 cites W2960567166 @default.
- W3046918767 cites W2962785728 @default.
- W3046918767 cites W2963774238 @default.
- W3046918767 cites W2964043796 @default.
- W3046918767 cites W2964990165 @default.
- W3046918767 cites W2970032917 @default.
- W3046918767 cites W2970128053 @default.
- W3046918767 cites W2970355847 @default.
- W3046918767 cites W2970999177 @default.
- W3046918767 cites W2971026276 @default.
- W3046918767 cites W2971587637 @default.
- W3046918767 cites W2980452497 @default.
- W3046918767 cites W2981030070 @default.
- W3046918767 cites W2981972696 @default.
- W3046918767 cites W2991522342 @default.
- W3046918767 cites W3000543388 @default.
- W3046918767 cites W3021175792 @default.
- W3046918767 cites W3023564296 @default.
- W3046918767 cites W3034871777 @default.
- W3046918767 cites W3039845099 @default.
- W3046918767 cites W3041129870 @default.
- W3046918767 cites W3044451384 @default.
- W3046918767 cites W3100944043 @default.
- W3046918767 cites W3129154373 @default.
- W3046918767 cites W3136903997 @default.
- W3046918767 cites W594357522 @default.
- W3046918767 cites W3007576019 @default.
- W3046918767 doi "https://doi.org/10.48550/arxiv.2008.00483" @default.
- W3046918767 hasPublicationYear "2020" @default.
- W3046918767 type Work @default.
- W3046918767 sameAs 3046918767 @default.
- W3046918767 citedByCount "12" @default.
- W3046918767 countsByYear W30469187672020 @default.
- W3046918767 countsByYear W30469187672021 @default.
- W3046918767 crossrefType "posted-content" @default.
- W3046918767 hasAuthorship W3046918767A5025567827 @default.
- W3046918767 hasAuthorship W3046918767A5048272675 @default.
- W3046918767 hasAuthorship W3046918767A5078210646 @default.
- W3046918767 hasBestOaLocation W30469187671 @default.
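
The single-timescale scheme described in the abstract (one application of the Bellman evaluation operator to the critic and one policy-gradient step on the actor in each iteration) can be illustrated with a minimal tabular sketch. This is not the paper's algorithm or code: the two-state MDP (`P`, `R`), the discount `GAMMA`, the step size `actor_lr`, the softmax parameterization, and the uniform state weighting in the actor step are all assumptions chosen for illustration, and the paper itself analyzes linear and deep neural network function approximation rather than tables.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
GAMMA = 0.9
# P[s][a] is the next-state distribution; R[s][a] is the immediate reward.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.8, 0.2], [0.2, 0.8]]]
R = [[0.0, 1.0],
     [0.5, 2.0]]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def single_timescale_actor_critic(num_iters=2000, actor_lr=0.5):
    """Update actor and critic once each per iteration (same timescale)."""
    n_s, n_a = len(R), len(R[0])
    theta = [[0.0] * n_a for _ in range(n_s)]  # actor: softmax logits
    q = [[0.0] * n_a for _ in range(n_s)]      # critic: tabular Q estimate
    for _ in range(num_iters):
        pi = [softmax(theta[s]) for s in range(n_s)]
        # State values of the current critic under the current policy.
        v = [sum(pi[s][a] * q[s][a] for a in range(n_a)) for s in range(n_s)]
        # Critic: apply the Bellman evaluation operator exactly once.
        new_q = [[R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(n_s))
                  for a in range(n_a)] for s in range(n_s)]
        # Actor: softmax policy-gradient direction computed from the current
        # critic, with uniform state weighting (a simplifying assumption).
        for s in range(n_s):
            for a in range(n_a):
                theta[s][a] += actor_lr * pi[s][a] * (q[s][a] - v[s])
        q = new_q  # both updates used the same pre-update critic
    return [softmax(theta[s]) for s in range(n_s)], q


if __name__ == "__main__":
    policy, critic = single_timescale_actor_critic()
    print("policy:", policy)
    print("critic:", critic)
```

In this toy MDP, action 1 has the higher reward in both states and leads to the higher-value state, so the learned policy should concentrate on action 1. The critic is never run to convergence between actor steps, which is the point of the single-timescale setting.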