Matches in SemOpenAlex for { <https://semopenalex.org/work/W4379538424> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4379538424 abstract "In this paper, we study the problem of (finite-horizon tabular) Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP). Compared with previous studies of private reinforcement learning, which typically assume that rewards are sampled from bounded or sub-Gaussian distributions to ensure DP, we consider the setting where reward distributions have only finite $(1+v)$-th moments for some $v \in (0,1]$. By resorting to robust mean estimators for rewards, we first propose two frameworks for heavy-tailed MDPs: one for value iteration and one for policy optimization. Under each framework, we consider both the joint differential privacy (JDP) and local differential privacy (LDP) models. Based on our frameworks, we provide regret upper bounds for both the JDP and LDP cases and show that the moment of the distribution and the privacy budget both have significant impacts on regret. Finally, we establish a lower bound for regret minimization in heavy-tailed MDPs under the JDP model by reducing it to the instance-independent lower bound of heavy-tailed multi-armed bandits under the DP model. We also establish a lower bound for the problem under the LDP model by adopting private minimax methods. Our results reveal fundamental differences between private RL with sub-Gaussian rewards and private RL with heavy-tailed rewards." @default.
- W4379538424 created "2023-06-07" @default.
- W4379538424 creator A5012987620 @default.
- W4379538424 creator A5051891202 @default.
- W4379538424 creator A5052162709 @default.
- W4379538424 creator A5052304130 @default.
- W4379538424 date "2023-06-01" @default.
- W4379538424 modified "2023-10-16" @default.
- W4379538424 title "Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards" @default.
- W4379538424 doi "https://doi.org/10.48550/arxiv.2306.01121" @default.
- W4379538424 hasPublicationYear "2023" @default.
- W4379538424 type Work @default.
- W4379538424 citedByCount "0" @default.
- W4379538424 crossrefType "posted-content" @default.
- W4379538424 hasAuthorship W4379538424A5012987620 @default.
- W4379538424 hasAuthorship W4379538424A5051891202 @default.
- W4379538424 hasAuthorship W4379538424A5052162709 @default.
- W4379538424 hasAuthorship W4379538424A5052304130 @default.
- W4379538424 hasBestOaLocation W43795384241 @default.
- W4379538424 hasConcept C105795698 @default.
- W4379538424 hasConcept C106189395 @default.
- W4379538424 hasConcept C11413529 @default.
- W4379538424 hasConcept C119857082 @default.
- W4379538424 hasConcept C121332964 @default.
- W4379538424 hasConcept C126255220 @default.
- W4379538424 hasConcept C134306372 @default.
- W4379538424 hasConcept C149728462 @default.
- W4379538424 hasConcept C154945302 @default.
- W4379538424 hasConcept C159886148 @default.
- W4379538424 hasConcept C163716315 @default.
- W4379538424 hasConcept C185429906 @default.
- W4379538424 hasConcept C23130292 @default.
- W4379538424 hasConcept C33923547 @default.
- W4379538424 hasConcept C34388435 @default.
- W4379538424 hasConcept C41008148 @default.
- W4379538424 hasConcept C50817715 @default.
- W4379538424 hasConcept C62520636 @default.
- W4379538424 hasConcept C77553402 @default.
- W4379538424 hasConcept C97541855 @default.
- W4379538424 hasConceptScore W4379538424C105795698 @default.
- W4379538424 hasConceptScore W4379538424C106189395 @default.
- W4379538424 hasConceptScore W4379538424C11413529 @default.
- W4379538424 hasConceptScore W4379538424C119857082 @default.
- W4379538424 hasConceptScore W4379538424C121332964 @default.
- W4379538424 hasConceptScore W4379538424C126255220 @default.
- W4379538424 hasConceptScore W4379538424C134306372 @default.
- W4379538424 hasConceptScore W4379538424C149728462 @default.
- W4379538424 hasConceptScore W4379538424C154945302 @default.
- W4379538424 hasConceptScore W4379538424C159886148 @default.
- W4379538424 hasConceptScore W4379538424C163716315 @default.
- W4379538424 hasConceptScore W4379538424C185429906 @default.
- W4379538424 hasConceptScore W4379538424C23130292 @default.
- W4379538424 hasConceptScore W4379538424C33923547 @default.
- W4379538424 hasConceptScore W4379538424C34388435 @default.
- W4379538424 hasConceptScore W4379538424C41008148 @default.
- W4379538424 hasConceptScore W4379538424C50817715 @default.
- W4379538424 hasConceptScore W4379538424C62520636 @default.
- W4379538424 hasConceptScore W4379538424C77553402 @default.
- W4379538424 hasConceptScore W4379538424C97541855 @default.
- W4379538424 hasLocation W43795384241 @default.
- W4379538424 hasOpenAccess W4379538424 @default.
- W4379538424 hasPrimaryLocation W43795384241 @default.
- W4379538424 hasRelatedWork W1850488217 @default.
- W4379538424 hasRelatedWork W3091875946 @default.
- W4379538424 hasRelatedWork W3093210876 @default.
- W4379538424 hasRelatedWork W3111617249 @default.
- W4379538424 hasRelatedWork W3208660434 @default.
- W4379538424 hasRelatedWork W3212144406 @default.
- W4379538424 hasRelatedWork W4214593279 @default.
- W4379538424 hasRelatedWork W4284890489 @default.
- W4379538424 hasRelatedWork W4294827289 @default.
- W4379538424 hasRelatedWork W4327873626 @default.
- W4379538424 isParatext "false" @default.
- W4379538424 isRetracted "false" @default.
- W4379538424 workType "article" @default.
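The abstract relies on robust mean estimators for heavy-tailed rewards under differential privacy but does not say which estimator is used. As a hedged illustration only, the sketch below shows one generic building block of this kind: a clipped (truncated) empirical mean with Laplace noise in the central DP model. It is not the authors' construction. The function names `private_truncated_mean` and `laplace_noise`, the clipping rule, and the caller-supplied `threshold` are all assumptions made for this example. In the heavy-tailed setting the threshold would normally be tuned from the moment bound, the sample size, and the privacy budget; here it is simply passed in.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # expovariate takes a rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_truncated_mean(
    rewards: Sequence[float],
    threshold: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical epsilon-DP mean estimate for possibly heavy-tailed rewards.

    Each reward is clipped to [-threshold, threshold], so replacing a single
    reward changes the clipped empirical mean by at most 2 * threshold / n.
    Adding Laplace noise with scale sensitivity / epsilon then makes the
    output epsilon-DP in the central model (one reward per individual).
    """
    if not rewards:
        raise ValueError("need at least one reward")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")
    rng = rng or random.Random()
    n = len(rewards)
    # Clipping bounds the influence of the rare, very large heavy-tailed draws.
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    sensitivity = 2.0 * threshold / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto(1.5) rewards have finite (1+v)-th moments only for v < 0.5.
    samples = [rng.paretovariate(1.5) for _ in range(10_000)]
    print(private_truncated_mean(samples, threshold=50.0, epsilon=1.0, rng=rng))
```

Clipping trades a small bias (large rewards are pulled down to the threshold) for bounded sensitivity, which is what allows the noise scale to shrink as the number of samples grows.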