Matches in SemOpenAlex for { <https://semopenalex.org/work/W2980175401> ?p ?o ?g. }
Showing items 1 to 81 of 81, with 100 items per page.
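The listing below can be reproduced by asking SemOpenAlex for all predicate/object pairs of this work. The following is a minimal Python sketch, assuming the public SPARQL endpoint is `https://semopenalex.org/sparql`, that it returns SPARQL JSON results, and that the quad pattern in the header is rendered with a `GRAPH` clause; none of these details appear on the original page.

```python
# Minimal sketch for reproducing the listing below.
# Assumptions: endpoint URL, GRAPH-based rendering of the quad pattern,
# and SPARQL JSON result handling are not part of the original page.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL
QUERY = """
SELECT ?p ?o ?g WHERE {
  GRAPH ?g { <https://semopenalex.org/work/W2980175401> ?p ?o . }
}
LIMIT 100
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Print each predicate/object pair, mirroring the items listed below.
for row in resp.json()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```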
- W2980175401 abstract "We investigate the sample complexity of networks with bounds on the magnitude of their weights. In particular, we consider the class \[ H=\left\{W_t\circ\rho\circ \ldots\circ\rho\circ W_{1} : W_1,\ldots,W_{t-1}\in M_{d, d},\ W_t\in M_{1,d}\right\} \] where the spectral norm of each $W_i$ is bounded by $O(1)$, the Frobenius norm is bounded by $R$, and $\rho$ is the sigmoid function $\frac{e^x}{1+e^x}$ or the smoothened ReLU function $\ln(1+e^x)$. We show that for any depth $t$, if the inputs are in $[-1,1]^d$, the sample complexity of $H$ is $\tilde O\left(\frac{dR^2}{\epsilon^2}\right)$. This bound is optimal up to log-factors, and substantially improves over the previous state of the art of $\tilde O\left(\frac{d^2R^2}{\epsilon^2}\right)$. We furthermore show that this bound remains valid if, instead of considering the magnitude of the $W_i$'s, we consider the magnitude of $W_i - W_i^0$, where the $W_i^0$ are some reference matrices with spectral norm of $O(1)$. By taking the $W_i^0$ to be the matrices at the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters in many typical parameter regimes. To establish our results, we develop a new technique to analyze the sample complexity of families $H$ of predictors. We start by defining a new notion of a randomized approximate description of functions $f:X\to\mathbb{R}^d$. We then show that if there is a way to approximately describe functions in a class $H$ using $d$ bits, then $d/\epsilon^2$ examples suffice to guarantee uniform convergence; namely, that the empirical loss of all the functions in the class is $\epsilon$-close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions." @default.
- W2980175401 created "2019-10-18" @default.
- W2980175401 creator A5049826794 @default.
- W2980175401 creator A5081084711 @default.
- W2980175401 date "2019-10-13" @default.
- W2980175401 modified "2023-09-27" @default.
- W2980175401 title "Generalization Bounds for Neural Networks via Approximate Description Length" @default.
- W2980175401 cites W104184427 @default.
- W2980175401 cites W1542886316 @default.
- W2980175401 cites W1553313034 @default.
- W2980175401 cites W2044230505 @default.
- W2980175401 cites W2579923771 @default.
- W2980175401 cites W2906967080 @default.
- W2980175401 cites W2912260645 @default.
- W2980175401 cites W2962857907 @default.
- W2980175401 cites W2963236897 @default.
- W2980175401 cites W2963285844 @default.
- W2980175401 cites W2963664410 @default.
- W2980175401 cites W2965497096 @default.
- W2980175401 cites W3119586787 @default.
- W2980175401 cites W607505555 @default.
- W2980175401 hasPublicationYear "2019" @default.
- W2980175401 type Work @default.
- W2980175401 sameAs 2980175401 @default.
- W2980175401 citedByCount "0" @default.
- W2980175401 crossrefType "posted-content" @default.
- W2980175401 hasAuthorship W2980175401A5049826794 @default.
- W2980175401 hasAuthorship W2980175401A5081084711 @default.
- W2980175401 hasConcept C114614502 @default.
- W2980175401 hasConcept C118615104 @default.
- W2980175401 hasConcept C121332964 @default.
- W2980175401 hasConcept C134306372 @default.
- W2980175401 hasConcept C154945302 @default.
- W2980175401 hasConcept C17744445 @default.
- W2980175401 hasConcept C191795146 @default.
- W2980175401 hasConcept C199539241 @default.
- W2980175401 hasConcept C2778445095 @default.
- W2980175401 hasConcept C33923547 @default.
- W2980175401 hasConcept C34388435 @default.
- W2980175401 hasConcept C41008148 @default.
- W2980175401 hasConcept C77553402 @default.
- W2980175401 hasConceptScore W2980175401C114614502 @default.
- W2980175401 hasConceptScore W2980175401C118615104 @default.
- W2980175401 hasConceptScore W2980175401C121332964 @default.
- W2980175401 hasConceptScore W2980175401C134306372 @default.
- W2980175401 hasConceptScore W2980175401C154945302 @default.
- W2980175401 hasConceptScore W2980175401C17744445 @default.
- W2980175401 hasConceptScore W2980175401C191795146 @default.
- W2980175401 hasConceptScore W2980175401C199539241 @default.
- W2980175401 hasConceptScore W2980175401C2778445095 @default.
- W2980175401 hasConceptScore W2980175401C33923547 @default.
- W2980175401 hasConceptScore W2980175401C34388435 @default.
- W2980175401 hasConceptScore W2980175401C41008148 @default.
- W2980175401 hasConceptScore W2980175401C77553402 @default.
- W2980175401 hasLocation W29801754011 @default.
- W2980175401 hasOpenAccess W2980175401 @default.
- W2980175401 hasPrimaryLocation W29801754011 @default.
- W2980175401 hasRelatedWork W128487529 @default.
- W2980175401 hasRelatedWork W1968071479 @default.
- W2980175401 hasRelatedWork W2063089937 @default.
- W2980175401 hasRelatedWork W2278314150 @default.
- W2980175401 hasRelatedWork W2341536123 @default.
- W2980175401 hasRelatedWork W2508778980 @default.
- W2980175401 hasRelatedWork W2748531157 @default.
- W2980175401 hasRelatedWork W2767507454 @default.
- W2980175401 hasRelatedWork W2782868354 @default.
- W2980175401 hasRelatedWork W2949705094 @default.
- W2980175401 hasRelatedWork W2950482546 @default.
- W2980175401 hasRelatedWork W2955767323 @default.
- W2980175401 hasRelatedWork W2959342769 @default.
- W2980175401 hasRelatedWork W2963756312 @default.
- W2980175401 hasRelatedWork W2970956502 @default.
- W2980175401 hasRelatedWork W2974878803 @default.
- W2980175401 hasRelatedWork W2978682257 @default.
- W2980175401 hasRelatedWork W3017006464 @default.
- W2980175401 hasRelatedWork W3128869953 @default.
- W2980175401 hasRelatedWork W3204478900 @default.
- W2980175401 isParatext "false" @default.
- W2980175401 isRetracted "false" @default.
- W2980175401 magId "2980175401" @default.
- W2980175401 workType "article" @default.