Matches in SemOpenAlex for { <https://semopenalex.org/work/W3043673595> ?p ?o ?g. }
- W3043673595 abstract "The ability of neural networks to provide 'best in class' approximation across a wide range of applications is well-documented. Nevertheless, the powerful expressivity of neural networks comes to naught if one is unable to effectively train (choose) the parameters defining the network. In general, neural networks are trained by gradient descent type optimization methods, or a stochastic variant thereof. In practice, with such methods the loss function decreases rapidly at the beginning of training but then, after a relatively small number of steps, slows down significantly. The loss may even appear to stagnate over a large number of epochs, only to then suddenly start to decrease fast again for no apparent reason. This so-called plateau phenomenon manifests itself in many learning tasks. The present work aims to identify and quantify the root causes of the plateau phenomenon. No assumptions are made on the number of neurons relative to the number of training data, and our results hold for both the lazy and adaptive regimes. The main findings are: plateaux correspond to periods during which activation patterns remain constant, where activation pattern refers to the number of data points that activate a given neuron; quantification of convergence of the gradient flow dynamics; and characterization of stationary points in terms of solutions of local least squares regression lines over subsets of the training data. Based on these conclusions, we propose a new iterative training method, the Active Neuron Least Squares (ANLS), characterised by the explicit adjustment of the activation pattern at each step, which is designed to enable a quick exit from a plateau. Illustrative numerical examples are included throughout." @default.
- W3043673595 created "2020-07-23" @default.
- W3043673595 creator A5023632213 @default.
- W3043673595 creator A5091196582 @default.
- W3043673595 date "2020-07-14" @default.
- W3043673595 modified "2023-09-25" @default.
- W3043673595 title "Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and Avoidance" @default.
- W3043673595 cites W1514218977 @default.
- W3043673595 cites W1521738998 @default.
- W3043673595 cites W1522301498 @default.
- W3043673595 cites W1576278180 @default.
- W3043673595 cites W1677182931 @default.
- W3043673595 cites W1988485873 @default.
- W3043673595 cites W1995842804 @default.
- W3043673595 cites W1999590492 @default.
- W3043673595 cites W2020107577 @default.
- W3043673595 cites W2079224763 @default.
- W3043673595 cites W2103496339 @default.
- W3043673595 cites W2125911849 @default.
- W3043673595 cites W2158581396 @default.
- W3043673595 cites W2166116275 @default.
- W3043673595 cites W2402144811 @default.
- W3043673595 cites W2523246573 @default.
- W3043673595 cites W2528305538 @default.
- W3043673595 cites W2798909945 @default.
- W3043673595 cites W2798986185 @default.
- W3043673595 cites W2809090039 @default.
- W3043673595 cites W2894604724 @default.
- W3043673595 cites W2895143189 @default.
- W3043673595 cites W2899748887 @default.
- W3043673595 cites W2900959181 @default.
- W3043673595 cites W2913892099 @default.
- W3043673595 cites W2919115771 @default.
- W3043673595 cites W2951520791 @default.
- W3043673595 cites W2952204734 @default.
- W3043673595 cites W2962698540 @default.
- W3043673595 cites W2963095610 @default.
- W3043673595 cites W2963146412 @default.
- W3043673595 cites W2963791871 @default.
- W3043673595 cites W2966530573 @default.
- W3043673595 cites W2970217468 @default.
- W3043673595 cites W2970721719 @default.
- W3043673595 cites W2970723196 @default.
- W3043673595 cites W2970971581 @default.
- W3043673595 cites W2989989463 @default.
- W3043673595 cites W2996676336 @default.
- W3043673595 cites W3099849883 @default.
- W3043673595 doi "https://doi.org/10.48550/arxiv.2007.07213" @default.
- W3043673595 hasPublicationYear "2020" @default.
- W3043673595 type Work @default.
- W3043673595 sameAs 3043673595 @default.
- W3043673595 citedByCount "1" @default.
- W3043673595 countsByYear W30436735952021 @default.
- W3043673595 crossrefType "posted-content" @default.
- W3043673595 hasAuthorship W3043673595A5023632213 @default.
- W3043673595 hasAuthorship W3043673595A5091196582 @default.
- W3043673595 hasBestOaLocation W30436735951 @default.
- W3043673595 hasConcept C105795698 @default.
- W3043673595 hasConcept C121332964 @default.
- W3043673595 hasConcept C134306372 @default.
- W3043673595 hasConcept C14036430 @default.
- W3043673595 hasConcept C153258448 @default.
- W3043673595 hasConcept C154945302 @default.
- W3043673595 hasConcept C159985019 @default.
- W3043673595 hasConcept C162324750 @default.
- W3043673595 hasConcept C189237950 @default.
- W3043673595 hasConcept C192562407 @default.
- W3043673595 hasConcept C204323151 @default.
- W3043673595 hasConcept C206688291 @default.
- W3043673595 hasConcept C2777303404 @default.
- W3043673595 hasConcept C2780030769 @default.
- W3043673595 hasConcept C33923547 @default.
- W3043673595 hasConcept C41008148 @default.
- W3043673595 hasConcept C50335755 @default.
- W3043673595 hasConcept C50522688 @default.
- W3043673595 hasConcept C50644808 @default.
- W3043673595 hasConcept C62520636 @default.
- W3043673595 hasConcept C78458016 @default.
- W3043673595 hasConcept C83546350 @default.
- W3043673595 hasConcept C86803240 @default.
- W3043673595 hasConceptScore W3043673595C105795698 @default.
- W3043673595 hasConceptScore W3043673595C121332964 @default.
- W3043673595 hasConceptScore W3043673595C134306372 @default.
- W3043673595 hasConceptScore W3043673595C14036430 @default.
- W3043673595 hasConceptScore W3043673595C153258448 @default.
- W3043673595 hasConceptScore W3043673595C154945302 @default.
- W3043673595 hasConceptScore W3043673595C159985019 @default.
- W3043673595 hasConceptScore W3043673595C162324750 @default.
- W3043673595 hasConceptScore W3043673595C189237950 @default.
- W3043673595 hasConceptScore W3043673595C192562407 @default.
- W3043673595 hasConceptScore W3043673595C204323151 @default.
- W3043673595 hasConceptScore W3043673595C206688291 @default.
- W3043673595 hasConceptScore W3043673595C2777303404 @default.
- W3043673595 hasConceptScore W3043673595C2780030769 @default.
- W3043673595 hasConceptScore W3043673595C33923547 @default.
- W3043673595 hasConceptScore W3043673595C41008148 @default.
- W3043673595 hasConceptScore W3043673595C50335755 @default.
- W3043673595 hasConceptScore W3043673595C50522688 @default.
- W3043673595 hasConceptScore W3043673595C50644808 @default.
- W3043673595 hasConceptScore W3043673595C62520636 @default.