Matches in SemOpenAlex for { <https://semopenalex.org/work/W104184427> ?p ?o ?g. }
- W104184427 endingPage "1147" @default.
- W104184427 startingPage "1139" @default.
- W104184427 abstract "Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods." @default.
- W104184427 created "2016-06-24" @default.
- W104184427 creator A5006446297 @default.
- W104184427 creator A5024209719 @default.
- W104184427 creator A5046229829 @default.
- W104184427 creator A5047062711 @default.
- W104184427 date "2013-06-16" @default.
- W104184427 modified "2023-10-06" @default.
- W104184427 title "On the importance of initialization and momentum in deep learning" @default.
- W104184427 cites W134467184 @default.
- W104184427 cites W1408639475 @default.
- W104184427 cites W1533861849 @default.
- W104184427 cites W1576278180 @default.
- W104184427 cites W1828163288 @default.
- W104184427 cites W196214544 @default.
- W104184427 cites W196761320 @default.
- W104184427 cites W1988720110 @default.
- W104184427 cites W1993882792 @default.
- W104184427 cites W2023695846 @default.
- W104184427 cites W2024484010 @default.
- W104184427 cites W2064675550 @default.
- W104184427 cites W2100495367 @default.
- W104184427 cites W2102486516 @default.
- W104184427 cites W2107878631 @default.
- W104184427 cites W2108948681 @default.
- W104184427 cites W2110798204 @default.
- W104184427 cites W2118706537 @default.
- W104184427 cites W2124541940 @default.
- W104184427 cites W2136922672 @default.
- W104184427 cites W2147768505 @default.
- W104184427 cites W2148141518 @default.
- W104184427 cites W2152424459 @default.
- W104184427 cites W2184045248 @default.
- W104184427 cites W2618530766 @default.
- W104184427 cites W2914484425 @default.
- W104184427 cites W3141595720 @default.
- W104184427 cites W71081281 @default.
- W104184427 hasPublicationYear "2013" @default.
- W104184427 type Work @default.
- W104184427 sameAs 104184427 @default.
- W104184427 citedByCount "1431" @default.
- W104184427 countsByYear W1041844272013 @default.
- W104184427 countsByYear W1041844272014 @default.
- W104184427 countsByYear W1041844272015 @default.
- W104184427 countsByYear W1041844272016 @default.
- W104184427 countsByYear W1041844272017 @default.
- W104184427 countsByYear W1041844272018 @default.
- W104184427 countsByYear W1041844272019 @default.
- W104184427 countsByYear W1041844272020 @default.
- W104184427 countsByYear W1041844272021 @default.
- W104184427 countsByYear W1041844272022 @default.
- W104184427 countsByYear W1041844272023 @default.
- W104184427 crossrefType "proceedings-article" @default.
- W104184427 hasAuthorship W104184427A5006446297 @default.
- W104184427 hasAuthorship W104184427A5024209719 @default.
- W104184427 hasAuthorship W104184427A5046229829 @default.
- W104184427 hasAuthorship W104184427A5047062711 @default.
- W104184427 hasConcept C10138342 @default.
- W104184427 hasConcept C108583219 @default.
- W104184427 hasConcept C111919701 @default.
- W104184427 hasConcept C11413529 @default.
- W104184427 hasConcept C114466953 @default.
- W104184427 hasConcept C119857082 @default.
- W104184427 hasConcept C126255220 @default.
- W104184427 hasConcept C147168706 @default.
- W104184427 hasConcept C153258448 @default.
- W104184427 hasConcept C154945302 @default.
- W104184427 hasConcept C162324750 @default.
- W104184427 hasConcept C199360897 @default.
- W104184427 hasConcept C206688291 @default.
- W104184427 hasConcept C2984842247 @default.
- W104184427 hasConcept C33923547 @default.
- W104184427 hasConcept C41008148 @default.
- W104184427 hasConcept C50644808 @default.
- W104184427 hasConcept C60718061 @default.
- W104184427 hasConcept C68387754 @default.
- W104184427 hasConceptScore W104184427C10138342 @default.
- W104184427 hasConceptScore W104184427C108583219 @default.
- W104184427 hasConceptScore W104184427C111919701 @default.
- W104184427 hasConceptScore W104184427C11413529 @default.
- W104184427 hasConceptScore W104184427C114466953 @default.
- W104184427 hasConceptScore W104184427C119857082 @default.
- W104184427 hasConceptScore W104184427C126255220 @default.
- W104184427 hasConceptScore W104184427C147168706 @default.
- W104184427 hasConceptScore W104184427C153258448 @default.
- W104184427 hasConceptScore W104184427C154945302 @default.
- W104184427 hasConceptScore W104184427C162324750 @default.
- W104184427 hasConceptScore W104184427C199360897 @default.
- W104184427 hasConceptScore W104184427C206688291 @default.
- W104184427 hasConceptScore W104184427C2984842247 @default.
- W104184427 hasConceptScore W104184427C33923547 @default.
- W104184427 hasConceptScore W104184427C41008148 @default.
- W104184427 hasConceptScore W104184427C50644808 @default.
- W104184427 hasConceptScore W104184427C60718061 @default.
- W104184427 hasConceptScore W104184427C68387754 @default.
- W104184427 hasLocation W1041844271 @default.
- W104184427 hasOpenAccess W104184427 @default.
- W104184427 hasPrimaryLocation W1041844271 @default.