Matches in SemOpenAlex for { <https://semopenalex.org/work/W2995151505> ?p ?o ?g. }
Showing items 1 to 71 of 71, with 100 items per page.
- W2995151505 abstract "While Standard gradient descent is one very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as Deep Neural Networks. In this paper, we prove that its backtracking variant behaves very nicely, in particular convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim_{n\rightarrow\infty}||z_n||=\infty$ or $\lim_{n\rightarrow\infty}||z_{n+1}-z_n||=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim_{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ converges to a critical point of $f$. (3) More generally, assume that all connected components of the set of critical points of $f$ are compact. Then either $\lim_{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ is bounded. Moreover, in the latter case the set of cluster points of $\{z_n\}$ is connected. Some generalised versions of this result, including an inexact version, are included. Another result in this paper concerns the problem of saddle points. We then present a heuristic argument to explain why the Standard gradient descent method works so well, and modifications of the backtracking versions of GD, MMT and NAG. Experiments with the datasets CIFAR10 and CIFAR100 on various popular architectures verify the heuristic argument also for the mini-batch practice and show that our new algorithms, while automatically fine-tuning learning rates, perform better than current state-of-the-art methods such as MMT, NAG, Adagrad, Adadelta, RMSProp, Adam and Adamax." @default.
- W2995151505 created "2019-12-26" @default.
- W2995151505 creator A5002059087 @default.
- W2995151505 creator A5062449758 @default.
- W2995151505 date "2018-08-15" @default.
- W2995151505 modified "2023-09-27" @default.
- W2995151505 title "Backtracking gradient descent method for general $C^1$ functions" @default.
- W2995151505 hasPublicationYear "2018" @default.
- W2995151505 type Work @default.
- W2995151505 sameAs 2995151505 @default.
- W2995151505 citedByCount "0" @default.
- W2995151505 crossrefType "posted-content" @default.
- W2995151505 hasAuthorship W2995151505A5002059087 @default.
- W2995151505 hasAuthorship W2995151505A5062449758 @default.
- W2995151505 hasConcept C11413529 @default.
- W2995151505 hasConcept C114614502 @default.
- W2995151505 hasConcept C118615104 @default.
- W2995151505 hasConcept C119857082 @default.
- W2995151505 hasConcept C134306372 @default.
- W2995151505 hasConcept C153258448 @default.
- W2995151505 hasConcept C156884757 @default.
- W2995151505 hasConcept C22324862 @default.
- W2995151505 hasConcept C2778112365 @default.
- W2995151505 hasConcept C33923547 @default.
- W2995151505 hasConcept C34388435 @default.
- W2995151505 hasConcept C41008148 @default.
- W2995151505 hasConcept C50644808 @default.
- W2995151505 hasConcept C54355233 @default.
- W2995151505 hasConcept C86803240 @default.
- W2995151505 hasConceptScore W2995151505C11413529 @default.
- W2995151505 hasConceptScore W2995151505C114614502 @default.
- W2995151505 hasConceptScore W2995151505C118615104 @default.
- W2995151505 hasConceptScore W2995151505C119857082 @default.
- W2995151505 hasConceptScore W2995151505C134306372 @default.
- W2995151505 hasConceptScore W2995151505C153258448 @default.
- W2995151505 hasConceptScore W2995151505C156884757 @default.
- W2995151505 hasConceptScore W2995151505C22324862 @default.
- W2995151505 hasConceptScore W2995151505C2778112365 @default.
- W2995151505 hasConceptScore W2995151505C33923547 @default.
- W2995151505 hasConceptScore W2995151505C34388435 @default.
- W2995151505 hasConceptScore W2995151505C41008148 @default.
- W2995151505 hasConceptScore W2995151505C50644808 @default.
- W2995151505 hasConceptScore W2995151505C54355233 @default.
- W2995151505 hasConceptScore W2995151505C86803240 @default.
- W2995151505 hasLocation W29951515051 @default.
- W2995151505 hasOpenAccess W2995151505 @default.
- W2995151505 hasPrimaryLocation W29951515051 @default.
- W2995151505 hasRelatedWork W1836708065 @default.
- W2995151505 hasRelatedWork W1969414885 @default.
- W2995151505 hasRelatedWork W1969795824 @default.
- W2995151505 hasRelatedWork W2008134816 @default.
- W2995151505 hasRelatedWork W2095426926 @default.
- W2995151505 hasRelatedWork W2135596970 @default.
- W2995151505 hasRelatedWork W2417107316 @default.
- W2995151505 hasRelatedWork W2777387026 @default.
- W2995151505 hasRelatedWork W2885557996 @default.
- W2995151505 hasRelatedWork W2908445374 @default.
- W2995151505 hasRelatedWork W2914000446 @default.
- W2995151505 hasRelatedWork W2937221077 @default.
- W2995151505 hasRelatedWork W2951195472 @default.
- W2995151505 hasRelatedWork W2952223661 @default.
- W2995151505 hasRelatedWork W2952780727 @default.
- W2995151505 hasRelatedWork W2952870759 @default.
- W2995151505 hasRelatedWork W3000740936 @default.
- W2995151505 hasRelatedWork W3009948090 @default.
- W2995151505 hasRelatedWork W3136525024 @default.
- W2995151505 hasRelatedWork W37341030 @default.
- W2995151505 isParatext "false" @default.
- W2995151505 isRetracted "false" @default.
- W2995151505 magId "2995151505" @default.
- W2995151505 workType "article" @default.
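The abstract above describes backtracking gradient descent, where the learning rate is shrunk at each iteration until a sufficient-decrease condition holds. As a minimal illustrative sketch (not the paper's implementation; the parameter names `delta0`, `alpha`, `beta` and the use of the standard Armijo condition are assumptions), this is the general shape of the algorithm:

```python
import math

def backtracking_gd(f, grad_f, z0, delta0=1.0, alpha=0.5, beta=0.5,
                    tol=1e-8, max_iter=10000):
    """Gradient descent with a backtracking (Armijo-style) line search.

    Starting from trial step size delta0, the step is multiplied by beta
    until f(z - d*g) <= f(z) - alpha*d*||g||^2, i.e. sufficient decrease.
    Parameter names are illustrative, not taken from the paper.
    """
    z = list(z0)
    for _ in range(max_iter):
        g = grad_f(z)
        gnorm2 = sum(gi * gi for gi in g)
        if math.sqrt(gnorm2) < tol:
            break  # (near-)critical point reached
        candidate = lambda d: [zi - d * gi for zi, gi in zip(z, g)]
        d = delta0
        # Backtrack: shrink the step until sufficient decrease is achieved.
        while f(candidate(d)) > f(z) - alpha * d * gnorm2:
            d *= beta
        z = candidate(d)
    return z

# Usage: minimise a simple quadratic; its unique critical point is the origin.
f = lambda z: 0.5 * sum(x * x for x in z)
grad_f = lambda z: list(z)
z_star = backtracking_gd(f, grad_f, [3.0, -4.0])
```

The automatic step-size adaptation is what lets convergence be argued for general $C^1$ functions, since no global Lipschitz constant for the gradient needs to be known in advance.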