Matches in SemOpenAlex for { <https://semopenalex.org/work/W3213641637> ?p ?o ?g. }
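The listing below can be reproduced with a SPARQL query along the following lines (a minimal sketch, assuming the public SemOpenAlex endpoint at https://semopenalex.org/sparql; the ?g position in the pattern is the named graph, reported as @default in the listing):

    SELECT ?p ?o ?g
    WHERE {
      GRAPH ?g {
        <https://semopenalex.org/work/W3213641637> ?p ?o .
      }
    }

Depending on how the store exposes its default graph, the simpler triple pattern { <https://semopenalex.org/work/W3213641637> ?p ?o . } without the GRAPH clause may be needed instead.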
- W3213641637 abstract "Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method requires paying $\Omega(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensional space. In this paper, we propose two novel preprocessing ideas to bypass this $\Omega(mnd)$ barrier: $\bullet$ First, by preprocessing the initial weights of the neural networks, we can train the neural network in $\widetilde{O}(m^{1-\Theta(1/d)} nd)$ cost per iteration. $\bullet$ Second, by preprocessing the input data points, we can train the neural network in $\widetilde{O}(m^{4/5} nd)$ cost per iteration. From the technical perspective, our result is a sophisticated combination of tools from different fields: greedy-type convergence analysis in optimization, sparsity observations in practical work, high-dimensional geometric search in data structures, and concentration and anti-concentration in probability. Our results also provide theoretical insights for a large number of previously established fast training methods. In addition, our classical algorithm can be generalized to the Quantum computation model. Interestingly, we can get a similar sublinear cost per iteration but avoid preprocessing the initial weights or input data points." @default.
- W3213641637 created "2021-11-22" @default.
- W3213641637 creator A5003393009 @default.
- W3213641637 creator A5008524479 @default.
- W3213641637 creator A5091070408 @default.
- W3213641637 date "2021-10-09" @default.
- W3213641637 modified "2023-10-18" @default.
- W3213641637 title "Does Preprocessing Help Training Over-parameterized Neural Networks?" @default.
- W3213641637 cites W1455310343 @default.
- W3213641637 cites W1547671143 @default.
- W3213641637 cites W1667961663 @default.
- W3213641637 cites W1678167890 @default.
- W3213641637 cites W1964691748 @default.
- W3213641637 cites W2012833704 @default.
- W3213641637 cites W2017851434 @default.
- W3213641637 cites W2028801888 @default.
- W3213641637 cites W2035144082 @default.
- W3213641637 cites W2048729985 @default.
- W3213641637 cites W2084652510 @default.
- W3213641637 cites W2097117768 @default.
- W3213641637 cites W2103893986 @default.
- W3213641637 cites W2112796928 @default.
- W3213641637 cites W2118323718 @default.
- W3213641637 cites W2129635682 @default.
- W3213641637 cites W2139230981 @default.
- W3213641637 cites W2147717514 @default.
- W3213641637 cites W2158899491 @default.
- W3213641637 cites W2160775966 @default.
- W3213641637 cites W2162006472 @default.
- W3213641637 cites W2163605009 @default.
- W3213641637 cites W2194775991 @default.
- W3213641637 cites W2257979135 @default.
- W3213641637 cites W2570965098 @default.
- W3213641637 cites W2766447205 @default.
- W3213641637 cites W2791219474 @default.
- W3213641637 cites W2809090039 @default.
- W3213641637 cites W2811318564 @default.
- W3213641637 cites W2886067286 @default.
- W3213641637 cites W2945554113 @default.
- W3213641637 cites W2947461788 @default.
- W3213641637 cites W2948046738 @default.
- W3213641637 cites W2952763926 @default.
- W3213641637 cites W2962698540 @default.
- W3213641637 cites W2962921664 @default.
- W3213641637 cites W2963056065 @default.
- W3213641637 cites W2963239103 @default.
- W3213641637 cites W2963341956 @default.
- W3213641637 cites W2963703787 @default.
- W3213641637 cites W2963813662 @default.
- W3213641637 cites W2964089577 @default.
- W3213641637 cites W2964098911 @default.
- W3213641637 cites W2964161337 @default.
- W3213641637 cites W2966524351 @default.
- W3213641637 cites W2970265440 @default.
- W3213641637 cites W2970332347 @default.
- W3213641637 cites W2970443625 @default.
- W3213641637 cites W2971043187 @default.
- W3213641637 cites W2971055146 @default.
- W3213641637 cites W2991290085 @default.
- W3213641637 cites W2994673210 @default.
- W3213641637 cites W2995838034 @default.
- W3213641637 cites W2996168800 @default.
- W3213641637 cites W3017378904 @default.
- W3213641637 cites W3021189130 @default.
- W3213641637 cites W3034810578 @default.
- W3213641637 cites W3037005949 @default.
- W3213641637 cites W3039295366 @default.
- W3213641637 cites W3091118633 @default.
- W3213641637 cites W3098047114 @default.
- W3213641637 cites W3098560605 @default.
- W3213641637 cites W3126188815 @default.
- W3213641637 cites W3128934904 @default.
- W3213641637 cites W3133750907 @default.
- W3213641637 cites W3135601583 @default.
- W3213641637 cites W3138433673 @default.
- W3213641637 cites W3217033513 @default.
- W3213641637 doi "https://doi.org/10.48550/arxiv.2110.04622" @default.
- W3213641637 hasPublicationYear "2021" @default.
- W3213641637 type Work @default.
- W3213641637 sameAs 3213641637 @default.
- W3213641637 citedByCount "0" @default.
- W3213641637 crossrefType "posted-content" @default.
- W3213641637 hasAuthorship W3213641637A5003393009 @default.
- W3213641637 hasAuthorship W3213641637A5008524479 @default.
- W3213641637 hasAuthorship W3213641637A5091070408 @default.
- W3213641637 hasBestOaLocation W32136416371 @default.
- W3213641637 hasConcept C11413529 @default.
- W3213641637 hasConcept C126255220 @default.
- W3213641637 hasConcept C154945302 @default.
- W3213641637 hasConcept C162324750 @default.
- W3213641637 hasConcept C165464430 @default.
- W3213641637 hasConcept C2777303404 @default.
- W3213641637 hasConcept C33923547 @default.
- W3213641637 hasConcept C34736171 @default.
- W3213641637 hasConcept C41008148 @default.
- W3213641637 hasConcept C45374587 @default.
- W3213641637 hasConcept C50522688 @default.
- W3213641637 hasConcept C50644808 @default.
- W3213641637 hasConceptScore W3213641637C11413529 @default.
- W3213641637 hasConceptScore W3213641637C126255220 @default.
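As a side note on the per-iteration cost bounds quoted in the abstract above, the following LaTeX comparison illustrates their scale; instantiating the hidden constant in $\Theta(1/d)$ as $1$ is an illustrative assumption here, not a claim from the paper.

    \[
    \underbrace{\Omega(mnd)}_{\text{classical}}
    \quad\text{vs.}\quad
    \underbrace{\widetilde{O}\bigl(m^{1-\Theta(1/d)}\,nd\bigr)}_{\text{weight preprocessing}}
    \quad\text{vs.}\quad
    \underbrace{\widetilde{O}\bigl(m^{4/5}\,nd\bigr)}_{\text{data preprocessing}}
    \]
    % Illustration (hidden constant taken as 1): m = 10^{10}, d = 5 gives
    % m^{1 - 1/d} = m^{4/5} = 10^{8}, a factor-100 saving in the m-dependence.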