Matches in SemOpenAlex for { <https://semopenalex.org/work/W3046354018> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W3046354018 abstract "Recently, large scale Transformer-based language models such as BERT, GPT-2, and XLNet have brought about exciting leaps in state-of-the-art results for many Natural Language Processing (NLP) tasks. One of the common trends in these recent models is a significant increase in model complexity, which introduces both more weights and computation. Moreover, with the advent of large-scale unsupervised datasets, training time is further extended due to the increased amount of data samples within a single training epoch. As a result, to train these models within a reasonable time, machine learning (ML) programmers often require advanced hardware setups such as the premium GPU-enabled NVIDIA DGX workstations or specialized accelerators such as Google's TPU Pods. Our work addresses this limitation and demonstrates that the BERT pre-trained model can be trained within 2 weeks on an academic-size cluster of widely available GPUs through careful algorithmic and software optimizations. In this paper, we present these optimizations on how to improve single device training throughput, distribute the training workload over multiple nodes and GPUs, and overcome the communication bottleneck introduced by the large data exchanges over the network. We show that we are able to perform pre-training on BERT within a reasonable time budget (12 days) in an academic setting, but with a much less expensive and less aggressive hardware resource requirement than in previously demonstrated industrial settings based on NVIDIA DGX machines or Google's TPU Pods." @default.
- W3046354018 created "2020-08-07" @default.
- W3046354018 creator A5007585346 @default.
- W3046354018 creator A5022526821 @default.
- W3046354018 creator A5082233189 @default.
- W3046354018 date "2020-08-01" @default.
- W3046354018 modified "2023-10-18" @default.
- W3046354018 title "Multi-node Bert-pretraining: Cost-efficient Approach" @default.
- W3046354018 cites W1566289585 @default.
- W3046354018 cites W1598866093 @default.
- W3046354018 cites W2064675550 @default.
- W3046354018 cites W2133564696 @default.
- W3046354018 cites W2194775991 @default.
- W3046354018 cites W2427527485 @default.
- W3046354018 cites W2462831000 @default.
- W3046354018 cites W2525778437 @default.
- W3046354018 cites W2622263826 @default.
- W3046354018 cites W2626778328 @default.
- W3046354018 cites W2763421725 @default.
- W3046354018 cites W2766140019 @default.
- W3046354018 cites W2900167092 @default.
- W3046354018 cites W2930786691 @default.
- W3046354018 cites W2948223045 @default.
- W3046354018 cites W2949640717 @default.
- W3046354018 cites W2949888546 @default.
- W3046354018 cites W2950813464 @default.
- W3046354018 cites W2952564229 @default.
- W3046354018 cites W2963341956 @default.
- W3046354018 cites W2965373594 @default.
- W3046354018 cites W2979245724 @default.
- W3046354018 cites W2989916540 @default.
- W3046354018 cites W2991040477 @default.
- W3046354018 cites W3030163527 @default.
- W3046354018 doi "https://doi.org/10.48550/arxiv.2008.00177" @default.
- W3046354018 hasPublicationYear "2020" @default.
- W3046354018 type Work @default.
- W3046354018 sameAs 3046354018 @default.
- W3046354018 citedByCount "3" @default.
- W3046354018 countsByYear W30463540182021 @default.
- W3046354018 crossrefType "posted-content" @default.
- W3046354018 hasAuthorship W3046354018A5007585346 @default.
- W3046354018 hasAuthorship W3046354018A5022526821 @default.
- W3046354018 hasAuthorship W3046354018A5082233189 @default.
- W3046354018 hasBestOaLocation W30463540181 @default.
- W3046354018 hasConcept C111919701 @default.
- W3046354018 hasConcept C113775141 @default.
- W3046354018 hasConcept C118524514 @default.
- W3046354018 hasConcept C119857082 @default.
- W3046354018 hasConcept C120314980 @default.
- W3046354018 hasConcept C137293760 @default.
- W3046354018 hasConcept C149635348 @default.
- W3046354018 hasConcept C154945302 @default.
- W3046354018 hasConcept C173608175 @default.
- W3046354018 hasConcept C199360897 @default.
- W3046354018 hasConcept C2777904410 @default.
- W3046354018 hasConcept C2778119891 @default.
- W3046354018 hasConcept C2778476105 @default.
- W3046354018 hasConcept C2780513914 @default.
- W3046354018 hasConcept C2781335571 @default.
- W3046354018 hasConcept C29140674 @default.
- W3046354018 hasConcept C41008148 @default.
- W3046354018 hasConcept C45374587 @default.
- W3046354018 hasConcept C67953723 @default.
- W3046354018 hasConceptScore W3046354018C111919701 @default.
- W3046354018 hasConceptScore W3046354018C113775141 @default.
- W3046354018 hasConceptScore W3046354018C118524514 @default.
- W3046354018 hasConceptScore W3046354018C119857082 @default.
- W3046354018 hasConceptScore W3046354018C120314980 @default.
- W3046354018 hasConceptScore W3046354018C137293760 @default.
- W3046354018 hasConceptScore W3046354018C149635348 @default.
- W3046354018 hasConceptScore W3046354018C154945302 @default.
- W3046354018 hasConceptScore W3046354018C173608175 @default.
- W3046354018 hasConceptScore W3046354018C199360897 @default.
- W3046354018 hasConceptScore W3046354018C2777904410 @default.
- W3046354018 hasConceptScore W3046354018C2778119891 @default.
- W3046354018 hasConceptScore W3046354018C2778476105 @default.
- W3046354018 hasConceptScore W3046354018C2780513914 @default.
- W3046354018 hasConceptScore W3046354018C2781335571 @default.
- W3046354018 hasConceptScore W3046354018C29140674 @default.
- W3046354018 hasConceptScore W3046354018C41008148 @default.
- W3046354018 hasConceptScore W3046354018C45374587 @default.
- W3046354018 hasConceptScore W3046354018C67953723 @default.
- W3046354018 hasLocation W30463540181 @default.
- W3046354018 hasOpenAccess W3046354018 @default.
- W3046354018 hasPrimaryLocation W30463540181 @default.
- W3046354018 hasRelatedWork W1521841817 @default.
- W3046354018 hasRelatedWork W1556736978 @default.
- W3046354018 hasRelatedWork W1950440938 @default.
- W3046354018 hasRelatedWork W2034384303 @default.
- W3046354018 hasRelatedWork W2086059776 @default.
- W3046354018 hasRelatedWork W2338146185 @default.
- W3046354018 hasRelatedWork W2358053162 @default.
- W3046354018 hasRelatedWork W2386862665 @default.
- W3046354018 hasRelatedWork W2564785367 @default.
- W3046354018 hasRelatedWork W3046354018 @default.
- W3046354018 isParatext "false" @default.
- W3046354018 isRetracted "false" @default.
- W3046354018 magId "3046354018" @default.
- W3046354018 workType "article" @default.