Matches in SemOpenAlex for { <https://semopenalex.org/work/W3006131567> ?p ?o ?g. }
Showing items 1 to 75 of 75, with 100 items per page.
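The listing below is the result of a quad pattern over this work's IRI. A minimal sketch of how that query could be reconstructed and issued, assuming SemOpenAlex's public SPARQL endpoint at `https://semopenalex.org/sparql` and a named-graph (`GRAPH ?g`) reading of the `?g` variable:

```python
# Hypothetical reconstruction of the query behind this listing.
# The GRAPH wrapping of ?g is an assumption about how the endpoint
# exposes the fourth position of the quad pattern.
work = "https://semopenalex.org/work/W3006131567"
query = f"""
SELECT ?p ?o ?g WHERE {{
  GRAPH ?g {{ <{work}> ?p ?o . }}
}}
"""
# To execute it, POST the query to https://semopenalex.org/sparql with
# Accept: application/sparql-results+json (e.g. via urllib.request).
print(query.strip())
```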
- W3006131567 abstract "Widely popular transformer-based NLP models such as BERT and Turing-NLG have enormous capacity trending to billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high speed interconnectivity for data parallelism. In this paper, we introduce a new relay-style execution technique called L2L (layer-to-layer) where at any given moment, the device memory is primarily populated only with the executing layer(s)'s footprint. The model resides in the DRAM memory attached to either a CPU or an FPGA as an entity we call eager param-server (EPS). To overcome the bandwidth issues of shuttling parameters to and from EPS, the model is executed a layer at a time across many micro-batches instead of the conventional method of minibatches over whole model. L2L is implemented using 16GB V100 devices for BERT-Large running it with a device batch size of up to 256. Our results show 45% reduction in memory and 40% increase in the throughput compared to the state-of-the-art baseline. L2L is also able to fit models up to 50 Billion parameters on a machine with a single 16GB V100 and 512GB CPU memory and without requiring any model partitioning. L2L scales to arbitrary depth allowing researchers to develop on affordable devices which is a big step toward democratizing AI. By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence. In addition, the EPS enables dynamic neural architecture approaches by varying layers across iterations. Finally, we also propose and demonstrate a constant memory variation of L2L and we propose future enhancements. This work has been performed on GPUs first, but also targeted towards all high TFLOPS/Watt accelerators." @default.
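The abstract describes L2L's relay-style execution: all parameters live in host DRAM behind an "eager param-server" (EPS), and the device holds only the executing layer, which is run across many micro-batches before the next layer is fetched. A toy sketch of that control flow, under stated assumptions — the class and function names here (`EagerParamServer`, `l2l_forward`) are illustrative and not from the paper's implementation, and scalar multiplication stands in for a real layer:

```python
class EagerParamServer:
    """Holds every layer's parameters in host (CPU/DRAM) memory."""

    def __init__(self, layer_weights):
        self.host_params = dict(enumerate(layer_weights))

    def fetch(self, i):
        # In the real system this copies one layer DRAM -> device HBM.
        return self.host_params[i]

    def store(self, i, params):
        # Updated parameters are relayed back to host memory.
        self.host_params[i] = params


def l2l_forward(eps, num_layers, micro_batches):
    """Run one layer over *all* micro-batches before moving on, so
    device memory only ever holds a single layer's footprint."""
    activations = list(micro_batches)
    for i in range(num_layers):
        weight = eps.fetch(i)  # only layer i is resident on device
        activations = [x * weight for x in activations]  # toy "layer"
        eps.store(i, weight)
    return activations


eps = EagerParamServer([2, 3])         # two toy layers, scalar weights
out = l2l_forward(eps, 2, [1.0, 2.0])  # two micro-batches
print(out)  # -> [6.0, 12.0]
```

Because only one layer crosses the host-device boundary per step, peak device memory is bounded by the largest single layer rather than the whole model — the property that lets the paper fit 50B-parameter models on a single 16GB V100.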
- W3006131567 created "2020-02-24" @default.
- W3006131567 creator A5005150184 @default.
- W3006131567 creator A5019831449 @default.
- W3006131567 creator A5070741036 @default.
- W3006131567 creator A5082855669 @default.
- W3006131567 date "2020-02-13" @default.
- W3006131567 modified "2023-09-27" @default.
- W3006131567 title "Training Large Neural Networks with Constant Memory using a New Execution Algorithm" @default.
- W3006131567 cites W1598866093 @default.
- W3006131567 cites W2168231600 @default.
- W3006131567 cites W2336650964 @default.
- W3006131567 cites W2338908902 @default.
- W3006131567 cites W2807147113 @default.
- W3006131567 cites W2896457183 @default.
- W3006131567 cites W2945785363 @default.
- W3006131567 cites W2962950660 @default.
- W3006131567 cites W2963310665 @default.
- W3006131567 cites W2963903325 @default.
- W3006131567 cites W2964174152 @default.
- W3006131567 cites W2980282514 @default.
- W3006131567 cites W2991040477 @default.
- W3006131567 cites W3025935268 @default.
- W3006131567 cites W3030163527 @default.
- W3006131567 hasPublicationYear "2020" @default.
- W3006131567 type Work @default.
- W3006131567 sameAs 3006131567 @default.
- W3006131567 citedByCount "6" @default.
- W3006131567 countsByYear W30061315672019 @default.
- W3006131567 countsByYear W30061315672020 @default.
- W3006131567 countsByYear W30061315672021 @default.
- W3006131567 crossrefType "posted-content" @default.
- W3006131567 hasAuthorship W3006131567A5005150184 @default.
- W3006131567 hasAuthorship W3006131567A5019831449 @default.
- W3006131567 hasAuthorship W3006131567A5070741036 @default.
- W3006131567 hasAuthorship W3006131567A5082855669 @default.
- W3006131567 hasConcept C111919701 @default.
- W3006131567 hasConcept C173608175 @default.
- W3006131567 hasConcept C41008148 @default.
- W3006131567 hasConcept C7366592 @default.
- W3006131567 hasConcept C74912251 @default.
- W3006131567 hasConcept C9390403 @default.
- W3006131567 hasConceptScore W3006131567C111919701 @default.
- W3006131567 hasConceptScore W3006131567C173608175 @default.
- W3006131567 hasConceptScore W3006131567C41008148 @default.
- W3006131567 hasConceptScore W3006131567C7366592 @default.
- W3006131567 hasConceptScore W3006131567C74912251 @default.
- W3006131567 hasConceptScore W3006131567C9390403 @default.
- W3006131567 hasLocation W30061315671 @default.
- W3006131567 hasOpenAccess W3006131567 @default.
- W3006131567 hasPrimaryLocation W30061315671 @default.
- W3006131567 hasRelatedWork W2436522418 @default.
- W3006131567 hasRelatedWork W2510461687 @default.
- W3006131567 hasRelatedWork W2575834995 @default.
- W3006131567 hasRelatedWork W2586877049 @default.
- W3006131567 hasRelatedWork W2790490830 @default.
- W3006131567 hasRelatedWork W2802840548 @default.
- W3006131567 hasRelatedWork W2910132948 @default.
- W3006131567 hasRelatedWork W2963341956 @default.
- W3006131567 hasRelatedWork W2964174152 @default.
- W3006131567 hasRelatedWork W2969868335 @default.
- W3006131567 hasRelatedWork W2973727699 @default.
- W3006131567 hasRelatedWork W2991040477 @default.
- W3006131567 hasRelatedWork W3017114573 @default.
- W3006131567 hasRelatedWork W3037774634 @default.
- W3006131567 hasRelatedWork W3095639648 @default.
- W3006131567 hasRelatedWork W3101434632 @default.
- W3006131567 hasRelatedWork W3129831491 @default.
- W3006131567 hasRelatedWork W3177073825 @default.
- W3006131567 hasRelatedWork W3206328251 @default.
- W3006131567 hasRelatedWork W3210803593 @default.
- W3006131567 isParatext "false" @default.
- W3006131567 isRetracted "false" @default.
- W3006131567 magId "3006131567" @default.
- W3006131567 workType "article" @default.