Matches in SemOpenAlex for { <https://semopenalex.org/work/W2948798935> ?p ?o ?g. }
- W2948798935 abstract "Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English-German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big." @default.
- W2948798935 created "2019-06-14" @default.
- W2948798935 creator A5010462322 @default.
- W2948798935 creator A5023494462 @default.
- W2948798935 creator A5025187681 @default.
- W2948798935 creator A5025832925 @default.
- W2948798935 creator A5032686951 @default.
- W2948798935 creator A5062470278 @default.
- W2948798935 creator A5067078932 @default.
- W2948798935 date "2019-06-04" @default.
- W2948798935 modified "2023-09-30" @default.
- W2948798935 title "Learning Deep Transformer Models for Machine Translation" @default.
- W2948798935 cites W1522301498 @default.
- W2948798935 cites W1543750907 @default.
- W2948798935 cites W1597944220 @default.
- W2948798935 cites W1815076433 @default.
- W2948798935 cites W1902237438 @default.
- W2948798935 cites W2113104171 @default.
- W2948798935 cites W2128892113 @default.
- W2948798935 cites W2130942839 @default.
- W2948798935 cites W2194775991 @default.
- W2948798935 cites W2302255633 @default.
- W2948798935 cites W2525778437 @default.
- W2948798935 cites W2798761464 @default.
- W2948798935 cites W2817535134 @default.
- W2948798935 cites W2886490473 @default.
- W2948798935 cites W2888520903 @default.
- W2948798935 cites W2890964657 @default.
- W2948798935 cites W2896060389 @default.
- W2948798935 cites W2902081112 @default.
- W2948798935 cites W2952564229 @default.
- W2948798935 cites W2962739339 @default.
- W2948798935 cites W2962784628 @default.
- W2948798935 cites W2962931466 @default.
- W2948798935 cites W2963212250 @default.
- W2948798935 cites W2963216553 @default.
- W2948798935 cites W2963302407 @default.
- W2948798935 cites W2963341956 @default.
- W2948798935 cites W2963403868 @default.
- W2948798935 cites W2963418779 @default.
- W2948798935 cites W2963599677 @default.
- W2948798935 cites W2963755523 @default.
- W2948798935 cites W2963807318 @default.
- W2948798935 cites W2963925437 @default.
- W2948798935 cites W2963991316 @default.
- W2948798935 cites W2964088127 @default.
- W2948798935 cites W2964308564 @default.
- W2948798935 doi "https://doi.org/10.48550/arxiv.1906.01787" @default.
- W2948798935 hasPublicationYear "2019" @default.
- W2948798935 type Work @default.
- W2948798935 sameAs 2948798935 @default.
- W2948798935 citedByCount "45" @default.
- W2948798935 countsByYear W29487989352019 @default.
- W2948798935 countsByYear W29487989352020 @default.
- W2948798935 countsByYear W29487989352021 @default.
- W2948798935 countsByYear W29487989352022 @default.
- W2948798935 crossrefType "posted-content" @default.
- W2948798935 hasAuthorship W2948798935A5010462322 @default.
- W2948798935 hasAuthorship W2948798935A5023494462 @default.
- W2948798935 hasAuthorship W2948798935A5025187681 @default.
- W2948798935 hasAuthorship W2948798935A5025832925 @default.
- W2948798935 hasAuthorship W2948798935A5032686951 @default.
- W2948798935 hasAuthorship W2948798935A5062470278 @default.
- W2948798935 hasAuthorship W2948798935A5067078932 @default.
- W2948798935 hasBestOaLocation W29487989351 @default.
- W2948798935 hasConcept C108583219 @default.
- W2948798935 hasConcept C111219384 @default.
- W2948798935 hasConcept C111919701 @default.
- W2948798935 hasConcept C118505674 @default.
- W2948798935 hasConcept C119599485 @default.
- W2948798935 hasConcept C127413603 @default.
- W2948798935 hasConcept C136886441 @default.
- W2948798935 hasConcept C144024400 @default.
- W2948798935 hasConcept C154945302 @default.
- W2948798935 hasConcept C165801399 @default.
- W2948798935 hasConcept C17744445 @default.
- W2948798935 hasConcept C19165224 @default.
- W2948798935 hasConcept C199539241 @default.
- W2948798935 hasConcept C203005215 @default.
- W2948798935 hasConcept C204321447 @default.
- W2948798935 hasConcept C2992317946 @default.
- W2948798935 hasConcept C41008148 @default.
- W2948798935 hasConcept C66322947 @default.
- W2948798935 hasConceptScore W2948798935C108583219 @default.
- W2948798935 hasConceptScore W2948798935C111219384 @default.
- W2948798935 hasConceptScore W2948798935C111919701 @default.
- W2948798935 hasConceptScore W2948798935C118505674 @default.
- W2948798935 hasConceptScore W2948798935C119599485 @default.
- W2948798935 hasConceptScore W2948798935C127413603 @default.
- W2948798935 hasConceptScore W2948798935C136886441 @default.
- W2948798935 hasConceptScore W2948798935C144024400 @default.
- W2948798935 hasConceptScore W2948798935C154945302 @default.
- W2948798935 hasConceptScore W2948798935C165801399 @default.
- W2948798935 hasConceptScore W2948798935C17744445 @default.
- W2948798935 hasConceptScore W2948798935C19165224 @default.
- W2948798935 hasConceptScore W2948798935C199539241 @default.
- W2948798935 hasConceptScore W2948798935C203005215 @default.
- W2948798935 hasConceptScore W2948798935C204321447 @default.
- W2948798935 hasConceptScore W2948798935C2992317946 @default.
- W2948798935 hasConceptScore W2948798935C41008148 @default.
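A listing like the one above could be reproduced programmatically. The sketch below is a minimal, hypothetical example of querying the public SemOpenAlex SPARQL endpoint (assumed here to be `https://semopenalex.org/sparql`) with the same `{ <work> ?p ?o }` pattern; the endpoint URL and the `format=json` parameter are assumptions, not confirmed by this listing.

```python
# Hypothetical sketch: fetch the triples listed above from the SemOpenAlex
# SPARQL endpoint. The endpoint URL and JSON format parameter are assumptions.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL


def build_query(work_id: str) -> str:
    """Build a SPARQL SELECT mirroring the { <work> ?p ?o ?g } pattern above.

    The ?g (graph) position is dropped here, since every triple in the
    listing sits in the default graph (@default).
    """
    return (
        "SELECT ?p ?o WHERE { "
        f"<https://semopenalex.org/work/{work_id}> ?p ?o . "
        "}"
    )


def fetch_triples(work_id: str) -> list:
    """Run the query and return the JSON result bindings (requires network)."""
    params = urllib.parse.urlencode(
        {"query": build_query(work_id), "format": "json"}
    )
    with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Print predicate/object pairs for the work shown in this listing.
    for binding in fetch_triples("W2948798935"):
        print(binding["p"]["value"], binding["o"]["value"])
```

The network call is kept behind `__main__` so the query-building logic can be reused or tested offline.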