Matches in SemOpenAlex for { <https://semopenalex.org/work/W2911109671> ?p ?o ?g. }
- W2911109671 abstract "Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch." @default.
- W2911109671 created "2019-01-25" @default.
- W2911109671 creator A5037034281 @default.
- W2911109671 creator A5041434096 @default.
- W2911109671 creator A5062362044 @default.
- W2911109671 creator A5071983998 @default.
- W2911109671 creator A5088551093 @default.
- W2911109671 creator A5091869105 @default.
- W2911109671 date "2019-01-09" @default.
- W2911109671 modified "2023-10-01" @default.
- W2911109671 title "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" @default.
- W2911109671 cites W1591801644 @default.
- W2911109671 cites W179875071 @default.
- W2911109671 cites W1800356822 @default.
- W2911109671 cites W1810943226 @default.
- W2911109671 cites W1999965501 @default.
- W2911109671 cites W2064675550 @default.
- W2911109671 cites W2118776487 @default.
- W2911109671 cites W2132339004 @default.
- W2911109671 cites W2145543707 @default.
- W2911109671 cites W2170973209 @default.
- W2911109671 cites W2197913429 @default.
- W2911109671 cites W2207587218 @default.
- W2911109671 cites W2259472270 @default.
- W2911109671 cites W2326533993 @default.
- W2911109671 cites W2402302915 @default.
- W2911109671 cites W2473934411 @default.
- W2911109671 cites W2510842514 @default.
- W2911109671 cites W2514713644 @default.
- W2911109671 cites W2519314406 @default.
- W2911109671 cites W2525246036 @default.
- W2911109671 cites W2525332836 @default.
- W2911109671 cites W2540404261 @default.
- W2911109671 cites W2549416390 @default.
- W2911109671 cites W2553303224 @default.
- W2911109671 cites W2567070169 @default.
- W2911109671 cites W2605203995 @default.
- W2911109671 cites W2743945814 @default.
- W2911109671 cites W2767321762 @default.
- W2911109671 cites W2778817245 @default.
- W2911109671 cites W2785366763 @default.
- W2911109671 cites W2787560479 @default.
- W2911109671 cites W2789541106 @default.
- W2911109671 cites W2792376130 @default.
- W2911109671 cites W2792764867 @default.
- W2911109671 cites W2793273050 @default.
- W2911109671 cites W2795285343 @default.
- W2911109671 cites W2798702047 @default.
- W2911109671 cites W2804845563 @default.
- W2911109671 cites W2886490473 @default.
- W2911109671 cites W2891815651 @default.
- W2911109671 cites W2894175714 @default.
- W2911109671 cites W2900096133 @default.
- W2911109671 cites W2950527759 @default.
- W2911109671 cites W2951104886 @default.
- W2911109671 cites W2951210602 @default.
- W2911109671 cites W2951672049 @default.
- W2911109671 cites W2951714314 @default.
- W2911109671 cites W2952276042 @default.
- W2911109671 cites W2952339051 @default.
- W2911109671 cites W2952723479 @default.
- W2911109671 cites W2963266340 @default.
- W2911109671 cites W2963341956 @default.
- W2911109671 cites W2963403868 @default.
- W2911109671 cites W2963573053 @default.
- W2911109671 cites W2964059481 @default.
- W2911109671 cites W2964269252 @default.
- W2911109671 cites W2964308564 @default.
- W2911109671 cites W2964347220 @default.
- W2911109671 cites W36903255 @default.
- W2911109671 cites W1525783482 @default.
- W2911109671 doi "https://doi.org/10.48550/arxiv.1901.02860" @default.
- W2911109671 hasPublicationYear "2019" @default.
- W2911109671 type Work @default.
- W2911109671 sameAs 2911109671 @default.
- W2911109671 citedByCount "299" @default.
- W2911109671 countsByYear W29111096712018 @default.
- W2911109671 countsByYear W29111096712019 @default.
- W2911109671 countsByYear W29111096712020 @default.
- W2911109671 countsByYear W29111096712021 @default.
- W2911109671 countsByYear W29111096712022 @default.
- W2911109671 countsByYear W29111096712023 @default.
- W2911109671 crossrefType "posted-content" @default.
- W2911109671 hasAuthorship W2911109671A5037034281 @default.
- W2911109671 hasAuthorship W2911109671A5041434096 @default.
- W2911109671 hasAuthorship W2911109671A5062362044 @default.
- W2911109671 hasAuthorship W2911109671A5071983998 @default.
- W2911109671 hasAuthorship W2911109671A5088551093 @default.
- W2911109671 hasAuthorship W2911109671A5091869105 @default.
- W2911109671 hasBestOaLocation W29111096711 @default.
- W2911109671 hasConcept C100279451 @default.
- W2911109671 hasConcept C119599485 @default.
- W2911109671 hasConcept C127413603 @default.
- W2911109671 hasConcept C137293760 @default.
- W2911109671 hasConcept C154945302 @default.
- W2911109671 hasConcept C165801399 @default.
- W2911109671 hasConcept C19768560 @default.
- W2911109671 hasConcept C204321447 @default.
- W2911109671 hasConcept C206134035 @default.
- W2911109671 hasConcept C41008148 @default.