NLP-KG

Publication:

Transformers are Universal Predictors

Sourya Basu, Moulik Choraria, L. Varshney • arXiv • 15 July 2023

TLDR: A theoretical analysis of the Transformer architecture for language modeling establishes its limits, shows that it has a universal prediction property in an information-theoretic sense, and validates these findings with experiments on both synthetic and real datasets.

Citations: 2
Abstract: We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.
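
For context, a minimal formal sketch of what "universal prediction in an information-theoretic sense" conventionally means (this is the standard log-loss definition, not quoted from the paper): a sequential predictor q is universal with respect to a class of sources \mathcal{P} if its per-symbol excess log-loss (redundancy) over the best source in the class vanishes,

\lim_{n \to \infty} \frac{1}{n} \left[ \sum_{t=1}^{n} \log \frac{1}{q(x_t \mid x^{t-1})} \;-\; \inf_{p \in \mathcal{P}} \sum_{t=1}^{n} \log \frac{1}{p(x_t \mid x^{t-1})} \right] = 0.

Read this way, the paper's claim is that the Transformer's next-token distribution can play the role of such a q; the precise source class and regret rates are assumptions to be checked against the paper itself.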
