Publication:
Transformers are Universal Predictors
Sourya Basu, Moulik Choraria, L. Varshney • arXiv • 15 July 2023
TLDR: A theoretical analysis identifies limits of the Transformer architecture for language modeling, shows that it has a universal prediction property in an information-theoretic sense, and is validated with experiments on both synthetic and real datasets.
Citations: 2
Abstract: We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.
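For readers unfamiliar with the term, universal prediction in the information-theoretic sense (in the tradition of Merhav and Feder) typically means that a predictor's per-symbol log-loss regret against the best predictor in a reference class vanishes as the sequence length grows. A minimal statement of that criterion is sketched below; the predictive distribution q (standing in for the Transformer's next-token distribution) and the reference class \mathcal{P} are assumed notation, since the abstract does not state the exact comparison class used in the paper:

\frac{1}{n} \sum_{t=1}^{n} \log \frac{1}{q(x_t \mid x^{t-1})} \;-\; \min_{p \in \mathcal{P}} \frac{1}{n} \sum_{t=1}^{n} \log \frac{1}{p(x_t \mid x^{t-1})} \;\to\; 0 \quad \text{as } n \to \infty,

where x^{t-1} = (x_1, \dots, x_{t-1}) denotes the sequence prefix and the first term is the predictor's average log-loss on the sequence.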