Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization (Q6761164)
From MaRDI portal
scientific article from arXiv
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization | scientific article from arXiv | |
Statements
math.AP
2025