Generalizing Adam to Manifolds for Efficiently Training Transformers
Publication:6438114
arXiv: 2305.16901; MaRDI QID: Q6438114
Publication date: 26 May 2023
Artificial neural networks and deep learning (68T07); Nonconvex programming, global optimization (90C26); Differential geometry of homogeneous manifolds (53C30); Parallel algorithms in computer science (68W10); Applications of differential geometry to data and computer science (53Z50)