Statistics and Data Science Seminar: "Transformer: A Dissection from an Amateur Applied Mathematician"

Speaker: Shuhao Cao, Washington University in St. Louis

Abstract: The Transformer is the current state-of-the-art and de facto model for natural language processing, having all but cracked the holy grail of machine translation. As the heart and soul of the Transformer, the Attention mechanism remains somewhat mysterious to the statistics and mathematics community. Being able to map a variable-length sequence with variable-length features to another variable-length sequence, Attention resembles an operator between two Hilbert spaces discretized on meshes of varying sizes. In this talk, we will share some remarkable yet simple results on the approximation properties of the Attention operator.
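
For readers less familiar with the mechanism, here is a minimal sketch of the scaled dot-product Attention of Vaswani et al. (2017); the symbols n, m (sequence lengths) and d_k, d_v (feature dimensions) are standard notation introduced here for illustration, not taken from the talk itself.

```latex
% Scaled dot-product Attention (Vaswani et al., 2017).
% Q \in \mathbb{R}^{n \times d_k} holds n query vectors;
% K \in \mathbb{R}^{m \times d_k} and V \in \mathbb{R}^{m \times d_v}
% hold m key and value vectors. The learned parameters fix only the
% feature dimensions d_k and d_v, never the lengths n and m, which is
% why Attention acts on sequences of arbitrary length.
\[
  \operatorname{Attention}(Q, K, V)
    \;=\; \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
    \;\in\; \mathbb{R}^{n \times d_v},
\]
% where the softmax is applied row-wise, so each output row is a convex
% combination of the rows of V.
```

Viewing n and m as the resolutions of two discretizations, this length-independence is precisely what allows Attention to be read as a discretized operator between function spaces, as the abstract suggests.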

Host: Likai Chen