Łukasz Kaiser – Senior Research Scientist at Google and mentor at Pi School, School of Artificial Intelligence – is speaking four years after the seminal paper "Attention Is All You Need".
In this talk, Łukasz introduces a new efficient variant of the Transformer. He walks through the main methods needed for efficiency and shows how they address the two problems that limited earlier Transformers: high memory use and poor performance on long sequences. He finishes with the new applications this opens up.
Łukasz Kaiser co-invented Transformers. They have been adopted across a variety of fields and yield great results on many NLP tasks, for instance inside BERT, GPT-3, and many other models. However, they can be inefficient and hard to apply. Transformers are thus a "yes, but" tool: they have had great success in language and vision, and they are also well suited to encoding (unordered) set-shaped input, but in each self-attention layer every position attends to every other position, resulting in O(n²) complexity in the sequence length. Łukasz has been working on very strong ideas to solve this issue.
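The quadratic cost mentioned above is easy to see in a minimal sketch of single-head self-attention (a NumPy illustration, not the implementation from the talk; the random matrices stand in for learned projection weights):

```python
import numpy as np

def self_attention(x):
    """Naive single-head self-attention over a sequence of n vectors.

    The score matrix q @ k.T has shape (n, n): every position attends to
    every other position, so time and memory grow as O(n^2) in sequence
    length. Illustrative only; weights are random, not trained.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    # Hypothetical random projections standing in for learned weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)        # shape (n, n) -- the O(n^2) cost
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over rows
    return weights @ v                   # shape (n, d)

x = np.random.default_rng(1).standard_normal((8, 4))
out = self_attention(x)
print(out.shape)  # (8, 4)
```

Doubling the sequence length n quadruples the size of the score matrix, which is exactly the bottleneck that efficient Transformer variants aim to remove.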