Transformers—data models based on neural networks—will radically change how machines interact with us.

Artificial intelligence has promised much, but one thing has kept it from being used successfully by billions of people: the frustrating struggle for humans and machines to understand one another in natural language.

This is now changing, thanks to the arrival of large language models powered by transformer architectures, one of the most important AI breakthroughs in the past 20 years.

Transformers are neural networks designed to model sequential data and generate a prediction of what should come next in a series. Core to their success is the idea of “attention,” which lets the transformer weigh every part of an input and “attend” to the most salient features rather than treating everything as equally important.
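
To make the idea of attention concrete, here is a minimal sketch of scaled dot-product attention, one common formulation of the mechanism. It is an illustration under stated assumptions, not the implementation of any particular model: the function names, the toy input sizes, and the reuse of the raw token vectors as queries, keys, and values are all choices made for this example. Real transformers apply learned projections first and run many such attention "heads" in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    # Similarity of every query position to every key position.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Each row becomes a set of non-negative weights that sum to 1:
    # how strongly one position "attends" to each of the others.
    weights = softmax(scores, axis=-1)
    # The output for each position is a weighted average of the values.
    return weights @ values, weights

# Toy input (an assumption for illustration): 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Self-attention: the same sequence supplies queries, keys, and values.
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` shows how one token distributes its attention across the whole sequence. Salient positions receive large weights and the rest receive small ones, which is the sense in which the model focuses on part of its input.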

These new models have delivered significant improvements to natural-language applications such as translation, summarization, information retrieval, and, most important, text generation. In the past, each of these tasks required a bespoke architecture. Now transformers are delivering state-of-the-art results across the board.

Although Google pioneered the transformer architecture, OpenAI became the first to demonstrate its power at scale, in 2020, with the launch of GPT-3 (Generative Pre-Trained Transformer 3). At the time, it was the largest language model ever created.

GPT-3’s ability to produce humanlike text generated a wave of excitement. It was only the start. Large language models are now improving at a truly impressive rate.

“Parameter count” is generally accepted as a rough proxy for a model’s capabilities. So far, we’ve seen models perform better on a wide range of tasks as the parameter count scales up. Models have been growing by almost an order of magnitude every year for the past five years, so the rapid gains are no surprise. However, these very large models are expensive to serve in production.
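
A rough back-of-the-envelope calculation shows what that growth implies and why serving is costly. The sketch below makes several assumptions that go beyond the text: growth of exactly 10x per year, GPT-3's published size of 175 billion parameters, and 2 bytes per parameter (16-bit weights). It counts only the memory needed to hold the weights and ignores activations, caching, and replication. The function names are hypothetical.

```python
def growth_factor(rate_per_year, years):
    # Compound growth: the multiplier after `years` years at `rate_per_year`.
    return rate_per_year ** years

def weight_memory_gb(num_params, bytes_per_param=2):
    # Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes).
    return num_params * bytes_per_param / 1e9

# Ten-fold growth every year for five years (assumed round numbers).
five_year_growth = growth_factor(10, 5)

# A 175-billion-parameter model stored at 2 bytes per parameter (assumed precision).
gpt3_weights_gb = weight_memory_gb(175e9)

print(f"Growth over five years: {five_year_growth:,}x")
print(f"Weights alone: {gpt3_weights_gb:.0f} GB")
```

Under these assumptions, five years of growth compounds to a 100,000-fold increase, and the weights of a 175-billion-parameter model occupy about 350 GB, more than a single typical accelerator can hold, so the model has to be split across several devices.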