You know the expression, "When you have a hammer, everything looks like a nail"? Well, in machine learning, it seems like we really have discovered a magical hammer for which everything is, in fact, a nail, and they're called Transformers. Transformers are models that can be designed to translate text, write poems and op-eds, and even generate computer code. In fact, lots of the amazing research I write about is built on Transformers, like AlphaFold 2, the model that predicts the structures of proteins from their genetic sequences, as well as powerful natural language processing (NLP) models like GPT-3, BERT, T5, Switch, Meena, and others. You might say they're more than meets the… ugh, forget it.

If you want to stay hip in machine learning, and especially in NLP, you have to know at least a bit about Transformers. So in this post, we'll talk about what they are, how they work, and why they've been so impactful.

A Transformer is a type of neural network architecture. To recap, neural nets are a very effective type of model for analyzing complex data types like images, videos, audio, and text. But there are different types of neural networks optimized for different types of data. For example, for analyzing images, we'll typically use convolutional neural networks, or "CNNs." Vaguely, they mimic the way the human brain processes visual information.

[Image: A typical Convolutional Neural Network. Credit: Renanar2 / Wikicommons]

And since around 2012, we've been quite successful at solving vision problems with CNNs, like identifying objects in photos, recognizing faces, and reading handwritten digits. But for a long time, nothing comparably good existed for language tasks (translation, text summarization, text generation, named entity recognition, etc.). That was unfortunate, because language is the main way we humans communicate.

Before Transformers were introduced in 2017, the way we used deep learning to understand text was with a type of model called a Recurrent Neural Network, or RNN, that looked something like this:

[Image: A typical Recurrent Neural Network (RNN). Credit: Wikimedia]

Let's say you wanted to translate a sentence from English to French. An RNN would take an English sentence as input, process the words one at a time, and then, sequentially, spit out their French counterparts. The key word here is "sequential." In language, the order of words matters, and you can't just shuffle them around. A sentence like "the dog bit the man" means something very different from "the man bit the dog." So any model that's going to understand language must capture word order, and recurrent neural networks did this by processing one word at a time, in a sequence.

But RNNs had issues. First, they struggled to handle long sequences of text, like long paragraphs or essays.
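To make "sequential" concrete, here's a minimal sketch of the loop at the heart of an RNN. Everything in it (the toy dimensions, the weight matrices `W_xh` and `W_hh`, the `rnn_encode` helper) is hypothetical and for illustration only; a real model learns these weights in a framework like PyTorch or TensorFlow.

```python
import numpy as np

# Hypothetical toy sizes, just for illustration.
HIDDEN_SIZE = 4   # size of the RNN's "memory" (the hidden state)
INPUT_SIZE = 10   # size of each word's vector representation

# In a real model these weights are learned; here they're just random.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(HIDDEN_SIZE, INPUT_SIZE))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(HIDDEN_SIZE, HIDDEN_SIZE))  # hidden -> hidden

def rnn_encode(word_vectors):
    """Read a sentence one word at a time, carrying a hidden state forward."""
    h = np.zeros(HIDDEN_SIZE)            # the memory starts out empty
    for x in word_vectors:               # strictly one word at a time...
        h = np.tanh(W_xh @ x + W_hh @ h) # ...each step depends on the last
    return h  # a single fixed-size summary of the whole (ordered) sentence

# A fake three-word "sentence": three random word vectors.
sentence = [rng.normal(size=INPUT_SIZE) for _ in range(3)]
print(rnn_encode(sentence))
```

Notice that each step needs the previous step's hidden state `h`, so the loop can't be parallelized across the sentence, and an arbitrarily long paragraph gets squeezed into that one fixed-size vector. That's a big part of why RNNs struggled with long texts, and it's the kind of bottleneck Transformers were designed to remove.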