Are embeddings all you need?

Arguably, the single most important and ubiquitous concept in modern machine-learning (ML) algorithms is the embedding.

Question: What exactly is an embedding?
Answer: An embedding is an informative (and usually abstract) representation of a piece of data.

Not helpful? Let’s look inside of our own brains to make a clarifying analogy. What are the first thoughts that pop into your brain when you hear the word “bank”?

This big blue “bag of information” is my informative and abstract representation of the word “bank”. It is my embedding for the word “bank”.

As you can see, when I think about the word “bank”, my brain automatically generates a huge number of related ideas and thoughts that I associate with it. These thoughts are based mostly on my personal experience and whimsy, but they play a fundamental role in defining what the word “bank” means to me personally. If you engage me in a conversation about banks, these are the kinds of thoughts that will probably find their way into the conversation given enough time. My embedding for the word “bank” is a bit like the “raw material” for anything else you might expect me to do with the word “bank”.

Most modern ML models go through a very similar process when training for a particular task. For example, in natural language processing (NLP) the basic bits of data are words (the ML version of a “word” is called a “token”, but never mind all that). So any word can be processed through an NLP model to obtain its embedding. Embeddings generated by ML models tend to be a bit more abstract and a bit more restricted than the embeddings generated by a typical human being. Usually they look something like what’s shown in the cartoon below.

Each column of the embedding represents one dimension of informative description (the base version of a modern pre-trained model like BERT uses 768 such dimensions). The degree of similarity between two words can be defined quantitatively as the “distance” between the vectors corresponding to the two words in the “space” of embeddings.
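To make the idea of “distance” concrete, here is a minimal sketch in Python. The four-dimensional vectors are toy values made up for illustration (they do not come from any real model), and the helper names `cosine_similarity`, `euclidean_distance`, and `most_similar` are hypothetical. Cosine similarity and Euclidean distance are two common choices for comparing embeddings, not the only ones.

```python
import math

# Toy 4-dimensional embeddings, invented for illustration only.
# Embeddings from a real model have hundreds of abstract dimensions.
toy_embeddings = {
    "bank": [0.9, 0.1, 0.8, 0.0],
    "bill": [0.8, 0.3, 0.7, 0.1],
    "jack": [0.1, 0.9, 0.0, 0.7],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(word, embeddings):
    """Return the other word whose embedding has the highest cosine similarity."""
    candidates = [w for w in embeddings if w != word]
    return max(
        candidates,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


if __name__ == "__main__":
    for other in ("bill", "jack"):
        sim = cosine_similarity(toy_embeddings["bank"], toy_embeddings[other])
        dist = euclidean_distance(toy_embeddings["bank"], toy_embeddings[other])
        print(f"bank vs {other}: cosine={sim:.3f}, distance={dist:.3f}")
    print("closest to bank:", most_similar("bank", toy_embeddings))
```

With these made-up numbers, “bank” ends up closer to “bill” than to “jack”, but that is a property of the toy values, not a claim about what a trained model would produce.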

In reality, the numerical values of the embeddings generated by ML models do not correspond to familiar characteristics as in the example above; they are abstract representations containing the same information. One nice thing that embeddings do out of the box is define a notion of similarity. For well-trained word embeddings, two words that are synonyms or have similar meanings will also have similar embeddings. For example, by looking at the embeddings of the three words above (“bank”, “bill”, and “jack”), you can get a sense of which words are similar. This kind of representation is necessary for dealing with the inherent ambiguities of human language. For example, the three sentences below illustrate the ambiguities in distinguishing people from places.

“jack stole hundreds of bills from the bank”
“bill took money from the bank to buy a jack to fix his car”
“the bank robber lives near the bank of the river”

Embeddings are more or less a necessary by-product of any neural-network-based ML model; however, most of the recent advances in ML have been the result of a shift towards algorithms that specifically focus on maximizing the quality and complexity of these embeddings. These advances include not only sizable leaps in performance, but also greatly increased flexibility in performing complex multi-step tasks, because high-quality embeddings are typically useful for a wide variety of downstream tasks. Indeed, there are many research groups currently interested in testing the limits of the kind of “multi-task” learning shown in the graphic below.

In “multi-task learning”, several ML models with different objectives are trained using a common set of embeddings.
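As a rough sketch of what “a common set of embeddings” means in practice, the Python snippet below feeds one shared embedding table into two separate task “heads”. Everything here is an assumption made for illustration: the three-dimensional vectors, the hand-picked weights, and the names `linear_head`, `place_score`, `person_score`, and `run_all_tasks` are all hypothetical. It only shows the forward pass; in actual multi-task training, the losses from every task would also be used to update the shared embeddings.

```python
# Shared embedding table (toy values, invented for illustration only).
shared_embeddings = {
    "bank": [0.9, 0.1, 0.8],
    "jack": [0.1, 0.9, 0.0],
}


def linear_head(weights, bias):
    """Build a task-specific head: one linear layer over a shared embedding."""
    def head(embedding):
        return sum(w * x for w, x in zip(weights, embedding)) + bias
    return head


# Two hypothetical tasks. Each has its own parameters (hand-picked here,
# learned in a real system), but both read the same input embeddings.
place_score = linear_head(weights=[1.0, -1.0, 0.5], bias=0.0)
person_score = linear_head(weights=[-1.0, 1.0, 0.0], bias=0.0)


def run_all_tasks(word):
    """Look up the shared embedding once and pass it to every task head."""
    embedding = shared_embeddings[word]
    return {
        "place": place_score(embedding),
        "person": person_score(embedding),
    }


if __name__ == "__main__":
    for word in shared_embeddings:
        print(word, run_all_tasks(word))
```

The design point is that the embedding lookup happens once and is reused by every head, which is why improving the embeddings tends to help all of the tasks that depend on them.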

So… are embeddings all you need? Hopefully it was obvious that the title was a bit facetious. But I think it’s safe to say that at least for human beings, if your embeddings are strong, you’ll probably do okay in life.

Data Scientist | mpark@appliedinfo.com

Michael Park is a researcher and developer in the field of applied machine learning with a background in theoretical particle physics. When he's not writing code, you'll probably find him pondering the origins of the universe while jamming out to some true-school underground Hip Hop.