In this Sunday’s Sunday Reading Notes (which actually is posted on Monday), I am venturing into the applied machine learning world and discussing two blog posts about the application of embeddings at AirBnB and Etsy.
It is a fascinating read for me because I am deeply attracted by the versatility of the algorithm. Neither blog posts focused on the details on the training of embeddings. Instead they are written to motivate readers to understand the intuition behind the algorithms and how to adapt the loss function to each websites specific needs.
Semantic embeddings are invented in Natural Language Process fields to learn continuous low-dimensional representations of high-dimensional sparse-vectors. Needless to say, working in a lower dimension makes computations faster. These neural network based algorithms are trained with large text data sets and are based on the intuitions that words that appear together a lot are related. AirBnB and Etsy used embedding to model user behavior. Using the analogy at Nishan provided in the Etsy article, each user session is a sentence and the sequence of actions by the user are the words.
The AirBnB article provided more details on the negative sampling, the algorithm used to train embeddings.
This diagram from the AirBnB post illustrate the training process of the embeddings. The ‘booked listing’ (in purple) is what the AirBnB team added to the algorithm. The booked listing serves as a global context token, aiding the prediction of eventual booking in the embedding training process. In addition to training the central listing with context listings and the booked listing, because the travelers often only search with-in the same market, AirBnB engineers also added a randomly selected listing from the same market of the central listing. As a result listings within the same market should be closer in the embedding space. Indeed encoding geographical information in the embedding achieved as evident from the plot below (from the AirBnB post).
What the Etsy engineer’s do differently is that they added some ‘implicit contextual information’ into the lose function:
Training a Skip-gram model on only randomly selected negatives, however, ignores implicit contextual signals that we have found to be indicative of user preference in other contexts. For example, if a user clicks on the second item for a search query, the user most likely saw, but did not like, the first item that showed up in the search results. We extend the Skip-gram loss function by appending these implicit negative signals to the Skip-gram loss directly.
This is quite interesting because they considered the ordering of items (words/clicks) and thus both concurrence and ordering are considered in the loss function.
Both articles give concrete examples of how embeddings improved its performance on search ranking and similar items recommendations. I highly encouraged interested readers to check them out because you do not want to spoil your fun with my post.
What I find interesting in this paper is how engineers from both firms adopted this model from natural language processing and used in to serve its own customers. I have been so narrow minded about using NLP algorithms for NLP problems only. More than that, due to their respective needs they modified the loss function. I really enjoyed thinking through why they needed this adaptations.