Decoding Monolith: The Backbone of TikTok’s Addictive Algorithm

By: Harini Anand

One minute, the U.S. said, "It's over, we're banning you!" and the next, "Just kidding, come back!"

Talk about mixed signals.

It's like when you swear off junk food, but then someone waves a slice of pizza under your nose; resistance is futile.

Meanwhile, in India, TikTok was most definitely banned. My only exposure to TikTok was when my cousin from California visited and roped my grandparents into making videos. It was like watching a cat try to swim; adorable but slightly concerning.

Back in the U.S., TikTok creators were in a frenzy. With the looming threat of a ban, they scrambled to back up their content, ranting on other platforms about their digital dilemma. It was like watching someone try to save their favorite TV shows before the streaming service removes them (looking at you, Atypical): panic, urgency, and a lot of late-night downloading. For many, TikTok wasn't just a fun app; it was their main source of income and livelihood. The idea of losing it overnight was as terrifying as forgetting your phone at home. (Been there, done that. More than a few times.)

This got me thinking: what makes TikTok so irresistible? Sure, the dance challenges and lip-syncs are entertaining, but there's more to it. The secret sauce? The Monolith algorithm, and the architecture built to put it into practice. It's like the app knows you better than you know yourself, serving content that's eerily on point.

As always, we begin by understanding why this algorithm was needed and why it exists in the first place. After that, I'll cover the following:

  • How it works

  • Extrapolated Concepts

  • Key Features

  • Architecture Overview

  • Conclusion

Introduction

Let’s begin by understanding recommender systems. We’ve all used them in some form: shopping on Amazon, streaming on Netflix, and so on. They are like personal assistants who guide us through an ocean of content. There are three basic types behind the mechanics of these systems:

  1. Collaborative Filtering: Imagine you and a friend creating a Spotify Blend playlist. If you both vibe to Charli XCX (Speed Drive, ftw), the system will suggest more of Charli's tracks, based on the fact that you both share a love for her music.

  2. Content-Based Filtering: This is like having Spotify recommend more songs similar to Kendrick Lamar if you've been listening to his albums nonstop at the gym while slamming the battle ropes. It’s all about finding music that matches your personal taste based on what you’ve already enjoyed.

  3. Hybrid Systems: Now, imagine a Spotify Blend playlist that combines the best of both worlds—tracks that Charli XCX and Kendrick Lamar fans love, plus some fresh beats that match your unique style. It’s the perfect mix of shared favorites and personalized recommendations. Also, congrats to the Grammy ‘25 winners!
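To make the collaborative-filtering idea concrete, here's a tiny, hypothetical sketch: a toy play-count matrix and a cosine-similarity helper. The users, songs, and numbers are all made up for illustration.

```python
import math

# Toy user–item "play count" matrix: rows are users, columns are songs.
# Names and counts here are purely illustrative.
ratings = {
    "you":     {"Speed Drive": 5, "360": 4, "HUMBLE.": 1},
    "friend":  {"Speed Drive": 4, "360": 5, "HUMBLE.": 0},
    "gym_pal": {"Speed Drive": 0, "360": 1, "HUMBLE.": 5},
}

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    songs = set(u) | set(v)
    dot = sum(u.get(s, 0) * v.get(s, 0) for s in songs)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# Collaborative filtering in one line: find the user most similar to "you",
# then recommend what they liked that you haven't played much.
sims = {name: cosine(ratings["you"], r)
        for name, r in ratings.items() if name != "you"}
best_match = max(sims, key=sims.get)
print(best_match)  # "friend" — shares your Charli taste more than "gym_pal"
```

Content-based filtering would do the same similarity trick, but between item feature vectors instead of user rating vectors; a hybrid system blends both scores.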

Now, one major challenge in these recommendation systems is the non-stationarity of users’ tastes and behaviors. What you liked last week might not be what you're into today, and as your preferences shift, the system's data distribution becomes outdated, leading to less accurate recommendations. Take platforms like Netflix, where millions of movies and shows are constantly being added. Encoding every single piece of content into a format the system can understand takes a huge amount of memory, which can be a logistical nightmare. It’s like a Swiftie trying to recall every hidden Easter egg: accounting for every little detail across hundreds of songs (or items) would require a massive memory upgrade.

The Need for Monolith in Handling Complex Systems

These are the challenges that TikTok’s Monolith system was designed to address: handling sheer scale and dynamic shifts in user preferences efficiently, while keeping recommendations fresh, fast, and relevant.

Despite the widespread use of deep learning frameworks like TensorFlow and PyTorch, these general-purpose tools often miss the mark when it comes to handling the unique demands of recommendation systems. These frameworks are built for batch processing, where training and serving are done in separate stages, making it difficult for models to incorporate real-time user feedback. This becomes a problem when dealing with dynamic, sparse features—like the rapidly changing tastes of users, a hallmark of platforms like TikTok. When you’re trying to tweak systems based on static parameters and dense computations, you’ll end up with recommendations that quickly fall out of sync with what the user wants.

TikTok’s Monolith algorithm sidesteps these issues with some clever optimizations. It uses techniques like expirable embeddings and frequency filtering to reduce memory usage, which is key when you're dealing with millions of users and items. Instead of waiting for periodic updates, the Monolith system allows for real-time learning, adapting as users interact with content. It’s like the system is constantly taking notes on what you enjoy, then adjusting itself accordingly, without ever missing a beat. This approach might mean trading off some system reliability, but in exchange, TikTok gets a more flexible, real-time recommendation engine that keeps up with shifting user preferences and ensures the content stays relevant and fresh.
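Here's a minimal sketch of those two memory tricks, frequency filtering and expirable embeddings, as I understand them from the paper. The class name, thresholds, and eviction policy below are my own illustrative simplification, not Monolith's actual code.

```python
import random
import time
from collections import defaultdict

class FilteredEmbeddingTable:
    """Toy sketch of two memory savers: frequency filtering (only embed IDs
    seen at least min_count times) and expirable embeddings (drop IDs
    untouched for longer than ttl seconds). Values are illustrative."""

    def __init__(self, dim=4, min_count=3, ttl=3600.0):
        self.dim, self.min_count, self.ttl = dim, min_count, ttl
        self.counts = defaultdict(int)   # occurrences per feature ID
        self.table = {}                  # feature ID -> (embedding, last_seen)

    def lookup(self, fid, now=None):
        now = time.time() if now is None else now
        self.counts[fid] += 1
        if fid in self.table:
            vec, _ = self.table[fid]
            self.table[fid] = (vec, now)          # refresh last-seen time
            return vec
        if self.counts[fid] >= self.min_count:    # frequency filter passed
            vec = [random.gauss(0, 0.1) for _ in range(self.dim)]
            self.table[fid] = (vec, now)
            return vec
        return [0.0] * self.dim                   # rare ID: shared default

    def evict_expired(self, now=None):
        now = time.time() if now is None else now
        stale = [f for f, (_, seen) in self.table.items() if now - seen > self.ttl]
        for f in stale:
            del self.table[f]
        return len(stale)

table = FilteredEmbeddingTable(min_count=2, ttl=10.0)
table.lookup("user42", now=0.0)           # first sighting: shared default
vec = table.lookup("user42", now=1.0)     # passes the filter, gets a real row
evicted = table.evict_expired(now=100.0)  # long idle: the row is dropped
print(evicted)  # 1
```

The point of both tricks is the same: don't spend embedding memory on IDs that are too rare, or too stale, to matter.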

Extrapolated Concepts

In the world of recommendation systems, especially at the scale of TikTok, applying deep learning models brings some unique hurdles. These problems arise because the data, which comes from real-world user behavior, doesn’t behave like the data you’d find in tasks like language modeling or computer vision. There are two major differences:

  1. Sparse and Categorical Features: Unlike the dense, continuous data used in language models, user data in recommendation systems is often sparse and categorical. For example, a user’s interactions with content (e.g., likes, shares) can vary greatly, and they may interact with thousands of different items in ways that aren't always easily quantified or predicted.

  2. Non-Stationary Data (Concept Drift): Over time, user preferences change—this is known as concept drift. It's like when you’re obsessing over one Taylor Swift album, only to find yourself gravitating toward another as her music evolves. The underlying distribution of data shifts as user interests change, making it hard for a model to predict future behavior based on past patterns. This constant shift in preferences requires the model to continuously adapt, and if it doesn’t, it starts serving stale, irrelevant recommendations.
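A quick illustration of just how sparse these categorical features are. The catalog size and item IDs below are made up, but the shape of the problem is real: one user touches a handful of items out of millions.

```python
# A user's interaction "vector" over a catalog of, say, 1,000,000 items is
# almost entirely zeros — dense storage wastes memory on what isn't there.
CATALOG_SIZE = 1_000_000
liked_items = {42, 31337, 271828}  # hypothetical item IDs this user liked

# Dense one-hot-style encoding: one slot per item in the catalog.
dense = [1 if i in liked_items else 0 for i in range(CATALOG_SIZE)]

# Sparse encoding: store only the IDs that are actually non-zero.
sparse = sorted(liked_items)

print(len(dense), len(sparse))  # 1000000 vs 3
```

This is why recommendation models store per-ID embeddings in a lookup table rather than materializing giant dense vectors.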

In practice, mapping these dynamic, sparse features into a high-dimensional embedding space, as you would in deep learning models, brings a host of challenges. For one, the sheer number of users and items in a platform like TikTok or Netflix is far greater than the number of word pieces in a language model, making the embedding table much larger. This massive table won’t even fit into a single host’s memory. As new users and items are constantly being added, the size of the embedding table grows and grows, quickly becoming unwieldy.

In many recommendation systems, a common trick to reduce memory usage is low-collision hashing, which essentially maps IDs to smaller values. It’s a bit like squeezing a giant pizza into a tiny box—you can make it fit, but it's going to get a bit messy. This method assumes that the IDs in the embedding table are distributed evenly and that collisions (when two items map to the same hash value) won’t hurt the model's quality. The reality? It’s rarely that neat. In the wild world of recommendation systems, a small group of users or items usually generates a disproportionate amount of data, causing hash collisions. And as the embedding table grows—thanks to the influx of new users and items—the chances of collision increase. The result? A drop in model performance. It’s like your playlist app continually recommending that same popular song over and over, despite your evolving taste.
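To see why collisions are baked into this scheme, here's a toy sketch. The table size and user IDs are illustrative; real tables are far larger, but the failure mode is the same.

```python
# Low-collision hashing squeezes a huge ID space into a small table by
# taking `id % table_size`. As the ID set grows, distinct IDs start landing
# in the same slot — and then they're forced to share one embedding row.
TABLE_SIZE = 1000  # illustrative; production tables are much bigger

def hashed_slot(feature_id: int) -> int:
    return feature_id % TABLE_SIZE

# Two unrelated users whose IDs happen to differ by a multiple of the size:
user_a = 1234
user_b = 1234 + 7 * TABLE_SIZE
collision = hashed_slot(user_a) == hashed_slot(user_b)
print(collision)  # True — two different users, one shared embedding
```

When the colliding IDs have very different behavior (a power user and a casual one, say), that shared row muddies both of their recommendations.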

That’s where Monolith steps in, designed specifically to tackle these issues. Monolith rethinks the traditional hashing approach by creating a collisionless hash table, which drastically reduces memory issues. I’ve explained what this is in the next section, so fret not! It also incorporates a dynamic feature eviction mechanism—think of it as a clutter-free closet (my mum's dream for me), where the system makes space for more relevant items as user preferences evolve. Thanks to this architecture, Monolith outperforms systems that rely on collision-heavy hashing, all while keeping memory usage in check and delivering a state-of-the-art online serving AUC (Area Under the Curve).

Key Features

  1. Real-Time Adaptation: Monolith's online training system enables it to quickly adjust to changes in user behavior, ensuring that recommendations remain relevant.

  2. Scalability: The system is built to handle TikTok's extensive user base and content library, processing vast amounts of data efficiently.

  3. Personalization: By analyzing user interactions, Monolith delivers content tailored to individual preferences, enhancing user engagement.

Architecture Overview

In short, Monolith doesn’t just handle big data—it thrives on it, offering the agility and scalability needed to keep up with the ever-changing landscape of user behavior. It’s a great solution for today’s production-scale recommendation systems.

Monolith’s architecture is like a well-coordinated band, where every instrument plays its part to create a seamless performance. It follows the Worker-ParameterServer (WorkerPS) model, a common setup seen in frameworks like TensorFlow. Think of the Worker machines as the band members; the ones playing the instruments and making things happen. These Workers carry out computations, much like the musicians executing their parts in a song. The ParameterServers (PS), on the other hand, are like the backstage crew, storing the critical "parameters" (like the sheet music or audio samples) and ensuring they get updated as needed.

In the case of recommendation models, parameters come in two flavors: dense and sparse. Dense parameters are akin to the weights or variables in a deep neural network, and sparse parameters represent the embedding tables. Think of these as a playlist of user preferences or items.

Here’s where Monolith stands out: While TensorFlow uses a traditional way of handling dense parameters via a Variable that gets stored in the ParameterServer, Monolith tweaks things for the sparse parameters (embedding tables). It introduces a collisionless HashTable. Imagine if every song in your playlist could be tagged with a unique, non-overlapping code, ensuring no song ever gets mixed up. That’s what Monolith does. It removes the risk of overlapping entries (or collisions) in the hash table, ensuring smooth operation no matter how large the data set gets.
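Here's a rough, hypothetical sketch of that idea: a sparse parameter server whose embedding table is keyed by the full feature ID, so distinct IDs can never share a row. A Python dict stands in for Monolith's cuckoo hash table, and the class and method names are mine, not the paper's.

```python
import random

class SparseParameterServer:
    """Toy sketch of Monolith's sparse PS: instead of a fixed-size array
    indexed by `hash(id) % size`, embeddings live in a collisionless map
    keyed by the full feature ID. (Monolith uses a cuckoo hash table; a
    Python dict plays that role here for illustration.)"""

    def __init__(self, dim=8):
        self.dim = dim
        self.embeddings = {}  # full feature ID -> embedding vector

    def pull(self, fid):
        """Workers pull parameters; unseen IDs get a fresh embedding."""
        if fid not in self.embeddings:
            self.embeddings[fid] = [random.gauss(0, 0.1) for _ in range(self.dim)]
        return self.embeddings[fid]

    def push(self, fid, grad, lr=0.01):
        """Workers push gradients; the PS applies the update in place."""
        vec = self.embeddings[fid]
        self.embeddings[fid] = [w - lr * g for w, g in zip(vec, grad)]

ps = SparseParameterServer(dim=2)
v1 = ps.pull("user:1234")
v2 = ps.pull("user:8234")  # would collide in a 1000-slot hashed table
print(v1 is not v2)         # True: distinct IDs, distinct rows
```

The Workers do the heavy math; the PS just stores, serves, and updates rows — exactly the band-and-backstage split described above.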

Now, while TensorFlow separates the training phase from inference (or serving), Monolith’s clever solution allows the model to learn in real time. It’s like learning a new song as you perform it on stage, adjusting based on the audience’s response. This real-time learning is possible because Monolith uses elastic online training, where parameters are synchronized seamlessly from the training-PS to the online serving-PS.

Fault tolerance is key to making sure systems stay reliable, and Monolith nails this with its approach. Think of it like having a backup plan in case something goes wrong; so the show doesn’t have to stop. Monolith does this by taking daily snapshots of its parameter servers (PS). This way, if something breaks, the system can recover from the last snapshot. The catch is that taking snapshots too often can use up a lot of memory and disk space, but Monolith has found that a daily snapshot keeps things running smoothly without too much performance loss. It’s a smart balance of staying reliable without overloading the system.
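In code, the snapshot-and-recover idea might look like this toy sketch. The deep copy stands in for writing to durable storage, and the class is my own illustration, not Monolith's implementation.

```python
import copy

class SnapshottingPS:
    """Toy sketch of snapshot-based fault tolerance: periodically copy the
    parameter state so a crashed server can recover from the last copy.
    The in-memory 'disk' here is illustrative only."""

    def __init__(self):
        self.params = {}
        self._snapshot = {}

    def update(self, key, value):
        self.params[key] = value

    def take_snapshot(self):
        # Real systems write this to durable storage; we just deep-copy.
        self._snapshot = copy.deepcopy(self.params)

    def recover(self):
        # After a crash, updates since the last snapshot are lost —
        # the trade-off the daily cadence accepts.
        self.params = copy.deepcopy(self._snapshot)

server = SnapshottingPS()
server.update("w1", 0.5)
server.take_snapshot()    # end-of-day checkpoint
server.update("w1", 0.9)  # post-snapshot update...
server.recover()          # ...lost after a simulated crash
print(server.params["w1"])  # 0.5
```

Snapshotting more often would shrink that loss window, at the cost of more disk and memory traffic — which is exactly the balance described above.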

Monolith’s training is split into two stages: Batch Training and Online Training. In Batch Training, think of it like when Taylor is reviewing her setlist in preparation for a tour. She’ll practice songs, tweak the arrangements, and make sure everything’s in sync with the overall show. The batch system goes through historical data, updates the model once, and stores those changes. It’s great for making major adjustments, like tweaking your setlist based on past performances.

But when Monolith goes into Online Training, it’s like Taylor performing live (hopefully in India someday). Instead of rehearsing in a quiet studio, she’s on stage, constantly interacting with the crowd. If the audience sings along or cheers louder during a particular song, she feels that energy and adapts in real-time, switching up the setlist or giving a little extra flair with the surprise song’s mashup. That’s how Monolith works; constantly learning and adjusting from real-time user feedback. This is different from the typical batch training, where the system just goes through the motions with historical data. Batch training is great for trying out new model architectures, but once deployed, Monolith switches gears.

This live feedback loop keeps the performance fresh, making recommendations more accurate with each interaction.

The secret to this seamless experience is Monolith’s Streaming Engine. It’s like the backup band that’s always in sync with the crowd. It uses Kafka to log user actions (think of it like tracking fan reactions), and Flink helps tie those reactions to features. The engine makes sure that whether Monolith is pulling data for batch training or learning online, the flow is smooth.
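As a toy stand-in for that pipeline, here's a sketch of the log join: one queue of served features and one of user actions (Kafka's role), joined on a shared request ID to yield labeled training examples (Flink's role). The schema, keys, and labels are invented for illustration.

```python
from collections import deque

# Two toy logs: what the model served, and how users reacted.
feature_log = deque([("req1", {"video": "dance_101"}),
                     ("req2", {"video": "cat_swims"})])
action_log = deque([("req2", "like"), ("req1", "skip")])

def join_logs(features, actions):
    """Join each user action back to the features it was served with,
    producing (features, label) training examples."""
    pending = dict(features)          # request ID -> served features
    examples = []
    for req_id, action in actions:
        if req_id in pending:         # match the action to its request
            label = 1 if action == "like" else 0
            examples.append((pending.pop(req_id), label))
    return examples

training_examples = join_logs(feature_log, action_log)
print(training_examples)
# [({'video': 'cat_swims'}, 1), ({'video': 'dance_101'}, 0)]
```

In the real system these logs are unbounded streams rather than finite queues, but the join-on-request-key idea is the same.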

As for syncing the performance? Just like how Taylor’s band keeps pace with her, Monolith’s Parameter Servers (PS) are constantly syncing the updated parameters to make sure every interaction is part of the learning process. This ensures the system is always on point, delivering recommendations that reflect what users want at the moment.
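A minimal sketch of that sync loop, under the assumption (from the paper) that only parameters touched since the last sync need to be pushed. The cadence and data structures here are illustrative, not Monolith's actual mechanism.

```python
# Toy sketch of incremental parameter synchronization: rather than shipping
# the whole model, only embeddings updated since the last sync are pushed
# from the training PS to the serving PS.
training_ps = {"user:1": [0.1, 0.2], "user:2": [0.3, 0.4]}
serving_ps = dict(training_ps)   # both start from the same snapshot
touched = set()                  # IDs updated since the last sync

def train_step(fid, grad, lr=0.1):
    """Apply a gradient on the training side and mark the row dirty."""
    training_ps[fid] = [w - lr * g for w, g in zip(training_ps[fid], grad)]
    touched.add(fid)

def sync():
    """Push only the dirty rows to the serving side."""
    for fid in touched:
        serving_ps[fid] = list(training_ps[fid])
    touched.clear()

train_step("user:1", [1.0, 1.0])
sync()
print(serving_ps["user:1"])  # [0.0, 0.1] — updated without copying user:2
```

Because sparse updates touch only a sliver of the table each interval, this keeps sync traffic small enough to run very frequently.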

Lag in user actions can also be a problem. Let’s tackle it with another Taylor Swift metaphor—this time, think of it as her waiting for a response to a surprise song request. Imagine she performs a deep cut in her setlist, and the fans are slow to react. Some fans may need time to process, think about their favorite lyric, or chat with a friend before they decide to post a video of the performance. This delayed reaction is similar to how a user might take their time deciding to purchase an item they saw days ago.

In Monolith, dealing with this delay is like having a backup plan for those fans who need extra time to cheer. The system can’t simply hold every user interaction in memory because there’s just not enough space for all that excitement. Instead, Monolith uses on-disk key-value storage as a clever workaround. It's like having a vault where all the delayed reactions (user actions) are stored until the moment is right. When a user logs in to interact, Monolith first checks its in-memory cache—like Taylor checking her live audience for any immediate reactions. If the action isn’t found there, it looks into the key-value storage, pulling out the data that might have been delayed, but is still important to the show.
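Here's a toy sketch of that two-tier lookup: check an in-memory cache first, then fall back to on-disk key-value storage for actions that arrived late. The class is my own illustration, and a dict stands in for the "disk."

```python
class TieredActionStore:
    """Toy two-tier store: a hot in-memory cache backed by a stand-in
    for on-disk key-value storage holding delayed user actions."""

    def __init__(self):
        self.memory = {}  # hot, recent user actions
        self.disk = {}    # stand-in for the on-disk key-value store

    def record(self, user, action, hot=True):
        (self.memory if hot else self.disk)[user] = action

    def lookup(self, user):
        if user in self.memory:                  # fast path: cache hit
            return self.memory[user], "memory"
        if user in self.disk:                    # slow path: delayed action
            self.memory[user] = self.disk[user]  # promote into the cache
            return self.memory[user], "disk"
        return None, "miss"

store = TieredActionStore()
store.record("swiftie_01", "posted_video", hot=False)  # delayed reaction
print(store.lookup("swiftie_01"))  # ('posted_video', 'disk')
print(store.lookup("swiftie_01"))  # ('posted_video', 'memory') — now cached
```

Memory stays small, but no delayed reaction is ever lost to the training pipeline.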

Handling Collisions: The Cuckoo Way

Monolith deals with a vast and dynamic set of sparse features, such as user IDs and item IDs, which can change over time. Traditional embedding tables, which map these IDs to dense vectors, often struggle with collisions as new IDs are introduced, leading to degraded model performance. By employing a cuckoo hash table, Monolith ensures that each feature has a unique representation, effectively managing the challenges posed by the dynamic nature of real-world user behavior. Cuckoo Hashing is named after the cuckoo bird, which is known for laying its eggs in another bird’s nest and forcing the original eggs out.

Hashtables always remind me of those LeetCode days when I was just trying to figure out how to not mess up the indexes. Now, it's actually exciting to see them powering something like Monolith! Let’s understand Cuckoo Hashtables {swiftie edish}

Imagine you're curating a playlist, and you want to ensure that each song is placed in its perfect spot without any conflicts. In computer science, organizing data efficiently is akin to this task, and one effective method is called cuckoo hashing.

When storing data (like songs) in a hash table, each piece of data is assigned a specific position based on a hash function. However, sometimes two pieces of data might be assigned the same position, leading to a collision. Traditional methods handle these collisions in various ways, but cuckoo hashing offers a unique solution.

Cuckoo hashing uses two hash functions and two separate tables. Each song can be placed in one of two possible positions—one in each table. If a position is already occupied, the existing song is "kicked out" and moved to its alternative position, potentially displacing another song, which then undergoes the same process. This "kicking out" continues until all songs are placed without conflict or until a loop is detected, in which case the tables are resized, and the process restarts.

Consider a real example, treating the songs from a road-trip playlist I was making as data entries:

Suppose our hash functions assign "Cruel Summer" and "Our Song" to the same position. Using cuckoo hashing, we place "Cruel Summer" in its designated spot. Then, when we try to place "Our Song" and find the spot occupied, we "kick out" "Cruel Summer" to its alternative position and place "Our Song" in the now-vacant spot. If "Cruel Summer" encounters another occupied spot in its new position, the process repeats until all songs are placed without conflict.
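Here's a minimal cuckoo-insertion sketch matching that walkthrough: two tables, two hash functions, and the "kick out" loop. The hash functions and table size are toy choices (so they won't necessarily reproduce the exact collision above), and a real implementation would resize and rehash when a loop is detected.

```python
SIZE = 7  # toy table size

def h1(key):
    return sum(ord(c) for c in key) % SIZE

def h2(key):
    return sum((i + 1) * ord(c) for i, c in enumerate(key)) % SIZE

table1 = [None] * SIZE
table2 = [None] * SIZE

def insert(key, max_kicks=16):
    for _ in range(max_kicks):
        slot = h1(key)
        if table1[slot] is None:
            table1[slot] = key
            return True
        key, table1[slot] = table1[slot], key  # kick the occupant out of table1
        slot = h2(key)
        if table2[slot] is None:
            table2[slot] = key
            return True
        key, table2[slot] = table2[slot], key  # kick the occupant out of table2
    return False  # loop detected: a real table would resize and rehash

def contains(key):
    # Each key has exactly two possible homes, so lookup is two checks.
    return table1[h1(key)] == key or table2[h2(key)] == key

for song in ["Cruel Summer", "Our Song", "Speed Drive"]:
    insert(song)
print(all(contains(s) for s in ["Cruel Summer", "Our Song", "Speed Drive"]))  # True
```

Note how `contains` never needs to probe beyond two slots — that bounded lookup is the payoff for all the kicking during insertion.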

If visualizing it via code is something that interests you, then check out this notebook I’ve made explaining the same example: colab.research.google.com/drive/1Bu3J_9Tg1Q..

Benefits of Cuckoo Hashing

  • Constant-Time Lookup: With only two possible locations per entry, a search requires at most two checks.

  • Simplified Collision Resolution: Instead of chaining or probing, cuckoo hashing systematically relocates entries to resolve conflicts.

Conclusion

Monolith tackles key challenges in recommendation systems by ensuring collisionless embeddings, optimizing memory efficiency, and enabling real-time updates. With Cuckoo HashMap embeddings and rapid parameter synchronization, it stays aligned with user behavior while balancing fault tolerance and speed. The result? A scalable, high-performance system built to handle the ever-changing landscape of user preferences.

Make sure to read the base paper to learn about the experiments they conducted, which answer questions like:

  1. How much can we benefit from a collisionless HashTable?

  2. How important is real-time online training?

  3. Is Monolith’s design of parameter synchronization robust enough in a large-scale production scenario?

Last but not least:

Why the Term "Monolith"?

The name "Monolith" might suggest a single, unchanging structure. However, in TikTok's context, it represents a unified system capable of handling vast amounts of data and adapting in real time. It's a testament to the system's robustness and scalability.


Resources to check out:

https://github.com/bytedance/monolith

https://arxiv.org/pdf/2209.07663