The video explores the system design of Instagram, focusing on essential features like media uploads, user feeds, and search functionalities. It emphasizes scalability, high availability, and low latency, with a design that supports 500 million daily users and 100 million daily posts. The video outlines a hybrid feed model combining push and pull strategies to efficiently manage user feeds, especially for high-profile accounts. Key components include a RESTful API, a media upload process using object storage, and a user feed service leveraging a graph database. Additional design considerations include comments, notifications, analytics, and data consistency.
Designing Instagram: Key Concepts
- Instagram is a widely used social media platform, often discussed in system design interviews.
- The focus is on designing a system that supports main user flows and is easily extendable.
- Key features include media uploads, user interactions, feed generation, and post searching.
Functional Requirements
- Users can upload media, including images and videos.
- Users have the ability to follow and unfollow other users.
- User feeds should be created in reverse chronological order based on follows.
- Users can search posts by captions or hashtags.
"Users should be able to upload media so that would include both image and video users should also be able to follow and unfollow other users."
- The system must support essential user interactions and content sharing functionalities.
"We should be able to create user feeds and so we're going to assume reverse chronological order based on who the user follows."
- Feeds are generated based on user connections without machine learning algorithms.
Non-Functional Requirements
- High scalability is crucial to accommodate hundreds of millions of users.
- High availability ensures users can access content consistently.
- Eventual consistency is acceptable due to the nature of the data.
- High durability means data must never be lost.
- Low latency is needed for quick feed generation.
"We want high scalability because of course we want the system to be able to handle hundreds of millions of users."
- Scalability is essential to manage the vast user base effectively.
"We also want high durability meaning that data can never be lost and we also want low latency so feeds user feeds should be generated very quickly."
- Durability and low latency are critical for maintaining user satisfaction and data integrity.
Storage Estimation
- Assume 500 million daily active users, each posting once every 5 days.
- Results in 100 million posts per day, with an average post size of 1 megabyte.
- Daily storage needs are roughly 97,500 gigabytes.
- Yearly storage requires approximately 34 petabytes.
"Let's assume we've got 500 million daily active users and then every user posts once every 5 days."
- Estimations are based on user activity and average post size to determine storage needs.
"If we've got 100 million posts and then the average post size is let's say 1 Megabyte well then we've got 100 million megabytes of data."
- Storage calculations are essential for designing a system capable of handling large volumes of data.
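To sanity-check these figures, here is a short back-of-the-envelope sketch in Python using the assumptions above (1024-based unit conversions, which reproduce the video's rough totals):

```python
# Back-of-the-envelope storage estimate from the assumptions above.
DAILY_ACTIVE_USERS = 500_000_000
POSTS_PER_USER_PER_DAY = 1 / 5          # one post every 5 days
AVG_POST_SIZE_MB = 1

posts_per_day = DAILY_ACTIVE_USERS * POSTS_PER_USER_PER_DAY   # 100 million posts/day
daily_storage_mb = posts_per_day * AVG_POST_SIZE_MB           # 100 million MB
daily_storage_gb = daily_storage_mb / 1024                    # ~97,656 GB per day
yearly_storage_pb = daily_storage_gb * 365 / 1024 / 1024      # ~34 PB per year

print(f"{posts_per_day:,.0f} posts/day")
print(f"{daily_storage_gb:,.0f} GB/day, {yearly_storage_pb:.1f} PB/year")
```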
Queries Per Second
- With 100 million posts per day and 86,400 seconds in a day, there are roughly 1,150 writes per second.
- Assuming a read-to-write ratio of 100 to 1, there are approximately 115,000 reads per second.
- Total queries per second are about 116,000.
"The number of Writes per second is 100 million divided by 86,400 which roughly equals 1,150 writes per second."
- Calculating writes per second helps in understanding the system's load and performance requirements.
"We can calculate reads per second to be 100 multiplied by the writes per second so which is 1,150 giving us roughly 115,000 reads per second."
- Understanding read and write operations is crucial for optimizing system performance.
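The same arithmetic as a small Python sketch:

```python
# Back-of-the-envelope load estimate.
POSTS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 86_400
READ_WRITE_RATIO = 100

writes_per_second = POSTS_PER_DAY / SECONDS_PER_DAY        # ~1,157 (the video rounds to 1,150)
reads_per_second = writes_per_second * READ_WRITE_RATIO    # ~115,700
total_qps = writes_per_second + reads_per_second           # ~116,900

print(f"~{writes_per_second:,.0f} writes/s, ~{reads_per_second:,.0f} reads/s, ~{total_qps:,.0f} total QPS")
```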
Data Model
- A basic outline of core tables is necessary for the Instagram data model.
- The design must support the essential functionalities discussed.
"This is a very basic outline of some of the core tables that could be included in an Instagram data model."
- The data model is foundational to implementing the system's functionality and supporting its scalability and performance goals.
Database Structure and Core Tables
- The database consists of several core tables essential for managing user interactions and media content.
- The User Table contains information related to users.
- The Followers Table includes two foreign keys, the follower ID and the followed-by ID, to track user connections.
- The Media Table includes media ID (unique identifier), user ID (indicating uploader), media type, and file URL.
- The Post Table includes post ID (unique identifier), user ID (creator), caption, and creation timestamp.
- The Post Media Table links posts to their associated media, allowing for multiple media items per post.
"The user table will obviously contain information related to the user."
- The User Table is fundamental for storing user-specific information.
"The followers table will contain two foreign keys, the follower ID and the follow by ID which enable the system to know which users are following each other."
- The Followers Table is crucial for managing user relationships and interactions.
"The media table will contain the media ID which will uniquely identify each media ID, the user ID which links each media item to a user indicating who uploaded it."
- The Media Table organizes media content and associates it with the uploader.
"The post table will contain the post ID uniquely identifies each post, the user ID links each post to a user indicating who created it."
- The Post Table manages post-specific information and links it to the creator.
"The post media table will contain the post ID, the media ID which will link posts and their Associated media."
- The Post Media Table facilitates the association of multiple media items with a single post.
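As an illustration only (the video does not prescribe exact column names), the core tables could be sketched as Python dataclasses:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:
    user_id: int
    username: str

@dataclass
class Follower:
    follower_id: int      # the user doing the following
    followed_id: int      # the user being followed

@dataclass
class Media:
    media_id: int
    user_id: int          # who uploaded the media item
    media_type: str       # e.g. "image" or "video"
    file_url: str

@dataclass
class Post:
    post_id: int
    user_id: int          # who created the post
    caption: str
    created_at: datetime

@dataclass
class PostMedia:
    post_id: int          # links a post...
    media_id: int         # ...to each of its media items
```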
API Design for Instagram
- A classic RESTful API is suggested for interacting with the data due to its simplicity, widespread use, and ability to support caching.
- The REST API consists of three main endpoints: Post Upload, Get Post, and Get Feed.
- The Post Upload Endpoint accepts a binary file and metadata, returning a 201 Created response code.
- The Get Post Endpoint retrieves individual posts using a post ID, expecting a 200 OK response code.
- The Get Feed Endpoint retrieves user-specific feeds with pagination, also expecting a 200 OK response code.
"In terms of the API design for Instagram, we could use a classic restful API to interact with the data."
- RESTful API is chosen for its efficiency and ease of use in handling data interactions.
"Our rest API will comprise of three main endpoints: the post API upload endpoint, the get endpoint atapi poost slid, and the get feed endpoint."
- The API includes endpoints for uploading posts, retrieving individual posts, and fetching user feeds.
"The parameters it could take could take a file which will be a binary file of the photo or video and it will also take some metadata information."
- The Post Upload Endpoint handles media uploads and associated metadata.
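A minimal sketch of these three endpoints, assuming FastAPI; the exact route paths, parameter names, and response bodies are illustrative rather than taken from the video:

```python
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/api/post", status_code=201)               # 201 Created on success
async def upload_post(file: UploadFile = File(...), caption: str = Form("")):
    # Store the binary file and its metadata, then return the new post.
    return {"post_id": "generated-id", "caption": caption}

@app.get("/api/post/{post_id}")                        # 200 OK on success
async def get_post(post_id: str):
    return {"post_id": post_id}

@app.get("/api/feed/{user_id}")                        # 200 OK, paginated
async def get_feed(user_id: str, page: int = 1, page_size: int = 20):
    return {"user_id": user_id, "page": page, "posts": []}
```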
System Flow and Architecture
- The media upload flow involves a client sending a POST request to an API Gateway.
- The API Gateway manages routing, rate limiting, authentication, and authorization.
- The request body contains binary file data, with headers specifying content type and boundary for data delimitation.
- The API Gateway forwards requests to a load balancer, which distributes them across instances of the post service.
- Load balancing ensures even distribution of requests, handling high traffic, and maintaining high availability.
- The Post Service is horizontally scaled to manage increasing demand and traffic efficiently.
"When a user uploads an image or a video, a client will send a post request to an API Gateway."
- The media upload process begins with a client request to the API Gateway.
"The API Gateway will handle stuff like routing, rate limiting, authentication, and authorization."
- The API Gateway is responsible for essential request management tasks.
"The API Gateway could then forward that request to a load balancer, and the load balancer could then route that request to an instance of the post service."
- Load balancing is utilized to distribute requests evenly and ensure system reliability.
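For illustration, a client-side upload request might look like the following sketch using the requests library; the URL, field names, and token are assumptions:

```python
import requests

# Multipart/form-data request: requests sets the Content-Type header,
# including the boundary that delimits the binary file from the metadata.
with open("photo.jpg", "rb") as f:
    response = requests.post(
        "https://api.example.com/api/post",
        files={"file": ("photo.jpg", f, "image/jpeg")},
        data={"caption": "Sunset at the beach #sunset"},
        headers={"Authorization": "Bearer <token>"},
    )

print(response.status_code)   # expect 201 Created
```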
Image Upload and Storage Strategy
- The system utilizes object storage, such as Amazon S3, for storing images, ensuring efficient handling of large traffic volumes and preventing single points of failure.
- For files exceeding a certain size (e.g., 5 megabytes), a multi-part upload strategy is employed, dividing files into smaller chunks for individual uploads. This enhances upload efficiency and allows resumption if interrupted.
- Integration with a CDN like Amazon CloudFront improves image delivery speed by caching images closer to users' geographic locations.
"The post service could then upload the image to an object storage so something like Amazon S3."
- Object storage is used to manage images, ensuring scalability and reliability in handling large volumes.
"If a file exceeds a certain size threshold, for example, 5 megabytes, the system could employ a multi-part upload strategy."
- Multi-part uploads are used to manage large files by dividing them into chunks, improving efficiency and reliability.
"Integration with a CDN like Amazon CloudFront improves the delivery speed of images to users by caching the images closer to the user's geographic location."
- CDNs are used to cache images near users, enhancing delivery speed and user experience.
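A rough sketch of the multi-part upload path with boto3; the bucket name, key handling, and 5 MB part size are assumptions based on the description above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "instagram-media"            # assumed bucket name
CHUNK_SIZE = 5 * 1024 * 1024          # 5 MB parts

def upload_media(path: str, key: str) -> None:
    upload = s3.create_multipart_upload(Bucket=BUCKET, Key=key)
    parts, part_number = [], 1
    try:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                resp = s3.upload_part(
                    Bucket=BUCKET, Key=key, PartNumber=part_number,
                    UploadId=upload["UploadId"], Body=chunk,
                )
                parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
                part_number += 1
        s3.complete_multipart_upload(
            Bucket=BUCKET, Key=key, UploadId=upload["UploadId"],
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so incomplete parts don't accumulate in the bucket.
        s3.abort_multipart_upload(Bucket=BUCKET, Key=key, UploadId=upload["UploadId"])
        raise
```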
- The post service uploads metadata information to the main PostgreSQL storage, ensuring data consistency through tight integration with the image upload workflow.
- An atomic transaction guarantees that both image upload and metadata storage either succeed or fail together, preventing data inconsistencies.
"The post service can then upload the metadata information about that post to the main PostgreSQL storage."
- Metadata is stored in PostgreSQL to maintain a structured and consistent data environment.
"You could employ an atomic transaction here where we can guarantee that both the image upload and the metadata storage either complete successfully or fail altogether."
- Atomic transactions ensure that operations are completed as a unit, maintaining data integrity.
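Since the object store sits outside PostgreSQL, a single database transaction cannot literally span both; one common approximation of the "succeed or fail together" behaviour is a metadata transaction plus a compensating delete of the uploaded object, sketched here with psycopg2 (table and column names follow the data model above but are assumptions):

```python
import psycopg2

def save_post(conn, s3, bucket: str, key: str, user_id: int, caption: str, file_url: str) -> int:
    try:
        # All metadata inserts run in a single database transaction:
        # "with conn" commits on success and rolls back on any exception.
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO post (user_id, caption) VALUES (%s, %s) RETURNING post_id",
                    (user_id, caption),
                )
                post_id = cur.fetchone()[0]
                cur.execute(
                    "INSERT INTO media (user_id, media_type, file_url) VALUES (%s, %s, %s) RETURNING media_id",
                    (user_id, "image", file_url),
                )
                media_id = cur.fetchone()[0]
                cur.execute(
                    "INSERT INTO post_media (post_id, media_id) VALUES (%s, %s)",
                    (post_id, media_id),
                )
        return post_id
    except Exception:
        # Compensate: the image was already uploaded, so remove it to keep
        # object storage and metadata consistent with each other.
        s3.delete_object(Bucket=bucket, Key=key)
        raise
```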
Asynchronous Message Processing with Kafka
- The post service sends a denormalized message to Kafka for asynchronous processing, enhancing system responsiveness and scalability.
- Kafka is configured to handle high message volumes efficiently, using partitioned topics and consumer groups to optimize throughput.
- Denormalized messages reduce the need for additional queries, simplifying downstream processing.
"By using Kafka for asynchronous message processing, we can update various services, allowing decoupling of services and enhancing the overall system responsiveness and scalability."
- Kafka enables asynchronous processing, decoupling services for better performance and scalability.
"The message is denormalized to provide all necessary data in a single message, reducing the need for downstream services to make additional queries or joins."
- Denormalization simplifies data processing, making it more efficient by minimizing additional queries.
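A sketch of the producer side with kafka-python; the topic name and message fields are assumptions chosen to illustrate a denormalized payload:

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Denormalized message: everything downstream consumers need in one payload,
# so they can update the graph and the search index without extra queries.
event = {
    "event_type": "post_created",
    "post_id": 123,
    "user_id": 42,
    "caption": "Sunset at the beach #sunset",
    "hashtags": ["sunset"],
    "media_urls": ["https://cdn.example.com/media/abc.jpg"],
    "created_at": "2024-01-01T18:00:00Z",
}

# Keying by user_id keeps one user's events ordered within a partition.
producer.send("post-events", key=str(event["user_id"]).encode(), value=event)
producer.flush()
```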
Updating Social Network and Search Indexes
- A Kafka consumer updates the Neo4j graph database with new post relationships, enhancing social network analysis and relationship-based queries.
- Search indexes are updated with new post content, such as captions and hashtags, using Elasticsearch for efficient searching and retrieval.
"Kafka consumers could include Neo4j, which could update the graph database with the new post relationship."
- Neo4j updates enhance the capability for social network analysis and relationship queries.
"Indexes could be implemented using Elasticsearch, which is a distributed RESTful search and analytics engine."
- Elasticsearch is used for updating search indexes, facilitating fast and efficient data retrieval.
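What these consumers might do, sketched with the official neo4j and elasticsearch Python clients; the Cypher query, index name, and document shape are illustrative:

```python
from neo4j import GraphDatabase
from elasticsearch import Elasticsearch   # elasticsearch-py 8.x style client

driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))
es = Elasticsearch("http://elasticsearch:9200")

def handle_post_created(event: dict) -> None:
    # Add the (:User)-[:POSTED]->(:Post) relationship to the graph.
    with driver.session() as session:
        session.run(
            "MERGE (u:User {user_id: $user_id}) "
            "MERGE (p:Post {post_id: $post_id}) "
            "MERGE (u)-[:POSTED]->(p)",
            user_id=event["user_id"], post_id=event["post_id"],
        )

    # Index caption and hashtags so the post becomes searchable.
    es.index(
        index="posts",
        id=event["post_id"],
        document={
            "caption": event["caption"],
            "hashtags": event["hashtags"],
            "user_id": event["user_id"],
            "created_at": event["created_at"],
        },
    )
```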
User Feed Publishing Strategy
- A hybrid model combining push and pull strategies is used for pre-generating user feeds, ensuring quick retrieval.
- New content is immediately pushed to followers' feeds in the cache, speeding up read operations.
- The "Hooty Problem" arises with users having millions of followers, making pre-generated feeds resource-intensive.
"When new content, for example, a post is published, it is immediately pushed to the feeds of all the users' followers in the cache, ensuring that each user's feed is pre-generated for quick retrieval."
- The hybrid model ensures feeds are pre-generated, improving retrieval speed for users.
"However, it poses a challenge known as the Hooty Problem whereby pre-generated feeds for users with millions of followers can become exceedingly resource-intensive."
- The Hooty Problem highlights the resource demands of maintaining pre-generated feeds for users with large followings.
Hybrid Approach to Feed Generation
- The hybrid approach combines push and pull models to optimize feed generation.
- The push model is used for most users: when they post, the feed caches of their followers are updated.
- Pull model is used for users with large followings (celebrities), requiring followers to request the latest posts.
"In the hybrid approach, we use a combination of the push and pull models. We use the push model for the majority of users, so when most people post, the feed caches of their followers are updated."
- The push model is efficient for non-celebrities, updating followers' feeds immediately.
"For non-celebrities, when they post, the feeds of their followers are updated."
- The pull model prevents system overload by having users fetch celebrity posts on request.
"We use the pull model for celebrities, so people with large followings. To avoid overloading the system, we force each user to get the latest post from celebrities they follow on read."
User Feed Service
- User Feed Service fetches followers from Neo4j and updates their feeds with new posts.
- For non-celebrity posts, the service updates both the user feed cache and database.
"The user feed service can then update the feeds, so it can update the user feed cache and the user feed database for each follower of the user that posted the post with this new post."
- For user feed requests, the service retrieves pre-generated feeds and combines them with the latest celebrity posts.
"The user feed service will then retrieve the pre-generated feed. It'll first check the user feed cache for pre-generated posts. If it's not found in the cache, it'll then fetch it from the user feed database."
- Celebrities are identified by follower count, and their posts are fetched from the main database.
"Celebrities will be identified based on their follower count, and finally, the user feed service will then retrieve the latest posts of those celebrities."
Additional System Components
- Comment Service: Handles creating, editing, and deleting comments, supporting multi-level nesting with hierarchical data structures.
"You want to again ensure these efficient and reliable operations. Maybe you'd also want to support Nest comments through hierarchical data structures allowing for multi-level nesting."
- Notification Service: Manages notifications for events like new followers, likes, and comments, using Kafka for event listening.
"A notification service would obviously manage and deliver notifications for events like new followers, likes, and comments. It would listen for events via Kafka."
- Analytics Service: Provides comprehensive monitoring and logging for system components to detect issues and understand traffic patterns.
"You could integrate comprehensive monitoring and logging throughout the system, especially at the API Gateway, load balancer, and CFA."
- Data Consistency: Ensures consistency between post servers, Kafka, and databases using techniques like eventual consistency and transactions.
"You'd want to ensure data consistency using techniques like eventual consistency, idempotency and requests, and using transactions where necessary."