Speaker 1: In my previous video, we designed YouTube, a video sharing platform made up of several components. In this video, we'll design Instagram, one of the most popular photo sharing sites. It also includes several components, including an image upload service, a feed service, a notification service, and a search service. In the YouTube system design I focused on video streaming and search; for Instagram, we prioritize image sharing and the social interactions that run through the feed and notification services. So let's get started.
Let's talk about business requirements. Users should be able to upload images from their mobile devices, be it iOS or Android, and the system should support different image formats and sizes. Users should be able to like and comment on images, view the likes and comments on images, and receive notifications when their images are liked or commented on. Users should also be able to follow other users and receive notifications when other users follow them. The system should also generate a news feed of images based on the user's interests and activity, sorted by relevancy and freshness. Other requirements you may consider include user authentication and authorization, direct messaging, analytics and reporting, and advertisement support.
Speaking of technical requirements, the system should support the JPEG, PNG, and GIF image formats, with a maximum image size of 10 MB. The system should handle at least 100,000 concurrent uploads, and each user should be able to upload up to 10 images per day, with a maximum of 500 images per month. The system should also handle 1 million concurrent likes and comments, support at least 1 billion registered users, and let each user follow at least 1,000 other users.
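As a minimal sketch, assuming the upload handler already knows the file's format, its size, and how many images the user has uploaded today and this month, the technical limits could be enforced like this. The `UploadRequest` type and `validate_upload` function are hypothetical names for illustration, and reading 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

# Limits taken from the technical requirements.
ALLOWED_FORMATS = {"jpeg", "png", "gif"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB (binary megabytes assumed)
MAX_UPLOADS_PER_DAY = 10
MAX_UPLOADS_PER_MONTH = 500


@dataclass
class UploadRequest:
    # Hypothetical request shape; field names are illustrative only.
    image_format: str
    size_bytes: int
    uploads_today: int       # uploads the user has already made today
    uploads_this_month: int  # uploads the user has already made this month


def validate_upload(req: UploadRequest) -> list[str]:
    """Return every rule violation; an empty list means the upload is allowed."""
    errors = []
    fmt = req.image_format.lower()
    if fmt == "jpg":
        fmt = "jpeg"  # treat the common .jpg spelling as JPEG
    if fmt not in ALLOWED_FORMATS:
        errors.append(f"unsupported format: {req.image_format}")
    if req.size_bytes > MAX_IMAGE_BYTES:
        errors.append("image exceeds 10 MB")
    if req.uploads_today >= MAX_UPLOADS_PER_DAY:
        errors.append("daily upload limit reached")
    if req.uploads_this_month >= MAX_UPLOADS_PER_MONTH:
        errors.append("monthly upload limit reached")
    return errors
```

Returning all violations rather than stopping at the first lets the app show every problem at once. Note that at 10 images per day a user can reach at most about 310 uploads in a month, so the 500-per-month cap only acts as a backstop.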
Now, based on the technical requirements we set, let's do some capacity planning. Assuming an average image size of 5 MB, if every one of the 1 billion registered users uploaded 10 images per day for a 30-day month, the maximum storage required for image uploads would be about 1.5 exabytes per month (1 billion users × 300 images × 5 MB). However, not all users will upload images daily, so a more realistic assumption is that 10% of registered users upload images daily. With this assumption, the storage required drops to about 150 petabytes per month. Assuming an average upload time of 5 seconds and a maximum of 100,000 concurrent uploads, each upload moves about 1 MB per second, so the system needs a minimum network bandwidth of roughly 100 GB/s, or about 800 Gbps. If each concurrent upload occupies one processing core, handling 100,000 concurrent uploads requires about 100,000 cores; with 32-core servers, that is at least 3,125 servers. As for database capacity, assuming an average of 5 likes or comments per image, each image needs about 25 bytes of storage for its like and comment metadata. If each of 1 million concurrent likes and comments writes about that much data, the database must absorb roughly 25 MB of writes (1 million × 25 bytes), or about 25 MB/s if they all arrive within one second. Overall, these numbers show that a system of this scale needs a significant amount of infrastructure, compute, and network bandwidth to handle the volume of image uploads, likes, comments, and follow relationships.
In my YouTube system design video, I mainly focused on video uploads, search, and streaming. For Instagram, apart from image uploads, we are also looking at the relationships between users, such as who follows whom, and at the likes and comments on posts. So we should talk about the database schema to make sure it suits our needs and use cases. Please let me know in the comments if you can think of a better schema. For the database, we have a user table, a post table, a like table, a comments table, and a follow table, and these tables relate to each other as follows.
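Before getting to those relationships, a quick sanity check: capacity arithmetic is easy to get wrong by a factor of 1,000, so this small script recomputes the estimates from the stated assumptions (1 billion users, 10 images per day, 5 MB per image, 10% daily uploaders, 100,000 concurrent 5-second uploads, one core per in-flight upload, 32-core servers, about 25 bytes per like or comment event). Decimal units and the 30-day month are assumptions of this sketch.

```python
# Back-of-the-envelope capacity estimates from the stated assumptions.
# Decimal units throughout (1 MB = 10**6 bytes, 1 PB = 10**15, 1 EB = 10**18).
MB, PB, EB = 10**6, 10**15, 10**18

REGISTERED_USERS = 1_000_000_000
IMAGES_PER_USER_PER_DAY = 10
DAYS_PER_MONTH = 30            # assumption: 30-day month
AVG_IMAGE_BYTES = 5 * MB
DAILY_UPLOADER_FRACTION = 0.10


def monthly_storage_bytes(fraction_uploading: float) -> float:
    """Bytes of new images per month if this fraction of users uploads daily."""
    users = REGISTERED_USERS * fraction_uploading
    return users * IMAGES_PER_USER_PER_DAY * DAYS_PER_MONTH * AVG_IMAGE_BYTES


worst_case = monthly_storage_bytes(1.0)
realistic = monthly_storage_bytes(DAILY_UPLOADER_FRACTION)

# Bandwidth: 100,000 concurrent uploads, each moving 5 MB in 5 seconds.
CONCURRENT_UPLOADS = 100_000
UPLOAD_SECONDS = 5
bandwidth_bits_per_s = CONCURRENT_UPLOADS * AVG_IMAGE_BYTES * 8 / UPLOAD_SECONDS

# Compute: one core per in-flight upload, 32 cores per server.
CORES_PER_SERVER = 32
servers = -(-CONCURRENT_UPLOADS // CORES_PER_SERVER)  # ceiling division

# Database: 1 million concurrent likes/comments at about 25 bytes each.
db_write_bytes = 1_000_000 * 25

print(f"worst-case storage: {worst_case / EB:.2f} EB/month")
print(f"realistic storage:  {realistic / PB:.0f} PB/month")
print(f"upload bandwidth:   {bandwidth_bits_per_s / 10**9:.0f} Gbps")
print(f"servers:            {servers}")
print(f"like/comment burst: {db_write_bytes / MB:.0f} MB")
```

Running it prints 1.50 EB/month, 150 PB/month, 800 Gbps, 3125 servers, and 25 MB. Changing any single assumption (for example, a smaller average image size) and re-running is a cheap way to see which input dominates.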
A user has many posts, and a post belongs to a single user. A post can have many likes, and each like belongs to a user and a post. A post has many comments, and each comment belongs to a user and a post. A user has many follows as the follower, and many follows as the user being followed. With this schema, you can easily run queries such as the number of likes for a post, all the posts for a specific user, who follows the user ByteMonk, or which users ByteMonk follows. With appropriate indexes on the tables, these queries should be efficient, and the database should be able to handle the scale of the system under the specified requirements.
Here is the high-level system design architecture for Instagram. The Instagram app on iOS or Android is the primary client for the system, and it interacts with the backend through REST APIs. The API gateway acts as the single entry point for all REST API requests; it performs authentication and authorization checks, rate limiting, and other security-related functions. The backend handles request processing and interacts with the metadata database and the Blobstore. In the backend, we have web application servers and microservices. The web application servers host the application code and handle user requests: they generate responses, which may involve querying databases, interacting with caches, and invoking microservices. The system should use microservices to handle specific functions such as user authentication, image processing, and news feed generation, and each microservice should be self-contained and independently scalable. The metadata database stores all user data, including image metadata, comments, likes, follows, and user profiles, while the image files themselves live in the Blobstore. The system should use a distributed database such as Cassandra or MongoDB, which can handle this high volume of reads and writes.
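To show how the table relationships and the example queries fit together, here is a minimal relational sketch using Python's built-in `sqlite3` module. The design recommends a distributed store such as Cassandra or MongoDB, so SQLite is only a stand-in for running the queries locally; the table and column names, indexes, and sample rows (including the `s3://bucket/...` URLs) are assumptions.

```python
import sqlite3

# Illustrative relational version of the schema; names are assumptions.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
CREATE TABLE posts    (id INTEGER PRIMARY KEY,
                       user_id INTEGER NOT NULL REFERENCES users(id),
                       image_url TEXT NOT NULL,
                       created_at INTEGER NOT NULL);
CREATE TABLE likes    (user_id INTEGER NOT NULL REFERENCES users(id),
                       post_id INTEGER NOT NULL REFERENCES posts(id),
                       PRIMARY KEY (post_id, user_id));
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       user_id INTEGER NOT NULL REFERENCES users(id),
                       post_id INTEGER NOT NULL REFERENCES posts(id),
                       body TEXT NOT NULL);
CREATE TABLE follows  (follower_id INTEGER NOT NULL REFERENCES users(id),
                       followee_id INTEGER NOT NULL REFERENCES users(id),
                       PRIMARY KEY (follower_id, followee_id));
-- Indexes backing the example queries.
CREATE INDEX idx_posts_user       ON posts(user_id, created_at);
CREATE INDEX idx_comments_post    ON comments(post_id);
CREATE INDEX idx_follows_followee ON follows(followee_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data: alice and bob follow ByteMonk; ByteMonk follows alice.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ByteMonk"), (2, "alice"), (3, "bob")])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                 [(10, 1, "s3://bucket/10.jpg", 100), (11, 2, "s3://bucket/11.jpg", 200)])
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(2, 10), (3, 10)])
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(2, 1), (3, 1), (1, 2)])


def like_count(post_id):
    return conn.execute("SELECT COUNT(*) FROM likes WHERE post_id = ?",
                        (post_id,)).fetchone()[0]


def posts_by(username):
    rows = conn.execute(
        "SELECT p.id FROM posts p JOIN users u ON u.id = p.user_id "
        "WHERE u.username = ? ORDER BY p.created_at DESC", (username,))
    return [r[0] for r in rows]


def followers_of(username):
    rows = conn.execute(
        "SELECT follower.username FROM follows f "
        "JOIN users followee ON followee.id = f.followee_id "
        "JOIN users follower ON follower.id = f.follower_id "
        "WHERE followee.username = ? ORDER BY follower.username", (username,))
    return [r[0] for r in rows]


def followed_by(username):
    rows = conn.execute(
        "SELECT followee.username FROM follows f "
        "JOIN users follower ON follower.id = f.follower_id "
        "JOIN users followee ON followee.id = f.followee_id "
        "WHERE follower.username = ? ORDER BY followee.username", (username,))
    return [r[0] for r in rows]
```

The composite primary keys double as indexes: `likes (post_id, user_id)` serves like counts and prevents duplicate likes, `follows (follower_id, followee_id)` answers "which users does ByteMonk follow", and the extra index on `followee_id` answers "who follows ByteMonk".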
Speaking of the cache, the system should use a distributed cache such as Redis to reduce the load on the database and improve response times for frequently accessed data, such as user profiles, images, and news feeds. The system should also use a Blobstore such as Amazon S3 to store images and other large media files, integrated with the backend so that images are uploaded and downloaded efficiently. Finally, the system should use a CDN such as Akamai or CloudFront to serve images and other media files to users around the world. The CDN should be integrated with the Blobstore so that files are cached and delivered efficiently. If you want to deep dive into CDNs, I have a separate video on them, so please have a look.
Based on these requirements, here are a few of the REST API endpoints the system should have: a POST endpoint to upload a new image, a GET endpoint to retrieve a specific image, a POST endpoint to follow another user, a POST endpoint to like a specific image, a POST endpoint to add a comment to a specific image, and a GET endpoint to retrieve a user's news feed.
For the backend services, we have an image service, a feed service, a fanout service, a like service, a comment service, a metadata service, and a distributed queue. The image service handles image processing and storage, including image uploads, retrievals, and deletions. It interacts with the Blobstore for image storage and retrieval, and communicates with the metadata database to update image metadata such as captions, locations, and tags. The feed service generates news feeds for users based on their follows and likes. It queries the metadata database for the users a given user follows and the images that user has liked, and uses this information to generate a personalized feed.
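The endpoints are named by operation but not by URL, so the paths below are hypothetical. As a minimal sketch, a route table like this maps each HTTP method and path to the backend handler that would serve it and pulls IDs out of the URL; the handler names are illustrative, not part of the design.

```python
import re

# Hypothetical paths for the six endpoints; only the operations come from the design.
ROUTES = [
    ("POST", r"/images",                              "upload_image"),
    ("GET",  r"/images/(?P<image_id>\w+)",            "get_image"),
    ("POST", r"/users/(?P<user_id>\w+)/follow",       "follow_user"),
    ("POST", r"/images/(?P<image_id>\w+)/likes",      "like_image"),
    ("POST", r"/images/(?P<image_id>\w+)/comments",   "comment_on_image"),
    ("GET",  r"/users/(?P<user_id>\w+)/feed",         "get_news_feed"),
]

_COMPILED = [(method, re.compile(pattern), handler) for method, pattern, handler in ROUTES]


def resolve(method: str, path: str):
    """Return (handler_name, path_params) for a request, or None if nothing matches."""
    for route_method, pattern, handler in _COMPILED:
        match = pattern.fullmatch(path)  # whole path must match, not a prefix
        if route_method == method and match:
            return handler, match.groupdict()
    return None
```

Using `fullmatch` with `\w+` segments keeps `/images/abc123` and `/images/abc123/likes` from being confused with each other.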
When a user likes or comments on a post, the like service or comment service adds the corresponding event to the distributed queue. The fanout service then consumes the event from the queue and distributes it to all the downstream services interested in it; for example, when a user likes a post, the fanout service forwards the event to the feed service. The like service handles like operations for images, including adding and removing likes. It communicates with the metadata database to update the like count for a given image, and uses the distributed queue to process like requests: when a user likes an image, the like service adds a like request to the queue, and a separate worker process handles it asynchronously. This way, the system can absorb a large number of like requests without slowing down responses to other requests. The comment service works the same way for comment operations, including adding and removing comments. It updates the comment count for a given image in the metadata database, and when a user adds a comment, it adds a comment request to the distributed queue for a worker process to handle asynchronously, so a surge of comments does not slow down other requests either. The metadata service provides access to the metadata database and handles updates to image metadata such as captions, locations, and tags. It also communicates with the image service, user service, feed service, like service, and comment service so that all components have access to the most up-to-date image metadata, and it uses the cache to keep frequently accessed metadata in memory for better performance.
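Here is a minimal single-process sketch of the enqueue-then-process pattern, assuming Python's standard `queue.Queue` stands in for Kafka or RabbitMQ and `Counter` objects stand in for the like and comment counts in the metadata database. The function names and event fields are hypothetical, and for brevity the worker also performs the fanout step by calling subscribed handlers such as the feed service.

```python
import queue
from collections import Counter, defaultdict

# In-process stand-in for the distributed queue (Kafka or RabbitMQ in the design).
events = queue.Queue()

# Stand-ins for the like and comment counts stored in the metadata database.
like_counts = Counter()
comment_counts = Counter()

# Event type -> downstream handlers (for example, the feed service).
subscribers = defaultdict(list)


def like_image(user_id, image_id):
    """Like service: enqueue the request and return without touching the counts."""
    events.put({"type": "like", "user_id": user_id, "image_id": image_id})
    return "accepted"


def comment_on_image(user_id, image_id, text):
    """Comment service: same pattern as likes."""
    events.put({"type": "comment", "user_id": user_id,
                "image_id": image_id, "text": text})
    return "accepted"


def drain_once():
    """Worker: process all queued events, update counts, then fan out to subscribers.

    Returns the number of events processed.
    """
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        if event["type"] == "like":
            like_counts[event["image_id"]] += 1
        elif event["type"] == "comment":
            comment_counts[event["image_id"]] += 1
        for handler in subscribers[event["type"]]:  # fanout step
            handler(event)
        events.task_done()
        processed += 1
```

The key property is that `like_image` returns before any count changes; the counts only move when the worker drains the queue, which is what keeps bursts of likes and comments from slowing down other requests.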
Finally, our system uses a distributed queue, such as RabbitMQ or Kafka, to handle like and comment requests. When a user likes or comments on an image, a request is added to the queue and then processed asynchronously by a separate worker process, so the system can handle a large number of requests without slowing down responses to other requests. The like and comment services interact with the distributed queue to add and remove requests, and the worker processes read requests from the queue and update the metadata database accordingly.
Alright, let's talk about our use cases. Here is what the sequence of invocations looks like for the Upload a New Image use case. The user starts the upload by selecting an image and tapping the Upload button in the Instagram app. The client app sends an HTTP POST request with the image data to the API gateway, which performs authentication and authorization checks, rate limiting, and other security-related functions, and then forwards the request to the image service. The image service stores the image data in the Blobstore, generates a unique ID for the new image, and stores the image's metadata (for example, the owner, timestamp, and image ID) in the metadata database. The image service then sends a fanout request to the fanout service, which retrieves the list of the uploader's followers from the metadata service, creates a notification for each follower, and adds it to the distributed queue. The like and comment services retrieve notifications from the distributed queue and process them; if a notification is for a new image upload, they update the cache with the new image metadata. The feed service periodically queries the metadata database and cache to generate news feeds for each user based on the users they follow and their likes.
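The core of the upload sequence can be sketched in a few lines, assuming plain dictionaries stand in for the Blobstore and metadata database and a list stands in for the distributed queue. All names here (`upload_image`, `fan_out_new_image`, the `images/...` blob key, the sample follower lists) are hypothetical, and authentication and rate limiting at the API gateway are omitted.

```python
import time
import uuid

blob_store = {}          # stand-in for S3: blob key -> image bytes
metadata_db = {}         # stand-in for the metadata database: image_id -> metadata
followers = {"alice": {"bob", "carol"}}  # hypothetical follower lists
notification_queue = []  # stand-in for the distributed queue


def upload_image(owner: str, data: bytes) -> str:
    """Image service: store the blob, record metadata, then request a fanout."""
    image_id = uuid.uuid4().hex  # unique ID for the new image
    blob_key = f"images/{image_id}"
    blob_store[blob_key] = data
    metadata_db[image_id] = {
        "image_id": image_id,
        "owner": owner,
        "blob_key": blob_key,
        "created_at": time.time(),
    }
    fan_out_new_image(owner, image_id)
    return image_id  # returned to the client through the API gateway


def fan_out_new_image(owner: str, image_id: str) -> None:
    """Fanout service: enqueue one notification per follower of the uploader."""
    for follower in sorted(followers.get(owner, ())):
        notification_queue.append(
            {"to": follower, "type": "new_image", "image_id": image_id})
```

Writing the blob before the metadata means a crash between the two steps leaves at worst an orphaned file, never metadata pointing at a missing image.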
Finally, the client app receives a response from the API gateway indicating that the upload succeeded, along with the new image ID.
Here is the sequence of events for the Follow Another User use case. The user sends a follow request to the image service. For the sake of modularity, you could also have a separate follow service, but I will use the image service here; think of it as a monolithic service. The image service authenticates the user, retrieves the list of followers for the user being followed from the metadata service, adds the requesting user to that list, and updates the metadata service. The fanout service then retrieves the list of followers for the user being followed from the metadata service, adds the new follower to the follower feeds for the user being followed, and updates the cache for both the user being followed and the new follower. The feed service retrieves the follower feeds for the user being followed from the cache and updates the news feed caches for all of that user's followers. Finally, the user receives a success response from the image service.
Now let's talk a little about the follower feed, which I also covered in my Twitter fanout architecture video. In our system, each user has a follower feed that aggregates all the images, likes, and comments from the users they follow. Whenever a user uploads an image, likes an image, or comments on an image, the fanout service retrieves that user's followers from the metadata service and adds the corresponding notification to each follower's follower feed. For example, say user B follows user A. When user A uploads an image, the fanout service retrieves user B from the metadata service and adds a notification of the new image to user B's follower feed.
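Here is a minimal sketch of that user A and user B example, assuming each follower feed is an in-memory bounded deque (a stand-in for a cached list in something like Redis) with a hypothetical cap of 500 entries. The `follow` and `fan_out` names are illustrative, not the design's actual interfaces.

```python
from collections import defaultdict, deque

FEED_LIMIT = 500  # hypothetical cap on entries kept per follower feed

# user -> set of users who follow them (stand-in for the follow table)
followers = defaultdict(set)
# user -> newest-first activity from the users they follow (stand-in for a cached list)
follower_feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def follow(follower, followee):
    """Record that `follower` now follows `followee`."""
    followers[followee].add(follower)


def fan_out(owner, activity):
    """Push one activity on `owner`'s content into each of owner's followers' feeds.

    Returns how many feeds were written.
    """
    recipients = followers[owner]
    for user in recipients:
        # appendleft keeps the newest entry first; maxlen drops the oldest ones
        follower_feeds[user].appendleft({"owner": owner, **activity})
    return len(recipients)
```

For example, `follow("B", "A")` followed by `fan_out("A", {"type": "new_image", "image_id": "img1"})` puts the upload at the front of B's feed. This is a push (fan-out-on-write) approach: the work is done once per activity, proportional to the owner's follower count, so reading a feed is just reading one list.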
Similarly, when user C likes or comments on user A's image, the fanout service retrieves the list of user A's followers, which includes user B, and adds the corresponding notification to their follower feeds. This way, whenever users open their follower feed, they can see all the recent activity from the users they follow in one place. Note that the follower feed is not explicitly represented in the database schema I provided. It is a virtual view of activity that includes posts from the users a particular user follows. The feed service generates it dynamically by querying the follow table and post table to determine which posts belong in a given user's feed; it is not stored as a separate table in the database, but generated on the fly from users' activity.
Here is the sequence of events for our final use case, Retrieve a User's News Feed, which is pretty simple. The client sends a GET request to the feed service to retrieve the user's news feed. The feed service retrieves the list of users that this user follows from the metadata service, then queries the post table through the image service to get all posts created by those users. It then retrieves the likes and comments for each post from the like service and comment service, respectively, generates the news feed by combining the posts with their likes and comments, and returns the news feed to the client. Note that the distributed queue and fanout service are not directly involved in this use case: the feed service uses the metadata service to retrieve the latest posts created by the users the authenticated user follows, and returns them to the client as a news feed.
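The Retrieve a User's News Feed steps can be sketched as a pull (fan-out-on-read) query, assuming in-memory dictionaries stand in for the metadata, image, like, and comment services. The data, the names, and the newest-first ordering are assumptions; a real feed would also apply the relevancy ranking from the requirements.

```python
# Hypothetical in-memory stand-ins for the backing services.
FOLLOWS = {"me": {"alice", "bob"}}         # metadata service: who follows whom
POSTS = [                                  # post table, read through the image service
    {"post_id": "p1", "author": "alice", "created_at": 100},
    {"post_id": "p2", "author": "bob", "created_at": 300},
    {"post_id": "p3", "author": "carol", "created_at": 400},  # not followed by "me"
    {"post_id": "p4", "author": "alice", "created_at": 200},
]
LIKES = {"p1": 3, "p2": 1}                 # like service: post -> like count
COMMENTS = {"p2": ["great shot"]}          # comment service: post -> comments


def get_news_feed(user, limit=20):
    """Build a feed on read: followed users -> their posts -> likes and comments."""
    followed = FOLLOWS.get(user, set())
    feed = [
        {**post,
         "likes": LIKES.get(post["post_id"], 0),
         "comments": COMMENTS.get(post["post_id"], [])}
        for post in POSTS
        if post["author"] in followed
    ]
    # Newest first (freshness only); relevancy ranking is left out of this sketch.
    feed.sort(key=lambda item: item["created_at"], reverse=True)
    return feed[:limit]
```

Compared with pushing into follower feeds, this pull approach does no work at write time but pays for the follow lookup, post query, and like and comment lookups on every read, which is why caching matters for this use case.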
The fanout service, by contrast, is responsible for adding the latest posts to the feeds of the users who follow the creators of those posts. The distributed queue and fanout service are directly involved in use cases where a user interacts with content, such as liking or commenting on a post, or following another user. In the follow case, the fanout service retrieves the list of followers from the metadata service and adds the new follower to the follower feed of the user being followed, and the like and comment services process notifications from the distributed queue and update the appropriate data in the metadata service and cache. The distributed queue may also be involved in use cases that require large-scale processing, such as bulk image uploads or batch updates to user profiles, because asynchronous processing reduces the load on the system and improves performance. Caching can also improve the performance of the news feed use case: the feed service can cache the list of followed users and the posts' likes and comments data for a certain period of time to avoid making frequent requests to the backend services.
Both YouTube and Instagram adopt a microservices architecture in their system design, which lets them scale to handle large amounts of traffic and data. However, the specific services and technologies differ: YouTube places a greater emphasis on video processing and delivery, while Instagram focuses more on image processing and social interactions such as following users, for which it relies heavily on the distributed queue and fanout service.
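The feed caching idea can be sketched as a cache-aside lookup with a time-to-live, assuming an in-process dictionary stands in for Redis keys with an expiry. `TTLCache`, `get_or_load`, `invalidate`, and the injectable clock are hypothetical names, chosen so expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny cache-aside helper whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value if it has not expired; otherwise call `loader`,
        cache its result for `ttl_seconds`, and return it."""
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key early, e.g. when new activity is fanned out to this user."""
        self._data.pop(key, None)
```

The feed service would call `get_or_load` with a per-user key and a loader that rebuilds the feed from the backend services; the TTL bounds how stale a feed can get, and `invalidate` lets fresh activity bypass the wait.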