TL;DR:
They use a layered architecture with cached graphs for everything above the MySQL store at the bottom of their stack.
Long Answer:
I did some research on this myself because I was curious how they handle their huge amount of data and query it quickly. I've seen people complaining about custom-made social network scripts becoming slow as the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections - not even bothering with group permissions, likes and wall posts - it quickly turned out that this approach is flawed. So I spent some time searching the web for how to do it better and came across this official Facebook article:
I really recommend watching the presentation from the first link above before you continue reading. It's probably the best explanation of how FB works behind the scenes that you can find.
The video and article tell you a few things:
- They're using MySQL at the very bottom of their stack
- Above the SQL DB there is the TAO layer, which contains at least two levels of caching and uses graphs to describe the connections (see the sketch after this list)
- I could not find anything on what software / DB they actually use for their cached graphs
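To make "cached graphs above MySQL" a bit more concrete, here is a minimal sketch of the idea: a graph of typed associations with one in-memory cache tier in front of a SQL store. All names (`AssocStore`, `assoc_add`, `assoc_range`) and the single cache tier are my own simplification, not Facebook's actual TAO API:

```python
import sqlite3
import time

class AssocStore:
    """Toy TAO-style layer: a graph of (id1, type, id2) associations,
    cached in memory in front of a SQL store. Illustrative only."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS assoc ("
            " id1 INTEGER, atype TEXT, id2 INTEGER, created REAL,"
            " PRIMARY KEY (id1, atype, id2))"
        )
        self.cache = {}  # (id1, atype) -> list of id2; one cache tier

    def assoc_add(self, id1, atype, id2):
        self.db.execute(
            "INSERT OR REPLACE INTO assoc VALUES (?, ?, ?, ?)",
            (id1, atype, id2, time.time()),
        )
        self.db.commit()
        self.cache.pop((id1, atype), None)  # invalidate on write

    def assoc_range(self, id1, atype):
        key = (id1, atype)
        if key not in self.cache:  # cache miss -> fall through to SQL
            rows = self.db.execute(
                "SELECT id2 FROM assoc WHERE id1 = ? AND atype = ?"
                " ORDER BY created DESC", key
            ).fetchall()
            self.cache[key] = [r[0] for r in rows]
        return self.cache[key]

# friendship is symmetric, so store both directions of the edge
store = AssocStore()
store.assoc_add(1, "friend", 2)
store.assoc_add(2, "friend", 1)
print(store.assoc_range(1, "friend"))  # [2], read from SQL once, then cached
```

The point is the read path: almost all queries are served from the cache tiers, and only misses fall through to MySQL.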
Now take a look at the data model diagram in the presentation (friend connections are at the top left).
Well, this is a graph. :) It doesn't tell you how to build it in SQL; there are several ways to do it, and this site has a good number of different approaches. Note: a relational DB is what it is: it's designed to store normalized data, not a graph structure, so it won't perform as well as a specialized graph database.
Also consider that you have to run more complex queries than just friends of friends, for example filtering all locations around a given coordinate that you and your friends of friends like. A graph is the perfect fit here.
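To see why a traversal fits that kind of question, here is a hedged in-memory sketch of "locations that I and my friends of friends like, near a coordinate". The toy data, the BFS and the haversine distance check are my own illustration; a real graph DB would answer this with an indexed traversal instead of big joins:

```python
from math import radians, sin, cos, asin, sqrt

# toy in-memory graph: friendship adjacency sets and "likes" edges
friends = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
likes = {1: {"cafe_a"}, 2: {"cafe_b"}, 4: {"cafe_a", "bar_c"}}
places = {"cafe_a": (52.52, 13.40), "cafe_b": (52.50, 13.42), "bar_c": (48.14, 11.58)}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def circle_up_to_depth(user, depth):
    """BFS: everyone reachable within `depth` hops, the user included."""
    seen, frontier = {user}, {user}
    for _ in range(depth):
        frontier = {f for u in frontier for f in friends.get(u, ())} - seen
        seen |= frontier
    return seen

def liked_places_nearby(user, center, radius_km, depth=2):
    circle = circle_up_to_depth(user, depth)          # me + friends of friends
    liked = {p for u in circle for p in likes.get(u, ())}
    return {p for p in liked if haversine_km(places[p], center) <= radius_km}

print(liked_places_nearby(1, (52.52, 13.40), 10))  # {'cafe_a', 'cafe_b'}
```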
I can't tell you how to build it so that it performs well; that clearly requires some trial and error and benchmarking.
Here is my disappointing test for just finding friends of friends:
DB Schema:
```sql
CREATE TABLE IF NOT EXISTS `friends` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `friend_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_friend` (`user_id`, `friend_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
Friends of Friends Query:
```sql
(SELECT friend_id FROM friends WHERE user_id = 1)
UNION
(SELECT DISTINCT ff.friend_id
   FROM friends f
   JOIN friends ff ON ff.user_id = f.friend_id
  WHERE f.user_id = 1)
```
I really recommend creating some sample data with at least 10k user records, each of them having at least 250 friend connections, and then running this query. On my machine (i7 4770K, SSD, 16 GB RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized, I'm not a DB genius (suggestions are welcome). However, if this scales linearly you're already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.
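In case you want to reproduce the test, here is a small script that generates sample data in that ballpark (10k users with ~250 random friend connections each) as an SQL file. The table and column names match the schema above; everything else about it is my own choice:

```python
import random

NUM_USERS = 10_000
FRIENDS_PER_USER = 250
CHUNK = 1_000  # rows per INSERT statement, to keep each statement small

random.seed(42)

def row_iter():
    for user_id in range(1, NUM_USERS + 1):
        # sample distinct friend ids, then drop the user itself if drawn
        candidates = random.sample(range(1, NUM_USERS + 1), FRIENDS_PER_USER + 1)
        for friend_id in [f for f in candidates if f != user_id][:FRIENDS_PER_USER]:
            yield f"({user_id}, {friend_id})"

with open("friends_sample.sql", "w") as out:
    batch = []
    for row in row_iter():
        batch.append(row)
        if len(batch) == CHUNK:
            out.write("INSERT INTO friends (user_id, friend_id) VALUES "
                      + ",".join(batch) + ";\n")
            batch.clear()
    if batch:
        out.write("INSERT INTO friends (user_id, friend_id) VALUES "
                  + ",".join(batch) + ";\n")

# load with: mysql -u <user> -p <database> < friends_sample.sql
```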
This might still sound OK-ish for ~100k users, but consider that you've only fetched friends of friends and haven't run any more complex query like "show me only posts from friends of friends, do the permission check whether I'm allowed or NOT allowed to see some of them, and do a sub-query to check whether I liked any of them". You want the DB to check whether you've already liked a post, or you'll have to do it in code. Also consider that this isn't the only query you run, and that you have more than one active user at a time on a more or less popular site.
I think my answer covers how Facebook designed their friends relationship quite well, but I'm sorry I can't tell you how to implement it in a way that will work fast. Implementing a social network is easy, but making sure it performs well clearly isn't - IMHO.
I've started experimenting with OrientDB to run the graph queries while mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.
How can I create a well-performing social network site?
Update 2021-04-10: I'll probably never write that article ;) but here are a few bullet points on how you could try to scale it:
- Use different read and write repositories
- Build specific read repositories based on faster non-relational DB systems made for that purpose, and don't be afraid of denormalizing data. Write to a normalized DB but read from specialized views.
- Use eventual consistency
- Take a look at CQRS
- For a social network, graph-based read repositories might also be a good idea.
- Use Redis as a read repository in which you store whole serialized data sets (a sketch follows below)
If you combine the points from the list above in a smart way, you can build a very well performing system. The list is not a to-do list; you'll still have to understand it, think it through, and adapt it! https://microservices.io/ is a nice site that covers a few of the topics I mentioned above.
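As an example of the Redis point from the list: a sketch of a read repository that stores whole serialized data sets, here a ready-to-render friends list per user. It uses the redis-py client; the key naming and the `build_friends_view` fallback are placeholders I made up:

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def build_friends_view(user_id):
    """Placeholder: in a real system this would query the normalized
    write-side DB and assemble the denormalized view."""
    return {"user_id": user_id, "friends": [{"id": 2, "name": "Alice"}]}

def get_friends_view(user_id):
    key = f"view:friends:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)           # fast path: whole serialized view
    view = build_friends_view(user_id)      # slow path: rebuild from write side
    r.set(key, json.dumps(view), ex=3600)   # cache the view for an hour
    return view

def invalidate_friends_view(user_id):
    # called by the write side / an event handler after a friendship changes
    r.delete(f"view:friends:{user_id}")
```

Note that the read side never joins anything: it returns one pre-built blob per key, which is what makes it fast.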
What I do is store events that are generated by aggregates, and use projections and handlers to write to the different DBs mentioned above. The cool thing about this is that I can rebuild my data as needed at any time.
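Stripped down to the bare idea, that event flow could look like this. The event shapes and the single in-memory projection are illustrative; in a real system the log would be durable and the handlers would write to the read DBs mentioned above:

```python
from collections import defaultdict

event_log = []                    # append-only store (the source of truth)
friends_view = defaultdict(set)   # one projection / read model

def apply_to_projections(event):
    kind, a, b = event["type"], event["user_id"], event["friend_id"]
    if kind == "FriendAdded":
        friends_view[a].add(b)
        friends_view[b].add(a)
    elif kind == "FriendRemoved":
        friends_view[a].discard(b)
        friends_view[b].discard(a)

def record(event):
    event_log.append(event)       # persist the event first
    apply_to_projections(event)   # then keep the read models up to date

def rebuild_projections():
    """The cool part: wipe the read models and replay every event."""
    friends_view.clear()
    for event in event_log:
        apply_to_projections(event)

record({"type": "FriendAdded", "user_id": 1, "friend_id": 2})
record({"type": "FriendAdded", "user_id": 1, "friend_id": 3})
record({"type": "FriendRemoved", "user_id": 1, "friend_id": 2})
rebuild_projections()
print(friends_view[1])  # {3}
```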