Facebook, Instagram and WhatsApp went offline for six hours earlier this week.
The world’s largest social media platform suffered its most significant outage this week.
All Facebook platforms, including Instagram, WhatsApp and Oculus, went offline for six hours – the company’s longest downtime since a 2019 error that took the site offline for more than 24 hours.
However, unlike the 2019 incident, which disrupted services to varying degrees, this week’s incident completely took down the company’s products.
Facebook likes to run almost all of its operations on its own servers. As a result, the company’s internal tools were offline too. This left employees unable to communicate with each other, potentially slowing down its path to resolution.
The downtime also affected other products and services, both positively and negatively.
So how did the largest social network suddenly lose its network? And what impact did this have on Facebook and other companies?
How did Facebook go offline? (in human terms)
Facebook said that “configuration changes on the backbone routers that coordinate network traffic between our data centres caused issues that interrupted this communication”. But what does that mean in human terms?
In simplified terms, Facebook’s data centres are linked by a backbone network that carries all of the company’s traffic. When you use a Facebook product, your phone pings a nearby server, which then reaches a larger data centre over that backbone.
During routine maintenance, someone at Facebook accidentally ran a command that took down every connection in that backbone.
In a blog post, Facebook clarified that it has an audit tool that typically detects faulty commands. However, a bug in the tool meant that this destructive command went undetected.
As a result, when you opened a Facebook product on Monday, your phone pinged nearby servers as it usually does but then couldn’t find the larger data centre since it was taken offline. This is how what we can assume was one employee’s action disconnected Facebook from the internet.
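For readers who think in code, the chain of events above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not Facebook’s actual system: the class names, the command string and the `buggy` flag are all hypothetical, invented here to show how a faulty audit can let one command cut nearby servers off from the data centres behind them.

```python
# Toy model only: every name below is hypothetical, not Facebook's real tooling.
from dataclasses import dataclass

DANGEROUS_COMMAND = "disable-all-backbone-links"  # made-up command name


@dataclass
class Backbone:
    """The links that connect nearby servers to the larger data centres."""
    up: bool = True


@dataclass
class EdgeServer:
    """A nearby server that answers your phone and forwards the request."""
    backbone: Backbone

    def handle(self, request: str) -> str:
        if not self.backbone.up:
            # The nearby server still answers, but cannot reach a data centre.
            return "error: data centre unreachable"
        return f"ok: {request} served by data centre"


def audit(command: str, buggy: bool) -> bool:
    """Return True if the command is allowed to run.

    A working audit blocks the dangerous command. The `buggy` flag stands in
    for the flaw that let it through.
    """
    is_dangerous = command == DANGEROUS_COMMAND
    return buggy or not is_dangerous


def run_maintenance(backbone: Backbone, command: str, audit_is_buggy: bool) -> None:
    """Run a maintenance command if the audit allows it."""
    if audit(command, audit_is_buggy) and command == DANGEROUS_COMMAND:
        backbone.up = False


if __name__ == "__main__":
    backbone = Backbone()
    edge = EdgeServer(backbone)
    print(edge.handle("load feed"))

    # With a working audit, the command is blocked and nothing changes.
    run_maintenance(backbone, DANGEROUS_COMMAND, audit_is_buggy=False)
    print(edge.handle("load feed"))

    # With the buggy audit, the same command takes the backbone down.
    run_maintenance(backbone, DANGEROUS_COMMAND, audit_is_buggy=True)
    print(edge.handle("load feed"))
```

The point of the sketch is that the nearby server never goes down. It simply has nowhere to send your request once the links behind it disappear.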
Why did it take Facebook six hours to come back online?
Internal errors such as this are often reversible and don’t usually take six hours to fix. So why did this downtime last so long?
Facebook runs almost all of its internal operations on Facebook servers. So its engineers faced several obstacles that slowed down recovery.
Firstly, when Facebook sent engineers onsite to fix the data centre, they couldn’t get in. According to the company’s blog post, this is because these centres have “high levels of physical security” and “they’re hard to get into”.
Once the engineers got in, they found that the devices in the data centres were also designed to be secure. Even physical access doesn’t give an engineer permission to modify the machine. This further delayed the recovery process and kept the social network offline.
It also didn’t help that Facebook’s internal messaging tools were offline due to the downtime. This prevented employees from communicating with each other and made it even harder to solve the issue. Some staff reported that they used Outlook emails to communicate since all their tools were down.
Embarrassingly, Facebook admitted that the security hurdles it designed to keep hackers out backfired, as the company itself struggled to bypass them. It then vowed to improve its training, just in case such an issue happens again.
How did this impact the rest of the internet?
While Facebook employees scrambled to find a solution, and one silently prayed for forgiveness, other companies enjoyed the occasion. Facebook going offline often means a surge of users to other apps.
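One such app is Telegram, which celebrated 70 million new users in one day. To put that into perspective, Twitter has 200-300 million monthly active users, so Telegram’s one-day gain was equal to roughly a quarter to a third of Twitter’s entire monthly audience.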
Twitter also celebrated with a cheeky “hello literally everyone” tweet, which gathered over 3 million likes. Ironically, Twitter later faced issues itself as it struggled to handle the increased activity on its platform.
Is It Down Right Now, a website that monitors downtime, went offline too. This was likely due to hundreds of millions of Facebook users checking to see if the site was down.
Notably, the platform went offline as Facebook’s global head of safety was live on CNBC defending the company against accusations.
Frances Haugen, the Facebook whistleblower, recently leaked internal documents to The Wall Street Journal, accusing Facebook of hiding critical studies that highlight its impact on teens.
She also claimed that the company puts profit over people. Allegedly, a 2018 change in the algorithm encouraged engagement on the platform by promoting hateful posts.
Haugen said the change was implemented because “it’s easier to inspire people to anger than it is to other emotions”.
Facebook’s Vice President of Global Affairs Nick Clegg defended the company, explaining that people blame social media for global issues because it gives them a “false comfort”.
Regardless, Facebook’s stock price has steadily dropped in the past 30 days.
Facebook going offline and taking down Instagram, WhatsApp, Oculus and internal tools for employees worldwide served as a stark reminder of what happens when you hand so much power to one company.
As millions of businesses and billions of people depend on these privately operated services, it’s important to be aware of the impact an outage such as this can have.
By trusting one data centre with so many operations, we have given a great deal of power to the one employee who accidentally took down so much of what we depend on.
As Facebook faces scrutiny for its monopolising strategies, this week’s incident will make it even harder for the social media giant to defend itself.