Friday, 24 September 2010 03:43
Facebook Gives A Post-Mortem On Worst Downtime In Four Years
Facebook’s had a rough day. In fact, it’s had its worst day performance-wise in over four years, with 2.5 hours of downtime that resulted in countless complaints from users. Perhaps more important, it also had a bevy of API problems, and its Like buttons, which are embedded on over 350,000 sites across the web, were apparently busted too. When Facebook goes down, it’s a big deal.
This evening, Facebook Director of Software Engineering Robert Johnson published a post-mortem of the outage, explaining what caused the site to fail.
According to Johnson’s post, the problem stemmed from an automated system Facebook had built to check for invalid configuration values in its cache. Unfortunately, that automated check backfired — to the point that Facebook had to turn off the site entirely to recover. Here’s a portion of the explanation:
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.
The way to stop the feedback cycle was quite painful – we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
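To make that feedback loop concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions, not Facebook's actual system: the function `simulate`, its parameters, the single shared cache entry, and the fixed per-tick database `capacity` are all hypothetical. It also assumes that every client on a tick reads the cache at the same moment, and that failed queries come back after successful ones, so a single error is enough to wipe out a freshly repaired cache entry. The `delete_on_error=False` case is an illustrative contrast, not something the post-mortem describes.

```python
def simulate(arrivals: list[int], capacity: int, delete_on_error: bool) -> list[int]:
    """Return how many database queries are issued on each tick.

    arrivals[t]      -- hypothetical number of clients hitting the site on tick t
    capacity         -- hypothetical number of queries the cluster can answer per tick
    delete_on_error  -- True models the behaviour described in the post-mortem:
                        a failed query is treated as an invalid value and the
                        cache key is deleted
    """
    cache_valid = False  # the bad config change left the shared cache entry invalid
    load = []
    for clients in arrivals:
        # All clients read the cache at once; on a miss, each one queries the database.
        queries = 0 if cache_valid else clients
        load.append(queries)

        succeeded = min(queries, capacity)  # the cluster answers what it can
        failed = queries - succeeded        # the rest get an error

        if succeeded > 0:
            cache_valid = True   # a successful client repopulates the cache
        if failed > 0 and delete_on_error:
            cache_valid = False  # an error deletes the key again (assumed to land last)
    return load


if __name__ == "__main__":
    # Full traffic against an overloaded cluster: the load never drops.
    print(simulate([1000, 1000, 1000, 1000], capacity=100, delete_on_error=True))
    # [1000, 1000, 1000, 1000]

    # Traffic stopped, then let back in at a rate the cluster can handle.
    print(simulate([0, 100, 1000, 1000], capacity=100, delete_on_error=True))
    # [0, 100, 0, 0]

    # Contrast: same overload, but errors leave the cache alone.
    print(simulate([1000, 1000, 1000, 1000], capacity=100, delete_on_error=False))
    # [1000, 0, 0, 0]
```

In this model the first run never recovers, because as long as even one query fails the key is deleted and every client queries again on the next tick. The second run mirrors the recovery Johnson describes: once traffic is cut and then readmitted at a rate the databases can serve, no query fails, nothing deletes the key, and the load falls to zero.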
Facebook has generally had a good track record in terms of keeping its homepage alive, but I’ve heard repeated complaints about the integrity of its API. And given Facebook’s goal of becoming the social fabric of the web, which entails maintaining a presence on countless third-party sites, it’s imperative that it keeps its various widgets and authentication buttons working properly.
Author: Jason Kincaid