Twitter Quietly Launched A New Search Backend Weeks Ago

Thursday, 07 October 2010 01:20

While everyone was busy trying out New Twitter or tweeting about how they want New Twitter, Twitter itself was doing something secret behind the scenes. The startup quietly flipped the switch on an entirely new backend for their search, they reveal in a blog post today.

“One of our main goals, but also biggest challenges, was a smooth switch from the old architecture to the new one, without any downtime or inconsistencies in search results,” they write in the post. Mission: accomplished, it seems, as no one outside of Twitter even seemed to be aware that they switched anything.

Twitter notes that they had to build this new backend because they were still using the search technology that they acquired in the Summize deal. That tech was great when they bought it, but Twitter was much smaller then; they’ve grown massively since. “Scaling the old MySQL-based system had become increasingly challenging,” they note.

So what is this new search? “Since we love Open Source here at Twitter we chose Lucene, a search engine library written in Java, as a starting point,” Twitter notes in the post. But they say that they had to modify it given their demands for real-time search. What type of demands? These types of demands:

“Our demands on the new system are immense: With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines. As we want the new system to last for several years, the goal was to support at least an order of magnitude more load.”
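For anyone who wants to check the math in that quote, here is a quick back-of-the-envelope sketch. It assumes the quoted rates are sustained around the clock, which the post does not actually say, and it reads "an order of magnitude" as exactly 10x; the variable and function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in Twitter's post.
# Assumption: the per-second rates hold steady for a full 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: int) -> int:
    """Convert a sustained per-second rate into a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


CURRENT_TPS = 1_000    # tweets per second ("over 1,000" in the post)
CURRENT_QPS = 12_000   # queries per second
GROWTH_FACTOR = 10     # assumption: "an order of magnitude" taken as exactly 10x

queries_per_day = per_day(CURRENT_QPS)
tweets_per_day = per_day(CURRENT_TPS)
target_tps = CURRENT_TPS * GROWTH_FACTOR
target_qps = CURRENT_QPS * GROWTH_FACTOR

print(f"{queries_per_day:,} queries/day")  # 1,036,800,000
print(f"{tweets_per_day:,} tweets/day")    # 86,400,000
print(f"design target: {target_tps:,} TPS and {target_qps:,} QPS")
```

So 12,000 queries per second does work out to a little over 1 billion queries per day, and the stated design goal would mean roughly 10,000 TPS and 120,000 QPS.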

They also aim to index each tweet less than 10 seconds after it’s posted.

Twitter says that any custom work they did on Lucene will be contributed back as open source.

And the new system, which, again, has been rolling out for a few weeks now, seems to be working. Twitter estimates that they’re only using about 5 percent of the available backend resources. They say the new indexer could also handle 50 times more tweets per second than they currently get. In other words, it will scale.
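To put those headroom numbers next to the stated goal, here is a rough sketch. The 1,000 TPS baseline and the 10x goal come from Twitter's quote above; treating resource usage as scaling linearly with load is purely an assumption for illustration, not something Twitter claims.

```python
# Rough headroom check using the figures reported in the post.
CURRENT_TPS = 1_000          # tweets per second today, per the post
GOAL_TPS = 10 * CURRENT_TPS  # assumption: "order of magnitude" goal read as 10x
INDEXER_MULTIPLE = 50        # indexer could take 50x the current tweet rate
UTILIZATION_PERCENT = 5      # share of backend resources reportedly in use

# Implied indexer capacity, and whether it clears the 10x design goal.
indexer_capacity_tps = INDEXER_MULTIPLE * CURRENT_TPS
meets_goal = indexer_capacity_tps >= GOAL_TPS

# Naive estimate only: assumes resource use grows linearly with load.
naive_backend_headroom = 100 / UTILIZATION_PERCENT

print(f"indexer capacity: about {indexer_capacity_tps:,} TPS")
print(f"clears the 10x goal: {meets_goal}")
print(f"naive backend headroom: about {naive_backend_headroom:.0f}x")
```

On those assumptions, the indexer's implied capacity of about 50,000 tweets per second sits well above the 10,000 TPS goal, and the backend would have something like 20x room to grow.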