Twitter notes that they had to build this new backend because they were still using the search technology that they acquired in the Summize deal. That tech was great when the deal happened, but Twitter was much smaller then; they’ve grown massively since. “Scaling the old MySQL-based system had become increasingly challenging,” they note.
So what is this new search? “Since we love Open Source here at Twitter we chose Lucene, a search engine library written in Java, as a starting point,” Twitter notes in the post. But they say that they had to modify it given their demands for real-time search. What type of demands? These types of demands:
Our demands on the new system are immense: With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines. As we want the new system to last for several years, the goal was to support at least an order of magnitude more load.
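For a quick sanity check of those numbers, here is a minimal Python sketch of the arithmetic. The two per-second rates come from Twitter's quote; the helper name `per_day` and the assumption that the quoted rates hold around the clock are ours, not Twitter's.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def per_day(rate_per_second: int) -> int:
    """Convert a steady per-second rate into a daily total.

    Assumes the rate is constant all day, which is a simplification.
    """
    return rate_per_second * SECONDS_PER_DAY

# Figures quoted by Twitter.
current_tps = 1_000    # tweets per second
current_qps = 12_000   # queries per second

queries_per_day = per_day(current_qps)
tweets_per_day = per_day(current_tps)

# "At least an order of magnitude more load" read as a 10x target.
target_tps = current_tps * 10
target_qps = current_qps * 10

print(f"Queries per day: {queries_per_day:,}")
print(f"Tweets per day:  {tweets_per_day:,}")
print(f"10x target: {target_tps:,} TPS and {target_qps:,} QPS")
```

At 12,000 queries per second, the daily total comes to a little over 1 billion, which matches the figure in the quote.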
They also aim to index each tweet less than 10 seconds after it’s posted.
Twitter says that any custom work they did on Lucene will be contributed back to open source.
And the new system, which, again, has been rolling out for a few weeks now, seems to be working. Twitter estimates that they’re only using about 5 percent of the available backend resources. They say the new indexer could also handle 50 times more tweets per second than they currently get. In other words, it will scale.
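To put that headroom in rough numbers, here is a back-of-the-envelope sketch in Python. It assumes resource usage grows linearly with load and reads "50 times more" as 50 times the 1,000 TPS figure quoted earlier; both are our simplifications, and the function name `headroom_factor` is purely illustrative.

```python
def headroom_factor(utilization: float) -> float:
    """Return how many times current load could grow before hitting 100%.

    Assumes usage scales linearly with load (a simplification).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization

backend_utilization = 0.05   # "about 5 percent" per Twitter's estimate
current_tps = 1_000          # tweets per second, from Twitter's quote
indexer_multiple = 50        # "50 times more tweets per second"
goal_multiple = 10           # "at least an order of magnitude more load"

backend_headroom = headroom_factor(backend_utilization)
indexer_capacity_tps = current_tps * indexer_multiple
goal_tps = current_tps * goal_multiple

print(f"Backend headroom: about {backend_headroom:.0f}x current load")
print(f"Indexer capacity: about {indexer_capacity_tps:,} TPS")
print(f"Meets the 10x goal: {indexer_capacity_tps >= goal_tps}")
```

Under those assumptions, both estimates clear the order-of-magnitude goal Twitter set for the system.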
Authors: MG Siegler