Twitter notes that they had to build this new backend because they were still using the search technology they acquired in the Summize deal. That tech was great at the time, but Twitter was much smaller then and has grown massively since. “Scaling the old MySQL-based system had become increasingly challenging,” they note.
So what is this new search? “Since we love Open Source here at Twitter we chose Lucene, a search engine library written in Java, as a starting point,” Twitter notes in the post. But they note that they had to modify it given their demands for real-time search. What type of demands? These types of demands:
“Our demands on the new system are immense: With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines. As we want the new system to last for several years, the goal was to support at least an order of magnitude more load.”
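For a sense of scale, the arithmetic behind those figures can be checked with a quick back-of-the-envelope sketch. The rates come straight from the quote; reading "an order of magnitude more load" as a flat 10x multiplier is an assumption here, not something Twitter spells out.

```python
# Back-of-the-envelope check of the load figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_sec = 1_000    # "over 1,000 TPS" from the quote
queries_per_sec = 12_000  # "12,000 QPS" from the quote

# Daily volume if those rates were sustained around the clock.
tweets_per_day = tweets_per_sec * SECONDS_PER_DAY
queries_per_day = queries_per_sec * SECONDS_PER_DAY

# Assumption: "an order of magnitude more load" is treated as exactly 10x.
target_queries_per_sec = queries_per_sec * 10
target_queries_per_day = queries_per_day * 10

print(f"{tweets_per_day:,} tweets/day")
print(f"{queries_per_day:,} queries/day")
print(f"{target_queries_per_sec:,} QPS target")
print(f"{target_queries_per_day:,} queries/day target")
```

At a sustained 12,000 QPS the daily total comes to 1,036,800,000 queries, which matches the "over 1 billion" claim. Under the 10x reading, the design target works out to roughly 120,000 QPS.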
Twitter says that any custom work they did on Lucene will be contributed back to open source.
Authors: MG Siegler