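Interview with a Twitter Engineer: What Happens as an SNS Product Grows
[51CTO original exclusive feature] Founded in 2006, Twitter is now one of the most popular SNS services in the world and the leading pioneer of the microblogging concept. Twitter has more than 100 million registered users, yet the whole Twitter team has only about 180 employees, of whom roughly 70 to 80 are engineers. Enormous traffic keeps pushing Twitter's servers to their limits, and as an SNS service it cannot neglect feature updates either. How should we make sense of the way such a fast-growing service has developed? At this year's QCon Beijing, the 51CTO developer channel was fortunate to invite Nick Kallen, a systems engineer at Twitter, to share the experience of Twitter's engineering team.
Nick Kallen was originally a software consultant. He is also the author of several open source projects, including Arel and NamedScope, which underpin the Rails 3 framework; Cache Money, a distributed caching framework; and Screw.Unit, a JavaScript behavior-driven development framework. Two years ago he was brought in to help solve Twitter's scalability problems, and that is how he came to join Twitter's engineering team. At Twitter's current scale, scalable data models that can be queried efficiently are among the most pressing tasks, and one of Nick's main areas of focus today is a general-purpose distributed database.
Nick's Twitter account is @nk.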
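Twitter's early days
As the SNS service with the fastest-growing popularity, Twitter is often compared with Facebook, which was founded in 2004. In sharp contrast to Facebook's ever-growing stream of new features and applications, Twitter added very few features in its first three years, and only in the past year has it gradually introduced features such as photo support and location display. According to Nick, the early Twitter team positioned Twitter as a "minimal service" and believed at the time that adding features would undermine that positioning.
That said, the Twitter team of those days, perpetually haunted by the Fail Whale, really could not spare the people or the energy to develop new features. In the three years after Twitter launched, the user base grew rapidly and data volume shot up. Twitter was first built on a LAMP architecture (Linux + Apache + MySQL + PHP), and that system was soon overwhelmed. Nick was candid in describing the team's situation in those early years:
"Scalability has been such an urgent concern that our engineers had almost no opportunity to think about new features. All the work went into keeping the service running and scaling it. ... Under the original LAMP architecture, there was usually a single MySQL master database, scaled vertically. That is certainly not an unscalable design, but it could not meet our needs."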
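Twitter's scalability
Behind the scenes, where users cannot see it, Twitter has in fact been changing a great deal over these four years. At the application layer, for example, in back-end server processing, Twitter rewrote its back-end applications in Scala between 2008 and 2009, which greatly strengthened its multi-process asynchronous processing and improved performance.
The biggest change in this period was probably the transformation of the data layer. The "NoSQL revolution" that began in 2009 caused a huge stir in the Web world, and early this year Twitter began introducing the young NoSQL database Cassandra for its tweet data. According to Nick, however, Twitter had started focusing on efficient distributed data storage a year and a half earlier, and the heart of that effort was its partitioning strategy, that is, how the data is divided up.
"We used to store all of the data and services in a single component. Partitioning means splitting the data into small pieces and storing them across multiple components. Because the large body of data has been cut into small pieces, we can answer queries and perform operations in parallel, as smaller jobs. Whether it is how we store tweets, how we store the social graph, or how we store search indices, every major component has been partitioned over the past year and a half using different strategies. That is what makes Twitter scale now."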
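The partitioning idea Nick describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the PartitionedStore class, the modulo placement rule, and the in-memory dictionaries standing in for databases are hypothetical and do not describe Twitter's actual storage layer or the strategies it uses per component.

```python
from concurrent.futures import ThreadPoolExecutor


class PartitionedStore:
    """Toy partitioned store; each shard is a dict standing in for one database."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, user_id):
        # Hypothetical placement rule: user id modulo the number of shards.
        return self.shards[user_id % len(self.shards)]

    def add_tweet(self, user_id, text):
        self.shard_for(user_id).setdefault(user_id, []).append(text)

    def tweets_by(self, user_id):
        # A lookup by key touches exactly one small shard.
        return list(self.shard_for(user_id).get(user_id, []))

    def search(self, word):
        # A query that cannot be routed by key fans out to all shards in
        # parallel; each worker scans only its own slice of the data.
        def scan(shard):
            return [(uid, t) for uid, tweets in shard.items()
                    for t in tweets if word in t]

        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            parts = list(pool.map(scan, self.shards))
        return sorted(hit for part in parts for hit in part)


store = PartitionedStore(4)
store.add_tweet(1, "hello qcon")
store.add_tweet(6, "hello beijing")
store.add_tweet(5, "lunch")
print(store.tweets_by(1))
print(store.search("hello"))
```

The point of the sketch is the shape of the work: a single-user read goes to one shard, while a broad query becomes several small jobs that run side by side instead of one large scan.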
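Over more than a year of continued growth, the Fail Whale has appeared far less often, so the improvements to the application and data layers have clearly paid off.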
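The Twitter API and new features
Relative to Twitter's traffic and user numbers, the engineering team still looks short-handed. One interesting thing about Twitter, though, is the popularity of its third-party applications. Statistics from July 2009 showed that the total number of third-party desktop apps, mobile apps, Web apps and browser extensions for Twitter had just passed 10,000; by now that number exceeds 100,000. Although Twitter's core functionality has seen no major changes, in the hands of tens of thousands of developers around the world the platform has shown remarkable vitality and capacity for innovation. Applications such as the TwitPic photo storage service and the iPhone client came from third-party developers, and that is inseparable from Twitter's open API. It is fair to say that most of Twitter's vitality and innovation should be credited to its open API.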
The Twitter developer conference
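Nick gave a brief account of how the Twitter API came about:
"The API was originally developed because an engineer who used to work at Twitter left and moved to Germany, and he wanted to integrate Twitter into his chat bot. The first API was designed for this fellow's little toy, but we quickly saw that more people would use the API to create more things. So we started investing in the API very early on."
Like Facebook, Twitter pays close attention to its developer community. Twitter held its developer conference, Chirp, on April 14 and 15, just one week before Facebook's F8 developer conference. At Chirp, Twitter announced the launch of its developer website, which should be exciting news for Twitter developers.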
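More changes
As Nick described, Twitter has changed. In contrast to its earlier "minimal" positioning, Twitter has begun actively rolling out new features. Nick also voiced his hopes for the developer community:
"A major challenge for the developer community is how the core product functionality we build can enable more innovation through the API: innovation that we will not build into the core, rather than just an alternative to the Web. The way programs use the API is very different from the way people use the website. Programs keep asking Twitter: is there more data? Is there anything new? And so on. People are different; they check the page at particular times of day, such as at lunch. So the interesting thing is that all API usage is very homogeneous: similar in function, very fast, and highly repetitive. How to serve that kind of access efficiently is something engineers need to think about. And the varied, irregular usage habits of individual users are another problem to think about."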
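To make the polling pattern concrete, here is a minimal sketch of how a service could answer the repeated "anything new?" question cheaply. It is only an illustration under stated assumptions: the Timeline class, the cursor-based poll function, and the sequential integer ids are hypothetical names and choices, not a description of how Twitter's API is implemented.

```python
from bisect import bisect_right


class Timeline:
    """Toy append-only timeline with ascending integer message ids."""

    def __init__(self):
        self.ids = []    # ascending message ids
        self.texts = []  # message texts, aligned with self.ids

    def append(self, text):
        next_id = self.ids[-1] + 1 if self.ids else 1
        self.ids.append(next_id)
        self.texts.append(text)
        return next_id

    def newer_than(self, since_id):
        # Fast path: most polls find nothing new, so answer them by
        # comparing the cursor with the latest id and doing no other work.
        if not self.ids or since_id >= self.ids[-1]:
            return []
        # Otherwise binary-search to the first unseen message.
        start = bisect_right(self.ids, since_id)
        return list(zip(self.ids[start:], self.texts[start:]))


def poll(timeline, since_id):
    """One client poll: returns (new_items, updated_cursor)."""
    items = timeline.newer_than(since_id)
    return items, (items[-1][0] if items else since_id)


timeline = Timeline()
timeline.append("first")
timeline.append("second")
items, cursor = poll(timeline, 0)   # first poll fetches everything
print(items, cursor)
items, cursor = poll(timeline, cursor)  # repeat poll finds nothing new
print(items, cursor)
```

Because program traffic is so uniform, optimizing this one request shape pays off across almost all API calls, which is harder to achieve for the varied ways people browse a website.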
Appendix: Transcript of the interview with Nick Kallen
(Right: Twitter systems engineer Nick Kallen; left: 51CTO developer channel editor Yang Sai)
51CTO: How did you join Twitter, and how many people are there in the current Twitter development team?
NK: I joined Twitter about 2 years ago. Originally I consulted for them to help with scalability issues, and I really enjoyed working there. They wanted to hire me, they made me an offer and I accepted; that's how I originally joined. I believe there are 180 employees now, approximately. As for how many engineers there are, I think about 40% or 50% are engineers, so about 70 to 80.
51CTO: Twitter has been cautiously adding new features over the past 4 years. How do you decide whether a new feature should be added?
NK: There are a lot of reasons why Twitter has in the past been cautious about adding new features. For the first couple of years of Twitter's history, and until recently, scalability has been such an urgent concern that there haven't been as many opportunities for the engineers to work on any new features; they've been so busy keeping the site up and making it scale. I think also, early on, Twitter was sort of a minimal service. I mean, many people contrast it with Facebook. Facebook has a rich set of features like photos, all sorts of things. And traditionally Twitter has been very minimal; it hasn't added things like extensive conversation functionality or photo features. And so, as a matter of culture, for a while we were reluctant to add features that would detract from the minimalism of Twitter. I think that is changing now, though. I think we are pretty aggressively adding new features, and building out a richer feature set has been rather experimental. So it is quite a change from the minimalism we used to have.
51CTO: Did you consider high scalability from the very beginning? How did Twitter's scalability improve over the past year?
NK: Well, it definitely wasn't designed for scalability from the beginning. It was designed using kind of the traditional LAMP style (Linux, Apache, MySQL, PHP) architecture, usually with a single MySQL master database, vertically scaled. That is kind of how the original version of Twitter was architected. That is definitely not an unscalable design.
For the last year and a half, we have been focused on basically partitioning strategies for our data storage. That means, instead of storing all the data or service in one component, you take that data and divide it into small pieces, and you store it across multiple components. So you can answer queries and manipulate it in parallel and in smaller jobs, because you've taken a huge amount of data and divided it into pieces. So every major component, from how we store tweets, to how we store the social graph, to how we store search indices, has basically been partitioned using different strategies over the last year and a half. That's what makes Twitter scale now.
51CTO: The Twitter API has been a main reason for Twitter's success as a service. How does such an architecture differ from a normal Web 2.0 product?
NK: Originally the API was developed because one of the first engineers of Twitter left and moved to Germany, and he wanted to integrate Twitter with an IRC bot. The original API was designed to support him doing his little toy, and it quickly became apparent that people could create things using the API, so we invested early on in the API functionality. That gives Twitter a major advantage, since we have been a small engineering team for a long time, and by opening the API we allow other people to build core functionality for us. An obvious example would be TwitPic, where we didn't have the resources to build photo storage services, because there weren't enough engineers. But by having the API, those core services could be built by other people, including things like an iPhone client these days.
The challenge in the community now, though, is, as we are able to build central parts of the product, for people to find more creative uses of the API, and to use it not just as an alternative to the web but for creative kinds of things that we are not going to build into the core product. As for scaling, APIs vs. the web, there is a big difference between the way software queries APIs and the way human beings use the website. Programs constantly poll; they keep pinging Twitter: can I get more data, is there anything more recent, etc. Human beings don't behave that way; they check it a few times, during lunch or something like that. So the interesting thing about API usage is that it's very homogeneous, very similar, and very high velocity and repetitive. And so you need to engineer your system to support that style of access efficiently. And that's a different problem from supporting the kind of varied and irregular use cases of human beings.
For the video of 51CTO's interview with Nick Kallen, see the next page.
Video interview