Facebook is one example of a social network that has succeeded in scaling its systems. The challenge for Facebook’s engineers is to keep the site up and running smoothly while handling more than half a billion active users. During the scaling process, several notable technologies were developed inside Facebook. This article takes a look at some of the greatest technical accomplishments of Facebook’s engineers.
HipHop for PHP transforms PHP source code into highly optimized C++ and then compiles it with g++ to build binary files. You keep coding in the simpler PHP, and HipHop executes your source code in a semantically equivalent manner, sacrificing some rarely used features (such as eval()) in exchange for improved performance. Facebook sees about a 50% reduction in CPU usage when serving equal amounts of Web traffic, compared with Apache and PHP. Facebook’s API tier can serve twice the traffic using 30% less CPU. HipHop was developed by Facebook and released as open source in early 2010.
Thrift is a software framework for developing scalable cross-language services. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml. Originally developed at Facebook, Thrift was open sourced in April 2007 and entered the Apache Incubator in May 2008.
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query the data. At the same time, the language allows traditional map/reduce programmers to plug in their own custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.
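To make the idea of a custom mapper concrete, here is a minimal sketch in Python. It assumes the mapper receives rows as tab-separated lines and emits tab-separated lines in return, which is how streaming-style scripts are typically wired in. The column layout (user_id, url) and the function name map_rows are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse


def map_rows(lines):
    """Yield 'user_id<TAB>domain' for each 'user_id<TAB>url' input row.

    The (user_id, url) column layout is a hypothetical example,
    not something prescribed by Hive.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows instead of failing the whole job
        user_id, url = fields
        domain = urlparse(url).netloc.lower()
        if domain:  # drop rows whose URL has no host part
            yield f"{user_id}\t{domain}"
```

Run as a standalone script, this would loop over map_rows(sys.stdin) and print each result. A Hive QL query could then, under these assumptions, stream a table through the script and aggregate the emitted domains with ordinary SQL-style clauses.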
Apache Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Cassandra was open sourced by Facebook in 2008, and is now developed by Apache committers and contributors from many companies.
Scribe is a service for aggregating log data streamed in real time from a large number of servers. It is designed to scale to a very large number of nodes and to be robust to network and node failures. A Scribe server runs on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups. If the central Scribe server isn’t available, the local Scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central Scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of Scribe servers.
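The store-and-forward behaviour described above can be sketched in a few lines of Python. This is a toy model and not Scribe’s actual code or API: the class name LocalRelay, the send callable, and the use of an in-memory list in place of the file on local disk are all assumptions made for illustration.

```python
class LocalRelay:
    """Toy model of a per-node relay that buffers while the central server is down."""

    def __init__(self, send):
        # `send` is a hypothetical callable that delivers a batch of messages
        # to the central server and raises ConnectionError if it is unreachable.
        self.send = send
        self.buffer = []  # stands in for the file on local disk

    def log(self, category, message):
        """Queue one message, then try to forward everything queued so far."""
        self.buffer.append((category, message))
        return self.flush()

    def flush(self):
        """Forward the queued messages; keep them if the central server is down."""
        if not self.buffer:
            return True
        try:
            self.send(list(self.buffer))
        except ConnectionError:
            return False  # messages stay buffered for a later retry
        self.buffer.clear()
        return True
```

Because messages leave the buffer only after a successful send, an outage of the central server delays delivery but does not lose data in this model. A real deployment would also have to persist the buffer and bound its size.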
Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like web.py or Google’s webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking infrastructure.
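To illustrate what “non-blocking” buys you, here is a small sketch that uses Python’s standard asyncio module rather than Tornado’s own API. The handler, its name, and the simulated delay are all hypothetical. The point is only that a single thread can keep many slow requests in flight at once instead of waiting on each one in turn.

```python
import asyncio
import time


async def handle_request(request_id, delay):
    """Simulate a handler that waits on slow I/O without blocking the event loop."""
    await asyncio.sleep(delay)  # stands in for a slow network or database call
    return f"response {request_id}"


async def serve_all(count, delay):
    """Run `count` simulated requests concurrently on a single thread."""
    tasks = [handle_request(i, delay) for i in range(count)]
    return await asyncio.gather(*tasks)


def timed_run(count=100, delay=0.05):
    """Return the responses and the wall-clock time taken to produce them."""
    start = time.perf_counter()
    responses = asyncio.run(serve_all(count, delay))
    return responses, time.perf_counter() - start
```

With blocking handlers, the total time would grow roughly as count multiplied by delay. Here the waits overlap, so the total should stay close to a single delay.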
Those are some of the technologies developed at Facebook. If I missed something, you can add it in the comments section. Facebook maintains a list of its open-source contributions at http://developers.facebook.com/opensource