Facebook Opens Up Osquery Tool
The hyperscale datacenter operators take a lot from the open source software community to run their businesses, but they also give back. Facebook has opened up its server, storage, and datacenter designs through the Open Compute Project and has also released source code that helps the database, caching, and other programs in its application stack work better. The latest tool that the social network has given back to the community is called osquery, and by opening it up, Facebook makes it possible for the tool to spread to environments beyond the ones it was created to help manage inside Facebook itself.
The osquery tool is actually an umbrella term for a number of tools that Facebook has created so it can ask questions of its infrastructure, explains Mike Arpaia, a software engineer at the company who specializes in security software and who previously worked in a similar role at online handmade marketplace Etsy. The key component of the osquery tools is osqueryd, a daemon that hooks into low-level operating system APIs, extracts key metrics on the settings and performance of the operating system, and puts the results into a key-value store so they can be analyzed as a time series. The neat bit is that osquery lets system administrators use normal SQL statements to query the tables it builds, providing a kind of universal query language that should be able to span multiple, otherwise incompatible operating systems.
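To make the daemon-plus-SQL idea more concrete, here is a minimal Python sketch. It is not osquery's code and makes several assumptions: a plain dict stands in for the key-value store, the `table:timestamp` key layout is hypothetical, the function names `record_snapshot`, `snapshots`, and `query_snapshot` are invented for illustration, and the `kernel_modules` rows are made-up data. Each snapshot of an OS "table" is stored under a timestamped key, so the snapshots form a time series, and any one of them can be loaded into an in-memory SQLite table and queried with ordinary SQL.

```python
import json
import sqlite3

# Hypothetical stand-in for osqueryd's key-value store: a plain dict that maps
# "<table>:<zero-padded timestamp>" keys to JSON-encoded lists of rows.
KVStore = dict[str, str]


def record_snapshot(store: KVStore, table: str, ts: int, rows: list[dict]) -> None:
    """Store one timestamped snapshot of an OS 'table' (e.g. loaded kernel modules)."""
    # Zero-padding makes string order match chronological order.
    store[f"{table}:{ts:010d}"] = json.dumps(rows)


def snapshots(store: KVStore, table: str) -> list[tuple[int, list[dict]]]:
    """Return every snapshot of one table, oldest first: a simple time series."""
    prefix = f"{table}:"
    keys = sorted(k for k in store if k.startswith(prefix))
    return [(int(k[len(prefix):]), json.loads(store[k])) for k in keys]


def query_snapshot(table: str, rows: list[dict], sql: str) -> list[tuple]:
    """Load one snapshot into an in-memory SQLite table and run plain SQL on it."""
    if not rows:
        raise ValueError("snapshot has no rows to infer columns from")
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    try:
        # Table and column names come from trusted code here, not from users.
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in rows],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    store: KVStore = {}
    # Made-up data: two samples of the loaded kernel modules, at t=1000 and t=2000.
    record_snapshot(store, "kernel_modules", 1000,
                    [{"name": "ext4", "size": 5}, {"name": "xfs", "size": 9}])
    record_snapshot(store, "kernel_modules", 2000, [{"name": "ext4", "size": 5}])
    for ts, rows in snapshots(store, "kernel_modules"):
        print(ts, query_snapshot("kernel_modules", rows,
                                 "SELECT name FROM kernel_modules ORDER BY name"))
```

Comparing the two printed snapshots shows the `xfs` module disappearing between samples, which is the kind of change a time series of OS state is meant to surface.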
Officially, Arpaia tells EnterpriseTech, osquery supports the latest CentOS 6.X releases, the variant of Red Hat Enterprise Linux 6.X that Facebook uses internally on its infrastructure. Canonical's Ubuntu Server 12.04 is also officially supported; according to Arpaia, Facebook makes all of the software it builds work on at least two platforms. (You always want your software to work on at least two different releases, just in case there is a reason to shift.) CentOS is the freebie version of RHEL, and Ubuntu Server is a popular choice among the larger hyperscale community, so it is the obvious second choice.
Arpaia says that there is no technical reason why the osqueryd daemon should not compile on any Linux distribution out there, and adds that in the past two weeks the community that is taking over osquery development on GitHub has made sure it works with the FreeBSD variant of the Unix operating system. Facebook itself also supports Mac OS X with osquery, because it wanted a tool that could query OS information both on the massive fleet of servers in its infrastructure and on the Apple Macs that its coders and other employees use in their daily work. According to Arpaia, adding another operating system for the osquery tools to Facebook's internal Jenkins-based build system is no big deal.
The osqueryd daemon and the osqueryi interactive query engine that interfaces with it are written in C++, with some Objective-C mixed in to provide support for the Mac OS X platform.
Thus far, the osquery community has created around 50 distinct tables that expose various aspects of an operating system and accumulate data about its performance. These include the kernel modules loaded on a particular machine, open network connections, and running processes, just to name a few. And because the data is stored in database tables, the tables can be joined for more sophisticated queries, such as combining the table of running processes with the table of open network ports to ask which processes are listening on which ports. The data behind osquery is stored in RocksDB, a fork of Google's LevelDB key-value store.
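The processes-and-ports join described above can be sketched with Python's built-in `sqlite3` module. The table names `processes` and `listening_ports`, their columns, and every row are assumptions chosen for illustration; they are modeled loosely on osquery's tables, which carry more columns, and this code does not talk to osquery at all.

```python
import sqlite3

# Hypothetical sample rows: (pid, name, path) for running processes ...
PROCESSES = [
    (1, "systemd", "/usr/lib/systemd/systemd"),
    (812, "sshd", "/usr/sbin/sshd"),
    (1450, "nginx", "/usr/sbin/nginx"),
    (2210, "bash", "/bin/bash"),
]
# ... and (pid, port, address) for sockets in the listening state.
LISTENING_PORTS = [
    (812, 22, "0.0.0.0"),
    (1450, 80, "0.0.0.0"),
    (1450, 443, "0.0.0.0"),
]

# "Which processes are listening on which ports?" as a plain SQL inner join on pid.
JOIN_SQL = """
    SELECT p.name, l.port, l.address, p.pid
    FROM processes AS p
    JOIN listening_ports AS l ON p.pid = l.pid
    ORDER BY l.port
"""


def listeners(processes, listening_ports):
    """Load both tables into an in-memory database and run the join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT)")
        conn.execute("CREATE TABLE listening_ports (pid INTEGER, port INTEGER, address TEXT)")
        conn.executemany("INSERT INTO processes VALUES (?, ?, ?)", processes)
        conn.executemany("INSERT INTO listening_ports VALUES (?, ?, ?)", listening_ports)
        return conn.execute(JOIN_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, port, address, pid in listeners(PROCESSES, LISTENING_PORTS):
        print(f"{name} (pid {pid}) listening on {address}:{port}")
```

The inner join drops processes with no listening socket (here `systemd` and `bash`), which is exactly what the question "which processes are listening on which ports?" calls for.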
At the moment, osquery does not support the dominant operating system on the desktop and in the datacenter, which is of course Microsoft's Windows platform. RocksDB does not currently compile for Windows machines, but a version of the osquery tools could be created using another data store, such as SQLite, which does work on Windows and offers decent performance. Arpaia says that he is not a big Windows user, but admits that he just bought a Windows laptop so he can see how the Windows APIs work, and adds that when it comes to Windows support, he "would love the community to do it."
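The article does not say how a SQLite-backed version would be built. One plausible shape, sketched here purely as an assumption, is a thin key-value wrapper over a single SQLite table that offers the put, get, delete, and ordered prefix-scan operations a LevelDB-style store provides. The `SQLiteKV` class and its method names are hypothetical and are not part of osquery or RocksDB.

```python
import sqlite3
from typing import Iterator, Optional


class SQLiteKV:
    """Hypothetical minimal key-value store on SQLite, standing in for RocksDB."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        # WITHOUT ROWID stores rows clustered by the primary key (the key itself).
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB) WITHOUT ROWID"
        )

    def put(self, key: str, value: bytes) -> None:
        # Upsert: overwrite any existing value for the key; commit on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key: str) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))

    def scan_prefix(self, prefix: str) -> Iterator[tuple[str, bytes]]:
        # Seek to the first key >= prefix, then walk forward in key order until
        # keys stop matching, the usual pattern for LevelDB-style range iteration.
        cursor = self._conn.execute(
            "SELECT key, value FROM kv WHERE key >= ? ORDER BY key", (prefix,)
        )
        for key, value in cursor:
            if not key.startswith(prefix):
                break
            yield key, value

    def close(self) -> None:
        self._conn.close()


if __name__ == "__main__":
    kv = SQLiteKV()
    kv.put("processes:0000001000", b'[{"pid": 1}]')
    kv.put("processes:0000002000", b'[{"pid": 1}, {"pid": 42}]')
    kv.put("users:0000001000", b"[]")
    print([key for key, _ in kv.scan_prefix("processes:")])
    kv.close()
```

Because SQLite compares text keys byte by byte by default, all keys sharing a prefix sit next to each other in key order, so the scan can stop at the first non-matching key instead of reading the whole table.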