Lustre File System Moving From Lab To Big Data Apps
Proponents of the Lustre file system maintain it is poised to make the jump from HPC installations into the enterprise, driven by the surging amount of data that companies are pooling.
If that prediction pans out, a parallel file system backed by an industry consortium could emerge as a standard across multiple platforms, an industry-sponsored study concludes.
Lustre boosters are touting the open source file system for big data as growth levels off in the HPC sector, where it is widely used by government and academic researchers. Big data presents an opportunity for Lustre to crack the commercial market as the need for more scalable I/O performance grows.
The first annual report on the Lustre market from OpenSFS, the consortium that promotes the open source file system and its user community, asserts: "The opportunity still exists for Lustre to gain substantial adoption in commercial markets, particularly considering the dynamics surrounding big data-influenced deployments, which require I/O performance at scale beyond what organizations may have previously encountered."
The report cites projections that the software storage market could reach $950 million by 2017. Because storage is increasingly sold as "pass-through" with servers, the report notes that vendors must renew their focus on enterprise server and component makers.
Moreover, the report emphasizes that Lustre sits at the intersection of high-growth areas such as storage, software, and services like cloud computing.
"Just as we have seen standards emerge for components such as higher performance interconnects (InfiniBand) and accelerators (GPUs), Lustre could become a standard for high-performance I/O systems, carrying it through HPC and beyond into big data and scalable enterprise IT markets," the report concludes.
The report's authors said they defined "big data" not as an application but as a "set of trends" affecting different end users and applications. Those trends have also fueled the creation of data and better access to it. The ability to improve data management is therefore seen as a key to technology innovation and economic growth.
Among the big data challenges that Lustre could presumably address, a user survey identified managing very large files (33 percent of respondents) and managing large numbers of files (49 percent). The proliferation of files is being driven by the growing amounts of digitized data, higher data fidelity, greater access to data, and extended data storage.
Enter the parallel file system, the Lustre report asserts, which could help commercial users overcome file access traffic jams as I/O servers retrieving and delivering files become choke points. "Parallel file systems improve on the distributed file system model by allowing every cluster node to act as an I/O server with equal access to data, thereby doling the traffic out across multiple channels and alleviating the bottleneck."
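To make the idea of spreading traffic across multiple channels concrete, here is a minimal Python sketch of round-robin striping, in which a file is cut into fixed-size chunks and dealt out across several I/O servers. It is an illustration only, not Lustre's implementation: the function names, the in-memory "servers," and the stripe size are all hypothetical.

```python
from typing import List


def stripe(data: bytes, num_servers: int, stripe_size: int) -> List[List[bytes]]:
    """Deal fixed-size chunks of `data` out to servers in round-robin order.

    Hypothetical helper for illustration; each "server" is just a list.
    """
    servers: List[List[bytes]] = [[] for _ in range(num_servers)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        # Chunk k lands on server k mod N, so no single server holds the whole file.
        servers[chunk_index % num_servers].append(data[offset:offset + stripe_size])
    return servers


def reassemble(servers: List[List[bytes]]) -> bytes:
    """Rebuild the original bytes by reading chunks back in round-robin order."""
    num_servers = len(servers)
    total_chunks = sum(len(chunks) for chunks in servers)
    parts = []
    for k in range(total_chunks):
        # Chunk k is the (k // N)-th chunk stored on server k mod N.
        parts.append(servers[k % num_servers][k // num_servers])
    return b"".join(parts)


if __name__ == "__main__":
    payload = bytes(range(10))
    layout = stripe(payload, num_servers=3, stripe_size=2)
    for i, chunks in enumerate(layout):
        print(f"server {i}: {chunks}")
    print(reassemble(layout) == payload)
```

In a real cluster each server could serve its chunks concurrently, which is where the bandwidth gain would come from; this sketch shows only the data layout.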
The report also found that survey respondents considered I/O bandwidth (55 percent), I/O latency (52 percent), and I/O throughput (51 percent) to be the key challenges posed by big data. Indeed, overall I/O performance presented the largest "satisfaction gap" (90 percent) among users surveyed.
Whether open source systems like Lustre are ready to overcome big data challenges remains to be seen. The Lustre consortium's report notes that parallel file systems have so far failed to catch on because most are proprietary. It acknowledges, however, that IBM's General Parallel File System has made some inroads in the commercial market.
The advent of big data promises to broaden the commercial market for open source file systems such as Lustre, the report concludes.
The inaugural Lustre annual report was commissioned by OpenSFS from Intersect360 Research. The number of end users responding to the survey ranged from 245 to 269, the research firm said.