Fujitsu Builds Big Data Processor on Ceph
Addressing the growing need for faster processing in big data storage systems, including those that feed data analytics, Japan’s Fujitsu Laboratories has developed a technology it says delivers high-speed processing of big data in distributed storage systems.
Tokyo-based Fujitsu Labs said this week its platform, implemented on the Ceph distributed storage framework, addresses the current bottleneck created when processing servers read data from bulging storage systems. The huge volumes flowing between storage and servers are steadily increasing system latency. Further, Fujitsu engineers said, processing data in storage speeds things up because data need not be moved between storage and server.
“Nonetheless, this makes it difficult to analyze unstructured data distributed across the storage system, and to maintain stable operations in the system's original storage functionality,” they noted.
Fujitsu Labs’ approach, dubbed “Dataffinic Computing,” seeks to accelerate big data processing by reducing data movement. The architecture targets distributed storage systems, in which multiple servers are connected via a network, while maintaining original storage functionality, the company said Thursday (Sept. 20).
Much of the focus is on the growing volumes of unstructured video and log data that are quickly filling distributed storage systems. The Dataffinic approach breaks down unstructured data along the connections within the data, storing it in a state where individual pieces can be more readily accessed and crunched.
“This means that the pieces of data scattered across the distributed storage can be processed individually, maintaining the scalability of access performance and improving the system performance as a whole,” Fujitsu claimed.
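The idea of processing scattered pieces where they are stored can be illustrated with a minimal sketch. This is not Fujitsu's implementation: the node names, the sample log lines, and the helper functions `process_on_node` and `aggregate` are all hypothetical, and the "nodes" are simulated with a plain dictionary. Each piece is assumed to be self-contained, so every node can summarize its own data and only the small summaries need to travel.

```python
from collections import Counter

# Hypothetical layout: each storage node holds self-contained pieces of log data.
nodes = {
    "node-a": ["ERROR disk full", "INFO started"],
    "node-b": ["INFO stopped", "ERROR timeout", "ERROR disk full"],
}

def process_on_node(pieces):
    """Stand-in for work done on a storage node: count log lines by severity."""
    return Counter(line.split()[0] for line in pieces)

def aggregate(partials):
    """Merge the small per-node summaries; the raw data never leaves its node."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

partials = [process_on_node(pieces) for pieces in nodes.values()]
totals = aggregate(partials)
```

Here only the per-node counters are combined, which is the general pattern the quote describes: work scales with the number of nodes while the traffic between them stays small.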
The system also predicts the storage resources needed for data maintenance tasks, such as automatic recovery processing after an error. It is designed to monitor data processing requirements and allocate resources in a timely manner. The result is high-speed processing without sacrificing storage functionality.
Fujitsu Labs said it implemented Dataffinic on Ceph, the open source object storage framework implemented in a distributed computing cluster. The prototype consisted of five storage nodes and five servers linked via a 1-Gbps network. Data processing performance was measured by extracting objects from 50 gigabytes of video data.
Conventional processing took about 500 seconds. The Fujitsu engineers claim their approach, in which data processing was completed on storage nodes, thereby eliminating the need to “bring the data together,” yielded a ten-fold increase in processing performance, cutting the time to 50 seconds.
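As a rough sanity check on these figures, the sketch below recomputes the speedup and estimates how long it would take just to move 50 gigabytes over a 1-Gbps link. The transfer estimate rests on assumptions the article does not state: a single shared link running at full line rate, and decimal gigabytes. The function name `transfer_seconds` is illustrative.

```python
# Figures reported in the article.
DATA_GB = 50           # video data processed, in gigabytes
CONVENTIONAL_S = 500   # conventional processing time, in seconds
IN_STORAGE_S = 50      # processing time on the storage nodes, in seconds

# Assumption (not from the article): all data crosses one shared 1-Gbps link
# at full line rate, and 1 gigabyte = 8 gigabits.
LINK_GBPS = 1

def transfer_seconds(data_gb, link_gbps):
    """Time to push data_gb gigabytes through a link of link_gbps gigabits/s."""
    return data_gb * 8 / link_gbps

speedup = CONVENTIONAL_S / IN_STORAGE_S
transfer_s = transfer_seconds(DATA_GB, LINK_GBPS)
```

Under those assumptions the transfer alone would take about 400 seconds, which is consistent with data movement accounting for most of the 500-second conventional run. The real topology of the prototype may differ.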
“This technology enables scalable and efficient processing of explosively increasing amounts of data,” they asserted.
Next steps include verifying the big data processing platform for commercial applications. Fujitsu Ltd. (OTCMKTS: FJTSY) currently plans to release a product based on the architecture in 2019.