IBM Launches Mainframe Platform for Spark
IBM is expanding its embrace of Apache Spark with the release of a mainframe platform that would allow the emerging open-source analytics framework to run natively on the company's mainframe operating system.
IBM (NYSE: IBM) said Tuesday (March 29) its z/OS platform for Apache Spark aims to make it easier to access and analyze data "in-place" via the company's z Systems mainframe. The combination is part of an overall trend toward accessing larger datasets and analyzing data in real time.
The in-place data analysis capability means data scientists using Apache Spark could sift through data "on the system of origin" without first running it through extract, transform and load (ETL), a step IBM says breaks the link between an analytics library and the underlying file system.
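As a rough illustration of why an extracted copy can drift from its source, the following Python sketch uses the standard library's sqlite3 module as a stand-in for the system of origin. It is an analogy only: it does not use Spark or z/OS, and the `payments` table and helper names are hypothetical.

```python
import sqlite3

# In-memory database standing in for the "system of origin" (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO payments (amount) VALUES (?)", [(100.0,), (250.0,)])


def extract_copy(conn):
    """ETL-style step: copy the rows out into a separate analytics store."""
    return [row[0] for row in conn.execute("SELECT amount FROM payments")]


def total_in_place(conn):
    """In-place step: run the aggregate where the data lives."""
    return conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]


# The analytics copy is taken once, at extraction time.
snapshot = extract_copy(source)

# A new transaction arrives on the system of origin after the extraction.
source.execute("INSERT INTO payments (amount) VALUES (?)", (75.0,))

stale_total = sum(snapshot)          # computed from the extracted copy
live_total = total_in_place(source)  # computed against the current data

print(stale_total, live_total)
```

The extracted copy no longer matches the source once a new row lands, while the in-place query reflects it immediately.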
The company said its platform comes with Spark open library capabilities that include Spark SQL, Spark Streaming, Machine Learning Library (MLlib) and GraphX, the Apache Spark API for graphs that includes a built-in library of common algorithms. These components along with the Apache Spark core are combined with what IBM claims is the first "mainframe-resident" Spark data abstraction approach.
The IBM platform also seeks to take advantage of Spark's in-memory approach to processing data. To that end, the z Systems platform includes data abstraction and integration services so that z/OS analytics applications can use standard Spark APIs. IBM says that approach eliminates the processing and security issues associated with ETL while allowing organizations to analyze data in-place.
IBM asserted that the ability to run Apache Spark on its z Systems and other platforms would allow users to "perform analytics alongside the transactional systems that house key data, while drawing contextual insights from other data sources."
Meanwhile, data abstraction services are intended to simplify access to traditional enterprise data formats using familiar tools via Apache Spark APIs.
Platform vendors are increasingly focusing on helping developers build and test applications so they can be deployed faster. With that in mind, IBM said its Spark analytics platform would allow data scientists to use existing programming languages such as Python, R, Scala and SQL to streamline development.
The Spark platform also lets analysts who gather data, especially unstructured data, from a growing list of sources use their preferred formats and tools to collect and sift through it.
IBM added that the real-time analytics platform tailored to Spark also includes accelerators supplied by the company's z Systems partners. New partner DataFactZ helped IBM develop analytics applications based on Spark SQL and Machine Learning Library for data and transactions processed on the z Systems mainframe.
The company said it extended its collaboration with Rocket Software to the z/OS Apache Spark platform via new tools like a "launch pad" that allows users to test-drive the platform using data on z/OS. The IBM partner recently announced a data virtualization tool for IBM z Systems mainframe customers.
Finally, IBM said partner Zementis would complement its in-transaction predictive analytics offering for z/OS with an execution engine for Apache Spark. The standard approach would allow users to deploy predictive models used to compute risk or detect fraud while processing transactions.
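To make the idea of scoring a model while a transaction is being processed more concrete, here is a minimal Python sketch. It is not the Zementis engine or any IBM API: the logistic model, its coefficients, the `Transaction` fields and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients for illustration; not taken from any real model.
WEIGHTS = {"amount_usd": 0.002, "foreign": 1.5, "night": 0.8}
BIAS = -4.0
THRESHOLD = 0.5  # assumed cut-off above which a transaction is held


@dataclass
class Transaction:
    amount_usd: float
    foreign: bool  # card used outside the home country
    night: bool    # transaction made during night hours


def fraud_probability(tx: Transaction) -> float:
    """Score one transaction with a simple logistic model."""
    z = (
        BIAS
        + WEIGHTS["amount_usd"] * tx.amount_usd
        + WEIGHTS["foreign"] * tx.foreign
        + WEIGHTS["night"] * tx.night
    )
    return 1.0 / (1.0 + math.exp(-z))


def process(tx: Transaction) -> str:
    """Decide inside the transaction path, before the payment completes."""
    return "hold" if fraud_probability(tx) >= THRESHOLD else "approve"


print(process(Transaction(25.0, False, False)))
print(process(Transaction(2000.0, True, True)))
```

The point of the sketch is the placement of the scoring call: the decision is made as part of handling the transaction rather than in a later batch job.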
The z/OS platform for Apache Spark is available now, the company said.
The Spark mainframe platform reflects IBM's growing commitment to the emerging analytics framework. Last year, it announced plans to deploy more than 3,500 developers to advanced Apache Spark projects. The company also said this week it would add a new GitHub group to spur development of new tools around Spark on z/OS.