15X Spark Speed-ups Coming to the Open Source Community, Say HPE and Hortonworks
Hortonworks and HPE this week announced a collaboration to enable enterprise-wide Apache Spark implementation, an effort based on work done to the Spark kernel by HPE Labs that the companies say could improve shuffle engine performance by 15x and memory efficiency by 50 percent. The collaboration will center on big data analytics workloads that benefit from large pools of shared memory.
The new technologies will be contributed to the Apache Spark Open Source community. The project draws on HPE Labs’ technical work, while Hortonworks’ primary role will be to shepherd the technologies through Apache Software Foundation approvals, along with integration testing.
According to Martin Fink, HPE EVP and CTO, software engineers at HPE Labs re-wrote several algorithms in the Spark shuffle engine as part of work done with HPE customers in the financial services industry. “We saw we’d made much more efficient use of memory and found ways to scale memory even more. We said, ‘We’ve got this body of code that is really different and delivers incredible performance enhancements.’ You pay attention when that stuff happens.”
Fink said the decision to contribute the work done on the Spark kernel to the Open Source community gets at the key value of Open Source.
“A lot of people look at Open Source primarily as a way to get visibility into the code,” he said. “Some companies are consumers of Open Source and then they build a whole bunch of proprietary stuff on top. Another class of companies locks down Open Source and treats it like it’s proprietary and actively prevents collaboration from occurring. But the real importance is to look at Open Source in a collaborative way. So part of the project with Hortonworks is we saw this as a unique combination of 100 percent open, and collaborative and enterprise-scale.”
Scott Gnau, Hortonworks’ CTO, said Hortonworks will continue its focus on the integration of Spark into broad data architectures supported by Apache YARN, as well as performance upgrades and better access points for applications like Apache Zeppelin. He told EnterpriseTech the project with HPE is “compelling because some of these analytics can be very sophisticated algorithms that people are trying to interact with and search for relationships and search for value in a nearly interactive state, at the speed of human thought, so it’s really important because it helps the entire process flow.”
Gnau said all Hortonworks applications are based on Open Community software, while the company generates revenue from maintenance and customer support services, along with packaging and other Open Source community support services.
“We look out across all the different Apache projects and bundle them up, and do integration testing,” Gnau said. “Each Open Source project is responsible to itself alone, so we take the combination of all of them and create a platform and do integration testing across multiple projects so we can understand the interaction of different projects and make sure the software is stable and dependable.”