Advanced Computing in the Age of AI | Tuesday, March 19, 2024

Cask Aims to Plug Hadoop Problems with CDAP 3.0 

Rather than invent new applications for existing processes, internal Hadoop developers typically try to use pre-existing programs, but Apache Hadoop's complexities may prevent these professionals from helping their enterprises rapidly reap big data opportunities.

This challenge is growing as the number of pre-existing applications and the volume of data grow while the availability of well-qualified Hadoop developers drops. Instead of quickly meeting business goals, Hadoop-driven initiatives can take many months, said Jonathan Gray, founder and CEO of Cask Data, in an interview. In a 2014 Gartner poll, 50 percent of respondents cited Hadoop's "undefined value proposition" as the biggest barrier to Hadoop adoption, while 21 percent said it was the cost of acquiring skills and 19 percent pointed to problems related to integration with other infrastructure.

Looking to resolve these issues by, in part, empowering Java developers to participate in big data programming, Cask today released Cask Data Application Platform (CDAP) v3. In development for more than three years, the open source application development platform includes a new role-based user interface that allows developers to work with data without having to write code, plus a new interactive shell, said Gray.

Cask focuses on building an integrated framework atop Hadoop so developers can work faster and smarter, the company said. In other words, CDAP is the "WebLogic of big data," said Gray. The software runs on top of Hadoop platforms including Cloudera Enterprise Data Hub and Hortonworks Data Platform. It abstracts data residing in the Hadoop environment through logical representations of the underlying data, and it makes applications portable by decoupling them from the underlying infrastructure. Cask's services and tools allow developers to create applications faster, integrate components of the Hadoop ecosystem in one platform, and retain more operational control in production by following enterprise best practices, Gray said.
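To make the idea of a logical data representation concrete, here is a minimal sketch of application logic written against an abstract dataset rather than a specific storage system. It is illustrative only and does not use CDAP's actual API: the names `KeyValueDataset`, `InMemoryDataset`, and `count_events` are hypothetical, and Python is used purely for brevity even though CDAP targets Java developers.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class KeyValueDataset(ABC):
    """Hypothetical logical dataset: the only surface application code sees."""

    @abstractmethod
    def put(self, key: str, value: int) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[int]:
        ...


class InMemoryDataset(KeyValueDataset):
    """Stand-in backend (assumption). Another class could implement the
    same interface on top of a different storage system."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._rows.get(key)


def count_events(users: Iterable[str], dataset: KeyValueDataset) -> None:
    """Application logic: tally events per user without knowing the backend."""
    for user in users:
        dataset.put(user, (dataset.get(user) or 0) + 1)


store = InMemoryDataset()
count_events(["alice", "bob", "alice"], store)
print(store.get("alice"))  # prints 2
```

Because `count_events` depends only on the abstract interface, swapping the in-memory backend for another implementation would not require changing the application logic, which is the kind of portability the decoupling is meant to provide.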

Without this framework, developers who download Hadoop to work on a particular system receive a lot of different pieces that ultimately will make up the big data solution, he said. Many of these applications require different areas of coding or language expertise – an expensive and time-consuming proposition for internal IT departments, which generally must either recruit this specific expertise or cede some control and source it from an external partner, Gray said.

"If you're a developer and you're trying to build a recommendation system – a simple ingestion pipeline – that could take eight projects and six months," he added. "CDAP is an open source integrated platform for developers and organizations to build, deploy, and manage data applications. CDAP adds an abstraction layer on top which hides the complexity of the underlying API. When you get CDAP you don't have to understand how to design a schema on a NoSQL database. You can just use OLAP."

Version 3.0 includes a new role-based user interface with the ability to create user-defined dashboards; code-free data ingestion, exploration, and transformation from the user interface and shell; pre-built support for real-time and batch ETL (extract, transform, load) pipelines; reusable application templates and plug-ins for faster development; an OLAP Cube dataset; enhanced metrics and workflow support; and support for multi-tenancy, according to Cask. It exposes APIs so developers can create applications and access CDAP services. CDAP itself uses an array of services that integrate with Hadoop infrastructure components including HBase, HDFS, YARN, MapReduce, Hive, and Spark, according to Cask.

It integrates with Cloudera Enterprise Data Hub; the business intelligence tools SquirrelSQL and Pentaho Data Integration; and Cask's CDAP JDBC Driver.

"[CDAP] provides new capabilities out of the box around data ingestion, exploration, and transformation," Gray said. "We built a UI and a shell. For more advanced users and organizations, all these code-free things are extensible. It's all very extensible so people can take what they already have and extend it."

As a result, developers see an 80 percent reduction in both the time it takes to develop a Hadoop application and the number of lines of code required, according to Gray. Benefits could be greater if it takes an organization longer to find developers with specific expertise, he said.

"What we have is actually familiar to a traditional Java developer," said Gray. "They'll be able to onboard onto it in a couple of days. They get to focus on application logic. These guys are application developers in a line of business or in a product team. They're trying to build a new product or service for their company. Instead, they've been spending a lot of time learning about infrastructure and gluing together infrastructure."

Because it is open source, CDAP is free to developers, Chief Operating Officer Boyd Davis told Enterprise Technology. The company makes money through its network of developer partners such as Salesforce.com, which includes CDAP as an offering to its customers, said Davis, who formerly managed Intel's Datacenter Software division. Cask also has a strategic partnership with Cloudera, which invested in Cask in February and plans on publishing a joint roadmap.

"We have this mindset we're going to save you a couple of developers," said Davis. "When you start to think about saving people the cost of a Hadoop developer's time, that's an enormous amount of savings."

 

 

About the author: Alison Diana

Managing editor of Enterprise Technology. I've been covering tech and business for many years, for publications such as InformationWeek, Baseline Magazine, and Florida Today. A native Brit and longtime Yankees fan, I live with my husband, daughter, and two cats on the Space Coast in Florida.

EnterpriseAI