ADP, Dow Jones Help Mainstream Node.js
The one thing that all enterprises are always on the hunt for is a hardware or software technology that can significantly speed up an existing process or, in some cases, make a new process possible that was not feasible in the past. By definition, EnterpriseTech is very keen on such technologies and it looks like yet another software technology that was spawned in the hyperscale world, called Node.js, is getting set to go mainstream.
Two such customers made presentations at the Node on the Road event in New York. The first was Automatic Data Processing, the $11.3 billion company that does payroll and benefits processing for more than 620,000 organizations in 125 countries. The company manages one in five payrolls in the United States and one in ten worldwide. About three years ago, to get closer to the hottest programming talent in the New York region (ADP is located in Roseland, New Jersey, not exactly where hip young programmers want to live), the payroll processor set up ADP Innovation Labs in the Chelsea section of the Big Apple, not too far from Google’s digs, and now has 85 software engineers there plugging away on new technologies.
Fast forward a few years and the new application that ADP Innovation Labs is working on is called Semantic Search. “What we wanted to do is create a search engine that spanned all of the objects at ADP,” explained Masiero. “We have a lot of objects at ADP. Human capital management and payroll is a complicated thing, and we do benefits and retirement, and so on. We needed a search engine that would go across all of those objects. And we wanted to do even more, such as instead of just looking at nouns, we do verbs, too.”
So, for instance, end users can search for people and open positions at their firms and hire them by simply typing into the Semantic Search engine:
“We needed to be very fast. We wanted the same thing that Google gives you, millisecond response – very, very fast. When the user issues a query predicate, we needed to break this into two queries, one against metadata and one against the index data itself. We need to parallelize these things like crazy because we decided to use Instant [the instant search add-on for the Apache Solr search engine], which means that every character that you put on the predicate is a new query firing against the server. We said, I don’t think PHP can do that, and so we decided to test this Node thing.”
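The split Masiero describes, one query against metadata and one against the index data, fired together and joined when both return, can be sketched as follows. This is a minimal illustration and not ADP's code: `lookup`, `searchMetadata`, `searchIndex`, and `semanticSearch` are hypothetical names, and the two back ends are stubbed with in-memory arrays and a short timer instead of real MongoDB and Solr calls.

```typescript
// Hypothetical stand-ins for the two back ends. A real service would query
// a metadata store and a search index over the network.
type Hit = { source: "metadata" | "index"; value: string };

const METADATA: string[] = ["hire", "fire", "promote"];
const INDEX: string[] = ["John Smith", "Joan Jett", "Open position: engineer"];

// Case-insensitive substring match, resolved after a short simulated delay.
function lookup(source: Hit["source"], items: string[], predicate: string): Promise<Hit[]> {
  const needle = predicate.toLowerCase();
  const hits = items
    .filter((item) => item.toLowerCase().indexOf(needle) !== -1)
    .map((value): Hit => ({ source, value }));
  return new Promise<Hit[]>((resolve) => setTimeout(() => resolve(hits), 10));
}

function searchMetadata(predicate: string): Promise<Hit[]> {
  return lookup("metadata", METADATA, predicate);
}

function searchIndex(predicate: string): Promise<Hit[]> {
  return lookup("index", INDEX, predicate);
}

// Both queries are started before either is awaited, so the total latency is
// roughly that of the slower query rather than the sum of the two.
async function semanticSearch(predicate: string): Promise<Hit[]> {
  const [meta, index] = await Promise.all([
    searchMetadata(predicate),
    searchIndex(predicate),
  ]);
  return meta.concat(index);
}
```

With instant search, a caller would invoke `semanticSearch` once per keystroke (`"j"`, then `"jo"`, and so on), which is why the non-blocking fan-out matters: many such requests can be in flight at once without one holding up another.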
That was about two and a half years ago. “We wrote it in Node and it was awesome. The thing just screams. It is very easy even though you are doing very complicated manipulation of the predicate.”
The stack for the Semantic Search starts with Linux, of course, and the Nginx high-performance Web server that is used by a slew of Web properties, including Netflix, Hulu, Zappos, Pinterest, Airbnb, and Zynga. The search engine is Apache Solr and metadata describing the objects in the ADP application portfolio is stored in the MongoDB NoSQL data store. Node.js is what glues it all together.
“When we demoed this on the tablet we added voice commands and it’s cool. You can literally walk the hallways and say, ‘Fire John!’”
Masiero said that when ADP, which had built its million-user smartphone application in PHP, started to think about creating one for tablets, sticking with PHP just “didn’t sound right.” The reason is that ADP wanted to code its application dashboards for end users as separate little tiles, each one updated in parallel using the asynchronous back-end of Node.js. This is how modern Web pages work, with elements loading separately rather than all at once. The application is aware of your location and the dashboard changes based on where you are. For instance, it will show you your retirement benefits when it knows you are at home but your daily planner schedule when it knows you are in the office. (The app is able to use proximity sensors installed at employers to see people check in and check out of work automatically.)
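To make the tile idea concrete, here is a minimal sketch of a dashboard whose tiles load independently, so that a slow or failing tile does not hold up the rest. The names `TileLoader`, `loadTile`, and `loadDashboard` are hypothetical and chosen for illustration; nothing here is taken from ADP's application.

```typescript
// Each tile has a name and a loader that fetches its content asynchronously.
type TileLoader = () => Promise<string>;
type TileResult = { name: string; ok: boolean; body: string };

// Wrap a loader so that a failure is captured as a result instead of
// rejecting, which keeps one broken tile from failing the whole dashboard.
function loadTile(name: string, loader: TileLoader): Promise<TileResult> {
  return loader().then(
    (body): TileResult => ({ name, ok: true, body }),
    (err: unknown): TileResult => ({ name, ok: false, body: String(err) })
  );
}

// Start every tile at once and resolve when all of them have settled.
// Results come back in the same order as the keys of `tiles`.
function loadDashboard(tiles: { [name: string]: TileLoader }): Promise<TileResult[]> {
  const names = Object.keys(tiles);
  return Promise.all(names.map((name) => loadTile(name, tiles[name])));
}
```

A caller might pass `{ planner: ..., retirement: ... }` and render each result as it sees fit, showing an error placeholder for any tile whose `ok` flag is false.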
Now here’s the interesting bit. All of this Node.js front end is running on six servers, three apiece in each of ADP’s two datacenters. That is how little iron it takes to grab data from all of those ADP applications and rip it out to those users. This is important. ADP is facing the same issue that most enterprises are wrestling with: What happens when you expose applications to users who have smartphones and tablets and who can flood into the system at any time? There are no predictable patterns to user access anymore – everyone is always working – and that means your systems have to be architected to run as efficiently as possible and be able to handle large peaks. ADP was one of the early adopters of IBM mainframes and still has these systems at its core today, but they are cloaked in layers of systems and machines to make the apps snappy and modern. Node.js is a key part of that, and specifically, it creates what Masiero calls an API Multi Proxy, or APIMP for short. This sits behind some BIG-IP firewalls from F5 Networks and in front of all of the ADP applications.
As is common with users of open source technology, ADP is giving back to the community. It has created a tool called PigeonKeeper, an example of what is called a directed acyclic graph engine. The problem this solves is easy enough to say, but hard to do: How do you know when all of the tiles in a disaggregated application coded in Node.js are updated? (In other words, how do you know when all of the pigeons have come home to roost?) PigeonKeeper doesn’t just watch these processes, but actually orchestrates them based on the dependencies between the tiles in the application. ADP has also created an API for querying JSON documents, called JQL, which is based on a prior tool called JSONPath and which allows items stored in a JSON document to be updated and deleted. This is important to ADP because the MongoDB data store uses the JSON format for storing data. The company is in the process of working through its legal department to open source PigeonKeeper and JQL; Masiero has no idea how long that will take.
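For readers unfamiliar with the term, the general idea behind a directed acyclic graph engine can be sketched in a few dozen lines. The code below is not PigeonKeeper and does not reflect its API, which had not been published at the time of writing; `Task`, `Graph`, and `runGraph` are hypothetical names. Each task starts as soon as the tasks it depends on have finished, and the returned promise resolves only when every task is done.

```typescript
// A task lists the tasks it depends on and receives their results as input.
type Task = {
  deps: string[];
  run: (inputs: { [dep: string]: string }) => Promise<string>;
};
type Graph = { [name: string]: Task };

function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Resolves with every task's result once all tasks have completed.
// Rejects if a dependency is unknown or the graph contains a cycle.
function runGraph(graph: Graph): Promise<{ [name: string]: string }> {
  const started: { [name: string]: Promise<string> } = {};

  function start(name: string, path: string[]): Promise<string> {
    // Seeing a task again on the current dependency path means a cycle.
    if (path.indexOf(name) !== -1) {
      return Promise.reject(new Error("cycle: " + path.concat(name).join(" -> ")));
    }
    // Shared dependencies are started once and their promise is reused.
    if (has(started, name)) return started[name];
    if (!has(graph, name)) return Promise.reject(new Error("unknown task: " + name));

    const task = graph[name];
    const pending = task.deps.map((dep) => start(dep, path.concat(name)));
    const promise = Promise.all(pending).then((values) => {
      const inputs: { [dep: string]: string } = {};
      task.deps.forEach((dep, i) => { inputs[dep] = values[i]; });
      return task.run(inputs);
    });
    started[name] = promise;
    return promise;
  }

  const names = Object.keys(graph);
  return Promise.all(names.map((name) => start(name, []))).then((values) => {
    const results: { [name: string]: string } = {};
    names.forEach((name, i) => { results[name] = values[i]; });
    return results;
  });
}
```

Tasks with no dependency on one another run concurrently, while a task that needs another's output waits only for that output, which is the orchestration property the paragraph above describes.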
“We have basically made it the gold standard,” says Rahner of Node.js. “If you are going to build a Web app at Dow Jones, it is probably going to be in Node and you have to have a good excuse why it needs to be in another language. We have also created a core team whose job it is to evangelize about Node and Tesla.”
Node.js may have got its start on the Google Chrome browser but it is picking up back in the stack, as the Semantic Search service at ADP demonstrates. Node.js is also a key component of Joyent’s own Manta object storage service, which is based on the ZFS file system originally created by Sun and includes integrated data analytics and compute. Other applications at Groupon, Wal-Mart, LinkedIn, and PayPal have already been created using Node.js and no doubt plenty more will follow wherever non-blocking I/O and streaming are the key attributes of the application.
Node.js needs to mature a bit and see more widespread installation for enterprises to get truly comfortable with it, much as has been the case with all open source tools and indeed any new technology. The V8 engine was created for X86 machines, and Google is already working to port it to ARM processors. Now that Google and IBM are working together in the OpenPower Foundation, it is very likely that the V8 engine will be ported to the Power architecture at some point, too.
In the meantime, the Node.js community is working to get its 0.12 release out the door and towards the 1.0 release that will have all of the bells and whistles necessary for enterprise-grade applications. Fontaine did not provide a timetable for when either Node.js 0.12 or 1.0 would be available. The good bit, said Fontaine, is that to be compatible with Microsoft’s IIS web server, Node.js had to disable heartbeat functions, and therefore, it was never susceptible to the Heartbleed security breach that is causing all kinds of grief out there on the Internet. The 0.12 release sports improvements in the transport layer security (TLS) and cryptography module, with PayPal showing client connections running 50 percent faster in early tests, as well as support for dynamic tracing to debug Node.js programs. The update also includes Streams3, another revamp of the streaming I/O capability in the Node stack. Longer term, with the 1.0 release, Node.js will include a C API that will allow programmers coding in C or C++ to target the Node.js backend.