The NoSQL Heart of Telco Messaging
You probably haven’t heard of Openwave Messaging. But if you get email, voice mail, or other messaging services from a tier-one telecommunications firm, chances are fairly good that you have used its white-label messaging products at some point. And as a result of Openwave’s recent standardization on the Apache Cassandra database from DataStax, you are likely accessing a NoSQL database, too.
Openwave Messaging doesn’t call a lot of attention to itself. It is contractually obligated not to talk about the large telecommunications firms that it supplies messaging software to. Suffice it to say, about half of the tier-one telcos in the world use its messaging software, or are in the process of becoming customers, according to Darshan Rawal, vice president of engineering at the Northern California company.
The messaging industry is undergoing a transformation, driven by the explosion of smartphones and the appetite they have spurred among consumers for sharing photos, videos, music, and other files across their various devices. When little Lucy Ann takes a picture of her cat, she may post it on Instagram, text it to her friends, and then email it to herself so she can print it from her desktop computer.
All these file movements consume resources in the messaging infrastructure of Lucy Ann’s mobile phone provider. Several years ago, the provider may have outsourced the ownership and operation of that messaging layer to a third party, on the assumption that it was a cost center that added nothing of value to the business. In today’s environment, the customer relationship that messaging creates is a valuable commodity, and that is prompting telcos to bring their messaging systems back in house in a bid to “own” it.
The problem is, those cat pictures are really piling up, and that is putting stress on the underlying architectures of the messaging systems. Openwave, whose messaging architecture has always used a mix of database technologies, including Oracle, MySQL, and various object data stores, decided it needed to rethink that architecture to meet the new scalability requirements without hurting response times.
Previously, Openwave’s enterprise offerings were based on a two-tier system. The upper layer was built on a lightweight data store that housed structured metadata describing the messages (such as header information, portions of messages, and counters), while the lower layer was built on an object data store geared toward storing the unstructured data and the binary large objects (BLOBs) that contained the message payload, i.e., pictures of Lucy Ann’s cat.
Openwave built this two-tier system and exposed it via APIs to customers, who would then offer it to end consumers. Customers often had their own existing BLOB storage, such as a multi-petabyte SAN array, so Openwave had to be flexible enough to work with anything. About four years ago, it started using Apache Cassandra to serve the metadata component of messages.
As its customers’ messaging demands grew, Openwave devised a new “stateless” architecture that could expand as needed without downtime. About 18 months ago, Openwave rolled out a two-tier architecture that used Apache Cassandra in a redundant ring architecture on standard x86 servers and superfast solid-state drives, which ensured that consumers had very fast access to their inboxes. Each node in a typical Openwave deployment had about 24 to 30 TB of metadata stored in NoSQL on SSDs and about 180 TB of BLOB data stored on traditional rotational disks.
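The “redundant ring” refers to the way Cassandra spreads data around a ring of nodes and keeps several copies of each piece, which is what lets a cluster lose a node without taking the service down. The sketch below is a simplified model of that idea, assuming MD5 hashing, no virtual nodes, and copies placed on the next nodes clockwise; it is not Cassandra’s implementation, and the class name and mailbox key are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring with simple N-way replication.

    Illustrative only: Cassandra itself uses the Murmur3 partitioner,
    virtual nodes, and configurable replication strategies.
    """

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replication_factor = replication_factor
        # Give each node a position (token) on the ring by hashing its name.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # MD5 is used only as a stable, evenly spread hash, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def replicas(self, partition_key):
        """Nodes holding copies of the data for this partition key."""
        # First node whose token is >= the key's token holds the primary copy.
        start = bisect.bisect_left(self._tokens, self._token(partition_key))
        # Further copies go to the next nodes clockwise, wrapping at the end.
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("mailbox:lucy-ann"))
```

With a replication factor of 3, any single node can fail and every mailbox still has two live copies on other nodes.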
The problem was, those SSDs were very expensive, and they could only store so much data. “Everything was in the BLOB, and if you wanted to do any interesting things, you had to go and touch the BLOB,” Rawal tells EnterpriseTech.
So Rawal and his colleagues went back to the drawing board and devised a third layer for the architecture. The new three-tier architecture still uses NoSQL on SSDs for the hottest metadata and an object-based storage system (EMC or Amazon S3, for example) for the coldest BLOBs. But in between them, Openwave added a second metadata layer that runs NoSQL on rotational media.
Using this new three-tier architecture, the density of metadata per node went from 300 GB to 20 TB, while the traffic (measured in IOPS) dropped by 50 percent. “We don’t touch the BLOB as often. Instead we touch that middle-tier quite often,” Rawal says. “So the cost drops dramatically for the entire solution. We have achieved a constant user experience, decoupled from the storage size, while actually reducing the cost of the entire solution.”
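Why the middle tier cuts BLOB traffic can be seen in a toy model of the read path. This sketch assumes the hot tier is a small least-recently-used cache of metadata, the middle tier holds metadata for every message, and only opening a message fetches its payload; the class and method names are hypothetical, and Python dicts stand in for Cassandra and the object store.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMessageStore:
    """Toy model of a three-tier message store (all names hypothetical).

    hot:  small, fast metadata tier (stands in for NoSQL on SSDs)
    warm: large metadata tier holding every message's headers
          (stands in for NoSQL on rotational disks)
    blob: object store holding message payloads
    """
    hot_capacity: int
    hot: dict = field(default_factory=dict)
    warm: dict = field(default_factory=dict)
    blob: dict = field(default_factory=dict)
    blob_reads: int = 0

    def put(self, msg_id, headers, payload):
        self.blob[msg_id] = payload
        self.warm[msg_id] = headers
        self._promote(msg_id)

    def _promote(self, msg_id):
        # Re-insert so dict order tracks recency; evict the least recently used.
        self.hot.pop(msg_id, None)
        self.hot[msg_id] = self.warm[msg_id]
        while len(self.hot) > self.hot_capacity:
            self.hot.pop(next(iter(self.hot)))

    def headers(self, msg_id):
        """Inbox listings and searches are served from the metadata tiers."""
        headers = self.hot.get(msg_id)
        if headers is None:
            headers = self.warm[msg_id]  # hot-tier miss falls back to warm tier
        self._promote(msg_id)
        return headers

    def body(self, msg_id):
        """Only opening a message's content reads from the BLOB tier."""
        self.blob_reads += 1
        return self.blob[msg_id]


store = TieredMessageStore(hot_capacity=2)
for i in range(5):
    store.put(f"m{i}", {"from": "lucy", "subject": f"cat {i}"}, b"<jpeg>")
for i in range(5):
    store.headers(f"m{i}")  # listing the inbox reads no BLOBs
store.body("m4")            # opening one message reads one BLOB
print(store.blob_reads, list(store.hot))  # → 1 ['m3', 'm4']
```

Listing five messages never touches the BLOB tier, and the small hot tier only needs to hold recently used metadata because the middle tier can answer everything else.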
In addition to keeping costs down, that middle tier also plays an important role in enabling Openwave’s customers to extract useful information about their own customers’ activities. Because of tight regulations, telecommunications firms are not allowed to access the content of emails and other messages. They are not allowed to touch those BLOBs. (Internet-based firms such as Google and Yahoo, by the way, do not generally face the same restrictions against accessing the content of private messages, but that is another story.)
Instead, Openwave’s customers are figuring out interesting things about their customers by accessing the metadata, such as customers’ names, who the most active users are, whom users send messages to, and when. In effect, this lets Openwave’s customers build social graphs that can be useful for marketing and for reducing churn. “Different customers use it for different purposes, and it also varies by region. But in general, there’s a lot of information in that middle tier,” Rawal says. “There’s a lot of value in the header information.”
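To make the idea concrete, here is a sketch that derives a small social graph and activity rankings purely from header fields (sender, recipient, send time), never reading message bodies. The field names and sample records are invented for illustration; real header schemas, and the rules governing how operators may use them, will differ.

```python
from collections import Counter
from datetime import datetime

# Hypothetical header-only records; no message bodies (BLOBs) are read.
headers = [
    {"from": "lucy", "to": "mia", "sent": "2013-11-20T09:15"},
    {"from": "lucy", "to": "mia", "sent": "2013-11-20T18:40"},
    {"from": "lucy", "to": "sam", "sent": "2013-11-21T12:05"},
    {"from": "mia", "to": "lucy", "sent": "2013-11-21T12:30"},
    {"from": "sam", "to": "ben", "sent": "2013-11-22T07:55"},
]


def build_graph(records):
    """Weighted, directed edges: (sender, recipient) -> message count."""
    return Counter((r["from"], r["to"]) for r in records)


def most_active(records, n=2):
    """Users ranked by number of messages sent (ties keep first-seen order)."""
    return Counter(r["from"] for r in records).most_common(n)


def activity_by_hour(records):
    """When people send messages, bucketed by hour of day."""
    return Counter(datetime.fromisoformat(r["sent"]).hour for r in records)


graph = build_graph(headers)
print(graph[("lucy", "mia")])  # → 2
print(most_active(headers))    # → [('lucy', 3), ('mia', 1)]
```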
Several years ago, it would not have been possible to do all of this – ramp up storage density while keeping costs down and maintaining the valuable customer information – on Cassandra. “We worked with DataStax a lot to make sure that Cassandra matured,” he says. “We watched all the points where it needed to mature to the level where it can be deployed at a tier-one telco. Over the years, both our stateless platform and Cassandra have matured.”
Because Cassandra is a core component of an application that aims for 99.999 percent uptime, Openwave keeps a close eye on it. “We actually look at Cassandra code pretty often,” Rawal says. “Anytime marketing materials comes out, one of my engineers will go and look at the code, and he may say, ‘No I can’t use this feature because of this or that.’”
Cassandra is not good at everything. Counters, he insists, are its Achilles’ heel. “Look at it like your ATM machine,” he says. “If you deduct $20 and it doesn’t deduct 20, you are out of business fast. That’s the level of resilience that our customers demand.”
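Rawal does not spell out the failure mode, but one widely discussed limitation of Cassandra counters is that an increment is not idempotent: if a client times out, it cannot tell whether the update was applied, and retrying may apply it twice. The toy model below is not Cassandra code. It simulates lost acknowledgements (every third call, an arbitrary assumption) and compares a naive counter with a ledger keyed by hypothetical operation IDs.

```python
class FlakyCounterNode:
    """Toy storage node that applies every write but loses some acks.

    Assumption for illustration: every `drop_every`-th call succeeds on the
    node, yet the client sees a timeout, so it cannot tell the write landed.
    """

    def __init__(self, drop_every=3):
        self.drop_every = drop_every
        self.calls = 0
        self.balance = 0   # naive counter: adds every delta it receives
        self.ledger = {}   # idempotent alternative: op_id -> delta

    def _ack(self):
        self.calls += 1
        if self.calls % self.drop_every == 0:
            raise TimeoutError("write applied, but the ack was lost")

    def increment(self, delta):
        self.balance += delta                 # a retry applies the delta again
        self._ack()

    def record(self, op_id, delta):
        self.ledger.setdefault(op_id, delta)  # a retried op_id is a no-op
        self._ack()


def with_retries(call, attempts=5):
    """Retry on timeout, as a client would when it never saw an ack."""
    for _ in range(attempts):
        try:
            return call()
        except TimeoutError:
            continue
    raise RuntimeError(f"no ack after {attempts} attempts")


# Ten $20 withdrawals against each approach.
naive = FlakyCounterNode()
for _ in range(10):
    with_retries(lambda: naive.increment(-20))

keyed = FlakyCounterNode()
for i in range(10):
    with_retries(lambda: keyed.record(f"withdrawal-{i}", -20))

print(naive.balance, sum(keyed.ledger.values()))  # → -280 -200
```

The naive counter deducts $280 for ten $20 withdrawals because four retries were applied twice. Keying each write by an operation ID makes retries harmless, at the cost of storing one entry per operation instead of a single number.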
The work with DataStax and Cassandra is paying off. The first customer went live with Openwave’s new three-tier messaging system in late November, and the company already has nearly ten more tier-one telecommunication companies lined up to implement the solution in 2014. “That’s the biggest barometer, that the tier-one customers are lining up,” Rawal says. “They usually don’t line up quickly. They’re very apprehensive. The amount of traction we’re seeing in the field is quite high, frankly.”