Advanced Computing in the Age of AI | Thursday, March 28, 2024

Big Data at Netflix: Testing, Testing, Testing… 

Netflix is a company riding, you could say driving, the streaming internet video wave. Much of that wave is founded on Open Source big data analytics in the cloud at scale – at extreme scale. Five years ago, when Brian Sullivan, director of streaming analytics at Netflix, joined the company, it was still focused on what now seems a primitive business model: mailing DVDs to a U.S.-only customer base.

While the company still has a mail-order remnant, Netflix has almost completely converted to streaming internet video, with an enormous title catalogue and a successful record as a content creator (“House of Cards,” “Orange Is the New Black,” etc.). In addition, Netflix, the first internet television network, has rapidly expanded to 81 million customers in 190 countries (excepting the most populous one, China: “We’re still trying to work out some of the details” – Sullivan). Driving this growth is a “bias to action” ethos that, as the company has embraced new data technologies, pushes managers like Sullivan and his team to innovate based on continual experimentation and a test-and-adopt discipline.

Speaking at the recent Apache Big Data conference in Vancouver hosted by The Linux Foundation, Sullivan shared the outlines of Netflix’s data analytics-driven methodology, which has as much to do with technology as it does with the company’s “freedom and responsibility” culture, whose aim is to preserve an aggressive, self-directed and loosely coupled entrepreneurialism in a company that’s grown to more than 3,500 employees (and $6.7 billion in revenue). The Netflix ethic is to continually disrupt: supersede the old neighborhood video store / Blockbuster model, then the Redbox retail kiosk model and then, finally, its own DVD-by-mail model.

Brian Sullivan of Netflix

“We truly are a data-driven organization,” Sullivan said. “It’s in our blood. Any time anyone proposes a change to the product, whether it’s to add a feature or to simply improve functionality, we test it. We’ve built a really robust experimentation framework and we use it to analyze any change we make to the product, live and in production, with a subset of our users. And since we’re talking about such a large population of our user base to experiment on, we’re able to do this in a very rigorous fashion with our big data (capabilities). This allows us to iterate quickly through experiments in parallel and talk intelligently about the outcomes.”

Changes are adopted if they meet two criteria: “If a change doesn’t specifically move the needle on a key metric, like retention, we typically don’t move forward with the change.” In addition, changes can only cause minimal dislocation. “Our goal is to improve the product but also keep it as simple as possible. This allows fewer barnacles to develop, which would slow things down for any future innovation.”
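
To make the first criterion concrete, here is a minimal sketch of how a retention experiment might be judged. It is an illustration only: the article does not describe Netflix’s actual statistical method, so the two-proportion z-test, the function names, the 0.05 threshold and the subscriber counts below are all assumptions.

```python
import math


def retention_z_test(control_retained, control_total, test_retained, test_total):
    """Two-sided two-proportion z-test on retention rates.

    Returns (lift, z, p_value), where lift is test rate minus control rate.
    """
    p_control = control_retained / control_total
    p_test = test_retained / test_total
    # Pooled rate under the null hypothesis that the change has no effect.
    pooled = (control_retained + test_retained) / (control_total + test_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / test_total))
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_test - p_control, z, p_value


def should_adopt(control_retained, control_total, test_retained, test_total, alpha=0.05):
    """Adopt only if retention improved and the difference is significant.

    alpha=0.05 is an assumed threshold, not one quoted in the article.
    """
    lift, _, p_value = retention_z_test(
        control_retained, control_total, test_retained, test_total
    )
    return lift > 0 and p_value < alpha


# Hypothetical cells: the same one-point lift, measured on different sample sizes.
print(should_adopt(9000, 10000, 9100, 10000))  # larger cells
print(should_adopt(900, 1000, 910, 1000))      # smaller cells
```

With these made-up numbers, the one-point lift passes the test on 10,000 subscribers per cell but not on 1,000, which is one reason a large user base allows experiments to be run “in a very rigorous fashion.”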

Netflix is committed to Open Source, with many Apache ecosystem products in store. Cassandra is critical to Netflix operational systems, Kafka is the backbone of its data event pipeline, and much of the analytics is done using Hadoop with a mixture of Pig, Hive and Presto, with increasing use of Spark. In addition, Netflix is committed to the cloud: Amazon Web Services.

It's an infrastructure that operates at scale: delivering and supporting 125 million hours of streaming video per day (more than a third of North American internet traffic); from a data standpoint, Sullivan said, that’s about 600 billion daily events. The Netflix data warehouse contains roughly 40PB and processes 3PB of data on a daily basis, adding about 300TB of net new information.
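
For a sense of proportion, the figures Sullivan cited can be turned into rates with some back-of-the-envelope arithmetic. The inputs below are the article’s numbers; the derived ratios are rough illustrations, and the sketch assumes decimal units (1PB = 1,000TB).

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the article.
daily_events = 600e9         # "about 600 billion daily events"
daily_stream_hours = 125e6   # "125 million hours of streaming video per day"
warehouse_pb = 40            # "roughly 40PB"
processed_pb_per_day = 3     # "processes 3PB of data on a daily basis"
net_new_tb_per_day = 300     # "about 300TB of net new information"

# Derived rates (illustrative only).
events_per_second = daily_events / SECONDS_PER_DAY
# A crude ratio: not every event is a playback event.
events_per_stream_hour = daily_events / daily_stream_hours
share_processed_daily = processed_pb_per_day / warehouse_pb
# Assumes decimal units: 1 PB = 1,000 TB.
daily_growth = net_new_tb_per_day / (warehouse_pb * 1000)

print(f"{events_per_second:,.0f} events per second")
print(f"{events_per_stream_hour:,.0f} events per streamed hour")
print(f"{share_processed_daily:.1%} of the warehouse processed per day")
print(f"{daily_growth:.2%} net growth per day")
```

That works out to roughly 6.9 million events per second, about 4,800 events for every hour streamed, 7.5 percent of the warehouse processed daily, and net growth of 0.75 percent per day.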

“What’s interesting about our ecosystem is because we’re big believers in cloud, we use AWS, so that allows us to put our data on S3. It’s a natural place for our data LAN because this is where our service systems are. It’s one central repository where we can spin up multiple Hadoop clusters on top of. That allows us to separate our compute layer from our storage layer, and it lets us custom scale different clusters if we distribute our processing a little, or do things like upgrades of Hadoop or even introduce new tools, like Spark.”

Netflix is driven by both the expansion and retention of its customer base. Sullivan said the company has the advantage of a customer-centered model undiluted by the contradictions built into the model of other large internet companies, some of which sell their customer data, or customer lists, or include advertising in their content.

“We have a holistic relationship with our customers,” Sullivan said, because the single source of Netflix revenue is customer subscriptions. “The dual nature of some of the other companies’ (business models) leads to potential conflict in product development – how to introduce ads but not make them so annoying that users go away. We don’t have to worry about that. We have a really central relationship with our subscribers because they are effectively giving us money to stream video and we can make that product better, and if we do a really good job at innovating our product then people retain their subscriptions, and if we don’t do a great job people will cancel.”

The alignment between Netflix and its customers, Sullivan said, extends to how it licenses content. Subscribers can watch a given piece of content as many times as they would like. “When we buy rights to stream a particular title in our catalog we’re buying that title for a period of time, not on a per-play basis, like music rights are. We want you to consume as much of Netflix as you can or you want to, and that shows me you’re engaged with the product. Not surprisingly, streaming usage is highly correlated with retention.”

He also said that from a content perspective, Netflix isn’t bound by some of the limitations of traditional broadcast television networks. “We’re not worried about a broadly appealing, small set of shows that go into a limited number of prime time slots. We can think about a broader catalog that we know will be enjoyed by a wide spectrum of people, each one with differing tastes, and they can watch exactly what they want to watch. We don’t have to worry about the ‘lowest common denominator’ problem.”

With its relatively straightforward customer relationship in place, Netflix goes about using data analytics to measure all aspects of customer interaction. This includes the user interface, which is constantly tinkered with. “We’ve experimented with all sorts of tweaks on the layout, what types of metadata we show for any given show, improved features like search, added features like user profiles, and even down to the images that we use in the UI. The idea is certain images will pull more people in to stream more.”

He said regions and countries respond differently to various designs – in some, print is either more or less of a draw than images, for example. “We were able to measure this in a statistically significant fashion in our subscriber base. The idea is to keep more people engaged and really finding quickly the thing they want to watch within the product, and that has a virtuous cycle of feedback into enjoyment of the product. It’s interesting to learn more about our users and to personalize off of that.”

Sullivan said he and his team also continually refine their predictive analytics capabilities. “We can tune our recommendations based off of explicit signals, like user ratings, and we can also look across broad patterns across our ever changing user data. We evolve our algorithms using this changing set of features and models to get the right titles in front of the subscribers, ideally predicting exactly what they want to watch.”
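
As a toy illustration of recommending from explicit signals, the sketch below predicts a subscriber’s rating for unseen titles from the ratings of similar users. It is not Netflix’s algorithm, which the article does not detail; user-based collaborative filtering with cosine similarity is a stand-in technique, and the users, titles and star ratings are invented.

```python
import math
from collections import defaultdict

# Hypothetical explicit ratings (1-5 stars); names and titles are invented.
ratings = {
    "ana":   {"Title A": 5, "Title B": 4, "Title C": 1},
    "ben":   {"Title A": 4, "Title B": 5},
    "chloe": {"Title A": 1, "Title C": 5, "Title D": 4},
    "dev":   {"Title B": 5, "Title D": 2},
}


def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings, top_n=3):
    """Rank unseen titles by the similarity-weighted average rating of other users."""
    seen = all_ratings[user]
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        if sim <= 0:
            continue
        for title, stars in other_ratings.items():
            if title not in seen:
                weighted_sum[title] += sim * stars
                weight_total[title] += sim
    predicted = {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}
    return sorted(predicted.items(), key=lambda item: -item[1])[:top_n]


for title, score in recommend("ben", ratings):
    print(f"{title}: predicted {score:.2f} stars")
```

Here “ben” is steered toward Title D ahead of Title C, because the user whose tastes most resemble his rated Title C poorly. A production system would also draw on the broader usage patterns Sullivan mentions, which this sketch ignores.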

He broke down the infrastructure into several parts, starting with the streaming platform, which resides under the UI, and is supported by four engineering groups. On top of that is AWS, which builds the dozens of server applications used by Netflix and delivers content to the device ecosystem (phones, tablets, smart TVs, game consoles, etc.). Device usage data is analyzed “and that allows us to be smarter about how we partner with the device partners – Sony, Samsung, Apple, Google, Microsoft.”

In the middle of the infrastructure, Sullivan said, is the streaming client, a source of telemetry from the subscriber’s device that provides data about the actual playback experience – how much time it takes for videos to start up, the quality of image resolution and how often subscribers encounter rebuffers and playback failures. This is accompanied by data about the performance of the delivery network used to distribute the enormous amount of Netflix content in a scalable, robust, and cost-effective fashion.
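
The playback measures listed above could be rolled up from client telemetry along the following lines. This is a sketch under stated assumptions: the record fields, the metric definitions and the sample sessions are hypothetical, not Netflix’s actual schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PlaybackSession:
    """One playback attempt reported by a client. All fields are hypothetical."""
    startup_ms: float      # time from pressing play to the first frame
    watch_minutes: float   # minutes of video actually played
    rebuffer_count: int    # mid-stream stalls
    failed: bool           # True if playback never started


def summarize(sessions):
    """Aggregate startup time, rebuffer rate and failure rate."""
    started = [s for s in sessions if not s.failed]
    total_hours = sum(s.watch_minutes for s in started) / 60
    if not started or total_hours == 0:
        raise ValueError("need at least one successful session with watch time")
    return {
        "median_startup_ms": median(s.startup_ms for s in started),
        "rebuffers_per_hour": sum(s.rebuffer_count for s in started) / total_hours,
        "failure_rate": sum(s.failed for s in sessions) / len(sessions),
    }


# Invented sample data.
sessions = [
    PlaybackSession(startup_ms=1200, watch_minutes=45, rebuffer_count=0, failed=False),
    PlaybackSession(startup_ms=2500, watch_minutes=30, rebuffer_count=2, failed=False),
    PlaybackSession(startup_ms=900, watch_minutes=45, rebuffer_count=1, failed=False),
    PlaybackSession(startup_ms=0, watch_minutes=0, rebuffer_count=0, failed=True),
]
print(summarize(sessions))
```

Normalizing rebuffers by hours watched, rather than counting them per session, keeps long viewing sessions from looking worse than short ones. That is a design choice of this sketch, not something the article attributes to Netflix.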

Netflix also measures an aspect of content delivery beyond its control – internet service providers (ISPs). The company publishes the Netflix “ISP Speed Index,” a monthly global statistical breakdown of content download performance.

“We think about the response time of servers, we think about our cloud utilization in AWS, both from a performance and a cost standpoint, we think about the availability of our service,” Sullivan said. “It should look and feel like DVD content, you hit play and it should be on. We measure our rate of innovation. Normally you don’t want to touch something if you want to keep it from breaking. We want to change it constantly and we want to be highly available, so we stretch that muscle all the time.”

EnterpriseAI