
Big Data Benchmark Test Results Unveiled 

Initial benchmark results for big data systems have been released in hopes of providing "verifiable performance, price-performance and availability metrics" for big data hardware and software, according to the head of the standards effort.

Raghunath Nambiar, chairman of the TPC big data benchmark committee and a distinguished engineer at Cisco Systems, announced in a Jan. 9 blog post that his group released results for 1, 3 and 10 terabyte scale factors. The TPC, short for Transaction Processing Performance Council, has come up with a new benchmark framework called TPCx, with the x being short for express.

The TPCx-HS test uses a throughput metric that divides the scale factor by the elapsed run time of the test, measured in seconds and converted to hours. Vendors submit the cost of the cluster on which they run the test, and a price/performance metric is then published.
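In other words, the throughput figure is the scale factor divided by the elapsed run time expressed in hours, and the price/performance figure divides the cluster's cost by that throughput. A minimal sketch of the arithmetic in Python, using hypothetical run-time and cost figures chosen only for illustration:

```python
# Illustrative arithmetic for the TPCx-HS metrics described above.
# The specific run time and cluster cost below are hypothetical,
# chosen only to show how figures like 5.77 HSph come about.

def hsph(scale_factor_tb, elapsed_seconds):
    """Throughput: scale factor divided by the run time in hours."""
    return scale_factor_tb / (elapsed_seconds / 3600.0)

def price_performance(cluster_cost_usd, throughput):
    """Price/performance: cluster cost divided by throughput (USD/HSph)."""
    return cluster_cost_usd / throughput

# A hypothetical 10 TB run finishing in 6,240 seconds works out to
# roughly 5.77 HSph; a hypothetical $614,600 cluster would then land
# near $106,500/HSph.
t = hsph(10, 6_240)
print(round(t, 2), round(price_performance(614_600, t)))
```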

The TPCx-HS test uses scale factors to load progressively heavier workloads onto a cluster, starting at 1 TB, the baseline for the TeraSort benchmark the TPC uses as its foundation.

The 1 TB results reported this week were 5.07 HSph and $121,231/HSph. (HSph is a composite metric representing processing power.)

At the 3 TB scale factor, performance came in at 5.1 HSph and $120,518/HSph.

The results for the 10 TB scale factor were 5.77 HSph and $106,524/HSph, TPC reported.

Nambiar said that the 1 TB scale factor requires around 6 TB of total disk capacity to run, because the Hadoop Distributed File System keeps three copies of the data and space is needed for both the input and the output data.
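The arithmetic behind that figure, sketched below, assumes three-way HDFS replication and an output data set roughly the same size as the input, per Nambiar's description:

```python
# Rough disk-capacity estimate for a TPCx-HS run, per Nambiar's
# description: HDFS stores three copies of the data, and both the
# input and the sorted output must fit on disk.

HDFS_REPLICATION = 3

def required_capacity_tb(scale_factor_tb, replication=HDFS_REPLICATION):
    input_tb = scale_factor_tb * replication   # replicated input data
    output_tb = scale_factor_tb * replication  # replicated output data
    return input_tb + output_tb

print(required_capacity_tb(1))   # ~6 TB for the 1 TB scale factor
print(required_capacity_tb(10))  # ~60 TB at 10 TB
```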

The TPC unveiled its industry-standard big data benchmark, TPCx-HS, in August. The group said it was developed to provide verifiable performance, price/performance and availability metrics, as well as optional energy consumption metrics, for big data systems. The spec enables measurement of both hardware and software, including the Hadoop Runtime, Hadoop Filesystem API-compatible systems and MapReduce layers, TPC said.

"This benchmark can be used to assess a broad range of system topologies and implementation of Hadoop systems in a technically rigorous and directly comparable, in a vendor-neutral manner," Nambiar explained.

Cisco is also citing the benchmarks to verify the performance of its Unified Computing System (UCS) infrastructure for big data.

Nambiar said the initial benchmark configuration consisted of Cisco's UCS Integrated Infrastructure for big data: two redundant, active-active Cisco UCS 6296 Fabric Interconnects running Cisco UCS Manager version 2.2, plus 16 Cisco UCS C240 M3 servers running Red Hat Enterprise Linux Server 6.4 and the MapR Distribution including Apache Hadoop.

The idea behind the new big data benchmark is to leverage the kinds of tests that vendors are already using to do their comparisons, then wrap a framework around them for the industry to follow. At the same time, the benchmark seeks to introduce a measure of rigor into the process by adding pricing information for the systems being tested.

EnterpriseAI