
Oil Explorer Strikes Black Gold With 20 PB Data Reservoir 

The conventional wisdom at large enterprises that deal with truly enormous datasets is that they can't keep it all online. At some point, because of the high cost of disk storage, the argument goes, some of the least-frequently used data has to be pushed off to massive tape libraries and managed by archiving and retrieval software.

But what does a company do when its data doesn't age? After years of doing it the conventional way, oil and gas exploration and drilling company Apache Corp eventually decided to just keep everything on disk – and that means everything, forever – and remove tape from the equation.

The oil exploration and drilling business is a tough one. Speed and accuracy are everything. While Apache is by no means a small player, with $16.5 billion in sales of oil, gas, and liquefied natural gas in 2013 and an $8.5 billion capital budget for exploration, drilling, and other activities around the world for 2014, the company is nonetheless up against some truly immense rivals such as ExxonMobil, BP, and Royal Dutch Shell. The name of the game for Apache for the past several years has been to squeeze more oil out of fields with horizontal drilling, hydraulic fracturing, and other techniques that were not economically feasible for those fields until now. From an IT standpoint, what this means is that Apache leans heavily on its team of 200 geoscientists to figure out the best – and cheapest – ways to work its onshore and offshore fields around the globe.

When Bradley Lauritsen, director of exploration applications, joined Apache a dozen years ago, the company was a Sun Microsystems shop, running applications on the Solaris operating system atop Sparc servers and linking out to various Sun storage arrays. "It was tough to administer," Lauritsen recalls. "And once we got our hands on NetApp, it was so much simpler. It was set it and forget it."

From that initial experience with NetApp, Apache started shifting to cheaper disk arrays, and this is where the company's current strategy of keeping everything online took root. When NetApp was getting ready to launch the NearStore R200 ATA-based virtual tape library, Apache wanted to experiment and use it as primary storage.

Bradley Lauritsen, director of exploration applications, Apache Corp

"They were very scared about that," says Lauritsen. "No one knew what the reliability was going to be and NetApp had such a high reliability rating and they wanted to stand behind that. We wanted an inexpensive solution and better quality, and to get the density and not have to go to Fibre Channel SANs. So they suggested that if we get a machine, we should get a second one and mirror it so we would not lose data."

Apache had lost data on the earlier Sun gear – a mere 10 TB at the time – but it took weeks to get that data back from tape, and the geoscientists were upset because of the downtime. The same issue holds today: the datasets have grown so much larger that they have outpaced tape technology even as it has advanced in capacity and bandwidth.

"Basically, our data is on the shelf forever and it never expires," says Lauritsen. "We have looked at lots of different inline archive solutions, but we found that as soon as we tried to archive data off, then the analysts want the data back because something new has come out and they can look at the data differently and now it has new value. Because of that, our approach has been that we keep everything online. A lot of naysayers have said that this approach is going to come back and bite us one day, but we have been running over ten years like this and the technology has kept up."

Having all data online instead of archived to nearline tape storage does mean making multiple copies of the data in different facilities for safekeeping, and this obviously does carry a cost.

This is one reason why Apache now has over 20 PB of files spread across its various datacenters, which in aggregate host a couple of thousand servers that pull data off those arrays and chew on it for seismic processing (trying to create images from sonic waves projected down into the Earth), reservoir modeling (trying to figure out from the images where the oil and gas might be lurking), and well placement (figuring out where to put the well and where to move it when it comes up dry).

For the past several years, the company has operated its main datacenter at its Houston, Texas headquarters, with a co-location facility across town as a warm backup. These centers tend to have NetApp 6280 and 6290 filers that are maxed out with flash cache and running multiple 10 Gb/sec ports each.

"If we could get 40 Gb/sec ports, we would be running those, but they haven't given them to us yet. We have been asking for it," says Lauritsen with a laugh.

That Houston co-location facility is being gradually converted into a production datacenter, and a new warm backup site is being set up further away in Dallas. That is far enough away that hurricanes can't cause trouble but close enough to keep the latencies between the datacenters reasonable. Apache takes local snapshots of data using the Data ONTAP operating system that runs on the NetApp arrays and uses SnapVault to replicate data across arrays located in various datacenters.

Each drilling operation has its own datacenter and operates independently, with geoscientists given Linux and Windows clusters of a couple of hundred nodes each to run their applications. Apache makes the Landmark seismic imaging and reservoir modeling applications from Halliburton and the Petrel suite from Schlumberger available to the geoscientists. Landmark modules tend to run on Linux, and Petrel is moving more and more to Windows, says Lauritsen. And over time, because of the new features and algorithms available in Petrel – including support for coprocessing using Nvidia GPUs – the geoscientists increasingly prefer Petrel.

The initial adoption of the Petrel applications more than a decade ago coincided, more or less, with the move to NetApp for storage, and for one simple reason. As soon as the geoscientists got their hands on Petrel, they started generating more data as well as buying data from third party suppliers such as Western Geco (now a division of Schlumberger) and doing more things with it. And capacity requirements skyrocketed by 700 percent annually for a while.
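That growth rate compounds quickly. As a rough illustration, and not Apache's actual trajectory, the short Python sketch below takes the roughly 10 TB Sun-era figure mentioned earlier as a starting point and reads growth "by 700 percent" as an eight-fold increase each year:

    # Rough illustration of 700 percent annual growth compounding. The 10 TB
    # starting point is the Sun-era figure mentioned earlier; reading growth
    # "by 700 percent" as an eight-fold increase per year is an assumption.
    capacity_tb = 10.0
    for year in range(1, 5):
        capacity_tb *= 8
        print(f"after year {year}: {capacity_tb:,.0f} TB")
    # At that rate, 10 TB compounds to roughly 40 PB in just four years.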

The pace has slowed a bit since then, and now what geoscientists want as much as capacity is speed. Loading up the NetApp arrays with flash cache on the front end of the disks has reduced the time it takes for projects to load from around 20 minutes to less than five minutes. Apache uses plain old NFS or CIFS/SMB protocols to access its files; nothing fancy like Lustre or Gluster.

In general, the NetApp storage arrays are on a three-year cycle, and Apache buys three years of maintenance up front. The company not only continues to look at inline tape archiving solutions as an alternative to keeping all of its seismic and reservoir data on disk, but has also looked at parking some of its data out in the cloud, just to make sure it is still on the right track.

"We looked at cloud," Lauritsen explains. "We have had everybody in and we have done pilots with everybody. What we found is that we can buy the storage, put it into our Dallas co-lo facility, and do it ourselves cheaper than going to a cloud provider – and we have complete control over that data. We could not make the numbers work out. Two years was kind of the breakeven point. We can buy a filer and three years of maintenance, which is what we normally do, and compared to cloud, it was like we got the third year for free if we did it ourselves. That's a pretty big chunk of money with our size of storage."

An Apache oil rig in the North Sea

Tape has not been completely eliminated from Apache, at least not yet. While the company does not have an active archive with tape libraries, it does keep backups of its most recent datasets on LTO-6 tapes, which it then archives offsite. It can take weeks or even months to back this data up, and those backups are only kept for five years. Apache has datasets that go back to the 1980s, and they are still used because squeezing more oil out of an existing field is often as profitable as going after a new field.
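The weeks-to-months figure is believable from the raw mechanics of the medium alone. The back-of-the-envelope Python estimate below assumes LTO-6's native 2.5 TB cartridge capacity and roughly 160 MB/sec native streaming rate; the dataset size and drive count are invented for illustration:

    # Back-of-the-envelope estimate of streaming a large dataset to LTO-6 tape.
    # Dataset size and drive count are assumptions made for illustration.
    dataset_tb = 2_000          # assume ~2 PB of "most recent" data to back up
    lto6_native_tb = 2.5        # native capacity per cartridge
    lto6_mb_per_sec = 160       # native streaming rate per drive
    drives = 8                  # assumed number of drives writing in parallel

    cartridges = dataset_tb / lto6_native_tb
    seconds = (dataset_tb * 1_000_000) / (lto6_mb_per_sec * drives)
    days = seconds / 86_400

    print(f"cartridges needed: {cartridges:.0f}")
    print(f"wall-clock time:   {days:.0f} days with {drives} drives streaming flat out")
    # Roughly 18 days of continuous streaming before accounting for verification,
    # cartridge changes, or retries, which is how "weeks" turns into "months".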

It is, of course, good to do both, as Apache has done. Apache discovered a new oil field in Egypt in the 1990s, which has been developed for production in recent years, and the company continues to work the existing fields in Texas, Oklahoma, and the North Sea.

The regional datacenters in Egypt and the North Sea tend to have larger server clusters because, unlike the onshore oil fields in North America, the geology is not as well known and therefore more seismic imaging needs to be done. The clusters in the Egyptian datacenter in particular are a bit heftier because it is hard to get a good, clean seismic image due to the geology of the region. The ground is limestone and other hard rock for miles down, covered in sand, which makes it a challenge to push sonic waves through the rock and get clean echoes back.

The datacenter in Midland, Texas supports the Permian oil basin; the one in Tulsa, Oklahoma supports the Anadarko basin; the one in Aberdeen, Scotland supports the offshore drilling in the North Sea; and the one in Calgary, Alberta supports the Canadian tar sands drilling operations. There is another datacenter in Australia to support drilling operations there, and Apache just divested its Argentina operations.

The seismic imaging systems are beefy BladeSystem BL460 blade servers and SL6500 hyperscale servers from Hewlett-Packard with InfiniBand networks linking them together. Reservoir engineering and similar workloads are compute-intensive but need less I/O, so they run over 10 Gb/sec Ethernet networks and, interestingly, on Cisco's UCS blade servers.

"UCS has been very impressive and very easy to maintain," says Lauritsen. "So we have been leaning more and more to the Cisco side. We use Nexus switches on the backend, and the way we can get 80 Gb/sec into the blade enclosure and split that amongst the blades using virtual ports really simplifies that environment. The price point is there, and they are being very aggressive. They can deliver them fast, and we have been happy with that whole environment."

In general, the servers used by Apache stay in the clusters for a year or two before they are swapped out for the latest iron. The old machines cascade down to back office functions in the Apache datacenters. For the technical workloads, the company is always trying to cram the most cores into its enclosures, and is looking at upgrades to the "Ivy Bridge-EP" Xeon E5 v2 chips with ten and twelve cores per socket. The shopping list at Apache's IT department also includes fat memory machines based on Intel's new Xeon E7 v2 chips, which will be used for parts of the reservoir modeling workload and to support virtual workstations. GPU acceleration for specific workloads is also on the list.

"On the compute side, reverse time migration in seismic analysis is one Petrel application that can take good advantage of GPUs," explains Lauritsen. "It can take days, weeks, or months to run one of these jobs, and GPUs speed it up considerably – something on the order of 10X to 100X. The reservoir simulation side is I/O intensive and takes a lot of CPU, but up until now hasn't been written to take advantage of the GPUs. The new Intersect high-resolution reservoir simulator that Schlumberger just started pushing is coded to take advantage of Nvidia GPUs, so it is possible that we will see that."

The big issues that Apache is facing are the same ones that other large-scale datacenters are wrestling with, and in its case, storage comes first to mind.

"We always need higher density storage," Lauritsen continues. "Thankfully, disk densities continue to increase, and NetApp is offering 48 disks in a shelf instead of 24, and that has helped. Beyond that, datacenter power, cooling, and space issues are always a challenge for all of our sites, especially here in Houston, where the footprint is growing really fast. Space is being gobbled up as soon as it becomes available."
