The 2nd Tenet of Big Data: Run It On An Open Hybrid Cloud

Big Data Is More Than Hadoop

Many people today associate big data with Hadoop analytics – and, to be sure, Hadoop is an important technology for this. However, just as the Linux operating system is so much more than just the Linux kernel, a big data environment is so much more than just a Hadoop cluster.

In fact, from a Red Hat perspective, you need three primary things for a robust big data deployment:

  • Big data infrastructure
  • Big data analytics tools
  • Big data application platform

Red Hat is working to deliver enterprise big data solutions that integrate these three areas across an open hybrid cloud. In this post, I’ll focus on how Red Hat plans to deliver a scalable, cloud-based big data infrastructure.

Big Data Infrastructure

Big data infrastructure needs to provide scalable compute and storage, and an open hybrid Infrastructure-as-a-Service (IaaS) cloud provides an ideal architecture for this. Here are the elements Red Hat is building out to run big data workloads in the cloud:

Scalable Storage

Spinning up compute capacity in the cloud is important to big data – and I’ll explain more about this below – but first and foremost, big data requires scalable data storage that grows alongside compute.

Red Hat Storage provides scale-out storage that can extend into an open hybrid cloud. It is built on the GlusterFS distributed filesystem, and here is how it does so:

  • Red Hat Storage is a pure software solution that runs on top of standard Red Hat Enterprise Linux with the XFS filesystem. This means Red Hat Storage can run anywhere Red Hat Enterprise Linux runs—including across physical systems, virtualized infrastructure and private or public clouds
  • Red Hat Storage provides a global namespace, even across multiple data centers and across hybrid clouds. This allows a hybrid cloud in which virtual machines in a public cloud can operate on the exact same data as virtual machines in a private cloud
  • Red Hat is working on a Red Hat Storage Hadoop plugin that it will contribute to the Apache Hadoop community. As a result, big data workloads with Hadoop analytics will be able to leverage Red Hat Storage as the underlying data store and span across hybrid clouds

Scalable Compute

Red Hat is also a leader in the OpenStack IaaS project and is working to deliver an enterprise OpenStack distribution to market (currently available as a preview to anyone with a Red Hat Enterprise Linux subscription). OpenStack aims to provide the ability to build a large private cloud that can host big data compute workloads. As an organization’s big data compute needs grow, OpenStack will be able to elastically expand cloud-based computing capacity by provisioning new virtual machines.


In order for OpenStack compute capacity to adjust dynamically according to big data needs and policy, though, it needs cloud operations management tools. Red Hat’s recent acquisition, ManageIQ, provides these capabilities. ManageIQ includes rich monitoring and analytics tools to determine what is happening in the cloud infrastructure. For example, it can determine when a particular cloud provider is saturated in certain resources. ManageIQ also lets administrators create policies and provides orchestration tools to automate responses to events and policies. As Red Hat introduces its OpenStack product to market, it is also working to add support for OpenStack to ManageIQ. Combined, these capabilities will enable an enterprise to use ManageIQ’s features to auto-flex OpenStack-based capacity for big data computations.
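
To make the idea of policy-driven "auto-flex" more concrete, here is a minimal sketch in Python of a threshold policy that turns a utilization reading into a target number of virtual machines. It is purely illustrative and does not use the ManageIQ or OpenStack APIs: the names ProviderMetrics, ScalePolicy, and desired_vm_count, along with the threshold values, are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ProviderMetrics:
    """Hypothetical snapshot of one cloud provider, as a monitoring tool might report it."""
    name: str
    cpu_utilization: float  # fraction between 0.0 and 1.0
    running_vms: int


@dataclass
class ScalePolicy:
    """Hypothetical policy: thresholds and limits are illustrative defaults."""
    scale_out_above: float = 0.80
    scale_in_below: float = 0.30
    min_vms: int = 2
    max_vms: int = 20


def desired_vm_count(metrics: ProviderMetrics, policy: ScalePolicy) -> int:
    """Return the VM count the policy asks for, one step at a time."""
    if metrics.cpu_utilization > policy.scale_out_above:
        target = metrics.running_vms + 1  # saturated: add capacity
    elif metrics.cpu_utilization < policy.scale_in_below:
        target = metrics.running_vms - 1  # idle: release capacity
    else:
        target = metrics.running_vms      # within bounds: no change
    # Never leave the limits set by the policy.
    return max(policy.min_vms, min(policy.max_vms, target))


if __name__ == "__main__":
    busy = ProviderMetrics(name="private-openstack", cpu_utilization=0.92, running_vms=4)
    print(desired_vm_count(busy, ScalePolicy()))
```

In a real deployment the orchestration layer, not a script like this, would act on the result by provisioning or retiring virtual machines; the sketch only shows the monitor, policy, response loop in its simplest form.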


As large as today’s data centers are, a single one is often not enough for big data workloads. Data can also reside in more than one place, which requires the associated computing to do so as well. As a result, many enterprises span multiple data centers as well as private and public clouds. Red Hat’s CloudForms product aggregates multiple, disparate providers into uniform hybrid clouds. By leveraging CloudForms on top of OpenStack as well as public clouds, enterprises can deploy a big data compute platform that scales, not just within one OpenStack deployment, but across an entire hybrid cloud spanning multiple data centers. Furthermore, because CloudForms aggregates capacity across a variety of cloud technology providers such as Red Hat, VMware, and Amazon Web Services, enterprises can use both existing and new compute capacity without being locked into a single technology provider or platform. Red Hat is in the process of integrating ManageIQ and CloudForms into a next-generation version of CloudForms. This single cloud management platform is designed to aggregate and operate across open hybrid clouds in one interface.
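
The aggregation idea described above can be sketched as a simple placement decision over a pool of providers. The Python below is a hypothetical illustration, not the CloudForms API: the Provider class, the place function, the provider names, and the "prefer private capacity first" rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    """Hypothetical view of one provider in an aggregated hybrid cloud."""
    name: str
    kind: str          # "private" or "public"
    total_vcpus: int
    used_vcpus: int

    @property
    def free_vcpus(self) -> int:
        return self.total_vcpus - self.used_vcpus


def place(providers: List[Provider], vcpus_needed: int) -> Optional[Provider]:
    """Pick a provider that can host the workload, or None if none can.

    Assumed rule: prefer private capacity, then the provider with the most headroom.
    """
    candidates = [p for p in providers if p.free_vcpus >= vcpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.kind != "private", -p.free_vcpus))


if __name__ == "__main__":
    pool = [
        Provider("openstack-dc1", "private", total_vcpus=64, used_vcpus=60),
        Provider("vmware-dc2", "private", total_vcpus=32, used_vcpus=8),
        Provider("public-cloud", "public", total_vcpus=1000, used_vcpus=0),
    ]
    chosen = place(pool, vcpus_needed=8)
    print(chosen.name if chosen else "no capacity")
```

Because the caller only sees the pool, a workload that no longer fits in private capacity spills over to a public provider without any change to the calling code, which is the practical benefit of treating disparate providers as one hybrid cloud.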


Open Hybrid Cloud Infrastructure for Big Data

Now let’s bring it all together. Here’s how Red Hat plans to combine its scalable compute and storage capabilities in one open hybrid cloud:

  • Because Red Hat Storage can run in a virtual machine, we can make it available as a resource both in OpenStack and in a public cloud
  • As CloudForms and ManageIQ orchestrate the scaling out of compute capacity in an open hybrid cloud, they can simultaneously do so for storage capacity as well by spinning up additional virtual machines running Red Hat Storage
  • All this compute and storage can work seamlessly together across data center and firewall boundaries, because Red Hat Storage provides a global namespace
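
As a rough illustration of scaling storage alongside compute, the sketch below plans how many storage virtual machines to add for a given number of new compute virtual machines. Everything in it is an assumption for the sake of the example: the plan_scale_out function, the dictionary keys, and especially the ratio of four compute VMs per storage VM, which is not a Red Hat sizing recommendation.

```python
import math
from typing import Dict


def plan_scale_out(extra_compute_vms: int, compute_per_storage: int = 4) -> Dict[str, int]:
    """Return how many compute and storage VMs to add together.

    compute_per_storage is an illustrative ratio: one new storage VM
    for every N new compute VMs, rounded up.
    """
    if extra_compute_vms < 0:
        raise ValueError("extra_compute_vms must be non-negative")
    if compute_per_storage <= 0:
        raise ValueError("compute_per_storage must be positive")
    storage_vms = math.ceil(extra_compute_vms / compute_per_storage)
    return {"compute_vms": extra_compute_vms, "storage_vms": storage_vms}


if __name__ == "__main__":
    print(plan_scale_out(10))
```

The point of the sketch is only that one scaling decision can drive both kinds of capacity at once; the right ratio depends entirely on the workload.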


Big Data: An Ideal Workload for Open Hybrid Cloud

Big data, by its very nature, requires big, scalable infrastructure to run. An open hybrid cloud that spans multiple resource providers in private and public clouds, while simultaneously scaling out both compute and storage capacity, provides an extremely powerful platform for big data workloads. Red Hat is focused on delivering this type of infrastructure for enterprises to run big data—and all their other workloads—across an open hybrid cloud.

In follow-up posts, I’ll discuss why an open hybrid cloud makes sense for big data analytics and big data application platforms and how Red Hat is working to deliver those as well.

