Data Sprawl: The Cloud Data Challenge


We’ve updated our blog to include new stats on cloud data and the data sprawl challenge in our post Tackling Data Sprawl.

The traditional security approach to storing and managing data is woefully inadequate for the demands of today’s cloud world. Why? The unstoppable forces of cloud and agile development are driving momentous changes in how enterprises build, deploy, and run applications. Today’s operational requirements therefore demand a paradigm shift in how data is managed and protected, one that prevents data sprawl.

Data sprawl (also called data spread or data dispersal), the uncontrolled proliferation of data across applications, data stores, and cloud accounts, is one of the biggest challenges for organizations that use cloud services. According to a study by CSC, only 33% of organizations are able to maintain a single view of their data across all clouds, and only 60% can securely share data between cloud providers.

To address cloud data sprawl, organizations need a data control strategy that lets them manage multiple applications and cloud storage providers from one place. In the cloud, a data management plan is especially important: it provides a single view of data across all cloud environments and helps ensure that only authorized identities can see and use specific information.

Until their data is under control, enterprises are likely to spend a great deal of time and money mitigating the consequences; on average, they could spend $16 million over five years just trying to manage data sprawl. To reduce this risk, IT leaders should understand four key considerations.

Traditional Data Centers Are Almost Dead

Gartner predicts that by 2025, 80% of enterprises will have shut down their traditional data centers, and 10% of organizations already have. Many organizations are rethinking where they place applications based on network latency, customer population clusters, and geopolitical limitations such as the EU’s General Data Protection Regulation (GDPR) and other regulatory restrictions.

Enterprises with older data centers don’t want to rebuild them or build new ones due to high capital costs. They would rather have someone else manage the physical infrastructure. Gartner’s IT Key Metrics Data shows that the percentage of the IT budget spent on data centers has decreased over the past several years, and now accounts for just 17% of the total.

According to the 2020 IDG Cloud Computing Research, 92% of companies have adopted cloud technology. IDG expects the aggressive shift to cloud to continue, predicting that within 18 months SaaS will be in use at 95% of companies, IaaS at 83%, and PaaS at 73%. IDG also predicts that cloud budgets will keep rising, with 32% of IT budgets expected to go to cloud computing within the next 12 months.

Infrastructure and operations (I&O) leaders today face a daunting challenge. The IT they have known for decades is changing radically, and the traditional security methods that worked on-prem no longer work in the cloud.

Clouds and Cloud Accounts Multiply (a lot)

RightScale says that 80% of the companies in the cloud have adopted a multi-cloud strategy using multiple providers such as Amazon, Microsoft, Google, IBM, Oracle, and Alibaba. Also, the ease and benefits of creating cloud accounts ensure that having many AWS or GCP cloud accounts or Azure subscriptions is the norm. It is not unusual for enterprises to have hundreds and in some cases thousands of cloud accounts.

We don’t yet know how much computing power each enterprise runs across so many distinct clouds, but let’s do some back-of-the-envelope math just for fun:

Most enterprises have several hundred AWS accounts, and many have over 1,000 (or even 10,000). I’ve seen numbers that state that 83% of Fortune 500 companies are consumers of the public cloud. That’s roughly 415 companies; if each runs a little over 300 AWS accounts on average, there are approximately 130,000 enterprise-size AWS cloud accounts worldwide as of today, which adds up to a massive number of instances/VMs running in AWS alone.

Many of these companies likely run at least one Google Compute Engine VM instance too: I have seen numbers that state that 49% of the Fortune 500 also use Google Cloud Platform (GCP). To keep the math simple, let’s assume that every enterprise with AWS accounts has a matching number of GCP accounts. Using multiple clouds lets companies gain some redundancy while scaling out their capacity whenever needed.

Doing the simple math: 130,000 × 2 = 260,000 cloud accounts from just two providers by this one set of estimates, and possibly many more. And these are just accounts; we haven’t even scratched the surface of the number of identities in your cloud.
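The estimate above can be written out as a short calculation, which makes each assumption visible. This is a sketch under stated assumptions: the average of 313 accounts per company is a hypothetical figure chosen to reproduce the ~130,000 number (the post itself only says "several hundred"), and doubling for GCP assumes each company has as many GCP accounts as AWS accounts.

```python
# Back-of-the-envelope cloud account estimate, following the reasoning above.
# ASSUMPTION: 313 accounts per company is a hypothetical average picked to match
# the ~130,000 figure; it is not a measured value.


def estimate_accounts(companies: int, cloud_share: float,
                      accounts_per_company: int, providers: int) -> tuple[int, int, int]:
    """Return (cloud-using companies, accounts per provider, total accounts)."""
    cloud_companies = round(companies * cloud_share)       # Fortune 500 firms on public cloud
    per_provider = cloud_companies * accounts_per_company  # e.g. AWS accounts
    total = per_provider * providers                        # assumes equal counts per provider
    return cloud_companies, per_provider, total


if __name__ == "__main__":
    companies, aws_accounts, total = estimate_accounts(
        companies=500,
        cloud_share=0.83,          # 83% of the Fortune 500 (figure quoted in the post)
        accounts_per_company=313,  # hypothetical average
        providers=2,               # AWS + GCP
    )
    print(companies)                # 415
    print(round(aws_accounts, -3))  # 130000
    print(round(total, -3))         # 260000
```

Changing any single input, for example using the 49% GCP adoption figure instead of assuming every AWS customer also runs GCP, shifts the total noticeably. That is the point of the exercise: these are order-of-magnitude numbers, not counts.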

Readers probably have a good sense of how many cloud accounts their own company runs, so we won’t even try to estimate the number of identities that employees of a single organization may have access to. Let’s just call it an even one million.

Innovation Spawns Many New Data Stores

Gone are the days of a limited selection of manageable data stores (e.g., Oracle, IBM, and Microsoft SQL Server). Innovations in agile cloud development have led to an explosion of new data store options, with teams using MongoDB, Elasticsearch, CouchDB, Cassandra, DynamoDB, HashiCorp Vault, and many, many more. Add object stores like Amazon S3 and Azure Blob Storage, and it becomes self-evident that new corporate infrastructures have no physical or logical concept of a ‘data center.’

Ephemeral Compute Pours Over Your Data

With container orchestration, the typical lifetime of a container is 12 hours. Serverless functions, already adopted by 22% of corporations, come and go in seconds. Data is the oil of the digital era, but in this era the oil rigs are ephemeral and countless: EC2 instances, spot instances, containers, serverless functions, admins, and agile development teams are all fleeting rigs that drill into your data.

Our New World Has a Data Control Problem

Is it any wonder that 53% of organizations using the cloud have exposed data online? While all this flexibility and agility is good for innovation, it poses new challenges. In particular, we have a problem tracking and controlling cloud data and what has access to it.

A small sample of well-publicized cloud data loss incidents is listed on our Breach Watch page, but we can be confident that the data control problem will grow as the rapid adoption of cloud platforms and new development techniques continues unabated.

Of course, we don’t have just a security problem. Compliance mandates like PCI, HIPAA, Europe’s GDPR, California’s CCPA, Brazil’s LGPD, Canada’s PIPEDA, and many more demand extensive security controls for data, along with auditing and reporting on those controls.

Existing Tools Don’t Manage Cloud Data Sprawl

As cloud and agile have moved into the mainstream, traditional data center and network management tools are stuck in the past, and practitioner-focused cloud provider tooling is not up to the task either. AWS, Azure, GCP, and other cloud providers use markedly different identity and data models. Access control lists, inline policies, group inline policies, role inline policies, assumed roles, switched roles, federation, and managed policies all affect what has access to a resource. Security teams must understand multiple cloud provider identity and data models and also track access to third-party data stores running within the major AWS, GCP, and Azure cloud platforms. Where can a company turn when it needs to track data access and movement across multiple clouds, hundreds of cloud accounts, and thousands of data stores?
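To see why so many policy types matter, consider a toy evaluator for a single identity. It is a minimal sketch that assumes a simplified version of the core AWS IAM rule (an explicit deny overrides any allow, and no matching allow means an implicit deny); it ignores conditions, permission boundaries, SCPs, session policies, and cross-account access, and Azure and GCP evaluate access differently. The names `Statement`, `Effect`, `is_allowed`, and `ALICE` are hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Effect(Enum):
    ALLOW = "Allow"
    DENY = "Deny"


@dataclass(frozen=True)
class Statement:
    source: str      # which policy type contributed this statement (label only)
    effect: Effect
    action: str      # may contain * wildcards, e.g. "s3:*"
    resource: str    # may contain * wildcards, e.g. "arn:aws:s3:::reports/*"


def is_allowed(statements: list[Statement], action: str, resource: str) -> tuple[bool, list[str]]:
    """Simplified deny-overrides evaluation over one identity's collected statements.

    Returns (decision, sources of the statements that produced the decision).
    """
    matching = [
        s for s in statements
        if fnmatchcase(action, s.action) and fnmatchcase(resource, s.resource)
    ]
    denies = [s.source for s in matching if s.effect is Effect.DENY]
    if denies:
        return False, denies          # an explicit deny always wins
    allows = [s.source for s in matching if s.effect is Effect.ALLOW]
    return bool(allows), allows       # no matching allow means implicit deny


# Hypothetical statements gathered for one identity from several policy types.
ALICE = [
    Statement("managed policy", Effect.ALLOW, "s3:GetObject", "arn:aws:s3:::reports/*"),
    Statement("group inline policy", Effect.ALLOW, "s3:*", "arn:aws:s3:::*"),
    Statement("inline policy", Effect.DENY, "s3:GetObject", "arn:aws:s3:::reports/payroll/*"),
]

if __name__ == "__main__":
    print(is_allowed(ALICE, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
    print(is_allowed(ALICE, "s3:GetObject", "arn:aws:s3:::reports/payroll/jan.csv"))
    print(is_allowed(ALICE, "ec2:StartInstances", "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc"))
```

Even in this toy model, the answer to "can this identity read that object?" depends on three separate policies. Multiply that by the policy types listed above, by three providers' different models, and by hundreds of accounts, and tracking access by hand stops scaling.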

Sonrai Security Can Help

Sonrai Security provides a complete risk model of all identity and data relationships, activity, and data movement across cloud accounts, cloud providers, and third-party data stores. Sonrai is built on the cloud, for the cloud. The service focuses on data and the identities of all entities that have access to it, across cloud providers and third-party data stores. Lastly, Sonrai bridges DevOps and Security teams to:

  • Improve Data Security and Reduce Risk.  User configuration risks and public data exposure risks are all reported across cloud providers, accounts, countries, teams, and applications.
  • Ensure Compliance.  Data sovereignty, data movement, and identity relationships are all monitored to ensure conformance with sovereignty requirements, GDPR, HIPAA, and other compliance mandates.
  • Increase DevOps Efficiency.  Cloud provider management models are normalized with centralized analytics and views across hundreds of AWS & Google Cloud accounts and Azure subscriptions/resource groups.