The Challenge of Cloud Data Sprawl and the Need for Data Control


Editor's note: This blog has been updated to meet contemporary data and identity concerns and evolved Sonrai Security solutions. You can read this new version here.

The unstoppable forces of cloud and agile development are driving momentous changes in how enterprises build, deploy, and run applications. These shifts have many implications, including cloud data sprawl. Four of them are especially important.

Traditional Data Centers Are Dead, and Cloud Is the Killer

According to the 2018 IDG Cloud Computing Research, 73% of companies have adopted cloud technology, with executives driving their companies to be 100% in the cloud. IDG predicts a continued aggressive shift to cloud technology: within 18 months, SaaS will be in use at 95% of companies, IaaS at 83%, and PaaS at 73%. Gartner predicts that by 2025, 80% of enterprises will have shut down their traditional data centers, compared to 10% today.

Clouds and Cloud Accounts Multiply (a lot)

RightScale says that 80% of the companies in the cloud have adopted a multi-cloud strategy using multiple providers such as Amazon, Microsoft, Google, IBM, Oracle, and Alibaba. Also, the ease and benefits of creating cloud accounts ensure that having many AWS/GCP cloud accounts or Azure subscriptions is the norm. It is not unusual for enterprises to have hundreds (and in some cases thousands) of cloud accounts.

Innovation Spawns Many New Data Stores

Gone are the days of a limited selection of manageable data stores (e.g. Oracle, IBM, and MS SQL). Innovations in agile cloud development have led to an explosion of new data store options, with teams using MongoDB, Elasticsearch, CouchDB, Cassandra, Amazon DynamoDB, HashiCorp Vault, and many more. Add object stores like AWS S3 and Azure Blob, and it becomes clear that new corporate infrastructures no longer have a physical or logical concept of a ‘data center.’

Ephemeral Compute Pours Over Your Data

With container orchestration, the typical lifetime of a container is 12 hours. Serverless functions – already adopted by 22% of corporations – come and go in seconds. Data is the oil of the digital era, but in this era, the oil rigs are ephemeral and countless. EC2 instances, spot instances, containers, serverless functions, admins, and agile development teams are the countless fleeting rigs that drill into your data.

Our New World Has a Data Control Problem

Is it any wonder that 53% of organizations using the cloud have exposed data online? While all this flexibility and agility is good for innovation, it poses new challenges. In particular, we have a problem tracking and controlling cloud data and what has access to it.

A small sample of well-publicized cloud data loss incidents is listed in Figure 1, and we can be confident that the data control problem will grow as rapid adoption of cloud platforms and new development techniques continues unabated.

BleepingComputer: Hadoop Servers Expose Over 5 Petabytes of Data
ZDNet: Yet another trove of sensitive US voter records has leaked
The Register: Et tu Accenture? Then fall S3er: Consultancy giant leaks private keys, emails and more online
TechGenix: Another AWS configuration error exposes Dow Jones customer data
Healthcare IT News: Hackers are ransoming 26,000 unsecured MongoDB databases, security researchers find
Forbes: Massive WWE Leak Exposes 3 Million Wrestling Fans’ Addresses, Ethnicities And More
CNN Business: Verizon data of 6 million users leaked online
CyberScoop: Booz Allen Hamilton leaves 60,000 unsecured DOD files on AWS server

Figure 1: Published cloud breach incidents

Of course, we don’t have just a security problem. Compliance mandates like PCI, HIPAA, Europe’s GDPR, California’s CCPA, Brazil’s LGPD, Canada’s PIPEDA, and many more demand extensive security controls for data, along with auditing and reporting of those controls.

Existing Tools Don’t Manage Cloud Data Sprawl

As cloud and agile have launched into the mainstream, traditional data center and network management tools have remained stuck in the past. Practitioner-focused cloud provider tooling is also not up to the task. Markedly different identity and data models exist within AWS, Azure, GCP, and other cloud providers. Access control lists, inline policies, group inline policies, role inline policies, assumed roles, switched roles, federation, and managed policies all have an impact on what has access to a resource. We must understand multiple cloud provider identity and data models and track access to third-party data stores used within the major AWS, GCP, and Azure cloud platforms. Where can a company turn when it needs to track data access and movement across multiple clouds, tons of cloud accounts, and thousands of data stores?
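To make the access-tracking problem concrete, here is a minimal sketch of how grants from different policy mechanisms might be normalized into one shape and then resolved for a single identity through its group and role memberships. Everything in it is a hypothetical illustration: the `Grant` record, the `memberships` map, and the sample identities and resources are assumptions, not any cloud provider's API or Sonrai's implementation. It also ignores explicit denies, policy conditions, and permission boundaries, which real provider evaluation logic has to handle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One normalized permission record (hypothetical shape)."""
    provider: str   # e.g. "aws", "azure", "gcp"
    principal: str  # user, group, or role the grant is attached to
    resource: str   # data store the grant applies to
    action: str     # e.g. "read", "write"
    source: str     # mechanism that produced it, e.g. "inline policy", "acl"


def expand_principals(identity, memberships):
    """Return the identity plus every group or role it can reach transitively."""
    seen = {identity}
    stack = [identity]
    while stack:
        current = stack.pop()
        for parent in memberships.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def effective_access(identity, memberships, grants):
    """Map (provider, resource) to the set of (action, source, principal) grants
    that apply to the identity, directly or through memberships."""
    principals = expand_principals(identity, memberships)
    access = defaultdict(set)
    for grant in grants:
        if grant.principal in principals:
            key = (grant.provider, grant.resource)
            access[key].add((grant.action, grant.source, grant.principal))
    return dict(access)


# Hypothetical sample data: alice belongs to a group that can assume a role.
memberships = {
    "alice": ["dev-group"],
    "dev-group": ["data-reader-role"],
}
grants = [
    Grant("aws", "data-reader-role", "s3://customer-records", "read", "managed policy"),
    Grant("aws", "alice", "s3://build-artifacts", "write", "inline policy"),
    Grant("gcp", "bob", "gs://analytics", "read", "acl"),
]

for (provider, resource), entries in sorted(effective_access("alice", memberships, grants).items()):
    for action, source, principal in sorted(entries):
        print(f"{provider} {resource}: {action} via {source} on {principal}")
```

Even in this toy form, alice's read access to the customer records is visible only after following two membership hops, which is the kind of indirect path that is easy to miss when each provider and policy type is inspected in isolation.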

The Sonrai Security Cloud Data Control Service

The Sonrai Security Cloud Data Control (CDC) service provides a complete risk model of all identity and data relationships, activity, and data movement across cloud accounts, cloud providers, and third-party data stores. Sonrai CDC is built on the cloud for the cloud. The service focuses on data and on the identities of all entities that have access to it, across cloud providers and third-party data stores. Lastly, the Sonrai CDC service bridges DevOps and Security teams to:

  • Improve Data Security and Reduce Risk. User configuration risks and public data exposure risks are all reported across cloud providers, accounts, countries, teams, and applications.
  • Ensure Compliance. Data sovereignty, data movement, and identity relationships are all monitored to ensure conformance to data sovereignty requirements, GDPR, HIPAA, and other compliance mandates.
  • Increase DevOps Efficiency. Cloud provider management models are normalized with centralized analytics and views across hundreds of AWS/GCP accounts and Azure subscriptions/resource groups.