Understanding Cloud Data Risk – Uncovering Data and Identities

4 mins to read

Data. The most fundamental construct of the enterprise today. We collect it from our customers. We buy it from third parties. We aggregate it in our data stores. We analyze it to drive business decisions. We train our algorithms with it. Our applications are useless without it.

So why, if data is the lifeblood of our organizations, is it so often relegated to the bottom of the list regarding protection?

Where We’ve Come From

When we look back at how we have historically ‘protected’ our data repositories, it is easy to see how not only has the technology changed, but so has the risk profile.

Not that long ago, the only data repository that the infrastructure security teams had to be concerned with were the massive, monolithic databases that held every customer record we cared about. More often than not, the database server was a massive, parallel-processing system that contained terabytes, if not petabytes, of stored information and procedures. These databases were so large and multi-functional that even normal redundancy actions like backups or duplication were impossible to achieve.

When developers requested samples of the database in order to test their applications, the extract files were often massive comma-delimited files that took days to ingest into the test/dev database instance. Basically, the process was so onerous, once the test database was built, that’s all the developers ever got.  More often than not, no one prioritized data masking, encryption, or any other control because, like its production brethren, the Dev environment was buried deep within the infrastructure, layers upon layers deep behind protective controls.

Data in the Cloud

In today’s modern DevOps shops, long gone are the massive pieces of hardware and the fiefdoms of the database administrators. Developers can readily duplicate entire containers, including applications, data stores, workloads, and identities, instantiating them again over and over, building as many environments as needed – without needing the assistance of a single database or system administrator. While this capability has the benefits of rapid application deployment and, hopefully, revenue generation, it comes at a potentially heavy price.

By their very nature, cloud environments have a significant number of data stores, many of which are related to development or testing, that are often abandoned. Additionally, data extracts, backups, and other critical information can be left unprotected, significantly increasing the overall risk of exposure to the enterprise. Adding to this complexity are the risks surrounding cloud identities and how the propagation of these identities promulgates the overall risk to cloud data. 

For many in the industry, ‘Data Classification’ is a phrase that typically evokes laughs, eye rolls, and downright anger. For years organizations have tried to implement data classification practices only to watch them repeatedly digress into a mess of mismanaged labels and inappropriate and ineffective controls.

However, today, Data Classification has taken on an entirely new appearance; one of contextual understanding of the data integrated with unprecedented visibility throughout the environment.

When we consider the way data is stored within cloud environments, these data stores can be interrogated to better understand the type of data stored, who has access to such data, and the last time it was accessed. In fact, by automating the analysis of the data, one can readily determine the overall risk to the data store in comparison to either the overall cloud environment or other similar data stores and workflows.

Unfortunately, understanding the sensitivity of the data is only the first part of managing cloud risk.

Because of the developer’s ability to easily and quickly replicate their entire application environment, the risk of over-privileged access is a constant and palatable risk. User, device, and application access, combined with the trust inheritance that is a ‘feature’ of most cloud implementations, it is extremely easy to have over-privileged users, machines, and applications who can access not only unauthorized resources within their specific containers but also across containers and even across clouds.

Inside-Out Security

The ephemeral nature of cloud workloads demands a new approach to understanding data-centric risk. One that does not rely on legacy vulnerability scanning, antiquated patch management, or ineffective user access controls. 

Read more about an inside-out security methodology here.

While the notion of ‘inside-out’ security is not new, the application of the concept has been drastically impacted by the migration to the cloud, containers, and serverless architectures. Since the model of having databases tied to monolithic hardware is no longer applicable, and the entire set of workloads, applications, and databases can be replicated with only a few keystrokes, being able to identify risky operations, users with excessive privileges, or unprotected sensitive data in near real-time becomes paramount.

The ability to easily identify sensitive data elements within the cloud environment is of critical importance as more services and applications are decoupled from their legacy bounds and moved to more modern cloud architectures. Understanding the risk to the data must take a primary role in driving how risk mitigation practices are developed. No longer can we just blindly trust our vulnerability scanners to tell us where and when to patch.

Mature DevOps organizations have clear visibility into what sensitive data is in each data store, which humans and machines have access to it, what vulnerabilities exist within the workload, and are able to determine the overall risk to that data at any given moment. 

Enterprise Action Plan

When considering a cloud security platform, one should pay attention to those who can not only identify where data stores are located but also be able to shed light on the kind of data contained within. Combining this knowledge with the visibility of identity access and inherited user trust layered over the actual system vulnerability is the only true way to understand the overall risk to your cloud environment.

Sonrai Security

The Sonrai platform’s ability to aggregate data, identity, and workload risk provides enterprises with a unique view of risk within their cloud environments. By leveraging a modern, inside-out methodology to identify risk by first focusing on the data, Sonrai provides a far more robust approach to risk and vulnerability management than what is achievable with legacy enterprise solutions.