Building a Data Loss Prevention Strategy for Your Cloud

Data breaches are becoming increasingly common and expensive. IBM estimated that the average data breach costs $3.86 million. Organizations of all sizes and industries face this risk. In fact, 81% of enterprises and 93% of small businesses have adopted cloud technology, leaving them susceptible to malicious attacks that target cloud providers. IBM’s Cost of a Data Breach: View From the Cloud Report found that organizations with greater cloud migration experienced higher costs after a data breach.

Cloud data loss prevention (DLP) can help organizations mitigate the risk of data breaches. DLP solutions provide visibility into data flows and identify gaps in data protection. They help organizations to implement data protection controls, such as access controls and policies. By adopting a data loss prevention solution, organizations can reduce the likelihood and impact of data breaches.

Today, data is an organization’s most valuable asset, thrusting data security to the top of a CISO’s priority list. Data loss prevention is a leading solution in the cloud data security market. All three of the top cloud providers, AWS, Azure, and GCP, offer data loss prevention solutions. Let’s explore what data loss prevention solutions are, what they offer, and how they fit into a healthy cloud security program.

What is Data Loss Prevention in the Cloud?

Data loss prevention is now applicable to the cloud. It is the effort to secure and protect the data hosted in your public cloud platform. It aims to protect the confidentiality, integrity, and use of your data. Cloud providers include native data loss prevention tools in their platforms. These tools are useful when you are beginning your data loss prevention strategy or working out of only one cloud platform. Additionally, there are third-party cloud security providers who offer Cloud DLP solutions.

So, what does data loss prevention in the cloud look like? A DLP framework has four major areas:


To protect your data, you first need to know where it is, not where you think it is or where it should be. To do this, you need to continuously scan for data across all your cloud environments. This not only looks for new data but discovers when data appears in new places. A large part of the discovery process is doing a data inventory. You want a real-time picture of all the data in your environment.

The key to preventing data loss is understanding which information would have the most damaging effect on your business if it were lost. To do this, you must first identify what’s most valuable and sensitive, and understand how useful it would be to an attacker or competitor.

Usually, this means focusing on intellectual property, such as design documents for future products, and on customer credit card numbers stored online without encryption (though there are other ways to address these problems). Healthcare companies should put medical records at the top of the list because they often hold such highly personal information on devices like laptops; retailers may need to choose between prioritizing PCI-DSS and GDPR.


Not all data is created equal. You need to know not only where your data is but also what it is. Data classification analytics determine the data type, importance, and risk to the business. This context is key in helping you prioritize what is most important. In practice, data classification means labeling your data with a ‘name tag’ and a ‘value tag’. An example may look like DataClassification:Confidential or DataType:CustomerPII. These tags tell you that the content is highly sensitive and should be prioritized for protection.

Classifying data is a formidable challenge. A simple, scalable approach is to classify by context, associating a classification with the source application, data store, or user who created the data. Applying persistent classification tags to the data allows organizations to track its use. Content inspection, which examines data to identify regular expressions representative of social security and credit card numbers or keywords (e.g., “confidential”), is also useful and often comes with pre-configured rules for PCI, PII, and other standards.
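As a rough illustration of content inspection, the sketch below scans text for patterns that resemble US Social Security numbers and payment card numbers, plus a keyword, and returns classification tags. The patterns, the Luhn checksum filter, and the DataType:SSN and DataType:CardNumber tag names are assumptions made for this example; they are not the rule set of any particular DLP product, and real pre-configured rules are far more thorough.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a vendor rule set).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
KEYWORDS = ("confidential",)


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def inspect(text: str) -> set[str]:
    """Return hypothetical classification tags for sensitive content found in text."""
    findings = set()
    if SSN_PATTERN.search(text):
        findings.add("DataType:SSN")
    for match in CARD_PATTERN.finditer(text):
        # The checksum filters out digit runs that merely look like card numbers.
        if luhn_valid(match.group()):
            findings.add("DataType:CardNumber")
    if any(keyword in text.lower() for keyword in KEYWORDS):
        findings.add("DataClassification:Confidential")
    return findings
```

Calling inspect() on a document body yields a set of tags that could then be attached to the object as persistent classification labels. Pattern matching alone produces false positives, which is why the card check adds a checksum step.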

Lock Down

Once you have a program in place that lets you understand and detect security risks, the next step is actually protecting your data. This means stripping away access to your most sensitive assets. Locating and classifying your data enables your teams to work towards Least Access. Least Access enforces data protection from the inside out: you start with the data and work outwards to determine who and what can access it, then strip that access down to only those that need it.

The next step for effective data loss prevention is to work with operations managers to create controls for reducing data risk. Data controls may be simple at the beginning of a DLP initiative, targeting the most common risky behaviors while generating support from operations. As the data loss prevention program matures, organizations can develop more granular, fine-tuned controls to mitigate specific risks to protect data over time.


Once you have visibility into your data and enforce Least Access, you must continuously monitor data activity to detect anomalous access. Not all data movement represents data loss. However, many actions can increase the risk of data loss, so organizations should monitor all data movement. Periodic or sporadic auditing doesn’t cut it when Non-Person Identities are accessing your data for seconds at a time, several times a day. Audits include ensuring that basic platform settings like encryption and logging are enabled, and understanding who or what has access to your data. Auditing starts with defining a secure baseline so that you have a point of comparison. Once you lock that in, you can effectively monitor for deviations from that baseline. An example might be detecting that an Internet-connected VM in Dev, which has a vulnerability on it, has access to your most sensitive data in Prod. Basically, you need to understand when data is at risk.
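To make the idea of monitoring against a secure baseline concrete, here is a minimal sketch. It assumes the baseline is simply a set of known-good (identity, data store, action) combinations and flags any observed event outside that set. The AccessEvent type, the field names, and the sample identities are hypothetical; a real monitoring pipeline would read these events from the cloud provider’s audit logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    identity: str    # person or non-person identity (hypothetical naming)
    data_store: str  # where the data lives
    action: str      # e.g. "read", "write", "delete"


def find_deviations(baseline, events):
    """Return the events whose (identity, data_store, action) is not in the baseline."""
    return [
        event
        for event in events
        if (event.identity, event.data_store, event.action) not in baseline
    ]


# Known-good access, captured once Least Access is in place.
baseline = {("svc-billing", "prod-customers", "read")}

observed = [
    AccessEvent("svc-billing", "prod-customers", "read"),
    AccessEvent("dev-vm-7", "prod-customers", "read"),  # a Dev VM reaching Prod data
]

for event in find_deviations(baseline, observed):
    print(f"Deviation from baseline: {event}")
```

An exact-match baseline is the simplest possible choice. It is easy to reason about but noisy if legitimate access patterns change often, so a production system would likely add time windows or frequency thresholds.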

A critical component of protecting your data is leveraging automation and organized workflows. The scale and speed of the cloud are unmanageable without automating the process of detecting concerns and notifying the right team, at the right time, in an organized manner. Automation also comes into play for remediation efforts. Some security solutions include pre-set remediation and prevention bots to pick up where people left off.

Data Loss Prevention Strategy Best Practices

Once your team and business have decided they’re ready to focus on data loss prevention, a helpful first step is looking into data loss prevention policies. A data loss prevention policy is a basic data rule within an effective data loss prevention (DLP) tool. These rules dictate how your identities, whether person or non-person, can and should access, share, and use data in your cloud environment. The relationship between data and identity is an important one to consider in building a strategy for DLP in the cloud. You may be familiar with least privilege, which takes identity as the starting point and ensures every identity holds only the minimum permissions needed to do its job and nothing else. Flipping that on its head, least access is a policy that takes data as the starting point.

Get to and maintain Least Access

Least access is the simplest and most effective policy for protecting data: it restricts access to only the identities that need it. Establishing the least access policy on the data itself gives you a secure baseline of how, when, and what accesses the data under normal conditions. This secure baseline can then act as a ‘tripwire’ around your data, so alarm bells will go off when there is anomalous access or behavior.
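One way to work towards least access is to start from each data store and compare who is granted access with who has actually been observed using it. The sketch below does that with plain dictionaries. Treating unused access as a candidate for removal is an assumption of this example, and the function and store names are hypothetical; in practice each candidate would be reviewed before anything is revoked.

```python
def excess_grants(
    granted: dict[str, set[str]],
    observed: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Per data store, return identities that hold access but were never seen using it."""
    excess = {}
    for store, identities in granted.items():
        unused = identities - observed.get(store, set())
        if unused:
            excess[store] = unused
    return excess


# Hypothetical inputs: grants from access policies, usage from activity logs.
granted = {
    "prod-customers": {"svc-billing", "dev-vm-7"},
    "logs": {"svc-audit"},
}
observed = {
    "prod-customers": {"svc-billing"},
    "logs": {"svc-audit"},
}

print(excess_grants(granted, observed))
```

Starting the loop from the data stores rather than from the identities mirrors the data-first direction of least access described above.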

Establishing the least access policy is just the beginning. Your environment will also need continuous monitoring capabilities, such as enabled logging and activity data from secret stores, to actually protect your data. This monitoring ability allows you to detect when there is a compromise so your teams, or automation, can remediate it.

Automated vulnerability scanning

Automatically scan cloud environments for sensitive data. Uncover true impact and severity using a continuous picture of the platform, identity, and data information about the host. In the cloud, vulnerabilities proliferate at a high rate, with some teams buried in a backlog of thousands of alerts. In a cloud environment, where the perimeter is redefined and a data-centric security approach is preferable, a vulnerability needs context before you can understand its potential impact. Machine learning can help automate vulnerability scanning to uncover true impact.
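The point about context can be sketched as a simple ordering rule: a finding on a host that can reach sensitive data outranks a higher-severity finding on an isolated host. The Finding fields and the ordering below are assumptions chosen for illustration, not a standard scoring model, and they do not involve machine learning.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    host: str
    severity: float               # e.g. a CVSS-style base score from 0 to 10
    internet_exposed: bool        # platform context
    reaches_sensitive_data: bool  # identity and data context


def prioritize(findings):
    """Order findings by data reach first, then exposure, then raw severity."""
    return sorted(
        findings,
        key=lambda f: (f.reaches_sensitive_data, f.internet_exposed, f.severity),
        reverse=True,
    )


backlog = [
    Finding("isolated-batch", 9.8, internet_exposed=False, reaches_sensitive_data=False),
    Finding("dev-vm-7", 6.5, internet_exposed=True, reaches_sensitive_data=True),
    Finding("internal-api", 7.0, internet_exposed=False, reaches_sensitive_data=True),
]

for finding in prioritize(backlog):
    print(finding.host, finding.severity)
```

With this ordering, the exposed host that can reach sensitive data is listed first even though its raw severity is the lowest of the three.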

Skill up DevOps with security

Once an organization understands the circumstances under which data moves, training can often mitigate the risk of accidental data loss by insiders. Employees often don’t recognize that their actions can result in data loss, and they can correct mistakes once educated. To make employees aware of the risky behaviors that lead to data loss, consider educating your team on security.

Defend against insider threat

Training your employees can take you only so far when it comes to mitigating the risks of insider threats. If someone is planning on exposing your data, the best response is strong automation of alerts and custom actions to prevent the data from leaking outside the organization.

Total Cloud Security

Striving for the least access environment is just an entryway into a strong data security program in the cloud. The data security picture is far larger than just least access. Data security begins with data inventory and data classification and tagging, so you know where your data is, what it is, and then who or what is accessing it.

Broadening the view, data security is just one of the four major pillars holding up a strong cloud security foundation. Achieving total public cloud security hinges on all four: Identity, Data, Platform, and Workload. The pillars do not work in isolation. Picture this scenario: you have a workload with a vulnerability, and that workload has a non-person identity on it, meaning that identity becomes compromised as soon as an attacker finds an entryway into the workload. To make matters even worse, the non-person identity has access to customer PII through a toxic trust chain. A malicious attacker can detect this after doing some recon, and now you’ve got a major breach on your hands.

Respectively, CIEM (identity), Cloud Data Loss Prevention (data), CSPM (platform), and CWPP (workload) have their individual use cases, but complete visibility and context come from one integrated platform bolstering each security pillar.

To learn more about total cloud security, explore our platform.