Data is often called “the new oil,” but from a security perspective it behaves more like toxic waste: the more you collect and store, the more you have to lose. Data Attack Surface Reduction (ASR) is about collecting less, storing less, and exposing less data by design.

It challenges the “collect everything, analyze later” mindset and instead asks:

“Do we actually need this data, now or ever?”

By minimizing data collection, limiting retention, and avoiding unnecessary sharing or transformation, we shrink the blast radius of a breach, and in some cases avoid the breach entirely: data that was never stored cannot leak.


1. Collect Less Data by Default

  • Adopt a policy of data minimalism: collect only what’s absolutely necessary for the product or service to function.
  • Don’t gather speculative or “future-use” data — that’s security debt with no payoff.
  • Challenge every data field: Do we really need birth dates? Full names? Exact locations?
  • Example: Instead of collecting a full date of birth, ask only whether the user is 18 or older and store the answer as a yes/no flag.

Why this matters:
Less data collected = less data to protect = less damage if compromised.
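One way to make minimal collection the default rather than a guideline is to enforce an allowlist where data enters the system. The Python sketch below assumes a hypothetical signup form whose only permitted fields are `email` and `is_over_18` (the yes/no age flag from the example above); `ALLOWED_FIELDS`, `SignupRecord`, and `accept_signup` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Hypothetical allowlist: the only fields this signup form may keep.
ALLOWED_FIELDS = {"email", "is_over_18"}


@dataclass(frozen=True)
class SignupRecord:
    email: str
    is_over_18: bool  # yes/no flag stored instead of a full date of birth


def accept_signup(form: dict) -> SignupRecord:
    """Build a record from an incoming form, refusing any field not on the allowlist."""
    unexpected = set(form) - ALLOWED_FIELDS
    if unexpected:
        # Fail loudly: a new field should require a deliberate schema change.
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    missing = ALLOWED_FIELDS - set(form)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(form["is_over_18"], bool):
        raise TypeError("is_over_18 must be a yes/no (boolean) value")
    return SignupRecord(email=form["email"], is_over_18=form["is_over_18"])
```

Rejecting unknown fields, rather than silently dropping them, is a deliberate choice in this sketch: adding a field then requires a schema change, which is a natural moment to ask whether the data is needed at all.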


2. Reduce Data Retention

  • Delete data as soon as its utility ends.
  • Apply time-based purging for:
    • Logs
    • Backups
    • Inactive user data
  • Avoid hoarding data “just in case.”
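Time-based purging works best as a scheduled job rather than an occasional manual cleanup. This is a minimal sketch, assuming a hypothetical SQLite table `records(category, ts)` where `ts` is a Unix timestamp (creation time for log entries, last activity for user accounts), and assuming retention windows of 30 days for logs and 365 days for inactive users; `RETENTION` and `purge_expired` are illustrative names, and the windows should come from your own policy.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention policy: how long each category may be kept.
RETENTION = {
    "log": timedelta(days=30),
    "inactive_user": timedelta(days=365),
}


def purge_expired(conn: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """Delete rows older than their category's retention window; return the number deleted."""
    if now is None:
        now = datetime.now(timezone.utc)
    deleted = 0
    with conn:  # commits on success, rolls back if any DELETE fails
        for category, window in RETENTION.items():
            cutoff = (now - window).timestamp()  # Unix seconds, matching the `ts` column
            cur = conn.execute(
                "DELETE FROM records WHERE category = ? AND ts < ?",
                (category, cutoff),
            )
            deleted += cur.rowcount
    return deleted
```

Backups usually live outside the primary database, so they need their own expiry mechanism; object stores typically provide one natively (for example, expiration actions in Amazon S3 Lifecycle rules), which avoids custom code for that part entirely.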

Real-world example:
Publicly exposed S3 buckets full of old database backups are a common source of breaches. Stale backups like these often serve no remaining business purpose; they are ticking time bombs.


3. Limit Data Propagation and Transformation

  • The more you move or copy data, the more risk you create:
    • Copying prod data to staging/dev environments
    • Uncontrolled ETL pipelines
    • Syncing PII into multiple SaaS tools

Best practices:

  • Run analytics in place where possible, rather than exporting copies.
  • Mask/anonymize any dev/test data.
  • Remove or restrict third-party integrations that collect user data.
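Masking is easiest to get right when the copy job builds each staging row from an explicit list of fields, so anything not listed (a newly added SSN column, say) is dropped by default. The sketch below assumes a hypothetical production user row with `user_id`, `email`, `name`, `postcode`, and `plan` fields. It uses a keyed HMAC-SHA256 pseudonym so that joins across masked tables still line up without exposing real identifiers; `MASKING_KEY`, `pseudonym`, and `mask_for_staging` are illustrative names, and the key is a placeholder that should be held outside the staging environment.

```python
import hashlib
import hmac

# Placeholder key (hypothetical); keep the real one out of the staging environment.
MASKING_KEY = b"replace-with-a-secret-stored-outside-staging"


def pseudonym(value: str, prefix: str) -> str:
    """Deterministic keyed pseudonym: equal inputs map to equal outputs, so joins still work."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_for_staging(row: dict) -> dict:
    """Build a staging-safe copy of a production user row; unlisted fields are dropped."""
    return {
        "user_id": pseudonym(str(row["user_id"]), "user"),
        "email": pseudonym(row["email"], "email") + "@example.invalid",
        "name": "Test User",                     # free-text PII replaced by a fixed placeholder
        "postcode": row["postcode"][:3] + "**",  # keep only a coarse location prefix
        "plan": row["plan"],                     # non-sensitive business field kept as-is
    }
```

Because the pseudonyms are deterministic, anyone holding `MASKING_KEY` can link them back to real records, so masked data of this kind is pseudonymous rather than truly anonymous and still deserves access control.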

Incident reference:
The 2022 Meta Pixel hospital case: healthcare providers that embedded Meta’s tracking pixel on their websites, including appointment-booking pages, inadvertently sent appointment details and other health information to Facebook.


4. Don’t Over-Engineer Analytics

  • Massive data lakes, Kafka pipelines, and real-time dashboards often come with:
    • Overly broad access controls
    • Insecure intermediate stores
    • Excessive logging of sensitive fields

Data ASR Principle:

Just because you can analyze it doesn’t mean you should store it.

Focus on:

  • Aggregates over raw data
  • Sampling over exhaustive capture
  • Simpler pipelines with fewer moving parts
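The aggregation and sampling points above can be as simple as the sketch below, which assumes hypothetical page-view events shaped like `{"user_id": ..., "page": ...}`. Aggregation keeps only per-page counts and suppresses buckets smaller than an assumed threshold of 10, since very small counts can single out individual users; sampling keeps a seeded random fraction of events and strips each one down to non-identifying fields. Function names and defaults are illustrative.

```python
import random
from collections import Counter


def aggregate_page_views(events, min_count=10):
    """Reduce raw events to per-page counts, dropping buckets too small to be safe to share."""
    counts = Counter(event["page"] for event in events)
    # Very small counts can point at individual users, so suppress them.
    return {page: n for page, n in counts.items() if n >= min_count}


def sample_events(events, rate, keep_fields=("page",), seed=0):
    """Keep roughly `rate` of events (0.01 = 1%), projected down to non-identifying fields."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        {field: event[field] for field in keep_fields}
        for event in events
        if rng.random() < rate
    ]
```

Only the aggregate and the trimmed sample need to be stored; the raw events can be discarded once both are computed.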

5. Real-World Breach Patterns

  • Public cloud storage exposure:
    • Forgotten S3 buckets
    • Azure Blob containers with “anonymous read”
  • Log leaks:
    • API tokens, passwords, or PII included in logs and shipped to centralized logging solutions
  • Stale environments:
    • Old production data retained in test environments without adequate access control
  • Third-party exfiltration:
    • Overly permissive tags, SDKs, or tracking scripts sending data externally
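Log leaks are usually cheaper to prevent at the source than to clean up after logs have been shipped. The sketch below is a Python `logging.Filter` that scrubs a few example patterns (password assignments, bearer tokens, email addresses) from each record's final message before a handler emits it. The patterns are assumptions for illustration and will not catch every secret format; `RedactingFilter` and `REDACTIONS` are hypothetical names.

```python
import logging
import re

# Hypothetical patterns; extend them for the secret and PII formats your systems emit.
REDACTIONS = [
    (re.compile(r"(?i)(password|passwd|pwd)=\S+"), r"\1=[REDACTED]"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
]


class RedactingFilter(logging.Filter):
    """Scrub secrets and PII from log records before they leave the process."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first so nothing escapes
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, ()  # store the sanitized text
        return True  # keep the record, just without the sensitive parts
```

Attaching the filter to the handler (`handler.addFilter(RedactingFilter())`) rather than to a logger matters: in Python's `logging` module, a logger's filters are not applied to records propagated from its child loggers, whereas a handler's filters see every record that handler emits. Exception tracebacks and values passed via `extra` are not scrubbed by this sketch.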

6. Guidelines for Practicing Data ASR

  • Minimal Collection: Ask “why do we need this?” before collecting any new data
  • Retention Limits: Automate expiry and deletion policies
  • Access Control: Least privilege on data systems
  • Anonymization: Use tokenization or hashing where PII isn’t needed
  • Tag & Tracker Review: Audit and limit third-party scripts collecting user info
  • Incident Simulation: Run tabletop scenarios assuming your logs, exports, or staging DBs are leaked
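For the anonymization guideline, tokenization replaces a sensitive value with a random token and keeps the real value only in a separately protected vault, so downstream systems can correlate records without ever holding the PII. The in-memory `TokenVault` below is a hypothetical stand-in for such a service, with an assumed allowlist of roles (`DETOKENIZE_ROLES`) permitted to recover original values. If hashing is used instead, note that plain hashes of low-entropy values such as phone or card numbers can be reversed by enumerating candidates, so a keyed hash (HMAC) with a secret key is the safer choice.

```python
import secrets

# Hypothetical allowlist of caller roles permitted to recover original values.
DETOKENIZE_ROLES = {"billing"}


class TokenVault:
    """In-memory stand-in for a tokenization service; a real vault is a separate, locked-down system."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for `value`, reusing the existing one if already seen."""
        token = self._value_to_token.get(value)
        if token is None:
            token = "tok_" + secrets.token_urlsafe(16)  # random, so it reveals nothing about the value
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return token

    def detokenize(self, token: str, caller_role: str) -> str:
        """Recover the original value, but only for explicitly allowed roles (least privilege)."""
        if caller_role not in DETOKENIZE_ROLES:
            raise PermissionError(f"role {caller_role!r} may not detokenize")
        return self._token_to_value[token]
```

Reusing the same token for a repeated value keeps joins and deduplication working downstream, while the mapping back to real data stays inside the vault.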

7. Shift the Culture: Less Data Is a Win

  • Regulatory compliance becomes easier (GDPR, DPDP, HIPAA)
  • Breach costs go down
  • User trust goes up
  • Engineers get clarity on what data actually matters

“Data is radioactive. Store only what you can shield. And only for as long as you need it.”