🔐 Data Security
Protecting data throughout its lifecycle — classification, encryption at rest/in transit/in use, data loss prevention, tokenization, backup strategies, and data governance.
Overview
Data security focuses on protecting digital information from unauthorized access, corruption, or theft throughout its entire lifecycle. It encompasses the policies, procedures, and technologies used to ensure data confidentiality, integrity, and availability — the CIA triad. With increasing regulations (GDPR, PCI-DSS, HIPAA) and the expanding attack surface, robust data security is critical for every organization.
Key Concepts
Backup & Recovery
3-2-1 Rule: 3 copies of data, on 2 different media types, with 1 offsite/cloud copy. RPO (Recovery Point Objective): Maximum acceptable data loss — determines backup frequency. RTO (Recovery Time Objective): Maximum acceptable downtime — determines recovery strategy. Immutable backups protect against ransomware (WORM storage, air-gapped backups). Regular restoration testing is critical — untested backups are not backups.
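The 3-2-1 rule is mechanical enough to check in code. A minimal Python sketch (the `BackupCopy` shape and field names are invented for illustration, not any real tool's API):

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    media: str      # e.g. "disk", "tape", "s3"
    offsite: bool   # stored outside the primary site?

def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: >=3 copies, >=2 media types, >=1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

copies = [
    BackupCopy("disk", offsite=False),  # local primary copy
    BackupCopy("tape", offsite=False),  # second media type, on site
    BackupCopy("s3", offsite=True),     # offsite/cloud copy
]
print(satisfies_3_2_1(copies))  # True
```

A check like this belongs in backup monitoring, alongside the restoration tests the section calls out.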
Data Classification
Categorizing data by sensitivity and business impact. Common levels: Public (no restrictions), Internal (business use only), Confidential (restricted access — PII, financial data), and Restricted/Secret (highest protection — trade secrets, health records, encryption keys). Classification drives encryption requirements, access controls, retention policies, and handling procedures. Automated tools like Microsoft Purview, Varonis, and BigID help discover and classify data at scale.
Data Governance
Policies and processes for managing data as a strategic asset. Data lifecycle: Creation → Storage → Use → Sharing → Archival → Destruction. Key elements: Data ownership assignment, retention and disposal schedules, privacy impact assessments, data lineage tracking, consent management (GDPR), and cross-border transfer rules (SCCs, BCRs). Tools: Collibra, Alation, Microsoft Purview Governance.
Data Loss Prevention (DLP)
Preventing unauthorized data exfiltration across three vectors: Endpoint DLP: Monitor clipboard, USB, print, and screen capture on endpoints — Microsoft Defender for Endpoint, Symantec DLP. Network DLP: Inspect traffic leaving the network for sensitive patterns (SSN, credit cards, source code) — Palo Alto, Zscaler. Cloud DLP: Monitor SaaS and cloud storage — Microsoft Purview DLP, Google Cloud DLP, Netskope. DLP policies use regex patterns, machine learning classifiers, and fingerprinting to detect sensitive data.
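The pattern-matching layer described above can be sketched in a few lines. A toy Python scanner combining regex with a Luhn checksum to suppress credit-card false positives (the patterns are illustrative and far simpler than a real DLP engine's):

```python
import re

# Illustrative patterns only -- production DLP engines layer regex with
# ML classifiers and document fingerprinting.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def luhn_valid(number: str) -> bool:
    """Luhn checksum: cuts false positives on credit-card matches."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan(text: str) -> list[str]:
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            if name == "credit_card" and not luhn_valid(match):
                continue  # checksum failed -- likely not a real card number
            findings.append(name)
    return findings

print(scan("SSN 123-45-6789 and card 4111 1111 1111 1111"))
# -> ['ssn', 'credit_card']
```

The checksum step matters: a 16-digit order number would match the regex but fail Luhn, which is why real DLP policies combine pattern matching with validators.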
Encryption
At Rest: AES-256 encryption for databases, file systems, and backups using KMS (AWS KMS, Azure Key Vault, Cloud KMS). Full-disk encryption (BitLocker, LUKS). In Transit: TLS 1.3 for all network communication, certificate pinning for APIs, mTLS for service-to-service. In Use: Confidential computing with hardware enclaves (Intel SGX, AMD SEV), homomorphic encryption for processing encrypted data. Key Management: HSM-backed keys, automatic rotation, separation of duties, and key escrow procedures.
Tokenization & Masking
Tokenization: Replacing sensitive data with randomly generated tokens that have no mathematical relationship to the original values — the token-to-value mapping lives only in a secure vault. Used for PCI-DSS compliance — tokenize credit card numbers so they never touch application code. Data Masking: Replacing real data with realistic but fake data for non-production environments. Static masking for dev/test databases, dynamic masking for real-time query results. Tools: Voltage, Protegrity, Delphix.
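A toy Python token vault illustrates the core property: tokens are random, and the mapping back to the original value exists only inside the vault (the class and naming are invented for illustration, not a real product's API):

```python
import secrets

class TokenVault:
    """Toy token vault: tokens carry no information about the original
    value -- the mapping lives only inside the vault."""

    def __init__(self):
        self._forward = {}  # value -> token
        self._reverse = {}  # token -> value

    def tokenize(self, value: str) -> str:
        if value in self._forward:       # idempotent per value
            return self._forward[value]
        token = "tok_" + secrets.token_hex(8)   # pure randomness
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._reverse[token]

vault = TokenVault()
pan = "4111111111111111"
token = vault.tokenize(pan)
# Application code stores and logs only the token; the PAN never
# leaves the vault, which is what shrinks PCI-DSS scope.
assert vault.detokenize(token) == pan
```

In a real deployment the vault is a hardened, separately audited service; everything outside it handles only tokens.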
Three States of Data
Data exists in three states — at rest, in transit, and in use. Each state requires different security controls and encryption strategies. A comprehensive data security program must protect data across all three states.
💾 Data at Rest
Data stored on disk, databases, backup media, or cloud storage — not actively being transmitted or processed.
🔒 Encryption Methods
- AES-256: Industry standard symmetric encryption for databases, files, and volumes
- Full Disk Encryption (FDE): BitLocker (Windows), FileVault (macOS), LUKS (Linux)
- Database Encryption: TDE (Transparent Data Encryption) in SQL Server, Oracle, PostgreSQL
- File-Level Encryption: Per-file encryption for selective protection — VeraCrypt, 7-Zip AES
- Cloud Storage: SSE-S3, SSE-KMS, SSE-C (AWS); Azure Storage Service Encryption; Google CMEK/CSEK
🛡️ Best Practices
- Key Management: Use HSM-backed KMS (AWS KMS, Azure Key Vault, GCP Cloud KMS)
- Key Rotation: Automatic rotation every 90-365 days; immediate rotation on compromise
- Separation of Duties: Key custodians ≠ data administrators ≠ security team
- Backup Encryption: Encrypt ALL backups — immutable/WORM storage for ransomware protection
- Secure Deletion: Cryptographic erasure, DoD 5220.22-M wipe, physical destruction for decommissioned media
🔄 Data in Transit
Data actively moving between systems — over the network, internet, APIs, or between services. Most vulnerable to interception and man-in-the-middle attacks.
🔒 Protocols & Encryption
- TLS 1.3: Latest standard — faster handshake, forward secrecy by default, dropped legacy cryptography (RC4, SHA-1, CBC-mode cipher suites)
- mTLS (Mutual TLS): Both client and server authenticate — essential for service-to-service (microservices, service mesh)
- IPsec VPN: Network-layer encryption for site-to-site and remote access VPNs (IKEv2, ESP)
- SSH/SFTP: Secure remote access and file transfer — SSH keys over passwords, certificate-based auth
- HTTPS Everywhere: Enforce HTTPS via HSTS headers, certificate pinning for mobile apps, TLS termination at load balancer
- API Security: OAuth 2.0 + JWT tokens, API gateway TLS enforcement, certificate-based API authentication
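The OAuth 2.0 + JWT pattern above rests on signed tokens. A stdlib-only Python sketch of HS256 JWT signing and verification (for illustration only; production code should use a vetted JWT library and a managed secret store):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." +
        b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels on the signature.
    return hmac.compare_digest(b64url(expected), sig)

secret = b"demo-secret"  # illustrative only -- fetch from a secret manager
token = sign_jwt({"sub": "svc-reporting", "scope": "read:data"}, secret)
print(verify_jwt(token, secret))       # True
print(verify_jwt(token + "x", secret)) # False: tampered signature
```

Any change to header, payload, or signature breaks verification, which is what lets an API gateway trust claims without a database lookup.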
🛡️ Best Practices
- Certificate Management: Automated renewal (Let's Encrypt, ACME), certificate inventory, expiration monitoring
- Perfect Forward Secrecy (PFS): ECDHE key exchange — compromised private key doesn't decrypt past traffic
- Disable Legacy Protocols: No SSLv3, TLS 1.0/1.1; enforce TLS 1.2+ minimum
- Network Segmentation: Encrypt east-west traffic between VLANs/segments, not just north-south
- Email Encryption: S/MIME or PGP for sensitive emails; TLS for SMTP (STARTTLS enforcement)
- DNS Security: DNSSEC, DNS-over-HTTPS (DoH), DNS-over-TLS (DoT)
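The "disable legacy protocols" guidance above maps directly to a few lines of configuration. A Python sketch using the stdlib `ssl` module to enforce TLS 1.2+ with certificate validation on the client side:

```python
import ssl

# Build a client context that refuses SSLv3 and TLS 1.0/1.1 and
# requires a valid, hostname-matching server certificate.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # enforce TLS 1.2+ floor
context.check_hostname = True                     # default, shown explicitly
context.verify_mode = ssl.CERT_REQUIRED           # reject unverified certs

print(context.minimum_version.name)  # TLSv1_2
```

Pass a context like this to `http.client`, `urllib`, or a socket wrapper; raising `minimum_version` to `TLSv1_3` tightens the floor further where peers support it.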
⚡ Data in Use
Data actively being processed in memory (RAM), CPU caches, or registers. The hardest state to protect — traditionally data must be decrypted to be processed. Confidential computing is changing this.
🔒 Protection Technologies
- Confidential Computing: Hardware-based Trusted Execution Environments (TEEs) — process encrypted data in secure enclaves
- Intel SGX: Software Guard Extensions — create encrypted memory enclaves, even OS/hypervisor cannot access
- AMD SEV/SEV-SNP: Secure Encrypted Virtualization — encrypts entire VM memory with per-VM keys
- ARM TrustZone: Hardware isolation between secure and non-secure worlds — used in mobile devices
- Homomorphic Encryption (HE): Compute on encrypted data without decrypting — fully homomorphic (slow but maturing), partially homomorphic (practical for specific operations)
- Secure Multi-Party Computation (MPC): Multiple parties jointly compute a function without revealing their individual inputs
🛡️ Best Practices
- Memory Protection: ASLR (Address Space Layout Randomization), DEP/NX (Data Execution Prevention)
- Process Isolation: Containers, sandboxing, mandatory access controls (SELinux, AppArmor)
- Credential Guard: Windows Credential Guard — isolates LSASS in a virtualized container
- Memory Encryption: Total Memory Encryption (TME/MKTME) — encrypt all system memory transparently
- Minimize Exposure: Decrypt data only when absolutely necessary, clear sensitive data from memory immediately after use
- Cloud Confidential VMs: Azure Confidential VMs, GCP Confidential VMs, AWS Nitro Enclaves
Data Security Architecture
Data Security Lifecycle
Layered defense from discovery to recovery — know your data, protect it, prevent loss, detect threats, and recover
Data Breach Response Lifecycle
A structured breach response minimizes damage, meets regulatory obligations, and preserves evidence for investigation. Every organization should have a tested Incident Response Plan (IRP) before a breach occurs.
NIST SP 800-61 Incident Response Framework
Preparation → Detection → Containment → Eradication → Recovery → Lessons Learned. Regular tabletop exercises ensure the team can execute under pressure.
Breach Notification Requirements
Different regulations impose different notification timelines and requirements. Missing a notification deadline can result in additional fines on top of the breach penalty itself.
| Regulation | ⏱️ Timeline | 📋 Who to Notify | 💰 Max Penalty |
|---|---|---|---|
| GDPR (EU) | 72 hours to DPA | Data Protection Authority + affected individuals (if high risk) | €20M or 4% global revenue |
| SEC Rule (US) | 4 business days | SEC via 8-K filing + investors | Enforcement actions + lawsuits |
| HIPAA (US Healthcare) | 60 days to HHS | HHS + affected individuals + media (if >500 records) | $1.5M per violation category |
| PCI-DSS | Immediately | Payment brands (Visa, MC) + acquiring bank | $100K–$500K/month + loss of processing |
| US State Laws | 30–90 days (varies) | State AG + affected residents | $750/consumer (CCPA) + AG penalties |
| FFIEC / Banking | ASAP / 36 hours (OCC) | Primary regulator (OCC/FDIC/Fed) + customers | Consent orders + enforcement actions |
Notable Data Breaches — Lessons Learned
Colonial Pipeline (2021)
Critical infrastructure shutdown. Ransomware (DarkSide) via compromised VPN password (no MFA). Led to US East Coast fuel shortage. Paid $4.4M ransom. Root cause: Legacy VPN without MFA, flat network, no segmentation between IT and OT. Lesson: MFA everywhere (especially VPN), network segmentation IT/OT, immutable backups, incident response drills for critical infrastructure.
Equifax (2017)
147M records. Unpatched Apache Struts vulnerability (CVE-2017-5638) exploited 2 months after patch was available. Attackers exfiltrated data for 76 days undetected. Root cause: Failed patch management + expired SSL certificate on monitoring tool (allowed exfiltration to go unnoticed). Lesson: Patch critical vulns within 48 hours, maintain certificate inventory, segment sensitive databases, monitor outbound traffic.
IBM Cost of a Breach Report
2024 average cost: $4.88M per breach. Key findings: Average time to identify + contain: 258 days. Breaches involving stolen credentials: 292 days. Cost reducers: AI & automation saved $2.2M, DevSecOps saved $1.7M, incident response planning saved $1.5M. Cost multipliers: Cloud migration breaches +$750K, skills shortage +$1.8M, compliance failures +$1.6M. Lesson: Invest in detection speed, automate response, and train incident response teams.
MOVEit (2023)
2,500+ organizations. Zero-day SQL injection in MOVEit Transfer file-sharing software exploited by Cl0p ransomware group. Mass exfiltration before patches available. Root cause: SQLi vulnerability in internet-facing file transfer appliance. Lesson: Minimize internet-facing attack surface, WAF for file transfer apps, monitor file transfer logs for bulk downloads, incident response readiness.
SolarWinds (2020)
18,000+ organizations. Nation-state supply chain attack — malicious code injected into Orion software build process (SUNBURST backdoor). Went undetected for 9+ months. Root cause: Compromised CI/CD pipeline, weak build security. Lesson: Secure build pipelines, implement SBOM, code signing, monitor for anomalous DNS/network behavior, Zero Trust architecture.
T-Mobile (2021–2023)
76M+ records across multiple breaches. Repeated breaches via API exploitation, credential stuffing, and insider threats. Root cause: Insufficient API security, inadequate access controls, lack of monitoring. Lesson: API security testing, rate limiting, UEBA for insider threat detection, breach is not a one-time event — continuous improvement is essential.
🎯 Data Plane Attack Tree — 7-Stage Threat Analysis
Data is the ultimate target of most cyber attacks. Attackers may compromise identities, APIs, or infrastructure — but their final goal is unauthorized access to sensitive data. This attack tree maps 7 stages where data can be exposed, accessed, manipulated, or exfiltrated.
| Stage | Attack Surface | Key Risks |
|---|---|---|
| 1. Data Creation | Where data is born | Sensitive data stored without classification, unstructured data proliferation, shadow datasets, sensitive logs, AI/analytics datasets with PII, improper data tagging, test data with production PII, over-collection beyond business need |
| 2. Data Discovery & Exposure | Where data is found | Exposed storage buckets/databases, shadow data stores, unindexed assets, catalog misconfiguration, metadata leakage, search/index system exposure without restrictions, backup and snapshot discovery |
| 3. Data Storage | Where data lives | Public object storage exposure, misconfigured database access policies, unencrypted storage systems, backup storage exposure, snapshot exposure, misconfigured file permissions, data lake misconfig, cross-account storage exposure. Encryption risks: KMS misconfig, improper key rotation, shared/hardcoded encryption keys |
| 4. Data Access Controls | Who can touch data | Over-privileged database roles, weak ACLs, shared database credentials, token or API key misuse, service account abuse, broken object-level authorization, privilege escalation through database roles, lack of row/column-level security |
| 5. Data Processing | Where data is transformed | Compromised ETL pipelines, malicious transformations, data pipeline injection, unauthorized analytics queries, compromised processing clusters, SQL query manipulation, Spark/big data cluster exploitation. Inference & Query Abuse: sensitive data inference via queries, aggregation-based leakage, model-driven data exposure |
| 6. Data Sharing & Distribution | Where data travels | API data overexposure, partner integration misuse, unrestricted data exports, public data sharing links, data replication misconfiguration, cross-region data exposure, third-party leakage (analytics chains), webhook/event-driven leakage |
| 7. Impact & Exfiltration | Where data leaves | Mass data exfiltration, sensitive dataset extraction, customer data theft, intellectual property theft, regulated data exposure, data manipulation or corruption, data destruction/ransomware, stealth exfiltration over trusted channels |
🛡️ Defense Controls & Architecture
🛡️ Access Control
Fine-grained RBAC/ABAC, row/column-level security, least privilege enforcement, and data monitoring/detection with access logging and anomaly detection.
📋 Data Governance
Classification, ownership assignment, and data minimization. Know what data you have, where it lives, and who owns it. Without governance, all other controls are guesswork.
🚫 Data Loss Prevention
DLP scanning and policy enforcement, tokenization/masking of sensitive fields, sensitive data detection across endpoints, network, and cloud. Controlled APIs with partner integration governance.
🔍 DSPM
Data Security Posture Management — sensitive data discovery, exposure risk identification, object storage scanning, and inventory mapping. Tools: Varonis, BigID, Normalyze, Sentra.
🔐 Encryption & Key Mgmt
Encryption at rest/transit, KMS configuration, key rotation, access control on keys, and separation of duties. Never share or hardcode encryption keys.
🚨 Incident Response
Breach detection, automated containment, forensic tracking, and backup protection. Treat every data plane anomaly as a potential exfiltration event until proven otherwise.
Walk me through the 7 stages of a Data Plane Attack Tree — how would you defend an organization's data at each stage?
The Data Plane Attack Tree maps 7 stages where data is vulnerable:
1) Data Creation
- Biggest risk is data born without classification
- Implement automated classification at creation — Microsoft Purview, BigID
- Enforce data minimization policies — don't collect what you don't need
- Scan for PII in test/dev environments
2) Data Discovery & Exposure
- Attackers search for exposed storage buckets, shadow databases, and misconfigured catalogs
- Defense: continuous DSPM scanning (Varonis, Normalyze), automated detection of public S3 buckets/Azure blobs, metadata access controls for data catalogs
3) Data Storage
- Encrypt everything — AES-256 at rest, KMS-managed keys with automatic rotation
- Never hardcode encryption keys
- Audit storage permissions weekly
- Cross-account storage access should be exception-based and logged
4) Data Access Controls
- Implement least privilege at the database level — row/column-level security, not just table-level
- Eliminate shared database credentials
- Service accounts get minimum permissions with automated rotation
- Monitor for privilege escalation through database roles
5) Data Processing
- Secure ETL pipelines with integrity checks
- Validate data transformations and detect injection in pipeline parameters
- Monitor analytics queries for inference attacks — queries that look innocent individually but together reconstruct sensitive data
6) Data Sharing & Distribution
- API response filtering — never return more data than needed
- Audit partner integrations quarterly
- Disable unrestricted data exports
- Monitor webhook payloads for sensitive data leakage
- Implement data residency controls for cross-region flows
7) Impact & Exfiltration
- DLP at all egress points — network, endpoint, cloud
- Monitor for bulk data transfers, unusual download patterns, and exfiltration over trusted channels (DNS, HTTPS to legitimate-looking domains)
- Immutable backups protect against ransomware/destruction
KEY TAKEAWAY: Protecting the data plane requires strong data governance, fine-grained access controls, secure data pipelines, continuous monitoring, and DLP at every boundary.
Interview Preparation
How would you implement a data classification program?
1) Define classification levels (Public, Internal, Confidential, Restricted) with clear criteria.
2) Assign data owners for each data domain.
3) Deploy automated discovery tools (Microsoft Purview, Varonis) to scan repositories.
4) Label data with metadata tags (sensitivity, retention, jurisdiction).
5) Map classification to security controls — encryption requirements, access levels, DLP policies.
6) Train employees on handling procedures for each level.
7) Audit regularly and measure: percentage of data classified, policy violations, and remediation time.
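The mapping from classification level to security controls is often expressed as a control matrix. A minimal Python sketch, with levels from the section above and invented control values:

```python
# Hypothetical control matrix: classification level -> required controls.
# Retention values and flags are illustrative, not regulatory guidance.
CONTROLS = {
    "Public":       {"encryption": False, "dlp": False, "retention_days": 365},
    "Internal":     {"encryption": True,  "dlp": False, "retention_days": 730},
    "Confidential": {"encryption": True,  "dlp": True,  "retention_days": 2555},
    "Restricted":   {"encryption": True,  "dlp": True,  "retention_days": 3650},
}

def controls_for(level: str) -> dict:
    """Look up handling requirements for a classified asset."""
    if level not in CONTROLS:
        raise ValueError(f"unknown classification level: {level}")
    return CONTROLS[level]

print(controls_for("Confidential")["dlp"])  # True
```

Encoding the matrix once and failing closed on unknown levels keeps encryption, DLP, and retention policies consistent across systems instead of re-decided per application.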
Explain the difference between tokenization and encryption.
Encryption transforms data using an algorithm and key — it's mathematically reversible with the correct key, and the ciphertext typically differs in format and length from the plaintext (format-preserving encryption is the exception). Tokenization replaces data with a random token that has NO mathematical relationship to the original — the mapping exists only in a secure token vault. Key difference: ciphertext can in principle be recovered by attacking the key or algorithm; a token carries no recoverable information about the original value. Tokenization is preferred for PCI-DSS scope reduction because tokenized data is not considered cardholder data, shrinking the compliance boundary.
Walk me through how you would respond to a data breach.
1) DETECT & ASSESS — Verify the alert is a true positive. Determine data types affected (PII, PHI, PCI, IP), volume of records, and whether exfiltration occurred. Activate the Incident Response Team (IRT).
2) CONTAIN — Isolate affected systems (network quarantine via EDR), revoke compromised credentials, block attacker IPs/domains, disable data exfiltration channels. Preserve forensic evidence — do NOT wipe systems yet.
3) INVESTIGATE — Conduct forensic analysis to determine root cause, attack vector, lateral movement, and full scope. Build a timeline. Engage external forensics firm if needed (required for PCI).
4) NOTIFY — Engage legal counsel immediately. Notify regulators within required timeframes (GDPR 72hrs, SEC 4 days). Prepare customer notification with clear description of what happened, what data was affected, and what you're doing about it. Notify cyber insurance carrier.
5) ERADICATE & RECOVER — Patch the exploited vulnerability, rebuild compromised systems from clean images, force credential resets, restore from verified clean backups.
6) LESSONS LEARNED — Post-incident review within 2 weeks. Update IR playbook, improve detection rules, conduct tabletop exercise simulating the attack scenario. Report metrics: time to detect, contain, and recover.