🔐 Data Security
Protecting data throughout its lifecycle — classification, encryption at rest/in transit/in use, data loss prevention, tokenization, backup strategies, and data governance.
Overview
Data security focuses on protecting digital information from unauthorized access, corruption, or theft throughout its entire lifecycle. It encompasses the policies, procedures, and technologies used to ensure data confidentiality, integrity, and availability — the CIA triad. With increasing regulations (GDPR, PCI-DSS, HIPAA) and the expanding attack surface, robust data security is critical for every organization.
Key Concepts
Backup & Recovery
3-2-1 Rule: 3 copies of data, on 2 different media types, with 1 offsite/cloud copy. RPO (Recovery Point Objective): Maximum acceptable data loss — determines backup frequency. RTO (Recovery Time Objective): Maximum acceptable downtime — determines recovery strategy. Immutable backups protect against ransomware (WORM storage, air-gapped backups). Regular restoration testing is critical — untested backups are not backups.
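The 3-2-1 rule is mechanical enough to check in code. A minimal Python sketch (the `BackupCopy` shape and field names are invented for illustration, not any real tool's API):

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    media: str      # e.g. "disk", "tape", "s3"
    offsite: bool   # stored outside the primary site?

def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: >=3 copies, >=2 media types, >=1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

copies = [
    BackupCopy("disk", offsite=False),  # local primary copy
    BackupCopy("tape", offsite=False),  # second media type, on site
    BackupCopy("s3", offsite=True),     # offsite/cloud copy
]
print(satisfies_3_2_1(copies))  # True
```

A check like this belongs in backup monitoring, alongside the restoration tests the section calls out.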
Data Classification
Categorizing data by sensitivity and business impact. Common levels: Public (no restrictions), Internal (business use only), Confidential (restricted access — PII, financial data), and Restricted/Secret (highest protection — trade secrets, health records, encryption keys). Classification drives encryption requirements, access controls, retention policies, and handling procedures. Automated tools like Microsoft Purview, Varonis, and BigID help discover and classify data at scale.
Data Governance
Policies and processes for managing data as a strategic asset. Data lifecycle: Creation → Storage → Use → Sharing → Archival → Destruction. Key elements: Data ownership assignment, retention and disposal schedules, privacy impact assessments, data lineage tracking, consent management (GDPR), and cross-border transfer rules (SCCs, BCRs). Tools: Collibra, Alation, Microsoft Purview Governance.
Data Loss Prevention (DLP)
Preventing unauthorized data exfiltration across three vectors: Endpoint DLP: Monitor clipboard, USB, print, and screen capture on endpoints — Microsoft Defender for Endpoint, Symantec DLP. Network DLP: Inspect traffic leaving the network for sensitive patterns (SSN, credit cards, source code) — Palo Alto, Zscaler. Cloud DLP: Monitor SaaS and cloud storage — Microsoft Purview DLP, Google Cloud DLP, Netskope. DLP policies use regex patterns, machine learning classifiers, and fingerprinting to detect sensitive data.
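The pattern-matching layer described above can be sketched in a few lines. A toy Python scanner combining regex with a Luhn checksum to suppress credit-card false positives (the patterns are illustrative and far simpler than a real DLP engine's):

```python
import re

# Illustrative patterns only -- production DLP engines layer regex with
# ML classifiers and document fingerprinting.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def luhn_valid(number: str) -> bool:
    """Luhn checksum: cuts false positives on credit-card matches."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan(text: str) -> list[str]:
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            if name == "credit_card" and not luhn_valid(match):
                continue  # checksum failed -- likely not a real card number
            findings.append(name)
    return findings

print(scan("SSN 123-45-6789 and card 4111 1111 1111 1111"))
# -> ['ssn', 'credit_card']
```

The checksum step matters: a 16-digit order number would match the regex but fail Luhn, which is why real DLP policies combine pattern matching with validators.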
Encryption
At Rest: AES-256 encryption for databases, file systems, and backups using KMS (AWS KMS, Azure Key Vault, Cloud KMS). Full-disk encryption (BitLocker, LUKS). In Transit: TLS 1.3 for all network communication, certificate pinning for APIs, mTLS for service-to-service. In Use: Confidential computing with hardware enclaves (Intel SGX, AMD SEV), homomorphic encryption for processing encrypted data. Key Management: HSM-backed keys, automatic rotation, separation of duties, and key escrow procedures.
Tokenization & Masking
Tokenization: Replacing sensitive data with randomly generated tokens that have no mathematical relationship to the original values — the token-to-value mapping lives only in a secure vault. Used for PCI-DSS compliance — tokenize credit card numbers so they never touch application code. Data Masking: Replacing real data with realistic but fake data for non-production environments. Static masking for dev/test databases, dynamic masking for real-time query results. Tools: Voltage, Protegrity, Delphix.
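A toy Python token vault illustrates the core property: tokens are random, and the mapping back to the original value exists only inside the vault (the class and naming are invented for illustration, not a real product's API):

```python
import secrets

class TokenVault:
    """Toy token vault: tokens carry no information about the original
    value -- the mapping lives only inside the vault."""

    def __init__(self):
        self._forward = {}  # value -> token
        self._reverse = {}  # token -> value

    def tokenize(self, value: str) -> str:
        if value in self._forward:       # idempotent per value
            return self._forward[value]
        token = "tok_" + secrets.token_hex(8)   # pure randomness
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._reverse[token]

vault = TokenVault()
pan = "4111111111111111"
token = vault.tokenize(pan)
# Application code stores and logs only the token; the PAN never
# leaves the vault, which is what shrinks PCI-DSS scope.
assert vault.detokenize(token) == pan
```

In a real deployment the vault is a hardened, separately audited service; everything outside it handles only tokens.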
Three States of Data
Data exists in three states — at rest, in transit, and in use. Each state requires different security controls and encryption strategies. A comprehensive data security program must protect data across all three states.
💾 Data at Rest
Data stored on disk, databases, backup media, or cloud storage — not actively being transmitted or processed.
🔒 Encryption Methods
- AES-256: Industry standard symmetric encryption for databases, files, and volumes
- Full Disk Encryption (FDE): BitLocker (Windows), FileVault (macOS), LUKS (Linux)
- Database Encryption: TDE (Transparent Data Encryption) in SQL Server, Oracle, PostgreSQL
- File-Level Encryption: Per-file encryption for selective protection — VeraCrypt, 7-Zip AES
- Cloud Storage: SSE-S3, SSE-KMS, SSE-C (AWS); Azure Storage Service Encryption; Google CMEK/CSEK
🛡️ Best Practices
- Key Management: Use HSM-backed KMS (AWS KMS, Azure Key Vault, GCP Cloud KMS)
- Key Rotation: Automatic rotation every 90-365 days; immediate rotation on compromise
- Separation of Duties: Key custodians ≠ data administrators ≠ security team
- Backup Encryption: Encrypt ALL backups — immutable/WORM storage for ransomware protection
- Secure Deletion: Cryptographic erasure, DoD 5220.22-M wipe, physical destruction for decommissioned media
🔄 Data in Transit
Data actively moving between systems — over the network, internet, APIs, or between services. Most vulnerable to interception and man-in-the-middle attacks.
🔒 Protocols & Encryption
- TLS 1.3: Latest standard — faster handshake, forward secrecy by default, dropped legacy cryptography (RC4, SHA-1, CBC-mode cipher suites)
- mTLS (Mutual TLS): Both client and server authenticate — essential for service-to-service (microservices, service mesh)
- IPsec VPN: Network-layer encryption for site-to-site and remote access VPNs (IKEv2, ESP)
- SSH/SFTP: Secure remote access and file transfer — SSH keys over passwords, certificate-based auth
- HTTPS Everywhere: Enforce HTTPS via HSTS headers, certificate pinning for mobile apps, TLS termination at load balancer
- API Security: OAuth 2.0 + JWT tokens, API gateway TLS enforcement, certificate-based API authentication
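The OAuth 2.0 + JWT pattern above rests on signed tokens. A stdlib-only Python sketch of HS256 JWT signing and verification (for illustration only; production code should use a vetted JWT library and a managed secret store):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." +
        b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels on the signature.
    return hmac.compare_digest(b64url(expected), sig)

secret = b"demo-secret"  # illustrative only -- fetch from a secret manager
token = sign_jwt({"sub": "svc-reporting", "scope": "read:data"}, secret)
print(verify_jwt(token, secret))       # True
print(verify_jwt(token + "x", secret)) # False: tampered signature
```

Any change to header, payload, or signature breaks verification, which is what lets an API gateway trust claims without a database lookup.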
🛡️ Best Practices
- Certificate Management: Automated renewal (Let's Encrypt, ACME), certificate inventory, expiration monitoring
- Perfect Forward Secrecy (PFS): ECDHE key exchange — compromised private key doesn't decrypt past traffic
- Disable Legacy Protocols: No SSLv3, TLS 1.0/1.1; enforce TLS 1.2+ minimum
- Network Segmentation: Encrypt east-west traffic between VLANs/segments, not just north-south
- Email Encryption: S/MIME or PGP for sensitive emails; TLS for SMTP (STARTTLS enforcement)
- DNS Security: DNSSEC, DNS-over-HTTPS (DoH), DNS-over-TLS (DoT)
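The "disable legacy protocols" guidance above maps directly to a few lines of configuration. A Python sketch using the stdlib `ssl` module to enforce TLS 1.2+ with certificate validation on the client side:

```python
import ssl

# Build a client context that refuses SSLv3 and TLS 1.0/1.1 and
# requires a valid, hostname-matching server certificate.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # enforce TLS 1.2+ floor
context.check_hostname = True                     # default, shown explicitly
context.verify_mode = ssl.CERT_REQUIRED           # reject unverified certs

print(context.minimum_version.name)  # TLSv1_2
```

Pass a context like this to `http.client`, `urllib`, or a socket wrapper; raising `minimum_version` to `TLSv1_3` tightens the floor further where peers support it.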
⚡ Data in Use
Data actively being processed in memory (RAM), CPU caches, or registers. The hardest state to protect — traditionally data must be decrypted to be processed. Confidential computing is changing this.
🔒 Protection Technologies
- Confidential Computing: Hardware-based Trusted Execution Environments (TEEs) — process encrypted data in secure enclaves
- Intel SGX: Software Guard Extensions — create encrypted memory enclaves, even OS/hypervisor cannot access
- AMD SEV/SEV-SNP: Secure Encrypted Virtualization — encrypts entire VM memory with per-VM keys
- ARM TrustZone: Hardware isolation between secure and non-secure worlds — used in mobile devices
- Homomorphic Encryption (HE): Compute on encrypted data without decrypting — fully homomorphic (slow but maturing), partially homomorphic (practical for specific operations)
- Secure Multi-Party Computation (MPC): Multiple parties jointly compute a function without revealing their individual inputs
🛡️ Best Practices
- Memory Protection: ASLR (Address Space Layout Randomization), DEP/NX (Data Execution Prevention)
- Process Isolation: Containers, sandboxing, mandatory access controls (SELinux, AppArmor)
- Credential Guard: Windows Credential Guard — isolates LSASS in a virtualized container
- Memory Encryption: Total Memory Encryption (TME/MKTME) — encrypt all system memory transparently
- Minimize Exposure: Decrypt data only when absolutely necessary, clear sensitive data from memory immediately after use
- Cloud Confidential VMs: Azure Confidential VMs, GCP Confidential VMs, AWS Nitro Enclaves
Data Security Architecture
Data Security Lifecycle
Layered defense from discovery to recovery — know your data, protect it, prevent loss, detect threats, and recover
Data Breach Response Lifecycle
A structured breach response minimizes damage, meets regulatory obligations, and preserves evidence for investigation. Every organization should have a tested Incident Response Plan (IRP) before a breach occurs.
NIST SP 800-61 Incident Response Framework
Preparation → Detection → Containment → Eradication → Recovery → Lessons Learned. Regular tabletop exercises ensure the team can execute under pressure.
Breach Notification Requirements
Different regulations impose different notification timelines and requirements. Missing a notification deadline can result in additional fines on top of the breach penalty itself.
| Regulation | ⏱️ Timeline | 📋 Who to Notify | 💰 Max Penalty |
|---|---|---|---|
| GDPR (EU) | 72 hours to DPA | Data Protection Authority + affected individuals (if high risk) | €20M or 4% global revenue |
| SEC Rule (US) | 4 business days | SEC via 8-K filing + investors | Enforcement actions + lawsuits |
| HIPAA (US Healthcare) | 60 days to HHS | HHS + affected individuals + media (if >500 records) | $1.5M per violation category |
| PCI-DSS | Immediately | Payment brands (Visa, MC) + acquiring bank | $100K–$500K/month + loss of processing |
| US State Laws | 30–90 days (varies) | State AG + affected residents | $750/consumer (CCPA) + AG penalties |
| FFIEC / Banking | ASAP / 36 hours (OCC) | Primary regulator (OCC/FDIC/Fed) + customers | Consent orders + enforcement actions |
Notable Data Breaches — Lessons Learned
Colonial Pipeline (2021)
Critical infrastructure shutdown. Ransomware (DarkSide) via compromised VPN password (no MFA). Led to US East Coast fuel shortage. Paid $4.4M ransom. Root cause: Legacy VPN without MFA, flat network, no segmentation between IT and OT. Lesson: MFA everywhere (especially VPN), network segmentation IT/OT, immutable backups, incident response drills for critical infrastructure.
Equifax (2017)
147M records. Unpatched Apache Struts vulnerability (CVE-2017-5638) exploited 2 months after patch was available. Attackers exfiltrated data for 76 days undetected. Root cause: Failed patch management + expired SSL certificate on monitoring tool (allowed exfiltration to go unnoticed). Lesson: Patch critical vulns within 48 hours, maintain certificate inventory, segment sensitive databases, monitor outbound traffic.
IBM Cost of a Breach Report
2024 average cost: $4.88M per breach. Key findings: Average time to identify + contain: 258 days. Breaches involving stolen credentials: 292 days. Cost reducers: AI & automation saved $2.2M, DevSecOps saved $1.7M, incident response planning saved $1.5M. Cost multipliers: Cloud migration breaches +$750K, skills shortage +$1.8M, compliance failures +$1.6M. Lesson: Invest in detection speed, automate response, and train incident response teams.
MOVEit (2023)
2,500+ organizations. Zero-day SQL injection in MOVEit Transfer file-sharing software exploited by Cl0p ransomware group. Mass exfiltration before patches available. Root cause: SQLi vulnerability in internet-facing file transfer appliance. Lesson: Minimize internet-facing attack surface, WAF for file transfer apps, monitor file transfer logs for bulk downloads, incident response readiness.
SolarWinds (2020)
18,000+ organizations. Nation-state supply chain attack — malicious code injected into Orion software build process (SUNBURST backdoor). Went undetected for 9+ months. Root cause: Compromised CI/CD pipeline, weak build security. Lesson: Secure build pipelines, implement SBOM, code signing, monitor for anomalous DNS/network behavior, Zero Trust architecture.
T-Mobile (2021–2023)
76M+ records across multiple breaches. Repeated breaches via API exploitation, credential stuffing, and insider threats. Root cause: Insufficient API security, inadequate access controls, lack of monitoring. Lesson: API security testing, rate limiting, UEBA for insider threat detection, breach is not a one-time event — continuous improvement is essential.
🎯 Data Plane Attack Tree — 7-Stage Threat Analysis
Data is the ultimate target of most cyber attacks. Attackers may compromise identities, APIs, or infrastructure — but their final goal is unauthorized access to sensitive data. This attack tree maps 7 stages where data can be exposed, accessed, manipulated, or exfiltrated.
| Stage | Attack Surface | Key Risks |
|---|---|---|
| 1. Data Creation | Where data is born | Sensitive data stored without classification, unstructured data proliferation, shadow datasets, sensitive logs, AI/analytics datasets with PII, improper data tagging, test data with production PII, over-collection beyond business need |
| 2. Data Discovery & Exposure | Where data is found | Exposed storage buckets/databases, shadow data stores, unindexed assets, catalog misconfiguration, metadata leakage, search/index system exposure without restrictions, backup and snapshot discovery |
| 3. Data Storage | Where data lives | Public object storage exposure, misconfigured database access policies, unencrypted storage systems, backup storage exposure, snapshot exposure, misconfigured file permissions, data lake misconfig, cross-account storage exposure. Encryption risks: KMS misconfig, improper key rotation, shared/hardcoded encryption keys |
| 4. Data Access Controls | Who can touch data | Over-privileged database roles, weak ACLs, shared database credentials, token or API key misuse, service account abuse, broken object-level authorization, privilege escalation through database roles, lack of row/column-level security |
| 5. Data Processing | Where data is transformed | Compromised ETL pipelines, malicious transformations, data pipeline injection, unauthorized analytics queries, compromised processing clusters, SQL query manipulation, Spark/big data cluster exploitation. Inference & Query Abuse: sensitive data inference via queries, aggregation-based leakage, model-driven data exposure |
| 6. Data Sharing & Distribution | Where data travels | API data overexposure, partner integration misuse, unrestricted data exports, public data sharing links, data replication misconfiguration, cross-region data exposure, third-party leakage (analytics chains), webhook/event-driven leakage |
| 7. Impact & Exfiltration | Where data leaves | Mass data exfiltration, sensitive dataset extraction, customer data theft, intellectual property theft, regulated data exposure, data manipulation or corruption, data destruction/ransomware, stealth exfiltration over trusted channels |
🛡️ Defense Controls & Architecture
🛡️ Access Control
Fine-grained RBAC/ABAC, row/column-level security, least privilege enforcement, and data monitoring/detection with access logging and anomaly detection.
📋 Data Governance
Classification, ownership assignment, and data minimization. Know what data you have, where it lives, and who owns it. Without governance, all other controls are guesswork.
🚫 Data Loss Prevention
DLP scanning and policy enforcement, tokenization/masking of sensitive fields, sensitive data detection across endpoints, network, and cloud. Controlled APIs with partner integration governance.
🔍 DSPM
Data Security Posture Management — sensitive data discovery, exposure risk identification, object storage scanning, and inventory mapping. Tools: Varonis, BigID, Normalyze, Sentra.
🔐 Encryption & Key Mgmt
Encryption at rest/transit, KMS configuration, key rotation, access control on keys, and separation of duties. Never share or hardcode encryption keys.
🚨 Incident Response
Breach detection, automated containment, forensic tracking, and backup protection. Treat every data plane anomaly as a potential exfiltration event until proven otherwise.
Walk me through the 7 stages of a Data Plane Attack Tree — how would you defend an organization's data at each stage?
The Data Plane Attack Tree maps 7 stages where data is vulnerable:
1) Data Creation
- Biggest risk is data born without classification
- Implement automated classification at creation — Microsoft Purview, BigID
- Enforce data minimization policies — don't collect what you don't need
- Scan for PII in test/dev environments
2) Data Discovery & Exposure
- Attackers search for exposed storage buckets, shadow databases, and misconfigured catalogs
- Defense: continuous DSPM scanning (Varonis, Normalyze), automated detection of public S3 buckets/Azure blobs, metadata access controls for data catalogs
3) Data Storage
- Encrypt everything — AES-256 at rest, KMS-managed keys with automatic rotation
- Never hardcode encryption keys
- Audit storage permissions weekly
- Cross-account storage access should be exception-based and logged
4) Data Access Controls
- Implement least privilege at the database level — row/column-level security, not just table-level
- Eliminate shared database credentials
- Service accounts get minimum permissions with automated rotation
- Monitor for privilege escalation through database roles
5) Data Processing
- Secure ETL pipelines with integrity checks
- Validate data transformations and detect injection in pipeline parameters
- Monitor analytics queries for inference attacks — queries that look innocent individually but together reconstruct sensitive data
6) Data Sharing & Distribution
- API response filtering — never return more data than needed
- Audit partner integrations quarterly
- Disable unrestricted data exports
- Monitor webhook payloads for sensitive data leakage
- Implement data residency controls for cross-region flows
7) Impact & Exfiltration
- DLP at all egress points — network, endpoint, cloud
- Monitor for bulk data transfers, unusual download patterns, and exfiltration over trusted channels (DNS, HTTPS to legitimate-looking domains)
- Immutable backups protect against ransomware/destruction
KEY TAKEAWAY: Protecting the data plane requires strong data governance, fine-grained access controls, secure data pipelines, continuous monitoring, and DLP at every boundary.
Interview Preparation
How would you implement a data classification program?
1) Define classification levels (Public, Internal, Confidential, Restricted) with clear criteria.
2) Assign data owners for each data domain.
3) Deploy automated discovery tools (Microsoft Purview, Varonis) to scan repositories.
4) Label data with metadata tags (sensitivity, retention, jurisdiction).
5) Map classification to security controls — encryption requirements, access levels, DLP policies.
6) Train employees on handling procedures for each level.
7) Audit regularly and measure: percentage of data classified, policy violations, and remediation time.
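The mapping from classification level to security controls is often expressed as a control matrix. A minimal Python sketch, with levels from the section above and invented control values:

```python
# Hypothetical control matrix: classification level -> required controls.
# Retention values and flags are illustrative, not regulatory guidance.
CONTROLS = {
    "Public":       {"encryption": False, "dlp": False, "retention_days": 365},
    "Internal":     {"encryption": True,  "dlp": False, "retention_days": 730},
    "Confidential": {"encryption": True,  "dlp": True,  "retention_days": 2555},
    "Restricted":   {"encryption": True,  "dlp": True,  "retention_days": 3650},
}

def controls_for(level: str) -> dict:
    """Look up handling requirements for a classified asset."""
    if level not in CONTROLS:
        raise ValueError(f"unknown classification level: {level}")
    return CONTROLS[level]

print(controls_for("Confidential")["dlp"])  # True
```

Encoding the matrix once and failing closed on unknown levels keeps encryption, DLP, and retention policies consistent across systems instead of re-decided per application.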
Explain the difference between tokenization and encryption.
Encryption transforms data using an algorithm and key — it's mathematically reversible with the correct key, and the ciphertext typically differs in format and length from the plaintext (format-preserving encryption is the exception). Tokenization replaces data with a random token that has NO mathematical relationship to the original — the mapping exists only in a secure token vault. Key difference: ciphertext can in principle be recovered by attacking the key or algorithm; a token carries no recoverable information about the original value. Tokenization is preferred for PCI-DSS scope reduction because tokenized data is not considered cardholder data, shrinking the compliance boundary.
Walk me through how you would respond to a data breach.
1) DETECT & ASSESS — Verify the alert is a true positive. Determine data types affected (PII, PHI, PCI, IP), volume of records, and whether exfiltration occurred. Activate the Incident Response Team (IRT).
2) CONTAIN — Isolate affected systems (network quarantine via EDR), revoke compromised credentials, block attacker IPs/domains, disable data exfiltration channels. Preserve forensic evidence — do NOT wipe systems yet.
3) INVESTIGATE — Conduct forensic analysis to determine root cause, attack vector, lateral movement, and full scope. Build a timeline. Engage external forensics firm if needed (required for PCI).
4) NOTIFY — Engage legal counsel immediately. Notify regulators within required timeframes (GDPR 72hrs, SEC 4 days). Prepare customer notification with clear description of what happened, what data was affected, and what you're doing about it. Notify cyber insurance carrier.
5) ERADICATE & RECOVER — Patch the exploited vulnerability, rebuild compromised systems from clean images, force credential resets, restore from verified clean backups.
6) LESSONS LEARNED — Post-incident review within 2 weeks. Update IR playbook, improve detection rules, conduct tabletop exercise simulating the attack scenario. Report metrics: time to detect, contain, and recover.