GDPR-by-Design: Architectural Patterns for Public Services
How to embed personal data protection into the design of public administration digital services: architectural patterns, pseudonymization, data minimization, and GDPR-compliant consent management.
Privacy by Design in Public Administration: a Legal Obligation and an Architectural Choice
The General Data Protection Regulation (GDPR), applicable since May 25, 2018, goes beyond imposing bureaucratic obligations on public administrations. Article 25 ("data protection by design and by default") requires that data protection be embedded into information systems and processes from the very beginning. This principle, commonly called Privacy by Design, transforms regulatory compliance into an architectural quality of software.
For a developer or architect working on Italian public administration systems (digital civil registries, electronic health records, SPID/CIE access portals, administrative workflow systems), understanding and correctly implementing GDPR-by-Design is not optional. Fines from the Italian Data Protection Authority (Garante) can reach EUR 20 million or, for undertakings, 4% of global annual turnover, whichever is higher (Art. 83(5)); for public bodies, the consequences also include data processing bans, compensation claims, and significant reputational damage.
What You Will Learn
- The 7 core Privacy by Design principles mapped to concrete architectural decisions
- Pseudonymization and anonymization patterns for government databases and APIs
- Implementing the data minimization pattern in PA microservices
- Consent management: architecture and implementation of a GDPR-compliant system
- Automated data retention and right-to-erasure in relational databases
- Compliant audit logging: how to track processing without violating privacy
- DPIA (Data Protection Impact Assessment) for high-risk systems
The 7 Core Principles and Their Impact on Architecture
Ann Cavoukian, former Information and Privacy Commissioner of Ontario, formalized the 7 Privacy by Design principles. The GDPR does not list them verbatim, but Art. 25 reflects them under the name "data protection by design and by default". Each principle maps to specific architectural choices:
| PbD Principle | GDPR Art. | Architectural Pattern | Concrete Techniques |
|---|---|---|---|
| Proactive, not Reactive | Art. 25 | Privacy Threat Modeling | Privacy Risk Assessment during design phase |
| Privacy as the Default | Art. 25(2) | Opt-in by default | Explicit consent, minimize data by default |
| Privacy Embedded into Design | Art. 25(1) | Privacy-Embedded Architecture | Pseudonymization, encryption at-rest |
| Full Functionality | Art. 5 | Zero-sum avoidance | Privacy and security are not in conflict |
| End-to-End Security | Art. 32 | Defense in depth | TLS, encryption, key management |
| Visibility and Transparency | Art. 13/14 | Audit trail + disclosure | Anonymized logging, privacy notice |
| Respect for User Privacy | Art. 7/8/17 | User-centric controls | Consent UI, right to erasure, portability |
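To make the "Privacy as the Default" row concrete, here is a minimal sketch of a settings object in which every optional processing purpose starts disabled and is enabled only by an explicit opt-in. The class and field names are hypothetical and chosen for illustration only.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CitizenPreferences:
    """Hypothetical per-citizen settings: every optional purpose is off by default."""
    service_notifications: bool = False
    newsletter: bool = False
    usage_analytics: bool = False
    share_with_third_parties: bool = False


def enabled_purposes(prefs: CitizenPreferences) -> list[str]:
    """Return the names of the purposes the citizen has explicitly opted into."""
    return sorted(name for name, enabled in asdict(prefs).items() if enabled)


# A freshly created profile enables nothing; opting in is an explicit act.
defaults = CitizenPreferences()
opted_in = CitizenPreferences(newsletter=True)
```

Because the safe value is the default of the type itself, a code path that forgets to set a preference fails towards less processing, not more.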
Pattern 1: Data Minimization in PA Microservices
The minimization principle (Art. 5(1)(c) GDPR) requires that data collected be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed". In a microservices architecture, this translates into the Data Minimization at Boundary pattern: each service must receive only the data it needs to perform its specific function.
Consider an identity verification service for accessing a healthcare benefit. The service does not need the citizen's full name to verify eligibility: it only needs a pseudonymous identifier and the eligibility status. The pattern is implemented through selective DTOs and projection queries.
# Data Minimization Pattern - Python/FastAPI example
# Scenario: healthcare benefit service for PA
from pydantic import BaseModel
from sqlalchemy import text  # Assumption: `db` below is a SQLAlchemy Session/Connection


# WRONG: transfers unnecessary data
class CitizenFullDTO(BaseModel):
    citizen_id: str
    fiscal_code: str
    first_name: str   # Not needed for eligibility check
    last_name: str    # Not needed for eligibility check
    birth_date: str   # Not needed for eligibility check
    address: str      # Not needed for eligibility check


# CORRECT: minimize data to the strict minimum
class CitizenEligibilityDTO(BaseModel):
    pseudonymous_id: str  # Opaque token, not real fiscal code
    is_eligible: bool
    benefit_category: str
    # No directly identifying information


class HealthBenefitService:
    def __init__(self, pseudonym_service, citizen_repo):
        self.pseudonym_service = pseudonym_service
        self.citizen_repo = citizen_repo

    def check_eligibility(self, pseudonymous_id: str, benefit_code: str) -> CitizenEligibilityDTO:
        # Resolve the real identifier inside the service; it never leaves this method
        real_id = self.pseudonym_service.resolve_pseudonym(
            pseudonymous_id, requesting_service="health-benefit"
        )
        # Targeted query: only the necessary field
        is_eligible = self.citizen_repo.check_benefit_eligibility(
            citizen_id=real_id,
            benefit_code=benefit_code,
        )
        # Does NOT return: name, fiscal code, address, phone, email
        return CitizenEligibilityDTO(
            pseudonymous_id=pseudonymous_id,
            is_eligible=is_eligible,
            benefit_category=benefit_code,
        )


class CitizenRepository:
    def __init__(self, db):
        self.db = db

    def check_benefit_eligibility(self, citizen_id: str, benefit_code: str) -> bool:
        # SELECT only what is needed, never SELECT *.
        # Values are passed as named bind parameters, never interpolated into the SQL.
        query = text("""
            SELECT EXISTS(
                SELECT 1 FROM citizen_benefits
                WHERE citizen_id = :citizen_id
                  AND benefit_code = :benefit_code
                  AND valid_until >= CURRENT_DATE
            )
        """)
        params = {"citizen_id": citizen_id, "benefit_code": benefit_code}
        return bool(self.db.execute(query, params).scalar())
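The same boundary rule can also be enforced generically with a per-consumer allow-list. The sketch below is an illustration under assumptions, not part of the service above: the service names, the `ALLOWED_FIELDS` table, the `preferred_channel` field, and the sample record are all hypothetical.

```python
from typing import Any, Mapping

# Hypothetical allow-lists: the only fields each downstream service may receive.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "eligibility": frozenset({"pseudonymous_id", "is_eligible", "benefit_category"}),
    "notification": frozenset({"pseudonymous_id", "preferred_channel"}),
}


def project_for(service: str, record: Mapping[str, Any]) -> dict[str, Any]:
    """Return only the fields the named service is allowed to see."""
    allowed = ALLOWED_FIELDS.get(service)
    if allowed is None:
        # Fail closed: a consumer without an allow-list receives nothing.
        raise PermissionError(f"no allow-list defined for service {service!r}")
    return {key: value for key, value in record.items() if key in allowed}


# Sample data only; the fiscal code is a made-up placeholder value.
full_record = {
    "pseudonymous_id": "a3f1c2",
    "fiscal_code": "RSSMRA80A01H501U",
    "first_name": "Mario",
    "is_eligible": True,
    "benefit_category": "E01",
}
minimal = project_for("eligibility", full_record)
```

An allow-list is preferable to a deny-list here: a field added to the citizen record later stays hidden until someone deliberately exposes it.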
Pattern 2: Pseudonymization and Tokenization
Pseudonymization (defined in Art. 4(5) GDPR) is the one technical measure that Art. 25(1) names explicitly as an example of an appropriate safeguard. It is not sufficient on its own to demonstrate compliance, and it differs from anonymization: pseudonymized data can be re-linked to the data subject using additional information held separately, so it remains personal data, whereas truly anonymized data cannot be re-linked (and therefore falls outside GDPR scope).
For PA systems, pseudonymization is often preferable to anonymization because it allows internal traceability (required for audit and compliance) while protecting citizen identity in front-end systems and logs.
# Pseudonymization Pattern with isolated Vault
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import hashlib
import hmac
import logging

audit_logger = logging.getLogger("pseudonym_vault.audit")


@dataclass
class PseudonymRecord:
    pseudonym: str
    real_id: str
    created_at: datetime
    expires_at: datetime
    purpose: str  # Processing purpose (Art. 5(1)(b))


class PseudonymVault:
    """
    Isolated vault managing pseudonym <-> real identity mapping.
    Restricted access: only authorized services with vault key.
    Every access is audit-logged for GDPR compliance.
    """

    def __init__(self, vault_key: bytes, db_connection):
        self._vault_key = vault_key
        self._db = db_connection

    def create_pseudonym(self, real_id: str, purpose: str, validity_days: int = 365) -> str:
        """
        Generates a pseudonym using HMAC-SHA256.
        Deterministic (same real_id + purpose = same output) but not
        computable or linkable across purposes without the vault key.
        """
        pseudonym_bytes = hmac.new(
            key=self._vault_key,
            msg=f"{real_id}:{purpose}".encode(),
            digestmod=hashlib.sha256,
        ).digest()
        pseudonym = pseudonym_bytes.hex()[:32]  # 32 hex chars = 128 bits
        now = datetime.now(timezone.utc)  # timezone-aware; utcnow() is deprecated
        record = PseudonymRecord(
            pseudonym=pseudonym,
            real_id=real_id,
            created_at=now,
            expires_at=now + timedelta(days=validity_days),
            purpose=purpose,
        )
        self._db.insert_pseudonym(record)
        return pseudonym

    def resolve_pseudonym(self, pseudonym: str, requesting_service: str) -> str:
        """Resolves pseudonym to real identity. Audit logs every access."""
        now = datetime.now(timezone.utc)
        # Log before the lookup so failed attempts are recorded as well
        self._log_resolution_access(pseudonym, requesting_service, now)
        record = self._db.get_pseudonym(pseudonym)
        if not record:
            raise ValueError("Pseudonym not found")
        if now > record.expires_at:
            raise ValueError("Pseudonym expired")
        return record.real_id

    def _log_resolution_access(self, pseudonym: str, requesting_service: str, at: datetime) -> None:
        # Placeholder sink: a real deployment would write to the append-only audit log
        audit_logger.info(
            "pseudonym_resolved pseudonym=%s service=%s at=%s",
            pseudonym, requesting_service, at.isoformat(),
        )
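Including the purpose in the HMAC input has a useful side effect: the same citizen receives a different pseudonym for each processing purpose, so datasets built for different purposes cannot be joined on the pseudonym alone. The following self-contained sketch reproduces only the derivation step. The function name `derive_pseudonym`, the purpose strings, and the sample identifier are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_pseudonym(vault_key: bytes, real_id: str, purpose: str) -> str:
    """HMAC-SHA256 over "real_id:purpose", truncated to 128 bits (32 hex chars)."""
    digest = hmac.new(vault_key, f"{real_id}:{purpose}".encode(), hashlib.sha256).hexdigest()
    return digest[:32]


# Illustration only: a real vault key would come from a key management system.
key = secrets.token_bytes(32)
sample_id = "RSSMRA80A01H501U"  # placeholder identifier

p_health = derive_pseudonym(key, sample_id, "health_benefit")
p_health_again = derive_pseudonym(key, sample_id, "health_benefit")
p_tax = derive_pseudonym(key, sample_id, "local_tax")
p_rotated_key = derive_pseudonym(secrets.token_bytes(32), sample_id, "health_benefit")
```

Here `p_health` and `p_health_again` are identical, while `p_tax` and `p_rotated_key` differ from them. Rotating the vault key therefore invalidates every existing pseudonym, which is a trade-off to plan for when defining key rotation.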
Pattern 3: GDPR-Compliant Consent Management
Consent must be freely given, specific, informed, and unambiguous (Art. 4(11) GDPR), and Art. 7 sets the conditions for obtaining and proving it. For public administration, most processing relies on legal bases other than consent (Art. 6(1)(e): public task; Art. 6(1)(c): legal obligation). However, when consent (Art. 6(1)(a)) is chosen as the legal basis, for example for optional newsletters or non-mandatory services, the consent management system must meet precise requirements. A key rule from Art. 7(3): withdrawing consent must be as easy as granting it.
-- SQL Schema: GDPR-compliant Consent Management
-- Separate database or isolated schema with controlled access
-- gen_random_uuid() is built in from PostgreSQL 13; earlier versions need pgcrypto
CREATE TABLE consent_purposes (
    purpose_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    code             VARCHAR(64) UNIQUE NOT NULL,
    name_en          TEXT NOT NULL,
    description_en   TEXT NOT NULL,
    legal_basis      VARCHAR(32) NOT NULL,
    data_categories  TEXT[] NOT NULL,
    retention_days   INTEGER NOT NULL,
    third_parties    TEXT[],
    version          INTEGER NOT NULL DEFAULT 1,
    is_active        BOOLEAN DEFAULT TRUE
);

CREATE TABLE citizen_consents (
    consent_id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    citizen_pseudonym       VARCHAR(64) NOT NULL,
    purpose_id              UUID REFERENCES consent_purposes(purpose_id),
    status                  VARCHAR(16) NOT NULL CHECK (status IN ('granted', 'denied', 'withdrawn')),
    granted_at              TIMESTAMPTZ,
    withdrawn_at            TIMESTAMPTZ,
    proof_channel           VARCHAR(32) NOT NULL,  -- how consent was collected (proof, Art. 7(1))
    proof_ip_hash           VARCHAR(64),           -- hashed, never the raw IP
    privacy_policy_version  VARCHAR(16) NOT NULL,
    consent_language        VARCHAR(8) NOT NULL DEFAULT 'en'
);

-- Right to withdrawal function (Art. 7(3))
CREATE OR REPLACE FUNCTION withdraw_consent(
    p_citizen_pseudonym VARCHAR(64),
    p_purpose_code      VARCHAR(64)
) RETURNS VOID AS $$
BEGIN
    UPDATE citizen_consents
    SET status = 'withdrawn', withdrawn_at = NOW()
    WHERE citizen_pseudonym = p_citizen_pseudonym
      AND purpose_id = (SELECT purpose_id FROM consent_purposes WHERE code = p_purpose_code)
      AND status = 'granted';
END;
$$ LANGUAGE plpgsql;
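On the application side, every consent-based processing step should check the current status before it runs. The sketch below uses an in-memory stand-in for the `citizen_consents` table to show the check and the symmetry between granting and withdrawing. The `ConsentRegistry` class and its method names are hypothetical; a real service would query the database instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    citizen_pseudonym: str
    purpose_code: str
    status: str  # 'granted', 'denied' or 'withdrawn', as in the CHECK constraint
    granted_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None


class ConsentRegistry:
    """In-memory stand-in for the citizen_consents table (illustration only)."""

    def __init__(self) -> None:
        self._rows: list[Consent] = []

    def grant(self, pseudonym: str, purpose_code: str) -> None:
        now = datetime.now(timezone.utc)
        self._rows.append(Consent(pseudonym, purpose_code, "granted", granted_at=now))

    def withdraw(self, pseudonym: str, purpose_code: str) -> int:
        """Takes the same two arguments as grant(): withdrawing is no harder than granting."""
        changed = 0
        for row in self._rows:
            if (row.citizen_pseudonym, row.purpose_code, row.status) == (pseudonym, purpose_code, "granted"):
                row.status = "withdrawn"
                row.withdrawn_at = datetime.now(timezone.utc)
                changed += 1
        return changed

    def is_allowed(self, pseudonym: str, purpose_code: str) -> bool:
        """True only while a 'granted' consent exists for this exact purpose."""
        return any(
            row.citizen_pseudonym == pseudonym
            and row.purpose_code == purpose_code
            and row.status == "granted"
            for row in self._rows
        )


registry = ConsentRegistry()
registry.grant("a3f1c2", "newsletter")
allowed_before = registry.is_allowed("a3f1c2", "newsletter")
withdrawn_rows = registry.withdraw("a3f1c2", "newsletter")
allowed_after = registry.is_allowed("a3f1c2", "newsletter")
```

Consent is checked per purpose: a grant for `newsletter` says nothing about any other purpose code, which mirrors the "specific" requirement.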
Pattern 4: Automated Data Retention and Right to Erasure
The storage limitation principle (Art. 5(1)(e) GDPR) requires that personal data be kept for no longer than necessary for the stated purpose. Every PA system must implement automated retention policies. The right to erasure (Art. 17 GDPR) adds complexity: the system must delete a specific individual's data on request, while respecting exceptions such as legal obligations.
# Automated Data Retention Manager
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class RetentionAction(Enum):
    DELETE = "delete"
    ANONYMIZE = "anonymize"


@dataclass
class RetentionPolicy:
    table_name: str
    date_column: str
    retention_days: int
    action: RetentionAction
    legal_basis: str


# Table and column names come only from this static list, never from user input,
# because SQL identifiers cannot be passed as bind parameters.
RETENTION_POLICIES = [
    RetentionPolicy("session_logs", "created_at", 90, RetentionAction.DELETE,
                    "GDPR Art. 5(1)(e) - Storage limitation"),
    RetentionPolicy("service_requests", "completed_at", 2555, RetentionAction.ANONYMIZE,
                    "Administrative records - 7 year retention"),
]


class DataRetentionManager:
    def __init__(self, db):
        # Assumption: db is an asyncpg connection or pool ($1 placeholders);
        # its execute() returns a status string such as "DELETE 42".
        self.db = db

    @staticmethod
    def _rows_affected(status: str) -> int:
        return int(status.split()[-1])

    async def run_retention_policy(self, policy: RetentionPolicy) -> dict:
        cutoff = datetime.now(timezone.utc) - timedelta(days=policy.retention_days)
        if policy.action == RetentionAction.DELETE:
            status = await self.db.execute(
                f"DELETE FROM {policy.table_name} WHERE {policy.date_column} < $1",
                cutoff,
            )
        elif policy.action == RetentionAction.ANONYMIZE:
            # Replace direct identifiers; the random suffix keeps fiscal_code unique
            status = await self.db.execute(
                f"""UPDATE {policy.table_name}
                    SET first_name = NULL, last_name = NULL, email = NULL,
                        fiscal_code = 'ANONYMIZED_' || gen_random_uuid()::text
                    WHERE {policy.date_column} < $1
                      AND fiscal_code NOT LIKE 'ANONYMIZED_%'""",
                cutoff,
            )
        else:
            raise ValueError(f"Unsupported retention action: {policy.action}")
        return {"policy": policy.table_name, "rows_affected": self._rows_affected(status)}

    async def handle_erasure_request(self, citizen_pseudonym: str) -> dict:
        """Handles Art. 17 right to erasure with legal exception checks."""
        exceptions = await self._check_erasure_exceptions(citizen_pseudonym)
        if exceptions:
            return {"status": "denied", "reasons": exceptions}
        tables_affected = []
        # Assumes both tables carry a citizen_pseudonym column
        for table in ["session_logs", "citizen_consents"]:
            status = await self.db.execute(
                f"DELETE FROM {table} WHERE citizen_pseudonym = $1",
                citizen_pseudonym,
            )
            if self._rows_affected(status) > 0:
                tables_affected.append(table)
        return {"status": "completed", "tables_affected": tables_affected}

    async def _check_erasure_exceptions(self, citizen_pseudonym: str) -> list:
        """Placeholder: return the Art. 17(3) grounds that block erasure, if any."""
        raise NotImplementedError
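The DELETE branch of a retention job can be tried end to end with the standard library. The following sketch runs a 90-day policy against an in-memory SQLite table; it is an illustration under assumptions, not the production job. The function `purge_expired`, the `RETENTION_TARGETS` allow-list, and the fixed reference date are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: identifiers cannot be bound as parameters, so validate them.
RETENTION_TARGETS = {("session_logs", "created_at")}


def purge_expired(conn: sqlite3.Connection, table: str, date_column: str,
                  retention_days: int, now: datetime) -> int:
    """Delete rows older than the retention period and return how many were removed."""
    if (table, date_column) not in RETENTION_TARGETS:
        raise ValueError(f"{table}.{date_column} is not a registered retention target")
    # Timestamps are stored as ISO-8601 UTC strings, which sort chronologically.
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cursor = conn.execute(f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session_logs (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")

now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # fixed reference date for repeatability
for age_days in (10, 89, 91, 400):
    created = (now - timedelta(days=age_days)).isoformat()
    conn.execute("INSERT INTO session_logs (created_at) VALUES (?)", (created,))

deleted = purge_expired(conn, "session_logs", "created_at", 90, now)
remaining = conn.execute("SELECT COUNT(*) FROM session_logs").fetchone()[0]
```

With these four rows, the two entries older than 90 days are removed and the two newer ones remain. Passing `now` as an argument, instead of reading the clock inside the function, is what makes the job testable.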
DPIA: When It Is Mandatory and How to Structure It
A Data Protection Impact Assessment (DPIA) is mandatory under Art. 35 GDPR when processing "is likely to result in a high risk to the rights and freedoms of natural persons". Under Art. 35(4), the Italian Data Protection Authority has published a list of processing operations that always require a DPIA; the list applies to all controllers, and several of its entries are typical of public bodies.
PA Processing Operations Requiring Mandatory DPIA
- Systematic profiling of citizens (e.g., socioeconomic scoring)
- Large-scale processing of special categories (health, criminal records)
- Systematic monitoring of publicly accessible areas (CCTV)
- Matching/linking of datasets from different sources
- Data of vulnerable persons (minors, patients, asylum seekers)
- Innovative use of AI/ML for automated decisions (Art. 22)
- International data transfers
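The list above can be turned into a simple screening step at project kick-off. The sketch below only encodes this article's list as data and reports which entries a planned processing operation matches. The key names and the function `screen_for_dpia` are hypothetical, and the result is a prompt to involve the DPO, not a legal assessment.

```python
# Keys are hypothetical short names; the descriptions repeat the list above.
DPIA_TRIGGERS: dict[str, str] = {
    "systematic_profiling": "Systematic profiling of citizens",
    "special_categories_large_scale": "Large-scale processing of special categories",
    "public_area_monitoring": "Systematic monitoring of publicly accessible areas",
    "dataset_matching": "Matching/linking of datasets from different sources",
    "vulnerable_subjects": "Data of vulnerable persons",
    "ai_automated_decisions": "Innovative use of AI/ML for automated decisions",
    "international_transfers": "International data transfers",
}


def screen_for_dpia(characteristics: set[str]) -> dict:
    """Report which listed triggers a planned processing operation matches."""
    unknown = characteristics - set(DPIA_TRIGGERS)
    if unknown:
        # Reject typos instead of silently ignoring a possible trigger.
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    matched = sorted(characteristics)
    return {
        "dpia_required": bool(matched),
        "matched": [DPIA_TRIGGERS[key] for key in matched],
    }


simple_form = screen_for_dpia(set())
benefit_scoring = screen_for_dpia({"dataset_matching", "vulnerable_subjects"})
```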
Pattern 5: GDPR-Compliant Audit Logging
Accountability (Art. 5(2) GDPR) requires the controller to demonstrate compliance. This demands an audit system that tracks data operations without itself becoming a source of excessive personal data processing. A compliant audit log must be append-only (immutable), pseudonymized, have its own retention policy, and support efficient queries for DPO or regulatory authority requests.
-- Immutable audit log (append-only enforcement via PostgreSQL rules)
CREATE TABLE gdpr_audit_log (
    log_id                     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subject_pseudonym          VARCHAR(64),            -- Who performed the action
    affected_entity_pseudonym  VARCHAR(64),            -- Which entity was affected
    action_type                VARCHAR(64) NOT NULL,   -- READ, UPDATE, DELETE, EXPORT
    resource_type              VARCHAR(64) NOT NULL,   -- citizen_record, consent, health_data
    legal_basis                VARCHAR(128),
    purpose                    VARCHAR(256),
    service_name               VARCHAR(128) NOT NULL,
    request_id                 VARCHAR(128),
    occurred_at                TIMESTAMPTZ NOT NULL DEFAULT NOW()
    -- Does NOT include: raw IP, full user agent, request payload
);

-- Prevent any modification (true append-only)
CREATE RULE no_update_audit AS ON UPDATE TO gdpr_audit_log DO INSTEAD NOTHING;
CREATE RULE no_delete_audit AS ON DELETE TO gdpr_audit_log DO INSTEAD NOTHING;

-- Row-level trigger for writes to sensitive tables.
-- Triggers do not fire on SELECT, so READ events must be logged by the
-- application or by an extension such as PgAudit.
CREATE OR REPLACE FUNCTION audit_sensitive_access() RETURNS TRIGGER AS $$
DECLARE
    v_row RECORD;
BEGIN
    -- NEW is not available on DELETE, so fall back to OLD
    IF TG_OP = 'DELETE' THEN
        v_row := OLD;
    ELSE
        v_row := NEW;
    END IF;
    INSERT INTO gdpr_audit_log (
        action_type, resource_type, subject_pseudonym,
        affected_entity_pseudonym, service_name, legal_basis
    ) VALUES (
        TG_OP, TG_TABLE_NAME,
        current_setting('app.current_user_pseudonym', true),
        v_row.citizen_pseudonym,
        current_setting('app.service_name', true),
        current_setting('app.legal_basis', true)
    );
    RETURN v_row;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- Example attachment (assumes the table has a citizen_pseudonym column)
CREATE TRIGGER trg_audit_citizen_consents
    AFTER INSERT OR UPDATE OR DELETE ON citizen_consents
    FOR EACH ROW EXECUTE FUNCTION audit_sensitive_access();
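The trigger reads its context from the session settings `app.current_user_pseudonym`, `app.service_name`, and `app.legal_basis`, so the application has to set them before touching a sensitive table. A minimal sketch of that step follows, assuming an asyncpg-style driver with `$1` placeholders. The helper name `audit_context_statements` is hypothetical; `set_config(name, value, is_local)` is the standard PostgreSQL function, and `is_local = true` limits the setting to the current transaction.

```python
def audit_context_statements(user_pseudonym: str, service_name: str,
                             legal_basis: str) -> list[tuple[str, tuple[str, str]]]:
    """Build the statements that expose the audit context to the database trigger."""
    settings = {
        "app.current_user_pseudonym": user_pseudonym,
        "app.service_name": service_name,
        "app.legal_basis": legal_basis,
    }
    for name, value in settings.items():
        if not value:
            # service_name is NOT NULL in the audit table, and an empty legal
            # basis would make the log useless for accountability.
            raise ValueError(f"{name} must not be empty")
    sql = "SELECT set_config($1, $2, true)"
    return [(sql, (name, value)) for name, value in settings.items()]


statements = audit_context_statements("u-7f3a", "benefits-api", "Art. 6(1)(e)")
```

Each tuple would be executed inside the same transaction as the data operation, for example `await conn.execute(sql, *params)` with asyncpg, so that the values cannot leak into another request that reuses the pooled connection.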
Overall Architecture: GDPR-by-Design in a PA Service
Combining all described patterns, a GDPR-by-Design compliant PA service architecture includes:
- API Gateway: verifies authentication (SPID/CIE), applies pseudonymization, propagates the legal basis in request context
- Pseudonym Vault: isolated service for pseudonym-to-identity mapping, restricted access, every resolution audit-logged
- Data-minimized microservices: each service receives only necessary data, exposes minimal DTOs, uses projection queries
- Consent Management Service: manages consents, verifies legal bases, supports withdrawal and data portability (Art. 20)
- Retention Scheduler: periodic jobs applying retention policies, handling erasure requests, producing DPO reports
- Audit Log Service: append-only, isolated, write-only from application, read-only from DPO
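How the gateway propagates the pseudonym and legal basis to downstream services can be sketched as a small, immutable request context. This is an illustration under assumptions: the header names, the `RequestContext` type, and the helper functions are hypothetical, and the `LegalBasis` values cover only the three bases mentioned in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class LegalBasis(Enum):
    CONSENT = "Art. 6(1)(a)"
    LEGAL_OBLIGATION = "Art. 6(1)(c)"
    PUBLIC_TASK = "Art. 6(1)(e)"


@dataclass(frozen=True)
class RequestContext:
    citizen_pseudonym: str  # never the fiscal code
    legal_basis: LegalBasis
    purpose: str
    request_id: str


# Hypothetical internal header names used between gateway and services.
REQUIRED_HEADERS = ("X-Citizen-Pseudonym", "X-Legal-Basis", "X-Purpose", "X-Request-Id")


def to_headers(ctx: RequestContext) -> dict[str, str]:
    """Gateway side: serialize the context for the downstream call."""
    return {
        "X-Citizen-Pseudonym": ctx.citizen_pseudonym,
        "X-Legal-Basis": ctx.legal_basis.value,
        "X-Purpose": ctx.purpose,
        "X-Request-Id": ctx.request_id,
    }


def from_headers(headers: Mapping[str, str]) -> RequestContext:
    """Service side: refuse any request that arrives without a complete context."""
    missing = [name for name in REQUIRED_HEADERS if not headers.get(name)]
    if missing:
        raise PermissionError(f"missing GDPR context headers: {missing}")
    return RequestContext(
        citizen_pseudonym=headers["X-Citizen-Pseudonym"],
        legal_basis=LegalBasis(headers["X-Legal-Basis"]),
        purpose=headers["X-Purpose"],
        request_id=headers["X-Request-Id"],
    )


ctx = RequestContext("a3f1c2", LegalBasis.PUBLIC_TASK, "benefit_eligibility", "req-001")
round_trip = from_headers(to_headers(ctx))
```

The same context fields map directly onto the `legal_basis`, `purpose`, and `request_id` columns of the audit log, so one object feeds both authorization and accountability.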
Useful Tools for GDPR-by-Design in PA
- OpenDP: Python library for differential privacy, useful for anonymized analytics
- ARX Data Anonymization Tool: open-source tool for dataset anonymization
- Keycloak: open-source identity provider with built-in consent management
- PgAudit: PostgreSQL extension for database-level audit logging
- Designers Italia: AgID official guidelines on Privacy by Design for PA
Conclusions and Next Steps
GDPR-by-Design is not a bureaucratic checklist: it is an architectural approach that, when integrated from the earliest project phases, produces systems that are more secure, more transparent, and more robust. The patterns described here apply to any Italian PA digital service, from a simple newsletter consent form to complex systems like the electronic health record or digital payment platforms.
The next article in this series examines how to implement accessible user interfaces for public administration following the WCAG 2.1 AA standard — another regulatory requirement that rewards early integration into the design process.
Related Articles in This Series
- GovTech #01: eIDAS 2.0 and EUDI Wallet - European digital identity and verifiable credentials
- GovTech #02: OpenID Connect for government identity - SPID, CIE and security best practices
- GovTech #05: Accessible UIs for PA - implementing WCAG 2.1 AA
- GovTech #06: Government API Integration - SPID, CIE and pagoPA







