Carbon Accounting Software Architecture: ESG Platforms
In 2025, carbon emissions reporting is no longer a voluntary choice for large European companies: it is a legal obligation. The Corporate Sustainability Reporting Directive (CSRD), transposed into Italian law by Legislative Decree 125/2024, has fundamentally transformed the ESG reporting landscape, shifting the discipline from the domain of corporate communications to regulatory compliance — carrying the same operational and technical implications as a financial accounting system.
The global carbon accounting software market reached $1.8 billion in 2025 and is projected to exceed $6 billion by 2030, with a CAGR of 27.4%. This is not simply a matter of purchasing a SaaS platform: designing a robust carbon accounting system requires expertise spanning atmospheric chemistry, software engineering, statistics and EU law. Engineering teams building these platforms must understand the GHG Protocol, the European Sustainability Reporting Standards (ESRS), emission factor databases and the architecture of distributed systems capable of collecting, calculating and certifying data from hundreds of different sources.
This article is a comprehensive technical guide: from domain modelling with GHG Protocol entities, to implementing a calculation engine in Python with FastAPI, through automating CSRD/CDP/GRI reports and integrating with the Climatiq API for real-time emission factors. Whether you are building an internal ESG platform, evaluating enterprise solutions, or simply want to understand how these architectures work under the hood, you are in the right place.
What You Will Learn in This Article
- The GHG Protocol: Scope 1, 2 and 3 and the 15 indirect supply chain emission categories
- How to model the data domain of a carbon accounting system (Organization, Facility, EmissionSource, Activity)
- The main emission factor databases: DEFRA, EPA, ecoinvent, Climatiq API
- Microservices architecture: Data Collection, Calculation Engine, Reporting, Audit Trail
- Python implementation with FastAPI and pandas for emission calculation
- Scope 3 automation: integration with SAP ERP, procurement data, travel data
- Automated report generation for CSRD/ESRS, CDP and GRI Standards
- Audit trail and data lineage: full traceability of calculations
- Platform comparison: Persefoni, Watershed, Sphera, Plan A
- Case study: Italian manufacturing company with CSRD reporting 2025
- Updated regulatory framework: CSRD Omnibus 2025, EU Taxonomy, D.Lgs. 125/2024
Position in the EnergyTech Series
| # | Article | Status |
|---|---|---|
| 1 | MQTT and InfluxDB: Time Series for Energy Data | Published |
| 2 | IEC 61850: Standard Protocol for Smart Electrical Grids | Published |
| 3 | DERMS: Distributed Energy Resource Management | Published |
| 4 | Building Management System: AI Energy Optimization | Published |
| 5 | Renewable Energy Forecasting: ML for Solar and Wind | Published |
| 6 | EV Load Balancing: Smart Charging and Vehicle-to-Grid | Published |
| 7 | Blockchain P2P Energy Trading: Decentralised Energy Markets | Published |
| 8 | You are here - Carbon Accounting Software Architecture: ESG Platforms | Current |
| 9 | Energy Digital Twin: Simulation and Optimization | Next |
| 10 | OCPP and EV Infrastructure: Standards and Implementation | Coming soon |
The Regulatory Context: CSRD, ESRS and D.Lgs. 125/2024
Understanding the regulation is not a bureaucratic prerequisite: it is the foundation on which the entire system architecture is designed. Every technical requirement, every database field, every API endpoint derives from a specific disclosure obligation.
CSRD and the Waves of Application
The Corporate Sustainability Reporting Directive defined three waves of application for European companies. The first wave (financial year 2024, reports published in 2025) covered companies already subject to the previous NFRD: listed companies, banks and insurers with more than 500 employees. The second and third waves, originally scheduled for 2025 and 2026, were postponed by two years by the Stop-the-Clock Directive published in the EU Official Journal on 16 April 2025.
The most significant change came in December 2025 with the approval of the Omnibus I package: the mandatory application threshold was raised to 1,000 employees and €450 million in turnover, cutting the number of companies subject to mandatory CSRD by approximately 80%. The ESRS standards are being revised to reduce mandatory data points by roughly 61% (from approximately 1,100 to approximately 430), with adoption expected in the first half of 2026 and application from 2027.
Updated CSRD Timeline (post-Omnibus 2025)
| Wave | Entities | First Report | Status |
|---|---|---|---|
| Wave 1 | Former NFRD companies (>500 emp., listed/banks/insurers) | Report 2025 (FY 2024) | Ongoing |
| Wave 2 | Large companies >1,000 emp. and >€450M turnover | Report 2027 (FY 2026) - postponed | Postponed |
| Wave 3 | SMEs listed on EU regulated markets | Report 2028 (FY 2027) - postponed | Postponed |
| ESRS rev. | All CSRD entities, simplified standards | From FY 2027 | Under consultation |
D.Lgs. 125/2024: The Italian Transposition
Italy transposed the CSRD via Legislative Decree 125 of 6 September 2024, which entered into force on 25 September 2024. The decree repealed the previous D.Lgs. 254/2016 that had transposed the NFRD. The main changes for Italian companies include: the obligation for limited assurance on the sustainability report from a qualified auditor, integration of the sustainability report into the management report, and publication in the dedicated section of the Companies Register.
EU Taxonomy: The Connection
Carbon accounting directly intersects with the EU Taxonomy Regulation (EU Reg. 852/2020), which classifies economic activities as "sustainable" based on six environmental objectives. CSRD companies must report KPIs for Taxonomy alignment (percentages of turnover, CapEx, OpEx that are "aligned" and "eligible"). With the 2025 simplifications, companies with fewer than 1,000 employees are exempt from Taxonomy reporting, and the remaining companies may limit their reporting to activities representing at least 10% of turnover, CapEx or OpEx.
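In KPI terms these disclosures reduce to simple share arithmetic; the sketch below computes eligible and aligned percentages for one KPI base (the function name and structure are illustrative, not taken from any specific platform):

```python
# Sketch: EU Taxonomy Art. 8 KPI shares (illustrative helper, not a library API).
def taxonomy_shares(total_eur: float, eligible_eur: float, aligned_eur: float) -> dict[str, float]:
    """Percentages of a KPI base (turnover, CapEx or OpEx) that are
    Taxonomy-eligible and Taxonomy-aligned."""
    if total_eur <= 0:
        raise ValueError("KPI base must be positive")
    return {
        "eligible_pct": round(100 * eligible_eur / total_eur, 1),
        "aligned_pct": round(100 * aligned_eur / total_eur, 1),
    }

# A company with EUR 200M turnover, EUR 80M eligible and EUR 30M aligned activities
print(taxonomy_shares(200_000_000, 80_000_000, 30_000_000))
```

The same three numbers must be tracked per KPI base, which is why the data model needs activity-level tagging rather than a single company-wide flag.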
GHG Protocol: The Reference Framework
The Greenhouse Gas Protocol is the world's most widely used accounting standard for greenhouse gas emissions, developed by the World Resources Institute (WRI) and the World Business Council for Sustainable Development (WBCSD). Virtually all reporting frameworks (CSRD/ESRS, CDP, GRI, ISO 14064) reference or build on the GHG Protocol.
The Three Scopes: Precise Definitions
The division into three "scopes" allows emissions to be attributed clearly, avoiding double counting between different actors in the value chain.
Scope 1: Direct Emissions
Emissions from sources owned by or under the control of the organisation. These include: combustion of fossil fuels in boilers, furnaces and company-owned vehicles; process emissions (e.g. CO2 from chemical reactions, CH4 from livestock); fugitive emissions (refrigerant leaks, gas leaks from installations). The relevant gases are the seven covered by the Kyoto Protocol (as amended): CO2, CH4, N2O, HFCs, PFCs, SF6 and NF3, all converted to CO2 equivalent (CO2e) using the IPCC Global Warming Potentials (GWP).
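The CO2e conversion itself is a weighted sum; a minimal sketch using commonly cited IPCC AR6 100-year GWP values (verify against the current assessment tables before production use):

```python
# Sketch: converting measured gas masses to CO2e with AR6 GWP-100 values.
GWP_AR6 = {"CO2": 1.0, "CH4": 29.8, "N2O": 273.0, "SF6": 25200.0}

def to_co2e_kg(gas_masses_kg: dict[str, float]) -> float:
    """Weighted sum: each gas mass (kg) times its 100-year GWP."""
    return sum(mass * GWP_AR6[gas] for gas, mass in gas_masses_kg.items())

# A boiler releasing 1000 kg CO2, 0.5 kg CH4 (fossil) and 0.1 kg N2O:
total_kg_co2e = to_co2e_kg({"CO2": 1000.0, "CH4": 0.5, "N2O": 0.1})  # ≈ 1042.2 kg CO2e
```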
Scope 2: Indirect Emissions from Purchased Energy
Emissions associated with the generation of purchased electricity, heat, steam or cooling consumed by the organisation. The GHG Protocol Scope 2 Guidance (2015) requires reporting under two distinct methods: the location-based method (uses the emission factor of the electricity grid at the location of consumption, e.g. the average Italian grid factor) and the market-based method (uses factors from contractual instruments such as Guarantees of Origin (GOs), Renewable Energy Certificates (RECs), Power Purchase Agreements (PPAs) and supplier-specific contracts). Both must be reported.
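The dual-reporting logic can be sketched as a toy calculation (the grid and residual-mix factors below are illustrative placeholders, not official values):

```python
# Sketch: Scope 2 dual reporting. Factors in kgCO2e/kWh are illustrative only.
def scope2_dual_report(kwh: float, grid_factor: float, residual_mix_factor: float,
                       renewable_kwh_covered: float = 0.0) -> dict[str, float]:
    """Location-based applies the average grid factor to all consumption;
    market-based zeroes out consumption covered by GOs/RECs/PPAs and applies
    the residual-mix factor to the remainder. Results in tonnes CO2e."""
    location_t = kwh * grid_factor / 1000
    uncovered_kwh = max(kwh - renewable_kwh_covered, 0.0)
    market_t = uncovered_kwh * residual_mix_factor / 1000
    return {"location_based_t": round(location_t, 3), "market_based_t": round(market_t, 3)}

# 1 GWh consumption, 40% covered by Guarantees of Origin
print(scope2_dual_report(1_000_000, grid_factor=0.25, residual_mix_factor=0.45,
                         renewable_kwh_covered=400_000))
```

Note that the market-based figure can exceed the location-based one when the residual mix is dirtier than the grid average, which is why both numbers are mandatory.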
Scope 3: Other Indirect Emissions
Value chain emissions, divided into 15 upstream and downstream categories. These are typically the most significant (on average 70-80% of the total footprint) and the most difficult to measure. ESRS requires reporting of Scope 3 categories identified as "material" through the Double Materiality Assessment.
The 15 Scope 3 Categories
| # | Category | Type | Typical for |
|---|---|---|---|
| 1 | Purchased goods and services | Upstream | All sectors |
| 2 | Capital goods | Upstream | Manufacturing, construction |
| 3 | Fuel and energy related activities | Upstream | All sectors |
| 4 | Upstream transportation and distribution | Upstream | Retail, manufacturing |
| 5 | Waste generated in operations | Upstream | Manufacturing, food |
| 6 | Business travel | Upstream | Services, tech |
| 7 | Employee commuting | Upstream | All sectors |
| 8 | Upstream leased assets | Upstream | Real estate, retail |
| 9 | Downstream transportation | Downstream | Manufacturing, FMCG |
| 10 | Processing of sold products | Downstream | Raw materials, chemical |
| 11 | Use of sold products | Downstream | Automotive, electronics |
| 12 | End-of-life treatment | Downstream | Packaging, durable goods |
| 13 | Downstream leased assets | Downstream | Real estate |
| 14 | Franchises | Downstream | Food & beverage, retail |
| 15 | Investments | Downstream | Banks, investment funds |
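A common first pass over these categories is a spend-based screening to rank them for the materiality assessment; in the sketch below the EEIO-style intensity values are made-up placeholders:

```python
# Sketch: spend-based Scope 3 screening. Intensities (kgCO2e/EUR) are placeholders.
SPEND_INTENSITY = {
    "cat_1_purchased_goods": 0.45,
    "cat_2_capital_goods": 0.30,
    "cat_6_business_travel": 0.60,
}

def screen_scope3(spend_by_category: dict[str, float]) -> list[tuple[str, float]]:
    """Estimate tonnes CO2e per category from annual spend and rank descending,
    as input to the Double Materiality Assessment."""
    estimates = {
        cat: round(spend * SPEND_INTENSITY[cat] / 1000, 3)  # kgCO2e -> tCO2e
        for cat, spend in spend_by_category.items()
    }
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
```

Categories that screen as material then graduate from spend-based to activity-based or supplier-specific data in later reporting cycles.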
Data Model: Modelling the Carbon Accounting Domain
The heart of every carbon accounting platform is a robust data model that faithfully reflects the concepts of the GHG Protocol. Let us examine the main entities with their attributes and relationships.
Core Entities
A complete system requires at least six core entities: Organization, Facility, EmissionSource, EmissionFactor, Activity and Calculation. Here is the model in Python with SQLAlchemy:
# models/core.py - Complete GHG Protocol data model
from sqlalchemy import Column, String, Float, Enum, DateTime, ForeignKey, JSON, Integer
from sqlalchemy.orm import relationship, DeclarativeBase
from datetime import datetime
from enum import Enum as PyEnum
import uuid
class Base(DeclarativeBase):
pass
class ScopeType(PyEnum):
SCOPE_1 = "scope_1"
SCOPE_2_LOCATION = "scope_2_location"
SCOPE_2_MARKET = "scope_2_market"
SCOPE_3 = "scope_3"
class Scope3Category(PyEnum):
CAT_1_PURCHASED_GOODS = "cat_1"
CAT_2_CAPITAL_GOODS = "cat_2"
CAT_3_FUEL_ENERGY = "cat_3"
CAT_4_UPSTREAM_TRANSPORT = "cat_4"
CAT_5_WASTE = "cat_5"
CAT_6_BUSINESS_TRAVEL = "cat_6"
CAT_7_EMPLOYEE_COMMUTING = "cat_7"
CAT_8_UPSTREAM_LEASED = "cat_8"
CAT_9_DOWNSTREAM_TRANSPORT = "cat_9"
CAT_10_PROCESSING = "cat_10"
CAT_11_USE_OF_PRODUCTS = "cat_11"
CAT_12_END_OF_LIFE = "cat_12"
CAT_13_DOWNSTREAM_LEASED = "cat_13"
CAT_14_FRANCHISES = "cat_14"
CAT_15_INVESTMENTS = "cat_15"
class Organization(Base):
__tablename__ = "organizations"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
name = Column(String(255), nullable=False)
legal_entity_id = Column(String(100)) # LEI or VAT number
country_code = Column(String(3), nullable=False) # ISO 3166-1 alpha-3
nace_code = Column(String(10)) # NACE Rev.2 sector code
reporting_year = Column(Integer, nullable=False)
consolidation_approach = Column(String(50)) # equity_share, financial_control, operational_control
base_year = Column(Integer) # reference year for targets
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
facilities = relationship("Facility", back_populates="organization")
calculations = relationship("Calculation", back_populates="organization")
class Facility(Base):
__tablename__ = "facilities"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
organization_id = Column(String, ForeignKey("organizations.id"), nullable=False)
name = Column(String(255), nullable=False)
facility_type = Column(String(50)) # manufacturing_plant, office, warehouse, data_center
address = Column(String(500))
country_code = Column(String(3), nullable=False)
latitude = Column(Float)
longitude = Column(Float)
grid_region = Column(String(50)) # e.g. "IT_NORD", "DE_TENNET" for Scope 2
floor_area_sqm = Column(Float)
is_owned = Column(String(10)) # owned, leased, operated
operational_start = Column(DateTime)
organization = relationship("Organization", back_populates="facilities")
emission_sources = relationship("EmissionSource", back_populates="facility")
class EmissionSource(Base):
"""Specific emission source: boiler, vehicle, process, energy purchase"""
__tablename__ = "emission_sources"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
facility_id = Column(String, ForeignKey("facilities.id"), nullable=False)
name = Column(String(255), nullable=False)
source_type = Column(String(100)) # natural_gas_boiler, diesel_generator, company_car, electricity
scope = Column(Enum(ScopeType), nullable=False)
scope_3_category = Column(Enum(Scope3Category)) # only for Scope 3
fuel_type = Column(String(50)) # natural_gas, diesel, petrol, LPG
unit_of_measure = Column(String(20)) # kWh, liters, kg, km, tonne
description = Column(String(1000))
is_active = Column(String(10), default="true")
facility = relationship("Facility", back_populates="emission_sources")
activities = relationship("Activity", back_populates="emission_source")
class EmissionFactor(Base):
"""Emission factor from a certified database"""
__tablename__ = "emission_factors"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
source_database = Column(String(50), nullable=False) # DEFRA, EPA, ecoinvent, Climatiq, IPCC
activity_id = Column(String(200)) # specific ID from source database
name = Column(String(500), nullable=False)
category = Column(String(100))
region = Column(String(50)) # country/region code
year = Column(Integer, nullable=False)
unit_type = Column(String(50)) # kgCO2e/kWh, kgCO2e/liter, kgCO2e/km, kgCO2e/tonne
co2e_factor = Column(Float, nullable=False) # main value in kgCO2e
co2_factor = Column(Float) # CO2 separate
ch4_factor = Column(Float) # CH4 separate
n2o_factor = Column(Float) # N2O separate
gwp_version = Column(String(20), default="AR6") # AR5, AR6
lca_activity = Column(String(50)) # upstream, combustion, downstream
source_url = Column(String(500))
last_updated = Column(DateTime)
class Activity(Base):
"""Record of an activity with measured consumption"""
__tablename__ = "activities"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
emission_source_id = Column(String, ForeignKey("emission_sources.id"), nullable=False)
emission_factor_id = Column(String, ForeignKey("emission_factors.id"))
period_start = Column(DateTime, nullable=False)
period_end = Column(DateTime, nullable=False)
quantity = Column(Float, nullable=False)
unit = Column(String(20), nullable=False)
data_quality = Column(String(20), default="measured") # measured, estimated, calculated, supplier
data_source = Column(String(200)) # source system name: SAP, utility_bill, travel_tool
raw_data = Column(JSON) # original unprocessed data
notes = Column(String(1000))
created_by = Column(String(100))
created_at = Column(DateTime, default=datetime.utcnow)
emission_source = relationship("EmissionSource", back_populates="activities")
emission_factor = relationship("EmissionFactor")
calculation = relationship("Calculation", back_populates="activity", uselist=False)
class Calculation(Base):
"""Emission calculation result, immutable once created"""
__tablename__ = "calculations"
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
organization_id = Column(String, ForeignKey("organizations.id"), nullable=False)
activity_id = Column(String, ForeignKey("activities.id"), nullable=False)
scope = Column(Enum(ScopeType), nullable=False)
scope_3_category = Column(Enum(Scope3Category))
co2e_tonnes = Column(Float, nullable=False) # result in tonnes CO2e
co2_tonnes = Column(Float)
ch4_tonnes = Column(Float)
n2o_tonnes = Column(Float)
calculation_method = Column(String(50)) # spend_based, activity_based, hybrid, supplier_specific
emission_factor_value = Column(Float) # snapshot of factor used
emission_factor_unit = Column(String(50))
emission_factor_source = Column(String(100))
calculation_formula = Column(String(500)) # formula used, for audit
calculated_at = Column(DateTime, default=datetime.utcnow)
calculated_by = Column(String(100)) # user or automated system
version = Column(Integer, default=1) # versioning for recertification
is_verified = Column(String(10), default="false")
organization = relationship("Organization", back_populates="calculations")
activity = relationship("Activity", back_populates="calculation")
Emission Factor Databases: DEFRA, EPA, ecoinvent, Climatiq
Emission factors are the mathematical core of carbon accounting: they convert a quantity of activity (litres of diesel, kWh consumed, km travelled, euros spent) into tonnes of CO2e. The quality and currency of the factors directly determine the quality of the report.
The Main Databases
| Database | Manager | Coverage | Update Frequency | Access |
|---|---|---|---|---|
| DEFRA/BEIS | UK Govt (DESNZ) | UK + international, all scopes | Annual (July) | Free (Excel) |
| EPA GHG Hub | US EPA | USA, mobility, energy, Scope 3 | Annual (January) | Free (Excel) |
| ecoinvent | ecoinvent Association | Global, full LCA, 18,000+ datasets | Semi-annual | Paid licence |
| IPCC EF Database | IPCC | Global, national inventories | With each Assessment Report | Free |
| Climatiq API | Climatiq | Multi-source, 50,000+ factors | Continuous (real-time) | API (freemium) |
| AIB Residual Mix | Association of Issuing Bodies | Europe, Scope 2 market-based | Annual | Free |
| IEA Electricity | International Energy Agency | Global, electricity grid factors | Annual | Partially free |
Climatiq API Integration
Climatiq provides a REST API with more than 50,000 verified emission factors, ISO 14067 and GHG Protocol coverage, and is one of the most widely used solutions for integrating emission calculations programmatically. The API supports Scope 1, 2 and 3 with activity-based and spend-based methods.
# services/climatiq_client.py - Climatiq API Integration
import httpx
import os
from dataclasses import dataclass
from typing import Optional
from functools import lru_cache
CLIMATIQ_BASE_URL = "https://api.climatiq.io"
@dataclass
class EmissionEstimate:
co2e: float
co2e_unit: str
co2e_calculation_method: str
co2e_calculation_origin: str
emission_factor_name: str
emission_factor_id: str
source: str
year: int
region: str
@dataclass
class ActivityData:
activity_id: str
data: dict # { "energy": { "value": 1000, "energy_unit": "kWh" } }
region: Optional[str] = None
year: Optional[int] = None
class ClimatiqClient:
def __init__(self, api_key: Optional[str] = None):
self.api_key = api_key or os.environ.get("CLIMATIQ_API_KEY")
if not self.api_key:
raise ValueError("CLIMATIQ_API_KEY not set")
self.client = httpx.AsyncClient(
base_url=CLIMATIQ_BASE_URL,
headers={
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json"
},
timeout=30.0
)
async def estimate_emission(self, activity: ActivityData) -> EmissionEstimate:
"""Calculate emissions for a single activity"""
payload = {
"emission_factor": {
"activity_id": activity.activity_id,
},
**activity.data
}
if activity.region:
payload["emission_factor"]["region"] = activity.region
if activity.year:
payload["emission_factor"]["year"] = activity.year
response = await self.client.post("/estimate", json=payload)
response.raise_for_status()
result = response.json()
return EmissionEstimate(
co2e=result["co2e"],
co2e_unit=result["co2e_unit"],
co2e_calculation_method=result["co2e_calculation_method"],
co2e_calculation_origin=result.get("co2e_calculation_origin", ""),
emission_factor_name=result["emission_factor"]["name"],
emission_factor_id=result["emission_factor"]["activity_id"],
source=result["emission_factor"]["source"],
year=result["emission_factor"]["year"],
region=result["emission_factor"].get("region", "")
)
async def estimate_batch(self, activities: list[ActivityData]) -> list[EmissionEstimate]:
"""Batch calculation for up to 100 activities"""
payload = {
"requests": [
{
"emission_factor": {"activity_id": a.activity_id},
**a.data
}
for a in activities
]
}
response = await self.client.post("/batch", json=payload)
response.raise_for_status()
results = response.json()["results"]
return [
EmissionEstimate(
co2e=r["co2e"],
co2e_unit=r["co2e_unit"],
co2e_calculation_method=r["co2e_calculation_method"],
co2e_calculation_origin=r.get("co2e_calculation_origin", ""),
emission_factor_name=r["emission_factor"]["name"],
emission_factor_id=r["emission_factor"]["activity_id"],
source=r["emission_factor"]["source"],
year=r["emission_factor"]["year"],
region=r["emission_factor"].get("region", "")
)
for r in results
]
async def search_emission_factors(
self,
query: str,
region: Optional[str] = None,
year: Optional[int] = None,
source: Optional[str] = None
) -> list[dict]:
"""Search emission factors in the Climatiq database"""
params = {"query": query, "page": 1, "page_size": 20}
if region:
params["region"] = region
if year:
params["year"] = year
if source:
params["source"] = source
response = await self.client.get("/search", params=params)
response.raise_for_status()
return response.json()["results"]
async def __aenter__(self):
return self
async def __aexit__(self, *args):
await self.client.aclose()
# Usage example - Scope 2 location-based calculation for Italian plant
async def calculate_scope2_italy(kwh_consumed: float) -> float:
"""
Calculate Scope 2 location-based emissions for electricity consumption in Italy.
IEA 2024 factor for Italy: ~0.233 kgCO2e/kWh
"""
async with ClimatiqClient() as client:
activity = ActivityData(
activity_id="electricity-supply_grid-source_supplier_mix",
data={
"energy": {
"value": kwh_consumed,
"energy_unit": "kWh"
}
},
region="IT", # Italy
year=2024
)
estimate = await client.estimate_emission(activity)
# Convert kgCO2e to tonnes CO2e
return estimate.co2e / 1000 if estimate.co2e_unit == "kg" else estimate.co2e
Microservices Architecture for Carbon Accounting
An enterprise carbon accounting platform must manage heterogeneous data flows (energy bills, ERP data, expense reports, supplier data), complex and auditable calculations, and report generation in multiple formats. A microservices architecture is the natural choice for these requirements.
The Four Core Services
1. Data Collection Service
Responsible for acquiring data from diverse sources: ERP APIs (SAP, Oracle), utility bills via OCR/parser, travel management tools (Concur, TravelPerk), procurement data, manual CSV uploads, and supplier APIs. It exposes ingestion endpoints and handles normalisation of units of measure.
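A recurring task of this service is normalising units before data reaches the calculation engine; a minimal sketch (the unit set is illustrative, the conversion factors are standard metric values):

```python
# Sketch: unit normalisation to canonical platform units (unit set is illustrative).
CANONICAL_CONVERSIONS = {
    "MWh": ("kWh", 1000.0),
    "GJ": ("kWh", 1000 / 3.6),     # 1 kWh = 3.6 MJ
    "gallon_us": ("liter", 3.78541),
    "mile": ("km", 1.609344),
}

def normalize(quantity: float, unit: str) -> tuple[float, str]:
    """Convert to canonical units; unknown units pass through for manual review."""
    if unit in CANONICAL_CONVERSIONS:
        target_unit, factor = CANONICAL_CONVERSIONS[unit]
        return quantity * factor, target_unit
    return quantity, unit
```

Centralising conversions here keeps the calculation engine free of per-source unit logic and makes the conversion table itself auditable.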
2. Calculation Engine
The computational core: receives normalised activities, selects appropriate emission factors (local or via Climatiq API), applies GHG Protocol formulas, and produces immutable results with an explicit calculation formula for audit purposes. Supports historical recalculations when factors are updated.
3. Reporting Service
Generates reports in the formats required by frameworks: CSRD/ESRS (with XBRL tagging), CDP questionnaire (JSON/XML format), GRI Standards disclosures, internal reports for dashboards. Manages report versioning and digital signatures for assurance.
4. Audit Trail Service
Maintains an immutable log of every operation: who entered the data, which factor was used, when it was calculated, who approved it. Supports complete data lineage: from the source data to the final number in the report. Essential for assurance by auditors.
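One simple way to make such a log tamper-evident is to hash-chain the entries, so that altering any record invalidates every subsequent hash. A sketch, not the design of any specific product:

```python
# Sketch: hash-chained audit log. Any retroactive edit breaks verification.
import hashlib
import json
from datetime import datetime, timezone

def append_audit_entry(chain: list[dict], event: dict) -> dict:
    """Append an event whose hash covers its payload plus the previous entry's hash."""
    prev_hash = chain[-1]["entry_hash"] if chain else "GENESIS"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "prev_hash": prev_hash,
    }
    record["entry_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)
    return record

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; returns False if any entry was altered."""
    prev = "GENESIS"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

In practice the chain would live in an append-only store, but the verification logic auditors run is essentially this loop.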
Architecture Diagram
+---------------------------------------------------------------+
| CARBON ACCOUNTING PLATFORM |
+---------------------------------------------------------------+
| FRONTEND |
| +-------------+ +-------------+ +------------------------+ |
| | Dashboard | | Data Input | | Report Generator | |
| | (Angular) | | Wizard | | (CSRD/CDP/GRI) | |
| +------+------+ +------+------+ +------------+-----------+ |
+---------+----------------+-----------------------+------------+
| API GATEWAY (FastAPI/Kong) |
| | | | |
+---------+----------------+-----------------------+------------+
| MICROSERVICES |
| +------+------+ +------+------+ +----------------------+ |
| | Data | | Calculation | | Reporting | |
| | Collection |->| Engine |->| Service | |
| | Service | | (Python) | | (PDF/XBRL/JSON) | |
| +------+------+ +------+------+ +----------------------+ |
| | | |
| +------+------+ +------+------+ +----------------------+ |
| | SAP | | Climatiq | | Audit Trail | |
| | Connector | | API Client | | Service | |
| +-------------+ +-------------+ +----------------------+ |
+---------------------------------------------------------------+
| DATA LAYER |
| +--------------+ +-------------+ +----------------------+ |
| | PostgreSQL | | TimeSeries | | Object Storage | |
| | (Core data) | | (InfluxDB) | | (S3 - documents) | |
| +--------------+ +-------------+ +----------------------+ |
+---------------------------------------------------------------+
EXTERNAL INTEGRATIONS:
- SAP S/4HANA (procurement, energy data)
- Oracle NetSuite (financials for spend-based)
- TravelPerk/Concur (travel data - Cat.6)
- Climatiq API (emission factors)
- AIB Registry (GO certificates - Scope 2 market-based)
- Utility providers API (energy bills)
Calculation Engine: Python Implementation with FastAPI
The calculation engine is the most critical component. It must be precise, auditable, versioned and capable of handling thousands of calculations in parallel. Here is a complete implementation with FastAPI.
# calculation_engine/main.py - FastAPI Calculation Engine
from fastapi import FastAPI, HTTPException, BackgroundTasks, Depends
from pydantic import BaseModel, Field, validator
from typing import Optional, List
from datetime import datetime
from enum import Enum
import pandas as pd
import uuid
import asyncio
from decimal import Decimal, ROUND_HALF_UP
app = FastAPI(
title="Carbon Calculation Engine",
description="GHG Protocol-compliant emission calculation service",
version="2.1.0"
)
# ── Pydantic Models ──────────────────────────────────────────────
class ScopeEnum(str, Enum):
scope_1 = "scope_1"
scope_2_location = "scope_2_location"
scope_2_market = "scope_2_market"
scope_3 = "scope_3"
class CalculationMethod(str, Enum):
activity_based = "activity_based" # quantity * emission factor
spend_based = "spend_based" # spend in EUR * intensity factor
average_data = "average_data" # average over periods
supplier_specific = "supplier_specific" # direct data from suppliers
class ActivityInput(BaseModel):
activity_id: str = Field(..., description="Unique source activity ID")
emission_source_id: str
scope: ScopeEnum
scope_3_category: Optional[str] = None
period_start: datetime
period_end: datetime
quantity: float = Field(..., gt=0, description="Activity quantity")
unit: str = Field(..., description="Unit of measure (kWh, liter, km, tonne, EUR)")
emission_factor_id: Optional[str] = None # if None, engine auto-selects
calculation_method: CalculationMethod = CalculationMethod.activity_based
region: Optional[str] = None
data_quality: str = "measured"
@validator("quantity")
def quantity_must_be_positive(cls, v):
if v <= 0:
raise ValueError("Quantity must be positive")
return v
class CalculationResult(BaseModel):
calculation_id: str
activity_id: str
scope: ScopeEnum
co2e_tonnes: float
co2_tonnes: Optional[float]
ch4_tonnes: Optional[float]
n2o_tonnes: Optional[float]
emission_factor_value: float
emission_factor_unit: str
emission_factor_source: str
emission_factor_year: int
calculation_formula: str
calculation_method: str
uncertainty_percentage: Optional[float]
calculated_at: datetime
audit_reference: str
class BatchCalculationRequest(BaseModel):
organization_id: str
reporting_year: int
activities: List[ActivityInput] = Field(..., max_items=500)
class ScopeAggregation(BaseModel):
scope_1_tonnes: float
scope_2_location_tonnes: float
scope_2_market_tonnes: float
scope_3_tonnes: float
scope_3_by_category: dict
total_location_based: float
total_market_based: float
reporting_year: int
organization_id: str
# ── Calculation Logic ────────────────────────────────────────────
class EmissionCalculator:
"""
GHG Protocol Corporate Standard implementation.
Base formula: Emissions (kgCO2e) = Activity x Emission Factor
"""
    # GWP-100 values from IPCC AR6 (Sixth Assessment Report, 2021)
    GWP_AR6 = {
        "CO2": 1.0,
        "CH4": 29.8,       # fossil methane
        "CH4_bio": 27.0,   # non-fossil (biogenic) methane
        "N2O": 273.0,
        "SF6": 25200.0,
        "NF3": 17400.0,
        "HFC134a": 1530.0,
        "HFC32": 771.0,
    }
# Uncertainty factors by data quality (DEFRA methodology)
UNCERTAINTY_BY_DATA_QUALITY = {
"measured": 5.0, # directly measured data
"calculated": 10.0, # calculated from measurements
"estimated": 20.0, # estimates using proxy data
"spend_based": 35.0, # spend-based (less precise)
"default": 25.0,
}
def calculate_activity_based(
self,
quantity: float,
unit: str,
emission_factor: dict,
gwp_version: str = "AR6"
) -> dict:
"""
Activity-based calculation:
CO2e (kg) = Quantity (unit) x EF (kgCO2e/unit)
"""
ef_value = emission_factor["co2e_factor"]
ef_unit = emission_factor["unit_type"]
# Check unit compatibility
if not self._units_compatible(unit, ef_unit):
raise ValueError(
f"Incompatible units: activity in {unit}, "
f"factor in {ef_unit}"
)
co2e_kg = quantity * ef_value
# Calculate separate components if available
co2_kg = quantity * emission_factor.get("co2_factor", 0)
ch4_kg = quantity * emission_factor.get("ch4_factor", 0)
n2o_kg = quantity * emission_factor.get("n2o_factor", 0)
return {
"co2e_tonnes": round(co2e_kg / 1000, 6),
"co2_tonnes": round(co2_kg / 1000, 6) if co2_kg else None,
"ch4_tonnes": round(ch4_kg / 1000, 6) if ch4_kg else None,
"n2o_tonnes": round(n2o_kg / 1000, 6) if n2o_kg else None,
"formula": (
f"{quantity} {unit} x {ef_value} kgCO2e/{unit} "
f"= {co2e_kg:.4f} kgCO2e "
f"= {co2e_kg/1000:.6f} tCO2e"
)
}
def calculate_spend_based(
self,
spend_eur: float,
emission_intensity: float, # kgCO2e/EUR
currency: str = "EUR",
exchange_rate: float = 1.0
) -> dict:
"""
Spend-based calculation (Scope 3 Cat.1 when activity data is unavailable):
CO2e (kg) = Spend (EUR) x Intensity (kgCO2e/EUR)
"""
spend_normalized = spend_eur * exchange_rate
co2e_kg = spend_normalized * emission_intensity
return {
"co2e_tonnes": round(co2e_kg / 1000, 6),
"co2_tonnes": None,
"ch4_tonnes": None,
"n2o_tonnes": None,
"formula": (
f"{spend_normalized:.2f} EUR x {emission_intensity} kgCO2e/EUR "
f"= {co2e_kg:.4f} kgCO2e"
)
}
def _units_compatible(self, activity_unit: str, ef_unit: str) -> bool:
"""Check unit of measure compatibility"""
        ef_denominator = ef_unit.split("/")[-1].strip().lower() if "/" in ef_unit else ef_unit.strip().lower()
return activity_unit.lower() == ef_denominator
    def aggregate_by_scope(
        self,
        calculations: List[CalculationResult],
        reporting_year: int,
        organization_id: str
    ) -> ScopeAggregation:
        """Aggregate calculations by scope - uses pandas for performance"""
        df = pd.DataFrame([c.dict() for c in calculations])
        scope_1 = df[df["scope"] == "scope_1"]["co2e_tonnes"].sum()
        scope_2_loc = df[df["scope"] == "scope_2_location"]["co2e_tonnes"].sum()
        scope_2_mkt = df[df["scope"] == "scope_2_market"]["co2e_tonnes"].sum()
        scope_3_df = df[df["scope"] == "scope_3"]
        scope_3_total = scope_3_df["co2e_tonnes"].sum()
        scope_3_by_cat = {}
        if not scope_3_df.empty and "scope_3_category" in scope_3_df.columns:
            scope_3_by_cat = (
                scope_3_df.groupby("scope_3_category")["co2e_tonnes"]
                .sum()
                .to_dict()
            )
        return ScopeAggregation(
            scope_1_tonnes=round(scope_1, 3),
            scope_2_location_tonnes=round(scope_2_loc, 3),
            scope_2_market_tonnes=round(scope_2_mkt, 3),
            scope_3_tonnes=round(scope_3_total, 3),
            scope_3_by_category=scope_3_by_cat,
            total_location_based=round(scope_1 + scope_2_loc + scope_3_total, 3),
            total_market_based=round(scope_1 + scope_2_mkt + scope_3_total, 3),
            reporting_year=reporting_year,
            organization_id=organization_id
        )
calculator = EmissionCalculator()
# ── API Endpoints ────────────────────────────────────────────────
@app.post("/v1/calculate/single", response_model=CalculationResult)
async def calculate_single(activity: ActivityInput):
"""Calculate emissions for a single activity"""
    # Demo factor; in production this is resolved from the factor service / Climatiq API
    mock_ef = {
"co2e_factor": 0.233, # kgCO2e/kWh - Italian grid 2024
"co2_factor": 0.228,
"ch4_factor": 0.002,
"n2o_factor": 0.001,
"unit_type": "kgCO2e/kWh",
"source": "IEA 2024",
"year": 2024
}
result = calculator.calculate_activity_based(
quantity=activity.quantity,
unit=activity.unit,
emission_factor=mock_ef
)
calculation_id = str(uuid.uuid4())
uncertainty = calculator.UNCERTAINTY_BY_DATA_QUALITY.get(
activity.data_quality, 25.0
)
return CalculationResult(
calculation_id=calculation_id,
activity_id=activity.activity_id,
scope=activity.scope,
co2e_tonnes=result["co2e_tonnes"],
co2_tonnes=result["co2_tonnes"],
ch4_tonnes=result["ch4_tonnes"],
n2o_tonnes=result["n2o_tonnes"],
emission_factor_value=mock_ef["co2e_factor"],
emission_factor_unit=mock_ef["unit_type"],
emission_factor_source=mock_ef["source"],
emission_factor_year=mock_ef["year"],
calculation_formula=result["formula"],
calculation_method=activity.calculation_method.value,
uncertainty_percentage=uncertainty,
calculated_at=datetime.utcnow(),
audit_reference=f"CALC-{calculation_id[:8].upper()}"
)
@app.post("/v1/calculate/batch", response_model=List[CalculationResult])
async def calculate_batch(request: BatchCalculationRequest):
    """Calculate emissions for a batch of activities (max 500)"""
    if len(request.activities) > 500:
        # HTTPException is assumed imported from fastapi in the module header
        raise HTTPException(status_code=422, detail="Batch limited to 500 activities")
    tasks = [calculate_single(activity) for activity in request.activities]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    successful = [r for r in results if isinstance(r, CalculationResult)]
    failed = [r for r in results if isinstance(r, Exception)]
    if failed:
        # In production: log each failure with its activity_id; the batch
        # itself is never blocked by individual errors
        pass
    return successful
@app.get("/v1/organizations/{org_id}/summary", response_model=ScopeAggregation)
async def get_emissions_summary(org_id: str, year: int = 2024):
    """Emission summary by scope for an organisation"""
    # In production: load the stored calculations for org_id/year from the DB
    # and return calculator.aggregate_by_scope(...) over them. A bare `pass`
    # would return None and fail response-model validation.
    raise HTTPException(status_code=501, detail="Persistence layer omitted here")
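Before moving on to Scope 3, the aggregation logic of `aggregate_by_scope` can be exercised in isolation. This standalone sketch substitutes plain dicts for the Pydantic `CalculationResult` models; the values are illustrative, taken from examples used elsewhere in this article:

```python
import pandas as pd

# Plain dicts stand in for the Pydantic models; illustrative values
calculations = [
    {"scope": "scope_1", "co2e_tonnes": 10.5, "scope_3_category": None},
    {"scope": "scope_2_location", "co2e_tonnes": 23.3, "scope_3_category": None},
    {"scope": "scope_2_market", "co2e_tonnes": 19.75, "scope_3_category": None},
    {"scope": "scope_3", "co2e_tonnes": 35.0, "scope_3_category": "cat_1_purchased_goods"},
    {"scope": "scope_3", "co2e_tonnes": 0.14, "scope_3_category": "cat_6_business_travel"},
]

df = pd.DataFrame(calculations)
scope_totals = df.groupby("scope")["co2e_tonnes"].sum().to_dict()
scope_3_by_cat = (
    df[df["scope"] == "scope_3"]
    .groupby("scope_3_category")["co2e_tonnes"].sum().to_dict()
)
total_location = (scope_totals["scope_1"]
                  + scope_totals["scope_2_location"]
                  + scope_totals["scope_3"])
# total_location comes out at 68.94 tCO2e (10.5 + 23.3 + 35.14)
```

The same groupby pattern scales unchanged from five rows to the hundreds of thousands of activity records a real inventory produces.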
Scope 3 Automation: ERP, Procurement and Travel
Scope 3 is the greatest challenge in carbon accounting: data is scattered across dozens of systems, suppliers sit at very different maturity levels, and each of the 15 categories has its own calculation methodology. Automation is the only way to make it operationally sustainable.
SAP Integration for Scope 3 Category 1 (Purchased Goods)
# integrations/sap_connector.py - Extracting procurement data from SAP
import httpx
from dataclasses import dataclass
from typing import List, Optional
from datetime import date
import pandas as pd
@dataclass
class ProcurementRecord:
purchase_order_id: str
vendor_id: str
vendor_name: str
vendor_country: str
material_code: str
material_description: str
quantity: float
unit: str
amount_eur: float
nace_code: Optional[str] # vendor sector classification
delivery_date: date
class SAPConnector:
"""
SAP S/4HANA connector via OData API.
Extracts procurement data for Scope 3 Cat.1 and Cat.4.
"""
def __init__(self, base_url: str, client_id: str, client_secret: str):
self.base_url = base_url
self.client_id = client_id
self.client_secret = client_secret
self._token: Optional[str] = None
async def authenticate(self):
"""OAuth 2.0 client credentials for SAP"""
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.base_url}/oauth/token",
data={
"grant_type": "client_credentials",
"client_id": self.client_id,
"client_secret": self.client_secret,
}
)
response.raise_for_status()
self._token = response.json()["access_token"]
async def get_purchase_orders(
self,
year: int,
cost_center: Optional[str] = None
) -> List[ProcurementRecord]:
"""
Retrieve purchase orders from SAP MM module.
OData endpoint: /sap/opu/odata/sap/MM_PUR_PO_MANAGE_SRV/
"""
if not self._token:
await self.authenticate()
params = {
"$filter": f"PostingDate ge datetime'{year}-01-01T00:00:00' and "
f"PostingDate le datetime'{year}-12-31T23:59:59'",
"$select": "PurchaseOrder,Supplier,SupplierName,OrderedQuantity,"
"PurchaseOrderQuantityUnit,NetPriceAmount,Currency",
"$format": "json",
"$top": 5000
}
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self.base_url}/sap/opu/odata/sap/MM_PUR_PO_MANAGE_SRV/"
"A_PurchaseOrder",
headers={"Authorization": f"Bearer {self._token}"},
params=params
)
response.raise_for_status()
data = response.json()["d"]["results"]
return [
ProcurementRecord(
purchase_order_id=item["PurchaseOrder"],
vendor_id=item["Supplier"],
vendor_name=item["SupplierName"],
vendor_country="IT", # to enrich with master data
material_code=item.get("Material", ""),
material_description=item.get("MaterialName", ""),
quantity=float(item["OrderedQuantity"]),
unit=item["PurchaseOrderQuantityUnit"],
amount_eur=float(item["NetPriceAmount"]),
nace_code=None, # to be mapped to NACE classification
delivery_date=date.fromisoformat(item.get("ScheduleLine", f"{year}-12-31")[:10])
)
for item in data
]
def calculate_scope3_cat1_spend_based(
self,
records: List[ProcurementRecord],
emission_intensities: dict # { nace_code: kgCO2e_per_EUR }
) -> pd.DataFrame:
"""
Scope 3 Cat.1 calculation using spend-based method.
Use when activity data from suppliers is unavailable.
Factors from EXIOBASE or WIOD (World Input-Output Database).
"""
df = pd.DataFrame([vars(r) for r in records])
default_intensity = emission_intensities.get("DEFAULT", 0.35) # kgCO2e/EUR
df["emission_intensity"] = df["nace_code"].map(emission_intensities).fillna(default_intensity)
df["co2e_kg"] = df["amount_eur"] * df["emission_intensity"]
df["co2e_tonnes"] = df["co2e_kg"] / 1000
summary = df.groupby(["vendor_id", "vendor_name"]).agg(
total_spend_eur=("amount_eur", "sum"),
total_co2e_tonnes=("co2e_tonnes", "sum"),
transaction_count=("purchase_order_id", "count")
).reset_index()
summary["avg_intensity"] = summary["total_co2e_tonnes"] * 1000 / summary["total_spend_eur"]
return summary.sort_values("total_co2e_tonnes", ascending=False)
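The spend-based aggregation in `calculate_scope3_cat1_spend_based` can be sanity-checked with a couple of mock records. The NACE intensities below (0.62 kgCO2e/EUR for basic metals, 0.35 as default) are illustrative placeholders, not sourced EXIOBASE values:

```python
import pandas as pd

records = [
    {"purchase_order_id": "PO-001", "vendor_id": "V1", "vendor_name": "Acciai Srl",
     "amount_eur": 200_000.0, "nace_code": "C24"},
    {"purchase_order_id": "PO-002", "vendor_id": "V1", "vendor_name": "Acciai Srl",
     "amount_eur": 100_000.0, "nace_code": "C24"},
    {"purchase_order_id": "PO-003", "vendor_id": "V2", "vendor_name": "Logistica SpA",
     "amount_eur": 50_000.0, "nace_code": None},  # unmapped -> default intensity
]
intensities = {"C24": 0.62, "DEFAULT": 0.35}  # kgCO2e/EUR, illustrative

df = pd.DataFrame(records)
df["intensity"] = df["nace_code"].map(intensities).fillna(intensities["DEFAULT"])
df["co2e_tonnes"] = df["amount_eur"] * df["intensity"] / 1000
by_vendor = df.groupby("vendor_id")["co2e_tonnes"].sum()
# V1: 300,000 EUR x 0.62 / 1000 = 186.0 t; V2: 50,000 x 0.35 / 1000 = 17.5 t
```

Note how the unmapped vendor silently falls back to the default intensity: in a real pipeline that fallback should also raise a data quality flag, since it degrades the Scope 3 disclosure.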
# integrations/travel_connector.py - Travel data for Scope 3 Cat.6
@dataclass
class TravelRecord:
employee_id: str
travel_date: date
origin_iata: str # IATA airport code
destination_iata: str
transport_mode: str # air, rail, car, ferry
distance_km: float
travel_class: str # economy, business, first
booking_amount_eur: float
class TravelDataProcessor:
"""Processes business travel data for Scope 3 Cat.6"""
# Flight emission factors (kgCO2e/pkm) - DEFRA 2024
DEFRA_FLIGHT_FACTORS = {
("short_haul", "economy"): 0.151,
("short_haul", "business"): 0.227,
("medium_haul", "economy"): 0.131,
("medium_haul", "business"): 0.262,
("long_haul", "economy"): 0.195,
("long_haul", "business"): 0.585,
("long_haul", "first"): 0.780,
}
# Rail factors (kgCO2e/pkm) - European average
RAIL_FACTORS = {
"IT": 0.004, # Trenitalia (high renewables share)
"DE": 0.006,
"FR": 0.002,
"DEFAULT": 0.041,
}
def classify_flight(self, distance_km: float) -> str:
if distance_km < 1500:
return "short_haul"
elif distance_km < 4000:
return "medium_haul"
else:
return "long_haul"
def calculate_flight_emissions(self, record: TravelRecord) -> float:
"""
Flight emission calculation with Radiative Forcing Index (RFI = 1.9)
for non-CO2 effects at high altitude (DEFRA method with uplift factor)
"""
haul = self.classify_flight(record.distance_km)
travel_class = record.travel_class.lower() if record.travel_class else "economy"
key = (haul, travel_class)
base_factor = self.DEFRA_FLIGHT_FACTORS.get(
key,
self.DEFRA_FLIGHT_FACTORS[(haul, "economy")]
)
rfi_factor = 1.9
co2e_kg = record.distance_km * base_factor * rfi_factor
return co2e_kg / 1000 # in tonnes
def process_travel_data(self, records: List[TravelRecord]) -> dict:
"""Processes all travel data and returns summary for Cat.6"""
results = []
for record in records:
if record.transport_mode == "air":
co2e = self.calculate_flight_emissions(record)
method = "DEFRA 2024 with RFI=1.9"
            elif record.transport_mode == "rail":
                # Rough proxy only: IATA codes do not actually encode a country,
                # so unmatched prefixes fall back to the European average. In
                # production, resolve the country from station master data.
                country = record.origin_iata[:2]
                factor = self.RAIL_FACTORS.get(country, self.RAIL_FACTORS["DEFAULT"])
                co2e = record.distance_km * factor / 1000
                method = f"Rail factor {country}"
else:
co2e = record.distance_km * 0.171 / 1000 # car average
method = "DEFRA car average"
results.append({
"employee_id": record.employee_id,
"travel_date": record.travel_date,
"transport_mode": record.transport_mode,
"distance_km": record.distance_km,
"co2e_tonnes": co2e,
"method": method
})
df = pd.DataFrame(results)
return {
"total_co2e_tonnes": df["co2e_tonnes"].sum(),
"by_mode": df.groupby("transport_mode")["co2e_tonnes"].sum().to_dict(),
"by_month": df.groupby(df["travel_date"].apply(lambda x: x.month))["co2e_tonnes"].sum().to_dict(),
"total_km": df["distance_km"].sum(),
"record_count": len(df)
}
Automated Reporting: CSRD/ESRS, CDP and GRI
Generating regulatory reports is often the most time-consuming process in carbon accounting. Automation dramatically reduces the time from weeks to hours, increases consistency and ensures every figure is traceable to its source.
CSRD/ESRS E1 Report Structure (Climate Change)
The ESRS E1 standard dedicated to climate requires disclosure on: climate governance, strategy and scenario analysis, risk management, metrics and targets. Emission metrics are defined in ESRS E1-6 and require data for all three scopes.
# reporting/csrd_generator.py - CSRD/ESRS E1 report generation
from dataclasses import dataclass, asdict
from typing import List, Optional, Dict
from datetime import datetime
import json
@dataclass
class ESRS_E1_6_Disclosure:
"""
ESRS E1-6: Gross Scopes 1, 2 and 3 greenhouse gas emissions
Disclosure requirements per ESRS E1 (climate change)
"""
organization_name: str
legal_entity_identifier: str # LEI
reporting_period: str # e.g. "FY2024"
reporting_standard: str = "ESRS E1 - Climate Change"
# Scope 1 - Direct emissions
scope_1_total_gross_tco2e: float = 0.0
scope_1_breakdown_by_ghg: Dict[str, float] = None # { "CO2": x, "CH4": y ... }
scope_1_breakdown_by_source: Dict[str, float] = None
# Scope 2 - Indirect energy emissions
scope_2_location_based_tco2e: float = 0.0
scope_2_market_based_tco2e: float = 0.0
scope_2_purchased_electricity_kwh: float = 0.0
scope_2_renewable_electricity_percentage: float = 0.0
# Scope 3 - Value chain emissions
scope_3_total_tco2e: float = 0.0
scope_3_upstream_tco2e: float = 0.0
scope_3_downstream_tco2e: float = 0.0
scope_3_by_category: Dict[str, float] = None
# Emission intensity
revenue_intensity_tco2e_per_meur: Optional[float] = None # tCO2e/M EUR
employee_intensity_tco2e_per_fte: Optional[float] = None
# Prior year comparison and targets
base_year: int = 2020
scope_1_2_reduction_vs_base: Optional[float] = None # percentage
    sbti_target: Optional[str] = None  # e.g. "1.5°C aligned, -42% by 2030"
# Methodology
ghg_accounting_standard: str = "GHG Protocol Corporate Standard"
emission_factor_sources: List[str] = None
data_quality_notes: str = ""
assurance_level: str = "limited_assurance"
assurance_provider: str = ""
    def __post_init__(self):
        if self.scope_3_by_category is None:
            self.scope_3_by_category = {}
        if self.scope_1_breakdown_by_ghg is None:
            self.scope_1_breakdown_by_ghg = {}
        if self.scope_1_breakdown_by_source is None:
            self.scope_1_breakdown_by_source = {}
        if self.emission_factor_sources is None:
            self.emission_factor_sources = []
@property
def total_ghg_location_based(self) -> float:
return (self.scope_1_total_gross_tco2e +
self.scope_2_location_based_tco2e +
self.scope_3_total_tco2e)
@property
def total_ghg_market_based(self) -> float:
return (self.scope_1_total_gross_tco2e +
self.scope_2_market_based_tco2e +
self.scope_3_total_tco2e)
class CSRDReportGenerator:
def generate_esrs_e1_json(self, disclosure: ESRS_E1_6_Disclosure) -> str:
"""Generate ESRS E1-6 disclosure in structured JSON format"""
report = {
"metadata": {
"standard": disclosure.reporting_standard,
"generated_at": datetime.utcnow().isoformat(),
"reporting_period": disclosure.reporting_period,
"organization": disclosure.organization_name,
"lei": disclosure.legal_entity_identifier,
},
"ESRS_E1-6": {
"gross_scope_1_tco2e": disclosure.scope_1_total_gross_tco2e,
"scope_1_breakdown_by_ghg": disclosure.scope_1_breakdown_by_ghg,
"scope_2_location_based_tco2e": disclosure.scope_2_location_based_tco2e,
"scope_2_market_based_tco2e": disclosure.scope_2_market_based_tco2e,
"scope_2_purchased_electricity_kwh": disclosure.scope_2_purchased_electricity_kwh,
"scope_2_renewable_pct": disclosure.scope_2_renewable_electricity_percentage,
"scope_3_total_tco2e": disclosure.scope_3_total_tco2e,
"scope_3_by_category": disclosure.scope_3_by_category,
"total_ghg_location_based_tco2e": disclosure.total_ghg_location_based,
"total_ghg_market_based_tco2e": disclosure.total_ghg_market_based,
"ghg_intensity_revenue": disclosure.revenue_intensity_tco2e_per_meur,
"ghg_intensity_employee": disclosure.employee_intensity_tco2e_per_fte,
},
"methodology": {
"accounting_standard": disclosure.ghg_accounting_standard,
"emission_factor_sources": disclosure.emission_factor_sources,
"base_year": disclosure.base_year,
"consolidation_approach": "operational_control",
"data_quality": disclosure.data_quality_notes,
},
"assurance": {
"level": disclosure.assurance_level,
"provider": disclosure.assurance_provider,
},
"targets": {
"sbti_commitment": disclosure.sbti_target,
"scope_1_2_reduction_vs_base": disclosure.scope_1_2_reduction_vs_base,
}
}
return json.dumps(report, indent=2, ensure_ascii=False)
def generate_cdp_questionnaire_c6(self, disclosure: ESRS_E1_6_Disclosure) -> dict:
"""
Generate responses for CDP questionnaire - section C6 (Emissions Data).
CDP shares many requirements with CSRD, reducing double reporting burden.
"""
return {
"C6.1": {
"question": "Provide total gross global Scope 1 emissions in metric tons CO2e",
"response": disclosure.scope_1_total_gross_tco2e,
"unit": "metric tons CO2e"
},
"C6.2": {
"question": "Describe Scope 1 emissions by constituent gases",
"response": disclosure.scope_1_breakdown_by_ghg
},
"C6.3": {
"question": "Provide total gross global Scope 2 emissions",
"response": {
"location_based": disclosure.scope_2_location_based_tco2e,
"market_based": disclosure.scope_2_market_based_tco2e,
}
},
"C6.5": {
"question": "Account for Scope 3 emissions",
"response": {
"total": disclosure.scope_3_total_tco2e,
"categories": disclosure.scope_3_by_category
}
},
"C6.10": {
"question": "Describe Scope 1 and 2 GHG emissions by location",
"note": "See facility-level breakdown in supporting data"
}
}
Audit Trail and Data Lineage
With CSRD requiring limited assurance from a qualified auditor, the audit trail is not an optional requirement: it is the backbone of the platform. Every number in the report must be traceable back to its original source with the explicit calculation formula.
# audit/audit_trail.py - Immutable audit system
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional
import hashlib
import json
import uuid
@dataclass(frozen=True) # immutable
class AuditEvent:
"""
Immutable audit event. Never modified, only new events are appended.
The SHA256 hash guarantees the integrity of the chain.
"""
event_id: str
event_type: str # DATA_INGESTED, FACTOR_SELECTED, CALCULATION_PERFORMED, REPORT_GENERATED, APPROVED
entity_type: str # Activity, Calculation, EmissionFactor, Report
entity_id: str
actor_id: str # user_id or service_name for automated actions
actor_type: str # human, system
timestamp: str # ISO 8601 UTC
data_before: Optional[str] # JSON snapshot before change
data_after: Optional[str] # JSON snapshot after change
metadata: str # JSON { formula, ef_source, notes ... }
previous_hash: str # hash of previous event (blockchain-like)
event_hash: str # SHA256 of all fields
@classmethod
def create(
cls,
event_type: str,
entity_type: str,
entity_id: str,
actor_id: str,
actor_type: str = "human",
data_before: Optional[Dict] = None,
data_after: Optional[Dict] = None,
metadata: Optional[Dict] = None,
previous_hash: str = "GENESIS"
) -> "AuditEvent":
event_id = str(uuid.uuid4())
timestamp = datetime.utcnow().isoformat() + "Z"
data_before_str = json.dumps(data_before, sort_keys=True) if data_before else None
data_after_str = json.dumps(data_after, sort_keys=True) if data_after else None
metadata_str = json.dumps(metadata or {}, sort_keys=True)
hash_content = "|".join([
event_id, event_type, entity_type, entity_id,
actor_id, timestamp, data_after_str or "", previous_hash
])
event_hash = hashlib.sha256(hash_content.encode()).hexdigest()
return cls(
event_id=event_id,
event_type=event_type,
entity_type=entity_type,
entity_id=entity_id,
actor_id=actor_id,
actor_type=actor_type,
timestamp=timestamp,
data_before=data_before_str,
data_after=data_after_str,
metadata=metadata_str,
previous_hash=previous_hash,
event_hash=event_hash
)
class DataLineageTracker:
"""
Tracks the complete journey of a data point from source to report.
Allows auditors to verify every number in the final report.
"""
def __init__(self, db_session):
self.db = db_session
def trace_calculation(self, calculation_id: str) -> dict:
"""
Reconstructs the complete lineage of a calculation:
Report -> Calculation -> Activity -> EmissionFactor -> RawData -> Source System
"""
lineage = {
"calculation_id": calculation_id,
"trace": [
{
"step": 1,
"description": "Raw data ingested from SAP S/4HANA",
"system": "SAP_CONNECTOR",
"timestamp": "2025-01-15T08:30:00Z",
"data_summary": "Natural gas consumption: 5,420 m3, Jan 2024",
"audit_event_id": "AE-001"
},
{
"step": 2,
"description": "Unit conversion applied: m3 -> kWh",
"formula": "5,420 m3 x 10.55 kWh/m3 = 57,181 kWh",
"timestamp": "2025-01-15T08:31:00Z",
"audit_event_id": "AE-002"
},
{
"step": 3,
"description": "Emission factor selected from DEFRA 2024",
"emission_factor": "Natural gas - Gross CV: 0.18306 kgCO2e/kWh",
"factor_id": "DEFRA_2024_NG_GROSS",
"timestamp": "2025-01-15T08:31:05Z",
"audit_event_id": "AE-003"
},
{
"step": 4,
"description": "Emission calculation performed",
"formula": "57,181 kWh x 0.18306 kgCO2e/kWh = 10,467 kgCO2e = 10.467 tCO2e",
"result_tco2e": 10.467,
"scope": "Scope 1",
"timestamp": "2025-01-15T08:31:06Z",
"audit_event_id": "AE-004"
},
{
"step": 5,
"description": "Calculation included in FY2024 CSRD Report - ESRS E1-6",
"report_id": "RPT-CSRD-2024-001",
"approved_by": "CFO - Mario Rossi",
"timestamp": "2025-03-10T14:00:00Z",
"audit_event_id": "AE-089"
}
],
"data_quality_flag": "HIGH",
"uncertainty_pct": 5.0,
"verification_status": "verified_by_auditor"
}
return lineage
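An auditor (or a scheduled integrity job) can verify the hash chain by recomputing every event's digest. The helpers below (`make_event`, `verify_chain`) are an illustrative sketch mirroring `AuditEvent.create` above, not part of the article's codebase:

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type, entity_type, entity_id, actor_id,
               data_after=None, previous_hash="GENESIS"):
    """Build an event dict the same way AuditEvent.create does above."""
    event_id = str(uuid.uuid4())
    timestamp = datetime.now(timezone.utc).isoformat()
    data_after_str = json.dumps(data_after, sort_keys=True) if data_after else None
    content = "|".join([event_id, event_type, entity_type, entity_id,
                        actor_id, timestamp, data_after_str or "", previous_hash])
    return {
        "event_id": event_id, "event_type": event_type, "entity_type": entity_type,
        "entity_id": entity_id, "actor_id": actor_id, "timestamp": timestamp,
        "data_after": data_after_str, "previous_hash": previous_hash,
        "event_hash": hashlib.sha256(content.encode()).hexdigest(),
    }

def verify_chain(events) -> bool:
    """Recompute every SHA-256 and check the previous_hash links."""
    prev = "GENESIS"
    for e in events:
        content = "|".join([e["event_id"], e["event_type"], e["entity_type"],
                            e["entity_id"], e["actor_id"], e["timestamp"],
                            e["data_after"] or "", e["previous_hash"]])
        if e["previous_hash"] != prev:
            return False
        if hashlib.sha256(content.encode()).hexdigest() != e["event_hash"]:
            return False
        prev = e["event_hash"]
    return True

e1 = make_event("DATA_INGESTED", "Activity", "ACT-1", "sap_connector")
e2 = make_event("CALCULATION_PERFORMED", "Calculation", "CALC-1", "calc_engine",
                data_after={"co2e_tonnes": 10.467}, previous_hash=e1["event_hash"])
assert verify_chain([e1, e2])

e2["data_after"] = '{"co2e_tonnes": 0.0}'   # tampering breaks verification
assert not verify_chain([e1, e2])
```

The chain makes silent retroactive edits detectable, which is exactly the property a limited-assurance auditor needs to rely on the stored lineage.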
Enterprise Platform Comparison: Build vs Buy
Before building a custom platform, it is essential to evaluate the available enterprise solutions. The carbon accounting software market is maturing rapidly, and the leading platforms already cover the majority of standard use cases.
| Platform | Strength | Target Sector | Indicative Price | CSRD Ready |
|---|---|---|---|---|
| Persefoni | Investor-grade reporting, XBRL tagging, SEC-focused | Finance, Corporate | $50K-500K/year | Yes |
| Watershed | Speed to report, advanced Scope 3 tooling | Tech, Enterprise | $100K-1M/year | Yes |
| Sphera | Integrated LCA, industrial compliance, risk management | Manufacturing, Energy, Chemical | On request | Yes |
| Plan A | Simple UX, rapid implementation | SMEs, Mid-market | $10K-100K/year | Partial |
| IBM Envizi | 40,000+ emission factors, ERP integration | Enterprise, Utility | On request | Yes |
| Custom Build | Total flexibility, native integration with internal systems | Large companies with specific requirements | $500K-5M development | Depends |
When to Build Custom vs Buy
Buy a platform if: you have standard requirements, want to go live in 3-6 months, do not have a dedicated sustainability engineering team, and your data volume is manageable.
Build custom if: you have highly specific production processes with data not handled by standard platforms, want native integration with legacy systems, have data residency requirements incompatible with SaaS, or your data volume is in the order of millions of records per year (SaaS costs become prohibitive).
Case Study: Italian Manufacturing SME with CSRD 2025
An Italian manufacturing company in the precision mechanics sector with 1,200 employees, €350 million in revenue and plants in Turin, Milan and Brescia falls within Wave 1 of the CSRD and must submit its first report for financial year 2024 by June 2025.
GHG Emissions Inventory FY2024
# case_study/inventory_2024.py - Realistic GHG inventory example
# Representative data for an Italian manufacturing SME
INVENTORY_2024 = {
"organization": "MeccanicaPrecisione SpA",
"reporting_year": 2024,
"boundary": "Operational Control",
"currency": "EUR",
"scope_1": {
"natural_gas_combustion": {
"consumption_m3": 485_000, # industrial boilers
"consumption_kwh": 5_116_750, # conversion: 1 m3 NG = 10.55 kWh
"emission_factor_kgco2e_kwh": 0.18306, # DEFRA 2024
"co2e_tonnes": 937.0,
"source": "DEFRA 2024 - Natural Gas Gross CV"
},
"diesel_mobile": {
"consumption_liters": 42_000, # forklifts, operational vehicles
"emission_factor_kgco2e_liter": 2.56, # DEFRA 2024
"co2e_tonnes": 107.5,
"source": "DEFRA 2024 - Diesel"
},
"fugitive_refrigerants": {
"substance": "R-410A",
"kg_recharged": 45,
"gwp_ar6": 2088,
"co2e_tonnes": 94.0,
"source": "IPCC AR6 GWP100"
},
"total_co2e_tonnes": 1138.5
},
"scope_2": {
"purchased_electricity": {
"consumption_kwh": 8_250_000,
"location_based": {
"emission_factor": 0.233, # IEA Italy 2024 kgCO2e/kWh
"co2e_tonnes": 1922.3,
"source": "IEA 2024 Italy Grid"
},
"market_based": {
"go_certificates_kwh": 4_125_000, # 50% renewable with GO
"residual_mix_factor": 0.395, # AIB Italy residual mix 2024
"co2e_tonnes": 1629.4, # only on 50% not covered by GO
"source": "AIB Italy Residual Mix 2024"
}
}
},
"scope_3": {
"cat_1_purchased_goods": {
"total_spend_eur": 145_000_000,
"method": "spend_based + supplier_specific (top 20 suppliers)",
"co2e_tonnes": 28_450.0,
"data_quality": "mix: 35% supplier-specific, 65% spend-based"
},
"cat_4_upstream_transport": {
"tonne_km": 3_200_000,
"emission_factor": 0.089, # kgCO2e/tonne-km road freight DEFRA
"co2e_tonnes": 284.8,
},
"cat_5_waste": {
"waste_tonnes": 1_850,
"co2e_tonnes": 185.0,
"method": "waste-type specific factors"
},
"cat_6_business_travel": {
"total_km_air": 1_250_000,
"total_km_rail": 320_000,
"co2e_tonnes": 312.5,
"source": "DEFRA 2024 with RFI=1.9"
},
"cat_7_employee_commuting": {
"employees": 1_200,
"avg_km_per_day": 22,
"working_days": 220,
"mode_split": {"car_solo": 0.65, "car_sharing": 0.05,
"public_transport": 0.25, "cycling": 0.05},
"co2e_tonnes": 892.5
},
"cat_11_use_of_products": {
"units_sold": 45_000,
"avg_energy_use_kwh_per_unit_per_year": 850,
"product_lifetime_years": 15,
"co2e_tonnes_per_year": 2_250.0,
"note": "Calculated per year of use, not lifetime"
},
"total_co2e_tonnes": 32_374.8
},
"summary": {
"scope_1_tco2e": 1_138.5,
"scope_2_location_tco2e": 1_922.3,
"scope_2_market_tco2e": 1_629.4,
"scope_3_tco2e": 32_374.8,
"total_location_based_tco2e": 35_435.6,
"total_market_based_tco2e": 35_142.7,
"intensity_tco2e_per_meur_revenue": 101.3,
"intensity_tco2e_per_fte": 29.5,
"scope_3_percentage_of_total": 91.4, # typical in manufacturing
}
}
Case Study Observations
The most significant finding is that Scope 3 accounts for 91.4% of total emissions, with Category 1 (raw materials and components) alone making up 80% of Scope 3. This is typical of the manufacturing sector and explains why CSRD requires supply chain reporting: without Scope 3, carbon accounting would cover less than 10% of the actual impact.
The difference between Scope 2 location-based (1,922 tCO2e) and market-based (1,629 tCO2e) is due to the purchase of renewable energy certificates (Guarantees of Origin - GOs) for 50% of electricity consumption. The Italian residual mix for 2024 (0.395 kgCO2e/kWh) is higher than the average grid factor (0.233 kgCO2e/kWh): this is counterintuitive but methodologically correct, because the residual mix excludes energy already certified with GOs and therefore "contains" a higher share of high-intensity sources.
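The dual Scope 2 figures can be reproduced directly from the inventory data above:

```python
consumption_kwh = 8_250_000
go_covered_kwh = 4_125_000           # 50% covered by Guarantees of Origin

# Location-based: the average grid factor applies to all consumption
location_tco2e = consumption_kwh * 0.233 / 1000                     # IEA Italy 2024

# Market-based: GO-covered kWh count as zero, the rest takes the residual mix
market_tco2e = (consumption_kwh - go_covered_kwh) * 0.395 / 1000    # AIB 2024

# ~1,922 t location-based vs ~1,629 t market-based, matching the inventory
```

Despite the higher residual-mix factor, the market-based total is lower because it applies to only half the consumption; buying GOs for more than ~41% of consumption is the break-even point at these two factors.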
Testing: Validating Calculations
Emission calculations must be tested with the same rigour as financial code. An error in a conversion factor can lead to errors in the order of hundreds of tonnes of CO2e.
# tests/test_calculation_engine.py
import pytest
from decimal import Decimal
from calculation_engine.main import EmissionCalculator
@pytest.fixture
def calculator():
return EmissionCalculator()
class TestScope1Calculations:
def test_natural_gas_combustion_scope1(self, calculator):
"""
Test natural gas calculation with DEFRA 2024 factor.
Expected value: 5000 kWh x 0.18306 kgCO2e/kWh = 915.3 kgCO2e = 0.9153 tCO2e
"""
emission_factor = {
"co2e_factor": 0.18306,
"co2_factor": 0.18207,
"ch4_factor": 0.000291,
"n2o_factor": 0.000087,
"unit_type": "kgCO2e/kWh",
"source": "DEFRA 2024",
"year": 2024
}
result = calculator.calculate_activity_based(
quantity=5000,
unit="kWh",
emission_factor=emission_factor
)
assert abs(result["co2e_tonnes"] - 0.9153) < 0.001, (
f"Scope 1 natural gas: expected 0.9153, got {result['co2e_tonnes']}"
)
assert result["co2_tonnes"] is not None
assert "formula" in result
def test_diesel_combustion_scope1(self, calculator):
"""
Test diesel calculation.
DEFRA 2024 factor: 2.56179 kgCO2e/liter
1000 litres -> 2561.79 kgCO2e -> 2.56179 tCO2e
"""
emission_factor = {
"co2e_factor": 2.56179,
"co2_factor": 2.51476,
"ch4_factor": 0.00179,
"n2o_factor": 0.00524,
"unit_type": "kgCO2e/liter",
"source": "DEFRA 2024",
"year": 2024
}
result = calculator.calculate_activity_based(
quantity=1000,
unit="liter",
emission_factor=emission_factor
)
expected = 2561.79 / 1000
assert abs(result["co2e_tonnes"] - expected) < 0.001
def test_unit_incompatibility_raises_error(self, calculator):
"""Test that incompatible units raise an error"""
emission_factor = {
"co2e_factor": 0.18306,
"unit_type": "kgCO2e/kWh",
}
with pytest.raises(ValueError, match="Incompatible"):
calculator.calculate_activity_based(
quantity=1000,
unit="liter", # incompatible with kWh
emission_factor=emission_factor
)
class TestScope2Calculations:
def test_scope2_location_based_italy_2024(self, calculator):
"""
Italian electricity grid factor IEA 2024: 0.233 kgCO2e/kWh
100,000 kWh -> 23,300 kgCO2e -> 23.3 tCO2e
"""
emission_factor = {
"co2e_factor": 0.233,
"unit_type": "kgCO2e/kWh",
"source": "IEA 2024 Italy",
"year": 2024
}
result = calculator.calculate_activity_based(
quantity=100_000,
unit="kWh",
emission_factor=emission_factor
)
assert abs(result["co2e_tonnes"] - 23.3) < 0.01
def test_scope2_market_based_with_go_certificates(self, calculator):
"""
With GO certificates for renewable energy: factor = 0 for covered portion.
Italian residual mix 2024: 0.395 kgCO2e/kWh for uncovered portion.
50% renewables (GO): 50,000 kWh x 0 = 0
50% residual mix: 50,000 kWh x 0.395 = 19,750 kgCO2e = 19.75 tCO2e
"""
emission_factor_residual = {
"co2e_factor": 0.395,
"unit_type": "kgCO2e/kWh",
"source": "AIB Italy Residual Mix 2024",
"year": 2024
}
non_go_kwh = 50_000 # 50% not covered by GOs
result = calculator.calculate_activity_based(
quantity=non_go_kwh,
unit="kWh",
emission_factor=emission_factor_residual
)
assert abs(result["co2e_tonnes"] - 19.75) < 0.01
class TestScope3Calculations:
def test_business_travel_air_short_haul_economy(self):
"""Test short-haul flight emissions with RFI"""
from integrations.travel_connector import TravelDataProcessor, TravelRecord
from datetime import date
processor = TravelDataProcessor()
record = TravelRecord(
employee_id="EMP001",
travel_date=date(2024, 3, 15),
origin_iata="LIN",
destination_iata="FCO",
transport_mode="air",
distance_km=490,
travel_class="economy",
booking_amount_eur=180
)
        # 490 km x 0.151 kgCO2e/pkm x 1.9 RFI = 140.581 kgCO2e = 0.140581 tCO2e
co2e = processor.calculate_flight_emissions(record)
expected = 490 * 0.151 * 1.9 / 1000
assert abs(co2e - expected) < 0.001
def test_spend_based_with_nace_intensity(self, calculator):
"""Test spend-based calculation for Category 1"""
result = calculator.calculate_spend_based(
spend_eur=100_000,
emission_intensity=0.35 # kgCO2e/EUR - metal manufacturing
)
# 100,000 EUR x 0.35 kgCO2e/EUR = 35,000 kgCO2e = 35.0 tCO2e
assert abs(result["co2e_tonnes"] - 35.0) < 0.01
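Factor tables lend themselves to table-driven checks. The same expectations used in the pytest classes above can be expressed compactly as plain asserts (in the real suite this would be a `@pytest.mark.parametrize` over the calculator):

```python
import math

# (quantity, factor in kgCO2e per unit, expected tCO2e) - figures from the tests above
CASES = [
    (5_000, 0.18306, 0.9153),     # natural gas, kWh, DEFRA 2024
    (1_000, 2.56179, 2.56179),    # diesel, litres, DEFRA 2024
    (100_000, 0.233, 23.3),       # electricity, IEA Italy 2024
    (50_000, 0.395, 19.75),       # residual mix, AIB Italy 2024
]

def activity_based_tco2e(quantity: float, factor_kgco2e: float) -> float:
    return quantity * factor_kgco2e / 1000

for quantity, factor, expected in CASES:
    assert math.isclose(activity_based_tco2e(quantity, factor), expected, abs_tol=1e-3)
```

Every time a factor database is updated, regenerating this table from the new release catches regressions in both the loader and the calculation path.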
Science Based Targets (SBTi) and Dashboard
The Science Based Targets initiative (SBTi) has defined precise criteria for emission reduction targets aligned with the Paris Agreement. The SBTi corporate standard requires: a 42% reduction in Scope 1+2 by 2030 (2020 baseline) for a 1.5°C scenario, and Scope 3 coverage if it represents more than 40% of total emissions (almost always true in manufacturing).
The platform dashboard must show not only current emissions, but the path towards the target: the SBTi required reduction curve, actual year-on-year emissions, and the projection based on planned reduction initiatives (PPAs for renewable energy, fleet electrification, process efficiency improvements).
# dashboard/sbti_tracker.py - SBTi target tracking
from dataclasses import dataclass
from typing import List, Dict
import pandas as pd
import numpy as np
@dataclass
class SBTiTarget:
organization_id: str
base_year: int
base_year_scope_1_2_tco2e: float
base_year_scope_3_tco2e: float
target_year: int = 2030
scope_12_reduction_pct: float = 42.0 # % reduction vs base year
scope_3_reduction_pct: float = 25.0 # % reduction vs base year
scenario: str = "1.5C"
@property
def scope_12_target_tco2e(self) -> float:
return self.base_year_scope_1_2_tco2e * (1 - self.scope_12_reduction_pct / 100)
@property
def annual_reduction_rate(self) -> float:
"""Required annual linear reduction rate"""
years = self.target_year - self.base_year
return self.scope_12_reduction_pct / years
def build_sbti_trajectory(target: SBTiTarget, actuals: Dict[int, float]) -> pd.DataFrame:
"""
Builds comparison table between SBTi trajectory and actual emissions.
actuals: { year: actual Scope1+2 tCO2e }
"""
years = range(target.base_year, target.target_year + 1)
trajectory = []
for year in years:
years_elapsed = year - target.base_year
total_years = target.target_year - target.base_year
required_reduction_pct = (years_elapsed / total_years) * target.scope_12_reduction_pct
sbti_budget = target.base_year_scope_1_2_tco2e * (1 - required_reduction_pct / 100)
actual = actuals.get(year)
on_track = actual <= sbti_budget if actual is not None else None
trajectory.append({
"year": year,
"sbti_budget_tco2e": round(sbti_budget, 1),
"actual_tco2e": actual,
            "gap_tco2e": round(actual - sbti_budget, 1) if actual is not None else None,
"on_track": on_track
})
df = pd.DataFrame(trajectory)
return df
# Usage for MeccanicaPrecisione SpA
target = SBTiTarget(
organization_id="meccanica-precisione-spa",
base_year=2020,
base_year_scope_1_2_tco2e=3_850.0, # 2020 baseline
base_year_scope_3_tco2e=35_000.0,
)
actuals_scope_12 = {
2020: 3850.0,
2021: 3720.0,
2022: 3650.0,
2023: 3320.0,
    2024: 3060.8,  # scope_1 + scope_2_location from our inventory
}
trajectory = build_sbti_trajectory(target, actuals_scope_12)
print(trajectory.to_string(index=False))
# Output:
# year  sbti_budget_tco2e  actual_tco2e  gap_tco2e on_track
# 2020             3850.0        3850.0        0.0     True
# 2021             3688.3        3720.0       31.7    False
# 2022             3526.6        3650.0      123.4    False
# 2023             3364.9        3320.0      -44.9     True
# 2024             3203.2        3060.8     -142.4     True
# (2025-2030 rows carry the budget only; actuals are not yet available)
# -> Behind the 4.2%/year curve in 2021-2022, back on track from 2023 onwards
Best Practices and Anti-Patterns
Best Practices
- Immutability of calculations: Once calculated and saved, an emission must never be modified. If factors change, create a new version of the calculation. The audit trail must show both versions.
- Report both Scope 2 methods: Location-based AND market-based are both mandatory under ESRS and the GHG Protocol. Do not report only market-based because it is lower.
- Document Scope 3 materiality: You do not need to calculate all 15 Scope 3 categories. But you must demonstrate through a materiality analysis why you have included or excluded each category. This is an explicit ESRS requirement.
- Emission factor versioning: Factors change every year. Maintain a snapshot of the factor used for each calculation. Do not update retroactively without producing an explicit recalculation with a new version.
- Data quality flags: Classify each data point by quality (measured, calculated, estimated, spend-based). ESRS requires declaration of Scope 3 data quality. Lower-quality data may be used but must be disclosed.
- Boundary documentation: Explicitly document the entities included in the reporting boundary and those excluded, with justification (threshold <5%, data unavailable, etc.).
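The first and fourth best practices, immutable calculations and factor versioning, can be sketched as an append-only calculation log: recalculating with a new factor appends a new version and never touches the old row. The names below (`EmissionCalc`, `CalcStore`) and the DEFRA factor IDs and values are illustrative, not part of any platform described above:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen=True: a record can never be mutated after creation
class EmissionCalc:
    activity_id: str
    factor_id: str       # snapshot reference to the factor used, e.g. "DEFRA-2024-diesel"
    factor_value: float  # kgCO2e per unit, frozen at calculation time
    quantity: float
    version: int
    calculated_at: str

    @property
    def kgco2e(self) -> float:
        return self.factor_value * self.quantity

class CalcStore:
    """Append-only log: recalculations add versions, nothing is ever deleted."""
    def __init__(self) -> None:
        self._log: list[EmissionCalc] = []

    def record(self, activity_id: str, factor_id: str,
               factor_value: float, quantity: float) -> EmissionCalc:
        version = 1 + sum(1 for c in self._log if c.activity_id == activity_id)
        calc = EmissionCalc(activity_id, factor_id, factor_value, quantity,
                            version, datetime.now(timezone.utc).isoformat())
        self._log.append(calc)
        return calc

    def current(self, activity_id: str) -> EmissionCalc:
        """Latest version — what goes into the report."""
        return max((c for c in self._log if c.activity_id == activity_id),
                   key=lambda c: c.version)

    def history(self, activity_id: str) -> list[EmissionCalc]:
        """All versions — what the auditor sees."""
        return [c for c in self._log if c.activity_id == activity_id]

# A 2024 factor update triggers an explicit recalculation, not an overwrite
store = CalcStore()
store.record("diesel-2024-q1", "DEFRA-2023-diesel", 2.66, 10_000)  # v1
store.record("diesel-2024-q1", "DEFRA-2024-diesel", 2.62, 10_000)  # v2
```

Because every record carries its own factor snapshot, the audit trail can always answer "which coefficient produced this number?" without consulting the live factor database.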
Anti-Patterns to Avoid
- Using only Scope 2 market-based: The GHG Protocol requires both. Reporting only market-based (typically lower thanks to Guarantees of Origin or PPAs) without location-based is methodologically incorrect and potential greenwashing.
- Outdated emission factors: Using factors that are 5+ years old introduces significant errors, especially for the electricity grid which changes every year as renewables grow.
- Duplicating emissions in consolidation: If a subsidiary calculates its own emissions and the parent company includes them in the consolidated report, the equity share or operational control method must be applied consistently.
- Ignoring uncertainty: All emission calculations carry uncertainty, especially Scope 3. Do not present numbers without indicating the confidence range and the method used.
- Reports without assurance: With CSRD, reports without limited assurance do not meet regulatory requirements for Wave 1 entities. Plan for assurance from year one, not as an afterthought.
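A cheap guard against the first anti-pattern is to make the Scope 2 function return both methods together, so neither figure can be dropped. This is a minimal sketch; the factor values in the usage example are illustrative, not official grid or residual-mix factors:

```python
def scope2_dual(kwh: float,
                grid_factor_kg_per_kwh: float,
                contractual_factor_kg_per_kwh: float) -> dict:
    """Return BOTH Scope 2 figures in tCO2e, as the GHG Protocol and ESRS require.

    grid_factor: location-based, average emission intensity of the national grid.
    contractual_factor: market-based, reflecting GOs/PPAs plus residual mix.
    """
    return {
        "location_based_tco2e": round(kwh * grid_factor_kg_per_kwh / 1000, 1),
        "market_based_tco2e": round(kwh * contractual_factor_kg_per_kwh / 1000, 1),
    }

# 4.2 GWh with an illustrative grid factor (0.31 kgCO2e/kWh) and a lower
# contractual factor (0.12 kgCO2e/kWh) thanks to a partial PPA
report = scope2_dual(4_200_000, 0.31, 0.12)
# Both figures go into the report — never only the lower one
```

Returning a single dict with both keys makes the dual disclosure the path of least resistance in the reporting layer.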
Conclusions and Implementation Roadmap
Designing an enterprise-grade carbon accounting platform is a complex project requiring multidisciplinary expertise. But the conceptual structure is clear: the GHG Protocol provides the accounting framework, emission factor databases (DEFRA, EPA, Climatiq) provide the coefficients, the microservices architecture ensures scalability and maintainability, and the immutable audit trail guarantees credibility for CSRD.
For an SME approaching CSRD in 2025-2026, the practical recommendation is to start with a mature SaaS platform (Persefoni, Plan A or Watershed) for the first two years, collect real data, understand data gaps, and only then evaluate whether to build a custom solution or remain on SaaS. The vast majority of organisations have no need to build from scratch.
For engineering teams building ESG platforms as a product, the concepts illustrated in this article — GHG Protocol data model, auditable calculation engine, integration with emission factor APIs and regulatory report generation — are the foundations on which to build. The carbon accounting software market will grow at 27% annually through 2030: there is room for specialised solutions, particularly for industrial sectors with process data requirements not covered by generalist platforms.
Recommended Implementation Roadmap
| Phase | Duration | Objective | Output |
|---|---|---|---|
| Phase 1 | 1-2 months | Double Materiality Assessment + boundary definition | List of material Scope 3 categories, reporting boundary |
| Phase 2 | 2-3 months | Scope 1 and 2 data collection | Automated pipeline with utility bills, SAP, SCADA |
| Phase 3 | 3-4 months | Material Scope 3 categories (spend-based) | Scope 3 calculations for cat. 1, 4, 6, 7, 11 |
| Phase 4 | 1-2 months | CSRD/ESRS E1 report and assurance | Report ready for limited assurance |
| Phase 5 | Ongoing | Data quality improvement and supplier engagement | Reduction of spend-based, increase in supplier-specific Scope 3 |
Essential Resources
- GHG Protocol Corporate Standard: ghgprotocol.org - the reference standard, free to download
- DEFRA Emission Factors 2024: gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2024
- Climatiq API: climatiq.io/docs - complete documentation and free quickstart
- ESRS E1 Standard: efrag.org - European Financial Reporting Advisory Group, final ESRS standards
- SBTi Corporate Manual: sciencebasedtargets.org - guide to setting science-based targets
- ecoinvent Database: ecoinvent.org - LCA database for Scope 3 with detailed process data
Next Articles in the EnergyTech Series
The next article will explore Energy Digital Twins: how to create virtual replicas of industrial plants to simulate emission reduction scenarios before implementing them in the physical world. An increasingly central technology in the decarbonisation strategies of major industries.
For a deeper look at AI technologies applied to business data, explore the Data Warehouse, AI and Digital Transformation series, in particular the articles on Data Governance and Data Quality for Reliable AI and on MLOps for Business, which cover topics directly applicable to managing emission calculation models in production.