Ciao! Sono

Federico Calò

Sviluppatore Software | Divulgatore Tecnico

Creo applicazioni web moderne e strumenti digitali personalizzati per aiutare le attività a crescere attraverso l'innovazione tecnologica. La mia passione è unire informatica ed economia per generare valore reale.

Contattami

Chi Sono

La mia passione per l'informatica è nata tra i banchi dell'Istituto Tecnico Commerciale di Maglie, dove ho scoperto il potere della programmazione e il fascino di creare soluzioni digitali. Fin da subito, ho capito che l'informatica non era solo codice, ma uno strumento straordinario per trasformare idee in realtà.

Durante gli studi superiori in Sistemi Informativi Aziendali, ho iniziato a intrecciare informatica ed economia, comprendendo come la tecnologia possa essere il motore della crescita per qualsiasi attività. Questa visione mi ha accompagnato all'Università degli Studi di Bari, dove ho conseguito la Laurea in Informatica, approfondendo le mie competenze tecniche e la mia passione per lo sviluppo software.

Oggi metto questa esperienza al servizio di imprese, professionisti e startup, creando soluzioni digitali su misura che automatizzano processi, ottimizzano risorse e aprono nuove opportunità di business. Perché la vera innovazione inizia quando la tecnologia incontra le esigenze reali delle persone.

Le Mie Competenze

Analisi Dati & Modelli Previsionali

Trasformo i dati in insights strategici con analisi approfondite e modelli predittivi per decisioni informate

Automazione Processi

Creo strumenti personalizzati che automatizzano operazioni ripetitive e liberano tempo per attività a valore aggiunto

Sistemi Custom

Sviluppo sistemi software su misura, dalle integrazioni tra piattaforme alle dashboard personalizzate

const federico = {
  nome: "Federico Calò",
  ruolo: "Sviluppatore Software",
  città: "Bari, Italia",
  missione: "Aiutare attraverso l'informatica",
  passioni: [
    "Codice Pulito",
    "Innovazione",
    "Crescita Continua"
  ]
};

La Mia Missione

Credo fermamente che l'informatica sia lo strumento più potente per trasformare le idee in realtà e migliorare la vita delle persone.

🚀

Democratizzare la Tecnologia

La mia missione è rendere l'informatica accessibile a tutti: dalle piccole imprese locali alle startup innovative, fino ai professionisti che vogliono digitalizzare la propria attività. Ogni realtà merita di sfruttare le potenzialità del digitale.

💡

Unire Informatica ed Economia

Non è solo questione di scrivere codice: è capire come la tecnologia possa generare valore reale. Intrecciando competenze informatiche e visione economica, aiuto le attività a crescere, ottimizzare processi e raggiungere nuovi traguardi di efficienza e redditività.

🎯

Creare Soluzioni su Misura

Ogni attività è unica, e così devono esserlo le soluzioni. Sviluppo strumenti personalizzati che rispondono alle esigenze specifiche di ciascun cliente, automatizzando processi ripetitivi e liberando tempo per ciò che conta davvero: far crescere il business.

Trasforma la Tua Attività con la Tecnologia

Dicembre 2024

Visualizza

Master SQL

RoadMap.sh

Novembre 2024

Visualizza

Oracle Certified Foundations Associate

Oracle

Ottobre 2024

Visualizza

People Leadership Credential

Connect

Settembre 2024

💻 Linguaggi & Tecnologie

☕Java

🐍Python

📜JavaScript

🅰️Angular

⚛️React

🔷TypeScript

🗄️SQL

🐘PHP

🎨CSS/SCSS

🔧Node.js

🐳Docker

🌿Git

💼

12/2024 - Presente

Custom Software Engineering Analyst

Accenture

Bari, Puglia, Italia · Ibrida Analisi e sviluppo di sistemi informatici attraverso l'utilizzo di Java e Quarkus in Health and Public Sector. Formazione continua su tecnologie moderne per la creazione di soluzioni software personalizzate ed efficienti e sugli agenti.

💼

06/2022 - 12/2024

Analista software e Back End Developer Associate Consultant

Links Management and Technology SpA

Esperienza nell'analisi di sistemi software as-is e flussi ETL utilizzando PowerCenter. Formazione completata su Spring Boot per lo sviluppo di applicazioni backend moderne e scalabili. Sviluppatore Backend specializzato in Spring Boot, con esperienza in progettazione di database, analisi, sviluppo e testing dei task assegnati.

💼

02/2021 - 10/2021

Programmatore software

Adesso.it (prima era WebScience srl)

Esperienza nell'analisi AS-IS e TO-BE, evoluzioni SEO ed evoluzioni website per migliorare le performance e l'engagement degli utenti.

🎓

2018 - 2025

Laurea in Informatica

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Sistemi Informativi Aziendali

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.

Contattami

Hai un progetto in mente? Parliamone! Compila il form qui sotto e ti risponderò al più presto.

* Campi obbligatori. I tuoi dati saranno utilizzati solo per rispondere alla tua richiesta.

Battery Management System for Grid-Scale Storage

On January 13, 2025, the Moss Landing BESS facility in California caught fire. The 300 MW / 1.2 GWh installation, one of the largest in the world, forced the evacuation of 1,500 residents and burned for days before being contained. This was not an isolated incident: in September 2024 a 30 MW plant in Escondido, California, had already exhibited the same cascading thermal runaway dynamics, and in May 2024 the Gateway Energy Storage project in San Diego involved 15,000 NMC cells in a fire that lasted 13 hours.

These incidents do not call grid-scale energy storage into question. They call into question the quality of the Battery Management System: the software and hardware brain that governs every aspect of a battery, from state-of-charge estimation to thermal runaway prevention. A well-designed BMS is not optional: it is the only tool separating a profitable energy asset from a public hazard.

The global grid-scale storage market is worth $10–16 billion in 2025 and is growing at a CAGR of 26–27%, heading toward $44–87 billion by 2030–2034. In the US alone, 57 GWh / 28 GW of BESS capacity was installed in 2025, with $25 billion in investment projected for 2026. Italy, driven by Terna's MACSE mechanism and PNIEC targets (50 GWh of storage by 2030), is racing toward utility-scale storage deployment.

This article covers the full engineering of BMS for grid-scale applications: from hardware architecture to SoC/SoH estimation with the Extended Kalman Filter, from thermal management to cell balancing, from the safety state machine to cycle-life optimization and grid integration. Every section includes working Python code and real-world architectures.

What You Will Learn in This Article

Multi-level BMS architecture: Cell, Module, Pack, Rack, System
SoC estimation with Extended Kalman Filter (EKF) in Python
Degradation models for predicting Remaining Useful Life (RUL)
Thermal management and thermal runaway prevention
Passive vs active cell balancing: algorithms and trade-offs
Safety state machine with fault detection in Python
DoD, C-rate optimization and CC-CV charging strategies
BESS grid integration for frequency regulation and peak shaving
LFP vs NMC vs NCA vs Sodium-ion comparison for grid applications
Italian regulatory context: MACSE, FER X, PNIEC

The EnergyTech Series - Article Position

#	Article	Level	Status
1	OCPP 2.x Protocol: Building EV Charging Systems	Advanced	Published
2	DERMS Architecture: Aggregating Millions of Distributed Resources	Advanced	Published
3	Renewable Energy Forecasting with ML: Python LSTM for Solar and Wind	Advanced	Published
4	You are here - Battery Management System for Grid-Scale Storage	Advanced	Current
5	IEC 61850 for Software Engineers: Smart Grid Communication	Advanced	Next
6	EV Charging Load Balancing: Real-Time Algorithms	Advanced	Coming soon
7	From MQTT to InfluxDB: Real-Time Energy IoT Platform	Advanced	Coming soon
8	Carbon Accounting Software Architecture: ESG Platforms	Advanced	Coming soon
9	Digital Twin for Energy Infrastructure: Real-Time Simulation	Advanced	Coming soon
10	Blockchain for P2P Energy Trading: Smart Contracts and Constraints	Advanced	Coming soon

Lessons from BESS Incidents: Why the BMS Is Critical

Before diving into the engineering, it is worth understanding what happens when a BMS fails. Thermal runaway is the most dangerous failure mechanism in lithium-ion batteries: a cell overheats, exothermic chemical reactions accelerate, flammable gases accumulate and, once the critical threshold is crossed, the progression becomes irreversible within seconds. In a grid-scale system holding hundreds of MWh, this reaction propagates from cell to module, module to rack, rack to container.

The Moss Landing incident (January 2025) highlighted three systemic failures:

Insufficient sensing: Temperature sensors not adequately distributed among cells failed to detect hot spots before the thermal cascade became uncontrollable. The UL 9540A standard now requires thermal runaway propagation testing at the unit level, but many legacy systems had been certified under earlier standards.
Delayed detection algorithms: The BMS did not correlate weak signals (gradual impedance increase, micro voltage variations, initial off-gassing) early enough to trigger emergency procedures.
Inadequate compartmentalization: Propagation from one container to adjacent ones demonstrated that physical and thermal isolation was not dimensioned for the worst case.

BESS Safety Standards to Know

UL 9540A: Testing for thermal runaway propagation at the cell, module, and unit levels
NFPA 855: Standard for the installation of stationary energy storage systems (2023 edition)
IEC 62619: Safety requirements for Li-ion batteries in stationary applications
UN 38.3: Transport testing for lithium batteries
IEEE 1547: Standard for interconnection of distributed energy resources to the grid

BMS Architecture: From Cell to System

A grid-scale BESS follows a five-level hierarchical architecture, each level with distinct responsibilities for sensing, protection and control.

The Five Hierarchical Levels

Level	Entity	Typical Voltage	BMS Responsibilities
L1 - Cell	Single electrochemical cell	2.5 - 4.2 V (Li-ion)	V, T, current measurement; OV/UV protection
L2 - Module	N cells in series/parallel	20 - 100 V	Cell balancing, module SoC, fault isolation
L3 - Pack	N modules in series	300 - 800 V	SoC/SoH pack, thermal mgmt, safety contactor
L4 - Rack	N packs in parallel	500 - 1500 V DC	Rack BMS, inter-pack balancing, CAN/RS485 communication
L5 - System	N racks + PCS + EMS	MV/HV (AC grid)	Master BMS, coordination with PCS, grid interface

BMS Hardware: Key Components

The BMS hardware consists of distinct functional blocks that work in concert:

Component	Function	Typical Specifications
AFE (Analog Front End)	Cell voltage measurement, balancing	Resolution ±0.5–5 mV, 12–16 cells/IC
Current Sensor	Pack current measurement	Shunt ±0.1% or Hall effect ±0.5%
Temperature Sensors	Distributed thermal monitoring	NTC/PTC, resolution ±0.5°C, 1 per 5–10 cells
MCU/DSP	SoC/SoH algorithms, fault detection	ARM Cortex-M4/M7, real-time OS
Isolation Monitor	Insulation fault detection	Impedance > 100 kohm/V, IEC 61557-8 standard
Main Contactor	Emergency disconnection	Opening in < 100 ms, fault current rating
Pre-charge Circuit	Inrush current limiting at power-on	Limiting resistor + auxiliary contactor
Communication	CAN bus, RS-485, Ethernet, Modbus	CAN 1 Mbps, Modbus TCP/RTU for EMS

BMS Software: The Layered Architecture

BMS software is organized in layers with clear responsibilities and well-defined interfaces. In modern systems, the stack extends from embedded firmware all the way to the cloud:

# BMS software architecture - full stack

# Layer 1: Firmware (C/C++ on MCU)
# - High-frequency ADC data acquisition (1-1000 Hz)
# - Real-time algorithms: SoC Coulomb counting, fault detection
# - Actuator control: contactors, balancing, cooling

# Layer 2: Edge BMS Controller (Python/C++ on Linux SBC)
# - Advanced algorithms: EKF, SoH prediction, thermal model
# - Communication with firmware via CAN/SPI
# - Data buffering and aggregation

# Layer 3: Rack/System BMS (Python on industrial server)
# - Multi-rack coordination
# - Interface with PCS (Power Conversion System)
# - Communication with EMS via Modbus TCP / IEC 61850

# Layer 4: Cloud/Edge Analytics
# - Fleet analytics across multiple sites
# - ML model training for SoH prediction
# - Digital twin of the battery system

class BMSArchitecture:
    """
    Representation of the BMS software architecture
    with responsibilities for each layer
    """

    LAYERS = {
        'firmware': {
            'language': 'C/C++',
            'os': 'FreeRTOS / Bare Metal',
            'cycle_time_ms': 1,          # 1 ms for fault detection
            'functions': [
                'cell_voltage_sampling',  # 1 kHz
                'current_integration',    # Coulomb counting
                'fault_detection',        # Hardware comparators
                'contactor_control',      # Fail-safe logic
                'passive_balancing'       # Resistive discharge
            ]
        },
        'bms_controller': {
            'language': 'Python / C++',
            'os': 'Linux (RT kernel)',
            'cycle_time_ms': 100,        # 100 ms for EKF update
            'functions': [
                'ekf_soc_estimation',
                'soh_prediction',
                'thermal_model',
                'active_balancing_control',
                'can_communication'
            ]
        },
        'system_bms': {
            'language': 'Python',
            'os': 'Linux',
            'cycle_time_ms': 1000,       # 1 s for grid commands
            'functions': [
                'multi_rack_coordination',
                'pcs_interface',          # Modbus TCP / IEC 61850
                'ems_interface',
                'scada_interface',
                'data_logging'
            ]
        }
    }

State of Charge (SoC): Estimation Methods

State of Charge is the percentage of remaining energy relative to the battery's total capacity. It is the most fundamental BMS parameter: without an accurate SoC it is impossible either to protect the battery from overcharge/overdischarge or to optimize asset utilization. A 5% error on a 100 MWh system means 5 MWh of unusable energy or risk of permanent damage.

Method 1: Coulomb Counting

The simplest method integrates current over time. It is accurate in the short term but accumulates drift errors. It requires periodic calibration from the open-circuit voltage (OCV).

import numpy as np
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CoulombCounterSoC:
    """
    SoC estimation with Coulomb Counting.
    Simple but subject to drift from current measurement errors.
    """
    capacity_ah: float           # Nominal capacity [Ah]
    coulombic_efficiency: float  # Coulombic efficiency (typical: 0.98-0.99 LFP, 0.995 NMC)
    soc: float = 1.0             # Initial SoC [0-1]

    # Error accumulation
    _accumulated_ah: float = field(default=0.0, repr=False)

    def update(self, current_a: float, dt_s: float) -> float:
        """
        Update SoC with current measurement.

        Args:
            current_a: Current [A]. Positive = discharge, negative = charge
            dt_s: Sampling interval [s]

        Returns:
            Updated SoC [0-1]
        """
        # Delta charge in Ah
        delta_ah = current_a * dt_s / 3600.0

        # Apply coulombic efficiency during charging
        if current_a < 0:  # Charge
            effective_delta = delta_ah * self.coulombic_efficiency
        else:              # Discharge
            effective_delta = delta_ah

        self._accumulated_ah += effective_delta

        # Update SoC
        new_soc = self.soc - (effective_delta / self.capacity_ah)

        # Clamp to physical range [0, 1]
        self.soc = float(np.clip(new_soc, 0.0, 1.0))
        return self.soc

    def calibrate_from_ocv(self, ocv_v: float, ocv_soc_curve: dict) -> float:
        """
        Calibrate SoC from the OCV curve.
        Performed at rest (current ~0 A for at least 30 min).
        """
        # Interpolation on the OCV-SoC curve
        voltages = sorted(ocv_soc_curve.keys())
        socs = [ocv_soc_curve[v] for v in voltages]
        self.soc = float(np.interp(ocv_v, voltages, socs))
        self._accumulated_ah = 0.0
        return self.soc

Method 2: Extended Kalman Filter (EKF)

The EKF is the gold standard for SoC estimation in advanced BMS systems. It models the battery as a dynamic system with hidden states (SoC, RC voltages) and fuses the current measurement (Coulomb counting) with the voltage measurement (OCV lookup) to obtain an optimal estimate with uncertainty bounds. It is robust against measurement noise and drift.

The Thevenin Model for Batteries

The EKF typically uses the first or second-order Thevenin equivalent circuit model:

V_oc(SoC): Open-circuit voltage, a function of SoC
R0: Internal ohmic resistance (immediate losses)
R1, C1: RC circuit for slow dynamics (diffusion)
V_terminal = V_oc - I*R0 - V_RC1

import numpy as np
from scipy.interpolate import interp1d

class BatteryEKF:
    """
    Extended Kalman Filter for SoC estimation with first-order Thevenin model.

    State: x = [SoC, V_RC]
    - SoC: State of Charge [0-1]
    - V_RC: Voltage across RC circuit [V] (polarization dynamics)

    Reference: Plett, G.L. (2004) - "Extended Kalman Filtering for
    Battery Management Systems" - Journal of Power Sources
    """

    def __init__(self,
                 capacity_ah: float,
                 R0: float,
                 R1: float,
                 C1: float,
                 ocv_soc_table: tuple,
                 Q_noise: np.ndarray = None,
                 R_noise: float = None):
        """
        Args:
            capacity_ah: Nominal capacity [Ah]
            R0: Ohmic resistance [ohm]
            R1: RC resistance [ohm]
            C1: RC capacitànce [F]
            ocv_soc_table: (soc_array, ocv_array) for interpolation
            Q_noise: Process noise covariance matrix (2x2)
            R_noise: Voltage measurement noise variance [V^2]
        """
        self.Q_batt = capacity_ah * 3600  # Capacity in Coulombs
        self.R0 = R0
        self.R1 = R1
        self.C1 = C1

        # OCV-SoC curve for interpolation
        soc_pts, ocv_pts = ocv_soc_table
        self._ocv_func = interp1d(soc_pts, ocv_pts,
                                   kind='cubic',
                                   fill_value='extrapolate')
        # OCV derivative with respect to SoC (for linearization)
        self._docv_dsoc = np.gradient(ocv_pts, soc_pts)
        self._docv_func = interp1d(soc_pts, self._docv_dsoc,
                                    kind='linear',
                                    fill_value='extrapolate')

        # Initial state: x = [SoC, V_RC]
        self.x = np.array([1.0, 0.0])

        # Initial covariance (high uncertainty on SoC)
        self.P = np.diag([0.01, 0.001])  # [SoC^2, V^2]

        # Noise matrices
        self.Q = Q_noise if Q_noise is not None else np.diag([1e-6, 1e-8])
        self.R = R_noise if R_noise is not None else 1e-4  # 10 mV RMS

    def predict(self, current_a: float, dt_s: float) -> np.ndarray:
        """
        EKF prediction step.

        Dynamic model (discretized with Euler):
          SoC(k+1)  = SoC(k) - I*dt / Q_batt
          V_RC(k+1) = V_RC(k) * exp(-dt/(R1*C1)) + I * R1 * (1 - exp(-dt/(R1*C1)))

        Note: positive current = discharge (BMS convention)
        """
        soc, v_rc = self.x

        # RC time constant
        tau = self.R1 * self.C1
        exp_tau = np.exp(-dt_s / tau)

        # State prediction
        soc_pred = soc - (current_a * dt_s) / self.Q_batt
        v_rc_pred = v_rc * exp_tau + current_a * self.R1 * (1 - exp_tau)

        self.x = np.array([
            np.clip(soc_pred, 0.0, 1.0),
            v_rc_pred
        ])

        # Model Jacobian (linearized transition matrix)
        # F = d(f)/d(x)
        F = np.array([
            [1.0,  0.0],
            [0.0,  exp_tau]
        ])

        # Covariance propagation
        self.P = F @ self.P @ F.T + self.Q
        return self.x

    def update(self, v_terminal_measured: float, current_a: float) -> tuple:
        """
        EKF update step with terminal voltage measurement.

        Observation model:
          V_terminal = OCV(SoC) - I*R0 - V_RC

        Returns:
            (soc_estimate, covariance, innovation)
        """
        soc, v_rc = self.x

        # Voltage predicted by model
        v_oc = float(self._ocv_func(soc))
        v_predicted = v_oc - current_a * self.R0 - v_rc

        # Innovation (residual)
        innovation = v_terminal_measured - v_predicted

        # Observation Jacobian: H = d(h)/d(x)
        d_ocv_d_soc = float(self._docv_func(soc))
        H = np.array([[d_ocv_d_soc, -1.0]])

        # Innovation covariance
        S = H @ self.P @ H.T + self.R

        # Kalman gain
        K = self.P @ H.T / S

        # State and covariance update
        self.x = self.x + K.flatten() * innovation
        self.x[0] = np.clip(self.x[0], 0.0, 1.0)  # SoC in [0,1]

        I_matrix = np.eye(2)
        self.P = (I_matrix - np.outer(K.flatten(), H)) @ self.P

        return self.x[0], self.P[0, 0], innovation

    @property
    def soc(self) -> float:
        return float(self.x[0])

    @property
    def soc_uncertainty_1sigma(self) -> float:
        """SoC uncertainty at 1 sigma (68% confidence interval)"""
        return float(np.sqrt(self.P[0, 0]))


# ---- Usage example ----

def demo_ekf():
    # OCV-SoC curve for LFP (LiFePO4) - typical values
    soc_pts = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5,
                        0.6, 0.7, 0.8, 0.9, 1.0])
    ocv_pts = np.array([3.0, 3.15, 3.22, 3.28, 3.30, 3.32,
                        3.33, 3.34, 3.35, 3.40, 3.65])  # Volts

    # Grid-scale LFP battery parameters (e.g., CATL 280 Ah)
    bms_ekf = BatteryEKF(
        capacity_ah=280.0,
        R0=0.0002,     # 0.2 mohm - typical prismatic LFP
        R1=0.0005,     # 0.5 mohm
        C1=5000.0,     # 5000 F = tau ~ 2.5 s
        ocv_soc_table=(soc_pts, ocv_pts),
        Q_noise=np.diag([1e-7, 1e-9]),
        R_noise=1e-5   # ~3.2 mV RMS voltage
    )

    # Simulation: discharge at 0.5C for 100 steps of 1 s
    dt = 1.0
    I_discharge = 140.0  # Amperes (0.5C on 280 Ah)

    results = []
    for step in range(100):
        # Predict
        bms_ekf.predict(I_discharge, dt)

        # Simulate voltage measurement with noise
        true_ocv = float(np.interp(bms_ekf.soc, soc_pts, ocv_pts))
        v_meas = true_ocv - I_discharge * 0.0002 - bms_ekf.x[1]
        v_meas += np.random.normal(0, 0.003)  # 3 mV noise

        # Update
        soc_est, variance, innov = bms_ekf.update(v_meas, I_discharge)

        results.append({
            'step': step,
            'soc': soc_est,
            'uncertainty': bms_ekf.soc_uncertainty_1sigma,
            'innovation_mv': innov * 1000
        })

    return results

if __name__ == '__main__':
    results = demo_ekf()
    print(f"Final SoC: {results[-1]['soc']:.3f}")
    print(f"Uncertainty: ±{results[-1]['uncertainty']*100:.2f}% SoC")

Comparison of SoC Estimation Methods

Method	Accuracy	Computational Complexity	Robustness	Typical Use
Coulomb Counting	±5–10% (drift)	Minimal (8-bit MCU)	Low (error accumulation)	Base firmware, calibration
OCV Lookup	±2–5% (at rest)	Minimal	High (rest only)	Periodic calibration
EKF (1st order)	±1–3%	Medium (ARM Cortex-M4)	High (sensor fusion)	BMS edge controller
EKF (2nd order)	±0.5–2%	Medium-High	Very high	Premium BMS, EV
ML-based (LSTM)	±0.5–1.5%	High (GPU/NPU)	High (adaptive)	Cloud analytics, SoH

State of Health (SoH) and RUL Prediction

State of Health quantifies battery degradation relative to its initial condition. It manifests in two primary forms: capacity fade (reduction in usable capacity) and power fade (increase in internal resistance, reduction of deliverable power). For grid applications, SoH < 80% is typically the replacement or reconditioning threshold.

Degradation Mechanisms

Li-ion battery degradation has two main components:

Calendar aging: Degradation as a function of time and temperature, independent of usage. Dominated by growth of the SEI (Solid Electrolyte Interphase) layer on the anode. Accelerated by high temperatures and high SoC. Typical model: Q_loss = a * sqrt(t) * exp(-Ea/(R*T))
Cycle aging: Degradation from charge/discharge cycles. Depends on Depth of Discharge (DoD), C-rate and temperature. LFP: 3,000–6,000 cycles at 80% DoD. NMC: 1,000–2,000 cycles under the same conditions.

import numpy as np
from dataclasses import dataclass

@dataclass
class BatteryDegradationModel:
    """
    Combined calendar + cycle aging battery degradation model.

    Based on: Wang et al. (2011) "Cycle-life model for graphite-LiFePO4 cells"
    Journal of Power Sources, adapted for grid-scale applications.
    """

    # LFP chemistry parameters (tunable for NMC)
    # Calendar aging: Q_cal = B * exp(-Ea_cal / (R*T)) * sqrt(t)
    B_calendar: float = 14876      # Pre-exponential factor
    Ea_calendar: float = 24500.0   # Activation energy [J/mol] (LFP)

    # Cycle aging: Q_cyc = A * exp(Ea_cyc / (R*T)) * exp(b_dod * DoD) * N
    A_cycle: float = 7.543e5       # Pre-exponential factor
    Ea_cycle: float = -31700.0     # Activation energy [J/mol]
    b_dod: float = -0.836          # DoD coefficient

    R_gas: float = 8.314           # Gas constant [J/(mol*K)]

    def calendar_loss_fraction(self,
                                temp_k: float,
                                time_days: float,
                                avg_soc: float = 0.5) -> float:
        """
        Calculate capacity loss from calendar aging.

        Args:
            temp_k: Average storage temperature [K]
            time_days: Calendar time [days]
            avg_soc: Average SoC during storage

        Returns:
            Fraction of capacity lost [0-1]
        """
        # Temperature factor (Arrhenius)
        k_cal = self.B_calendar * np.exp(-self.Ea_calendar / (self.R_gas * temp_k))

        # SoC factor (stress factor - higher SoC = more degradation)
        soc_factor = 1 + 1.5 * (avg_soc - 0.5) ** 2

        # Square-root power law (SEI diffusion)
        time_hours = time_days * 24
        loss = k_cal * soc_factor * np.sqrt(time_hours) / 100.0

        return float(np.clip(loss, 0.0, 1.0))

    def cycle_loss_fraction(self,
                             temp_k: float,
                             dod: float,
                             n_cycles: int,
                             avg_crate: float = 0.5) -> float:
        """
        Calculate capacity loss from cycle aging.

        Args:
            temp_k: Average operating temperature [K]
            dod: Depth of Discharge [0-1]
            n_cycles: Number of equivalent full-depth cycles
            avg_crate: Average C-rate (1C = discharge in 1 hour)

        Returns:
            Fraction of capacity lost [0-1]
        """
        # Temperature factor
        k_cyc = self.A_cycle * np.exp(self.Ea_cycle / (self.R_gas * temp_k))

        # DoD factor (mechanical/chemical stress on electrodes)
        dod_factor = np.exp(self.b_dod * (1 - dod))

        # C-rate factor (thermal and mechanical stress)
        crate_factor = 1 + 0.1 * max(0, avg_crate - 0.5)

        loss = k_cyc * dod_factor * crate_factor * n_cycles / 100.0

        return float(np.clip(loss, 0.0, 1.0))

    def predict_soh(self,
                    temp_k: float,
                    time_days: float,
                    n_cycles: int,
                    dod: float = 0.8,
                    avg_soc: float = 0.5,
                    avg_crate: float = 0.3) -> dict:
        """
        Predict SoH combining calendar and cycle aging.

        Returns:
            dict with SoH, calendar loss, cycle loss, estimated RUL
        """
        q_cal = self.calendar_loss_fraction(temp_k, time_days, avg_soc)
        q_cyc = self.cycle_loss_fraction(temp_k, dod, n_cycles, avg_crate)

        # Total loss (non-linear - interaction between mechanisms)
        q_total = q_cal + q_cyc - 0.3 * q_cal * q_cyc

        current_soh = 1.0 - q_total

        # RUL estimate (remaining cycles until SoH = 0.8)
        if q_cyc > 0 and n_cycles > 0:
            rate_per_cycle = q_cyc / n_cycles
            remaining_capacity_loss = max(0, current_soh - 0.8)
            rul_cycles = int(remaining_capacity_loss / rate_per_cycle) if rate_per_cycle > 0 else 0
        else:
            rul_cycles = 0

        return {
            'soh': float(np.clip(current_soh, 0.0, 1.0)),
            'soh_percent': float(np.clip(current_soh * 100, 0.0, 100.0)),
            'calendar_loss_pct': q_cal * 100,
            'cycle_loss_pct': q_cyc * 100,
            'total_loss_pct': q_total * 100,
            'rul_cycles': rul_cycles,
            'eol_threshold': 0.8
        }


# Example: 100 MWh BESS in operation for 2 years
model = BatteryDegradationModel()

result = model.predict_soh(
    temp_k=298.15,     # 25°C
    time_days=730,     # 2 years
    n_cycles=730,      # 1 cycle/day
    dod=0.8,           # 80% DoD typical for grid
    avg_soc=0.5,
    avg_crate=0.3      # 0.3C - typical 4h BESS
)

print(f"SoH after 2 years: {result['soh_percent']:.1f}%")
print(f"Estimated RUL: {result['rul_cycles']} remaining cycles")
print(f"  - Calendar loss: {result['calendar_loss_pct']:.1f}%")
print(f"  - Cycle loss: {result['cycle_loss_pct']:.1f}%")

Thermal Management: Preventing Thermal Runaway

Thermal management is arguably the most safety-critical function of the BMS. Li-ion batteries operate optimally in the 15–35°C range: below 10°C performance drops dramatically and the risk of lithium plating on the anode increases (danger during charging); above 45°C degradation accelerates exponentially; above 60–80°C (chemistry-dependent) thermal runaway begins.

Cooling Strategies

Strategy	Dissipatable Power	Relative Cost	Typical Application	Notes
Air cooling (passive)	5–10 W/cell	Low	Small systems, low C-rates	Not suitable for high-intensity grid-scale
Air cooling (forced)	10–25 W/cell	Medium-low	Standard BESS containers	Requires dust filters, noise
Liquid cooling (indirect)	50–100 W/cell	Medium	High-density BESS, EVs	Cold plates between cells, glycol-water
Liquid cooling (direct)	100–200 W/cell	High	Racing, aerospace	Dielectric compatibility required
Dielectric oil immersion	150–300 W/cell	Very high	Ultra-high-density BESS (2025+)	Emerging technology, better anti-TR safety

Thermal Model and Simulation

import numpy as np
from scipy.integrate import solve_ivp

class BatteryThermalModel:
    """
    Lumped-parameter thermal model for a battery cell.

    2-node model: cell core + surface
    dT_core/dt = (Q_gen - (T_core - T_surf) / R_internal) / C_core
    dT_surf/dt = ((T_core - T_surf) / R_internal - (T_surf - T_amb) / R_external) / C_surf

    Reference: Bernardi et al. (1985) "A Mathematical Model of the
    Lithium/Polymer Battery" - J. Electrochem. Soc.
    """

    def __init__(self,
                 R_thermal_internal: float = 0.05,  # K/W - core-surface resistance
                 R_thermal_external: float = 0.5,   # K/W - surface-ambient resistance
                 C_thermal_core: float = 100.0,      # J/K - core thermal capacity
                 C_thermal_surf: float = 10.0,       # J/K - surface thermal capacity
                 cell_resistance_ohm: float = 0.001, # Ohm - internal resistance for Q_gen
                 t_ambient_c: float = 25.0):

        self.R_int = R_thermal_internal
        self.R_ext = R_thermal_external
        self.C_core = C_thermal_core
        self.C_surf = C_thermal_surf
        self.R_cell = cell_resistance_ohm
        self.T_amb = t_ambient_c + 273.15  # Kelvin

        # Initial state: T_core = T_surf = T_amb
        self.state = np.array([self.T_amb, self.T_amb])

    def _thermal_ode(self, t: float, T: np.ndarray, current_a: float) -> np.ndarray:
        """
        Thermal model ODE.
        T[0] = T_core, T[1] = T_surf (Kelvin)
        """
        T_core, T_surf = T

        # Joule heat generation: Q = I^2 * R
        # For a more accurate model, include entropic heat
        Q_joule = (current_a ** 2) * self.R_cell
        Q_entropic = 0.0  # Simplified (can be negative during LFP charging)
        Q_gen = Q_joule + Q_entropic

        # Heat flux core -> surface
        Q_cs = (T_core - T_surf) / self.R_int

        # Heat flux surface -> ambient (cooling system)
        Q_sa = (T_surf - self.T_amb) / self.R_ext

        dT_core_dt = (Q_gen - Q_cs) / self.C_core
        dT_surf_dt = (Q_cs - Q_sa) / self.C_surf

        return np.array([dT_core_dt, dT_surf_dt])

    def simulate(self,
                 current_profile_a: np.ndarray,
                 dt_s: float = 1.0) -> dict:
        """
        Simulate thermal profile for a current profile.

        Args:
            current_profile_a: Array of currents [A] over time
            dt_s: Time step [s]

        Returns:
            dict with core and surface temperatures and alarm flags
        """
        n_steps = len(current_profile_a)
        T_core_arr = np.zeros(n_steps)
        T_surf_arr = np.zeros(n_steps)
        alarms = []

        current_state = self.state.copy()

        for i, current in enumerate(current_profile_a):
            # Numerical integration
            sol = solve_ivp(
                fun=lambda t, y: self._thermal_ode(t, y, current),
                t_span=(0, dt_s),
                y0=current_state,
                method='RK45',
                max_step=dt_s / 10
            )
            current_state = sol.y[:, -1]

            T_core_c = current_state[0] - 273.15
            T_surf_c = current_state[1] - 273.15

            T_core_arr[i] = T_core_c
            T_surf_arr[i] = T_surf_c

            # Thermal fault detection
            alarm = self._check_thermal_faults(i, T_core_c, T_surf_c)
            if alarm:
                alarms.append(alarm)

        self.state = current_state

        return {
            'T_core_c': T_core_arr,
            'T_surf_c': T_surf_arr,
            'T_max_c': float(np.max(T_core_arr)),
            'alarms': alarms,
            'thermal_runaway_risk': float(np.max(T_core_arr)) > 60.0
        }

    def _check_thermal_faults(self,
                               step: int,
                               t_core_c: float,
                               t_surf_c: float) -> Optional[dict]:
        """Check thermal alarm thresholds."""

        # LFP thresholds - more conservative than NMC
        WARN_TEMP = 45.0
        ALERT_TEMP = 55.0
        CRITICAL_TEMP = 65.0  # Pre-thermal runaway

        if t_core_c > CRITICAL_TEMP:
            return {'step': step, 'level': 'CRITICAL', 'T': t_core_c,
                    'action': 'EMERGENCY_DISCONNECT'}
        elif t_core_c > ALERT_TEMP:
            return {'step': step, 'level': 'ALERT', 'T': t_core_c,
                    'action': 'REDUCE_POWER_50PCT'}
        elif t_core_c > WARN_TEMP:
            return {'step': step, 'level': 'WARNING', 'T': t_core_c,
                    'action': 'INCREASE_COOLING'}
        return None


# Import Optional (omitted above for brevity)
from typing import Optional

# Simulation: BESS fast charging (1C) at 25°C ambient
model_thermal = BatteryThermalModel(
    R_thermal_external=0.3,   # Active liquid cooling
    cell_resistance_ohm=0.0005  # LFP 280Ah prismatic
)

# Current profile: 1C charge (280A) for 30 minutes
current_profile = np.full(1800, -280.0)  # Negative = charging

result = model_thermal.simulate(current_profile, dt_s=1.0)
print(f"Maximum core temperature: {result['T_max_c']:.1f}°C")
print(f"Alarms generated: {len(result['alarms'])}")
print(f"Thermal runaway risk: {result['thermal_runaway_risk']}")

Cell Balancing: Algorithms and Trade-offs

Even cells from the same production batch exhibit variations in capacity, internal resistance and self-discharge. In a pack, the weakest cell limits the entire string: during discharge it empties first (undervoltage cutoff), during charging it fills first (overvoltage cutoff). Without balancing, the usable capacity of the pack can degrade by 10–30% relative to the sum of individual cells.

Passive Balancing vs Active Balancing

Characteristic	Passive Balancing	Active Balancing
Principle	Dissipates excess energy through a resistor	Transfers energy between cells (DC-DC converter)
Efficiency	Low (energy dissipated as heat)	High (85–95% transfer efficiency)
Balancing speed	Slow (typical 10–100 mA)	Fast (1–10 A possible)
Hardware cost	Very low (resistor + MOSFET)	High (converter, inductors, control)
Firmware complexity	Simple (on/off threshold)	Complex (optimization algorithm)
Heat generated	High (problematic for grid-scale)	Low
Typical use	Consumer, budget-constrained BESS	Premium EVs, high-performance BESS

from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class CellState:
    id: int
    voltage_v: float
    soc: float
    temperature_c: float
    capacity_ah: float

class ActiveBalancingController:
    """
    Controller for active cell balancing with SoC-based algorithm.
    Objective: equalize SoC between cells, not voltage.

    Note: equalizing SoC is more correct than equalizing voltage
    because cells with different capacities have different OCV curves.
    """

    def __init__(self,
                 balancing_current_a: float = 2.0,
                 soc_tolerance: float = 0.02,
                 min_imbalance_for_action: float = 0.03):
        """
        Args:
            balancing_current_a: Balancing current [A]
            soc_tolerance: SoC tolerance to consider cells balanced
            min_imbalance_for_action: Minimum imbalance to start balancing
        """
        self.I_bal = balancing_current_a
        self.soc_tol = soc_tolerance
        self.min_imbalance = min_imbalance_for_action

    def compute_balancing_plan(self,
                                cells: List[CellState],
                                dt_s: float = 10.0) -> List[dict]:
        """
        Calculate optimal balancing plan.

        Algorithm:
        1. Calculate average pack SoC
        2. Identify cells with SoC above average (sources) and below (sinks)
        3. Plan energy transfers to minimize imbalance

        Returns:
            List of actions: {'from_cell': id, 'to_cell': id,
                                  'duration_s': t, 'current_a': I}
        """
        if not cells:
            return []

        soc_values = np.array([c.soc for c in cells])
        soc_mean = np.mean(soc_values)
        max_imbalance = np.max(soc_values) - np.min(soc_values)

        # Do not act if imbalance is negligible
        if max_imbalance < self.min_imbalance:
            return []

        actions = []

        # Cells above average (energy sources)
        sources = [(c, c.soc - soc_mean)
                   for c in cells if c.soc - soc_mean > self.soc_tol]
        # Cells below average (energy recipients)
        sinks = [(c, soc_mean - c.soc)
                 for c in cells if soc_mean - c.soc > self.soc_tol]

        # Sort by maximum imbalance
        sources.sort(key=lambda x: -x[1])
        sinks.sort(key=lambda x: -x[1])

        # Plan transfers (greedy algorithm)
        for src_cell, src_excess in sources:
            for snk_cell, snk_deficit in sinks:
                if src_excess < self.soc_tol or snk_deficit < self.soc_tol:
                    continue

                # Energy to transfer (in SoC terms)
                delta_soc = min(src_excess, snk_deficit) * 0.5  # Conservative

                # Balancing duration to transfer delta_soc
                # delta_soc = I * t / (3600 * capacity_ah)
                avg_cap = (src_cell.capacity_ah + snk_cell.capacity_ah) / 2
                duration_s = (delta_soc * avg_cap * 3600) / self.I_bal

                if duration_s > dt_s:  # Significant action
                    actions.append({
                        'from_cell': src_cell.id,
                        'to_cell': snk_cell.id,
                        'current_a': self.I_bal,
                        'duration_s': min(duration_s, 300.0),  # Max 5 min per action
                        'delta_soc': delta_soc
                    })

                src_excess -= delta_soc
                snk_deficit -= delta_soc

        return actions

    def estimate_balancing_time(self, cells: List[CellState]) -> float:
        """
        Estimate the time needed to fully balance the pack [hours].
        """
        soc_values = np.array([c.soc for c in cells])
        max_imbalance = np.max(soc_values) - np.min(soc_values)

        if max_imbalance < self.soc_tol:
            return 0.0

        avg_cap = np.mean([c.capacity_ah for c in cells])

        # Energy to transfer (Ah) for one cell
        energy_to_transfer_ah = max_imbalance * avg_cap / 2

        # Hours required at current I_bal
        hours = energy_to_transfer_ah / self.I_bal

        return float(hours)

Safety: Fault Detection and State Machine

The BMS safety module implements a state machine that manages all fault conditions and transitions between operating states. The design must follow the fail-safe principle: when in doubt, the system moves to the safest state (typically controlled disconnection). For grid-scale systems, the safety state machine typically operates at 10–100 Hz to guarantee response times in the millisecond range.

from enum import Enum, auto
from dataclasses import dataclass, field
from typing import List, Optional, Callable
import time

class BMSState(Enum):
    """BMS states - only permitted transitions are valid"""
    INIT           = auto()  # Startup, self-test
    STANDBY        = auto()  # Ready, grid connected, no energy flow
    PRECHARGE      = auto()  # Pre-charging capacitors (prevents inrush)
    OPERATIONAL    = auto()  # Normal operation (charge or discharge)
    CHARGING       = auto()  # Charging
    DISCHARGING    = auto()  # Discharging
    BALANCING      = auto()  # Active cell balancing
    FAULT_SOFT     = auto()  # Recoverable fault (overtemp warning, etc.)
    FAULT_HARD     = auto()  # Critical fault - requires manual reset
    EMERGENCY_STOP = auto()  # Emergency stop - physical disconnection
    SHUTDOWN       = auto()  # Controlled shutdown

@dataclass
class FaultCode:
    code: str
    description: str
    severity: str   # 'WARNING', 'SOFT_FAULT', 'HARD_FAULT', 'EMERGENCY'
    recoverable: bool
    action: str

# Fault code registry
FAULT_REGISTRY = {
    'OV_CELL':   FaultCode('OV_CELL',   'Cell overvoltage',        'HARD_FAULT', False, 'OPEN_CONTACTOR'),
    'UV_CELL':   FaultCode('UV_CELL',   'Cell undervoltage',       'HARD_FAULT', False, 'OPEN_CONTACTOR'),
    'OT_CELL':   FaultCode('OT_CELL',   'Cell overtemperature',    'HARD_FAULT', False, 'OPEN_CONTACTOR'),
    'OT_WARN':   FaultCode('OT_WARN',   'Temperature warning',     'WARNING',    True,  'REDUCE_POWER'),
    'OC_CHARGE': FaultCode('OC_CHARGE', 'Overcurrent in charge',   'HARD_FAULT', False, 'OPEN_CONTACTOR'),
    'OC_DISC':   FaultCode('OC_DISC',   'Overcurrent discharge',   'HARD_FAULT', False, 'OPEN_CONTACTOR'),
    'ISOLATION': FaultCode('ISOLATION', 'Isolation fault',         'EMERGENCY',  False, 'EMERGENCY_STOP'),
    'COMM_FAIL': FaultCode('COMM_FAIL', 'Communication failure',   'SOFT_FAULT', True,  'STANDBY'),
    'SOC_LOW':   FaultCode('SOC_LOW',   'SoC below minimum',       'SOFT_FAULT', True,  'STOP_DISCHARGE'),
    'SOC_HIGH':  FaultCode('SOC_HIGH',  'SoC above maximum',       'SOFT_FAULT', True,  'STOP_CHARGE'),
    'TR_DETECT': FaultCode('TR_DETECT', 'Thermal runaway detected', 'EMERGENCY', False, 'EMERGENCY_STOP'),
}

class BMSSafetyStateMachine:
    """
    Safety state machine for grid-scale BMS.
    Implements protections defined in IEC 62619 and IEEE 1625.
    """

    # Protection thresholds (configurable per chemistry)
    PROTECTION_THRESHOLDS = {
        'LFP': {
            'cell_ov_v': 3.65,      # Overvoltage cutoff
            'cell_uv_v': 2.80,      # Undervoltage cutoff
            'cell_ov_warn_v': 3.60, # Overvoltage warning
            'cell_uv_warn_v': 2.90, # Undervoltage warning
            'temp_ot_c': 55.0,      # Overtemperature cutoff
            'temp_ot_warn_c': 45.0, # Overtemperature warning
            'temp_ut_c': -10.0,     # Undertemperature cutoff (charge only)
            'oc_charge_a': 1.0,     # Max C-rate charge (per unit)
            'oc_disc_a': 2.0,       # Max C-rate discharge (per unit)
        },
        'NMC': {
            'cell_ov_v': 4.20,
            'cell_uv_v': 3.00,
            'cell_ov_warn_v': 4.15,
            'cell_uv_warn_v': 3.10,
            'temp_ot_c': 50.0,
            'temp_ot_warn_c': 40.0,
            'temp_ut_c': 0.0,
            'oc_charge_a': 0.7,
            'oc_disc_a': 1.5,
        }
    }

    def __init__(self, chemistry: str = 'LFP',
                 on_state_change: Optional[Callable] = None):
        self.state = BMSState.INIT
        self.thresholds = self.PROTECTION_THRESHOLDS[chemistry]
        self.active_faults: List[FaultCode] = []
        self.fault_history: List[dict] = []
        self.on_state_change = on_state_change
        self._contactor_closed = False

    def check_and_transition(self, telemetry: dict) -> BMSState:
        """
        Check telemetry and update state.

        Args:
            telemetry: dict with cell_voltages, cell_temps, pack_current, soc, etc.

        Returns:
            New BMS state
        """
        faults = self._detect_faults(telemetry)

        if faults:
            self._handle_faults(faults)
        else:
            # Remove recoverable faults if condition normalized
            self._clear_recoverable_faults(telemetry)

        return self.state

    def _detect_faults(self, tel: dict) -> List[FaultCode]:
        """Detect active faults in telemetry."""
        detected = []
        thr = self.thresholds

        # 1. Cell Voltage Checks
        for v in tel.get('cell_voltages', []):
            if v > thr['cell_ov_v']:
                detected.append(FAULT_REGISTRY['OV_CELL'])
                break
            if v < thr['cell_uv_v']:
                detected.append(FAULT_REGISTRY['UV_CELL'])
                break

        # 2. Temperature Checks
        for t in tel.get('cell_temps', []):
            if t > thr['temp_ot_c']:
                detected.append(FAULT_REGISTRY['OT_CELL'])
                break
            if t > thr['temp_ot_warn_c']:
                detected.append(FAULT_REGISTRY['OT_WARN'])
                break

        # 3. Isolation Fault (impedance monitor)
        if tel.get('isolation_ok', True) == False:
            detected.append(FAULT_REGISTRY['ISOLATION'])

        # 4. Thermal runaway detection (gas sensor, rapid T rise)
        if self._detect_thermal_runaway(tel):
            detected.append(FAULT_REGISTRY['TR_DETECT'])

        return detected

    def _detect_thermal_runaway(self, tel: dict) -> bool:
        """
        Early thermal runaway detection:
        - Rate of temperature rise > 1°C/s (early stage)
        - Gas sensor (CO, H2, electrolyte VOC) triggered
        - Sudden cell voltage drop with overheating
        """
        t_rate = tel.get('temp_rate_c_per_s', 0.0)
        gas_alarm = tel.get('gas_sensor_alarm', False)

        return t_rate > 1.0 or gas_alarm

    def _handle_faults(self, faults: List[FaultCode]) -> None:
        """Handle detected faults with appropriate state transition."""

        # Priority: EMERGENCY > HARD_FAULT > SOFT_FAULT > WARNING
        max_severity = max(faults, key=lambda f: [
            'WARNING', 'SOFT_FAULT', 'HARD_FAULT', 'EMERGENCY'
        ].index(f.severity))

        # Log faults
        for fault in faults:
            if fault not in self.active_faults:
                self.active_faults.append(fault)
                self.fault_history.append({
                    'timestamp': time.time(),
                    'code': fault.code,
                    'severity': fault.severity
                })

        # State transition
        if max_severity.severity == 'EMERGENCY':
            self._transition_to(BMSState.EMERGENCY_STOP)
            self._open_main_contactor(emergency=True)

        elif max_severity.severity == 'HARD_FAULT':
            self._transition_to(BMSState.FAULT_HARD)
            self._open_main_contactor(emergency=False)

        elif max_severity.severity == 'SOFT_FAULT':
            if self.state not in (BMSState.FAULT_HARD, BMSState.EMERGENCY_STOP):
                self._transition_to(BMSState.FAULT_SOFT)

    def _transition_to(self, new_state: BMSState) -> None:
        if new_state != self.state:
            old_state = self.state
            self.state = new_state
            if self.on_state_change:
                self.on_state_change(old_state, new_state)

    def _open_main_contactor(self, emergency: bool) -> None:
        """Opens the main contactor (physical)."""
        self._contactor_closed = False
        # In production: hardware GPIO/CAN command to contactor driver

    def _clear_recoverable_faults(self, tel: dict) -> None:
        """Remove recoverable faults if condition normalized."""
        thr = self.thresholds

        self.active_faults = [
            f for f in self.active_faults
            if not f.recoverable
        ]

        if not self.active_faults and self.state == BMSState.FAULT_SOFT:
            self._transition_to(BMSState.STANDBY)

Useful Life Optimization: DoD, C-rate and Charging Strategies

A grid-scale BESS represents an investment of $200–500 per kWh for batteries (2025), with typical capacities of 50–500 MWh. The expected economic lifetime is 10–20 years, but without a cycle optimization strategy, accelerated degradation can significantly reduce asset value. The Battery Life Optimizer balances operational revenues (arbitrage, frequency regulation) against battery degradation.

Fundamental Optimization Rules

Parameter	Impact on Cycle Life	Grid BESS Recommendation	Trade-off
Depth of Discharge (DoD)	High: 100% DoD reduces cycles by 50–70% vs 80%	70–85% DoD typical	More DoD = more energy/cycle = more revenue
Charge C-rate	High: 1C vs 0.3C reduces cycles by 20–30%	0.25–0.5C for 4h BESS	Higher C-rate = faster response but more heat
Maximum SoC	High: staying at 100% accelerates calendar aging	Max SoC 90–95% for long-term storage	Reduces available capacity
Operating temperature	Very high: 10°C above optimal doubles degradation	15–30°C ideal	HVAC costs energy, reduces round-trip efficiency
Minimum cutoff (SoC min)	Medium: below 5% risk of lithium plating	SoC min 5–15%	Reduces available energy

import numpy as np
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ChargingOptimizer:
    """
    Charging strategy optimization to maximize cycle life.

    Strategies implemented:
    1. CC-CV (Constant Current - Constant Voltage) standard
    2. Multi-step CC (reduces electrode degradation)
    3. Pulse charging (reduces thermal stress)
    """

    capacity_ah: float
    cell_voltage_max: float   # E.g. 3.65V for LFP
    cell_voltage_min: float   # E.g. 2.80V for LFP
    soc_max: float = 0.95     # Do not charge beyond 95%
    soc_min: float = 0.05     # Do not discharge below 5%

    def cc_cv_profile(self,
                       target_soc: float,
                       current_soc: float,
                       max_crate: float = 0.5) -> dict:
        """
        Generate optimized CC-CV charging profile.

        CC phase: charge at constant current until V_max - 50mV
        CV phase: maintain constant voltage, current decreases
        Termination: when current drops below C/20
        """
        if current_soc >= target_soc:
            return {'phase': 'COMPLETE', 'current_a': 0, 'voltage_v': 0}

        soc_delta = target_soc - current_soc

        # Calculate optimal CC current based on soc_delta and temperature
        # Current reduction at high SoC (near end of charge)
        if current_soc < 0.8:
            cc_crate = max_crate
        elif current_soc < 0.9:
            cc_crate = max_crate * 0.7  # Slow down at 80%
        else:
            cc_crate = max_crate * 0.3  # CV-like region

        cc_current = cc_crate * self.capacity_ah

        # Target voltage (with safety margin for cell balancing)
        v_target = self.cell_voltage_max - 0.05  # 50 mV margin

        return {
            'phase': 'CC' if current_soc < 0.9 else 'CV',
            'current_a': cc_current,
            'voltage_v': v_target,
            'estimated_minutes': (soc_delta * self.capacity_ah) / cc_current * 60
        }

    def calculate_optimal_dod(self,
                               daily_cycles: float,
                               target_years: float,
                               chemistry: str = 'LFP') -> dict:
        """
        Calculate optimal DoD to maximize total energy throughput
        over the target lifetime.

        Trade-off: more DoD = more energy per cycle but fewer total cycles
        Optimum = maximum of (DoD * cycles_at_that_DoD)
        """
        # Empirical cycles vs DoD model (simplified)
        CYCLES_AT_DOD = {
            'LFP': {0.5: 8000, 0.6: 6500, 0.7: 5500, 0.8: 4500, 0.9: 3500, 1.0: 2500},
            'NMC': {0.5: 3000, 0.6: 2400, 0.7: 2000, 0.8: 1600, 0.9: 1200, 1.0: 900}
        }

        cycles_map = CYCLES_AT_DOD.get(chemistry, CYCLES_AT_DOD['LFP'])
        dod_values = sorted(cycles_map.keys())

        results = []
        required_cycles = daily_cycles * 365 * target_years

        for dod in dod_values:
            total_cycles = cycles_map[dod]
            energy_per_cycle_rel = dod  # Relative
            total_energy_rel = total_cycles * energy_per_cycle_rel

            # Does the battery last the required number of cycles?
            years_of_life = total_cycles / (daily_cycles * 365)

            results.append({
                'dod': dod,
                'total_cycles': total_cycles,
                'years_of_life': years_of_life,
                'total_energy_throughput': total_energy_rel,
                'meets_target': years_of_life >= target_years
            })

        # Optimum: maximum energy throughput that meets target years
        valid = [r for r in results if r['meets_target']]
        optimal = max(valid, key=lambda x: x['total_energy_throughput']) if valid else results[-1]

        return {
            'optimal_dod': optimal['dod'],
            'expected_years': optimal['years_of_life'],
            'total_cycles_available': optimal['total_cycles'],
            'all_scenarios': results
        }


# Example
optimizer = ChargingOptimizer(
    capacity_ah=280.0,
    cell_voltage_max=3.65,
    cell_voltage_min=2.80
)

result = optimizer.calculate_optimal_dod(
    daily_cycles=1.5,   # 1.5 cycles/day (typical arbitrage + frequency reg)
    target_years=15.0,  # 15-year target lifetime
    chemistry='LFP'
)

print(f"Optimal DoD: {result['optimal_dod']*100:.0f}%")
print(f"Expected lifetime: {result['expected_years']:.1f} years")

Grid Integration: BESS as a Grid Asset

A grid-scale BESS is not merely an "energy storage device": it is an electrical asset that participates in the energy market. The primary functions that generate revenue (or savings) are:

Frequency Regulation (FR): Fast response (<100 ms) to grid frequency deviations. In European markets (such as Terna's FCR service), response within 30 seconds to ±200 mHz deviations is required. Value: 50–150 €/MWh/year.
Peak Shaving: Reduction of consumption peaks to avoid power penalties (demand charges). Typical ROI for industrial users: 2–4 years.
Energy Arbitrage: Charge during low-price hours (nighttime, renewable surplus), discharge during high-price hours. In Italy, the day/night PUN spread can exceed 80–100 €/MWh on days with high solar generation.
Ramp Rate Control: Attenuation of rapid variations in solar or wind output to comply with ramp-rate limits imposed by grid operators.

from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class GridDispatchCommand:
    """Dispatch command from the grid or EMS"""
    power_kw: float          # Positive = discharge to grid, negative = charge from grid
    duration_s: int
    service_type: str        # 'FCR', 'aFRR', 'peak_shaving', 'arbitrage', 'ramp_control'
    priority: int            # 1 = highest priority (safety), 10 = lowest
    timestamp: float

class BESSGridDispatcher:
    """
    Dispatcher for BESS managing commands from the grid
    while respecting BMS constraints (SoC, temperature, fault state).

    Integrates with EMS via Modbus TCP / IEC 61850 XCBR/MMXU
    """

    def __init__(self,
                 power_max_kw: float,
                 capacity_kwh: float,
                 soc_min: float = 0.1,
                 soc_max: float = 0.95,
                 ramp_rate_kw_per_s: float = None):

        self.P_max = power_max_kw
        self.E_total = capacity_kwh
        self.soc_min = soc_min
        self.soc_max = soc_max
        # Default ramp rate: full power in 1 second (typical modern BESS)
        self.ramp_rate = ramp_rate_kw_per_s or power_max_kw

        self._current_power_kw = 0.0
        self._current_soc = 0.5
        self._bms_state = 'OPERATIONAL'

    def execute_command(self,
                         cmd: GridDispatchCommand,
                         bms_telemetry: dict) -> dict:
        """
        Execute a dispatch command verifying BMS constraints.

        Returns:
            {'executed_power_kw': float, 'curtailed': bool,
              'reason': str, 'available_energy_kwh': float}
        """
        self._current_soc = bms_telemetry.get('soc', self._current_soc)
        self._bms_state = bms_telemetry.get('state', self._bms_state)

        # 1. Check BMS state
        if self._bms_state in ('FAULT_HARD', 'EMERGENCY_STOP'):
            return {
                'executed_power_kw': 0,
                'curtailed': True,
                'reason': f'BMS in state {self._bms_state}',
                'available_energy_kwh': 0
            }

        # 2. Calculate permitted power with SoC constraints
        requested_p = cmd.power_kw

        if requested_p > 0:  # Discharge
            # Available energy above minimum SoC
            available_energy = max(0,
                (self._current_soc - self.soc_min) * self.E_total)

            # Maximum power that does not bring SoC < soc_min over duration
            p_max_soc = (available_energy / cmd.duration_s) * 3600

            max_discharge = min(self.P_max, p_max_soc)

            if requested_p > max_discharge:
                executed_p = max_discharge
                curtailed = True
                reason = f'SoC too low: {available_energy:.1f} kWh available'
            else:
                executed_p = requested_p
                curtailed = False
                reason = 'OK'

        else:  # Charge (negative power)
            # Available capacity below maximum SoC
            available_cap = max(0,
                (self.soc_max - self._current_soc) * self.E_total)

            p_max_soc = (available_cap / cmd.duration_s) * 3600
            p_requested_abs = abs(requested_p)

            max_charge = min(self.P_max, p_max_soc)

            if p_requested_abs > max_charge:
                executed_p = -max_charge
                curtailed = True
                reason = f'SoC too high: {available_cap:.1f} kWh available'
            else:
                executed_p = requested_p
                curtailed = False
                reason = 'OK'

        # 3. Apply ramp rate limiting
        power_delta = executed_p - self._current_power_kw
        max_delta = self.ramp_rate * 0.1  # 100 ms step

        if abs(power_delta) > max_delta:
            executed_p = self._current_power_kw + np.sign(power_delta) * max_delta

        self._current_power_kw = executed_p

        return {
            'executed_power_kw': executed_p,
            'curtailed': curtailed,
            'reason': reason,
            'available_energy_kwh': abs(
                (self.soc_max - self._current_soc) * self.E_total
                if executed_p < 0
                else (self._current_soc - self.soc_min) * self.E_total
            )
        }

    def frequency_regulation_response(self,
                                        grid_freq_hz: float,
                                        nominal_freq_hz: float = 50.0,
                                        deadband_hz: float = 0.010) -> float:
        """
        Automatic grid frequency response (FCR - Frequency Containment Reserve).

        European rule: linear proportional response between ±200 mHz,
        maximum power beyond ±200 mHz (IEC 61000-4-30).

        Returns:
            Response power [kW] (positive = injection into grid)
        """
        freq_deviation = grid_freq_hz - nominal_freq_hz

        # Deadband: no response for minor deviations
        if abs(freq_deviation) <= deadband_hz:
            return 0.0

        # Proportional response (droop)
        effective_deviation = freq_deviation - np.sign(freq_deviation) * deadband_hz

        # Proportional range: ±200 mHz = ±100% power
        droop_range_hz = 0.200
        droop_response = np.clip(
            effective_deviation / droop_range_hz, -1.0, 1.0
        )

        # Invert: low frequency = grid needs power = BESS discharges
        response_power = -droop_response * self.P_max

        return float(response_power)

Battery Chemistries for Grid-Scale: 2025 Comparison

The choice of battery chemistry is the most impactful decision for a BESS project. In 2025, the grid-scale market is dominated by LFP (LiFePO4), which has displaced NMC for most stationary applications thanks to its superior safety and cycle life, despite lower energy density. Sodium-ion is the emerging frontier, with potentially lower costs and no dependence on lithium or cobalt.

Parameter	LFP	NMC (622/811)	NCA	Sodium-ion (SIB)
Energy density (cell)	130–200 Wh/kg	200–280 Wh/kg	220–300 Wh/kg	100–160 Wh/kg
Cycles (80% capacity)	3,000–6,000+	1,000–2,000	800–1,500	2,000–5,000
Thermal stability temp (°C)	~500°C (TEP)	~200–250°C	~150–180°C	~400°C
Nominal cell voltage	3.2 V	3.6–3.7 V	3.6 V	3.0–3.2 V
Cell cost (2025 est.)	$55–70/kWh	$85–110/kWh	$90–120/kWh	$40–60/kWh (target)
Complete BESS system cost	$200–280/kWh	$280–350/kWh	$300–400/kWh	$180–250/kWh (target)
Temperature range	-20°C to 60°C	-20°C to 50°C	-20°C to 50°C	-40°C to 60°C
Round-trip efficiency	95–98%	93–96%	92–95%	90–93%
Material dependencies	Fe, P, Li	Ni, Mn, Co, Li	Ni, Co, Al, Li	Na, Fe, Mn (no Li, Co)
Grid-scale suitability	Excellent	Good	Limited	Promising (2026+)
Key players	CATL, BYD, EVE, REPT	CATL, Samsung SDI, LG	Panasonic, Samsung	CATL, HiNa, Farasis

Why LFP Won in Grid-Scale

In 2025, over 85% of new utility-scale BESS uses LFP cells. The main reasons:

Superior safety: The olivine structure of LiFePO4 does not release oxygen during thermal decomposition, making thermal runaway far less likely and less energetic. Thermal onset temperature ~500°C vs ~200°C for NMC.
Superior cycle life: 3,000–6,000 cycles vs NMC's 1,000–2,000. At 1.5 cycles/day, LFP lasts 6–11 years vs NMC's 2–4 before replacement.
Lower cost: No cobalt, no high-purity nickel. LFP cells fell to $55–70/kWh in 2025 (from $120+ in 2020).
Robust supply chain: CATL/BYD dominance with enormous production capacity.
Flat discharge curve: LFP's flat discharge curve makes SoC estimation via voltage less accurate (EKF needed), but operation is more stable and predictable.

The Italian Context: MACSE, PNIEC and the BESS Market

Italy began a significant transformation of its storage sector in 2024–2025, primarily through the MACSE (Meccanismo di Approvvigionamento della capacità di Storage Elettrico) mechanism managed by Terna, the national transmission system operator.

The MACSE Mechanism

On September 30, 2024, Terna awarded the first MACSE auction with the following results:

Capacity awarded: 10 GWh of storage for the islands and southern Italy
Average premium: approximately 13,000 €/MWh/year (vs cap of 37,000 €/MWh/year)
Winners receive the premium in exchange for availability in dispatching markets
Terna targets 50 GWh of installed storage by 2030 (PNIEC objective)

BESS Projects Approved in Italy (2024–2025)

MASE (Ministry of Environment and Energy Security) has approved several significant BESS projects, including:

Sessa Aurunca (Campania) - 120 MW: 392 containers, 49 PCS systems at 2.75 MVA. First project of this scale approved in central-southern Italy.
Additional 600+ MW of new projects approved with Terna technical approval for integration into the RTN (National Transmission Grid).

FER X and the Energy Transition

The FER X Decree (transitional, effective February 28, 2025) incentivizes renewables with a framework that includes provisions for storage paired with wind and solar plants. It is funded by PNRR resources with reporting deadlines at end of 2025 for many categories.

Opportunities for Italian BMS Development

With 50 GWh of storage expected by 2030 and a pipeline of approximately 4–6 GW per year in coming years, the Italian market offers concrete opportunities for:

Software houses specializing in BMS and EMS systems for BESS
System integrators for 10–200 MW plants
Monitoring and optimization service providers (SoH tracking, RUL prediction)
Startups developing ML algorithms for cycle-life optimization

BMS Technology Stack: From Embedded to Cloud

A modern grid-scale BMS uses a multi-layer architecture with different technologies at each layer, optimized for specific requirements (latency, reliability, scalability).

# Grid-scale BMS technology stack - 2025

BMS_TECH_STACK = {

    # LAYER 1: Cell Monitoring IC (Hardware)
    'cell_monitoring': {
        'vendors': ['Texas Instruments BQ76952', 'Analog Devices ADBMS6815',
                    'Renesas ISL94212', 'NXP MC33771'],
        'voltage_accuracy': '±0.5-2 mV',
        'current_integration': 'Shunt or Hall-effect sensor',
        'interface': 'SPI / isoSPI / CAN',
        'isolation': 'Galvanic (up to 1500V DC)'
    },

    # LAYER 2: Cell Controller MCU (Firmware)
    'cell_controller': {
        'hw': ['ST STM32H7', 'NXP S32K3', 'Renesas RH850'],
        'os': ['FreeRTOS', 'AUTOSAR CP', 'Bare Metal'],
        'language': 'C99/C11',
        'cycle_time': '1-10 ms',
        'standards': ['ISO 26262 (ASIL-D for EV)', 'IEC 61508 (SIL-2 for grid)']
    },

    # LAYER 3: BMS Controller (Edge Computing)
    'bms_controller': {
        'hw': ['Raspberry Pi CM4 Industrial', 'Kontron KBox A-202',
               'Beckhoff CX5200', 'NVIDIA Jetson (for ML)'],
        'os': 'Linux (PREEMPT-RT kernel)',
        'language': 'Python 3.11 + C extensions',
        'key_libs': ['NumPy', 'SciPy', 'filterpy (EKF)',
                     'scikit-learn', 'asyncio'],
        'comms': ['CANopen', 'Modbus RTU/TCP', 'EtherCAT'],
        'protocols': ['IEC 61850', 'OCPP 2.0.1 (for EV)']
    },

    # LAYER 4: System EMS (Server)
    'energy_management': {
        'platform': ['Python FastAPI', 'Node.js', 'Java Spring Boot'],
        'database': ['InfluxDB (timeseries)', 'PostgreSQL (config)',
                     'Redis (real-time cache)'],
        'message_broker': ['Apache Kafka', 'MQTT (EMQX)'],
        'grid_protocols': ['Modbus TCP', 'IEC 61850', 'DNP3', 'SunSpec'],
        'monitoring': ['Grafana', 'Prometheus', 'Victoria Metrics']
    },

    # LAYER 5: Cloud Analytics
    'cloud_analytics': {
        'platform': ['AWS IoT TwinMaker', 'Azure IoT Hub', 'GCP IoT Core'],
        'ml_platform': ['MLflow', 'Ray', 'TensorFlow Lite (edge inference)'],
        'analytics': ['Apache Spark (batch)', 'Apache Flink (streaming)'],
        'digital_twin': ['AWS IoT TwinMaker', 'Bentley iTwin', 'AVEVA PI']
    }
}

# Modbus register map for BMS-EMS communication
MODBUS_REGISTER_MAP = {
    # Input Registers (read-only)
    1000: ('soc_percent', 'uint16', 'x100'),        # SoC: 0-10000 = 0-100.00%
    1001: ('soh_percent', 'uint16', 'x100'),         # SoH: 0-10000 = 0-100.00%
    1002: ('pack_voltage', 'uint16', 'x10'),          # V: 0-65535 = 0-6553.5V
    1003: ('pack_current', 'int16', 'x10'),           # A: -32768-32767 = -3276.8 to 3276.7A
    1004: ('max_cell_temp', 'int16', 'x10'),          # degC: -500 to +1000 = -50.0 to 100.0
    1005: ('bms_state', 'uint16', 'enum'),            # 0=INIT, 1=STANDBY, ..., 9=EMERGENCY
    1006: ('active_faults_bitmask', 'uint32', 'bits'), # Bit per active fault
    1008: ('available_power_kw', 'int16', 'x1'),      # Available kW (pos=discharge, neg=charge)

    # Holding Registers (read-write)
    2000: ('power_setpoint_kw', 'int16', 'x1'),       # Power setpoint from EMS
    2001: ('charge_enable', 'uint16', 'bool'),         # 1 = enable charging
    2002: ('discharge_enable', 'uint16', 'bool'),      # 1 = enable discharging
    2003: ('soc_setpoint_percent', 'uint16', 'x100'), # Target SoC from EMS
}

Best Practices and Anti-Patterns for Grid-Scale BMS

Best Practices

BMS Design: Fundamental Rules

Defense in depth: Never rely on a single protection layer. Hardware comparators + firmware checks + software BMS + EMS = 4 independent levels.
Fail-safe by default: In the event of communication loss, MCU failure or power loss, the system must autonomously move to a safe state (contactor open).
Watchdog timer: Every firmware module must be monitored by a hardware watchdog. If software hangs, the watchdog opens the contactors.
Periodic SoC calibration: Even with EKF, calibrate SoC from the OCV curve every 1–4 weeks (when the system is at rest).
Immutable logging: All fault events, state transitions and critical measurements must be saved on non-volatile storage with precise timestamps (NTP/PTP).
System-level thermal runaway testing: Certify with UL 9540A not just the single cell but the entire module/container.
Chemistry segregation: Never mix LFP and NMC cells in the same pack. Different OCV curves make cell balancing impossible.

Anti-Patterns to Avoid

Critical BMS Design Errors

SoC estimation with Coulomb Counting only: Current measurement drift (typical: 0.1–0.5%) leads to SoC errors of 5–15% within weeks. Always combine with OCV calibration or Kalman Filter.
Ignoring the aging curve in the SoC model: Nominal capacity changes over time. A BMS using initial capacity for Coulomb Counting will overestimate SoC on a battery with 20% degradation.
Inadequate temperature sensing: One sensor per 20–30 cells is insufficient to detect localized hot spots. Minimum 1 sensor per 5–10 cells for grid-scale applications.
Cell balancing on voltage only (not SoC): Cells with different capacities have the same voltage at different SoC levels. Balancing on voltage in applications with cells of different ages leads to selective over/under-charge.
Missing pre-charge circuit: Without pre-charging the PCS capacitors, the inrush current at main contactor closure can cause mechanical damage to cells and premature contactor wear.
EMS without SoH awareness: An EMS that issues commands to the BESS without knowledge of the current SoH risks damaging already-degraded cells with excessively deep cycles.

Conclusions

The Battery Management System is far more than a simple protection system: it is the operational brain of an energy asset worth tens or hundreds of millions of euros. A well-designed BMS extends BESS useful life by 30–50%, prevents potentially catastrophic incidents such as thermal runaway, and maximizes operational revenues through optimized dispatch and participation in flexibility markets.

The fundamental concepts we have covered are:

The hierarchical Cell-Module-Pack-Rack-System architecture with distinct responsibilities at each level and the separation between real-time firmware and edge/cloud processing.
SoC estimation with Extended Kalman Filter, which fuses Coulomb counting and voltage measurement to achieve 1–3% accuracy even with aged cells.
Calendar + cycle aging degradation models for predicting RUL and optimizing the operational strategy (DoD, C-rate, target temperature).
The fail-safe safety state machine with early thermal runaway detection through temperature monitoring, gas sensors and multi-parameter correlation.
Grid integration for frequency regulation, peak shaving and arbitrage, with a dispatcher that always respects BMS constraints in real time.
The Italian context with Terna's MACSE mechanism and the 50 GWh storage target by 2030, which represents a concrete market for engineers and software houses.

The next article in the EnergyTech series will explore the IEC 61850 standard, the smart grid substation communication protocol that defines how intelligent devices (IEDs) such as our BMS communicate with SCADA, EMS and other grid assets.

Next Article in the Series

Article 5: IEC 61850 for Software Engineers: Smart Grid Communication. We will cover the IEC 61850 data model, GOOSE messaging, MMS and how to integrate a BMS or a photovoltaic converter into a compliant substation control system.

Related Series on federicocalo.dev

MLOps Series: How to take ML models (SoH prediction, RUL) to production with MLflow, DVC and deployment on industrial edge hardware.
AI Engineering Series: RAG and LLM for BESS technical documentation, AI-assisted troubleshooting and natural language interfaces for EMS.
Data & AI Business Series: How to build a data platform for fleet analytics across multiple BESS sites with Snowflake, dbt and Grafana dashboarding.