Satellite and Weather APIs for AgriTech: Predictive Data with Sentinel-2 and Python
In 2024, the European Space Agency acquired more than 1.5 petabytes of data every single day from the Sentinel satellites of the Copernicus programme. Of that volume, the share dedicated to agricultural monitoring across Europe represents the largest and strategically most significant segment. Yet the vast majority of farming operations — including the most technologically advanced ones — have not yet tapped into this public, free, and scientifically validated infrastructure.
The barrier is not a lack of data: it is the complexity of accessing it, processing it, and translating it into concrete agronomic decisions. A Sentinel-2 satellite passes over your vineyard every 5 days at a 10-metre-per-pixel resolution. Each pass produces 13 spectral bands encoding vegetation health, water stress, chlorophyll content, and pest presence. Combined with weather data, ground-level IoT sensors, and predictive models, this data can radically transform agronomic decision quality — cutting input costs by 15–25% and boosting yields by up to 10–20%.
The global Earth Observation market is valued at $10.07 billion in 2025 according to Straits Research, with projected growth to $17.20 billion by 2033 (CAGR 6.92%). The agriculture segment accounts for 21% of applications and is growing at the fastest pace, driven by demand for precision farming, yield monitoring, and irrigation optimisation. In this article we build, step by step, a complete Python pipeline to access Sentinel-2 via CDSE (Copernicus Data Space Ecosystem), compute NDVI and other vegetation indices, integrate weather data from Open-Meteo, and feed a predictive model for water stress.
What You Will Learn in This Article
- Architecture of the Copernicus programme and how to access Sentinel-2 for free via CDSE
- Vegetation indices: NDVI, EVI, SAVI, LAI — mathematical formulas, interpretation, and thresholds for Italian crops
- Complete Python implementation: CDSE authentication, band download, NDVI computation with rasterio and numpy
- Weather API comparison: Open-Meteo, OpenWeatherMap, Tomorrow.io, and integration with satellite data
- Geospatial pipeline with GeoPandas, rasterio, and PostGIS for vector and raster data
- Water-stress predictive model: combining Sentinel-2 + weather + IoT with scikit-learn
- Practical case study: a Primitivo vineyard in Puglia, seasonal NDVI monitoring, and irrigation optimisation
- Italian regulatory context: AGEA, SIAN, CAP open data, and digital PAC
FoodTech Series — All Articles
| # | Article | Level | Status |
|---|---|---|---|
| 1 | IoT Pipeline for Precision Agriculture with Python and MQTT | Advanced | Available |
| 2 | Edge ML for Crop Monitoring: Computer Vision in the Field | Advanced | Available |
| 3 | Satellite and Weather APIs for AgriTech: Predictive Data (you are here) | Advanced | Current |
| 4 | Blockchain Traceability in Food: From Field to Supermarket | Intermediate | Coming soon |
| 5 | Computer Vision for Quality Control in the Food Industry | Advanced | Coming soon |
| 6 | FSMA and Digital Compliance: Automating Regulatory Processes | Intermediate | Coming soon |
| 7 | Vertical Farming: Environmental Control with IoT and ML | Advanced | Coming soon |
| 8 | Demand Forecasting for Food Retail with Prophet and LightGBM | Intermediate | Coming soon |
| 9 | Farm Intelligence Dashboard: Real-Time Analytics with Grafana | Intermediate | Coming soon |
| 10 | Food Supply Chain Optimisation: ML for Waste Reduction | Intermediate | Coming soon |
The Copernicus Programme: Europe's Earth Observation Infrastructure
The European Union's Copernicus programme is the largest Earth observation infrastructure ever built. Managed by ESA (European Space Agency) and the European Commission with operational support from EUMETSAT, Copernicus operates a constellation of satellites called "Sentinels", each designed for specific monitoring missions. For agriculture, the reference satellite is Sentinel-2.
The Sentinel-2 constellation consists of two twin satellites (Sentinel-2A and Sentinel-2B) in sun-synchronous orbit at 786 km altitude. The two-satellite configuration guarantees a revisit time of 5 days at the equator and 2–3 days at European latitudes (Italy included). Each satellite carries the MSI (MultiSpectral Instrument) sensor, a pushbroom scanner that acquires data across a 290 km swath width and 13 spectral bands distributed across three spatial resolutions: 10 metres, 20 metres, and 60 metres.
Sentinel-2's most strategic feature is its complete availability at no cost: since 2014, all Copernicus data has been available in open access for any use, commercial or non-commercial. The new access portal since 2023 is the Copernicus Data Space Ecosystem (CDSE), which replaced the previous Copernicus Open Access Hub (SciHub) that was decommissioned in 2023. CDSE provides immediate access to the full Sentinel-2 archive, with a latency from acquisition to availability of approximately 3 hours for the most recent imagery.
Sentinel-2 Satellite: MSI Sensor Technical Specifications
| Band | Wavelength (nm) | Resolution | Primary Application |
|---|---|---|---|
| B1 - Coastal aerosol | 443 | 60 m | Air quality, atmospheric correction |
| B2 - Blue | 490 | 10 m | Vegetation/soil discrimination, mapping |
| B3 - Green | 560 | 10 m | Vegetation vigour, leaf colour |
| B4 - Red | 665 | 10 m | Chlorophyll, NDVI (red band) |
| B5 - Vegetation Red Edge | 705 | 20 m | Vegetation stress, Red Edge NDVI |
| B6 - Vegetation Red Edge | 740 | 20 m | Chlorophyll content, species classification |
| B7 - Vegetation Red Edge | 783 | 20 m | Biomass, LAI |
| B8 - NIR | 842 | 10 m | NDVI (NIR band), biomass, LAI |
| B8a - Vegetation Red Edge | 865 | 20 m | Canopy structure, narrow NDVI |
| B9 - Water vapour | 940 | 60 m | Atmospheric water vapour correction |
| B10 - SWIR - Cirrus | 1375 | 60 m | Cirrus cloud detection |
| B11 - SWIR | 1610 | 20 m | Water stress, leaf water content |
| B12 - SWIR | 2190 | 20 m | Advanced water stress, senescence |
The two Sentinel-2 processing levels most relevant for agriculture are: Level-1C (L1C) — top-of-atmosphere radiance with geometric correction, and Level-2A (L2A) — bottom-of-atmosphere (BOA) surface reflectance with atmospheric correction applied via the Sen2Cor algorithm. For direct agricultural applications, Level-2A is the recommended choice since it has already been corrected for atmospheric influence, making reflectance values comparable across different dates and geographic locations.
Satellite Platforms for AgriTech: Full Comparison 2025
Sentinel-2 is not the only option available. The satellite data market for agriculture includes commercial providers like Planet Labs, established operators like Landsat (NASA/USGS), and cloud-integrated platforms like Google Earth Engine. The choice depends on a complex trade-off between spatial resolution, acquisition frequency, cloud coverage handling, and available budget.
Satellite Platform Comparison for Precision Agriculture - 2025
| Platform | Spatial Resolution | Revisit Frequency | Spectral Bands | Cost | Data Latency | API |
|---|---|---|---|---|---|---|
| Sentinel-2 (ESA/Copernicus) | 10 m (vis/NIR), 20 m (SWIR) | 5 days (2–3 in Europe) | 13 MSI bands | Free (open data) | ~3 hours from acquisition | CDSE OData, STAC, SentinelHub, OpenEO |
| Planet PlanetScope | 3.7 m | Daily (near-daily) | 8 bands (PS2.SD) | Commercial subscription; free for research | 24 hours | Planet API v1, Subscriptions API |
| Landsat 8/9 (NASA/USGS) | 30 m (multispectral), 15 m (pan) | 16 days | 11 OLI/TIRS bands | Free (open data) | 24–48 hours | USGS EarthExplorer, Google Earth Engine |
| MODIS (NASA Terra/Aqua) | 250 m / 500 m / 1 km | 1–2 days | 36 bands | Free (open data) | 24–48 hours | NASA EarthData STAC, Google Earth Engine |
| Google Earth Engine | Dataset-dependent (10 m–1 km) | Dataset-dependent | Multi-dataset integrated | Free non-commercial; paid commercial | Immediate cloud processing | Python ee, JavaScript API |
| Maxar WorldView-3 | 0.3 m (pan), 1.24 m (multi) | 1–4 days | 29 bands (CAVIS) | ~$25–40/km² | 4–8 hours | Maxar Streaming API |
| Airbus Pleiades Neo | 0.3 m | 1–2 days | 6 multispectral bands | ~€15–30/km² | 24–48 hours | OneAtlas API |
For the vast majority of agricultural applications in Italy, Sentinel-2 is the optimal choice: 10-metre resolution is sufficient for plots larger than 0.5 hectares (the threshold below which pixel statistics become unreliable), the 2–3-day revisit frequency across Italy is adequate for seasonal monitoring, and the complete absence of cost removes any economic barrier to adoption. Planet Labs becomes relevant only for applications requiring near-daily monitoring (for example, detecting the early onset of water stress in high-value crops like intensive horticulture) or sub-10m resolution. MODIS and Landsat remain useful for large-area analyses and multi-decade historical time series.
Vegetation Indices: NDVI, EVI, SAVI and LAI for Italian Crops
Vegetation Indices (VI) are metrics mathematically derived from combinations of spectral bands that quantify specific properties of vegetation. They exploit the fact that chlorophyll absorbs strongly in the red band (665 nm) and reflects strongly in the near-infrared (842 nm): a ratio between these bands encodes vegetation density and vitality with mathematical elegance.
NDVI — Normalized Difference Vegetation Index
NDVI is the most widely used vegetation index in the world, introduced by Rouse et al. in 1973. The mathematical formula is:
NDVI Formula
NDVI = (NIR - Red) / (NIR + Red)
On Sentinel-2:
NDVI = (B8 - B4) / (B8 + B4)
Range: -1.0 to +1.0
NDVI values are interpreted according to a standardised scale, but the optimal thresholds vary by crop and phenological stage. The table below shows reference thresholds for the main Italian crops, derived from scientific literature and empirical validation across typical growing regions (Po Valley, Puglia, Sicily, Tuscany):
NDVI Interpretation for Italian Crops
| NDVI Range | General Interpretation | Wheat (Triticum) | Grapevine (Vitis) | Tomato (Solanum) | Maize (Zea mays) | Olive (Olea) |
|---|---|---|---|---|---|---|
| < 0.1 | Bare soil, rock, snow | Unseeded ground | Winter, pre-bud break | Post-harvest | Pre-sowing | Bare inter-row soil |
| 0.1 - 0.2 | Very sparse or dry vegetation | Initial emergence | Dormancy / severe stress | Recent transplant | Emergence (VE) | Severe water/thermal stress |
| 0.2 - 0.4 | Sparse vegetation, moderate stress | Early tillering | Pre-flowering, slight stress | Early vegetative growth | Stages V1–V3 | Sparse canopy, stress present |
| 0.4 - 0.6 | Moderate vegetation, good health | Full stem extension | Pre-flowering / fruit set | Active vegetative development | Stages V4–V8 | Normal canopy, well irrigated |
| 0.6 - 0.8 | Dense vegetation, excellent vigour | Heading (seasonal peak) | Full foliation, veraison | Maximum cover (flowering) | Stages V10–VT (tasselling) | Dense canopy, optimal condition |
| > 0.8 | Very dense vegetation, forest | Rare (field-edge woodland) | Hedgerows and boundary trees | Greenhouses with full cover | Rare optimal conditions | High-density olive groves |
EVI — Enhanced Vegetation Index
EVI, developed by Huete et al. for the MODIS sensor, corrects the limitations of NDVI in high-density vegetation areas (NDVI saturation beyond 0.7) and in zones with exposed soils (NDVI errors caused by soil reflectance). The formula is:
EVI = G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L)
On Sentinel-2:
EVI = 2.5 * (B8 - B4) / (B8 + 6*B4 - 7.5*B2 + 1)
Standard constants:
G = 2.5 (gain factor)
C1 = 6.0 (aerosol resistance coefficient)
C2 = 7.5 (aerosol resistance coefficient)
L = 1.0 (canopy background adjustment factor)
Range: -1.0 to +1.0 (typically 0 to 0.9 for vegetation)
SAVI — Soil Adjusted Vegetation Index
SAVI, proposed by Huete in 1988, is particularly useful in semi-arid environments such as Puglia or Sicily, where exposed soil significantly influences spectral response. The correction factor L reduces soil influence:
SAVI = (NIR - Red) * (1 + L) / (NIR + Red + L)
On Sentinel-2:
SAVI = (B8 - B4) * (1 + 0.5) / (B8 + B4 + 0.5)
L = 0.5 (standard value for medium vegetation)
L = 1.0 for very sparse vegetation
L = 0.25 for dense vegetation
Advantageous over NDVI when:
- Vegetation cover < 40%
- Bright, light-coloured soils (limestone, sandy)
- Semi-arid zones of southern Italy
LAI — Leaf Area Index
The LAI quantifies the total leaf surface area per unit of ground surface area. It is a fundamental agronomic parameter correlated with potential productivity and transpiration. Sentinel-2 allows LAI estimation through empirical or physical algorithms (SNAP BioPhysical Processor):
# Empirical LAI estimate from NDVI (Boegh et al. relationship, corrected for Sentinel-2)
# Valid for European cereal and herbaceous crops
LAI_estimate = -0.3 + 10.2 * NDVI # valid for NDVI in [0.2, 0.8]
# For grapevine (Italy-specific relationship from literature):
LAI_vine = 0.57 * EXP(3.28 * NDVI) # Dalla et al., 2019
# Indicative LAI thresholds for Italian crops:
# Wheat: optimal LAI 3-5 m2/m2 at heading
# Maize: optimal LAI 3-5 m2/m2 at flowering
# Grapevine: optimal LAI 1.5-2.5 m2/m2 during veraison
# Tomato: optimal LAI 3-4 m2/m2 at maximum cover
Limitations of Satellite Vegetation Indices
- Cloud cover: Sentinel-2 is an optical sensor — clouds render images unusable. In Italy, average cloud cover is 48% in winter and 15% in summer. The QA60 band enables automatic pixel-level cloud masking.
- NDVI saturation: Beyond NDVI ~0.7–0.8, sensitivity drops sharply. For high-density crops, use EVI instead.
- Mixed pixels: At 10-metre resolution, a single pixel may contain both vegetation and bare soil. For plots smaller than 0.5 ha, mean NDVI is less representative.
- Phenological seasonality: Comparing NDVI across different dates requires accounting for the phenological stage, not just the absolute value.
- Inter-annual variability: NDVI also varies with seasonal weather conditions (temperature, rainfall) independently of agronomic practices.
Python Implementation: CDSE Access and Sentinel-2 Download
The Copernicus Data Space Ecosystem (CDSE) became the single access portal for Sentinel data in 2023. It offers four main APIs: OData API (product search and download), STAC API (SpatioTemporal Asset Catalog), OpenEO API (standardised processing), and Sentinel Hub API (processing as a service). To get started, create a free account at dataspace.copernicus.eu and generate OAuth2 credentials.
Python Environment Setup and Dependencies
# Install required libraries
pip install sentinelhub rasterio numpy pandas geopandas shapely \
requests python-dotenv matplotlib scipy scikit-learn \
openmeteo-requests retry-requests xarray
# requirements.txt with validated versions for Python 3.11+
# sentinelhub==3.11.4 - CDSE and SentinelHub interface
# rasterio==1.4.0 - geospatial raster read/write
# numpy==2.1.0 - matrix computation for bands
# geopandas==1.0.1 - vector data and spatial operations
# shapely==2.0.6 - geometries for AOI (Area of Interest)
# openmeteo-requests==0.3.3 - Open-Meteo API client
# scikit-learn==1.5.2 - water stress predictive model
CDSE Authentication with OAuth2
import os
import requests
from datetime import datetime, timedelta
from dotenv import load_dotenv
load_dotenv()
# CDSE credentials configuration (from environment variables)
CDSE_USERNAME = os.getenv("CDSE_USERNAME")
CDSE_PASSWORD = os.getenv("CDSE_PASSWORD")
CDSE_CLIENT_ID = "cdse-public"
CDSE_TOKEN_URL = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
def get_cdse_access_token(username: str, password: str) -> str:
"""
Obtain OAuth2 access token from the CDSE Identity Server.
Token lifetime is 600 seconds (10 minutes).
"""
response = requests.post(
CDSE_TOKEN_URL,
data={
"grant_type": "password",
"client_id": CDSE_CLIENT_ID,
"username": username,
"password": password,
},
headers={"Content-Type": "application/x-www-form-urlencoded"},
timeout=30,
)
response.raise_for_status()
token_data = response.json()
return token_data["access_token"]
class CDSESession:
"""Authenticated session with automatic token refresh management."""
def __init__(self, username: str, password: str):
self._username = username
self._password = password
self._token: str | None = None
self._token_expiry: datetime | None = None
def _refresh_token(self) -> None:
self._token = get_cdse_access_token(self._username, self._password)
# 60-second safety margin before actual expiry (600s)
self._token_expiry = datetime.utcnow() + timedelta(seconds=540)
def get_headers(self) -> dict:
if self._token is None or datetime.utcnow() >= self._token_expiry:
self._refresh_token()
return {"Authorization": f"Bearer {self._token}"}
# Initialise session
session = CDSESession(CDSE_USERNAME, CDSE_PASSWORD)
Searching Sentinel-2 Products via OData API
from shapely.geometry import box
import json
ODATA_BASE_URL = "https://catalogue.dataspace.copernicus.eu/odata/v1"
def search_sentinel2_products(
bbox: tuple[float, float, float, float], # (lon_min, lat_min, lon_max, lat_max)
start_date: str, # format: "2024-05-01T00:00:00.000Z"
end_date: str, # format: "2024-05-31T23:59:59.000Z"
cloud_cover_max: float = 20.0,
product_type: str = "S2MSI2A", # L2A = Bottom of Atmosphere, preferred for agriculture
max_results: int = 20,
) -> list[dict]:
"""
Search for Sentinel-2 products in the CDSE catalogue.
Args:
bbox: Bounding box in order (lon_min, lat_min, lon_max, lat_max)
start_date: Start date (ISO 8601 format)
end_date: End date (ISO 8601 format)
cloud_cover_max: Maximum cloud cover percentage (0-100)
product_type: S2MSI2A (L2A, recommended) or S2MSI1C (L1C)
max_results: Maximum number of results
Returns:
List of dictionaries with product metadata
"""
lon_min, lat_min, lon_max, lat_max = bbox
aoi_wkt = f"POLYGON(( {lon_min} {lat_min}, {lon_max} {lat_min}, {lon_max} {lat_max}, {lon_min} {lat_max}, {lon_min} {lat_min} ))"
filter_query = (
f"Collection/Name eq 'SENTINEL-2' "
f"and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' "
f" and att/OData.CSC.DoubleAttribute/Value le {cloud_cover_max}) "
f"and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' "
f" and att/OData.CSC.StringAttribute/Value eq '{product_type}') "
f"and ContentDate/Start ge {start_date} "
f"and ContentDate/Start lt {end_date} "
f"and OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')"
)
params = {
"$filter": filter_query,
"$orderby": "ContentDate/Start desc",
"$top": max_results,
"$expand": "Attributes",
}
response = requests.get(
f"{ODATA_BASE_URL}/Products",
params=params,
timeout=60,
)
response.raise_for_status()
data = response.json()
return data.get("value", [])
# Example: search for imagery over the Manduria vineyard (Puglia)
# Growing season: May-September 2024
bbox_manduria = (17.4, 40.3, 17.7, 40.5) # Primitivo di Manduria DOC area
products = search_sentinel2_products(
bbox=bbox_manduria,
start_date="2024-05-01T00:00:00.000Z",
end_date="2024-09-30T23:59:59.000Z",
cloud_cover_max=10.0,
product_type="S2MSI2A",
)
print(f"Found {len(products)} Sentinel-2 L2A products with cloud cover <= 10%")
for p in products[:5]:
cloud = next(
(a["Value"] for a in p.get("Attributes", []) if a["Name"] == "cloudCover"),
"N/A",
)
print(f" {p['Name']} | Cloud: {cloud:.1f}% | Date: {p['ContentDate']['Start'][:10]}")
Downloading and Accessing Bands with Sentinel Hub Python
For processing bands without downloading the full scene (which weighs 800 MB – 1.2 GB per L2A product), it is more efficient to use the Sentinel Hub Process API, which returns only the requested bands for the region of interest:
from sentinelhub import (
SHConfig,
SentinelHubRequest,
DataCollection,
MimeType,
BBox,
CRS,
bbox_to_dimensions,
)
import numpy as np
# SentinelHub configuration via CDSE
config = SHConfig()
config.sh_client_id = os.getenv("SH_CLIENT_ID") # from apps.sentinel-hub.com
config.sh_client_secret = os.getenv("SH_CLIENT_SECRET")
config.sh_base_url = "https://sh.dataspace.copernicus.eu" # CDSE endpoint
# Area of Interest (AOI) definition - Manduria vineyard ~100 ha
aoi_bbox = BBox(
bbox=[17.42, 40.34, 17.52, 40.42],
crs=CRS.WGS84,
)
# Compute raster dimensions at 10m resolution
size = bbox_to_dimensions(aoi_bbox, resolution=10)
print(f"Raster dimensions: {size[0]} x {size[1]} pixels at 10m")
# Evalscript to download B04 (Red) and B08 (NIR) for NDVI computation
evalscript_ndvi_bands = """
//VERSION=3
function setup() {
return {
input: [{
bands: ["B04", "B08", "CLM"], // Red, NIR, Cloud Mask
units: "REFLECTANCE"
}],
output: {
bands: 3,
sampleType: "FLOAT32"
}
};
}
function evaluatePixel(sample) {
// Returns [Red, NIR, CloudMask]
return [sample.B04, sample.B08, sample.CLM];
}
"""
# Data request
request = SentinelHubRequest(
evalscript=evalscript_ndvi_bands,
input_data=[
SentinelHubRequest.input_data(
data_collection=DataCollection.SENTINEL2_L2A.define_from(
"s2l2a",
service_url=config.sh_base_url,
),
time_interval=("2024-07-15", "2024-07-20"),
other_args={"dataFilter": {"maxCloudCoverage": 10}},
)
],
responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
bbox=aoi_bbox,
size=size,
config=config,
)
# Download data (returns numpy array [H, W, Bands])
images = request.get_data()
band_data = images[0] # shape: (height, width, 3)
red_band = band_data[:, :, 0].astype(np.float32) # B04 - Red
nir_band = band_data[:, :, 1].astype(np.float32) # B08 - NIR
cloud_mask = band_data[:, :, 2] # CLM - 0=clear, 1=cloud
print(f"Bands downloaded - Shape: {red_band.shape}")
print(f"Red B04 - Min: {red_band.min():.4f}, Max: {red_band.max():.4f}, Mean: {red_band.mean():.4f}")
print(f"NIR B08 - Min: {nir_band.min():.4f}, Max: {nir_band.max():.4f}, Mean: {nir_band.mean():.4f}")
print(f"Cloud pixels: {cloud_mask.sum()} out of {cloud_mask.size} total ({100*cloud_mask.mean():.1f}%)")
NDVI Computation and GeoTIFF Export
import rasterio
from rasterio.transform import from_bounds
from rasterio.crs import CRS as RasterioCRS
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
def calculate_ndvi(
red: np.ndarray,
nir: np.ndarray,
cloud_mask: np.ndarray | None = None,
) -> np.ndarray:
"""
Compute NDVI from the red and NIR bands.
Handles division by zero with np.errstate.
Masks cloudy pixels with np.nan.
Returns:
NDVI array as float32, range [-1, 1], NaN for invalid pixels
"""
with np.errstate(divide="ignore", invalid="ignore"):
ndvi = np.where(
(nir + red) == 0,
np.nan,
(nir - red) / (nir + red),
).astype(np.float32)
# Mask clouds
if cloud_mask is not None:
ndvi = np.where(cloud_mask == 1, np.nan, ndvi)
# Clip physically impossible values (instrumental artefacts)
ndvi = np.clip(ndvi, -1.0, 1.0)
return ndvi
def save_ndvi_geotiff(
ndvi: np.ndarray,
bbox_wgs84: tuple[float, float, float, float],
output_path: str,
) -> None:
"""
Save the NDVI array as a georeferenced GeoTIFF in WGS84.
Args:
ndvi: 2D NDVI array (H, W)
bbox_wgs84: (lon_min, lat_min, lon_max, lat_max)
output_path: Output file path
"""
height, width = ndvi.shape
lon_min, lat_min, lon_max, lat_max = bbox_wgs84
transform = from_bounds(lon_min, lat_min, lon_max, lat_max, width, height)
with rasterio.open(
output_path,
"w",
driver="GTiff",
height=height,
width=width,
count=1,
dtype=rasterio.float32,
crs=RasterioCRS.from_epsg(4326),
transform=transform,
compress="lzw",
nodata=np.nan,
) as dst:
dst.write(ndvi, 1)
dst.update_tags(
NDVI_COMPUTED_AT=datetime.utcnow().isoformat(),
SENTINEL2_PRODUCT_TYPE="S2MSI2A",
FORMULA="(B08-B04)/(B08+B04)",
)
# Compute and save NDVI
ndvi_map = calculate_ndvi(red_band, nir_band, cloud_mask)
# NDVI statistics for the AOI (excluding NaN)
valid_pixels = ndvi_map[~np.isnan(ndvi_map)]
print(f"\nNDVI Statistics - Manduria Vineyard (July 2024)")
print(f" Valid pixels: {len(valid_pixels)} out of {ndvi_map.size}")
print(f" Mean NDVI: {valid_pixels.mean():.3f}")
print(f" Median NDVI: {np.median(valid_pixels):.3f}")
print(f" NDVI p10: {np.percentile(valid_pixels, 10):.3f} (stressed zones)")
print(f" NDVI p90: {np.percentile(valid_pixels, 90):.3f} (optimal zones)")
bbox_wgs84 = (17.42, 40.34, 17.52, 40.42)
save_ndvi_geotiff(ndvi_map, bbox_wgs84, "ndvi_manduria_20240717.tif")
print("NDVI GeoTIFF saved: ndvi_manduria_20240717.tif")
Weather APIs for AgriTech: Comparison and Implementation
Weather data is the second fundamental data source in precision agriculture. Combined with NDVI and ground-level IoT data, it enables building predictive models for water stress, disease risk, evapotranspiration, and yield forecasting. The weather API landscape in 2025 spans open-source free providers, commercial providers with free tiers, and regional data from Italy's ARPA networks.
Weather APIs for AgriTech — Comparison 2025
| Provider | Price | Spatial Resolution | Forecast | Agro Variables | Historical Archive | API |
|---|---|---|---|---|---|---|
| Open-Meteo | Free (non-commercial); ~$15/month commercial | 1–11 km (multi-model) | 16 days | ET0, soil temp, soil moisture, vapour pressure | 1940–present (ERA5) | REST JSON, Python client |
| OpenWeatherMap | Free up to 60 calls/min; $40/month pro | 2.5 km (ICON) | 5 days (free), 16 days (pro) | Limited, no ET0 | Paid history only | REST JSON, Python SDK |
| Tomorrow.io | Free 500 calls/day; $199/month core | 1 km | 21 days | SoilMoisture, ET0, spray conditions, pest risk | 6 years | REST, WebSocket, gRPC |
| ARPA Regional (Italy, 15 regions) | Free (regional open data) | Point stations (~1,500 stations across Italy) | No (observed data only) | All agro-meteorological variables | Full archive (decades) | WMS, REST, CSV/JSON download |
| Meteoam / CFS/ECMWF | ECMWF open data free; CFS NOAA free | 9–25 km | 10–45 days | Basic meteorological variables | 1940–present via ERA5 | ECMWF API (Python), CDS API |
| Visual Crossing | Free 1,000 records/day; $35/month standard | Station-interpolated | 15 days | ET0, Growing Degree Days, frost risk | Unlimited paid | REST JSON, CSV, SQL-like |
For AgriTech applications in production, the recommendation is to use Open-Meteo as the base (free, open-source, ERA5 data for historical records, 16-day forecast) and supplement it with ARPA station networks for local calibration. Open-Meteo includes critical agrometeorological variables such as ET0 (reference evapotranspiration), soil moisture at multiple depths, and water vapour pressure — variables that are not always freely available from commercial providers.
Open-Meteo Client with Caching and Retry Logic
import openmeteo_requests
import requests_cache
import pandas as pd
from retry_requests import retry
from dataclasses import dataclass
from typing import Optional
@dataclass
class WeatherForecast:
"""Daily weather data for agronomic decision-making."""
date: pd.DatetimeIndex
temperature_max: np.ndarray # Celsius
temperature_min: np.ndarray # Celsius
precipitation: np.ndarray # mm
et0: np.ndarray # mm/day - reference evapotranspiration (FAO-56)
wind_speed_max: np.ndarray # km/h
solar_radiation: np.ndarray # MJ/m2/day
soil_moisture_0_7cm: np.ndarray # m3/m3 (0-7 cm depth)
soil_moisture_7_28cm: np.ndarray # m3/m3 (7-28 cm depth)
soil_temp_0_7cm: np.ndarray # Celsius
def get_weather_forecast(
latitude: float,
longitude: float,
start_date: Optional[str] = None, # "YYYY-MM-DD" for historical archive
end_date: Optional[str] = None,
forecast_days: int = 16,
) -> WeatherForecast:
"""
Retrieve weather data from Open-Meteo with local cache (1 hour) and auto-retry.
Combines forecast (16 days) with ERA5 historical archive if start_date is provided.
Args:
latitude: WGS84 latitude
longitude: WGS84 longitude
start_date: If provided, requests historical data (not forecast)
end_date: End of historical period
forecast_days: Forecast horizon in days (1-16)
Returns:
WeatherForecast with all agrometeorological parameters
"""
# Local HTTP cache (1-hour validity) + retry on transient errors
cache_session = requests_cache.CachedSession(".cache/open_meteo", expire_after=3600)
retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
client = openmeteo_requests.Client(session=retry_session)
base_url = (
"https://archive-api.open-meteo.com/v1/archive"
if start_date
else "https://api.open-meteo.com/v1/forecast"
)
params = {
"latitude": latitude,
"longitude": longitude,
"daily": [
"temperature_2m_max",
"temperature_2m_min",
"precipitation_sum",
"et0_fao_evapotranspiration",
"wind_speed_10m_max",
"shortwave_radiation_sum",
"soil_moisture_0_to_7cm",
"soil_moisture_7_to_28cm",
"soil_temperature_0_to_7cm",
],
"timezone": "Europe/Rome", # Italian time zone
"wind_speed_unit": "kmh",
}
if start_date:
params["start_date"] = start_date
params["end_date"] = end_date
else:
params["forecast_days"] = forecast_days
# Open-Meteo uses ERA5 for historical archive (1940-present), excellent for agriculture
if start_date:
# ERA5-Land archive: high resolution 9 km for Italy
params["models"] = "era5_land"
responses = client.weather_api(base_url, params=params)
r = responses[0] # first (and only) location
daily = r.Daily()
return WeatherForecast(
date=pd.date_range(
start=pd.to_datetime(daily.Time(), unit="s", utc=True),
end=pd.to_datetime(daily.TimeEnd(), unit="s", utc=True),
freq=pd.Timedelta(seconds=daily.Interval()),
inclusive="left",
),
temperature_max = daily.Variables(0).ValuesAsNumpy(),
temperature_min = daily.Variables(1).ValuesAsNumpy(),
precipitation = daily.Variables(2).ValuesAsNumpy(),
et0 = daily.Variables(3).ValuesAsNumpy(),
wind_speed_max = daily.Variables(4).ValuesAsNumpy(),
solar_radiation = daily.Variables(5).ValuesAsNumpy(),
soil_moisture_0_7cm = daily.Variables(6).ValuesAsNumpy(),
soil_moisture_7_28cm = daily.Variables(7).ValuesAsNumpy(),
soil_temp_0_7cm = daily.Variables(8).ValuesAsNumpy(),
)
# Example: weather for the Manduria vineyard (2024 growing season)
lat_manduria, lon_manduria = 40.38, 17.47
weather_historical = get_weather_forecast(
latitude=lat_manduria,
longitude=lon_manduria,
start_date="2024-04-01",
end_date="2024-09-30",
)
weather_forecast = get_weather_forecast(
latitude=lat_manduria,
longitude=lon_manduria,
forecast_days=16,
)
df_weather = pd.DataFrame({
"date": weather_historical.date,
"t_max_C": weather_historical.temperature_max,
"t_min_C": weather_historical.temperature_min,
"rain_mm": weather_historical.precipitation,
"et0_mm": weather_historical.et0,
"rad_MJ": weather_historical.solar_radiation,
"sw_0_7cm": weather_historical.soil_moisture_0_7cm,
"sw_7_28cm": weather_historical.soil_moisture_7_28cm,
})
print(df_weather.describe())
print(f"\nTotal ET0 growing season Apr-Sep 2024: {df_weather['et0_mm'].sum():.1f} mm")
print(f"Total rainfall Apr-Sep 2024: {df_weather['rain_mm'].sum():.1f} mm")
water_deficit = df_weather["et0_mm"].sum() - df_weather["rain_mm"].sum()
print(f"Seasonal water deficit (ET0 - rainfall): {water_deficit:.1f} mm")
Complete Geospatial Pipeline: GeoPandas, rasterio and PostGIS
Professional management of satellite data requires a complete geospatial infrastructure that integrates raster data (NDVI satellite imagery, weather maps) with vector data (plot boundaries, management zones, sampling points). The pipeline described here adopts the de-facto standard Python geospatial stack: rasterio for rasters, GeoPandas for vectors, PostGIS as the spatial database, and GDAL as the underlying format translation engine.
Geospatial Pipeline Architecture
AgriTech Geospatial Pipeline Technology Stack
+-------------------------------------------------------------------------+
| INPUT DATA |
| [Sentinel-2 GeoTIFF] [Plot Shapefiles] [Weather/IoT CSV] |
| | | | |
| rasterio geopandas pandas |
+---------|----------------------|---------------------- |----------------+
| | |
+---------v----------------------v-----------------------v----------------+
| PROCESSING LAYER (Python) |
| [NDVI Calculation] [Zonal Statistics] [Spatial Join] |
| numpy/rasterio rasterstats geopandas |
+---------|----------------------|---------------------- |----------------+
| | |
+----------------------v-----------------------+
|
+--------------------------------v----------------------------------------+
| STORAGE LAYER (PostGIS + S3) |
| [PostGIS - geometries + attributes] [S3/MinIO - GeoTIFF rasters] |
| - Plot table with mean NDVI - Original raster files |
| - Historical series per plot - Satellite scene archive |
| - GIST spatial index - Cloud Optimized GeoTIFF |
+--------------------------------|----------------------------------------+
|
+--------------------------------v----------------------------------------+
| SERVING LAYER |
| [REST API - FastAPI] [Dashboard - Grafana] [ML Models] |
| NDVI per plot Real-time NDVI maps Stress prediction |
+-------------------------------------------------------------------------+
Zonal Statistics: Mean NDVI per Plot with rasterstats
import geopandas as gpd
from rasterstats import zonal_stats
import rasterio
from sqlalchemy import create_engine, text
from geoalchemy2 import Geometry
import pandas as pd
# Load vineyard plot shapefile
gdf_plots = gpd.read_file("data/plots_manduria_vineyard.shp")
gdf_plots = gdf_plots.to_crs("EPSG:4326") # Reproject to WGS84
print(f"Plots loaded: {len(gdf_plots)}")
print(gdf_plots[["plot_id", "variety", "hectares", "planting_year"]].head(10))
def compute_zonal_ndvi(
ndvi_raster_path: str,
plots_gdf: gpd.GeoDataFrame,
acquisition_date: str,
) -> gpd.GeoDataFrame:
"""
Compute zonal NDVI statistics for each plot.
Per plot, computes:
- Mean NDVI (overall vigour indicator)
- Median NDVI (robust to outliers)
- NDVI std (intra-plot variability)
- 10th percentile (stressed zones within the plot)
- 90th percentile (optimal zones)
- Percentage of valid pixels (excludes cloud NaN)
Returns:
GeoDataFrame with added NDVI columns
"""
stats = zonal_stats(
vectors=plots_gdf.geometry,
raster=ndvi_raster_path,
stats=["mean", "median", "std", "percentile_10", "percentile_90", "count", "nodata"],
nodata=np.nan,
geojson_out=False,
)
df_stats = pd.DataFrame(stats)
df_stats.columns = [
"ndvi_mean", "ndvi_median", "ndvi_std",
"ndvi_p10", "ndvi_p90", "pixel_count", "pixel_nodata",
]
df_stats["acquisition_date"] = pd.to_datetime(acquisition_date)
df_stats["pixel_valid_pct"] = (
df_stats["pixel_count"] /
(df_stats["pixel_count"] + df_stats["pixel_nodata"].fillna(0)) * 100
)
# Stress classification based on mean NDVI (thresholds for grapevine in summer)
df_stats["stress_category"] = pd.cut(
df_stats["ndvi_mean"],
bins=[-1.0, 0.25, 0.40, 0.55, 0.70, 1.0],
labels=["Severe Stress", "Moderate Stress", "Normal", "Good", "Optimal"],
)
return gpd.GeoDataFrame(
pd.concat([plots_gdf.reset_index(drop=True), df_stats], axis=1),
geometry="geometry",
crs=plots_gdf.crs,
)
# Compute zonal NDVI for all plots
gdf_ndvi = compute_zonal_ndvi(
ndvi_raster_path="ndvi_manduria_20240717.tif",
plots_gdf=gdf_plots,
acquisition_date="2024-07-17",
)
# Stressed plots - irrigation priority
stressed_plots = gdf_ndvi[
gdf_ndvi["stress_category"].isin(["Severe Stress", "Moderate Stress"])
]
print(f"\nStressed plots ({len(stressed_plots)}/{len(gdf_ndvi)}):")
print(stressed_plots[["plot_id", "variety", "ndvi_mean", "stress_category", "hectares"]])
# Persist to PostGIS for spatial queries
engine = create_engine("postgresql://agritech:pass@localhost:5432/vineyard_db")
gdf_ndvi.to_postgis(
name="ndvi_historical",
con=engine,
if_exists="append",
index=False,
dtype={"geometry": Geometry("MULTIPOLYGON", srid=4326)},
)
print("\nNDVI data saved to PostGIS (table: ndvi_historical)")
PostGIS Queries for Advanced Spatial Analysis
-- PostGIS schema for the AgriTech system
CREATE TABLE IF NOT EXISTS plots (
plot_id SERIAL PRIMARY KEY,
cadastral_code VARCHAR(20) UNIQUE NOT NULL,
variety VARCHAR(50),
hectares NUMERIC(8, 4),
planting_year INTEGER,
training_system VARCHAR(30),
geom GEOMETRY(MULTIPOLYGON, 4326)
);
CREATE INDEX idx_plots_geom ON plots USING GIST(geom);
CREATE TABLE IF NOT EXISTS ndvi_historical (
id SERIAL PRIMARY KEY,
plot_id INTEGER REFERENCES plots(plot_id),
satellite_date DATE NOT NULL,
ndvi_mean NUMERIC(6, 4),
ndvi_median NUMERIC(6, 4),
ndvi_std NUMERIC(6, 4),
ndvi_p10 NUMERIC(6, 4),
ndvi_p90 NUMERIC(6, 4),
pixel_valid_pct NUMERIC(5, 2),
stress_category VARCHAR(20),
satellite VARCHAR(20) DEFAULT 'Sentinel-2',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_ndvi_plot_date ON ndvi_historical(plot_id, satellite_date);
-- Query: NDVI trend over last 4 acquisitions per plot
SELECT
p.cadastral_code,
p.variety,
n.satellite_date,
n.ndvi_mean,
LAG(n.ndvi_mean, 1) OVER (PARTITION BY n.plot_id ORDER BY n.satellite_date) AS ndvi_prev,
n.ndvi_mean - LAG(n.ndvi_mean, 1) OVER (PARTITION BY n.plot_id ORDER BY n.satellite_date) AS ndvi_delta
FROM plots p
JOIN ndvi_historical n ON p.plot_id = n.plot_id
WHERE n.satellite_date >= NOW() - INTERVAL '60 days'
ORDER BY p.plot_id, n.satellite_date;
-- Query: plots with NDVI drop > 0.1 in the last month (stress alert)
SELECT
p.cadastral_code,
p.variety,
p.hectares,
latest.ndvi_mean AS current_ndvi,
prev.ndvi_mean AS ndvi_30d_ago,
(latest.ndvi_mean - prev.ndvi_mean) AS change
FROM plots p
JOIN ndvi_historical latest ON p.plot_id = latest.plot_id
AND latest.satellite_date = (SELECT MAX(satellite_date) FROM ndvi_historical WHERE plot_id = p.plot_id)
JOIN ndvi_historical prev ON p.plot_id = prev.plot_id
AND prev.satellite_date = (SELECT MAX(satellite_date) FROM ndvi_historical
WHERE plot_id = p.plot_id AND satellite_date < NOW() - INTERVAL '25 days')
WHERE (latest.ndvi_mean - prev.ndvi_mean) < -0.10
ORDER BY change ASC;
Water Stress Predictive Model with ML
Combining satellite NDVI with weather and IoT data enables building water-stress predictive models with 3–7 days of lead time, enabling proactive irrigation planning. The model described here integrates three data sources: the Sentinel-2 NDVI time series, the water balance computed from ET0 and precipitation, and soil moisture from IoT sensors (or from Open-Meteo ERA5-Land when physical sensors are not available).
Feature Engineering for the Model
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
import joblib
from typing import Tuple
def build_stress_features(
df_ndvi: pd.DataFrame, # columns: date, ndvi_mean, ndvi_std, ndvi_p10
df_weather: pd.DataFrame, # columns: date, t_max_C, t_min_C, rain_mm, et0_mm, sw_0_7cm
lookback_days: int = 14,
) -> pd.DataFrame:
"""
Build feature set for the water stress predictive model.
Combines satellite NDVI with water balance and weather variables.
Generated features:
- Current NDVI and change over 5/10 days
- Cumulative water deficit (ET0 - rainfall) over 7/14/21 days
- Growing Degree Days (GDD) from season start
- Average soil moisture and trend
- Thermal variability (t_max - t_min)
"""
df = df_weather.merge(df_ndvi, on="date", how="left").sort_values("date")
df["ndvi_mean"] = df["ndvi_mean"].interpolate(method="linear", limit=5)
# Cumulative water balance (mm) - positive = surplus, negative = deficit
df["water_balance"] = df["rain_mm"] - df["et0_mm"]
df["deficit_7d"] = df["water_balance"].rolling(7).sum()
df["deficit_14d"] = df["water_balance"].rolling(14).sum()
df["deficit_21d"] = df["water_balance"].rolling(21).sum()
# NDVI trend (point and windowed change)
df["ndvi_delta_5d"] = df["ndvi_mean"] - df["ndvi_mean"].shift(5)
df["ndvi_delta_10d"] = df["ndvi_mean"] - df["ndvi_mean"].shift(10)
# Growing Degree Days (base 10 degrees - standard for grapevine)
df["t_mean"] = (df["t_max_C"] + df["t_min_C"]) / 2
df["gdd_daily"] = np.maximum(df["t_mean"] - 10, 0)
df["gdd_cumulative"] = df["gdd_daily"].cumsum()
# Thermal variability (thermal stress indicator)
df["diurnal_range"] = df["t_max_C"] - df["t_min_C"]
# Average soil moisture and trend
df["sm_avg_7d"] = df["sw_0_7cm"].rolling(7).mean()
df["sm_trend"] = df["sw_0_7cm"] - df["sw_0_7cm"].shift(3)
feature_cols = [
"ndvi_mean", "ndvi_std", "ndvi_p10",
"ndvi_delta_5d", "ndvi_delta_10d",
"deficit_7d", "deficit_14d", "deficit_21d",
"et0_mm", "rain_mm",
"t_max_C", "t_min_C", "diurnal_range",
"gdd_cumulative", "gdd_daily",
"sm_avg_7d", "sm_trend",
]
return df[["date"] + feature_cols].dropna()
def train_stress_model(
features_df: pd.DataFrame,
labels: pd.Series, # 0=no stress, 1=moderate stress, 2=severe stress
) -> Tuple[Pipeline, dict]:
"""
Train GradientBoosting model for water stress classification.
Uses sklearn pipeline with integrated StandardScaler.
Returns:
(trained_pipeline, validation_metrics)
"""
feature_cols = [c for c in features_df.columns if c != "date"]
X = features_df[feature_cols].values
y = labels.values
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
pipeline = Pipeline([
("scaler", StandardScaler()),
("clf", GradientBoostingClassifier(
n_estimators=200,
learning_rate=0.05,
max_depth=4,
subsample=0.8,
random_state=42,
)),
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
cv_scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1_weighted")
metrics = {
"f1_cv_mean": cv_scores.mean(),
"f1_cv_std": cv_scores.std(),
"classification_report": classification_report(
y_test, y_pred,
target_names=["No Stress", "Moderate Stress", "Severe Stress"],
),
}
return pipeline, metrics
# Save model for deployment
def save_model(pipeline: Pipeline, path: str) -> None:
joblib.dump(pipeline, path)
print(f"Model saved: {path}")
def predict_stress_7days(
pipeline: Pipeline,
features_forecast: pd.DataFrame, # features computed on 7-day forecast data
) -> pd.DataFrame:
"""
Predict water stress for the next 7 days.
Uses features computed on weather forecast data (Open-Meteo, 16 days).
Returns:
DataFrame with date, class probabilities, and predicted class
"""
feature_cols = [c for c in features_forecast.columns if c != "date"]
X_forecast = features_forecast[feature_cols].values
proba = pipeline.predict_proba(X_forecast)
pred_class = pipeline.predict(X_forecast)
labels = ["no_stress", "moderate_stress", "severe_stress"]
result_df = pd.DataFrame(proba, columns=[f"prob_{l}" for l in labels])
result_df["predicted_class"] = pred_class
result_df["date"] = features_forecast["date"].values
result_df["predicted_label"] = [labels[c] for c in pred_class]
return result_df[["date", "predicted_label", "prob_no_stress",
"prob_moderate_stress", "prob_severe_stress"]]
Case Study: Seasonal NDVI Monitoring in a Primitivo Vineyard in Manduria
To make the described concepts concrete, we present a case study inspired by a real application in the DOC Primitivo di Manduria territory (Taranto, Puglia). The area presents conditions typical of Mediterranean viticulture: hot dry summers (summer rainfall < 50 mm), calcareous clay soils, and training systems ranging from the traditional bush vine (alberello pugliese) to vertical shoot positioning (controspalliera).
Case Study Parameters — Primitivo Vineyard, Manduria
| Parameter | Value |
|---|---|
| Location | Manduria (TA), Puglia — lat 40.38, lon 17.47 |
| Total area | 35 hectares divided into 12 plots |
| Main varieties | Primitivo (80%), Negroamaro (20%) |
| Training systems | Bush vine — alberello (8 plots), VSP — controspalliera (4 plots) |
| Planting year | 2003–2015 (mixed young/mature vines) |
| Irrigation | Drip irrigation (not all plots) |
| Monitoring satellite | Sentinel-2A/B, L2A, 10m resolution |
| Acquisitions used | 28 scenes in 2024 season (cloud < 20%) |
| Integrated IoT sensors | 6 weather stations, 18 soil moisture sensors at 30/60 cm |
| Analysis period | April – October 2024 |
Seasonal NDVI Results and Irrigation Decisions
The seasonal NDVI analysis revealed significantly differentiated water stress patterns across plots, with direct implications for irrigation management and product quality:
NDVI Results per Plot — 2024 Growing Season
| Plot | Variety | NDVI Apr | NDVI Jun | NDVI Jul (peak) | NDVI Aug | NDVI Sep | Irrigation Alert |
|---|---|---|---|---|---|---|---|
| Block A1 | Primitivo / bush vine | 0.28 | 0.42 | 0.58 | 0.41 | 0.31 | Moderate stress August |
| Block A2 | Primitivo / bush vine | 0.22 | 0.36 | 0.47 | 0.29 | 0.24 | Severe stress Aug–Sep |
| Block B1 | Primitivo / VSP | 0.31 | 0.53 | 0.66 | 0.57 | 0.44 | No alert |
| Block B2 | Primitivo / VSP | 0.29 | 0.49 | 0.63 | 0.52 | 0.40 | No alert |
| Block C1 | Negroamaro / bush vine | 0.25 | 0.44 | 0.55 | 0.38 | 0.28 | Moderate stress August |
| Block C2 | Negroamaro / VSP | 0.30 | 0.51 | 0.61 | 0.50 | 0.38 | No alert |
The pattern that emerged is clear and agronomically significant: bush vine plots without supplemental irrigation show more pronounced water stress in August and September compared to VSP plots with drip irrigation. Block A2 in particular showed an NDVI decline from 0.47 to 0.29 between July and August 2024, correlated in time with a 23-day dry spell and maximum temperatures consistently exceeding 36 degrees Celsius.
Automated Alerting Script
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from typing import List
def check_ndvi_alerts(
gdf_ndvi: gpd.GeoDataFrame,
stress_threshold: float = 0.35,
min_valid_pixels_pct: float = 70.0,
) -> List[dict]:
"""
Check plots with NDVI below the stress threshold.
Generates an alert list for agronomist notification.
Args:
gdf_ndvi: GeoDataFrame with ndvi_mean and pixel_valid_pct columns
stress_threshold: NDVI threshold below which to trigger alert (0.35 default for grapevine)
min_valid_pixels_pct: Minimum valid pixel percentage for data reliability
Returns:
List of dicts with per-plot alert details
"""
alerts = []
for _, row in gdf_ndvi.iterrows():
if row["pixel_valid_pct"] < min_valid_pixels_pct:
continue # Too many cloudy pixels, data unreliable
if row["ndvi_mean"] < stress_threshold:
severity = "SEVERE" if row["ndvi_mean"] < 0.25 else "MODERATE"
alerts.append({
"plot_id": row["plot_id"],
"variety": row.get("variety", "N/A"),
"hectares": row.get("hectares", 0),
"current_ndvi": round(row["ndvi_mean"], 3),
"ndvi_threshold": stress_threshold,
"severity": severity,
"date": row.get("acquisition_date", "N/A"),
"pixel_valid": round(row["pixel_valid_pct"], 1),
})
return sorted(alerts, key=lambda x: x["current_ndvi"])
def send_alert_email(
alerts: List[dict],
recipient: str,
sender: str,
smtp_host: str = "smtp.gmail.com",
smtp_port: int = 587,
) -> None:
"""Send NDVI alert email to the agronomist."""
if not alerts:
return
body_lines = [
f"Satellite NDVI Monitoring Alert — {alerts[0]['date']}\n",
f"Plots under water stress: {len(alerts)}\n",
"=" * 60,
"",
]
for a in alerts:
body_lines.append(
f" [{a['severity']}] Plot {a['plot_id']} ({a['variety']}, {a['hectares']} ha)\n"
f" NDVI: {a['current_ndvi']} (threshold: {a['ndvi_threshold']}) | Valid pixels: {a['pixel_valid']}%\n"
)
msg = MIMEMultipart()
msg["Subject"] = f"[NDVI ALERT] {len(alerts)} plots under stress — {alerts[0]['date']}"
msg["From"] = sender
msg["To"] = recipient
msg.attach(MIMEText("\n".join(body_lines), "plain"))
smtp_password = os.getenv("SMTP_PASSWORD")
with smtplib.SMTP(smtp_host, smtp_port) as server:
server.starttls()
server.login(sender, smtp_password)
server.send_message(msg)
print(f"Alert sent to {recipient} for {len(alerts)} stressed plots")
ROI of the Satellite Monitoring System
Quantifying the Return on Investment of satellite monitoring is essential for justifying the technological investment to viticulture businesses. For the Manduria vineyard, the cost-benefit analysis for the 2024 season shows:
ROI Analysis — Satellite Monitoring System, Manduria Vineyard (35 ha, 2024 Season)
| Item | Value | Notes |
|---|---|---|
| COSTS | ||
| Sentinel-2 data access (CDSE) | €0 | Free open data |
| Weather API Open-Meteo | €0 | Non-commercial free tier |
| Python pipeline development | €2,500 | 30 dev hours × €83/h (one-time, amortised over 3 years) |
| Cloud hosting (VM + storage) | €600/year | 4 vCPU VM + 50 GB S3 storage |
| Agronomic integration consultancy | €1,000 | One-time, crop-specific threshold calibration |
| Total Year 1 Cost | €4,100 | Subsequent years: ~€1,400/year |
| ESTIMATED BENEFITS (35 ha, 100,000 bottles/year) | ||
| Irrigation reduction (timing optimisation) | €2,800 | 15% water saving, 8,500 m³ × €0.33/m³ |
| Reduced crop protection inputs | €1,750 | Targeted intervention in at-risk zones only (−20% treatments) |
| Grape quality improvement | €8,500 | +0.5 mean Brix points in 12 ha, +€0.20/kg quality premium |
| Water stress loss recovery | €3,200 | 8% yield recovery in 2 plots with predicted severe stress |
| Total Year 1 Benefits | €16,250 | |
| Year 1 ROI | 296% | Payback in approximately 3 months |
Google Earth Engine: Cloud Processing for Large-Scale Analysis
For analysis at regional or national scale — for example, variety mapping across an entire DOC appellation, assessing spring frost damage at provincial scale, or monitoring CAP compliance for paying agencies — Google Earth Engine (GEE) offers cloud-native computing capacity that would otherwise require dedicated HPC infrastructure.
import ee
# Google Earth Engine authentication (requires GEE account)
ee.Authenticate()
ee.Initialize(project="your-gee-project-id")
def calculate_ndvi_gee(
aoi_geojson: dict,
start_date: str,
end_date: str,
cloud_threshold: int = 20,
) -> ee.ImageCollection:
"""
Compute NDVI at large scale using Google Earth Engine.
Suited for analysis across DOC appellations or entire regions.
Args:
aoi_geojson: GeoJSON of the area of interest
start_date: "YYYY-MM-DD"
end_date: "YYYY-MM-DD"
cloud_threshold: Maximum cloud cover percentage (0-100)
Returns:
ImageCollection with NDVI computed for each scene
"""
aoi = ee.Geometry(aoi_geojson)
def mask_s2_clouds(image: ee.Image) -> ee.Image:
"""Mask clouds using the Sentinel-2 QA60 band."""
qa = image.select("QA60")
cloud_bit_mask = 1 << 10 # bit 10: opaque clouds
cirrus_bit_mask = 1 << 11 # bit 11: cirrus clouds
mask = qa.bitwiseAnd(cloud_bit_mask).eq(0).And(
qa.bitwiseAnd(cirrus_bit_mask).eq(0))
return image.updateMask(mask).divide(10000) # scale to [0,1]
def add_ndvi(image: ee.Image) -> ee.Image:
"""Add NDVI band to the Sentinel-2 image."""
ndvi = image.normalizedDifference(["B8", "B4"]).rename("NDVI")
return image.addBands(ndvi)
collection = (
ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filterBounds(aoi)
.filterDate(start_date, end_date)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", cloud_threshold))
.map(mask_s2_clouds)
.map(add_ndvi)
)
return collection
# Compute seasonal NDVI median mosaic (cloud-free composite)
def get_seasonal_ndvi_mosaic(
collection: ee.ImageCollection,
aoi: ee.Geometry,
) -> ee.Image:
"""Create seasonal NDVI mosaic (median is robust to residual outliers)."""
ndvi_mosaic = collection.select("NDVI").median().clip(aoi)
return ndvi_mosaic
# Example: Primitivo di Manduria DOC appellation (~21,000 ha)
aoi_doc_manduria = {
"type": "Polygon",
"coordinates": [[
[17.2, 40.2], [17.8, 40.2], [17.8, 40.6],
[17.2, 40.6], [17.2, 40.2],
]],
}
collection_summer = calculate_ndvi_gee(
aoi_geojson=aoi_doc_manduria,
start_date="2024-07-01",
end_date="2024-08-31",
cloud_threshold=15,
)
print(f"Available scenes: {collection_summer.size().getInfo()}")
# Export NDVI mosaic to Google Drive (for subsequent local Python analysis)
ndvi_mosaic = get_seasonal_ndvi_mosaic(collection_summer, ee.Geometry(aoi_doc_manduria))
task = ee.batch.Export.image.toDrive(
image=ndvi_mosaic,
description="NDVI_DOC_Manduria_Summer2024",
folder="AgriTech_Export",
fileNamePrefix="ndvi_manduria_summer2024",
region=ee.Geometry(aoi_doc_manduria),
scale=10,
crs="EPSG:4326",
maxPixels=1e10,
)
task.start()
print(f"Export started — Task ID: {task.id}")
Regulatory Framework and Open Data for AgriTech in Italy
Italy's agricultural open data ecosystem is structured around three main actors: AGEA (Agency for Agricultural Disbursements), the SIAN (National Agricultural Information System), and connected regional portals. In February 2025, AGEA completed the migration of SIAN to the National Strategic Pole (Polo Strategico Nazionale), updating the cloud infrastructure and expanding geospatial data processing capabilities.
Main Italian Agricultural Open Data Sources
| Source | URL | Available Data | Format | Update Frequency |
|---|---|---|---|---|
| SIAN Open Data | dati.sian.it | High-resolution orthophotos, AI-based Land Use Map, Farm Graphic Plans | GeoTIFF, Shapefile, WFS | Annual |
| AGEA EO Services | sian.it | Multispectral orthophotos, X-band radar imagery, DEM, DSM | GeoTIFF, ECW | Periodic (campaign) |
| ISTAT — Agricultural Statistics | istat.it/agricoltura | UAA, production, farm structure, agricultural census | CSV, JSON, SDMX | Annual |
| National Geoportal MATTM | geoportale.gov.it | Carta Natura, Corine Land Cover, IDT, DBT | WMS, WFS, Shapefile | Variable |
| Regional ARPA Networks (15 regions) | arpa.[region].it | Station climate/weather data, regional forecast models | CSV, JSON, NetCDF | Near real-time |
| Copernicus Land Service | land.copernicus.eu | CORINE Land Cover, Grassland, Water and Wetness, Urban Atlas | GeoTIFF, Shapefile | Triennial/annual |
PNRR Transizione 5.0 Tax Credits for AgriTech
Investments in satellite monitoring systems and precision agriculture can qualify for benefits under Piano Transizione 5.0 (Law 207/2024 — 2025 Budget Law). Agricultural businesses, admitted from 2025, can access tax credits for investment in 4.0 tangible and intangible capital goods, including farm management software with artificial intelligence and geospatial data analytics components.
Transizione 5.0 Tax Credit Rates for Agricultural Businesses (2025)
| Investment Band | Energy Saving 3–6% | Saving 6–10% | Saving >10% |
|---|---|---|---|
| Up to €2.5M | 35% | 40% | 45% |
| €2.5M to €10M | 15% | 20% | 25% |
| €10M to €50M | 5% | 10% | 15% |
Precision irrigation systems driven by NDVI and weather forecasts can qualify for the "energy saving >6%" band if they document a reduction in water consumption (pumping energy) versus the baseline. A certified technical report is required.
Production Architecture: Best Practices and Anti-Patterns
Having described the components, it is important to synthesise best practices for a robust production implementation and identify the common anti-patterns that lead to fragile or inaccurate systems.
Best Practices — AgriTech Satellite Monitoring System
- Progressive cloud filtering: Do not rely solely on the product-level cloud cover percentage (Level-1 metadata). Always apply pixel-level masking with QA60 (Sentinel-2) or BQA (Landsat). A product at 5% cloud cover may have a specific plot completely obscured by a localised cloud.
- Cloud Optimized GeoTIFF (COG): Archive NDVI rasters in COG format for efficient range-request access from S3/MinIO. Avoid classic GeoTIFFs not optimised for cloud storage.
- Consistent projection: Standardise EVERYTHING to EPSG:4326 (WGS84) for final data, or to UTM Zone 32N (EPSG:32632) for Italy to preserve correct metric measurements. Never mix CRS without explicit reprojection.
- Ground truth IoT validation: Calibrate NDVI thresholds against field measurements (soil moisture sensors, portable LAI meter, field-collected agronomic data). NDVI alone without local calibration can yield 15–30% false-positive/negative alerts.
- Historical archive management: Maintain at least 3–5 years of NDVI time series for each plot. The NDVI anomaly relative to the historical mean (same month, same crop) is far more informative than the absolute value.
- Rate limiting and caching: CDSE and Open-Meteo have rate limits. Always implement local caching (file-based or Redis) to avoid redundant downloads. A single Sentinel-2 download takes 30–120 seconds — never request the same data twice.
- Logging and audit trail: Every NDVI computation must be traceable to the source satellite scene, acquisition date, cloud cover percentage, and algorithm version. Essential for CAP/AGEA audits and for debugging anomalies.
Common Anti-Patterns — AgriTech Satellite Systems
- Skipping atmospheric correction: Using Level-1C (top-of-atmosphere) instead of Level-2A (bottom-of-atmosphere) introduces systematic NDVI errors on the order of 10–20% under conditions of high atmospheric humidity (Po Valley fog, Adriatic proximity). ALWAYS use L2A for quantitative applications.
- NDVI as the sole indicator: Relying exclusively on NDVI while ignoring EVI (for dense canopies), NDWI (early water stress), and Red Edge NDVI (early stress before it is visible in standard NDVI) is insufficient. A professional system uses at least 3–4 indices in combination.
- Pixel size vs. plot size mismatch: Applying NDVI to plots smaller than 0.3 ha at 10-metre resolution produces statistically unreliable means (fewer than 30 pixels). For small plots, consider Planet Labs (3.7 m) or drone-mounted sensors.
- Missing temporal compositing: Taking the single available acquisition for a date — even at 19% cloud cover — leads to NDVI artefacts in partially cloudy zones. Always use medoid compositing over a 10–15-day temporal window.
- Pipeline without data versioning: Recomputing NDVI with different Python/rasterio/sentinelhub versions produces slightly different values for the same source data. Version computation pipelines with DVC or equivalent.
Complete Pipeline: Orchestration with Prefect
Integrating all the described components, the complete satellite monitoring pipeline can be orchestrated with Prefect (see the Pipeline Orchestration series) for automatic execution every 5 days — aligned with the Sentinel-2 revisit frequency.
from prefect import flow, task
from prefect.schedules import CronSchedule
from datetime import datetime, timedelta
import logging
logger = logging.getLogger(__name__)
@task(retries=3, retry_delay_seconds=60, name="search-sentinel2-products")
def search_products_task(bbox: tuple, lookback_days: int = 7) -> list[dict]:
"""Prefect task: search for recent Sentinel-2 products."""
end_date = datetime.utcnow()
start_date = end_date - timedelta(days=lookback_days)
products = search_sentinel2_products(
bbox=bbox,
start_date=start_date.strftime("%Y-%m-%dT00:00:00.000Z"),
end_date=end_date.strftime("%Y-%m-%dT23:59:59.000Z"),
cloud_cover_max=20.0,
)
logger.info(f"Found {len(products)} Sentinel-2 products")
return products
@task(retries=2, retry_delay_seconds=120, name="download-compute-ndvi")
def compute_ndvi_task(
product: dict,
aoi_bbox: tuple,
output_dir: str = "/data/ndvi",
) -> str:
"""Prefect task: download bands and compute NDVI for one product."""
# Implementation based on functions defined in previous sections
acquisition_date = product["ContentDate"]["Start"][:10]
output_path = f"{output_dir}/ndvi_{acquisition_date}.tif"
# Download + compute (code omitted for brevity, uses functions above)
logger.info(f"NDVI computed for {acquisition_date}: {output_path}")
return output_path
@task(name="zonal-stats-postgis")
def zonal_stats_task(ndvi_path: str, plots_shapefile: str) -> dict:
"""Prefect task: compute zonal statistics and persist to PostGIS."""
gdf_plots = gpd.read_file(plots_shapefile).to_crs("EPSG:4326")
acquisition_date = ndvi_path.split("ndvi_")[1].replace(".tif", "")
gdf_ndvi = compute_zonal_ndvi(ndvi_path, gdf_plots, acquisition_date)
engine = create_engine(os.getenv("POSTGIS_URL"))
gdf_ndvi.to_postgis("ndvi_historical", engine, if_exists="append", index=False)
alerts = check_ndvi_alerts(gdf_ndvi)
return {"plots_processed": len(gdf_ndvi), "alerts": len(alerts), "alert_details": alerts}
@task(name="send-alerts")
def alerts_task(stats: dict) -> None:
"""Prefect task: send alert email if any alerts are present."""
if stats["alerts"] > 0:
send_alert_email(
alerts=stats["alert_details"],
recipient=os.getenv("AGRONOMIST_EMAIL"),
sender=os.getenv("ALERT_EMAIL"),
)
logger.info(f"Alerts sent for {stats['alerts']} plots")
@flow(
name="satellite-monitoring-pipeline",
schedule=CronSchedule(cron="0 8 */5 * *"), # every 5 days at 08:00
)
def satellite_monitoring_flow(
bbox: tuple = (17.35, 40.28, 17.60, 40.50),
plots_shapefile: str = "/data/plots_vineyard.shp",
) -> None:
"""
Complete satellite monitoring pipeline for the vineyard.
Runs every 5 days, aligned with the Sentinel-2 revisit frequency.
"""
products = search_products_task(bbox=bbox)
if not products:
logger.warning("No products found in the last period")
return
# Process the most recent product with minimum cloud cover
best_product = sorted(
products,
key=lambda p: next(
(a["Value"] for a in p.get("Attributes", []) if a["Name"] == "cloudCover"), 100
),
)[0]
ndvi_path = compute_ndvi_task(best_product, bbox)
stats = zonal_stats_task(ndvi_path, plots_shapefile)
alerts_task(stats)
logger.info(
f"Pipeline complete: {stats['plots_processed']} plots, {stats['alerts']} alerts"
)
if __name__ == "__main__":
satellite_monitoring_flow()
Conclusions and Next Steps
Sentinel-2 satellite data, freely accessible through CDSE, is arguably the most underutilised data source in Italian AgriTech. With Python, rasterio, sentinelhub, and a CDSE account, you can build in just a few days an NDVI monitoring system that outperforms many commercial paid solutions in quality and granularity.
The key to success is not technological sophistication but local agronomic calibration: NDVI thresholds, alert time windows, and feature weights in the predictive model must be tuned to the specific crop, local climate, and the farm's agronomic practices. A system calibrated for grapevine in Puglia will not perform equally well for wheat in the Po Valley without human calibration intervention.
Key takeaways for anyone building a similar system:
- Always use Sentinel-2 Level-2A (atmospheric correction applied) for quantitative comparisons across different dates.
- Combine NDVI with at least one complementary index (EVI for dense vineyards, NDWI for early water stress, Red Edge NDVI for chlorophyll monitoring).
- Integrate Open-Meteo ERA5-Land for the water balance: ET0 + rainfall + model-derived soil moisture is an excellent proxy when you do not have IoT sensors in the field.
- Maintain a historical series of at least 3 years per plot: the anomaly relative to the historical mean is far more actionable than an absolute NDVI value.
- Scale with Google Earth Engine only when you need to process regional or national scales: for individual farms, the local Python pipeline on CDSE is more controllable and avoids vendor lock-in.
Continue the FoodTech Series
You have built the satellite pipeline. Now explore the other articles in the series to complete the AgriTech architecture:
- Article 1: IoT Pipeline for Precision Agriculture with Python and MQTT — How to integrate ground-level sensors with the satellite pipeline.
- Article 2: Edge ML for Crop Monitoring: Computer Vision in the Field — Computer vision models for early detection of crop diseases.
- Article 4: Blockchain Traceability in Food: From Field to Supermarket — How NDVI data feeds quality certificates on-chain.
To dive deeper into MLOps and the deployment of the predictive models described in this article, see the MLOps for Business with MLflow series and the Enterprise LLM: RAG and Fine-Tuning series.







