Real-Time Property Market Data Pipeline
The real estate market generates an astronomical amount of data every day: new listings, price reductions, recorded transactions, cadastral data, building permits, commercial occupancy rates, construction material prices. Transforming this flood of raw data into actionable market indicators in real time is the competitive advantage that separates leading PropTech platforms from traditional solutions.
In this article, we build a complete pipeline for ingestion, processing, and analysis of real estate market data using Apache Kafka for streaming, Apache Flink for real-time processing, TimescaleDB for time-series, and Elasticsearch for search and aggregations.
What You'll Learn
- Lambda and Kappa architecture for real-time real estate data
- Ethical scraping and normalization from heterogeneous sources (MLS, portals, land registry)
- Apache Kafka: topic design for real estate events
- Apache Flink: stream processing for market indicators
- TimescaleDB: time-series optimization for historical prices
- Indicator calculation: Price Index, Days on Market, Price Reduction Rate
- Real-time alerting: notifications when market opportunities emerge
- Analytics dashboards with Apache Superset and Grafana
Real Estate Market Data Sources
A real estate market intelligence pipeline must integrate highly heterogeneous sources with completely different formats, quality levels, and update frequencies.
Primary Data Sources
- MLS Feed (RESO Web API): high-quality structured data, RESO/OData standard, 15-minute updates
- Real estate portals (Zillow, Rightmove, Idealista): scraping with rate limit and ToS compliance
- Land Registry / Cadastre: recorded transactions, assessed values, mortgage data
- Building permits: leading indicator of neighborhood growth/decline
- Demographic data (Census Bureau, Eurostat): population, income, migration dynamics
- OpenStreetMap POIs: services, transport, schools for geospatial enrichment
- Central bank rates (Fed/ECB): direct impact on mortgage market and property purchases
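Portal scraping has to respect per-source rate limits. A common way to enforce them client-side is a token bucket; below is a minimal, dependency-free sketch (the capacity and refill numbers are illustrative, not any portal's actual limits):

```typescript
// Minimal token-bucket rate limiter for polite scraping (illustrative values).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,      // max burst size
    private refillPerSec: number,  // sustained requests per second
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a request may be sent at time `now` (ms since epoch).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Injecting the clock (`now`) keeps the limiter deterministic and testable; in production you would simply call `tryAcquire()` before each portal request and back off when it returns false.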
Data Model for Real Estate Events
// Event types for the real estate market
type PropertyEventType =
  | 'listing.created'
  | 'listing.updated'
  | 'listing.price_changed'
  | 'listing.status_changed'
  | 'listing.removed'
  | 'transaction.recorded'
  | 'appraisal.completed'
  | 'permit.issued';

interface PropertyEvent {
  id: string;
  type: PropertyEventType;
  sourceId: string;
  sourceName: string; // 'mls_london', 'rightmove', 'land_registry'
  timestamp: string;
  propertyId?: string;
  payload: PropertyListingPayload | TransactionPayload;
  metadata: {
    ingestTimestamp: string;
    schemaVersion: string;
    quality: 'high' | 'medium' | 'low';
  };
}

interface PropertyListingPayload {
  externalId: string;
  title: string;
  price: number;
  previousPrice?: number;
  currency: string;
  propertyType: string;
  squareMeters: number;
  rooms: number;
  location: {
    lat: number;
    lon: number;
    address: string;
    city: string;
    neighborhood?: string;
    h3Cell?: string;
  };
  listingType: 'sale' | 'rent';
  status: 'active' | 'sold' | 'rented' | 'withdrawn';
  listedAt: string;
  daysOnMarket?: number;
}
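Each source adapter is responsible for mapping its raw records onto this model. The sketch below normalizes a hypothetical portal record into the payload shape above; the raw field names (`displayPrice`, `firstListedDate`, etc.) are illustrative, not any real portal's API:

```typescript
// Illustrative raw record shape from a portal scraper (hypothetical fields).
interface RawPortalListing {
  id: string;
  displayPrice: string;    // e.g. "£450,000"
  sizeSqm: number;
  bedrooms: number;
  latitude: number;
  longitude: number;
  addressLine: string;
  town: string;
  firstListedDate: string; // ISO date
}

// Produces an object conforming to PropertyListingPayload.
function normalizePortalListing(raw: RawPortalListing) {
  // Strip currency symbols and thousands separators before parsing.
  const price = Number(raw.displayPrice.replace(/[^0-9.]/g, ''));
  const listedAt = new Date(raw.firstListedDate);
  const daysOnMarket = Math.floor((Date.now() - listedAt.getTime()) / 86_400_000);
  return {
    externalId: raw.id,
    title: raw.addressLine,
    price,
    currency: 'GBP',
    propertyType: 'unknown', // portals rarely expose a clean taxonomy; classify downstream
    squareMeters: raw.sizeSqm,
    rooms: raw.bedrooms,
    location: {
      lat: raw.latitude,
      lon: raw.longitude,
      address: raw.addressLine,
      city: raw.town,
    },
    listingType: 'sale' as const,
    status: 'active' as const,
    listedAt: listedAt.toISOString(),
    daysOnMarket,
  };
}
```

Real adapters additionally handle missing fields, currency detection, and the `quality` grade in the event metadata; this version shows only the happy path.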
Kafka: Topic Design and Partitioning
Kafka topic design is critical for performance. Keying events by geographic area keeps every event for a given city, and therefore for any given property, on the same partition: consumers can parallelize across cities while per-property event order is preserved.
import { Kafka, Partitioners } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'proptech-ingestion',
  brokers: process.env['KAFKA_BROKERS']!.split(','),
  ssl: true,
});

const TOPICS = [
  {
    topic: 'property.events.raw',
    numPartitions: 50,
    replicationFactor: 3,
    configEntries: [
      { name: 'retention.ms', value: String(7 * 24 * 60 * 60 * 1000) }, // 7 days
      { name: 'compression.type', value: 'lz4' },
    ],
  },
  {
    topic: 'property.events.enriched',
    numPartitions: 50,
    replicationFactor: 3,
    configEntries: [
      { name: 'retention.ms', value: String(30 * 24 * 60 * 60 * 1000) }, // 30 days
      { name: 'compression.type', value: 'zstd' },
    ],
  },
  {
    topic: 'market.indicators',
    numPartitions: 10,
    replicationFactor: 3,
    configEntries: [
      { name: 'cleanup.policy', value: 'compact' },
      { name: 'retention.ms', value: String(-1) }, // indefinite (compacted)
    ],
  },
];
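The compacted `market.indicators` topic keeps only the most recent record per key, so a consumer can rebuild the current indicator state for every area just by replaying it. Conceptually, compaction reduces the log to a latest-value map; the sketch below models that semantics in memory (it is not how Kafka implements compaction internally):

```typescript
interface IndicatorRecord {
  key: string;       // e.g. 'london:price_index'
  value: number;
  timestamp: number; // ms since epoch
}

// Replaying a compacted topic yields at most one (latest) record per key.
function compactLog(records: IndicatorRecord[]): Map<string, IndicatorRecord> {
  const latest = new Map<string, IndicatorRecord>();
  for (const r of records) {
    const prev = latest.get(r.key);
    if (!prev || r.timestamp >= prev.timestamp) latest.set(r.key, r);
  }
  return latest;
}
```

This is why `retention.ms = -1` is safe on that topic: size is bounded by the number of distinct keys (area × indicator), not by event volume.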
export class PropertyEventProducer {
  private producer = kafka.producer({
    createPartitioner: Partitioners.DefaultPartitioner,
  });

  async connect(): Promise<void> {
    // kafkajs requires an explicit connect before the first send
    await this.producer.connect();
  }

  async sendEvent(event: PropertyEvent): Promise<void> {
    await this.producer.send({
      topic: 'property.events.raw',
      messages: [{
        // Key = city code: guarantees event order for a city on the same partition
        key: this.getCityCode(event),
        value: JSON.stringify(event),
        headers: {
          'event-type': event.type,
          'source': event.sourceName,
          'schema-version': '1.0',
        },
        timestamp: String(Date.now()),
      }],
    });
  }

  private getCityCode(event: PropertyEvent): string {
    const payload = event.payload as PropertyListingPayload;
    return payload.location?.city?.toLowerCase().replace(/\s+/g, '-') ?? 'unknown';
  }
}
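Between `property.events.raw` and `property.events.enriched` sits a consumer that derives the fields the downstream Flink jobs expect: price per square meter, price-change deltas, and geospatial attributes (the H3 cell assignment would typically use a library such as h3-js and is omitted here). A minimal, dependency-free sketch of the derivation step:

```typescript
interface RawListingFields {
  price: number;
  previousPrice?: number;
  squareMeters: number;
}

interface EnrichedFields {
  pricePerSqm: number;
  priceChangePct: number | null; // null when no previous price is known
}

function deriveEnrichedFields(l: RawListingFields): EnrichedFields {
  // Guard against zero/missing surface area to avoid Infinity in downstream averages.
  const pricePerSqm = l.squareMeters > 0 ? l.price / l.squareMeters : NaN;
  const priceChangePct =
    l.previousPrice && l.previousPrice > 0
      ? ((l.price - l.previousPrice) / l.previousPrice) * 100
      : null;
  return { pricePerSqm, priceChangePct };
}
```

A negative `priceChangePct` flags a price reduction, which feeds both the `listing.price_changed` events and the Price Reduction Rate indicator.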
Apache Flink: Real-Time Market Indicators
Apache Flink is a powerful stream-processing framework well suited to consuming real estate event streams and computing market indicators over sliding time windows.
# PyFlink: Price Index calculation per geographic area in real time
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
tenv = StreamTableEnvironment.create(env, environment_settings=settings)

# Source definition
tenv.execute_sql("""
    CREATE TABLE property_events (
        event_id STRING,
        event_type STRING,
        city STRING,
        h3_cell STRING,
        price DOUBLE,
        square_meters DOUBLE,
        price_per_sqm AS price / square_meters,
        listing_type STRING,
        property_type STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'property.events.enriched',
        'properties.bootstrap.servers' = 'kafka-1:9092',  -- placeholder: point at your brokers
        'properties.group.id' = 'market-indicators',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")
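Before wiring these indicators into Flink windows, it helps to pin down their definitions. Two of them, Price Reduction Rate and median price per square meter, can be prototyped as plain functions over an in-memory batch of listing snapshots (window assignment, which Flink handles for us, is omitted):

```typescript
interface ListingSnapshot {
  price: number;
  squareMeters: number;
  hadPriceReduction: boolean; // at least one price cut since listing
}

// Share of active listings that have cut their asking price at least once.
function priceReductionRate(listings: ListingSnapshot[]): number {
  if (listings.length === 0) return 0;
  const reduced = listings.filter((l) => l.hadPriceReduction).length;
  return reduced / listings.length;
}

// Median asking price per square meter across the window
// (median is more robust than mean against luxury-listing outliers).
function medianPricePerSqm(listings: ListingSnapshot[]): number {
  const values = listings
    .filter((l) => l.squareMeters > 0)
    .map((l) => l.price / l.squareMeters)
    .sort((a, b) => a - b);
  if (values.length === 0) return NaN;
  const mid = Math.floor(values.length / 2);
  return values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}
```

In the pipeline proper, the same aggregations run as Flink SQL over hopping windows per city or H3 cell, with the results published to the compacted `market.indicators` topic.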