Scalable Real Estate Platform Architecture
Modern real estate platforms like Zillow, Idealista, Immobiliare.it, and Rightmove manage millions of listings, billions of monthly searches, and real-time data streams from agencies, MLS (Multiple Listing Services), and private owners. Designing a system capable of sustaining this load requires precise architectural decisions at every level of the stack: from data modeling to geospatial search, from the media pipeline to the real-time notification system.
In this article, we will build the complete architecture of a scalable PropTech platform, analyzing each component with code examples in TypeScript and Python, technology comparisons, and production-proven design patterns.
What You Will Learn
- Microservices architecture for real estate platforms with domain-driven decomposition
- Complete data model: properties, listings, agents, transactions, media
- Advanced geospatial search with Elasticsearch, PostGIS, and H3
- Listing ingestion pipeline from heterogeneous sources (MLS/RESO Web API, scraping, XML feeds)
- Media pipeline: image processing, floor plans, 3D virtual tours
- Real-time features: notifications, messaging, price alerts
- Performance strategies at scale: caching, CDN, sharding, CQRS
- Compliance: GDPR, fair housing, regional regulations
The Real Estate Platform Landscape
Before designing the architecture, it is essential to understand the competitive landscape and business models that drive technical decisions. Each platform has a unique feature mix that directly influences infrastructure choices.
| Platform | Market | Inventory (Listings or Properties) | Distinctive Feature | Known Stack |
|---|---|---|---|---|
| Zillow | USA | ~135M properties | Zestimate (ML valuation) | Java, Kafka, Elasticsearch |
| Idealista | EU (ES, IT, PT) | ~1.8M listings | Interactive map search | Java, Solr, PostgreSQL |
| Immobiliare.it | Italy | ~1.2M listings | Automated valuation, heatmap | PHP/Go, Elasticsearch |
| Rightmove | UK | ~1M listings | Draw-a-search (area drawing) | .NET, SQL Server, Azure |
| Redfin | USA | ~100M properties | Integrated agents, 3D tours | Java, React, Kafka |
Monetization Models and Technical Impact
The business model directly influences architecture. A freemium platform with sponsored listings requires an ad-serving engine, A/B testing, and impression tracking. A lead-based model requires intelligent contact routing to agents and integrated CRM. A transactional model (iBuying) implies ML valuation pipelines, financial management, and advanced regulatory compliance.
- Sponsored listings: requires an ad ranking engine, bidding system, and ROI analytics
- Agency subscriptions: tier management, recurring billing, agent dashboards
- Lead generation: lead scoring, intelligent routing, attribution tracking
- Transactional (iBuying): ML valuation models, offer management, legal pipeline
- SaaS for agencies: multi-tenancy, white-label, CRM integration APIs
High-Level Architecture
A scalable real estate platform adopts a domain-oriented microservices architecture, where each service owns its database and communicates through asynchronous events. This approach enables independent scaling of high-load components (typically search and public APIs) without impacting less-stressed services.
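To make the asynchronous, event-driven communication more concrete, here is a minimal in-process sketch. The `ListingPublished` event shape and the `InMemoryEventBus` class are illustrative assumptions standing in for a real broker such as Kafka or NATS; they are not taken from any specific platform.

```typescript
// Hypothetical domain event; the field names are illustrative assumptions.
interface ListingPublished {
  readonly type: 'listing.published';
  readonly listingId: string;
  readonly price: number;
  readonly occurredAt: string; // ISO-8601 timestamp
}

type Handler<E> = (event: E) => void;

// In-memory stand-in for a broker topic: every subscriber receives every event.
class InMemoryEventBus<E extends { type: string }> {
  private readonly handlers = new Map<string, Handler<E>[]>();

  subscribe(type: E['type'], handler: Handler<E>): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  publish(event: E): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

// Two independent consumers react to the same event without knowing each other.
const bus = new InMemoryEventBus<ListingPublished>();
const indexedIds: string[] = [];
const alertsSent: string[] = [];

bus.subscribe('listing.published', (e) => { indexedIds.push(e.listingId); });
bus.subscribe('listing.published', (e) => { alertsSent.push(`alert:${e.listingId}`); });

bus.publish({
  type: 'listing.published',
  listingId: 'lst-1',
  price: 250000,
  occurredAt: new Date().toISOString(),
});
```

The publisher (here playing the role of the Listing Service) never calls the search or notification code directly, which is what allows those consumers to be scaled or replaced independently.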
Guiding Principle: Strangler Fig Pattern
Most real estate platforms start as monoliths. Migration to microservices happens gradually through the Strangler Fig Pattern: high-load services are extracted first (search, media), followed progressively by the rest, while keeping the monolith operational throughout the entire transition.
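The routing decision at the heart of the Strangler Fig Pattern can be sketched in a few lines. The path prefixes and upstream names below are hypothetical; in practice this logic usually lives in the API gateway configuration rather than in application code.

```typescript
// Hypothetical upstream targets; the names are placeholders.
type Upstream = 'search-service' | 'media-service' | 'monolith';

interface Route {
  readonly prefix: string;
  readonly upstream: Upstream;
}

// Paths already extracted from the monolith. This list grows as migration proceeds.
const extractedRoutes: ReadonlyArray<Route> = [
  { prefix: '/api/search', upstream: 'search-service' },
  { prefix: '/api/media', upstream: 'media-service' },
];

// Longest-prefix match; anything not extracted yet falls through to the monolith.
function resolveUpstream(
  path: string,
  routes: ReadonlyArray<Route> = extractedRoutes,
): Upstream {
  const match = routes
    .filter((r) => path === r.prefix || path.indexOf(`${r.prefix}/`) === 0)
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match ? match.upstream : 'monolith';
}
```

For example, `resolveUpstream('/api/search/listings')` resolves to the extracted search service, while `resolveUpstream('/api/listings/42')` still reaches the monolith until that route is added to the list.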
Microservices Decomposition
| Service | Responsibility | Database | Pattern |
|---|---|---|---|
| Listing Service | Listing CRUD, validation, publication workflow | PostgreSQL + PostGIS | CQRS, Event Sourcing |
| Search Service | Full-text search, filters, geospatial, faceted | Elasticsearch/OpenSearch | Read Model (CQRS) |
| User Service | Authentication, profiles, preferences, saved | PostgreSQL | OAuth 2.0 / OIDC |
| Agent Service | Agent profiles, agencies, ratings, availability | PostgreSQL | Domain Service |
| Media Service | Upload, processing, optimization, CDN | S3 + DynamoDB (metadata) | Async Pipeline |
| Messaging Service | Agent-user chat, info requests, notifications | MongoDB / Cassandra | WebSocket + Event Driven |
| Notification Service | Email, push, SMS, price alerts, new listings | Redis + PostgreSQL | Fan-out, Template Engine |
| Analytics Service | View tracking, heatmaps, agent reports | ClickHouse / BigQuery | Event Streaming |
| Ingestion Service | Import from MLS, XML feeds, external APIs | Staging DB + Queue | ETL Pipeline |
| Valuation Service | ML price estimation, comparables, market trends | Feature Store + Model Registry | ML Pipeline |
Architecture Diagram
                    +------------------+
                    |   API Gateway    |
                    |  (Kong / Envoy)  |
                    +--------+---------+
                             |
        +--------------------+--------------------+
        |                    |                    |
+-------v-------+    +-------v-------+    +-------v-------+
|    Listing    |    |    Search     |    |     User      |
|    Service    |    |    Service    |    |    Service    |
| (PostgreSQL)  |    |(Elasticsearch)|    | (PostgreSQL)  |
+-------+-------+    +---------------+    +-------+-------+
        |                                         |
        |          Events (Kafka / NATS)          |
        +--------------------+--------------------+
        |                    |                    |
+-------v-------+    +-------v-------+    +-------v-------+
|     Media     |    |   Messaging   |    | Notification  |
|    Service    |    |    Service    |    |    Service    |
|  (S3 + CDN)   |    |   (MongoDB)   |    |    (Redis)    |
+---------------+    +---------------+    +---------------+
        |
+-------v-------+    +---------------+
|   Ingestion   |    |   Valuation   |
|    Service    |    |    Service    |
|(ETL Pipeline) |    |  (ML Models)  |
+---------------+    +---------------+
Data Model: The Platform's Core
The data model of a real estate platform is surprisingly complex. A single physical property can have multiple listings over time, be managed by different agents, undergo transactions, and generate hundreds of media assets. Designing these relationships correctly is critical for performance and data consistency.
Core Entity Schema
// Base entity: Physical property
interface Property {
  readonly id: string;
  readonly externalId?: string; // ID from MLS/external source
  readonly source: DataSource;
  readonly propertyType: PropertyType;
  readonly address: Address;
  readonly location: GeoPoint; // lat/lng for spatial search
  readonly h3Index: string; // H3 cell index (resolution 9)
  readonly characteristics: PropertyCharacteristics;
  readonly amenities: ReadonlyArray<Amenity>;
  readonly energyRating?: EnergyRating;
  readonly cadastralRef?: string; // Cadastral reference (IT)
  readonly createdAt: Date;
  readonly updatedAt: Date;
}

type PropertyType =
  | 'apartment' | 'house' | 'villa' | 'penthouse'
  | 'studio' | 'loft' | 'commercial' | 'land'
  | 'garage' | 'office' | 'warehouse';

interface Address {
  readonly street: string;
  readonly streetNumber: string;
  readonly city: string;
  readonly province: string;
  readonly region: string;
  readonly postalCode: string;
  readonly country: string;
  readonly formattedAddress: string;
  readonly neighborhood?: string;
}

interface GeoPoint {
  readonly lat: number;
  readonly lng: number;
}

interface PropertyCharacteristics {
  readonly sqMeters: number;
  readonly rooms: number;
  readonly bedrooms: number;
  readonly bathrooms: number;
  readonly floor?: number;
  readonly totalFloors?: number;
  readonly hasElevator?: boolean;
  readonly hasBalcony?: boolean;
  readonly hasTerrace?: boolean;
  readonly hasGarden?: boolean;
  readonly gardenSqMeters?: number;
  readonly yearBuilt?: number;
  readonly condition: PropertyCondition;
  readonly heatingType?: HeatingType;
  readonly orientation?: ReadonlyArray<Orientation>;
}

// Listing: a sale/rental instance
interface Listing {
  readonly id: string;
  readonly propertyId: string;
  readonly agentId: string;
  readonly agencyId?: string;
  readonly listingType: 'sale' | 'rent' | 'auction';
  readonly status: ListingStatus;
  readonly price: Money;
  readonly pricePerSqMeter: Money;
  readonly condominiumFees?: Money;
  readonly description: LocalizedText;
  readonly media: ReadonlyArray<MediaAsset>;
  readonly virtualTourUrl?: string;
  readonly availableFrom?: Date;
  readonly publishedAt?: Date;
  readonly expiresAt?: Date;
  readonly viewCount: number;
  readonly favoriteCount: number;
  readonly contactCount: number;
}

interface Money {
  readonly amount: number;
  readonly currency: CurrencyCode;
}

type CurrencyCode = 'EUR' | 'USD' | 'GBP' | 'CHF';

interface LocalizedText {
  readonly [locale: string]: string; // { it: "...", en: "..." }
}

type ListingStatus =
  | 'draft' | 'pending_review' | 'active'
  | 'under_offer' | 'sold' | 'rented'
  | 'expired' | 'withdrawn';
Key aspects of this model:
- Property/Listing separation: a physical property exists independently from listings. The same apartment can be put up for sale, then withdrawn, then rented, generating distinct listings linked to the same entity.
- Immutability: all interfaces use readonly to prevent accidental mutations. State transitions produce new objects.
- H3 Index: pre-computed at insertion time to enable efficient geospatial searches via hexagonal aggregation.
- Multi-currency and multi-language: native support through Money and LocalizedText for international markets.
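The idea that state transitions produce new objects can be illustrated with a small sketch. The `ListingState` slice, the `allowedTransitions` table, and the `transition` function are assumptions made for this example; a real platform would define its own workflow rules.

```typescript
type ListingStatus =
  | 'draft' | 'pending_review' | 'active'
  | 'under_offer' | 'sold' | 'rented'
  | 'expired' | 'withdrawn';

// Minimal slice of the Listing entity, just enough to show the pattern.
interface ListingState {
  readonly id: string;
  readonly status: ListingStatus;
  readonly publishedAt?: Date;
}

// Assumed publication workflow (hypothetical rules).
const allowedTransitions: Readonly<Record<ListingStatus, ReadonlyArray<ListingStatus>>> = {
  draft: ['pending_review', 'withdrawn'],
  pending_review: ['active', 'draft', 'withdrawn'],
  active: ['under_offer', 'expired', 'withdrawn'],
  under_offer: ['active', 'sold', 'rented', 'withdrawn'],
  sold: [],
  rented: [],
  expired: ['draft'],
  withdrawn: ['draft'],
};

// Returns a new object; the input listing is never mutated.
function transition(
  listing: ListingState,
  next: ListingStatus,
  now: Date = new Date(),
): ListingState {
  if (allowedTransitions[listing.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${listing.status} -> ${next}`);
  }
  // Stamp the publication time the first time the listing goes live.
  const publishedAt = next === 'active' ? listing.publishedAt ?? now : listing.publishedAt;
  return { ...listing, status: next, ...(publishedAt ? { publishedAt } : {}) };
}
```

Because every call returns a fresh object, the previous state remains available for auditing or for emitting a before/after event.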
Listing Ingestion Pipeline
The beating heart of a real estate platform is the data ingestion pipeline. Listings arrive from heterogeneous sources: RESO Web API feeds (US/international standard), proprietary XML feeds, agency APIs, manual uploads, and partner portal scraping. Each source has its own format, update frequency, and data quality level.
RESO Web API: The Industry Standard
The RESO (Real Estate Standards Organization) Web API is the modern standard for MLS data exchange, built on REST/OData with JSON payloads. It replaces the legacy RETS (now deprecated). Data Dictionary 2.x defines standard field names (ListingId, StandardStatus, ListPrice, LivingArea), ensuring interoperability between systems.
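Since the Web API is built on OData, an incremental sync typically comes down to a filtered query on the Property resource. The sketch below only builds the request URL. The base URL is a placeholder, and the helper name `buildResoSyncUrl` is hypothetical; `$filter`, `$select`, `$orderby`, and `$top` are standard OData query options, and `ModificationTimestamp` is a Data Dictionary field.

```typescript
// Builds an incremental-sync query URL for a RESO Web API Property resource.
// Only records modified after `since` are requested.
function buildResoSyncUrl(baseUrl: string, since: Date, pageSize = 200): string {
  const options: ReadonlyArray<[string, string]> = [
    ['$filter', `ModificationTimestamp gt ${since.toISOString()}`],
    ['$select', 'ListingId,StandardStatus,ListPrice,LivingArea,ModificationTimestamp'],
    ['$orderby', 'ModificationTimestamp asc'],
    ['$top', String(pageSize)],
  ];
  // Keep the option names literal and percent-encode only the values.
  const query = options
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');
  return `${baseUrl.replace(/\/+$/, '')}/Property?${query}`;
}

// Placeholder host, not a real MLS endpoint.
const resoSyncUrl = buildResoSyncUrl(
  'https://api.example-mls.test/reso/odata',
  new Date('2024-01-01T00:00:00Z'),
);
```

Authentication, server-side page limits, and paging through subsequent result pages vary by MLS provider and are deliberately left out of this sketch.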
ETL Pipeline Architecture
import { EventEmitter } from 'events';

// Generic interface for data sources
interface ListingSource {
  readonly sourceId: string;
  readonly sourceType: 'reso_api' | 'xml_feed' | 'manual' | 'scraper';
  fetch(since: Date): Promise<ReadonlyArray<RawListing>>;
}

// Raw data from any source
interface RawListing {
  readonly externalId: string;
  readonly source: string;
  readonly rawData: Record<string, unknown>;
  readonly fetchedAt: Date;
}

// Ingestion pipeline with clear steps
class ListingIngestionPipeline {
  private readonly events = new EventEmitter();

  constructor(
    private readonly sources: ReadonlyArray<ListingSource>,
    private readonly normalizer: ListingNormalizer,
    private readonly validator: ListingValidator,
    private readonly deduplicator: ListingDeduplicator,
    private readonly enricher: ListingEnricher,
    private readonly repository: ListingRepository,
    private readonly searchIndex: SearchIndexer,
  ) {}

  async ingest(source: ListingSource): Promise<IngestionResult> {
    const startTime = Date.now();
    const result: IngestionResult = {
      sourceId: source.sourceId,
      processed: 0,
      created: 0,
      updated: 0,
      skipped: 0,
      errors: [],
    };

    // Step 1: Fetch raw data
    const rawListings = await source.fetch(
      await this.getLastSyncTime(source.sourceId)
    );

    for (const raw of rawListings) {
      try {
        // Step 2: Normalization (source format -> internal format)
        const normalized = this.normalizer.normalize(raw);

        // Step 3: Validation (required fields, ranges, format)
        const validation = this.validator.validate(normalized);
        if (!validation.isValid) {
          result.skipped++;
          result.errors.push({
            externalId: raw.externalId,
            errors: validation.errors,
          });
          continue;
        }

        // Step 4: Deduplication (match by address, coordinates, external ID)
        const existingId = await this.deduplicator.findDuplicate(normalized);

        // Step 5: Enrichment (geocoding, H3 index, neighborhood)
        const enriched = await this.enricher.enrich(normalized);

        // Step 6: Persistence
        if (existingId) {
          await this.repository.update(existingId, enriched);
          result.updated++;
        } else {
          await this.repository.create(enriched);
          result.created++;
        }

        // Step 7: Search indexing (async, handled by an event listener)
        this.events.emit('listing:upserted', enriched);
        result.processed++;
      } catch (error) {
        // A single bad record must not abort the whole batch
        result.errors.push({
          externalId: raw.externalId,
          errors: [String(error)],
        });
      }
    }

    // Update last sync timestamp. Use the time captured before the fetch so
    // that records modified while this run was in progress are not missed.
    await this.updateLastSyncTime(source.sourceId, new Date(startTime));

    this.events.emit('ingestion:completed', {
      ...result,
      durationMs: Date.now() - startTime,
    });

    return result;
  }

  private async getLastSyncTime(sourceId: string): Promise<Date> {
    // Retrieve last sync timestamp from DB
    return new Date(Date.now() - 24 * 60 * 60 * 1000); // fallback: 24h ago
  }

  private async updateLastSyncTime(sourceId: string, time: Date): Promise<void> {
    // Persist timestamp for next execution
  }
}

interface IngestionResult {
  readonly sourceId: string;
  processed: number;
  created: number;
  updated: number;
  skipped: number;
  errors: Array<{ externalId: string; errors: string[] }>;
}
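The deduplication step (Step 4) matches by external ID, address, and coordinates. One possible way to implement that matching is sketched below. The `DedupCandidate` shape, the helper names, and the four-decimal coordinate tolerance are all assumptions made for illustration. The address model in this article has no unit or apartment number, so the sketch cannot tell apart two flats at the same street number; a production matcher would need that extra signal.

```typescript
interface DedupCandidate {
  readonly externalId: string;
  readonly source: string;
  readonly street: string;
  readonly streetNumber: string;
  readonly postalCode: string;
  readonly lat?: number;
  readonly lng?: number;
}

// Lowercase, strip accents and punctuation, collapse whitespace.
function normalizeText(value: string): string {
  return value
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

function addressKey(c: DedupCandidate): string {
  return [c.street, c.streetNumber, c.postalCode].map(normalizeText).join('|');
}

// Four decimal places is roughly 11 m of latitude (assumed tolerance).
function geoKey(c: DedupCandidate): string | undefined {
  if (c.lat === undefined || c.lng === undefined) return undefined;
  return `${c.lat.toFixed(4)},${c.lng.toFixed(4)}`;
}

function isLikelyDuplicate(a: DedupCandidate, b: DedupCandidate): boolean {
  // Same record re-delivered by the same source.
  if (a.source === b.source && a.externalId === b.externalId) return true;
  // Across sources: the address must match, and coordinates must agree when both exist.
  if (addressKey(a) !== addressKey(b)) return false;
  const ga = geoKey(a);
  const gb = geoKey(b);
  return ga === undefined || gb === undefined || ga === gb;
}
```

Rounding coordinates to a grid is simple but has edge effects near cell boundaries; comparing actual distances against a threshold is a common refinement.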
Normalization: From RESO to Internal Format
Normalization is the most critical step in the pipeline. Every data source has a different format, and normalization must map heterogeneous fields into a uniform model. Here is an example of a normalizer for the RESO format:
class ResoListingNormalizer implements ListingNormalizer {
  normalize(raw: RawListing): NormalizedListing {
    const data = raw.rawData as ResoPropertyData;
    return {
      externalId: String(data.ListingId),
      source: raw.source,
      propertyType: this.mapPropertyType(data.PropertyType),
      listingType: this.mapListingType(data.TransactionType),
      status: this.mapStatus(data.StandardStatus),
      price: {
        amount: data.ListPrice,
        currency: data.CurrencyCode ?? 'USD',
      },
      address: {
        street: data.StreetName,
        streetNumber: data.StreetNumber,
        city: data.City,
        province: data.StateOrProvince,
        postalCode: data.PostalCode,
        country: data.Country ?? 'US',
        formattedAddress: this.buildFormattedAddress(data),
      },
      // Compare against null/undefined explicitly: 0 is a valid coordinate
      location: data.Latitude != null && data.Longitude != null
        ? { lat: data.Latitude, lng: data.Longitude }
        : undefined,
      characteristics: {
        sqMeters: this.sqFeetToSqMeters(data.LivingArea),
        rooms: data.RoomsTotal ?? 0,
        bedrooms: data.BedroomsTotal ?? 0,
        bathrooms: data.BathroomsTotalInteger ?? 0,
        yearBuilt: data.YearBuilt,
      },
      description: {
        en: data.PublicRemarks ?? '',
      },
      photos: (data.Media ?? []).map((m: ResoMedia) => ({
        url: m.MediaURL,
        order: m.Order,
        caption: m.ShortDescription,
      })),
      rawData: raw.rawData,
      fetchedAt: raw.fetchedAt,
    };
  }

  private mapPropertyType(resoType: string): PropertyType {
    const mapping: Record<string, PropertyType> = {
      'Residential': 'house',
      'Condominium': 'apartment',
      'Townhouse': 'house',
      'Land': 'land',
      'Commercial': 'commercial',
    };
    // Unknown source types fall back to 'apartment'
    return mapping[resoType] ?? 'apartment';
  }

  private mapStatus(resoStatus: string): ListingStatus {
    const mapping: Record<string, ListingStatus> = {
      'Active': 'active',
      'Pending': 'under_offer',
      'Closed': 'sold',
      'Withdrawn': 'withdrawn',
      'Expired': 'expired',
    };
    // Unknown statuses fall back to 'draft' so they are never published by accident
    return mapping[resoStatus] ?? 'draft';
  }

  // 1 sq ft = 0.092903 sq m; result rounded to two decimals, 0 when the area is missing
  private sqFeetToSqMeters(sqFeet?: number): number {
    return sqFeet ? Math.round(sqFeet * 0.092903 * 100) / 100 : 0;
  }

  private mapListingType(txType: string): 'sale' | 'rent' {
    return txType === 'Lease' ? 'rent' : 'sale';
  }

  private buildFormattedAddress(data: ResoPropertyData): string {
    return [data.StreetNumber, data.StreetName, data.City, data.StateOrProvince]
      .filter(Boolean)
      .join(', ');
  }
}
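Once a record is normalized, the validation step of the pipeline (required fields, ranges, format) decides whether it continues or is skipped. A minimal sketch of such a validator follows. The `ListingDraft` shape is a reduced stand-in for the normalized listing, and the thresholds are assumptions chosen for illustration only.

```typescript
// Reduced stand-in for a normalized listing (hypothetical shape).
interface ListingDraft {
  readonly externalId: string;
  readonly price: { readonly amount: number; readonly currency: string };
  readonly sqMeters: number;
  readonly bedrooms: number;
  readonly location?: { readonly lat: number; readonly lng: number };
}

interface ValidationResult {
  readonly isValid: boolean;
  readonly errors: ReadonlyArray<string>;
}

const MAX_SQ_METERS = 100000; // assumed sanity limit

function validateListing(l: ListingDraft): ValidationResult {
  const errors: string[] = [];

  if (l.externalId.trim() === '') errors.push('externalId is required');

  if (!Number.isFinite(l.price.amount) || l.price.amount <= 0) {
    errors.push('price.amount must be a positive number');
  }
  if (!/^[A-Z]{3}$/.test(l.price.currency)) {
    errors.push('price.currency must be a 3-letter uppercase code');
  }
  // 0 is tolerated because a normalizer may use it when the source omits the area.
  if (!Number.isFinite(l.sqMeters) || l.sqMeters < 0 || l.sqMeters > MAX_SQ_METERS) {
    errors.push('sqMeters is out of range');
  }
  if (!Number.isInteger(l.bedrooms) || l.bedrooms < 0) {
    errors.push('bedrooms must be a non-negative integer');
  }
  if (l.location) {
    const { lat, lng } = l.location;
    if (lat < -90 || lat > 90 || lng < -180 || lng > 180) {
      errors.push('location is out of range');
    }
  }

  return { isValid: errors.length === 0, errors };
}
```

Collecting every error instead of failing on the first one gives agencies and feed providers a complete report per rejected record.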
Search Architecture: Elasticsearch and Geospatial
Search is the most critical feature of any real estate platform. Users expect instant results, combinable filters, relevance/price/distance sorting, and map search with real-time updates. Elasticsearch (or its fork OpenSearch) is the dominant choice in the industry thanks to native support for full-text, geospatial, and faceted search.
Elasticsearch Listing Index
{
  "mappings": {
    "properties": {
      "listingId": { "type": "keyword" },
      "propertyType": { "type": "keyword" },
      "listingType": { "type": "keyword" },
      "status": { "type": "keyword" },
      "price": { "type": "long" },
      "pricePerSqm": { "type": "float" },
      "currency": { "type": "keyword" },
      "location": { "type": "geo_point" },
      "geoShape": { "type": "geo_shape" },
      "h3Index": { "type": "keyword" },
      "h3Res7": { "type": "keyword" },
      "city": { "type": "keyword" },
      "neighborhood": { "type": "keyword" },
      "province": { "type": "keyword" },
      "postalCode": { "type": "keyword" },
      "sqMeters": { "type": "integer" },
      "rooms": { "type": "integer" },
      "bedrooms": { "type": "integer" },
      "bathrooms": { "type": "integer" },
      "floor": { "type": "integer" },
      "yearBuilt": { "type": "integer" },
      "hasElevator": { "type": "boolean" },
      "hasBalcony": { "type": "boolean" },
      "hasGarden": { "type": "boolean" },
      "energyRating": { "type": "keyword" },
      "amenities": { "type": "keyword" },
      "description": { "type": "text", "analyzer": "multilingual_analyzer" },
      "title": {
        "type": "text",
        "analyzer": "multilingual_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "agentId": { "type": "keyword" },
      "agencyId": { "type": "keyword" },
      "publishedAt": { "type": "date" },
      "updatedAt": { "type": "date" },
      "viewCount": { "type": "integer" },
      "photoCount": { "type": "integer" },
      "hasVirtualTour": { "type": "boolean" }
    }
  },
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "multilingual_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "stop_it", "stop_en"]
        }
      },
      "filter": {
        "stop_it": { "type": "stop", "stopwords": "_italian_" },
        "stop_en": { "type": "stop", "stopwords": "_english_" }
      }
    }
  }
}
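On the write side, each listing has to be projected into a document that matches this mapping. The sketch below covers a subset of the fields. The `IndexableListing` input shape and the `toSearchDocument` name are hypothetical, and how the document is actually sent to the cluster is left out.

```typescript
// Hypothetical flattened view of a listing joined with its property.
interface IndexableListing {
  readonly id: string;
  readonly listingType: 'sale' | 'rent' | 'auction';
  readonly status: string;
  readonly price: { readonly amount: number; readonly currency: string };
  readonly sqMeters: number;
  readonly location: { readonly lat: number; readonly lng: number };
  readonly mediaCount: number;
  readonly virtualTourUrl?: string;
  readonly publishedAt?: Date;
}

// Subset of the index mapping, expressed as a type.
interface ListingSearchDocument {
  readonly listingId: string;
  readonly listingType: string;
  readonly status: string;
  readonly price: number;
  readonly pricePerSqm: number | null;
  readonly currency: string;
  readonly location: { readonly lat: number; readonly lon: number };
  readonly sqMeters: number;
  readonly photoCount: number;
  readonly hasVirtualTour: boolean;
  readonly publishedAt: string | null;
}

function toSearchDocument(l: IndexableListing): ListingSearchDocument {
  return {
    listingId: l.id,
    listingType: l.listingType,
    status: l.status,
    price: Math.round(l.price.amount), // mapped as long
    pricePerSqm: l.sqMeters > 0 ? l.price.amount / l.sqMeters : null,
    currency: l.price.currency,
    // The geo_point object form expects "lon", while the domain model uses "lng".
    location: { lat: l.location.lat, lon: l.location.lng },
    sqMeters: Math.round(l.sqMeters), // mapped as integer
    photoCount: l.mediaCount,
    hasVirtualTour: Boolean(l.virtualTourUrl),
    publishedAt: l.publishedAt ? l.publishedAt.toISOString() : null,
  };
}
```

Keeping this projection in one pure function makes it straightforward to rebuild the whole index from the source of truth whenever the mapping changes.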
Search API with Geospatial Filters
The search query combines boolean filters, numeric ranges, full-text search, and geospatial constraints. We support three geographic search modes: geo_distance (radius from a point), geo_bounding_box (rectangle on the map), and geo_shape (user-drawn polygon, like Rightmove's "Draw-a-search").
interface PropertySearchParams {
  readonly query?: string;
  readonly listingType: 'sale' | 'rent';
  readonly propertyTypes?: ReadonlyArray<PropertyType>;
  readonly priceMin?: number;
  readonly priceMax?: number;
  readonly sqMetersMin?: number;
  readonly sqMetersMax?: number;
  readonly bedroomsMin?: number;
  readonly bathroomsMin?: number;
  readonly amenities?: ReadonlyArray<string>;
  readonly geoFilter?: GeoFilter;
  readonly sortBy?: 'relevance' | 'price_asc' | 'price_desc' | 'newest' | 'distance';
  readonly page?: number;
  readonly pageSize?: number;
}

type GeoFilter =
  | { type: 'radius'; center: GeoPoint; radiusKm: number }
  | { type: 'bbox'; topLeft: GeoPoint; bottomRight: GeoPoint }
  | { type: 'polygon'; points: ReadonlyArray<GeoPoint> };

class PropertySearchService {
  constructor(private readonly esClient: ElasticsearchClient) {}

  async search(params: PropertySearchParams): Promise<SearchResult> {
    // "must" clauses contribute to relevance scoring; "filter" clauses do not
    const must: Array<Record<string, unknown>> = [];
    const filter: Array<Record<string, unknown>> = [];

    // Required filter: listing type and active status
    filter.push({ term: { listingType: params.listingType } });
    filter.push({ term: { status: 'active' } });

    // Full-text search (title + description)
    if (params.query) {
      must.push({
        multi_match: {
          query: params.query,
          fields: ['title^3', 'description', 'city^2', 'neighborhood^2'],
          type: 'best_fields',
          fuzziness: 'AUTO',
        },
      });
    }

    // Price range filters (explicit undefined checks so that 0 is a valid bound)
    if (params.priceMin !== undefined || params.priceMax !== undefined) {
      filter.push({
        range: {
          price: {
            ...(params.priceMin !== undefined ? { gte: params.priceMin } : {}),
            ...(params.priceMax !== undefined ? { lte: params.priceMax } : {}),
          },
        },
      });
    }

    // Area filters
    if (params.sqMetersMin !== undefined || params.sqMetersMax !== undefined) {
      filter.push({
        range: {
          sqMeters: {
            ...(params.sqMetersMin !== undefined ? { gte: params.sqMetersMin } : {}),
            ...(params.sqMetersMax !== undefined ? { lte: params.sqMetersMax } : {}),
          },
        },
      });
    }

    // Property type
    if (params.propertyTypes?.length) {
      filter.push({ terms: { propertyType: params.propertyTypes } });
    }

    // Minimum bedrooms
    if (params.bedroomsMin) {
      filter.push({ range: { bedrooms: { gte: params.bedroomsMin } } });
    }

    // Minimum bathrooms
    if (params.bathroomsMin) {
      filter.push({ range: { bathrooms: { gte: params.bathroomsMin } } });
    }

    // Amenities (AND logic)
    if (params.amenities?.length) {
      for (const amenity of params.amenities) {
        filter.push({ term: { amenities: amenity } });
      }
    }

    // Geospatial filter
    if (params.geoFilter) {
      filter.push(this.buildGeoFilter(params.geoFilter));
    }

    const page = params.page ?? 0;
    const pageSize = params.pageSize ?? 20;

    const response = await this.esClient.search({
      index: 'listings',
      body: {
        query: {
          bool: {
            must: must.length ? must : [{ match_all: {} }],
            filter,
          },
        },
        sort: this.buildSort(params.sortBy, params.geoFilter),
        from: page * pageSize,
        size: pageSize,
        aggs: this.buildAggregations(),
      },
    });

    return this.mapResponse(response, page, pageSize);
  }

  private buildGeoFilter(geo: GeoFilter): Record<string, unknown> {
    switch (geo.type) {
      case 'radius':
        return {
          geo_distance: {
distance: `






