What is dbt and Why Analytics Engineering is Revolutionizing the Data Stack
dbt brings software engineering to SQL: versioning with Git, automated testing, auto-generated documentation, and deployment pipelines. Find out what analytics engineering is, what the analytics engineer does, and why dbt has become the standard for data transformation on Snowflake, BigQuery, and Redshift.
The Problem that dbt Solves
Imagine a data warehouse full of SQL queries written by different analysts over the course of three years.
Hundreds of views with names like final_v3_DEFINITIVO_bis, calculation logic duplicated in ten different places, no tests to ensure data quality, and zero documentation about what the column net_revenue_adjusted really means.
This is the starting point for many data teams in 2026. And it is exactly the problem that dbt (data build tool) was designed to solve.
dbt is not a new ETL tool, it is not a pipeline orchestrator, and it is not a data warehouse. It is a transformation tool: it takes data already loaded into the warehouse and turns it into ready-to-consume analytical models, using SQL and software engineering practices.
ELT vs ETL: Why the World Has Changed
For decades the dominant paradigm was ETL: Extract, Transform, Load. Data was extracted from the sources, transformed in an intermediate layer (often on dedicated servers), and then loaded into the data warehouse already in its final form.
The advent of cloud data warehouses such as Snowflake, BigQuery, and Amazon Redshift turned this logic on its head. These systems offer practically unlimited computing power and very low storage costs. It no longer makes sense to transform data before loading it: it is better to load it raw and transform it afterwards, directly in the warehouse. Thus ELT was born.
ELT vs ETL: the Fundamental Difference
- ETL: transformation outside the warehouse (dedicated server, Spark, Talend) — expensive, rigid, difficult to debug
- ELT: Transformation inside the warehouse (native SQL) — scalable, cost-effective, easier to test and document
- dbt is the ELT tool par excellence: it handles the T of ELT with software engineering
What is dbt in Concrete
dbt is a command line (and cloud) tool that allows you to:
- Write SQL transformations as models versioned in Git
- Define dependencies between models automatically with the ref() macro
- Add data quality tests directly in the code
- Generate documentation and a lineage graph automatically
- Run all of this in a CI/CD pipeline
A dbt model is simply a SQL file containing a SELECT. dbt materializes it in the warehouse as a view, a table, or an incremental table, depending on your configuration.
-- models/marts/finance/orders_daily.sql
-- dbt materializes this as a table in the warehouse
SELECT
    DATE_TRUNC('day', created_at) AS order_date,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count,
    AVG(total_amount) AS avg_order_value
FROM {{ ref('stg_orders') }} -- ref() creates the dependency
WHERE status = 'completed'
GROUP BY 1
Note the {{ ref('stg_orders') }}: dbt knows that this model depends on stg_orders, automatically builds the dependency graph (DAG), and guarantees that the models run in the correct order.
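How a model is materialized is controlled per model with a Jinja config block (or per folder in dbt_project.yml). A minimal sketch of the per-model form, reusing the orders_daily example above (the exact column list is illustrative):

```sql
-- models/marts/finance/orders_daily.sql
-- Override the default view materialization for this model.
-- For large, append-only data, materialized='incremental' is another option.
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('day', created_at) AS order_date,
    SUM(total_amount) AS revenue
FROM {{ ref('stg_orders') }}
WHERE status = 'completed'
GROUP BY 1
```

The config block is stripped out at compile time; only the SELECT reaches the warehouse, wrapped in the appropriate CREATE TABLE AS or CREATE VIEW AS statement.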
The Role of the Analytics Engineer
The rise of dbt has created a new professional role: the analytics engineer, a profile halfway between data analyst and data engineer:
- Knows SQL as deeply as an analyst
- Applies software engineering practices (Git, testing, CI/CD) like an engineer
- Builds and maintains the data warehouse transformation layer
- Collaborates with data analysts (who consume the models) and data engineers (who manage the ingestion pipelines)
According to the State of Data Engineering 2025, the analytics engineer role has grown by 340% on LinkedIn in the last three years, making it one of the most sought-after profiles in the data world.
The dbt Ecosystem in 2026
dbt is not just a tool: it has become a complete ecosystem with two main distributions:
dbt Core
The open-source, free version of dbt. It is installed via pip, works from the command line, and integrates with any CI/CD system. It supports all major data warehouses: Snowflake, BigQuery, Redshift, Databricks, DuckDB, PostgreSQL, and many more via adapters.
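In practice, you install the adapter for your warehouse, which pulls in dbt Core as a dependency. A minimal setup sketch (the adapter package name depends on your warehouse, and dbt debug assumes a profiles.yml is already configured):

```shell
# Install the adapter for your warehouse (dbt Core comes in as a dependency)
pip install dbt-duckdb   # or dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-postgres

# Scaffold a new project, then verify the connection defined in profiles.yml
dbt init my_project
dbt debug
```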
dbt Cloud
The managed service from dbt Labs, the company behind dbt. It adds a web IDE, an integrated scheduler, monitoring, dbt Explorer (advanced lineage), and the dbt Semantic Layer for consistent metrics. Available with a free Developer plan and paid team plans starting at $100/month.
The Pillars of the dbt Project
A dbt project has a standard structure that brings order where there was previously chaos:
jaffle_shop/                      # project name
├── dbt_project.yml               # project configuration
├── profiles.yml                  # connection credentials (local)
├── models/
│   ├── staging/                  # models close to the source
│   │   ├── stg_orders.sql
│   │   └── stg_customers.sql
│   └── marts/                    # business-oriented models
│       ├── finance/
│       │   └── orders_daily.sql
│       └── marketing/
│           └── customer_ltv.sql
├── tests/                        # custom SQL tests
├── seeds/                        # static CSVs as reference data
├── macros/                       # reusable SQL macros
└── analyses/                     # ad hoc queries (not materialized)
This tiered structure (staging → intermediate → marts) is a dbt best practice called layered architecture:
- Staging: 1-to-1 with the sources, just renaming and type casting
- Intermediate: complex joins and aggregations between staging models
- Marts: final models ready for consumption by BI and analytics
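A staging model in this architecture does nothing more than select from a raw table, rename columns, and cast types. A minimal sketch, with hypothetical raw column names and assuming the raw orders table has been declared as a dbt source named jaffle_shop:

```sql
-- models/staging/stg_orders.sql
-- 1-to-1 with the source: rename and cast only, no business logic
SELECT
    id                             AS order_id,
    user_id                        AS customer_id,
    order_total                    AS total_amount,
    status,
    CAST(ordered_at AS timestamp)  AS created_at
FROM {{ source('jaffle_shop', 'raw_orders') }}
```

Keeping business logic out of staging means every downstream model starts from the same clean, consistently named columns.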
Testing and Documentation: dbt's Killer Features
Two dbt features that completely transform the data development cycle:
Automatic Testing
You define tests directly in the YAML file that accompanies each model:
# models/staging/schema.yml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'pending', 'cancelled']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
With dbt test, dbt runs these tests as SQL queries in the warehouse and fails the pipeline if something is wrong. Zero Python code, zero external frameworks.
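Beyond these generic YAML tests, the tests/ directory holds custom "singular" tests: plain SQL queries that pass when they return zero rows. A hedged sketch of one, reusing the orders_daily model from earlier (the file name is illustrative):

```sql
-- tests/assert_no_negative_revenue.sql
-- dbt test fails this test if the query returns any rows
SELECT
    order_date,
    revenue
FROM {{ ref('orders_daily') }}
WHERE revenue < 0
```

Singular tests are handy for business rules that do not fit the built-in unique/not_null/accepted_values/relationships checks.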
Auto-Generated Documentation
Add descriptions in the same YAML file:
# models/marts/schema.yml
models:
  - name: orders_daily
    description: "Daily aggregation of completed orders"
    columns:
      - name: order_date
        description: "Order date, truncated to the day"
      - name: revenue
        description: "Total revenue for the day, net of returns"
dbt docs generate && dbt docs serve builds a static site with complete documentation and an interactive DAG showing the dependencies between all models. Every column is documented and traceable back to its source.
dbt in the Modern Data Stack
dbt typically occupies the T layer of the modern ELT stack:
- Ingestion: Airbyte, Fivetran, Stitch or Python scripts load raw data into the warehouse
- Storage: Snowflake, BigQuery, Redshift, Databricks Delta Lake
- Transformation: dbt transforms raw data into analytical models
- BI & Analytics: Looker, Metabase, Tableau or notebooks consume dbt models
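The handoff between the ingestion layer and dbt is formalized through sources: the raw tables that Airbyte or Fivetran load are declared in YAML and referenced with the source() macro, so they appear in the lineage graph too. A minimal sketch, with hypothetical schema and table names:

```yaml
# models/staging/sources.yml
sources:
  - name: jaffle_shop        # logical name used in source()
    schema: raw              # schema where the ingestion tool lands data
    tables:
      - name: raw_orders
      - name: raw_customers
```

A staging model then reads FROM {{ source('jaffle_shop', 'raw_orders') }} instead of hard-coding the schema and table name.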
In 2026, dbt is present in the data stack of over 30,000 companies (dbt Labs data), from startups to giants like GitLab, JetBlue, Condé Nast, and Shopify.
Why Learn dbt in 2026
Concrete reasons to invest time in learning dbt:
- Market demand: “dbt” appears in 45% of data engineer and analytics engineer job postings on LinkedIn in Europe
- Productivity: dbt teams report a 60-70% reduction in time spent debugging data pipelines, thanks to automated tests
- Ecosystem: over 200 dbt packages on dbt Hub, including dbt-utils (the standard utility library) and dbt-expectations for advanced tests
- Observability: native integration with Elementary, re_data, and other data observability tools for quality monitoring in production
dbt Is Not For Everyone
dbt makes sense if you already have a cloud data warehouse and work with SQL as your primary language. It is not a replacement for Spark for massive transformations on unstructured data, nor is it suited to real-time streaming pipelines (for those, use Flink, Kafka Streams, or Spark Structured Streaming). dbt's sweet spot is batch analytics on a SQL warehouse.
Conclusions and Next Steps
dbt has redefined how data teams build and maintain their analytics pipelines, bringing the mature practices of software engineering (testing, versioning, documentation, CI/CD) to a world that had lived without them for decades.
The key takeaway: dbt is not a niche tool for large enterprises. It's accessible, open-source, and even runs on a single laptop with DuckDB as the backend. The learning curve is low for those who know SQL, the reward is high.
In the next article in the series we will set up a dbt Core project from scratch: we will install dbt, write profiles.yml, create our first models, and see the ref() macro in action on a real dataset.







