dbt Core Setup: Project Structure, Connection Profiles and First SQL Transformation
From command line to first transformation: Configure dbt Core with profiles.yml for BigQuery, Snowflake or PostgreSQL, explore the project structure (models, tests, seeds, macros) and discover the ref() macro that creates automatic dependencies between models.
Prerequisites and Installation
dbt Core requires Python 3.8+ and installs via pip. The important thing is to install not only the base package but also the specific adapter for your warehouse; each warehouse has its own adapter, because SQL dialects differ. Installing an adapter pulls in dbt-core as a dependency.
# Install dbt Core with the PostgreSQL adapter (great for starting locally)
pip install dbt-postgres
# For BigQuery
pip install dbt-bigquery
# For Snowflake
pip install dbt-snowflake
# For DuckDB (ideal for local development with no infrastructure)
pip install dbt-duckdb
# Verify the installation
dbt --version
# Core:
# - installed: 1.9.0
# - latest: 1.9.0 - Up to date!
Get started with DuckDB for Local Development
If you don't have immediate access to BigQuery or Snowflake, DuckDB is the quickest way to get started: it is an embedded database that runs entirely on your machine, requires no infrastructure, and dbt-duckdb works anywhere. Perfect for learning at no cost.
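A minimal profile for DuckDB might look like the sketch below (the profile name and database file path are illustrative; dbt-duckdb creates the file on first run):

```yaml
# ~/.dbt/profiles.yml — minimal DuckDB profile (illustrative)
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb   # local database file, created automatically
      threads: 4
```

No host, user or password is needed: DuckDB is just a file on disk.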
Create the First dbt Project
The dbt init command creates the folder structure of a dbt project and guides you through the initial configuration:
dbt init jaffle_shop
# dbt will ask:
# 1. Which database do you want to use? (postgres/bigquery/snowflake/...)
# 2. [For postgres] host, port, user, password, database, schema
# The structure it creates:
jaffle_shop/
├── dbt_project.yml    # main configuration
├── README.md
├── analyses/          # ad hoc queries (not materialized)
├── macros/            # reusable Jinja functions
├── models/
│   └── example/       # example models (safe to delete)
├── seeds/             # static CSVs
├── snapshots/         # snapshots for SCDs
└── tests/             # singular SQL tests
The dbt_project.yml file
The heart of the project configuration is dbt_project.yml. Here you define the project name, the required dbt version, directory paths and the global configuration of the models:
# dbt_project.yml
name: 'jaffle_shop'
version: '1.0.0'

# Minimum dbt version required
require-dbt-version: ">=1.8.0"

# Profile to use (from profiles.yml)
profile: 'jaffle_shop'

# Model directories
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

# Where to write logs and compiled artifacts
target-path: "target"
log-path: "logs"
clean-targets: ["target", "dbt_packages"]

# Model configuration by directory
models:
  jaffle_shop:
    # All models in the project are views by default
    +materialized: view
    staging:
      # Models in staging/ are always views
      +materialized: view
      +schema: staging   # saved in the 'staging' schema
    marts:
      +materialized: table   # marts are tables for performance
      +schema: marts
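One gotcha worth knowing: by default dbt does not use a custom schema verbatim. The built-in generate_schema_name macro concatenates the target schema with the custom one, so with a dev target schema of dbt_dev_federico and +schema: staging, models land in dbt_dev_federico_staging. A simplified sketch of dbt's default logic:

```sql
-- Simplified sketch of dbt's built-in generate_schema_name macro.
-- You can override it in macros/ to change how schemas are named.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

This default keeps every developer's models inside their own personal schema prefix, even when custom schemas are configured.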
The profiles.yml file
The connection credentials go in ~/.dbt/profiles.yml (in the home directory,
never in the repository). Each profile can have multiple targets (dev, staging, prod):
# ~/.dbt/profiles.yml
jaffle_shop:
  target: dev   # default target
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: "{{ env_var('DB_USER') }}"       # use environment variables
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: jaffle_shop_dev
      schema: dbt_dev_federico   # personal schema for development
      threads: 4
    prod:
      type: postgres
      host: "{{ env_var('PROD_DB_HOST') }}"
      port: 5432
      user: "{{ env_var('PROD_DB_USER') }}"
      password: "{{ env_var('PROD_DB_PASSWORD') }}"
      dbname: jaffle_shop_prod
      schema: dbt_prod
      threads: 8
The convention is that each developer uses a personal schema (e.g. dbt_dev_federico) to avoid conflicts during parallel development.
dbt models: The Fundamental Unit
A dbt model is a .sql file in the models/ directory. Its content is a single SELECT — dbt takes care of creating the corresponding view or table in the warehouse.
Let's start with a realistic example. Suppose the warehouse has a raw.orders table with raw data from the application:
-- models/staging/stg_orders.sql
-- Staging model: rename, cast, clean — no business logic
WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}  -- 'source' points to the raw source
),

renamed AS (
    SELECT
        id AS order_id,
        user_id AS customer_id,
        order_date AS created_at,
        status,
        CAST(amount AS DECIMAL(10,2)) AS total_amount,
        LOWER(payment_method) AS payment_method,
        _loaded_at AS loaded_at  -- pipeline metadata
    FROM source
    WHERE id IS NOT NULL  -- filter out corrupted records
)

SELECT * FROM renamed
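When you run this model, dbt wraps the compiled SELECT in the DDL for the configured materialization. For a view, the statement dbt executes looks roughly like this (relation names depend on your target; the ones below match the dev profile in this article):

```sql
-- Sketch of the DDL dbt generates for a view materialization
CREATE VIEW "jaffle_shop_dev"."dbt_dev_federico"."stg_orders" AS (
    WITH source AS (
        -- source() has been resolved to the declared raw relation
        SELECT * FROM "raw_db"."public"."orders"
    )
    -- ... rest of the compiled SELECT ...
    SELECT * FROM source
);
```

Switching +materialized to table swaps this for a CREATE TABLE ... AS statement; the model's SQL does not change.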
The ref() Macro
The ref() macro is the most important feature of dbt. When you write
{{ ref('stg_orders') }}, dbt:
- Resolves the correct table/view name in the warehouse for the current environment
- Records the dependency in the directed acyclic graph (DAG)
- Ensures that the upstream model runs before this one
-- models/marts/finance/orders_monthly.sql
-- Uses ref() to depend on stg_orders
WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}  -- dbt resolves the schema automatically
),

monthly_aggregated AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        payment_method,
        COUNT(*) AS order_count,
        SUM(total_amount) AS gross_revenue,
        AVG(total_amount) AS avg_order_value,
        COUNT(DISTINCT customer_id) AS unique_customers
    FROM orders
    WHERE status = 'completed'
    GROUP BY 1, 2
)

SELECT * FROM monthly_aggregated
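After dbt compile, the ref() call is replaced with the fully qualified relation for the active target, and the result is written under target/compiled/. Against the dev profile shown earlier, the compiled file would begin roughly like this:

```sql
-- target/compiled/jaffle_shop/models/marts/finance/orders_monthly.sql
-- (sketch; actual names depend on your target)
WITH orders AS (
    SELECT * FROM "jaffle_shop_dev"."dbt_dev_federico"."stg_orders"
)
-- ... rest of the model unchanged ...
SELECT * FROM orders
```

Run against prod, the same model compiles to the dbt_prod schema instead — this is how one codebase serves every environment.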
The source() Macro
To access raw sources (tables not created by dbt), use source() instead of ref(). Sources must be declared in a sources.yml file:
# models/staging/sources.yml
version: 2

sources:
  - name: raw            # name of the source group
    database: raw_db     # database in the warehouse
    schema: public       # schema in the warehouse
    tables:
      - name: orders
        description: "Raw orders from the application"
        loaded_at_field: _loaded_at   # field used for freshness checks
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
      - name: customers
        description: "Raw customers from the application"
With this setup, you can run dbt source freshness to verify that the sources are up to date before starting the transformations.
Run dbt: the Basic Commands
# Run all models (materializes them in the warehouse)
dbt run
# Run only the staging models
dbt run --select staging
# Run a single model and all its upstream dependencies (+)
dbt run --select +orders_monthly
# Run all tests defined in the schema YAML
dbt test
# Test a single model
dbt test --select stg_orders
# Compile models without running them (useful for debugging)
dbt compile
# Check source freshness
dbt source freshness
# Generate and serve the documentation
dbt docs generate
dbt docs serve  # opens http://localhost:8080
Recommended Model Structure
The three-tier structure recommended by the dbt community:
models/
├── staging/                  # Layer 1: close to the source
│   ├── sources.yml           # source declarations
│   ├── schema.yml            # tests + documentation
│   ├── stg_orders.sql
│   ├── stg_customers.sql
│   └── stg_products.sql
├── intermediate/             # Layer 2: complex joins (optional)
│   └── int_orders_enriched.sql   # join orders + customers
└── marts/                    # Layer 3: ready for consumption
    ├── finance/
    │   ├── schema.yml
    │   ├── orders_monthly.sql
    │   └── revenue_by_country.sql
    └── marketing/
        └── customer_cohorts.sql
Anti-pattern: Business Logic in Staging
Staging models should only do "dumb" operations: rename columns, cast types, deduplicate. Business logic (calculations, aggregations, joins) belongs in intermediate models or marts. If your staging model has a GROUP BY or more than a couple of computed columns, you are probably doing too much in that layer.
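As an illustration (the model and metric names here are hypothetical), this is the kind of staging model to avoid:

```sql
-- ANTI-PATTERN: business logic in a staging model
-- models/staging/stg_orders_bad.sql (hypothetical example)
SELECT
    customer_id,
    COUNT(*) AS order_count,         -- aggregation: belongs in a mart
    SUM(amount) AS lifetime_value    -- business metric: belongs in a mart
FROM {{ source('raw', 'orders') }}
GROUP BY 1
-- Better: keep staging limited to renames/casts and move this
-- aggregation into a marts/ model that reads from the staging layer.
```

Keeping staging thin means every downstream model starts from the same clean, uninterpreted version of the source data.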
Checking the Setup with dbt debug
Before running dbt run, verify that the warehouse connection is working:
dbt debug
# Expected output:
# Configuration:
# profiles.yml file [OK found and valid]
# dbt_project.yml file [OK found and valid]
# Required dependencies:
# - git [OK found]
# Connection:
# host: localhost
# port: 5432
# user: federico
# database: jaffle_shop_dev
# schema: dbt_dev_federico
# [OK connection ok]
Conclusions and Next Steps
We have configured a working dbt project with the three-layer structure (staging → intermediate → marts), connection profiles for different environments, declared sources with freshness checks, and the first models using ref() and source().
The next step is to make our SQL dynamic with Jinja: variables, loops, conditionals and reusable macros that eliminate code duplication in the warehouse.