Prerequisites and Installation

dbt Core requires Python 3.8+ and installs via pip. The important thing is to install not only the basic package but also thespecific adapter for your warehouse. Each warehouse has its own adapter, because SQL dialects differ.

# Installa dbt Core con l'adapter per PostgreSQL (ottimo per iniziare in locale)
pip install dbt-postgres

# Per BigQuery
pip install dbt-bigquery

# Per Snowflake
pip install dbt-snowflake

# Per DuckDB (ideale per sviluppo locale senza infrastruttura)
pip install dbt-duckdb

# Verifica l'installazione
dbt --version
# Core:
#   - installed: 1.9.0
#   - latest:    1.9.0 - Up to date!

Get started with DuckDB for Local Development

If you don't have immediate access to BigQuery or Snowflake, DuckDB it's the way quicker to get started: it is an embedded database that runs directly locally, requires no infrastructure and dbt-duckdb works on any machine. Perfect for learning without costs.

Create the First dbt Project

The command dbt init creates the folder structure of a dbt project and guides you in the initial configuration:

dbt init jaffle_shop

# dbt chiederà:
# 1. Quale database vuoi usare? (postgres/bigquery/snowflake/...)
# 2. [Per postgres] host, port, user, password, database, schema

# La struttura creata:
jaffle_shop/
├── dbt_project.yml          # configurazione principale
├── README.md
├── analyses/               # query ad hoc (non materializzate)
├── macros/                 # funzioni Jinja riutilizzabili
├── models/
│   └── example/            # modelli di esempio (da eliminare)
├── seeds/                  # CSV statici
├── snapshots/              # snapshot per SCD
└── tests/                  # test SQL singolari

The dbt_project.yml file

The heart of the project configuration is dbt_project.yml. Define the here project name, dbt version, directory paths and global configuration of the models:

# dbt_project.yml
name: 'jaffle_shop'
version: '1.0.0'

# Versione minima di dbt richiesta
require-dbt-version: ">=1.8.0"

# Percorso del profile da usare (in profiles.yml)
profile: 'jaffle_shop'

# Directory dei modelli
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

# Dove salvare i log e i target compilati
target-path: "target"
log-path: "logs"
clean-targets: ["target", "dbt_packages"]

# Configurazione dei modelli per directory
models:
  jaffle_shop:
    # Tutti i modelli del progetto sono view per default
    +materialized: view

    staging:
      # I modelli in staging/ sono sempre view
      +materialized: view
      +schema: staging          # savedano in schema 'staging'

    marts:
      +materialized: table      # I marts sono table per performance
      +schema: marts

The profiles.yml file

The connection credentials go in ~/.dbt/profiles.yml (in the home directory, never in the repository). Each profile can have multiple targets (dev, staging, prod):

# ~/.dbt/profiles.yml
jaffle_shop:
  target: dev                    # target di default

  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: "{{ env_var('DB_USER') }}"      # usa variabili d'ambiente
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: jaffle_shop_dev
      schema: dbt_dev_federico   # schema personale per sviluppo
      threads: 4

    prod:
      type: postgres
      host: "{{ env_var('PROD_DB_HOST') }}"
      port: 5432
      user: "{{ env_var('PROD_DB_USER') }}"
      password: "{{ env_var('PROD_DB_PASSWORD') }}"
      dbname: jaffle_shop_prod
      schema: dbt_prod
      threads: 8

The convention is that each developer has a personal pattern (e.g. dbt_dev_federico) to avoid conflicts during parallel development.

dbt models: The Fundamental Unit

A dbt model is a file .sql in the directory models/. The content it's a simple SELECT — dbt takes care of creating the corresponding view or table in the warehouse.

Let's start with a realistic example. Suppose we have a table raw.orders in warehouse with raw data from the application:

-- models/staging/stg_orders.sql
-- Staging model: rinomina, casta, pulisce — nessuna logica di business

WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}   -- 'source' punta alla sorgente raw
),

renamed AS (
    SELECT
        id                                         AS order_id,
        user_id                                    AS customer_id,
        order_date                                 AS created_at,
        status,
        CAST(amount AS DECIMAL(10,2))              AS total_amount,
        LOWER(payment_method)                      AS payment_method,
        _loaded_at                                 AS loaded_at    -- metadata pipeline
    FROM source
    WHERE id IS NOT NULL                           -- filtra record corrotti
)

SELECT * FROM renamed

The ref() Macro

The macro ref() it is the most important feature of dbt. When you write {{ ref('stg_orders') }}, dbt:

  1. Resolves the correct table/view name in the warehouse for the current environment
  2. Record the dependency in the directed acyclic graph (DAG)
  3. Ensures that the dependent model runs first
-- models/marts/finance/orders_monthly.sql
-- Usa ref() per dipendere da stg_orders

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}          -- dbt risolve automaticamente lo schema
),

monthly_aggregated AS (
    SELECT
        DATE_TRUNC('month', created_at)            AS month,
        payment_method,
        COUNT(*)                                   AS order_count,
        SUM(total_amount)                          AS gross_revenue,
        AVG(total_amount)                          AS avg_order_value,
        COUNT(DISTINCT customer_id)                AS unique_customers
    FROM orders
    WHERE status = 'completed'
    GROUP BY 1, 2
)

SELECT * FROM monthly_aggregated

The source() Macro

To access raw sources (tables not created by dbt), use source() instead of ref(). The sources must be declared in a file sources.yml:

# models/staging/sources.yml
version: 2

sources:
  - name: raw                    # nome del source group
    database: raw_db             # database nel warehouse
    schema: public               # schema nel warehouse
    tables:
      - name: orders
        description: "Ordini grezzi dall'applicazione"
        loaded_at_field: _loaded_at    # campo per freshness check
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
      - name: customers
        description: "Clienti grezzi dall'applicazione"

With this setup, you can run dbt source freshness to verify that the sources are updated before starting the transformations.

Run dbt: the Basic Commands

# Esegui tutti i modelli (materialization nel warehouse)
dbt run

# Esegui solo i modelli staging
dbt run --select staging

# Esegui un singolo modello e tutte le sue dipendenze (+)
dbt run --select +orders_monthly

# Esegui tutti i test definiti nello schema YAML
dbt test

# Testa solo un modello specifico
dbt test --select stg_orders

# Compila i modelli senza eseguirli (utile per debug)
dbt compile

# Verifica freshness delle sorgenti
dbt source freshness

# Genera e serve la documentazione
dbt docs generate
dbt docs serve                  # apre http://localhost:8080

Recommended Template Structure

The three-tier structure recommended by the dbt community:

models/
├── staging/                    # Layer 1: vicino alla sorgente
│   ├── sources.yml            # dichiarazione sorgenti
│   ├── schema.yml             # test + documentazione
│   ├── stg_orders.sql
│   ├── stg_customers.sql
│   └── stg_products.sql
├── intermediate/               # Layer 2: join complessi (opzionale)
│   ├── int_orders_enriched.sql # join ordini + clienti
└── marts/                      # Layer 3: pronti per consumo
    ├── finance/
    │   ├── schema.yml
    │   ├── orders_monthly.sql
    │   └── revenue_by_country.sql
    └── marketing/
        └── customer_cohorts.sql

Anti-pattern: Business Logic in Staging

Staging templates need to do only "dumb" operations: rename columns, cast types, deduplicate. The business logic (calculations, aggregations, joins) goes into the intermediate models or marts. If your staging model has a GROUP BY or more than a couple of calculated columns, you're probably doing too much in that layer.

Checking the Setup with dbt debug

Before running dbt run, verify that the warehouse connection is working:

dbt debug

# Output atteso:
# Configuration:
#   profiles.yml file [OK found and valid]
#   dbt_project.yml file [OK found and valid]
# Required dependencies:
#  - git [OK found]
# Connection:
#   host: localhost
#   port: 5432
#   user: federico
#   database: jaffle_shop_dev
#   schema: dbt_dev_federico
#   [OK connection ok]

Conclusions and Next Steps

We have configured a working dbt project with the three-level structure (staging → intermediate → marts), the connection profiles for different environments, the declared sources with freshness check and the first models with ref() e source().

The next step is to make our SQL dynamic with Jinja: variables, loops, conditions and reusable macros that eliminate code duplication in the warehouse.