Federico Calò

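Prompt Engineering in Production: Templates, Versioning, and Testing

Prompt engineering is often treated as an experimental activity: you try something, it works, and you move on. In production, this approach fails systematically. A change to a prompt can degrade answer quality without anyone noticing. A prompt optimized for GPT-4 can give very poor results on GPT-4o-mini. A template that works in English can fail in Italian.

This article treats prompt engineering as an engineering discipline: advanced techniques (Chain-of-Thought, Few-Shot, Constitutional AI), template systems with variables and composition, prompt versioning with A/B evaluation, automated testing, and quality monitoring in production. Each section includes runnable Python code and patterns tested on real systems.

What You Will Learn

Advanced techniques: Chain-of-Thought, Few-Shot learning, Tree-of-Thought
Template systems with variables, composition, and inheritance
Prompt versioning with performance tracking
Statistically significant A/B testing of prompts
Constitutional AI and guardrails for safe outputs
Prompting for structured output (JSON, XML, Markdown)
Automated prompt testing with LLM-as-Judge
Prompt quality monitoring in production

1. Advanced Prompting Techniques

1.1 Chain-of-Thought (CoT)

Chain-of-Thought (Wei et al., 2022) is one of the most influential modern prompting techniques: asking the model to "show its reasoning" before it gives an answer significantly improves accuracy on complex problems.

Chain-of-Thought vs. Standard Prompting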


# STANDARD PROMPTING - lower accuracy on complex reasoning
standard_prompt = """
Questo cliente deve pagare una fattura di 1200 euro.
Ha già pagato il 30%. Quanto deve ancora pagare?
"""
# Typical answer: "840 euro" (often correct, but with no reasoning to verify)

# CHAIN-OF-THOUGHT - higher accuracy
cot_prompt = """
Questo cliente deve pagare una fattura di 1200 euro.
Ha già pagato il 30%. Quanto deve ancora pagare?

Ragiona passo per passo:
1. Prima calcola quanto ha già pagato
2. Poi calcola il rimanente
3. Fornisci la risposta finale
"""
# Answer:
# "1. Ha pagato: 1200 * 30/100 = 360 euro
#  2. Rimanente: 1200 - 360 = 840 euro
#  3. Il cliente deve ancora pagare 840 euro."

# ZERO-SHOT CoT - just append "Let's think step by step" (here in Italian)
zero_shot_cot = """
Questo cliente deve pagare una fattura di 1200 euro.
Ha già pagato il 30%. Quanto deve ancora pagare?

Pensiamo passo per passo:
"""
# The model generates the step-by-step reasoning on its own

# Python implementation
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# The template asks for the reasoning and the final answer in separate,
# labeled sections so that the answer can be extracted afterwards
cot_template = ChatPromptTemplate.from_template("""
{context}

Domanda: {question}

Ragiona passo per passo prima di rispondere. Struttura la risposta come:
RAGIONAMENTO:
[il tuo ragionamento dettagliato]

RISPOSTA FINALE:
[risposta concisa]
""")

chain = cot_template | llm
response = chain.invoke({
    "context": "Il prezzo base è 1000 euro, con IVA 22% e sconto fedele del 10%.",
    "question": "Qual è il prezzo finale da pagare?"
})
print(response.content)
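Because the template above asks for two labeled sections, the reply can be split mechanically before it reaches the rest of the application. The following is a minimal sketch, assuming the model respects the `RAGIONAMENTO:` / `RISPOSTA FINALE:` headers; the `parse_cot_response` helper and the `CoTResponse` type are hypothetical names introduced here, not part of LangChain.

```python
import re
from typing import NamedTuple


class CoTResponse(NamedTuple):
    reasoning: str
    final_answer: str


# Matches the two section headers requested by the CoT template
# (hypothetical helper, assumes the model kept the headers verbatim).
_COT_PATTERN = re.compile(
    r"RAGIONAMENTO:\s*(?P<reasoning>.*?)\s*RISPOSTA FINALE:\s*(?P<answer>.*)",
    re.DOTALL,
)


def parse_cot_response(text: str) -> CoTResponse:
    """Split a CoT reply into reasoning and final answer.

    Raises ValueError when the model ignored the requested structure,
    so the caller can retry or log the failure instead of guessing.
    """
    match = _COT_PATTERN.search(text)
    if match is None:
        raise ValueError("Reply does not follow the RAGIONAMENTO / RISPOSTA FINALE format")
    return CoTResponse(match["reasoning"].strip(), match["answer"].strip())


# Hand-written sample reply, used instead of a live model call
sample = """RAGIONAMENTO:
1. Ha pagato: 1200 * 30/100 = 360 euro
2. Rimanente: 1200 - 360 = 840 euro

RISPOSTA FINALE:
840 euro"""

parsed = parse_cot_response(sample)
print(parsed.final_answer)  # 840 euro
```

Keeping the reasoning separate makes it possible to log it for debugging while showing only the final answer to the user.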

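1.2 Few-Shot Learning

Few-shot prompting includes input-output examples in the prompt to guide the model's behavior. It is particularly effective for tasks that require a specific output format, or for specialized domains the model knows little about.

Few-Shot Prompting with Dynamic Example Selection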


from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Example library (sentiment classification task)
examples = [
    {
        "input": "Il prodotto è arrivato rotto e il supporto non risponde.",
        "output": "NEGATIVO - Problema prodotto e supporto clienti"
    },
    {
        "input": "Consegna rapidissima e prodotto esattamente come descritto!",
        "output": "POSITIVO - Soddisfazione consegna e prodotto"
    },
    {
        "input": "Il prezzo è nella media, niente di speciale.",
        "output": "NEUTRO - Valutazione prezzo"
    },
    {
        "input": "Qualità eccellente, lo riacquisterò sicuramente.",
        "output": "POSITIVO - Alta qualità, fidelizzazione"
    },
    {
        "input": "Spedizione lenta, prodotto ok ma poteva andare meglio.",
        "output": "MISTO - Problema spedizione, prodotto accettabile"
    },
    # ... more examples
]

# Semantic selector: pick the 3 examples most similar to the query.
# input_keys restricts the similarity search to the "input" field,
# so the labels in "output" do not influence the match.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    FAISS,
    k=3,
    input_keys=["input"],
)

# Template for each example
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}")
])

# Few-shot prompt with dynamic selection; input_variables names the
# values that are passed to the example selector
few_shot_prompt = FewShotChatMessagePromptTemplate(
    input_variables=["input"],
    example_selector=example_selector,
    example_prompt=example_prompt,
)

# Final prompt
final_prompt = ChatPromptTemplate.from_messages([
    ("system", """Sei un analista di sentiment per recensioni e-commerce.
Classifica il sentiment come: POSITIVO, NEGATIVO, NEUTRO, o MISTO.
Includi sempre la categoria principale del problema/punto di forza."""),
    few_shot_prompt,
    ("human", "{input}")
])

chain = final_prompt | llm
result = chain.invoke({"input": "Prodotto ottimo ma imballaggio pessimo."})
print(result.content)
# Example output: "MISTO - qualità prodotto vs problema imballaggio"
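To see what dynamic example selection does without an embedding model or an API key, the idea can be reduced to "rank the example pool by similarity to the query and keep the top k". The sketch below swaps semantic similarity for plain word overlap (Jaccard similarity), which is a deliberately cruder technique; `jaccard` and `select_examples` are hypothetical helpers written for this illustration.

```python
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,!?;:").lower() for w in text.split()} - {""}


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_examples(query: str, pool: List[Dict[str, str]], k: int = 3) -> List[Dict[str, str]]:
    """Return the k examples whose input is most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex["input"]), reverse=True)
    return ranked[:k]


pool = [
    {"input": "Il prodotto è arrivato rotto e il supporto non risponde.", "output": "NEGATIVO"},
    {"input": "Consegna rapidissima e prodotto esattamente come descritto!", "output": "POSITIVO"},
    {"input": "Il prezzo è nella media, niente di speciale.", "output": "NEUTRO"},
]

best = select_examples("Il supporto non risponde alle email", pool, k=1)
print(best[0]["output"])  # NEGATIVO
```

Word overlap misses paraphrases ("assistenza" vs. "supporto"), which is exactly the gap that embedding-based selection is meant to close.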

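1.3 Structured Output and Function Calling

One of the most important patterns in production: forcing the LLM to produce structured output (JSON, Pydantic models) instead of free text eliminates manual parsing and drastically reduces format errors.

Structured Output with Pydantic and LangChain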


from pydantic import BaseModel, Field
from typing import Dict, List, Optional, Literal
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate


# Define the output schema
class ProductAnalysis(BaseModel):
    """Structured analysis of a product review"""
    sentiment: Literal["positivo", "negativo", "neutro", "misto"] = Field(
        description="Il sentiment generale della recensione"
    )
    score: int = Field(
        description="Score da 1 a 10", ge=1, le=10
    )
    punti_forza: List[str] = Field(
        description="Lista dei punti di forza menzionati",
        default_factory=list
    )
    punti_deboli: List[str] = Field(
        description="Lista dei punti deboli menzionati",
        default_factory=list
    )
    categoria: str = Field(
        description="Categoria principale (es. 'qualità', 'spedizione', 'supporto')"
    )
    risposta_suggerita: Optional[str] = Field(
        description="Risposta suggerita per il team supporto (se sentiment negativo)",
        default=None
    )


class ReviewBatchResult(BaseModel):
    """Result of analyzing a batch of reviews"""
    total_reviews: int
    sentiment_distribution: Dict[str, int]  # sentiment label -> count
    top_issues: List[str]
    recommendations: List[str]


# LLM with structured output
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(ProductAnalysis)

prompt = ChatPromptTemplate.from_template("""
Analizza questa recensione di prodotto e classifica il sentiment.

Recensione: {review}

Estrai tutte le informazioni richieste in modo preciso.""")

chain = prompt | structured_llm

# The output is already a validated Pydantic object
result: ProductAnalysis = chain.invoke({
    "review": "Prodotto di ottima qualità, spedizione lenta. Supporto ha risposto velocemente."
})

print(f"Sentiment: {result.sentiment}")
print(f"Score: {result.score}/10")
print(f"Punti forza: {result.punti_forza}")
print(f"Punti deboli: {result.punti_deboli}")

# Example output (the schema is enforced; the values depend on the model):
# Sentiment: misto
# Score: 6/10
# Punti forza: ['qualità prodotto', 'supporto reattivo']
# Punti deboli: ['spedizione lenta']
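A schema enforces the shape of the output, but a call can still come back with a value that fails validation, and production code has to decide what happens then. One common choice is a bounded retry. The sketch below uses only the standard library and checks just two of the `ProductAnalysis` constraints by hand; `validate_analysis`, `generate_with_retries`, and the `generate` callable are hypothetical stand-ins for the real chain.

```python
import json
from typing import Callable, Optional

ALLOWED_SENTIMENTS = {"positivo", "negativo", "neutro", "misto"}


def validate_analysis(raw: str) -> dict:
    """Parse a JSON reply and check the sentiment and score constraints."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment: {data.get('sentiment')!r}")
    score = data.get("score")
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"Score out of range: {score!r}")
    return data


def generate_with_retries(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call `generate` until its output validates, up to max_attempts times."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return validate_analysis(generate())
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"No valid output after {max_attempts} attempts") from last_error


# Stand-in for an LLM call: the first reply is invalid, the second is valid.
replies = iter([
    '{"sentiment": "ottimo", "score": 11}',
    '{"sentiment": "misto", "score": 6}',
])
result = generate_with_retries(lambda: next(replies))
print(result)  # {'sentiment': 'misto', 'score': 6}
```

Capping the attempts matters: an unbounded retry loop turns a systematic prompt defect into an unbounded API bill.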

2. Template Systems for Production

In production, prompts should be treated as first-class resources: versioned, testable, and configurable. A robust template system lets you update prompts without changing application code.

Prompt Template Registry with Versioning


from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime
import hashlib
import logging
import yaml
from pathlib import Path

logger = logging.getLogger(__name__)


@dataclass
class PromptVersion:
    """A specific version of a prompt"""
    version: str
    template: str
    variables: List[str]
    description: str
    created_at: datetime = field(default_factory=datetime.now)
    created_by: str = ""
    tags: List[str] = field(default_factory=list)
    performance_metrics: Dict[str, float] = field(default_factory=dict)
    is_active: bool = True

    @property
    def template_hash(self) -> str:
        """Short hash of the template for deduplication (not for security)"""
        return hashlib.md5(self.template.encode()).hexdigest()[:8]

    def render(self, **kwargs) -> str:
        """Render the template with the given variables"""
        try:
            return self.template.format(**kwargs)
        except KeyError as e:
            raise ValueError(f"Missing template variable: {e}") from e

    def validate_variables(self, provided: Dict) -> List[str]:
        """Return the required variables that are missing from `provided`"""
        return [v for v in self.variables if v not in provided]


class PromptRegistry:
    """
    Central registry for managing prompts in production.
    Supports versioning, A/B testing, and rollback.
    """

    def __init__(self, storage_path: str = "./prompts"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
        self.prompts: Dict[str, List[PromptVersion]] = {}
        self._load_from_disk()

    def register(
        self,
        name: str,
        template: str,
        variables: List[str],
        description: str = "",
        version: Optional[str] = None,
        tags: Optional[List[str]] = None
    ) -> PromptVersion:
        """Register a new version of a prompt"""
        if name not in self.prompts:
            self.prompts[name] = []

        # Auto-versioning when no version is given
        if version is None:
            existing = len(self.prompts[name])
            version = f"v{existing + 1:03d}"

        prompt_version = PromptVersion(
            version=version,
            template=template,
            variables=variables,
            description=description,
            tags=tags or []
        )

        # Deactivate the previously active versions and persist the change,
        # so that the active flag survives a restart
        for existing_v in self.prompts[name]:
            if existing_v.is_active:
                existing_v.is_active = False
                self._save_to_disk(name, existing_v)

        self.prompts[name].append(prompt_version)
        self._save_to_disk(name, prompt_version)

        return prompt_version

    def get(self, name: str, version: Optional[str] = None) -> PromptVersion:
        """Get a version of the prompt (the active one by default)"""
        if name not in self.prompts:
            raise KeyError(f"Prompt '{name}' not found in the registry")

        if version is None:
            # Most recent active version
            active = [p for p in self.prompts[name] if p.is_active]
            if not active:
                raise ValueError(f"No active version for '{name}'")
            return active[-1]

        for p in self.prompts[name]:
            if p.version == version:
                return p

        raise KeyError(f"Version '{version}' not found for '{name}'")

    def rollback(self, name: str, version: str) -> PromptVersion:
        """Roll back to a previous version"""
        target = self.get(name, version)

        # Activate only the target version and persist every flag change
        for p in self.prompts[name]:
            p.is_active = p is target
            self._save_to_disk(name, p)

        return target

    def update_metrics(self, name: str, version: str, metrics: Dict[str, float]):
        """Update the performance metrics of a version"""
        prompt = self.get(name, version)
        prompt.performance_metrics.update(metrics)
        self._save_to_disk(name, prompt)

    def get_history(self, name: str) -> List[Dict]:
        """Get the version history"""
        if name not in self.prompts:
            return []
        return [
            {
                "version": p.version,
                "created_at": p.created_at.isoformat(),
                "is_active": p.is_active,
                "metrics": p.performance_metrics,
                "hash": p.template_hash
            }
            for p in self.prompts[name]
        ]

    def _save_to_disk(self, name: str, prompt: PromptVersion):
        """Persist the prompt to disk as <name>_<version>.yaml"""
        file_path = self.storage_path / f"{name}_{prompt.version}.yaml"
        data = {
            "version": prompt.version,
            "template": prompt.template,
            "variables": prompt.variables,
            "description": prompt.description,
            "created_at": prompt.created_at.isoformat(),
            "created_by": prompt.created_by,
            "tags": prompt.tags,
            "is_active": prompt.is_active,
            "performance_metrics": prompt.performance_metrics
        }
        with open(file_path, "w", encoding="utf-8") as f:
            yaml.dump(data, f, allow_unicode=True)

    def _load_from_disk(self):
        """Load the prompts from disk at startup"""
        # Sorted so that zero-padded versions (v001, v002, ...) load in order
        for file_path in sorted(self.storage_path.glob("*.yaml")):
            try:
                with open(file_path, encoding="utf-8") as f:
                    data = yaml.safe_load(f)

                # Everything before the last underscore is the prompt name
                name = file_path.stem.rsplit("_", 1)[0]
                if isinstance(data.get("created_at"), str):
                    data["created_at"] = datetime.fromisoformat(data["created_at"])

                self.prompts.setdefault(name, []).append(PromptVersion(**data))
            except Exception as exc:
                # Do not fail startup on one bad file, but leave a trace
                logger.warning("Skipping unreadable prompt file %s: %s", file_path, exc)


# Usage
registry = PromptRegistry()

# Register RAG prompt v1
registry.register(
    name="rag_answer",
    template="""Sei un assistente tecnico. Rispondi basandoti SOLO sul contesto.

Contesto: {context}
Domanda: {question}
Risposta:""",
    variables=["context", "question"],
    description="Prompt RAG base v1"
)

# Register RAG prompt v2 (improved)
registry.register(
    name="rag_answer",
    template="""Sei un assistente tecnico preciso. Rispondi basandoti ESCLUSIVAMENTE
sul contesto fornito. Se il contesto non contiene la risposta, dillo esplicitamente.

Contesto:
{context}

Domanda: {question}

Fornisci una risposta concisa e accurata:""",
    variables=["context", "question"],
    description="Prompt RAG v2 - più chiaro sul fallback"
)

# Use the active version
prompt = registry.get("rag_answer")
rendered = prompt.render(context="...", question="...")
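The registry trusts the caller to pass a `variables` list that matches the placeholders in the template. That list can drift when someone edits the template text, so it is worth checking at registration time. A minimal sketch using the standard library's `string.Formatter`; `template_fields` and `check_declared_variables` are hypothetical helpers, and the check assumes simple `{name}` placeholders rather than attribute or index lookups such as `{user.name}`.

```python
from string import Formatter
from typing import Dict, List, Set


def template_fields(template: str) -> Set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_declared_variables(template: str, declared: List[str]) -> Dict[str, List[str]]:
    """Compare the placeholders in the template with the declared variables."""
    used = template_fields(template)
    return {
        "undeclared": sorted(used - set(declared)),  # in template, not declared
        "unused": sorted(set(declared) - used),      # declared, not in template
    }


template = "Contesto: {context}\nDomanda: {question}\nRisposta:"

print(check_declared_variables(template, ["context", "question"]))
# {'undeclared': [], 'unused': []}
print(check_declared_variables(template, ["context", "lang"]))
# {'undeclared': ['question'], 'unused': ['lang']}
```

Rejecting a registration when either list is non-empty catches the mismatch before the first `render` call fails in production.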

3. A/B Testing Prompts

A/B testing compares two versions of a prompt on real traffic. It is essential for verifying, with statistical significance, that a prompt change actually improves quality rather than degrading it.

A/B Testing Framework for Prompts


import hashlib
import random
from typing import List, Optional, Tuple

import numpy as np
from scipy import stats


class PromptABTest:
    """A/B testing framework for prompts with a statistical significance check"""

    def __init__(
        self,
        prompt_a: PromptVersion,
        prompt_b: PromptVersion,
        traffic_split: float = 0.5,  # Fraction of traffic sent to B
        min_samples: int = 100        # Minimum samples per variant before testing
    ):
        self.prompt_a = prompt_a
        self.prompt_b = prompt_b
        self.traffic_split = traffic_split
        self.min_samples = min_samples

        self.results_a: List[float] = []  # Scores for prompt A
        self.results_b: List[float] = []  # Scores for prompt B

    def assign_variant(self, user_id: Optional[str] = None) -> Tuple[str, PromptVersion]:
        """
        Assign a variant to the user.
        Deterministic when user_id is given (the same user always sees the same variant).
        """
        if user_id:
            # Deterministic hashing for per-user consistency
            h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
            use_b = (h % 100) < (self.traffic_split * 100)
        else:
            use_b = random.random() < self.traffic_split

        if use_b:
            return "B", self.prompt_b
        return "A", self.prompt_a

    def record_result(self, variant: str, score: float):
        """Record the result for a variant"""
        if variant == "A":
            self.results_a.append(score)
        else:
            self.results_b.append(score)

    def _p_value(self) -> Optional[float]:
        """Two-sided p-value, or None with fewer than 2 samples per variant"""
        if len(self.results_a) < 2 or len(self.results_b) < 2:
            return None
        # Welch's t-test: does not assume the two variants have equal variance
        _, p_value = stats.ttest_ind(self.results_a, self.results_b, equal_var=False)
        return float(p_value)

    def is_statistically_significant(self, alpha: float = 0.05) -> bool:
        """Check statistical significance with a t-test"""
        if len(self.results_a) < self.min_samples or len(self.results_b) < self.min_samples:
            return False

        p_value = self._p_value()
        return p_value is not None and p_value < alpha

    def get_winner(self) -> Optional[str]:
        """Return the winner, or None if the difference is not significant"""
        if not self.is_statistically_significant():
            return None  # Too few samples or no significant difference

        mean_a = np.mean(self.results_a)
        mean_b = np.mean(self.results_b)

        return "B" if mean_b > mean_a else "A"

    def report(self) -> dict:
        """Full report of the test"""
        mean_a = float(np.mean(self.results_a)) if self.results_a else 0.0
        mean_b = float(np.mean(self.results_b)) if self.results_b else 0.0
        winner = self.get_winner()  # computed once, reused below

        return {
            "samples_a": len(self.results_a),
            "samples_b": len(self.results_b),
            "mean_score_a": mean_a,
            "mean_score_b": mean_b,
            "improvement_pct": ((mean_b - mean_a) / mean_a * 100) if mean_a > 0 else 0,
            "p_value": self._p_value(),
            "is_significant": winner is not None,
            "winner": winner,
            "recommendation": "Deploy B" if winner == "B" else
                              "Keep A" if winner == "A" else
                              "Collect more data"
        }
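The t-test above assumes that the sample means are roughly normally distributed, which may not hold for small samples of 1-5 judge scores. As a cross-check, a permutation test makes no distributional assumption: it asks how often a random relabeling of the pooled scores produces a difference at least as large as the observed one. This is a different technique from the t-test used in `PromptABTest`, sketched here with the standard library only; `permutation_p_value` and the sample scores are made up for illustration.

```python
import random
from statistics import mean
from typing import List


def permutation_p_value(
    a: List[float],
    b: List[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)  # seeded so that the result is reproducible
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero
    return (extreme + 1) / (n_permutations + 1)


# Made-up quality scores for two prompt variants
scores_a = [3.0, 3.5, 3.2, 2.9, 3.1, 3.4, 3.0, 3.3]
scores_b = [4.1, 4.4, 3.9, 4.2, 4.0, 4.3, 4.5, 4.1]

p = permutation_p_value(scores_a, scores_b)
print(f"p-value: {p:.4f}")
print("significant" if p < 0.05 else "not significant")
```

When the t-test and the permutation test disagree, that is usually a sign that the sample is too small to promote either variant.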

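4. Automated Testing with LLM-as-Judge

LLM-as-Judge is the most scalable pattern for evaluating prompt quality: an LLM (often a more capable one than the model under test) scores the outputs automatically, approximating human evaluation at reduced cost.

LLM-as-Judge for Prompt Testing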


from typing import Dict, List, Optional

import numpy as np
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class JudgeScore(BaseModel):
    """Structured score returned by the LLM judge"""
    accuracy: int = Field(description="1-5: factual accuracy")
    relevance: int = Field(description="1-5: relevance to the question")
    clarity: int = Field(description="1-5: clarity of the answer")
    completeness: int = Field(description="1-5: completeness of the answer")
    overall: int = Field(description="1-5: overall rating")
    reasoning: str = Field(description="Explanation of the scores")
    issues: List[str] = Field(description="List of the problems found")


class PromptTester:
    """Automated testing framework for LLM prompts"""

    def __init__(
        self,
        model_under_test: ChatOpenAI,
        judge_model: Optional[ChatOpenAI] = None
    ):
        self.model = model_under_test
        # Default judge; pass a more capable judge_model when the budget allows
        self.judge = judge_model or ChatOpenAI(model="gpt-4o-mini", temperature=0)
        self.structured_judge = self.judge.with_structured_output(JudgeScore)

    def evaluate_single(
        self,
        prompt: str,
        input_vars: Dict,
        expected_output: Optional[str] = None
    ) -> JudgeScore:
        """Evaluate a single output of the prompt"""
        # Generate the output with the model under test
        rendered_prompt = prompt.format(**input_vars)
        actual_output = self.model.invoke(rendered_prompt).content

        # Include the expected answer only when the test case provides one
        expected_line = f"Risposta attesa: {expected_output}" if expected_output else ""

        # Evaluate with the LLM judge
        judge_prompt = f"""Valuta questa risposta AI su scale 1-5 per ogni dimensione.

Domanda/Input: {input_vars.get('question', rendered_prompt[:200])}

Risposta generata: {actual_output}

{expected_line}

Valuta obiettivamente e fornisci esempi specifici per ogni punto."""

        return self.structured_judge.invoke(judge_prompt)

    def run_test_suite(
        self,
        prompt: str,
        test_cases: List[Dict]
    ) -> dict:
        """
        Run a test suite against a prompt.
        test_cases: list of {"input": {vars}, "expected": "expected output"}
        """
        scores = []
        failed_cases = []

        for i, test_case in enumerate(test_cases):
            try:
                score = self.evaluate_single(
                    prompt=prompt,
                    input_vars=test_case["input"],
                    expected_output=test_case.get("expected")
                )
                scores.append(score)

                # Flag test cases with a low score
                if score.overall < 3:
                    failed_cases.append({
                        "index": i,
                        "input": test_case["input"],
                        "score": score.overall,
                        "issues": score.issues,
                        "reasoning": score.reasoning
                    })

            except Exception as e:
                failed_cases.append({"index": i, "error": str(e)})

        if not scores:
            return {"error": "No test completed", "failed_cases": failed_cases}

        avg_overall = float(np.mean([s.overall for s in scores]))

        return {
            "total_tests": len(test_cases),
            "completed": len(scores),
            "avg_overall": avg_overall,
            "avg_accuracy": float(np.mean([s.accuracy for s in scores])),
            "avg_relevance": float(np.mean([s.relevance for s in scores])),
            "avg_clarity": float(np.mean([s.clarity for s in scores])),
            "avg_completeness": float(np.mean([s.completeness for s in scores])),
            "pass_rate": len([s for s in scores if s.overall >= 3]) / len(scores),
            "failed_cases": failed_cases,
            "recommendation": "DEPLOY" if avg_overall >= 4.0
                              else "REVIEW" if avg_overall >= 3.0
                              else "REJECT"
        }


# Example test suite for a RAG prompt
test_suite = [
    {
        "input": {
            "context": "LangChain è un framework Python per costruire applicazioni LLM.",
            "question": "Cos'è LangChain?"
        },
        "expected": "LangChain è un framework per costruire applicazioni basate su LLM"
    },
    {
        # Edge case: the context does not contain the answer
        "input": {
            "context": "Il prezzo di abbonamento base è 29 euro al mese.",
            "question": "Quanto costa l'abbonamento premium?"
        },
        "expected": "Il contesto non specifica il prezzo dell'abbonamento premium"
    },
]

# Run the suite against the active RAG prompt from the registry in section 2
tester = PromptTester(model_under_test=ChatOpenAI(model="gpt-4o-mini", temperature=0))
report = tester.run_test_suite(registry.get("rag_answer").template, test_suite)
print(report["recommendation"], report["failed_cases"])
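A report like the one produced by `run_test_suite` becomes useful once a pipeline can act on it. The sketch below turns a report dictionary into a go/no-go decision that a CI job could enforce. The `gate_deployment` function and both thresholds are assumptions for illustration, and the report is written by hand so the example runs without any model call.

```python
from typing import Dict, List


def gate_deployment(
    report: Dict,
    min_avg: float = 4.0,         # assumed threshold, tune per project
    min_pass_rate: float = 0.9,   # assumed threshold, tune per project
) -> List[str]:
    """Return the reasons to block deployment; an empty list means go."""
    if "error" in report:
        return [report["error"]]
    reasons = []
    if report["avg_overall"] < min_avg:
        reasons.append(f"avg_overall {report['avg_overall']:.2f} is below {min_avg}")
    if report["pass_rate"] < min_pass_rate:
        reasons.append(f"pass_rate {report['pass_rate']:.0%} is below {min_pass_rate:.0%}")
    return reasons


# Hand-written report with the same keys that run_test_suite returns
sample_report = {"avg_overall": 3.6, "pass_rate": 0.8, "failed_cases": []}

blockers = gate_deployment(sample_report)
for reason in blockers:
    print("BLOCKED:", reason)
# BLOCKED: avg_overall 3.60 is below 4.0
# BLOCKED: pass_rate 80% is below 90%
```

In a CI script, a non-empty list would translate into a non-zero exit code so that the new prompt version never reaches production unreviewed.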

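5. Best Practices and Anti-Patterns

Best Practices for Prompt Engineering in Production

Version prompts like code: every change to a prompt is a potential breaking change. Use semantic versioning, keep a changelog, and test before deploying.
Use structured output whenever possible: JSON/Pydantic eliminates manual parsing and format problems, and is more reliable than parsing free text with regular expressions.
Test with edge-case data: the most critical questions are the out-of-distribution ones. The test set should include edge cases, ambiguous questions, and malformed input.
Chain-of-Thought for complex tasks: for any task that needs more than one reasoning step, CoT significantly improves quality (often by 20-40%, depending on the task and the model).
Few-shot with diverse examples: include examples that cover all the main cases, not just the most common one. Diversity in the examples prevents systematic bias.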
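Semantic versioning for prompts needs a convention for what counts as a major, minor, or patch change. One possible mapping, which is an assumption of this sketch rather than an established standard: major when the variables or the output format change, minor when the instructions change, patch for wording fixes. The `bump` helper below is hypothetical and independent of the registry's `v001`-style auto-versioning.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.2.3' or '1.2.3' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a change type.

    Assumed convention:
      major - variables or output format changed (callers may break)
      minor - instructions changed, same inputs and outputs
      patch - wording or typo fix
    """
    major, minor, patch = parse_version(version)
    if change == "major":
        return f"v{major + 1}.0.0"
    if change == "minor":
        return f"v{major}.{minor + 1}.0"
    if change == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change!r}")


print(bump("v1.2.3", "patch"))  # v1.2.4
print(bump("v1.2.3", "minor"))  # v1.3.0
print(bump("v1.2.3", "major"))  # v2.0.0
```

With this convention, a major bump signals that every caller of the prompt has to be checked, while a patch bump only needs the regression suite.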

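Anti-Patterns to Avoid

Prompts hard-coded in the source: a prompt embedded in the code cannot be updated in production without a deployment. Use a registry or configuration files.
No testing before deployment: a prompt that works on 10 manual examples can fail on real edge cases. Build a test suite of at least 50-100 diverse cases.
High temperature for deterministic tasks: always use temperature=0 for classification, data extraction, and analysis. Variability is the enemy of consistency.
Overly long prompts without structure: beyond roughly 500 tokens, models tend to lose information in the middle of the prompt. Organize the prompt into clear sections and use bullet points.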
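The last point can be made mechanical by assembling prompts from named sections and keeping an eye on their size. A minimal sketch: `build_prompt` and `rough_token_count` are hypothetical helpers, and the four-characters-per-token ratio is only a rough rule of thumb that varies by language and tokenizer, so a real tokenizer should replace it wherever an exact count matters.

```python
import math
from typing import Dict


def build_prompt(sections: Dict[str, str]) -> str:
    """Join named sections under upper-case headers, in insertion order."""
    return "\n\n".join(
        f"{title.upper()}:\n{body.strip()}" for title, body in sections.items()
    )


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough size estimate; assumes about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


sections = {
    "role": "Sei un assistente tecnico.",
    "rules": "- Rispondi solo dal contesto\n- Se manca la risposta, dillo",
    "context": "{context}",
    "question": "{question}",
}

prompt_text = build_prompt(sections)
print(prompt_text)
print("approx. tokens:", rough_token_count(prompt_text))
```

Because the sections are data, the same dictionary can be stored in the registry or a configuration file instead of the source code.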

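Conclusion

Prompt engineering in production demands the same rigor as traditional software engineering: versioning, testing, monitoring, and controlled deployment. We have seen how to apply advanced techniques (CoT, Few-Shot, Structured Output), how to manage prompts with a versioned registry, how to run A/B tests with statistical significance, and how to automate evaluation with LLM-as-Judge.

Key takeaways:

Chain-of-Thought significantly improves accuracy on complex tasks.
Structured output (Pydantic) eliminates parsing errors and guarantees the format.
Prompts should be versioned, tested, and deployed like any other code.
Run A/B tests with statistical significance before promoting a new version.
LLM-as-Judge scales quality evaluation at low cost.

The Series Continues

Article 8: Multi-Agent Systems
Article 9: Prompt Engineering in Production (current)
Article 10: Knowledge Graphs for AI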