Federico Calò

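AI and GPU Workloads on Kubernetes: Device Plugins and Training Jobs

In 2026, 66% of AI inference clusters run on Kubernetes (CNCF Survey 2026). The reason is simple: Kubernetes solves the hardest operational problems of AI workloads, namely intelligent GPU scheduling, elastic scaling of training jobs, integration with distributed storage for datasets, and automatic retries when a node fails. Setting up Kubernetes for GPU workloads, however, requires specific skills that go beyond the typical deployment of a web application.

This article shows how to set up the NVIDIA Device Plugin to expose GPUs to the cluster, how to schedule distributed training jobs with PyTorch and TensorFlow, how to use Karpenter to right-size GPU nodes (40-70% cost savings), and which patterns optimize GPU usage for production inference workloads.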

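What You Will Learn

Installing and configuring the NVIDIA Device Plugin for Kubernetes
Scheduling pods by GPU demand (the nvidia.com/gpu resource)
Distributed training with PyTorchJob and TFJob (Kubeflow Training Operator)
Karpenter NodePool for automatic provisioning of Spot GPU nodes
GPU time-slicing to share one GPU among multiple pods
MIG (Multi-Instance GPU) for partitioning A100/H100 GPUs
GPU monitoring with DCGM Exporter and Grafana
High-throughput inference patterns with TorchServe on K8s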

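GPU Architecture in Kubernetes

Kubernetes does not know about GPUs natively. GPUs are exposed to the cluster through the Device Plugin Framework: a DaemonSet runs on every node that has a GPU, registers with the kubelet, and manages GPU allocation to containers. The NVIDIA Device Plugin is the most widely used implementation of this framework.

Installing the NVIDIA Device Plugin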

# Prerequisite: NVIDIA GPU drivers installed on the nodes
# Check the GPU nodes
kubectl get nodes -l accelerator=nvidia
kubectl describe node gpu-node-1 | grep -i nvidia

# Install the NVIDIA Device Plugin with Helm
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

helm install nvdp nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --version 0.16.0 \
  --set failOnInitError=false

# Or with the static manifest
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.0/deployments/static/nvidia-device-plugin.yml

# Verify that the GPUs are visible in the cluster
kubectl get nodes -o json | jq '.items[].status.allocatable | select(."nvidia.com/gpu" != null)'
# The output includes "nvidia.com/gpu": "8" for a node with 8 A100 GPUs
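
The same verification can be scripted. The following is a minimal sketch, assuming the JSON shape returned by `kubectl get nodes -o json`; the function name `gpu_capacity` and the sample node names are hypothetical and exist only for illustration.

```python
import json


def gpu_capacity(nodes_json: str, resource: str = "nvidia.com/gpu") -> dict[str, int]:
    """Map node name -> allocatable GPU count, skipping nodes without GPUs."""
    nodes = json.loads(nodes_json)
    result = {}
    for item in nodes.get("items", []):
        allocatable = item.get("status", {}).get("allocatable", {})
        if resource in allocatable:
            # Kubernetes serializes extended resource quantities as strings.
            result[item["metadata"]["name"]] = int(allocatable[resource])
    return result


# Hypothetical sample, shaped like `kubectl get nodes -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"cpu": "96", "nvidia.com/gpu": "8"}}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "16"}}},
    ]
})

if __name__ == "__main__":
    capacity = gpu_capacity(SAMPLE)
    print(capacity)
    print("total GPUs:", sum(capacity.values()))
```

In a real cluster the input would come from piping the kubectl output into the script instead of using the embedded sample.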

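Installing the NVIDIA GPU Operator (Recommended Approach)

For production clusters, the GPU Operator maintained by NVIDIA automatically manages all required components: drivers, the device plugin, the container runtime, and DCGM Exporter for monitoring: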

# Install the GPU Operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --version v24.9.0 \
  --set driver.enabled=true \
  --set mig.strategy=single \
  --set dcgmExporter.enabled=true \
  --set dcgmExporter.serviceMonitor.enabled=true

# Verify the installation
kubectl get pods -n gpu-operator
# Wait until all device plugin pods are Ready
kubectl wait --for=condition=ready pod -l app=nvidia-device-plugin-daemonset -n gpu-operator --timeout=300s

# Verify allocatable GPUs
kubectl describe node gpu-node-1 | grep -A 5 "Allocatable:"
# nvidia.com/gpu: 8

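Scheduling Pods with GPUs

Once the device plugin is active, you can request GPUs in a manifest like any other Kubernetes resource. The difference: GPUs are specified under limits only, not under requests (if a GPU request is given, it must equal the limit). Kubernetes always allocates exactly the number of whole GPUs requested, and GPUs are never overcommitted.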

# pod-gpu-basic.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: inference
      image: nvcr.io/nvidia/pytorch:24.01-py3
      command: ["python3", "-c"]
      args:
        - |
          import torch
          print(f"CUDA available: {torch.cuda.is_available()}")
          print(f"GPU count: {torch.cuda.device_count()}")
          print(f"GPU name: {torch.cuda.get_device_name(0)}")
          x = torch.rand(1000, 1000).cuda()
          print(f"Tensor on GPU: {x.device}")
      resources:
        limits:
          nvidia.com/gpu: "1"    # request 1 GPU
          memory: "16Gi"
          cpu: "4"
        requests:
          memory: "16Gi"
          cpu: "4"
      volumeMounts:
        - name: model-storage
          mountPath: /models
  volumes:
    - name: model-storage
      persistentVolumeClaim:
        claimName: model-pvc
  nodeSelector:
    accelerator: "nvidia-a100"  # schedule only on A100 nodes
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
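
The limits-only rule can be checked before a manifest reaches the cluster. The following is a minimal sketch of such a check, assuming the container's `resources` section has already been loaded into a dictionary; the function name `validate_gpu_resources` is hypothetical.

```python
def validate_gpu_resources(resources: dict, resource: str = "nvidia.com/gpu") -> list[str]:
    """Return a list of problems found in a container's GPU resource section."""
    problems = []
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})

    if resource not in limits:
        # A GPU request without a matching limit is rejected by Kubernetes.
        if resource in requests:
            problems.append("GPU set in requests but not in limits")
        return problems

    limit = str(limits[resource])
    if not limit.isdigit():
        # GPUs are whole devices: fractional values such as 0.5 are invalid.
        problems.append(f"GPU limit must be a whole number, got {limit!r}")
    if resource in requests and str(requests[resource]) != limit:
        problems.append("GPU request must equal the GPU limit")
    return problems


if __name__ == "__main__":
    # Mirrors the resources section of the pod manifest above.
    ok = {"limits": {"nvidia.com/gpu": "1", "memory": "16Gi", "cpu": "4"},
          "requests": {"memory": "16Gi", "cpu": "4"}}
    print(validate_gpu_resources(ok))
    print(validate_gpu_resources({"limits": {"nvidia.com/gpu": "0.5"}}))
```

A check like this could run in CI next to other manifest linting, so that an invalid GPU specification fails early rather than at admission time.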

Distributed Training with the Kubeflow Training Operator

Training large models often requires multiple GPUs across multiple nodes. The Kubeflow Training Operator manages distributed training jobs: PyTorchJob, TFJob, MXJob, and MPIJob. First, install the operator:

# Install the Training Operator
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.8.0"

# Verify
kubectl get pods -n kubeflow
kubectl get crd | grep kubeflow

PyTorchJob for Multi-GPU, Multi-Node Training

# pytorch-distributed-training.yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llm-finetuning-job
  namespace: ml-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: company.registry.io/training:llm-v2.1
              command:
                - python3
                - -m
                - torch.distributed.run
                - --nproc_per_node=8
                - --nnodes=4
                - --node_rank=$(RANK)
                - --master_addr=$(MASTER_ADDR)
                - --master_port=23456
                - train_llm.py
                - --model=llama-7b
                - --dataset=/data/training_set
                - --batch-size=32
                - --epochs=3
                - --output=/models/finetuned
              env:
                - name: NCCL_DEBUG
                  value: "INFO"
                - name: NCCL_SOCKET_IFNAME
                  value: "eth0"
              resources:
                limits:
                  nvidia.com/gpu: "8"
                  memory: "120Gi"
                  cpu: "32"
                requests:
                  memory: "120Gi"
                  cpu: "32"
              volumeMounts:
                - name: training-data
                  mountPath: /data
                - name: model-output
                  mountPath: /models
                - name: shm
                  mountPath: /dev/shm
          volumes:
            - name: training-data
              persistentVolumeClaim:
                claimName: training-dataset-pvc
            - name: model-output
              persistentVolumeClaim:
                claimName: model-output-pvc
            - name: shm
              emptyDir:
                medium: Memory
                sizeLimit: "64Gi"  # shared memory for NCCL
          nodeSelector:
            accelerator: "nvidia-a100-80gb"
          tolerations:
            - key: "nvidia.com/gpu"
              operator: "Exists"
              effect: "NoSchedule"
    Worker:
      replicas: 3  # 3 workers + 1 master = 4 nodes, 32 GPUs total
      restartPolicy: OnFailure
      template:
        spec: # same spec as the Master...
          containers:
            - name: pytorch
              image: company.registry.io/training:llm-v2.1
              resources:
                limits:
                  nvidia.com/gpu: "8"
                  memory: "120Gi"
                  cpu: "32"
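
The numbers in the manifest (`--nnodes=4`, `--nproc_per_node=8`) determine how many processes take part in training and which rank each one gets. The following is a small sketch of that arithmetic. It assumes the static layout in which ranks are assigned node by node, and it assumes that `--batch-size` in the hypothetical `train_llm.py` is a per-process value; neither assumption is confirmed by the manifest itself.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes (one per GPU)."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Global rank of a process, with ranks assigned node by node."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


def global_batch_size(per_process_batch: int, world: int) -> int:
    """Effective batch size, ASSUMING the script's batch size is per process."""
    return per_process_batch * world


if __name__ == "__main__":
    nnodes, nproc = 4, 8  # values taken from the manifest above
    world = world_size(nnodes, nproc)
    print("world size:", world)
    print("last rank:", global_rank(node_rank=3, local_rank=7, nproc_per_node=nproc))
    print("global batch:", global_batch_size(32, world))
```

With 4 nodes and 8 processes per node the world size is 32, and the last process on the last node has global rank 31.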

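Karpenter for Spot GPU Node Provisioning

GPUs are the most expensive resource in the cloud. Spot GPU instances cost 60-70% less than on-demand ones. Karpenter manages automatic provisioning of Spot GPU nodes and falls back to on-demand capacity when Spot is unavailable or interrupted: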

# karpenter-gpu-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-spot
spec:
  template:
    metadata:
      labels:
        role: gpu-worker
        accelerator: nvidia
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws   # the karpenter.sh/v1 API uses group, not apiVersion
        kind: EC2NodeClass
        name: gpu-nodeclass
      requirements:
        # AWS GPU instance types
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - p4d.24xlarge     # 8x A100 40GB
            - p3.8xlarge       # 4x V100
            - g5.12xlarge      # 4x A10G
            - g4dn.12xlarge    # 4x T4
        # Prefer Spot
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand  # fallback
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
  limits:
    nvidia.com/gpu: 256   # at most 256 GPUs provisioned by this NodePool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30m  # reclaim GPU nodes 30 minutes after they become empty or underutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu-nodeclass
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest   # the v1 EC2NodeClass API requires amiSelectorTerms
  role: KarpenterNodeRole
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  instanceStorePolicy: RAID0  # use the local NVMe disks for ephemeral storage
  userData: |
    #!/bin/bash
    # At first boot: join the cluster (driver installation is not shown here)
    /etc/eks/bootstrap.sh my-cluster
    nvidia-smi  # verify that the GPUs are visible
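
How a 60-70% Spot discount translates into overall savings depends on how much of the job actually runs on Spot. The following is a back-of-the-envelope sketch; the hourly price and the 80% Spot share are hypothetical inputs, not figures from any price list.

```python
def blended_hourly_cost(on_demand_price: float, spot_discount: float,
                        spot_fraction: float) -> float:
    """Average hourly cost when only part of the node-hours run on Spot."""
    if not (0.0 <= spot_discount <= 1.0 and 0.0 <= spot_fraction <= 1.0):
        raise ValueError("spot_discount and spot_fraction must be in [0, 1]")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return spot_fraction * spot_price + (1.0 - spot_fraction) * on_demand_price


def savings(on_demand_price: float, spot_discount: float, spot_fraction: float) -> float:
    """Fraction saved compared with running everything on-demand."""
    blended = blended_hourly_cost(on_demand_price, spot_discount, spot_fraction)
    return 1.0 - blended / on_demand_price


if __name__ == "__main__":
    price = 32.0  # hypothetical on-demand price per node-hour
    # 65% Spot discount, 80% of node-hours on Spot, 20% on the on-demand fallback.
    print(f"blended cost: {blended_hourly_cost(price, 0.65, 0.8):.2f} per hour")
    print(f"savings: {savings(price, 0.65, 0.8):.0%}")
```

The savings reduce to the product of the Spot discount and the Spot share, which is why a job that frequently falls back to on-demand lands nearer the low end of the 40-70% range.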

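GPU Time-Slicing: Sharing a GPU Among Multiple Pods

For light inference or development workloads, a full GPU is often wasted. GPU time-slicing lets multiple pods share one physical GPU: each pod sees a "virtual GPU" that receives a fraction of the compute time.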

# gpu-time-slicing-config.yaml
# Configure the Device Plugin for time-slicing
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU becomes 4 logical "GPUs"
---
# Apply the config to the operator
# (cluster-policy is the default ClusterPolicy name for a Helm install; adjust if yours differs)
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator \
  --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'

# Verify: a node with 1 A100 GPU now shows 4 allocatable GPUs
kubectl describe node gpu-node-1 | grep nvidia.com/gpu
# Allocatable:
#   nvidia.com/gpu:  4

# Pod that uses 1/4 of a GPU
apiVersion: v1
kind: Pod
metadata:
  name: inference-small
spec:
  containers:
    - name: model-server
      image: company.registry.io/inference:v1
      resources:
        limits:
          nvidia.com/gpu: "1"  # gets 1/4 of the physical GPU
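
Time-slicing multiplies the GPU count that a node advertises but divides nothing else: memory is still shared by every pod on the device. The following sketch makes that arithmetic explicit. The function names are hypothetical, and the 81920 MiB figure is a nominal value for an 80 GB card used only for illustration.

```python
def advertised_gpus(physical_gpus: int, replicas: int) -> int:
    """GPUs the node reports as allocatable once time-slicing is enabled."""
    return physical_gpus * replicas


def memory_budget_mib(gpu_memory_mib: int, replicas: int) -> int:
    """Per-pod memory budget. This is a team convention only:
    time-slicing does not enforce any memory limit between pods."""
    return gpu_memory_mib // replicas


def fits_in_memory(pod_memory_mib: list[int], gpu_memory_mib: int) -> bool:
    """True if the pods sharing one GPU stay within its total memory."""
    return sum(pod_memory_mib) <= gpu_memory_mib


if __name__ == "__main__":
    total_mib = 81920  # nominal 80 GB card, illustrative value
    print("allocatable:", advertised_gpus(physical_gpus=1, replicas=4))
    print("budget per pod (MiB):", memory_budget_mib(total_mib, replicas=4))
    # Four pods, one of which loads a much larger model than its share.
    print("fits:", fits_in_memory([12000, 12000, 12000, 60000], total_mib))
```

The last call returns False: four pods can each hold one "GPU" and still run out of memory together, which is the failure mode that MIG avoids.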

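MIG: Multi-Instance GPU for A100 and H100

NVIDIA A100 and H100 GPUs support MIG (Multi-Instance GPU), which partitions a GPU into isolated hardware instances rather than merely sharing time on it. Each MIG instance has guaranteed memory and compute and does not interfere with the others.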

# Configure MIG on the node (run on the GPU node itself, not through kubectl)
# Requires: NVIDIA driver >= 525, A100 or H100 GPU

# Enable MIG mode on the GPU
sudo nvidia-smi -mig 1

# Create 7 MIG instances, each 1/7 of an A100 80GB (1g.10gb)
# Profile 19 = MIG 1g.10gb; -C also creates the matching compute instances
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C

# Verify the created instances
sudo nvidia-smi mig -lgi
# +-------------------------------------------------------+
# | GPU instances:                                         |
# | GPU   Name             Profile  Instance   Placement  |
# |                        ID       ID         Start:Size |
# |=======================================================|
# |   0  MIG 1g.10gb       19       1          0:1        |
# |   0  MIG 1g.10gb       19       2          1:1        |
# |   0  MIG 1g.10gb       19       3          2:1        |
# ... (7 instances total)

# In the Kubernetes cluster, set the MIG strategy in the GPU Operator
# (cluster-policy is the default ClusterPolicy name for a Helm install; adjust if yours differs)
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator \
  --type json \
  -p '[{"op":"replace","path":"/spec/mig/strategy","value":"mixed"}]'

# Pod that requests a specific MIG instance
apiVersion: v1
kind: Pod
metadata:
  name: inference-mig
spec:
  containers:
    - name: model
      image: nvcr.io/nvidia/pytorch:24.01-py3
      resources:
        limits:
          nvidia.com/mig-1g.10gb: "1"  # request 1 MIG 1g.10gb instance
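
When planning which MIG profiles to mix on one card, a quick capacity check helps. The following sketch counts the compute and memory slices of the A100 80GB profiles. It is a necessary condition only: real MIG placement rules are stricter than a simple sum, so a combination that passes here may still be rejected by the driver. The function name `fits_on_one_gpu` is hypothetical.

```python
# (compute slices, memory slices) per MIG profile on an A100 80GB.
A100_80GB_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_on_one_gpu(requested: list[str]) -> bool:
    """Check that the requested profiles do not exceed the card's slices.

    This ignores placement constraints, so True means "possibly valid",
    while False means "definitely does not fit".
    """
    compute = sum(A100_80GB_PROFILES[name][0] for name in requested)
    memory = sum(A100_80GB_PROFILES[name][1] for name in requested)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


if __name__ == "__main__":
    print(fits_on_one_gpu(["1g.10gb"] * 7))           # the layout created above
    print(fits_on_one_gpu(["1g.10gb"] * 8))           # one instance too many
    print(fits_on_one_gpu(["3g.40gb", "3g.40gb", "1g.10gb"]))  # memory exceeded
```

The third case shows why memory slices matter: the compute slices add up to 7, but the two 3g.40gb instances already consume all 8 memory slices.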

GPU Monitoring with DCGM Exporter

GPU monitoring is essential both for understanding whether training jobs are efficient and for FinOps. DCGM Exporter exposes GPU metrics to Prometheus:

# DCGM Exporter is installed automatically with the GPU Operator
# Verify that the metrics are available
# (the Service is typically named nvidia-dcgm-exporter; confirm with: kubectl get svc -n gpu-operator)
kubectl port-forward svc/nvidia-dcgm-exporter 9400:9400 -n gpu-operator &
curl -s localhost:9400/metrics | grep DCGM_FI

# Key metrics to monitor:
# DCGM_FI_DEV_GPU_UTIL        - GPU utilization (0-100%)
# DCGM_FI_DEV_MEM_COPY_UTIL   - GPU memory bandwidth utilization (0-100%)
# DCGM_FI_DEV_FB_USED         - GPU memory used (MiB)
# DCGM_FI_DEV_POWER_USAGE     - power draw (W)
# DCGM_FI_DEV_SM_CLOCK        - streaming multiprocessor clock (MHz)
# DCGM_FI_DEV_GPU_TEMP        - GPU temperature (°C)

# Alert: underutilized GPU (< 50% for 30 minutes = waste)
- alert: GPUUnderutilized
  expr: DCGM_FI_DEV_GPU_UTIL < 50
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "GPU {{ $labels.gpu }} on node {{ $labels.Hostname }}: utilization < 50%"
    description: "Consider whether the job can be terminated or optimized"

# Grafana dashboard: import ID 12239 (NVIDIA DCGM Exporter Dashboard)
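
For a quick check without Prometheus, the text returned by the `/metrics` endpoint can be parsed directly. The following is a minimal sketch that flags GPUs below a utilization threshold. The sample text is hypothetical and much shorter than real exporter output, and the check looks at a single snapshot rather than the 30-minute window that the alert rule uses.

```python
import re

# One sample line looks like: DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="n1"} 91
LINE = re.compile(r'^DCGM_FI_DEV_GPU_UTIL\{(?P<labels>[^}]*)\}\s+(?P<value>[0-9.eE+-]+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def underutilized(metrics_text: str, threshold: float = 50.0) -> list[tuple[str, str, float]]:
    """Return (hostname, gpu index, utilization) for GPUs below the threshold."""
    hits = []
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comment lines and other metrics are skipped
        labels = dict(LABEL.findall(match.group("labels")))
        value = float(match.group("value"))
        if value < threshold:
            hits.append((labels.get("Hostname", "?"), labels.get("gpu", "?"), value))
    return hits


# Hypothetical snapshot in the Prometheus text exposition format.
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="gpu-node-1"} 91
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="gpu-node-1"} 12
DCGM_FI_DEV_FB_USED{gpu="1",Hostname="gpu-node-1"} 2048
'''

if __name__ == "__main__":
    for host, gpu, util in underutilized(SAMPLE):
        print(f"GPU {gpu} on {host}: {util:.0f}% utilization")
```

Only GPU 1 is reported for the sample, since GPU 0 is busy and the `DCGM_FI_DEV_FB_USED` line belongs to a different metric.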

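Deploying Models for Inference with TorchServe

Production inference requires a model server that handles load balancing across multiple model replicas, request batching, and versioning. TorchServe is the official PyTorch solution: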

# torchserve-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
  namespace: ml-inference
spec:
  replicas: 3   # 3 replicas for high availability
  selector:
    matchLabels:
      app: model-inference
  template:
    metadata:
      labels:
        app: model-inference
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8082"   # TorchServe metrics port
    spec:
      containers:
        - name: torchserve
          image: pytorch/torchserve:0.11.0-gpu
          args:
            - torchserve
            - --start
            - --model-store=/models
            - --models=text-classifier=bert-classifier.mar
            - --ts-config=/config/config.properties
          ports:
            - containerPort: 8080  # inference API
            - containerPort: 8081  # management API
            - containerPort: 8082  # metrics
          resources:
            limits:
              nvidia.com/gpu: "1"
              memory: "16Gi"
              cpu: "4"
            requests:
              memory: "8Gi"
              cpu: "2"
          readinessProbe:
            httpGet:
              path: /ping
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 30
          volumeMounts:
            - name: model-store
              mountPath: /models
            - name: ts-config
              mountPath: /config
      volumes:
        - name: model-store
          persistentVolumeClaim:
            claimName: model-store-pvc
        - name: ts-config
          configMap:
            name: torchserve-config
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: torchserve-config
  namespace: ml-inference
data:
  config.properties: |
    inference_address=http://0.0.0.0:8080
    management_address=http://0.0.0.0:8081
    metrics_address=http://0.0.0.0:8082
    number_of_gpu=1
    batch_size=32
    # max_batch_delay is in ms: wait up to 100 ms to fill a batch
    # (kept on its own line because .properties files have no inline comments)
    max_batch_delay=100
    max_response_size=6553500
    install_py_dep_per_model=true
---
# Latency-based autoscaling with KEDA (event-driven autoscaling)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler
  namespace: ml-inference
spec:
  scaleTargetRef:
    name: model-inference
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: ts_queue_latency_microseconds
        threshold: "100000"  # 100 ms of queueing = scale up
        query: avg(ts_queue_latency_microseconds{model_name="text-classifier"})
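
To reason about how the ScaledObject reacts, it helps to model the replica calculation. The following is a simplified sketch, assuming the query result is compared with the threshold as an average-value target, so that the desired count is roughly the metric divided by the threshold, rounded up and clamped to the configured bounds. It ignores tolerance bands, stabilization windows, and cooldown periods, so real behavior will be smoother than this.

```python
import math


def desired_replicas(metric_value: float, threshold: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified replica target: ceil(metric / threshold), clamped to bounds."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    raw = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Bounds and threshold taken from the ScaledObject above.
    for latency_us in (50_000, 250_000, 5_000_000):
        replicas = desired_replicas(latency_us, 100_000, 1, 10)
        print(f"queue latency {latency_us} us -> {replicas} replicas")
```

A queue latency of 250,000 microseconds against the 100,000 threshold yields 3 replicas, while an extreme value of 5,000,000 is capped at the maximum of 10.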

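Best Practices for AI Workloads on Kubernetes

GPU Cost Optimization

Spot for training, on-demand for inference: training can absorb interruptions through checkpoints, while inference must always be available
Checkpoint frequently: save a checkpoint every 30 minutes so that training can resume after a Spot interruption
Time-slicing for development: use time-sliced GPUs for developers, and MIG or full GPUs for production
Scale to zero: Karpenter removes Spot GPU nodes when training finishes, so you do not pay for idle GPUs
Batching in inference: TorchServe with batch_size=32 increases throughput 10-20x compared with single requests
Profile before deploying: use NVIDIA Nsight to profile training jobs and identify inefficiencies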
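
The checkpointing advice can be sketched without any ML framework. The following is a minimal, framework-free illustration of the pattern: resume from the last checkpoint if one exists, save on a time interval, and write atomically so that an interruption never leaves a half-written file. The JSON state and the counter that stands in for a training step are placeholders; a real job would save model and optimizer state instead.

```python
import json
import os
import tempfile
import time

CHECKPOINT_INTERVAL_S = 30 * 60  # the 30-minute cadence suggested above


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so the checkpoint is never partial."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic when both paths share a filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as handle:
        return json.load(handle)


def train(path: str, total_steps: int,
          interval_s: float = CHECKPOINT_INTERVAL_S) -> dict:
    """Run (or resume) training up to total_steps, checkpointing on a timer."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one real training step
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)  # always persist the final state
    return state


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    print(train(checkpoint, total_steps=5, interval_s=0))
    print(train(checkpoint, total_steps=8, interval_s=0))  # resumes from step 5
```

On Kubernetes the checkpoint path would sit on a persistent volume such as the model output volume, so that a replacement pod on a new node can pick up where the interrupted one stopped.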

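Common GPU Mistakes on Kubernetes

Pods without a taint toleration: GPU nodes carry the taint nvidia.com/gpu=true:NoSchedule; without a matching toleration, the Pod is never scheduled onto a GPU node
No memory isolation: GPUs do not isolate memory between containers the way CPUs do. If you allocate 1 GPU but the model needs more memory than is available, the job crashes with a CUDA OOM error
NCCL without shared memory: PyTorch distributed training uses NCCL, which needs a large /dev/shm (typically 10-60 GB). Always configure an emptyDir with medium: Memory
Not monitoring GPU utilization: a GPU at 20% utilization is an enormous waste. The DCGM dashboard is the first place to look after every deployment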
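
The first mistake in this list is easy to catch mechanically. The following sketch mirrors the basic Kubernetes rules for matching a toleration against a taint. It is a simplified model covering the `Exists` and `Equal` operators only, and the function names are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Check one toleration against one taint using the basic matching rules."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # An empty key combined with Exists matches every taint.
        return not key or key == taint.get("key")
    return (key == taint.get("key")
            and toleration.get("value", "") == taint.get("value", ""))


def pod_can_schedule(tolerations: list[dict], taints: list[dict]) -> bool:
    """True if every scheduling-blocking taint is matched by some toleration."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


if __name__ == "__main__":
    gpu_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}
    # The toleration used in the manifests throughout this article.
    gpu_toleration = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
    print(pod_can_schedule([gpu_toleration], [gpu_taint]))
    print(pod_can_schedule([], [gpu_taint]))
```

The first call prints True and the second prints False, which corresponds to a Pod that stays Pending because nothing tolerates the GPU taint.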

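Conclusions and Next Steps

It is no accident that Kubernetes has become the standard platform for AI/ML workloads. Its resource abstraction, advanced scheduling system, and operator ecosystem (Kubeflow, Training Operator) make it an ideal environment for orchestrating both training and inference at scale. With Karpenter you can automatically manage the provisioning of Spot GPU nodes and cut training job costs by 40-70% compared with on-demand instances.

The next step is to integrate these workloads into a complete MLOps pipeline: model tracking with MLflow, dataset management with DVC, and CI/CD for automatic retraining. The FinOps for Kubernetes article (part 9 of this series) explains in detail how to measure and optimize the total cost of GPU workloads in the cluster.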