03 - Agentic Workflows: Decomposing Problems for AI
There is a precise moment when AI stops being an assistant and becomes a collaborator: when you stop asking it to "write this function" and start asking it to "solve this problem". That mental shift - from issuing a single instruction to delegating a complex task - is at the heart of agentic workflows. It is also the point where most developers get stuck, because building an effective agentic workflow is not about writing longer prompts: it is about architecture.
In 2025, 92% of American developers use AI tools in their daily work (Stack Overflow Developer Survey 2025), but only a fraction of them truly leverage the potential of agentic systems. The problem is not access to technology: it is the lack of clear architectural patterns for decomposing complex problems into tasks that an AI agent can solve reliably, verifiably and repeatably.
In this article, we will build together a deep understanding of agentic workflows: from fundamental patterns (Sequential, Parallel, Hierarchical, Iterative) to practical implementation with Claude Code, including context management, quality metrics and the anti-patterns that turn a promising workflow into an unreliable system. All with real case studies and working code.
What You Will Learn
- The four fundamental decomposition patterns: Sequential, Parallel, Hierarchical, Iterative
- Agentic workflow architecture: Planner, Executor, Reviewer, Memory
- How to structure CLAUDE.md to guide agents on real projects
- The Plan-Execute-Review loop and self-healing workflows
- Tool use and context management for long agentic sessions
- Metrics to evaluate agentic workflow quality
- The most dangerous anti-patterns and how to avoid them
- Case study: Angular codebase refactoring with an agentic workflow
What Is an Agentic Workflow
An agentic workflow is a structured sequence of operations where one or more AI agents plan, execute, and verify actions to achieve a complex objective. Unlike a single LLM call, an agentic workflow has memory across steps, can use tools (file system, terminal, web, APIs), can delegate subtasks to specialized agents, and can adapt to errors without human intervention.
The key distinction from "naive" vibe coding lies in deliberate structure. Asking Claude to "refactor this codebase" produces mediocre results. Building a workflow that: (1) analyzes the codebase, (2) identifies components to refactor in priority order, (3) refactors one component at a time with regression tests, (4) verifies each step before proceeding, produces professional results. The difference is decomposition.
Operational Definition
An agentic workflow is reliable when: every step is independently verifiable, failure of one step does not corrupt the entire workflow, the final output is deterministic with respect to the input, and a human operator can inspect and correct the workflow at any checkpoint. This definition comes from the Anthropic "Building effective agents" framework (2024) and remains the compass for evaluating any implementation.
Problem Decomposition: The Core of Agentic Workflows
The TDAG research (Task Decomposition and Agent Generation, 2025) demonstrated that decomposition quality is the most predictive factor of multi-agent system success: wrong decomposition propagates errors exponentially through subsequent steps, while correct decomposition isolates failures and enables recovery.
There are four fundamental decomposition patterns, each suited to different types of problems:
Pattern 1: Sequential (Chain)
The simplest pattern: each task depends on the output of the previous one. Suited to linear workflows where order is semantically significant (e.g., analysis -> design -> implementation -> test).
Sequential Pattern:
[Input]
|
v
[Task A] --> output_A
|
v
[Task B] --> output_B
|
v
[Task C] --> [Final Output]
Characteristics:
- Each task receives previous output as context
- Task B failure blocks Task C
- Simple debugging: error localized to failed task
- Latency: sum of all times (A + B + C)
Typical use case: Code generation pipeline
1. Requirements analysis
2. Architectural design
3. Implementation
4. Test generation
5. Documentation
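The chain above can be sketched in a few lines of TypeScript. This is a minimal illustration, not a real agent framework: each `Step` stands in for an agent call, and the pipeline stubs are placeholders.

```typescript
// Minimal sketch of the Sequential (Chain) pattern.
// Each step receives the previous step's output as input;
// a failure stops the chain at the failed step.
type Step<I, O> = { name: string; run: (input: I) => O };

function runChain(input: string, steps: Step<string, string>[]): string {
  let current = input;
  for (const step of steps) {
    try {
      current = step.run(current); // output of A becomes input of B
    } catch (err) {
      // Failure is localized: we know exactly which step broke.
      throw new Error(`Step "${step.name}" failed: ${err}`);
    }
  }
  return current;
}

// Illustrative pipeline: analysis -> design -> implementation (stubbed).
const pipeline: Step<string, string>[] = [
  { name: "analyze", run: (req) => `requirements(${req})` },
  { name: "design", run: (a) => `design(${a})` },
  { name: "implement", run: (d) => `code(${d})` },
];

const result = runChain("login feature", pipeline);
// result === "code(design(requirements(login feature)))"
```

Note how the try/catch makes the "simple debugging" property concrete: the error message names the exact step that failed.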
Pattern 2: Parallel (Scatter-Gather)
Multiple independent tasks are executed simultaneously by separate agents, with an aggregator collecting and synthesizing results. Dramatically reduces latency but requires subtasks to be truly independent (no shared mutable state).
Parallel Pattern (Scatter-Gather):
[Input]
|
[Orchestrator/Splitter]
/ | \
v v v
[Task A] [Task B] [Task C]
out_A out_B out_C
\ | /
v v v
[Aggregator/Merger]
|
[Final Output]
Characteristics:
- Tasks A, B, C run in parallel (via sub-agents)
- Latency: max(A, B, C) instead of A + B + C
- Partial failure: manageable if aggregator is robust
- Risk: race conditions on shared resources
Typical use case: Multi-dimensional review
Agent 1: Security review
Agent 2: Performance analysis
Agent 3: Code style check
Agent 4: Test coverage analysis
Aggregator: Synthesize findings
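The scatter-gather flow maps naturally onto `Promise.allSettled`. The sketch below uses stubbed review tasks (the agent names and findings are illustrative, not real sub-agent calls); the key design point is that the aggregator tolerates partial failure.

```typescript
// Minimal sketch of the Parallel (Scatter-Gather) pattern.
// Independent review tasks run concurrently; an aggregator merges results.
type Finding = { agent: string; issues: string[] };

async function scatterGather(
  tasks: Array<() => Promise<Finding>>
): Promise<string> {
  // Scatter: allSettled tolerates partial failure, which keeps the
  // aggregator robust (a key requirement of this pattern).
  const settled = await Promise.allSettled(tasks.map((t) => t()));
  const ok = settled.filter(
    (s): s is PromiseFulfilledResult<Finding> => s.status === "fulfilled"
  );
  // Gather: synthesize only the findings that succeeded.
  return ok
    .map((s) => `${s.value.agent}: ${s.value.issues.length} issue(s)`)
    .join("; ");
}

// Stubbed review agents standing in for real sub-agent calls.
const reviewTasks = [
  async () => ({ agent: "security", issues: ["token in localStorage"] }),
  async () => ({ agent: "performance", issues: [] as string[] }),
  async () => ({ agent: "style", issues: ["inconsistent naming"] }),
];
```

Latency here is max(A, B, C) rather than the sum, exactly as in the diagram above.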
Pattern 3: Hierarchical (Supervisor-Worker)
A supervisor agent decomposes the problem into subtasks and delegates to specialized workers. Workers can in turn have sub-workers. It is the most powerful pattern for large-scale problems but also the most complex to debug. LangGraph documented this pattern as the most adopted in 2025 for enterprise systems.
Hierarchical Pattern:
[Planner Agent]
"Refactor auth module"
/ | \
v v v
[Backend Agent] [Frontend] [Test Agent]
"Refactor "Update "Update
AuthService" LoginCmp" test suite"
| | |
[sub-tasks] [sub-tasks] [sub-tasks]
| | |
done done done
\ | /
v v v
[Planner: Merge & Verify]
|
[Final Output]
Typical levels in real systems:
L0: Problem Planner (decomposes global goal)
L1: Domain Agents (backend, frontend, infra)
L2: Task Workers (individual files, functions)
L3: Tool Calls (bash, file system, tests)
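The supervisor-worker relationship can be sketched as follows. The decomposition is hardcoded for illustration (a real L0 planner would derive it from the goal), and the workers are stubs:

```typescript
// Minimal sketch of the Hierarchical (Supervisor-Worker) pattern.
// A supervisor decomposes the goal, delegates to named workers,
// then merges and verifies the results.
type Worker = (subtask: string) => string;

const workers: Record<string, Worker> = {
  backend: (t) => `backend done: ${t}`,
  frontend: (t) => `frontend done: ${t}`,
  test: (t) => `tests green: ${t}`,
};

function supervise(goal: string): string {
  // L0 Planner: static decomposition for the sketch.
  const plan: Array<{ worker: keyof typeof workers; subtask: string }> = [
    { worker: "backend", subtask: "refactor AuthService" },
    { worker: "frontend", subtask: "update LoginComponent" },
    { worker: "test", subtask: "update test suite" },
  ];
  // L1/L2: delegate each subtask to its specialized worker.
  const results = plan.map((p) => workers[p.worker](p.subtask));
  // Merge & Verify: the supervisor checks every worker reported completion.
  if (results.some((r) => !r.includes("done") && !r.includes("green"))) {
    throw new Error(`Goal "${goal}" incomplete`);
  }
  return results.join(" | ");
}
```

The verify step in the supervisor is what distinguishes this pattern from blind delegation: no worker output reaches the final merge unchecked.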
Pattern 4: Iterative (ReAct / Reflexion)
The agent operates in a loop: executes an action, observes the result, reflects on the current state, and decides the next step. This is the pattern of the ReAct framework (Reasoning + Acting) and its extension Reflexion (which adds an explicit critique). Suited to exploratory problems where the solution path is not known in advance.
Iterative Pattern (ReAct + Reflexion):
[Goal]
|
v
[Think] <-----------+
| |
v |
[Act / Tool Use] | (if max_iter not
| | reached and goal
v | not satisfied)
[Observe] |
| |
v |
[Critique/Reflect]--+
|
(if goal satisfied)
|
v
[Output]
Key elements:
- Scratchpad: memory of previous steps
- Stop condition: goal reached OR max_iterations
- Critique: explicit evaluation of partial output
- Tool repertoire: set of tools available to the agent
Use case: Debugging a failing test
1. Read error message
2. Analyze related code
3. Formulate hypothesis
4. Apply fix
5. Run test
6. If still failing: return to step 2
7. If passing: write explanation
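The loop structure itself fits in a few lines. In this sketch the "environment" is a stub that succeeds on the third attempt, standing in for a real act-and-observe cycle (running a test, reading its output):

```typescript
// Minimal sketch of the Iterative (ReAct/Reflexion) loop:
// think -> act -> observe -> critique, with a scratchpad and a stop condition.
type Observation = { goalSatisfied: boolean; detail: string };

function reactLoop(
  act: (step: number) => Observation,
  maxIter = 5
): { iterations: number; scratchpad: string[]; success: boolean } {
  const scratchpad: string[] = []; // memory of previous steps
  for (let i = 1; i <= maxIter; i++) {
    const obs = act(i); // Act + Observe
    scratchpad.push(`iter ${i}: ${obs.detail}`); // record for reflection
    if (obs.goalSatisfied) {
      return { iterations: i, scratchpad, success: true };
    }
    // A Critique/Reflect phase would analyze the scratchpad here
    // before deciding the next action.
  }
  return { iterations: maxIter, scratchpad, success: false };
}

// Stub environment: the failing test passes on the third attempt.
const outcome = reactLoop((step) =>
  step < 3
    ? { goalSatisfied: false, detail: "test still failing" }
    : { goalSatisfied: true, detail: "test passing" }
);
// outcome.success === true, outcome.iterations === 3
```

The two stop conditions from the diagram (goal satisfied OR max_iterations reached) are both explicit in the loop.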
Agentic Workflow Architecture
Regardless of the chosen pattern, a mature agentic workflow has four fundamental components. Understanding them is the prerequisite for building robust systems in production.
1. Planner
The Planner receives the high-level goal and transforms it into a structured plan: a sequence (or DAG) of subtasks with dependencies, agent assignments, success criteria for each step, and resource estimates (token budget, required tools). A good Planner produces verifiable plans: every step has a well-defined expected output.
2. Executor
The Executor takes individual tasks from the Planner and executes them, using available tools:
file system, bash, web search, APIs. Each specialized Executor (backend agent, test agent,
doc agent) has access only to the tools necessary for its domain, following the principle of
least privilege. Claude Code implements this through the permissions system
and custom sub-agents with configurable allowedTools.
3. Reviewer
The Reviewer verifies that each Executor's output satisfies the success criteria defined by the Planner. It is not a simple "looks good": a quality Reviewer runs automated tests, static analysis, regression checks. The Reviewer can approve (workflow proceeds), request modifications (Executor retries), or escalate to a human (mandatory checkpoint).
4. Memory
Memory manages context across workflow steps. It has two levels:
- Short-term (in-context): the content of the current context window, including outputs from previous steps. Limited by available tokens.
- Long-term (external): state files (e.g., claude-progress.txt), databases, git history. Enables resuming interrupted workflows across different sessions.
The claude-progress.txt Pattern
Anthropic recommends using a claude-progress.txt file in the project root for
inter-session memory. The initializer agent writes workflow state at each checkpoint; the
subsequent agent reads this file to understand where work stands and what to do next. Combined
with git log, it provides full context without saturating the context window.
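The pattern above can be sketched as a small checkpoint log. The JSON-lines format and helper names here are illustrative assumptions, not a Claude Code API: what matters is that each session appends its state and the next session reads the last entry.

```typescript
// Minimal sketch of the claude-progress.txt pattern: a checkpoint
// writer and reader for inter-session memory.
import { writeFileSync, readFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

type Checkpoint = { step: string; filesModified: string[]; testsPass: boolean };

function writeProgress(path: string, cp: Checkpoint): void {
  // Append-style log: one JSON line per checkpoint, so a later
  // session can replay the workflow history in order.
  writeFileSync(path, JSON.stringify(cp) + "\n", { flag: "a" });
}

function lastCheckpoint(path: string): Checkpoint {
  const lines = readFileSync(path, "utf8").trim().split("\n");
  return JSON.parse(lines[lines.length - 1]);
}

// Simulate two sessions sharing state through the file.
const dir = mkdtempSync(join(tmpdir(), "agent-"));
const progress = join(dir, "claude-progress.txt");
writeProgress(progress, { step: "analysis", filesModified: [], testsPass: true });
writeProgress(progress, {
  step: "refactor AuthService",
  filesModified: ["auth.service.ts"],
  testsPass: true,
});
const resume = lastCheckpoint(progress);
// resume.step === "refactor AuthService"
```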
Practical Implementation with Claude Code
Claude Code offers three main levers for building agentic workflows: the
CLAUDE.md file for guiding agent behavior, the Task tool
for delegating to sub-agents, and custom agents (defined in
.claude/agents/) for specialization. Let us see how to combine them.
CLAUDE.md Structure for Agentic Workflows
The CLAUDE.md file is your project's "constitution" for AI agents. A CLAUDE.md designed for agentic workflows includes not only project information, but the workflow structure itself: which agents exist, how they coordinate, what the success criteria are for each phase.
# CLAUDE.md - Agentic Workflow for Angular Project
## Project
Portfolio Angular with SSR, Angular 21, TypeScript strict.
Stack: Angular, Firebase, SCSS.
## Available Agentic Workflows
### Workflow: Feature Development
Use this workflow to implement new features:
**Phase 1 - Planning** (mandatory):
- Read all files related to the feature
- Create `docs/plans/[feature-name].md` with:
- Components to create/modify
- TypeScript interfaces needed
- Tests to write (TDD: write tests first)
- Dependencies and risks
- DO NOT implement until plan is approved
**Phase 2 - TDD Implementation**:
- Write unit tests BEFORE implementation
- Implement the minimum needed to pass tests
- Then refactor
- Verify `npm test` passes without errors
**Phase 3 - Review**:
- Run `npm run lint`
- Verify SSR build compiles: `npm run build`
- Check no regressions
### Workflow: Refactoring
For refactoring existing components:
1. Create branch: `git checkout -b refactor/[name]`
2. Analyze component dependencies with grep/Glob
3. Refactor ONE component at a time
4. Verify tests after each component
5. Atomic commit for each component
### Workflow: Debug
For bug fixing:
1. Reproduce the bug with a failing test
2. Identify root cause (do NOT fix symptoms)
3. Apply minimal fix
4. Verify that the test now passes
5. Check for regressions
## Specialized Agents
Available in `.claude/agents/`:
- `architect.md`: For architectural decisions
- `security-reviewer.md`: Before every commit
- `code-reviewer.md`: After every implementation
- `tdd-guide.md`: For TDD workflow
## Global Success Criteria
- TypeScript strict: zero errors `tsc --noEmit`
- Test coverage: minimum 80%
- SSR Build: `npm run build` must complete without errors
- No implicit `any`
- No state mutation (immutability pattern)
## Error Handling
If a command fails:
1. Read the complete error
2. DO NOT proceed to the next step
3. Identify and resolve before continuing
4. If unable after 2 attempts: STOP and ask for clarification
Prompt Engineering for Task Decomposition
Decomposition quality depends directly on the quality of the initial prompt. Here is a tested template for guiding an agent through complex task decomposition:
# Prompt Template: Task Decomposition
## Context
You are a Planning Agent. Your task is to decompose the following goal
into concrete, verifiable subtasks assignable to specialized agents.
## Goal
[GOAL DESCRIPTION]
## Constraints
- Each subtask must have: ID, description, expected input, expected output,
responsible agent, success criteria (automatically verifiable)
- Subtasks must be ordered by dependencies (DAG)
- No subtask should take more than [X] minutes / [Y] tokens
- Define mandatory checkpoints where a human must approve
## Expected Output
Produce a structured plan in this JSON format:
{
"goal": "goal description",
"estimated_complexity": "low|medium|high",
"subtasks": [
{
"id": "T001",
"description": "Analyze AuthService component structure",
"agent": "analyzer",
"inputs": ["src/app/services/auth.service.ts"],
"outputs": ["docs/analysis/auth-service.md"],
"success_criteria": ["file created", "contains sections: deps, interfaces, methods"],
"depends_on": [],
"estimated_tokens": 8000
},
{
"id": "T002",
"description": "Write tests for AuthService",
"agent": "tdd-agent",
"inputs": ["docs/analysis/auth-service.md", "src/app/services/auth.service.ts"],
"outputs": ["src/app/services/auth.service.spec.ts"],
"success_criteria": ["npm test -- --testPathPattern=auth.service passes"],
"depends_on": ["T001"],
"estimated_tokens": 12000
}
],
"checkpoints": ["after T001: plan review", "after T003: implementation review"],
"rollback_strategy": "git stash before every destructive change"
}
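Before handing a plan like this to Executors, it is worth validating it mechanically. The sketch below (an assumption, not part of any framework) checks the two structural invariants the template demands: every `depends_on` references a known subtask, and the dependencies form a DAG.

```typescript
// Sketch of a validator for the plan format above, using Kahn's
// algorithm to detect cycles and produce a safe execution order.
type Subtask = { id: string; depends_on: string[] };

function validatePlan(subtasks: Subtask[]): string[] {
  const ids = new Set(subtasks.map((t) => t.id));
  for (const t of subtasks) {
    for (const dep of t.depends_on) {
      if (!ids.has(dep)) throw new Error(`${t.id} depends on unknown ${dep}`);
    }
  }
  // Kahn's algorithm: repeatedly schedule tasks whose deps are all done.
  const done = new Set<string>();
  const order: string[] = [];
  let remaining = [...subtasks];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.depends_on.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("cycle detected in plan");
    for (const t of ready) {
      done.add(t.id);
      order.push(t.id);
    }
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return order; // a valid execution order for the Executor
}

const order = validatePlan([
  { id: "T001", depends_on: [] },
  { id: "T002", depends_on: ["T001"] },
]);
// order: ["T001", "T002"]
```

Catching a malformed plan before execution is far cheaper than discovering mid-workflow that task T005 is waiting on a task that can never run.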
Using the Task Tool for Sub-Agents
Claude Code exposes the Task tool to delegate work to sub-agents. Each sub-agent operates in an isolated context with its own context window, which allows managing workflows that exceed the limits of a single session. Anthropic research (2026 Agentic Coding Trends Report) indicates the most effective pattern uses Opus for orchestration and Sonnet for workers, reducing costs by 40-60% while maintaining quality.
# Prompt for orchestrating sub-agents with Task tool
I need to refactor the authentication module of this Angular app.
I have analyzed the codebase and identified these independent parallel tasks:
**Task 1 - Security Review** (use Task tool):
Prompt: "Read src/app/services/auth.service.ts and all files that import it.
Analyze security vulnerabilities: token storage, session management,
CSRF protection. Produce a markdown report in docs/security/auth-review.md
with priorities: CRITICAL, HIGH, MEDIUM, LOW."
Tools: Read, Grep, Write
**Task 2 - Test Coverage Analysis** (use Task tool):
Prompt: "Analyze src/app/services/auth.service.spec.ts vs auth.service.ts.
Identify uncovered functions. Produce list in docs/analysis/test-gaps.md"
Tools: Read, Glob, Grep, Write
**Task 3 - Dependency Graph** (use Task tool):
Prompt: "Map all AuthService dependencies using grep and glob.
Create docs/analysis/auth-deps.md with ASCII dependency graph."
Tools: Read, Glob, Grep, Write
Run the 3 Tasks in parallel. When all complete, read the 3 reports
and produce docs/plans/auth-refactoring.md with the consolidated
refactoring plan, ordered by priority.
Advanced Workflow Patterns
Plan-Execute-Review Loop
The PER (Plan-Execute-Review) loop is the most robust pattern for complex workflows. Each iteration produces a verifiable artifact before proceeding to the next. The key is that the Review step is not optional: it is the mechanism that prevents error propagation.
Plan-Execute-Review Loop:
ITERATION 1:
Plan: "Analyze AuthService and create refactoring plan"
Execute: Agent reads code, writes docs/plans/auth.md
Review: Verify docs/plans/auth.md exists with required sections
-> PASS: proceed to iteration 2
-> FAIL: retry Execute (max 2 times), then escalate
ITERATION 2:
Plan: "Write tests for AuthService (based on plan)"
Execute: Agent writes auth.service.spec.ts
Review: `npm test -- --testPathPattern=auth` must pass
-> PASS: proceed to iteration 3
-> FAIL: agent debug + retry
ITERATION 3:
Plan: "Refactor AuthService (TDD: tests must stay green)"
Execute: Agent modifies auth.service.ts
Review: `npm test` + `tsc --noEmit` + `npm run lint`
-> PASS: proceed to iteration 4
-> FAIL: `git checkout src/app/services/auth.service.ts` + retry
ITERATION 4:
Plan: "Update components using AuthService"
Execute: Agent updates LoginComponent, ProfileComponent, etc.
Review: `npm run build` (SSR build completes)
-> PASS: workflow completed
-> FAIL: rollback + root cause analysis
Loop Metrics:
- Retry rate per iteration (ideal: <20%)
- Tokens consumed per iteration
- Time per iteration
- Overall success rate
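A single PER iteration with the retry policy described above can be sketched as follows; `execute` and `review` are stubs standing in for a real agent step and its automated check:

```typescript
// Minimal sketch of one Plan-Execute-Review iteration:
// retry Execute at most twice, then escalate to a human.
type Review = "PASS" | "FAIL";

function perIteration(
  execute: () => void,
  review: () => Review,
  maxRetries = 2
): "PASS" | "ESCALATE" {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    execute();
    if (review() === "PASS") return "PASS"; // artifact verified, proceed
  }
  // Review never passed: stop the workflow and hand off to a human.
  return "ESCALATE";
}

// Stub: execution succeeds only on the second attempt.
let attempts = 0;
const status = perIteration(
  () => { attempts++; },
  () => (attempts >= 2 ? "PASS" : "FAIL")
);
// status === "PASS", attempts === 2
```

The structural point is that Review gates progress: there is no code path from Execute to the next iteration that bypasses it.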
Multi-Agent Code Review Pipeline
A multi-agent code review pipeline is one of the most mature use cases for agentic workflows in 2025. Each specialized agent brings a different perspective, and the aggregation produces a more complete review than any single agent could deliver.
Multi-Agent Code Review Pipeline:
Input: Pull Request with codebase changes
PHASE 1 - Parallel Review (4 agents in parallel):
Agent A: Security Reviewer
- Looks for: SQL injection, XSS, CSRF, exposed secrets
- Tools: Read, Grep (patterns: hardcoded secrets, eval, innerHTML)
- Output: security-report.md (CRITICAL/HIGH/MEDIUM/LOW)
Agent B: Performance Analyst
- Looks for: memory leaks, N+1 queries, bundle size impact
- Tools: Read, Glob, Bash (npm run analyze)
- Output: performance-report.md
Agent C: Type Safety Checker
- Looks for: implicit any, unsafe type assertions, missing null checks
- Tools: Read, Bash (tsc --noEmit --strict)
- Output: types-report.md
Agent D: Test Coverage
- Verifies: new functions have tests, edge cases covered
- Tools: Read, Bash (npm test -- --coverage)
- Output: coverage-report.md
PHASE 2 - Synthesis (1 agent):
Input: the 4 parallel reports
Task: Synthesize into PR-review.md with:
- Issues grouped by severity
- Blockers (no merge until resolved)
- Suggestions (optional but recommended)
- Positive findings (what is done well)
PHASE 3 - Human Checkpoint:
Developer reads PR-review.md and decides:
- Merge as is (zero blockers)
- Fix blockers and re-run pipeline
- Request clarification on specific issues
Self-Healing Workflows: Retry and Fallback
Production workflows fail. The question is not "if" but "when" and "how to recover." A self-healing workflow implements retry strategies with exponential backoff, automatic rollback to the last valid checkpoints, and fallback to alternative strategies when the primary path fails repeatedly.
Self-Healing Workflow Pattern:
STRATEGY 1: Retry with Backoff
Attempts: 1, 2, 3 (max)
Wait: 0s, 30s, 120s
Retry condition: transient error (timeout, rate limit)
NO retry condition: logic error (file not found, syntax error)
STRATEGY 2: Checkpoint + Rollback
Before every destructive change:
$ git stash push -m "checkpoint-[step-id]-[timestamp]"
If step fails after max retries:
$ git stash pop # rollback to previous state
-> Notify human with error context
STRATEGY 3: Alternative Path
Primary step: Refactor with TypeScript strict
If fails 3 times:
Fallback 1: Refactor with non-strict TypeScript + TODO comments
If still failing:
Fallback 2: Document the problem instead of solving
Escalate: create docs/issues/[step-id]-blocked.md
STRATEGY 4: Failure Isolation
10-step workflow:
Steps 1-5: completed successfully
Step 6: fails
-> DO NOT undo steps 1-5
-> Save "partially completed" state
-> Resume from step 6 after manual fix
IMPLEMENTATION in Claude Code:
"If the command fails, DO NOT proceed to the next step.
Before any file modifications, run:
git stash push -m 'pre-[description]'
If after 2 attempts the task does not work, STOP.
Create docs/blocked/[task-id].md with:
- Command that failed
- Complete error output
- Hypotheses about the cause
- What was completed before the block"
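STRATEGY 1 can be made concrete in a few lines. In this sketch the backoff delays are computed rather than slept, so the policy is easy to inspect and test; the `TransientError` class and the retry factor are illustrative choices:

```typescript
// Sketch of retry-with-backoff: transient errors are retried with
// growing delays; logic errors fail fast and are never retried.
class TransientError extends Error {}

function retrySchedule(maxAttempts = 3, baseMs = 30_000, factor = 4): number[] {
  // Waits before attempts 1..N: 0s, 30s, 120s with the defaults above.
  return Array.from({ length: maxAttempts }, (_, i) =>
    i === 0 ? 0 : baseMs * factor ** (i - 1)
  );
}

function runWithRetry<T>(op: (attempt: number) => T, maxAttempts = 3): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op(attempt);
    } catch (err) {
      // Logic errors (file not found, syntax error): fail immediately.
      if (!(err instanceof TransientError)) throw err;
      // Out of retries: surface the error so the workflow can escalate.
      if (attempt === maxAttempts) throw err;
    }
  }
  throw new Error("unreachable");
}

// Stub: a rate-limit error that clears on the second attempt.
const value = runWithRetry((attempt) => {
  if (attempt < 2) throw new TransientError("rate limit");
  return "ok";
});
// value === "ok"
```

Distinguishing transient from logic errors is the heart of the strategy: retrying a syntax error three times only burns tokens.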
Tool Use and Context Management
Context management is probably the most subtle technical challenge in agentic workflows. A poorly managed context window produces degraded results, omissions, and hallucinations. A well-managed context window enables agentic sessions lasting hours across codebases of thousands of files.
Token Budget and Prioritization
Claude Code operates with a 200,000-token context window, but consuming all of it in a single session is an anti-pattern. Anthropic research suggests operating in the 60-80% range of the maximum context window to maintain consistent quality. Beyond 80%, response quality degrades significantly.
Context Management Strategies:
1. PROGRESSIVE SUMMARIZATION
After each completed step:
"Update claude-progress.txt with a concise summary of this step:
- What was done
- Modified files (list)
- Test status (pass/fail)
- Expected next step
MAX 200 words. Then use /compact to compress the conversation."
2. SELECTIVE LOADING
Do NOT read all project files at the start.
Read only files relevant to the current task:
- Use Glob to identify files by pattern
- Use Grep to find specific dependencies
- Read files only when necessary (lazy loading)
3. EXTERNAL STATE
Persistent state files (survive across sessions):
- claude-progress.txt: current workflow state
- docs/plans/[feature].md: approved plan
- docs/analysis/[component].md: completed analyses
Session start: "Read claude-progress.txt and active docs/plans/.
Tell me where we are in the workflow and what to do now."
4. EFFICIENT TOOL CHAINING
INSTEAD OF: Read 20 files + analyze everything
DO THIS:
1. Grep for specific pattern (find relevant files)
2. Glob for directory structure
3. Read ONLY identified relevant files
4. Process
Typical savings: 60-70% tokens
5. CHECKPOINT COMPACTION
Every 5-10 complex steps, use /compact in Claude Code.
Then reload context from claude-progress.txt.
Keep the session fresh for remaining tasks.
Anti-Pattern: Tool Call Storm
A common mistake is asking the agent to read files one by one with explicit loops.
This generates hundreds of tool calls and quickly saturates the context. Instead, use
patterns like Glob + Grep to identify relevant files, then read only those.
Claude Code can execute multiple tool calls in parallel when they are independent: this
feature reduces latency by 40-60% compared to sequential calls.
Metrics and Evaluation of Agentic Workflows
"It works" is not a metric. To evaluate and improve an agentic workflow, you need quantitative metrics across four dimensions: reliability, output quality, efficiency, and security.
Metrics Framework for Agentic Workflows:
RELIABILITY
- Task Completion Rate (TCR): % tasks completed without human intervention
Target: >85% for production workflows
- Retry Rate: % steps requiring more than 1 attempt
Target: <20% per step (>20% indicates poorly specified task)
- Escalation Rate: % steps escalated to human
Target: depends on domain risk (5-30%)
OUTPUT QUALITY
- Test Pass Rate: % tests passing after workflow
Target: 100% (zero regressions acceptable)
- Build Success Rate: % SSR builds completing without errors
Target: 100%
- Code Review Score: score from code-reviewer agent (1-10)
Target: >7 before merge
EFFICIENCY
- Tokens per Task: tokens consumed / task completed
Baseline: measure in first 10 executions
- End-to-End Latency: total workflow time
Optimize with parallelism when bottleneck identified
- Cost per Feature: total API cost / feature implemented
Target: business-defined, typical $0.10 - $2.00
SECURITY
- Destructive Operations Count: ops that modify/delete data
Flag every operation with rm, DROP, DELETE, overwrite
- Unauthorized Tool Use: tool calls not permitted by CLAUDE.md
Target: 0 (monitored via hooks)
- Secret Exposure: secrets in generated code
Automated check with grep patterns
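Computing the reliability metrics from a run log is straightforward. The log shape below is an illustrative assumption; real entries would come from your orchestrator or from claude-progress.txt:

```typescript
// Sketch of computing the reliability metrics above from a workflow run log.
type StepLog = { attempts: number; completed: boolean; escalated: boolean };

function reliabilityMetrics(log: StepLog[]) {
  const n = log.length;
  return {
    // Task Completion Rate: % steps completed without human intervention
    tcr: log.filter((s) => s.completed && !s.escalated).length / n,
    // Retry Rate: % steps that needed more than one attempt
    retryRate: log.filter((s) => s.attempts > 1).length / n,
    // Escalation Rate: % steps handed to a human
    escalationRate: log.filter((s) => s.escalated).length / n,
  };
}

const m = reliabilityMetrics([
  { attempts: 1, completed: true, escalated: false },
  { attempts: 2, completed: true, escalated: false },
  { attempts: 3, completed: false, escalated: true },
  { attempts: 1, completed: true, escalated: false },
]);
// m.tcr === 0.75, m.retryRate === 0.5, m.escalationRate === 0.25
```

Measured over a few dozen runs, these three numbers tell you immediately whether a workflow is production-ready or whether a specific step needs a tighter specification.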
Anti-Patterns to Avoid
The 2025 literature on agentic systems has identified recurring failure patterns. Knowing them in advance is the most efficient way to build robust workflows.
1. Over-Decomposition
Breaking a task into too many subtasks creates coordination overhead that exceeds the benefit. If a task takes less than 5 minutes and 3,000 tokens, it probably does not make sense to delegate it to a separate sub-agent. Multi-agent research shows that systems with more than 5-7 agents active simultaneously tend to have exponentially higher error rates (the so-called "17x error trap" documented by Towards Data Science, 2025).
Over-Decomposition: Example
Wrong: Creating 15 sub-agents to refactor 15 functions in a single 200-line
file. The coordination overhead (context setup for each agent, result merging, conflict
management) exceeds the time a single agent would have taken.
Correct: A single agent reads the file, identifies the 15 functions,
refactors them sequentially with intermediate tests. Sub-agents only for truly independent
files/modules of significant size.
2. Under-Specification
Vaguely described tasks produce unpredictable outputs. "Improve the code" is not a task: it is a hope. Every task must specify: what to do, on which files, what constraints to respect, how to verify success. Under-specification is the number 1 cause of workflows that appear to work but produce poor-quality output.
3. Context Pollution
Loading too much irrelevant context into the context window (unnecessary files, previous conversations, verbose documentation) degrades response quality. The "lost in the middle" phenomenon - documented in LLM research in 2024 - shows that LLMs pay less attention to information in the middle of the context window. Keep the context clean, focused, and structured.
4. Missing Rollback Strategy
The 2025 Replit incident - where an agent deleted a production database - is the most cited example of a workflow without a rollback strategy. Every destructive operation (delete, overwrite, DROP) must have an undo mechanism: git stash, backup, reversible transactions. "The agent knew what it was doing" is not a disaster recovery strategy.
5. No Human-in-the-Loop
Fully automated workflows without human checkpoints are appropriate only for low-risk, well-understood tasks. Anthropic research (2026 Agentic Coding Trends Report) shows that developers delegate completely (0% supervision) only 20% of tasks: the remaining 80% requires at least one review checkpoint. Design workflows with explicit Human-in-the-Loop for architectural decisions, deployments, and data modifications.
6. Agent Without Inter-Session Memory
Every new Claude Code session starts from scratch. A complex workflow (5+ hours of work)
that does not write external state is destined to lose progress. Always use
claude-progress.txt, plan files in docs/ and frequent commits
as persistence mechanisms.
Case Study: Angular Codebase Refactoring with Agentic Workflow
Let us see how these principles apply to a real case: refactoring the blog module of an Angular portfolio from a legacy architecture (large components, logic in templates, no tests) to a modern architecture (small components, separate services, 80%+ coverage).
Context
- Codebase: Angular 21, SSR, ~3,000 lines in the blog module
- Problem: 0% test coverage, 500+ line components, no separation of concerns
- Goal: complete refactoring without functional regressions
- Constraint: active production, zero downtime tolerated
Designed Workflow
WORKFLOW: Blog Module Refactoring
PHASE 0 - Setup (5 min, human):
$ git checkout -b refactor/blog-module
Create claude-progress.txt with initial state
Create docs/plans/blog-refactoring.md (empty, agent will fill)
PHASE 1 - Analysis (agent, ~30 min):
Task: "Analyze the complete blog module.
Read all files in src/app/articles/ and src/app/services/blog.service.ts.
Create docs/analysis/blog-module.md with:
- Complete list of components and their sizes (lines)
- Dependencies between components (who imports whom)
- Identified duplicate logic
- Separation of concerns violations
- Refactoring priority (impact x effort)"
Review: Human reads docs/analysis/blog-module.md and approves plan
PHASE 2 - Test Foundation (agent, ~60 min):
Task: "Write E2E tests for critical blog features
BEFORE any refactoring. Use Playwright.
Features to cover:
- Navigate to article list
- Open specific article
- Series navigation (prev/next)
- IT/EN language switch
Tests must pass on the CURRENT version of the code."
Review: `npm run test:e2e` must have 100% green E2E tests
PHASE 3 - Service Extraction (agent, ~90 min, iterative):
Task: "Extract logic from BlogComponent into separate services.
ONE service at a time, in this order:
1. BlogFilterService (article filtering and search)
2. BlogSeriesService (series navigation)
3. BlogSEOService (meta tags for articles)
For each service:
a) Create .service.ts file with extracted logic
b) Create .service.spec.ts with unit tests
c) Update BlogComponent to use the service
d) Verify: npm test passes, npm run build passes
e) Commit: git commit -am 'refactor: extract [ServiceName]'"
Review after each service: E2E tests must stay green
PHASE 4 - Component Split (agent, ~120 min, iterative):
Task: "Split BlogComponent (current 480 lines) into:
- BlogListComponent (article list)
- BlogCardComponent (single article card)
- BlogFilterComponent (filters and search)
- BlogPaginationComponent (pagination)
One component at a time. Same pattern as PHASE 3:
implement, test, verify build, atomic commit."
Review: Human verifies UI visually + E2E tests
PHASE 5 - Coverage Check (agent, ~30 min):
Task: "Verify coverage is >80% for all new files.
For each file below threshold, add missing tests.
Produce report in docs/analysis/coverage-report.md"
Review: Coverage report + npm test
PHASE 6 - Final Review (human):
- Reads complete PR diff
- Verifies docs/analysis/coverage-report.md
- Merges to master if all ok
TYPICAL RESULTS FOR THIS WORKFLOW:
Coverage: 0% -> 82%
Component size: 480 lines -> 95 lines average
Build time: -15% (smaller components, more effective lazy loading)
Human time: ~45 min (phase 0 + review + phase 6)
Agent time: ~5-6 hours
Estimated API cost: $3-8 (depends on model)
Lessons Learned from the Case Study
Three insights from running this type of workflow on real codebases:
- E2E tests before refactoring are non-negotiable. Without a functional safety net, every refactoring step is a leap in the dark. The time invested in PHASE 2 (E2E tests) pays off 10x in preventing undetected regressions.
- Atomic commits are workflow memory. One commit per refactored service/component makes rollback surgical: if a refactoring introduces a problem, you `git revert` that single commit without losing previous work.
- Human time drops to 10-15% of the total. With a well-designed workflow, the developer spends most of their time reviewing outputs and approving checkpoints, not writing code. This is mature vibe coding: not delegating everything, but delegating strategically while maintaining oversight on key decisions.
Key Data Point
21% of YC Winter 2025 startups have a codebase with over 91% AI-generated code. But the most mature of these startups do not use "raw" vibe coding: they use structured agentic workflows with human checkpoints, automated tests, and rollback strategies. The difference between "generated code" and "quality software generated by AI" is precisely the structure of the workflow.
The Future of Agentic Workflows
Research in 2025-2026 indicates three directions of evolution for agentic workflows in the context of software development:
- Dynamic Task Decomposition: Frameworks like TDAG (2025) shift decomposition from static (defined by the developer) to dynamic (the agent decides the workflow structure based on the problem). Early results are promising but still require human supervision for production codebases.
- Persistent Agent Memory: The integration of vector databases with agent frameworks (LangGraph + pgvector, CrewAI + Chroma) allows agents to remember patterns across different projects, accumulating "experience" on specific codebases.
- Agent Skills as Standard: Anthropic introduced Agent Skills in 2025: bundles of instructions, scripts, and resources that agents can load dynamically. The idea is that skills like "Angular Expert", "Security Auditor", or "Performance Optimizer" become reusable modules across teams and organizations.
Conclusions
Agentic workflows are not magic: they are architecture. The difference between an agentic system that works and one that fails almost always comes down to decomposition quality, clarity of success criteria, and robustness of error recovery strategies.
The four patterns (Sequential, Parallel, Hierarchical, Iterative) are not mutually exclusive: the most effective workflows combine them. A hierarchical Planner can delegate parallel tasks, each of which uses an iterative loop to reach the required quality. Structure always serves the specific problem.
The most important practical step you can take today: take a complex task that you typically solve in a single AI session and try to decompose it into 3-5 verifiable subtasks. Define the success criteria for each before starting. Add a human checkpoint after each critical phase. Then measure: does the structured workflow produce better output? Almost certainly yes. And that is the starting point for becoming an architect of agentic systems.
Series: Vibe Coding and Agentic Development
- 01 - Vibe Coding: The Paradigm That Changed 2025
- 02 - Claude Code: Agentic Development from the Terminal
- 03 - Agentic Workflows: Decomposing Problems for AI (this article)
- 04 - Multi-Agent Coding: LangGraph, CrewAI and AutoGen
- 05 - Testing AI-Generated Code
- 06 - Prompt Engineering for IDEs and Code Generation
- 07 - Security in Vibe Coding: Risks and Mitigations
- 08 - The Future of Agentic Development in 2026
Related Reading
- Claude and the Model Context Protocol (MCP) - How MCP extends agent capabilities with external tools
- Web Security: APIs and Vulnerabilities - Security risks specific to agentic workflows