Inside Loki Mode: 41 Agents, 8 Swarms, and the RARV Cycle
A technical deep dive into how Loki Mode orchestrates 41 specialized AI agents across 8 swarms using the Reason-Act-Reflect-Verify cycle
People ask me how Loki Mode actually works. Not the elevator pitch, not the high-level architecture, but the specifics. How do 41 agents coordinate without stepping on each other? How does the RARV cycle enforce discipline? What happens when an agent produces bad output?
This is the deep dive.
The Agent Taxonomy
Loki Mode defines 41 agent types, and each one has a specific job. This is not 41 variations of the same general-purpose assistant. Each agent type has defined inputs, expected outputs, evaluation criteria, and failure modes.
Here is how they break down across the 8 swarms:
Planning Swarm (6 agents):
- Requirements Analyst: decomposes user requests into structured requirements
- Task Decomposer: breaks requirements into atomic, actionable tasks
- Risk Assessor: identifies technical risks, dependencies, and potential blockers
- Architecture Advisor: evaluates architectural implications of proposed changes
- Estimation Agent: provides complexity and effort estimates for tasks
- Priority Ranker: orders tasks based on dependencies, risk, and value
Implementation Swarm (7 agents):
- Code Generator: writes new code based on specifications
- Refactoring Agent: restructures existing code while preserving behavior
- API Designer: designs API contracts and endpoint specifications
- Schema Designer: creates and modifies database schemas
- Migration Writer: produces database migration scripts
- Config Manager: handles configuration file changes
- Integration Agent: implements connections between components
Review Swarm (6 agents):
- Code Reviewer: evaluates code quality, patterns, and conventions
- Security Auditor: checks for vulnerabilities and security anti-patterns
- Performance Analyst: identifies performance bottlenecks and optimization opportunities
- Accessibility Reviewer: evaluates UI changes for accessibility compliance
- API Contract Reviewer: validates API changes against contracts and conventions
- Dependency Auditor: reviews dependency changes for security and compatibility
Testing Swarm (5 agents):
- Unit Test Writer: creates unit tests for individual functions and methods
- Integration Test Writer: builds tests for component interactions
- E2E Test Writer: develops end-to-end test scenarios
- Test Strategy Agent: designs overall testing approach for a feature
- Coverage Analyst: evaluates test coverage and identifies gaps
Documentation Swarm (4 agents):
- API Doc Writer: generates API documentation from code and contracts
- Architecture Doc Writer: creates and updates architecture documentation
- Runbook Writer: produces operational runbooks for new features
- Inline Doc Agent: adds meaningful code comments and type annotations
DevOps Swarm (5 agents):
- Pipeline Agent: creates and modifies CI/CD pipeline configurations
- IaC Agent: writes infrastructure as code (Terraform, CloudFormation, etc.)
- Deploy Strategist: designs deployment strategies (blue-green, canary, etc.)
- Monitor Agent: sets up monitoring, alerting, and dashboards
- Container Agent: manages Dockerfiles and container configurations
Debug Swarm (4 agents):
- Root Cause Analyst: investigates failures and identifies root causes
- Log Analyst: examines log output to find error patterns
- Repro Agent: creates minimal reproduction steps for bugs
- Fix Verifier: validates that a fix actually resolves the reported issue
Research Swarm (4 agents):
- Tech Evaluator: assesses technologies for specific use cases
- Pattern Researcher: finds relevant patterns and prior art
- Dependency Researcher: evaluates potential dependencies for quality and fit
- Migration Planner: plans migration strategies for technology transitions
The RARV Cycle in Detail
Every significant task enters the RARV pipeline: Reason, Act, Reflect, Verify. Here is what happens at each stage.
Reason Phase
The Planning Swarm activates. The Requirements Analyst parses the incoming task and produces a structured specification. The Task Decomposer breaks it into subtasks. The Risk Assessor flags potential problems. The Architecture Advisor evaluates whether the proposed approach fits the existing system.
# Reason phase output structure
{
  "requirements": [...],
  "subtasks": [...],
  "risks": [...],
  "architecture_notes": "...",
  "estimated_complexity": "medium",
  "recommended_approach": "..."
}
The Reason phase must produce a valid plan before the pipeline advances. If the Planning Swarm determines the task is ambiguous, under-specified, or architecturally risky, it flags the task for human review rather than proceeding with assumptions.
This is a quality gate. No plan, no execution.
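As a concrete illustration, the gate can be as simple as refusing to advance when the plan's core fields are empty. This sketch uses bash arrays and invented field values rather than Loki Mode's actual plan format:

```shell
# Hypothetical plan-gate check; the requirement and subtask strings
# are illustrative, not real Loki Mode output.
requirements=("parse incoming webhook payload")
subtasks=("add endpoint" "write handler" "wire up retries")
risks=("payload schema may change")

# The gate: a plan with no requirements or no subtasks never advances.
if [ "${#requirements[@]}" -gt 0 ] && [ "${#subtasks[@]}" -gt 0 ]; then
  gate_result="PLAN_OK"
else
  gate_result="NEEDS_HUMAN_REVIEW"
fi
echo "$gate_result"
```

A real implementation would also validate the risk and architecture fields, but the shape of the check is the same: inspect the plan, then either advance or flag for a human.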
Act Phase
The Implementation Swarm takes the plan and executes. Each subtask is assigned to the appropriate agent type. Code generation goes to the Code Generator. Schema changes go to the Schema Designer. Configuration updates go to the Config Manager.
Agents in the Act phase operate independently on their assigned subtasks but share a coordination context. If the Schema Designer creates a new table, the Code Generator's context includes that schema change. This prevents agents from working against each other.
# Act phase coordination
for subtask in "${SUBTASKS[@]}"; do
  agent_type=$(determine_agent "$subtask")
  context=$(build_context "$subtask" "$SHARED_STATE")
  result=$(invoke_agent "$agent_type" "$subtask" "$context")
  update_shared_state "$result"
done
The coordination model is sequential within dependency chains and parallel across independent subtasks. If subtask B depends on subtask A, A completes before B starts. If subtasks C and D are independent, they can execute in parallel.
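A minimal sketch of that scheduling model, with illustrative subtask names and a stub `run_subtask` standing in for the real agent invocation:

```shell
# Stub agent invocation; a real orchestrator would call the provider CLI.
run_subtask() { echo "done: $1"; }

# B depends on A: sequential within the dependency chain.
run_subtask A
run_subtask B

# C and D are independent: run them as parallel background jobs.
run_subtask C &
run_subtask D &
wait    # block until both parallel subtasks finish
```

Shell's job control makes this split natural: dependency chains are just ordinary sequential statements, and independent work is a `&` plus a `wait`.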
Reflect Phase
The Review Swarm examines everything the Implementation Swarm produced. This is where the three parallel Opus feedback loops come in.
Three independent review agents examine the same work. Each brings a different perspective:
- Code quality review: patterns, conventions, readability, maintainability
- Security review: vulnerabilities, input validation, authentication, authorization
- Performance review: efficiency, scalability, resource usage, query optimization
The three reviews are aggregated and deduplicated. Issues found by multiple reviewers are prioritized higher. Issues found by only one reviewer are still flagged but may receive lower priority.
# Reflect phase: parallel review
# Reflect phase: parallel review
# Note: $(command &) cannot background a command and still capture its
# output, so each reviewer writes to a file and results are read after wait.
invoke_agent "code_reviewer" "$implementation" > review_1.json &
invoke_agent "security_auditor" "$implementation" > review_2.json &
invoke_agent "performance_analyst" "$implementation" > review_3.json &
wait
findings=$(aggregate_reviews review_1.json review_2.json review_3.json)
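One plausible shape for the aggregation step is to count how many reviewers reported each finding and sort by that count; the finding strings below are invented for illustration:

```shell
# Three reviewers' findings, one per line (illustrative strings).
r1=$'missing input validation\nunbounded query'
r2=$'missing input validation\nweak password hashing'
r3=$'missing input validation\nunbounded query'

# Count duplicates across reviewers; findings reported by more
# reviewers sort to the top and receive higher priority.
findings=$(printf '%s\n' "$r1" "$r2" "$r3" | sort | uniq -c | sort -rn)
echo "$findings"
```

Here the finding all three reviewers agree on lands at the top of the list, which is exactly the prioritization behavior described above.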
If the Reflect phase identifies blocking issues, the pipeline loops back to the Act phase with the review findings as additional context. The Implementation Swarm addresses the issues, and the work goes through Reflect again. This loop continues until the reviews pass or a maximum iteration count is reached.
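The loop-back itself might look like this sketch, where `review_work` and `fix_issues` are stand-ins for the real Reflect and Act invocations, and the first review round is simulated to fail so the loop actually runs:

```shell
MAX_REFLECT_ITERATIONS=3
attempt=1

# Stubs: the first review reports a blocking issue, later rounds pass.
review_work() { if [ "$attempt" -eq 1 ]; then echo "blocking issue"; fi; }
fix_issues()  { echo "looping back to Act with findings as context"; }

while [ "$attempt" -le "$MAX_REFLECT_ITERATIONS" ]; do
  issues=$(review_work)
  if [ -z "$issues" ]; then
    echo "reflect passed on attempt $attempt"
    break
  fi
  fix_issues
  attempt=$((attempt + 1))
done
```

The iteration cap matters: without it, an agent that keeps producing the same flawed fix would loop forever.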
Verify Phase
The Testing Swarm validates the implementation. The Test Strategy Agent determines what types of tests are needed. The appropriate test writers create them. The tests execute against the implementation.
Verification is not just "do the tests pass." It includes:
- Functional verification: does the implementation meet the original requirements?
- Regression verification: did the changes break anything that was working before?
- Coverage verification: is the test coverage above the configured threshold?
- Lint verification: does the code pass all configured linting rules?
# Verify phase gates
GATE_TESTS_PASS=true
GATE_MIN_COVERAGE=80
GATE_LINT_CLEAN=true
GATE_NO_REGRESSIONS=true
If any gate fails, the pipeline routes back to the appropriate phase. Test failures go to the Debug Swarm for root cause analysis, then back to Implementation for fixes. Coverage gaps go back to Testing for additional test creation. Lint failures go back to Implementation for cleanup.
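In sketch form, with simulated results for a single verify run (the routing targets correspond to the swarms described above):

```shell
GATE_MIN_COVERAGE=80

# Simulated gate inputs: tests pass, but coverage falls short.
tests_passed=true
coverage=74
lint_clean=true

if [ "$tests_passed" != true ]; then
  route="debug_swarm"            # test failures -> root cause analysis
elif [ "$coverage" -lt "$GATE_MIN_COVERAGE" ]; then
  route="testing_swarm"          # coverage gap -> write more tests
elif [ "$lint_clean" != true ]; then
  route="implementation_swarm"   # lint failures -> cleanup
else
  route="done"
fi
echo "verify routes to: $route"
```

The ordering of the checks encodes a priority: a failing test is always diagnosed before anyone worries about coverage numbers or lint warnings.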
Error Handling and Recovery
One of the harder engineering challenges in Loki Mode is handling failures gracefully. Agents can fail in several ways:
Model errors. The underlying LLM returns an error, hits a rate limit, or produces malformed output. The retry logic handles transient errors with exponential backoff. Persistent errors trigger a provider fallback if configured.
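A minimal sketch of that retry logic; `invoke_provider` is a stand-in that fails twice before succeeding, so the backoff path actually executes:

```shell
calls=0
invoke_provider() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]             # simulate: fails on attempts 1 and 2
}

retry_with_backoff() {
  local max_attempts=5 delay=1 attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if invoke_provider; then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))         # exponential backoff: 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
  echo "exhausted retries; falling back to secondary provider"
  return 1
}

retry_with_backoff
```

In practice the backoff would also add jitter and distinguish retryable errors (rate limits, timeouts) from permanent ones (invalid requests), but the skeleton is the same.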
Quality failures. An agent produces output that does not meet quality criteria. The RARV loop handles this by routing back through Reflect and Act.
Coordination failures. An agent's output contradicts another agent's work. The shared state mechanism catches most of these, but edge cases exist. When detected, the conflicting subtasks are re-executed with explicit conflict resolution instructions.
Timeout failures. Complex tasks can exceed time budgets. The orchestrator tracks elapsed time and can abort long-running agents, splitting their work into smaller pieces for retry.
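One way to enforce a time budget is coreutils `timeout`; `slow_agent` here is an illustrative stand-in for a long-running agent invocation:

```shell
# Illustrative long-running agent: would take 10s, but the budget is 1s.
slow_agent() { sleep 10; echo "finished"; }
export -f slow_agent     # make the function visible to the child bash

if timeout 1 bash -c slow_agent; then
  echo "agent completed within budget"
else
  timed_out=true         # timeout exits with 124 when it kills the command
  echo "agent exceeded its time budget; splitting the work for retry"
fi
```

Because `timeout` reports a distinct exit status, the orchestrator can tell "the agent ran out of time" apart from "the agent failed," and route the two cases differently.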
The Shell Orchestration Layer
All of this runs on shell scripts. I get questions about this choice regularly, so let me address it directly.
The orchestration layer is approximately 2,000 lines of bash across the core scripts. It handles provider invocation, state management, quality gate evaluation, and pipeline flow control. It does not handle anything that bash is bad at, like complex data transformation or UI rendering.
Shell is the right choice for three reasons:
- Zero dependencies. Any machine with bash and an AI CLI tool can run Loki Mode. No Python virtual environments, no Node.js version management, no Docker requirement.
- Natural process orchestration. Shell is designed to launch processes, pipe their output, manage background jobs, and coordinate execution. This is exactly what an AI agent orchestrator does.
- Transparency. You can read every line of the orchestration logic. There is no framework magic, no dependency injection, no abstract base classes. The code does what it says it does.
Lessons from Production Use
Running Loki Mode on real projects has taught me things that designing it never could.
The RARV cycle's biggest value is not the individual phases; it is the transitions between them. The moment where planning output becomes implementation input, where implementation output becomes review input: these transitions are where quality issues are caught. Getting the transition contracts right matters more than optimizing any individual phase.
Agent specialization works. A dedicated security auditor catches vulnerabilities that a general-purpose reviewer misses. A dedicated performance analyst identifies optimization opportunities that a code quality reviewer overlooks. Specialization is worth the complexity it adds to the system.
The three-reviewer pattern catches more issues than a single reviewer but not three times as many. The overlap is typically around 40%, meaning each additional reviewer adds unique findings but with diminishing returns. Three reviewers appears to be the sweet spot between coverage and cost.
Quality gates need to be configurable. Different projects have different standards. A prototype does not need 80% test coverage. A production financial system might need 95%. Making gates configurable without making them optional was an important design decision.
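A sketch of how configurable-but-not-optional gates might work; the `.loki-gates` filename and override mechanism are invented for illustration:

```shell
# Defaults: every gate always starts with a value.
GATE_MIN_COVERAGE=80
GATE_LINT_CLEAN=true

# A project may override the defaults by providing a config file,
# but it cannot delete a gate: the :? checks below fail loudly if
# any gate ends up unset or empty after the override.
[ -f ".loki-gates" ] && . ./.loki-gates

: "${GATE_MIN_COVERAGE:?gate must be set}"
: "${GATE_LINT_CLEAN:?gate must be set}"
echo "effective coverage gate: ${GATE_MIN_COVERAGE}%"
```

The distinction is between tuning a threshold and silently disabling the check; the former is a project decision, the latter defeats the purpose of having gates at all.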
The system works best when humans stay in the loop for high-level decisions and let agents handle the mechanical work. Architecture choices, requirement prioritization, and risk tolerance are human decisions. Code generation, test writing, and review are agent tasks. This division of labor produces the best results.