Inside Loki Mode: 41 Agents, 8 Swarms, and the RARV Cycle
A technical deep dive into how Loki Mode orchestrates 41 specialized AI agents across 8 swarms using the Reason-Act-Reflect-Verify cycle
People ask me how Loki Mode actually works. Not the elevator pitch, not the high-level architecture, but the specifics. How do 41 agents coordinate without stepping on each other? How does the RARV cycle enforce discipline? What happens when an agent produces bad output?
This is the deep dive.
The Agent Taxonomy
Loki Mode defines 41 agent types, and each one has a specific job. This is not 41 variations of the same general-purpose assistant. Each agent type has defined inputs, expected outputs, evaluation criteria, and failure modes.
Here is how they break down across the 8 swarms:
Planning Swarm (6 agents):
- Requirements Analyst: decomposes user requests into structured requirements
- Task Decomposer: breaks requirements into atomic, actionable tasks
- Risk Assessor: identifies technical risks, dependencies, and potential blockers
- Architecture Advisor: evaluates architectural implications of proposed changes
- Estimation Agent: provides complexity and effort estimates for tasks
- Priority Ranker: orders tasks based on dependencies, risk, and value
Implementation Swarm (7 agents):
- Code Generator: writes new code based on specifications
- Refactoring Agent: restructures existing code while preserving behavior
- API Designer: designs API contracts and endpoint specifications
- Schema Designer: creates and modifies database schemas
- Migration Writer: produces database migration scripts
- Config Manager: handles configuration file changes
- Integration Agent: implements connections between components
Review Swarm (6 agents):
- Code Reviewer: evaluates code quality, patterns, and conventions
- Security Auditor: checks for vulnerabilities and security anti-patterns
- Performance Analyst: identifies performance bottlenecks and optimization opportunities
- Accessibility Reviewer: evaluates UI changes for accessibility compliance
- API Contract Reviewer: validates API changes against contracts and conventions
- Dependency Auditor: reviews dependency changes for security and compatibility
Testing Swarm (5 agents):
- Unit Test Writer: creates unit tests for individual functions and methods
- Integration Test Writer: builds tests for component interactions
- E2E Test Writer: develops end-to-end test scenarios
- Test Strategy Agent: designs overall testing approach for a feature
- Coverage Analyst: evaluates test coverage and identifies gaps
Documentation Swarm (4 agents):
- API Doc Writer: generates API documentation from code and contracts
- Architecture Doc Writer: creates and updates architecture documentation
- Runbook Writer: produces operational runbooks for new features
- Inline Doc Agent: adds meaningful code comments and type annotations
DevOps Swarm (5 agents):
- Pipeline Agent: creates and modifies CI/CD pipeline configurations
- IaC Agent: writes infrastructure as code (Terraform, CloudFormation, etc.)
- Deploy Strategist: designs deployment strategies (blue-green, canary, etc.)
- Monitor Agent: sets up monitoring, alerting, and dashboards
- Container Agent: manages Dockerfiles and container configurations
Debug Swarm (4 agents):
- Root Cause Analyst: investigates failures and identifies root causes
- Log Analyst: examines log output to find error patterns
- Repro Agent: creates minimal reproduction steps for bugs
- Fix Verifier: validates that a fix actually resolves the reported issue
Research Swarm (4 agents):
- Tech Evaluator: assesses technologies for specific use cases
- Pattern Researcher: finds relevant patterns and prior art
- Dependency Researcher: evaluates potential dependencies for quality and fit
- Migration Planner: plans migration strategies for technology transitions
The RARV Cycle in Detail
Every significant task enters the RARV pipeline: Reason, Act, Reflect, Verify. Here is what happens at each stage.
Reason Phase
The Planning Swarm activates. The Requirements Analyst parses the incoming task and produces a structured specification. The Task Decomposer breaks it into subtasks. The Risk Assessor flags potential problems. The Architecture Advisor evaluates whether the proposed approach fits the existing system.
# Reason phase output structure
{
  "requirements": [...],
  "subtasks": [...],
  "risks": [...],
  "architecture_notes": "...",
  "estimated_complexity": "medium",
  "recommended_approach": "..."
}
The Reason phase must produce a valid plan before the pipeline advances. If the Planning Swarm determines the task is ambiguous, under-specified, or architecturally risky, it flags the task for human review rather than proceeding with assumptions.
This is a quality gate. No plan, no execution.
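As a concrete illustration, the gate can be as simple as refusing to advance when the plan's core fields are empty. This sketch uses bash arrays and invented field values rather than Loki Mode's actual plan format:

```shell
# Hypothetical plan-gate check; the requirement and subtask strings
# are illustrative, not real Loki Mode output.
requirements=("parse incoming webhook payload")
subtasks=("add endpoint" "write handler" "wire up retries")
risks=("payload schema may change")

# The gate: a plan with no requirements or no subtasks never advances.
if [ "${#requirements[@]}" -gt 0 ] && [ "${#subtasks[@]}" -gt 0 ]; then
  gate_result="PLAN_OK"
else
  gate_result="NEEDS_HUMAN_REVIEW"
fi
echo "$gate_result"
```

A real implementation would also validate the risk and architecture fields, but the shape of the check is the same: inspect the plan, then either advance or flag for a human.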
Act Phase
The Implementation Swarm takes the plan and executes. Each subtask is assigned to the appropriate agent type. Code generation goes to the Code Generator. Schema changes go to the Schema Designer. Configuration updates go to the Config Manager.
Agents in the Act phase operate independently on their assigned subtasks but share a coordination context. If the Schema Designer creates a new table, the Code Generator's context includes that schema change. This prevents agents from working against each other.
# Act phase coordination
for subtask in "${SUBTASKS[@]}"; do
  agent_type=$(determine_agent "$subtask")
  context=$(build_context "$subtask" "$SHARED_STATE")
  result=$(invoke_agent "$agent_type" "$subtask" "$context")
  update_shared_state "$result"
done
The coordination model is sequential within dependency chains and parallel across independent subtasks. If subtask B depends on subtask A, A completes before B starts. If subtasks C and D are independent, they can execute in parallel.
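A minimal sketch of that scheduling model, with illustrative subtask names and a stub `run_subtask` standing in for the real agent invocation:

```shell
# Stub agent invocation; a real orchestrator would call the provider CLI.
run_subtask() { echo "done: $1"; }

# B depends on A: sequential within the dependency chain.
run_subtask A
run_subtask B

# C and D are independent: run them as parallel background jobs.
run_subtask C &
run_subtask D &
wait    # block until both parallel subtasks finish
```

Shell's job control makes this split natural: dependency chains are just ordinary sequential statements, and independent work is a `&` plus a `wait`.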
Reflect Phase
The Review Swarm examines everything the Implementation Swarm produced. This is where the three parallel Opus feedback loops come in.
Three independent review agents examine the same work. Each brings a different perspective:
- Code quality review: patterns, conventions, readability, maintainability
- Security review: vulnerabilities, input validation, authentication, authorization
- Performance review: efficiency, scalability, resource usage, query optimization
The three reviews are aggregated and deduplicated. Issues found by multiple reviewers are prioritized higher. Issues found by only one reviewer are still flagged but may receive lower priority.
# Reflect phase: parallel review
# Reflect phase: parallel review
# Note: $(command &) cannot background a command and still capture its
# output, so each reviewer writes to a file and results are read after wait.
invoke_agent "code_reviewer" "$implementation" > review_1.json &
invoke_agent "security_auditor" "$implementation" > review_2.json &
invoke_agent "performance_analyst" "$implementation" > review_3.json &
wait
findings=$(aggregate_reviews review_1.json review_2.json review_3.json)
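One plausible shape for the aggregation step is to count how many reviewers reported each finding and sort by that count; the finding strings below are invented for illustration:

```shell
# Three reviewers' findings, one per line (illustrative strings).
r1=$'missing input validation\nunbounded query'
r2=$'missing input validation\nweak password hashing'
r3=$'missing input validation\nunbounded query'

# Count duplicates across reviewers; findings reported by more
# reviewers sort to the top and receive higher priority.
findings=$(printf '%s\n' "$r1" "$r2" "$r3" | sort | uniq -c | sort -rn)
echo "$findings"
```

Here the finding all three reviewers agree on lands at the top of the list, which is exactly the prioritization behavior described above.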
If the Reflect phase identifies blocking issues, the pipeline loops back to the Act phase with the review findings as additional context. The Implementation Swarm addresses the issues, and the work goes through Reflect again. This loop continues until the reviews pass or a maximum iteration count is reached.
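The loop-back itself might look like this sketch, where `review_work` and `fix_issues` are stand-ins for the real Reflect and Act invocations, and the first review round is simulated to fail so the loop actually runs:

```shell
MAX_REFLECT_ITERATIONS=3
attempt=1

# Stubs: the first review reports a blocking issue, later rounds pass.
review_work() { if [ "$attempt" -eq 1 ]; then echo "blocking issue"; fi; }
fix_issues()  { echo "looping back to Act with findings as context"; }

while [ "$attempt" -le "$MAX_REFLECT_ITERATIONS" ]; do
  issues=$(review_work)
  if [ -z "$issues" ]; then
    echo "reflect passed on attempt $attempt"
    break
  fi
  fix_issues
  attempt=$((attempt + 1))
done
```

The iteration cap matters: without it, an agent that keeps producing the same flawed fix would loop forever.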
Verify Phase
The Testing Swarm validates the implementation. The Test Strategy Agent determines what types of tests are needed. The appropriate test writers create them. The tests execute against the implementation.
Verification is not just "do the tests pass." It includes:
- Functional verification: does the implementation meet the original requirements?
- Regression verification: did the changes break anything that was working before?
- Coverage verification: is the test coverage above the configured threshold?
- Lint verification: does the code pass all configured linting rules?
# Verify phase gates
GATE_TESTS_PASS=true
GATE_MIN_COVERAGE=80
GATE_LINT_CLEAN=true
GATE_NO_REGRESSIONS=true
If any gate fails, the pipeline routes back to the appropriate phase. Test failures go to the Debug Swarm for root cause analysis, then back to Implementation for fixes. Coverage gaps go back to Testing for additional test creation. Lint failures go back to Implementation for cleanup.
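In sketch form, with simulated results for a single verify run (the routing targets correspond to the swarms described above):

```shell
GATE_MIN_COVERAGE=80

# Simulated gate inputs: tests pass, but coverage falls short.
tests_passed=true
coverage=74
lint_clean=true

if [ "$tests_passed" != true ]; then
  route="debug_swarm"            # test failures -> root cause analysis
elif [ "$coverage" -lt "$GATE_MIN_COVERAGE" ]; then
  route="testing_swarm"          # coverage gap -> write more tests
elif [ "$lint_clean" != true ]; then
  route="implementation_swarm"   # lint failures -> cleanup
else
  route="done"
fi
echo "verify routes to: $route"
```

The ordering of the checks encodes a priority: a failing test is always diagnosed before anyone worries about coverage numbers or lint warnings.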
Error Handling and Recovery
One of the harder engineering challenges in Loki Mode is handling failures gracefully. Agents can fail in several ways:
Model errors. The underlying LLM returns an error, hits a rate limit, or produces malformed output. The retry logic handles transient errors with exponential backoff. Persistent errors trigger a provider fallback if configured.
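A minimal sketch of that retry logic; `invoke_provider` is a stand-in that fails twice before succeeding, so the backoff path actually executes:

```shell
calls=0
invoke_provider() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]             # simulate: fails on attempts 1 and 2
}

retry_with_backoff() {
  local max_attempts=5 delay=1 attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if invoke_provider; then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))         # exponential backoff: 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
  echo "exhausted retries; falling back to secondary provider"
  return 1
}

retry_with_backoff
```

In practice the backoff would also add jitter and distinguish retryable errors (rate limits, timeouts) from permanent ones (invalid requests), but the skeleton is the same.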
Quality failures. An agent produces output that does not meet quality criteria. The RARV loop handles this by routing back through Reflect and Act.
Coordination failures. An agent's output contradicts another agent's work. The shared state mechanism catches most of these, but edge cases exist. When detected, the conflicting subtasks are re-executed with explicit conflict resolution instructions.
Timeout failures. Complex tasks can exceed time budgets. The orchestrator tracks elapsed time and can abort long-running agents, splitting their work into smaller pieces for retry.
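One way to enforce a time budget is coreutils `timeout`; `slow_agent` here is an illustrative stand-in for a long-running agent invocation:

```shell
# Illustrative long-running agent: would take 10s, but the budget is 1s.
slow_agent() { sleep 10; echo "finished"; }
export -f slow_agent     # make the function visible to the child bash

if timeout 1 bash -c slow_agent; then
  echo "agent completed within budget"
else
  timed_out=true         # timeout exits with 124 when it kills the command
  echo "agent exceeded its time budget; splitting the work for retry"
fi
```

Because `timeout` reports a distinct exit status, the orchestrator can tell "the agent ran out of time" apart from "the agent failed," and route the two cases differently.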
The Shell Orchestration Layer
All of this runs on shell scripts. I get questions about this choice regularly, so let me address it directly.
The orchestration layer is approximately 2,000 lines of bash across the core scripts. It handles provider invocation, state management, quality gate evaluation, and pipeline flow control. It does not handle anything that bash is bad at, like complex data transformation or UI rendering.
Shell is the right choice for three reasons:
- Zero dependencies. Any machine with bash and an AI CLI tool can run Loki Mode. No Python virtual environments, no Node.js version management, no Docker requirement.
- Natural process orchestration. Shell is designed to launch processes, pipe their output, manage background jobs, and coordinate execution. This is exactly what an AI agent orchestrator does.
- Transparency. You can read every line of the orchestration logic. There is no framework magic, no dependency injection, no abstract base classes. The code does what it says it does.
Lessons from Production Use
Running Loki Mode on real projects has taught me things that designing it never could.
The RARV cycle's biggest value is not the individual phases; it is the transitions between them. The moment where planning output becomes implementation input, where implementation output becomes review input: these transitions are where quality issues are caught. Getting the transition contracts right matters more than optimizing any individual phase.
Agent specialization works. A dedicated security auditor catches vulnerabilities that a general-purpose reviewer misses. A dedicated performance analyst identifies optimization opportunities that a code quality reviewer overlooks. Specialization is worth the complexity it adds to the system.
The three-reviewer pattern catches more issues than a single reviewer but not three times as many. The overlap is typically around 40%, meaning each additional reviewer adds unique findings but with diminishing returns. Three reviewers appears to be the sweet spot between coverage and cost.
Quality gates need to be configurable. Different projects have different standards. A prototype does not need 80% test coverage. A production financial system might need 95%. Making gates configurable without making them optional was an important design decision.
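A sketch of how configurable-but-not-optional gates might work; the `.loki-gates` filename and override mechanism are invented for illustration:

```shell
# Defaults: every gate always starts with a value.
GATE_MIN_COVERAGE=80
GATE_LINT_CLEAN=true

# A project may override the defaults by providing a config file,
# but it cannot delete a gate: the :? checks below fail loudly if
# any gate ends up unset or empty after the override.
[ -f ".loki-gates" ] && . ./.loki-gates

: "${GATE_MIN_COVERAGE:?gate must be set}"
: "${GATE_LINT_CLEAN:?gate must be set}"
echo "effective coverage gate: ${GATE_MIN_COVERAGE}%"
```

The distinction is between tuning a threshold and silently disabling the check; the former is a project decision, the latter defeats the purpose of having gates at all.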
The system works best when humans stay in the loop for high-level decisions and let agents handle the mechanical work. Architecture choices, requirement prioritization, and risk tolerance are human decisions. Code generation, test writing, and review are agent tasks. This division of labor produces the best results.