Q3 2025: Best of the Quarter in AI

The most important AI developments from July through September 2025, from autonomous coding breakthroughs to regulatory shifts

Every quarter, I take stock of the AI developments that mattered most and try to separate signal from noise. Q3 2025 has been one of the most consequential quarters yet, with meaningful progress in autonomous AI systems, open source model releases, and the maturing of agent infrastructure protocols.

Here is what mattered.

Autonomous Coding Matures

The biggest theme of Q3 was autonomous AI coding moving from demo to daily use. Earlier in the year, autonomous coding agents were impressive but unreliable. By the end of the quarter, teams were running them in production workflows with measurable results.

Claude Code's evolution has been notable. The combination of tool use, file system access, and terminal integration creates an agent that can handle real engineering tasks end to end. I have been running Loki Mode on top of Claude Code for months now, and the reliability improvement over the quarter has been significant.

Codex CLI and Gemini CLI have also matured. The provider landscape is genuinely competitive now, which benefits everyone. When three major AI labs are iterating rapidly on coding agents, the pace of improvement accelerates for users of any provider.

The key insight from Q3: autonomous coding is not about replacing developers. The teams seeing the biggest gains are the ones using autonomous agents for the mechanical parts of engineering (writing tests, generating boilerplate, refactoring, documentation) while humans focus on architecture, design, and judgment-heavy decisions.

Open Source Models Keep Closing the Gap

The trend that started with DeepSeek R1 earlier in the year continued through Q3. Open source models are competitive with closed-source offerings for an increasing number of use cases.

Meta's Llama releases continued to push the open source frontier. The models are good enough for many production use cases, and the community around fine-tuning and deployment continues to grow.

The practical implication is that the cost of building AI-powered applications is dropping. When you can run a capable model on your own infrastructure, the economics of AI-assisted workflows change fundamentally. Inference costs are no longer the limiting factor they were a year ago.

For builders of agent systems, open source models open up architectures that were not economically viable before. Running specialized fine-tuned models for specific agent types, using smaller models for simple tasks and larger models for complex ones, deploying models locally for latency-sensitive operations: these are all practical options now.

MCP Ecosystem Grows

The Model Context Protocol ecosystem had a strong quarter. The number of available MCP servers has grown significantly, covering services across developer tools, cloud infrastructure, communication, and enterprise software.

More importantly, the quality of MCP servers has improved. Early MCP servers were often proof-of-concept implementations with minimal error handling and limited feature coverage. The servers being published now are production-grade, with comprehensive API coverage, robust error handling, and proper documentation.

The MCP registry and discovery infrastructure is maturing as well. Finding the right MCP server for a specific service is becoming easier, and installation is becoming more streamlined. We are not at the "npm install" level of simplicity yet, but the trajectory is clear.

Google's A2A protocol announcement earlier in the year is starting to bear fruit with initial implementations. The combination of MCP for agent-to-tool communication and A2A for agent-to-agent communication is creating a comprehensive protocol stack for agent systems.

Reasoning Models Improve

The reasoning model category that DeepSeek R1 and OpenAI's o1 series established has continued to develop. Models that can show their work, think through problems step by step, and self-correct during reasoning produce meaningfully better results on complex tasks.

For agent systems, reasoning models are particularly valuable in the planning and review phases. When a planning agent can reason through the implications of an architectural decision before committing to it, the quality of the plan improves. When a review agent can reason about whether a code change introduces subtle bugs, the review is more thorough.

The cost of reasoning models remains higher than standard models, which creates interesting optimization questions. Not every agent invocation needs deep reasoning. Simple code generation tasks work fine with standard models. Complex architectural decisions benefit from reasoning models. Building agent systems that route to the appropriate model type based on task complexity is an active area of development.
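The routing idea can be sketched in a few lines. This is a minimal illustration, not any provider's API: the model names, the task kinds, and the complexity heuristic are all hypothetical assumptions.

```python
"""Sketch of complexity-based model routing for agent invocations.

Model names and the complexity heuristic are illustrative placeholders,
not a real provider's API.
"""

from dataclasses import dataclass

# Hypothetical model tiers: a cheap standard model vs. a costlier reasoning model.
STANDARD_MODEL = "standard-model"
REASONING_MODEL = "reasoning-model"

# Task kinds assumed to benefit from step-by-step reasoning.
REASONING_KINDS = {"architecture", "planning", "review"}


@dataclass
class Task:
    kind: str          # e.g. "codegen", "architecture", "review"
    description: str


def route(task: Task) -> str:
    """Pick a model tier based on a simple task-complexity heuristic."""
    if task.kind in REASONING_KINDS:
        return REASONING_MODEL
    # Long, open-ended descriptions may also warrant deeper reasoning.
    if len(task.description.split()) > 200:
        return REASONING_MODEL
    return STANDARD_MODEL


print(route(Task("codegen", "add a unit test for the parser")))    # -> standard-model
print(route(Task("architecture", "should we split the service?"))) # -> reasoning-model
```

Real routers tend to be richer than this (learned classifiers, cost budgets, fallback on failure), but the shape is the same: classify the invocation first, then spend reasoning tokens only where they pay off.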

Enterprise Adoption Accelerates

Q3 saw a noticeable shift in enterprise AI adoption. The "experimental" phase is ending. Companies are moving from pilots to production deployments of AI coding tools.

The driver is not hype; it is data. Organizations that ran rigorous pilots are seeing measurable productivity improvements. Not the 10x claims from vendor marketing, but genuine 20 to 40 percent improvements in specific workflows. Those numbers, applied across an engineering organization, translate to meaningful business impact.

The compliance and security concerns that slowed enterprise adoption are being addressed. SOC 2 compliant AI tool deployments, data residency controls, audit logging for AI-generated code changes: the enterprise requirements are being met by vendors and open source solutions alike.

Regulation Takes Shape

Regulatory frameworks for AI continued to develop through Q3. The EU AI Act's implementation is progressing, and other jurisdictions are developing their own approaches. For builders of AI systems, the regulatory landscape is becoming clearer, if not simpler.

The most relevant development for AI agent builders is the emerging consensus around transparency and auditability requirements. Autonomous systems that modify code and interact with production infrastructure need to be auditable. You need to be able to explain what the system did, why it did it, and how to verify that it did the right thing.

This is actually positive for well-engineered autonomous systems. The quality gates, audit logs, and verification mechanisms that responsible builders have already implemented are exactly what regulators are looking for. If you built your system with these features from the start, compliance is a documentation exercise, not an engineering overhaul.
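What an auditable agent action record might look like can be sketched as an append-only log with a hash chain, so that "what the system did, why, and how to verify it" survives after the fact. This is a generic illustration of the idea, not a reference to any specific compliance standard or product.

```python
"""Minimal append-only audit log for agent actions: records what was
done, why, and the change itself, hash-chained so tampering is
detectable. A sketch of the general idea only."""

import hashlib
import json
import time

GENESIS = "0" * 64


class AuditLog:
    def __init__(self):
        self.entries = []
        self._prev_hash = GENESIS

    def record(self, action: str, rationale: str, diff: str) -> dict:
        entry = {
            "ts": time.time(),
            "action": action,        # what the agent did
            "rationale": rationale,  # why it did it
            "diff": diff,            # the actual change, for verification
            "prev": self._prev_hash, # chain to the previous entry
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the hash chain to detect modified entries."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev:
                return False
            body = {k: v for k, v in e.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

If each agent action writes an entry like this, answering a regulator's "what happened and why" becomes a query over the log rather than a reconstruction effort.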

What I Shipped

On the personal front, Q3 was productive.

The Autonomi framework launched, unifying Loki Mode, LokiMCPUniverse, and the application-layer projects under a coherent architecture. This was more of a documentation and integration effort than a feature development effort, but making the connections between projects explicit has been valuable for adoption.

K9s GUI reached a stable release with positive feedback from teams using it to make Kubernetes more accessible to non-platform engineers.

The MIT AI/ML program continued through the summer, and the intersection of theory and practice is proving as valuable as I hoped. Formal foundations are informing practical design decisions in Loki Mode, particularly around the review aggregation and conflict resolution mechanisms.

What to Watch in Q4

Looking ahead, several trends are worth watching.

Agent benchmarks. The industry needs standardized benchmarks for autonomous coding agents. How do you compare Agent A to Agent B? The current approach of anecdotal comparisons and demo videos is not sufficient. Expect to see more rigorous evaluation frameworks emerge.
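The core of such an evaluation framework is simple to sketch: run each agent against a shared task set and count verified successes. The agents and tasks below are toy stand-ins; real benchmarks need sandboxed execution and far richer task definitions.

```python
"""Sketch of a minimal pass-rate benchmark for coding agents.
Agents and tasks are toy stand-ins for illustration."""

from typing import Callable

# A task pairs a prompt with an automated check of the agent's output.
Task = dict  # {"prompt": str, "check": Callable[[str], bool]}


def pass_rate(agent: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose output passes that task's own check."""
    passed = sum(1 for t in tasks if t["check"](agent(t["prompt"])))
    return passed / len(tasks)


tasks = [
    {"prompt": "2+2", "check": lambda out: out == "4"},
    {"prompt": "3*3", "check": lambda out: out == "9"},
]

toy_agent = lambda p: str(eval(p))  # stand-in "agent" for the sketch
print(pass_rate(toy_agent, tasks))  # -> 1.0
```

The hard part is not the harness but the task set: tasks must be realistic, contamination-resistant, and checkable without human judgment, which is exactly where current anecdotal comparisons fall short.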

Multi-agent standardization. A2A implementations will mature, and we will start seeing multi-agent systems that interoperate across frameworks. Today, Loki Mode agents only work within Loki Mode. In the near future, agents from different systems should be able to collaborate.

Cost optimization. As autonomous agents generate more API calls, the cost of running them at scale becomes a first-class concern. Expect innovation in model routing (using cheaper models for simple tasks), caching (reusing results for similar prompts), and batching (combining multiple small requests).
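The caching idea can be illustrated with a response cache keyed on a normalized prompt hash, so trivially different prompts share an entry. The normalization rule and the call interface here are illustrative assumptions, not a real client library.

```python
"""Sketch of prompt-response caching keyed on a normalized prompt hash.
The normalization rule and call interface are illustrative assumptions."""

import hashlib
from typing import Callable

_cache: dict[str, str] = {}


def cache_key(model: str, prompt: str) -> str:
    # Collapse whitespace and case so near-identical prompts share a key.
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()


def cached_call(model: str, prompt: str, call_fn: Callable[[str, str], str]) -> str:
    """Return a cached response if available, otherwise invoke the model."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call_fn(model, prompt)
    return _cache[key]


calls = 0

def fake_model(model: str, prompt: str) -> str:
    global calls
    calls += 1  # count how often the "model" is actually invoked
    return f"response to: {prompt}"


cached_call("standard", "Write a test", fake_model)
cached_call("standard", "write  a test ", fake_model)  # hits the cache
print(calls)  # -> 1
```

Exact-match caching like this only helps with repeated or near-repeated prompts; semantic caching (matching on embedding similarity) trades correctness risk for higher hit rates.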

Safety frameworks. As autonomous systems become more capable, the safety conversation will become more practical. Expect frameworks and tools specifically designed for testing, monitoring, and constraining autonomous AI agents.

Q3 was a quarter where autonomous AI moved from promising to practical. Q4 is where the infrastructure and ecosystem around it solidifies. It is a good time to be building in this space.
