Published: April 30, 2025 · 6 Min Read
Written by Nicolas Belissent, AI Engineer

Tools not Rules

Designing a really good agent is hard. You want responses that answer user questions. You want sources for users to dig deeper. You need a consistent output format without streaming issues. These basic user expectations seem simple. Yet they create a complex technical challenge.

As we worked to address these challenges, we began running into the limitations of a purely rule-based approach, even with step-wise frameworks like LangGraph that make it easier to build. We kept hearing about the Model Context Protocol (MCP) and decided to explore whether it might solve some of our specific challenges.

Understanding the current state

At StackOne, we've built StackChat, a retrieval-augmented agent that delivers precise API information by dynamically accessing and synthesizing answers from our proprietary knowledge base, with custom retrieval and specialized processing of API specifications. Currently used internally by our solutions engineers, sales team, and integrations team, this tool streamlines access to our technical documentation.

StackChat currently uses a Directed Acyclic Graph (DAG) within its AI pipelines. This structure consists of a workflow in which components are executed in a defined sequence. Think of it as an assembly line for information — each station (node) performs its specialized task before passing results to the next station.

This implementation followed a straightforward architecture (a code sketch follows the list):

  1. Question Classifier: Categorises incoming queries as either:
  • Irrelevant queries (sent to a spam response)
  • Easy queries (processed directly without additional context)
  • Hard queries (requiring complex retrieval and reasoning)
  2. Planner: For complex queries, generates lists of sub-queries for searches.
  3. Searcher: Multiple search paths executed in parallel:
  • Vector search against our knowledge base
  • Web search for supplementary information
  4. Search Evaluator: Reranks search results for relevance before passing them to the response generator.
  5. Feedback Loop: Can trigger additional searches when needed.
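
To make the shape of this concrete, here is a minimal, runnable sketch of such a pipeline. Every function name and body here is a hypothetical stub; StackChat's real nodes call models and our retrieval stack, and the feedback loop is omitted for brevity.

```python
# Minimal sketch of the DAG described above. All node names and bodies
# are hypothetical stubs, not StackChat's actual implementation.
from concurrent.futures import ThreadPoolExecutor


def classify(query: str) -> str:
    """Question Classifier node (stubbed heuristic)."""
    return "hard" if "api" in query.lower() else "easy"


def plan(query: str) -> list[str]:
    """Planner node: break a hard query into sub-queries (stub)."""
    return [query, f"{query} examples", f"{query} error codes"]


def vector_search(sub_query: str) -> list[str]:
    return [f"kb passage for '{sub_query}'"]   # stub


def web_search(sub_query: str) -> list[str]:
    return [f"web result for '{sub_query}'"]   # stub


def rerank(query: str, docs: list[str]) -> list[str]:
    """Search Evaluator node: order docs by relevance (stub)."""
    return sorted(docs)


def generate(query: str, context: list[str]) -> str:
    return f"Answer to '{query}' grounded in {len(context)} documents."


def run_pipeline(query: str) -> str:
    label = classify(query)
    if label == "irrelevant":
        return "Sorry, that's outside what I can help with."
    if label == "easy":
        return generate(query, [])
    # Hard path: plan, fan searches out in parallel, rerank, generate.
    sub_queries = plan(query)
    with ThreadPoolExecutor() as pool:
        kb = list(pool.map(vector_search, sub_queries))
        web = list(pool.map(web_search, sub_queries))
    docs = [d for batch in kb + web for d in batch]
    return generate(query, rerank(query, docs))


print(run_pipeline("How do I paginate the HRIS employees API?"))
```

Each node only ever sees its own inputs and outputs; the overall route through the graph is fixed in code rather than decided by the model.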

The Strengths

DAG architectures excel at structured workflows with clearly defined steps. Our implementation gave us several key advantages:

  • Control over structure: We could precisely define how information flowed through the system
  • Control over models: Each node could use different models optimized for specific tasks
  • Comprehensive logging: Every step produced logs, enabling detailed analysis and debugging
  • Deterministic behavior: The same input reliably produced the same output path
  • Effective guardrails: We could implement safety checks at each processing stage
  • Consistent formatting: Response structures could be strictly enforced

The architecture provided visibility into the processing pipeline and made it straightforward to trace information flow through the system.

Despite these advantages, we hit a wall with our DAG approach. We found ourselves endlessly tweaking prompts and optimizing retrieval metrics without knowing whether we were making real improvements. The relationship between better vector-search results and better final answers remained unclear. Our models' responses were constrained by node-specific prompts rather than applying their reasoning capabilities holistically. This problem only became more evident with the release of advanced reasoning models like GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Our stepwise architecture was likely not fully leveraging these models' advanced reasoning, improved tool use, and enhanced context processing.

The Pain Points

The biggest problem with our DAG architecture was its rigidity. The system followed fixed execution paths that couldn’t adapt to varying information needs or processing requirements.

Unpredictable Query Needs

Our DAG system couldn’t predict what information different questions would need. We had set question types, but we couldn’t tell how many searches each query would require. Simple questions often triggered too many searches. Complex questions didn’t get enough context. The planning component made poor judgments about search depth. This wasted resources or led to incomplete responses.

The Reranking Problem

Reranking became our biggest bottleneck: evaluating reranking quality for complex API specifications proved difficult. Our fine-tuning attempts yielded poor results, and we questioned whether our chunking strategy for complex JSON documents was capturing the critical information. With our entire RAG pipeline dependent on effective reranking, these limitations significantly impacted retrieval quality and overall system performance.
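
For readers unfamiliar with the pattern, this is the kind of reranking step we mean, sketched with an off-the-shelf cross-encoder from sentence-transformers. The model name and the toy chunks are illustrative; this is not our production reranker or chunking strategy.

```python
# Illustration of a cross-encoder reranking step. The model and the
# toy JSON chunks are examples, not StackChat's production setup.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Which status codes does the employees endpoint return?"
chunks = [
    '{"path": "/employees", "responses": {"200": {}, "429": {}}}',
    '{"path": "/time-off", "responses": {"200": {}}}',
]

# Score each (query, chunk) pair, then sort chunks by descending score.
scores = reranker.predict([(query, chunk) for chunk in chunks])
ranked = [c for _, c in sorted(zip(scores, chunks), reverse=True)]
print(ranked[0])
```

The hard part is not running such a model but judging whether its ordering is actually good for dense, structured API-spec chunks, which is where our evaluation efforts stalled.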

Prompt Engineering Challenges

Managing prompts across classification, planning, evaluation, and generation created significant maintenance challenges. Our prompt library grew huge with numerous edge cases and complex conditional logic.

Architectural Rigidity

The rigid structure couldn’t dynamically decide when to query more or less. Determining whether to use the vector database, web search, or just rely on the reasoning model’s knowledge required predefined rules rather than adaptive judgment. The system’s behavior was also heavily influenced by the examples provided in the prompt. This inflexibility meant we were often retrieving too much or too little information for optimal responses.

Introducing Model Context Protocol

Model Context Protocol takes a fundamentally different approach. Rather than defining a fixed series of steps, MCP empowers the model (or agent) itself to determine what information it needs and when. The model is given access to tools and the autonomy to decide when and how to use them based on each query’s requirements.

For the first iteration of our stackchat-mcp toolset, we created a diverse set of tools that give the model access to capabilities including (a server sketch follows the list):

  • Keyword search for document matching
  • Vector search across our knowledge base
  • cURL tool for direct API interaction
  • OpenAPI specification summarizer
  • Web search for supplementary information
  • Provider metadata tools
  • YAML validation capabilities
  • Status code explanations

With MCP, the execution flow emerges dynamically based on the model’s assessment of what tools are needed for each specific query.

Advantages of Model Context Protocol

Dynamic Tool Usage

This is a major strength of MCP. Having the client agent determine which tools to use and when to use them creates a more adaptive system that can handle unpredictable queries without requiring explicit programming for every scenario. The elimination of rigid query planning is particularly valuable for complex information needs where the optimal tool sequence isn’t known in advance.
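
Schematically, the control flow inverts: instead of a graph dictating the steps, a loop hands control to the model until it declares itself done. In this sketch, call_model is a hard-wired stand-in for the hosted reasoning model; no vendor SDK is implied.

```python
# Schematic of the dynamic execution loop: the model, not a graph,
# decides which tool to call next. call_model and TOOLS are
# illustrative stand-ins, not a specific vendor API.

TOOLS = {
    "vector_search": lambda q: f"kb passages for '{q}'",
    "web_search": lambda q: f"web results for '{q}'",
}


def call_model(messages: list[dict]) -> dict:
    """Stand-in for a hosted reasoning model that may request a tool."""
    if len(messages) == 1:  # first turn: the "model" asks for context
        return {"tool": "vector_search", "args": messages[0]["content"]}
    return {"answer": "Final answer grounded in the tool results."}


def run_agent(query: str) -> str:
    messages = [{"role": "user", "content": query}]
    while True:
        step = call_model(messages)
        if "answer" in step:  # model decided it has enough context
            return step["answer"]
        # Model requested a tool: execute it and feed the result back.
        result = TOOLS[step["tool"]](step["args"])
        messages.append({"role": "tool", "content": result})


print(run_agent("How do I authenticate against the ATS connector?"))
```

The same loop serves every query type; whether it runs one tool call or ten is the model's judgment, not ours.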

Natural Relevance Assessment

This benefit addresses one of the most challenging aspects of traditional systems — determining what information is actually relevant to the user’s query. By allowing the same intelligence to both evaluate information and generate responses, MCP creates a more coherent experience. The model can make nuanced judgments about relevance that would be difficult to capture in separate reranking systems.

Architectural Simplicity

The reduction in system complexity is perhaps the most practical advantage from an engineering perspective. By eliminating separate classification and planning stages, MCP creates a more maintainable architecture. Adding new capabilities now means simply adding new tools; this modularity allows faster iteration and expansion of system capabilities without rewriting orchestration logic.

While MCP has solved many challenges, it has its own limitations.

Limitations of Model Context Protocol

Decreased System Ownership

With MCP, the model controls the entire interaction flow. This cuts down on bugs in our own code, but we've lost significant control over response generation. The protocol hands most of that control to the hosted model, making it harder to showcase specific features or guarantee certain behaviors. Because the model picks tools dynamically, we depend more on components we don't directly manage. Even with careful tool design, the ultimate decision authority rests with the model, creating unpredictability in how our system behaves in edge cases.

Limited Custom Formatting

MCP offers less control over response formatting since the model creates the entire response after using tools. In traditional DAG approaches, we could format each node’s output separately. Now we depend entirely on the model’s natural formatting abilities, especially for things like citations and API responses. Precise formatting needs require extra processing steps that wouldn’t be necessary in a more controlled system. The inability to directly influence formatting during generation means additional complexity in our post-processing pipeline.
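
As a small illustration of that post-processing, the sketch below normalises two hypothetical ad-hoc citation styles the model might emit into a single house format. The patterns are invented for the example; they are not our production rules.

```python
# Example of the extra post-processing MCP pushes onto us: collapsing
# whatever citation style the model happened to emit into one format.
# The two input styles here are hypothetical.
import re

RAW = "Use /employees (see: hris-guide) and retry on 429 [docs: rate-limits]."


def normalise_citations(text: str) -> str:
    # Collapse both "(see: X)" and "[docs: X]" into the house style "[X]".
    text = re.sub(r"\(see:\s*([\w-]+)\)", r"[\1]", text)
    text = re.sub(r"\[docs:\s*([\w-]+)\]", r"[\1]", text)
    return text


print(normalise_citations(RAW))
# -> "Use /employees [hris-guide] and retry on 429 [rate-limits]."
```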

Reduced Logging and Observability

Reasoning steps stay hidden inside the hosted model. We can log tool calls but can't see why tools were chosen or how their results were used. Remote processing makes it hard to track how individual pieces of retrieved data influence the final response. Instead of inspecting a clear pipeline, we must analyze input-output patterns. This means less visibility into tool usage and less explanation of the model's reasoning. Even with additional logging tools, we're still inferring rather than directly observing the model's decision process.
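
What remains observable is the tool boundary. A wrapper like the sketch below can capture inputs, outputs, and latency for every call, but the "why this tool, why now" stays inside the model.

```python
# Sketch of the observability that remains: log every tool invocation
# at the boundary. The wrapped tool is a stub for illustration.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stackchat.tools")


def logged_tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        log.info(
            "tool=%s args=%s kwargs=%s latency_ms=%.1f",
            fn.__name__, args, kwargs, (time.perf_counter() - start) * 1000,
        )
        return result
    return wrapper


@logged_tool
def vector_search(query: str) -> list[str]:
    return [f"stub passage for '{query}'"]  # stub body


vector_search("webhook retry policy")
```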

Conclusion

This "tools, not rules" approach has created a system that our wider team can more easily interact with, suggest improvements for, and use more effectively.

While Model Context Protocol isn’t perfect, it has addressed the fundamental paradox of our DAG architecture — the uncomfortable combination of unpredictability and rigidity that made certain scenarios challenging. By embracing the model itself as the orchestrator, we’ve built a more adaptable system for complex queries.

Importantly, both our DAG and MCP approaches provide value in different contexts. Our DAG implementation continues to serve us well in scenarios requiring precise control, consistent formatting, and comprehensive logging, while the MCP approach excels with unpredictable queries that demand dynamic information gathering. Rather than seeing these as competing architectures, we view them as complementary tools in our AI engineering toolkit, each with its proper use case.
