
How to Build Dyad: The Production-Grade Blueprint

A deep dive into building Dyad, the dual-agent Generator/Verifier architecture. Full tech stack breakdown, implementation code, and scaling strategies for enterprise use cases.


Building a Dyad is the architectural answer to the “confident hallucination” problem. While beginners rely on single-shot prompts and intermediates reach for RAG, senior architects implement Dyads: dual-agent systems where a Generator and a Verifier work in a tight, iterative loop to ensure output reliability before the user ever sees a response.

Note: This guide focuses on the Dyad Architectural Pattern (the Generator-Verifier loop), not the local low-code tool dyad-sh or the JuliaHub simulation platform.

🏗️ The Architecture

We are not just scripting; we are engineering a self-correcting system.

The Dyad Architecture consists of two distinct LLM personas that share a stateful memory (Redis).

  1. The Generator (The Artist): Optimized for speed and creativity. It drafts the initial response.
  2. The Verifier (The Critic): Optimized for reasoning and logic. It strictly evaluates the draft against a set of constraints or facts.
  3. The Controller (The Loop): A deterministic state machine that manages the conversation between the two. If the Verifier rejects the draft, the Controller passes the feedback back to the Generator for a retry.

This “Dyadic” communication reduces hallucination rates by forcing the system to “show its work” and critique its own drafts before anything reaches the user.

🛠️ The Stack

  • Orchestration: Python 3.12+ (Typed)
  • Data Validation: Pydantic (Strict schemas are non-negotiable here).
  • Model Routing: LiteLLM (lets us use a cheap model for Generation and a state-of-the-art model for Verification).
  • State Management: Redis (To store the “internal monologue” and prevent infinite loops).
  • Observability: LangFuse (To trace the multi-turn internal dialogue).
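
To follow along, install the stack from PyPI. A minimal sketch; the version pin is an illustrative assumption, not a requirement:

pip install "pydantic>=2" litellm redis langfuse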

💻 Implementation

Here is the core DyadEngine. We use Pydantic to force the Verifier to return structured feedback, not just free text. This is critical for programmatic decision-making.

import json
import logging
from typing import List, Optional

import redis
from litellm import completion
from pydantic import BaseModel, Field

# Configure Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("DyadEngine")

# --- 1. Define the Structures ---

class VerificationResult(BaseModel):
    """The structured output from the Verifier Agent."""
    is_valid: bool = Field(..., description="True if the response meets all criteria.")
    score: int = Field(..., description="Quality score from 0-100.")
    reasoning: str = Field(..., description="Detailed explanation of the critique.")
    corrections: List[str] = Field(default_factory=list, description="Specific instructions for the Generator to fix errors.")

class DyadResponse(BaseModel):
    final_output: str
    iterations: int
    confidence_score: int
    history: List[dict]

# --- 2. The Dyad Engine ---

class DyadEngine:
    def __init__(self, 
                 redis_url: str = "redis://localhost:6379",
                 gen_model: str = "gpt-4o-mini", # Fast, cheaper
                 ver_model: str = "gpt-4o",      # Smart, expensive
                 max_retries: int = 3):
        
        self.redis = redis.from_url(redis_url)
        self.gen_model = gen_model
        self.ver_model = ver_model
        self.max_retries = max_retries

    def _generate(self, context: str, feedback: Optional[str] = None) -> str:
        """The Generator Persona: Drafts or Refines."""
        messages = [
            {"role": "system", "content": "You are a precise technical writer. Output only the requested content."}
        ]
        
        if feedback:
            # If refining, we inject the previous context and the critique
            messages.append({"role": "user", "content": f"Context: {context}\n\nPrevious draft was rejected. Feedback: {feedback}\n\nPlease rewrite."})
        else:
            messages.append({"role": "user", "content": context})

        response = completion(model=self.gen_model, messages=messages)
        return response.choices[0].message.content

    def _verify(self, context: str, draft: str) -> VerificationResult:
        """The Verifier Persona: Critiques."""
        prompt = f"""
        You are a Senior QA Engineer. Review the following draft against the user request.
        
        User Request: {context}
        Draft Response: {draft}
        
        Analyze for:
        1. Factual Accuracy
        2. Adherence to constraints
        3. Tone and Style
        
        Return your assessment in JSON.
        """
        
        response = completion(
            model=self.ver_model,
            messages=[{"role": "user", "content": prompt}],
            response_format=VerificationResult  # LiteLLM converts the Pydantic model to a JSON schema on supported providers
        )
        
        # LiteLLM/Pydantic parsing (simplified for brevity)
        try:
            content = response.choices[0].message.content
            return VerificationResult.model_validate_json(content)
        except Exception as e:
            logger.error(f"Verification parsing failed: {e}")
            # Fail safe: reject if we can't parse
            return VerificationResult(is_valid=False, score=0, reasoning="Parser Error", corrections=["System Error"])

    def run(self, user_prompt: str, session_id: str) -> DyadResponse:
        """The Main Control Loop."""
        logger.info(f"Starting Dyad for session {session_id}")
        
        # Persist every step under this key so a crashed worker can resume
        # (see Pitfall #3 below)
        cache_key = f"dyad:{session_id}"
        
        current_draft = self._generate(user_prompt)
        history = [{"step": "initial_draft", "content": current_draft}]
        self.redis.set(cache_key, json.dumps(history))
        
        for i in range(self.max_retries):
            logger.info(f"Iteration {i+1}/{self.max_retries}")
            
            # 1. Verify
            critique = self._verify(user_prompt, current_draft)
            history.append({"step": f"verification_{i}", "critique": critique.model_dump()})
            
            # 2. Check Exit Condition
            if critique.is_valid or critique.score > 90:
                logger.info("Draft accepted.")
                return DyadResponse(
                    final_output=current_draft,
                    iterations=i+1,
                    confidence_score=critique.score,
                    history=history
                )
            
            # 3. Refine (Loop)
            logger.info(f"Draft rejected (Score: {critique.score}). Refining...")
            feedback_str = f"Score: {critique.score}. Reasoning: {critique.reasoning}. Fix these: {', '.join(critique.corrections)}"
            current_draft = self._generate(user_prompt, feedback=feedback_str)
            history.append({"step": f"refinement_{i}", "content": current_draft})

        # Fallback if max retries reached
        logger.warning("Max retries reached. Returning best effort.")
        return DyadResponse(
            final_output=current_draft,
            iterations=self.max_retries,
            confidence_score=critique.score, # Score of the last check
            history=history
        )

# Example Usage
if __name__ == "__main__":
    # Ensure OPENAI_API_KEY is set
    engine = DyadEngine()
    result = engine.run("Write a Python function to calculate Fibonacci numbers efficiently.", "sess_123")
    print(f"Final Output:\n{result.final_output}")
    print(f"Confidence: {result.confidence_score}%")

⚠️ Production Pitfalls (The “Senior” Perspective)

When scaling this to 10k concurrent users, the simple loop above will kill your latency and your wallet. Here is what breaks and how to fix it:

1. Latency Spikes (The “Double Cost” Problem)

The Issue: A Dyad architecture inherently doubles (or triples) the token count per request because of the verification loop.

The Fix:

  • Optimistic UI: Stream the Generator’s initial draft to the frontend immediately, but mark it as “Verifying…”. If the Verifier catches an error, update the UI with the refined version. This keeps perceived latency low (see the streaming sketch after this list).
  • Model Tiering: Use a distilled model (e.g., Llama-3-8b) for the Generator and a reasoning model (GPT-4o) ONLY for the Verifier.
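
A minimal sketch of the optimistic-UI flow. It relies on litellm’s OpenAI-style streaming (stream=True yields delta chunks); on_token and on_status are hypothetical callbacks that your transport layer (SSE, WebSocket) would supply:

def stream_with_verification(engine: DyadEngine, user_prompt: str,
                             on_token, on_status) -> str:
    """Stream the draft immediately, verify afterwards, patch the UI if needed."""
    on_status("drafting")
    tokens = []
    stream = completion(
        model=engine.gen_model,
        messages=[{"role": "user", "content": user_prompt}],
        stream=True,  # yields OpenAI-style delta chunks
    )
    for chunk in stream:
        piece = chunk.choices[0].delta.content or ""
        tokens.append(piece)
        on_token(piece)  # render now, flagged as "Verifying..."
    draft = "".join(tokens)

    on_status("verifying")
    critique = engine._verify(user_prompt, draft)
    if critique.is_valid:
        on_status("verified")
        return draft

    # The Verifier caught a problem: swap the refined draft into the UI.
    # (One refinement pass for brevity; production code would reuse run().)
    on_status("revising")
    refined = engine._generate(user_prompt, feedback=critique.reasoning)
    on_status("verified")
    return refined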

2. The “Sycophancy” Loop

The Issue: Sometimes the Generator will simply agree with the Verifier without actually fixing the code, or the Verifier will get stuck in a loop of nitpicking minor details.

The Fix:

  • Temperature Decay: Lower the temperature of the Generator on every retry (e.g., 0.7 -> 0.5 -> 0.1) to force convergence (sketched after this list).
  • Divergence Penalty: In the prompt, explicitly forbid the Generator from apologizing. “Do not apologize. Just output the corrected code.”
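
A sketch of the decay schedule (values mirror the example above). Threading the retry index into _generate is a small, hypothetical extension of the engine; attempt would come from the loop counter in run():

def decayed_temperature(attempt: int) -> float:
    """Retry 0 -> 0.7, retry 1 -> 0.5, retry 2 and later -> 0.1."""
    schedule = [0.7, 0.5, 0.1]
    return schedule[min(attempt, len(schedule) - 1)]

# Inside _generate, the LLM call would then become:
# response = completion(model=self.gen_model, messages=messages,
#                       temperature=decayed_temperature(attempt))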

3. State Management

The Issue: If your container crashes during refinement_2, the user loses progress.

The Fix:

  • Persist every step to Redis (as the engine above does after each draft and critique). If the process dies, the next worker can pick up the history and resume from the last critique rather than starting from scratch (see the sketch below).
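
Since run() already writes history to Redis after every step, recovery is a read-and-resume. A minimal sketch, assuming the dyad:{session_id} key format used above:

def resume_history(engine: DyadEngine, session_id: str) -> List[dict]:
    """Load persisted steps; an empty list means start fresh."""
    raw = engine.redis.get(f"dyad:{session_id}")
    if not raw:
        return []
    history = json.loads(raw)
    logger.info(f"Resuming {session_id} from step '{history[-1]['step']}'")
    return history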

🚀 Final Verdict

The Dyad architecture is the difference between a “demo” and an “enterprise product.”

If you are building an internal tool where accuracy is paramount (e.g., Legal AI, FinTech), you must use a Dyad. The cost of extra tokens is negligible compared to the cost of a liability lawsuit from a hallucination.

Start with the code above, keep the Redis persistence, and then tune your “Verifier” prompt; that is where 80% of your system’s intelligence lives.