Peer Review Agent
The most reliable way for an AI agent to catch its own mistakes is not to work alone. This example spawns two concurrent actors that tackle the same problem independently, has each one critique the other's output, and then forms a final consensus answer from the two critiques.
Each actor has its own isolated context, its own inference calls, and its own typed result. The parent process uses gather to collect both results, then routes based on agreement between the two reviewers.
// Turn Language: Peer Review Agent
// Two actors independently analyze a problem, then critique each other.
// The parent gathers results and forms a consensus.

struct Analysis {
    summary: Str,
    recommendation: Str,
    risk_level: Str
};

struct Critique {
    agrees: Bool,
    disagreement_reason: Str,
    revised_recommendation: Str
};

struct Consensus {
    final_recommendation: Str,
    confidence_level: Str,
    dissenting_note: Str
};

let subject = "Evaluate the credit risk of a Series B fintech startup with $3M ARR, 120% NRR, and 18 months runway.";

// Analyst A: conservative risk lens
let analyst_a = spawn_link turn() {
    context.append("You are a conservative credit risk analyst. Be cautious.");
    let analysis = infer Analysis { subject; };
    return analysis;
};

// Analyst B: growth-oriented lens
let analyst_b = spawn_link turn() {
    context.append("You are a growth-oriented venture analyst. Weigh upside heavily.");
    let analysis = infer Analysis { subject; };
    return analysis;
};

// Gather both independent analyses concurrently
let results = gather [analyst_a, analyst_b];
let result_a = results[0];
let result_b = results[1];

// Analyst A reviews Analyst B's work
let critique_a = spawn_link turn() {
    context.append("Your prior analysis: " + result_a.recommendation);
    context.append("Peer's analysis: " + result_b.recommendation);
    let critique = infer Critique { "Review the peer's recommendation. Do you agree?"; };
    return critique;
};

// Analyst B reviews Analyst A's work
let critique_b = spawn_link turn() {
    context.append("Your prior analysis: " + result_b.recommendation);
    context.append("Peer's analysis: " + result_a.recommendation);
    let critique = infer Critique { "Review the peer's recommendation. Do you agree?"; };
    return critique;
};

let critiques = gather [critique_a, critique_b];
let crit_a = critiques[0];
let crit_b = critiques[1];

// Form consensus based on whether the reviewers agree
if crit_a.agrees and crit_b.agrees {
    let consensus = infer Consensus {
        "Both analysts agree. Synthesize a high-confidence final recommendation.";
        result_a.summary;
        result_b.summary;
    };
    call("echo", "[CONSENSUS] " + consensus.final_recommendation);
    call("echo", "[CONFIDENCE] " + consensus.confidence_level);
} else {
    let consensus = infer Consensus {
        "The analysts disagree. Write a balanced recommendation that acknowledges both views.";
        crit_a.disagreement_reason;
        crit_b.disagreement_reason;
    };
    call("echo", "[SPLIT OPINION] " + consensus.final_recommendation);
    call("echo", "[DISSENT] " + consensus.dissenting_note);
}

How It Works
The pattern has three distinct phases:
Phase 1: Independent Analysis. Two actors are spawned with different system prompts, giving each a distinct analytical persona. Because each actor has its own isolated context window, neither sees the other's reasoning during this phase. gather collects both results concurrently, so the total wall time is that of the slower analyst, not both combined.
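The wall-time claim can be checked with a plain Python asyncio analogy (this is not Turn itself; the task names and delays are invented for illustration): two independent tasks gathered together finish in roughly the time of the slower one, not the sum.

```python
import asyncio
import time

async def analyst(name: str, seconds: float) -> str:
    # Stand-in for one actor's isolated inference call.
    await asyncio.sleep(seconds)
    return f"analysis from {name}"

async def main() -> float:
    start = time.monotonic()
    # Like Turn's `gather [analyst_a, analyst_b]`: both run concurrently.
    results = await asyncio.gather(analyst("A", 0.2), analyst("B", 0.1))
    assert len(results) == 2
    return time.monotonic() - start

elapsed = asyncio.run(main())
# elapsed tracks the slower task (~0.2s), not the 0.3s sum.
```

Run sequentially instead of gathered, the same two tasks would take at least 0.3s; the gather keeps total latency at the slower analyst's.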
Phase 2: Cross-Critique. Each analyst is shown their own prior recommendation alongside their peer's. They must produce a typed Critique with a boolean agrees field and a structured revised_recommendation. The Bool field is critical: it makes disagreement a machine-readable signal the parent can route on, not just text for a human to parse.
Phase 3: Consensus Routing. The parent reads the two agrees booleans and follows one of two typed infer Consensus calls, each with a distinct prompt that reflects the actual state of agreement. The final output is always a structured Consensus record, not unstructured prose.
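The routing step reduces to a boolean conjunction over the two critiques. A minimal Python sketch of the same decision (the Critique shape mirrors the Turn struct; the field values are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Critique:
    agrees: bool
    disagreement_reason: str
    revised_recommendation: str

def route(crit_a: Critique, crit_b: Critique) -> str:
    # High confidence only when both reviewers endorse the peer's work.
    if crit_a.agrees and crit_b.agrees:
        return "consensus"
    return "split"

a = Critique(True, "", "hold at current limit")
b = Critique(False, "runway risk understated", "reduce exposure")
print(route(a, b))  # -> split
```

Because agrees is a typed boolean rather than free text, this branch never depends on parsing the model's prose.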
Why This Is Self-Reflection
Standard prompt engineering asks a single model to "review your own answer." The model almost always endorses its own output because its context already contains the reasoning that led to it.
Turn's approach is structurally different:
| Approach | Reviewer's Context | Disagreement Mechanism |
|---|---|---|
| Prompt trick ("review yourself") | Contains the original reasoning | Rarely fires; model is anchored |
| Peer Review Agent | Fresh actor, no prior context | Disagreement is a typed Bool |
The second actor has no memory of how the first conclusion was reached. It can only evaluate the output. This is the same mechanism that makes peer review effective in academic publishing: reviewers are not the authors.
Running It
export TURN_LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
turn run peer_review_agent.tn --id peer_review
Use turn inspect peer_review after the run to see the full belief state and context of each actor. You will see four distinct process entries, one per spawned actor, each with its own context snapshot.