Inference Providers
When you write infer in a Turn program, the VM does not construct provider-specific HTTP requests itself. Instead, it delegates request construction and response parsing to a Wasm inference driver: a sandboxed WebAssembly plugin that knows how to talk to a specific LLM provider.
This page explains how the system works, how to configure it, and how to write your own.
The Architecture
Turn's inference pipeline is built on a strict security boundary: the Turn VM is the only component that can access the network. The Wasm driver is purely computational.
Turn VM (Host)
│
│ (1) Passes Turn Inference Request JSON to Wasm module
▼
Wasm Driver (sandboxed)
│
│ (2) Returns HTTP Config JSON: URL, headers (with $env: templates), body
│ The driver CANNOT access the network or filesystem
▼
Turn VM (Host)
│
│ (3) Substitutes $env:OPENAI_API_KEY → real value from process environment
│ (4) Executes the HTTPS call via reqwest
│ (5) Passes raw HTTP response JSON back to Wasm module
▼
Wasm Driver (sandboxed)
│
│ (6) Parses HTTP response → structured Turn result JSON
▼
Turn VM (Host)
│
└──▶ result bound to the infer expression
Why this matters:
- A Wasm driver cannot read your SSH keys, scan your disk, or exfiltrate your API keys. The Wasm sandbox prevents all system calls.
- Credentials are never in driver code. The driver writes $env:OPENAI_API_KEY as a template string. The Host substitutes the real value before making the HTTP call.
- A single .wasm file runs everywhere. macOS, Linux, and Windows are all supported wherever the Turn VM runs. No native binary distribution per platform.
- Microsecond cold starts. Wasm modules initialize in under 100μs vs. 10–50ms for an OS subprocess.
A Wasm inference driver is a pure transformation function: JSON string → JSON string. It has no host imports. It cannot access the network, filesystem, environment variables, or system clock directly.
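Concretely, the first pass receives a request resembling the sketch below. The exact envelope is defined by the Turn VM; the field names here are inferred from the reference driver later on this page, which reads only params.prompt and params.schema, so treat this as illustrative rather than normative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "params": {
    "prompt": "Summarize the following support ticket in one sentence.",
    "schema": {
      "type": "object",
      "properties": { "summary": { "type": "string" } }
    }
  }
}
```

The driver's only job is to rewrite this into an HTTP config for its provider; it never sees a network socket.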
Configuring a Provider
Set TURN_LLM_PROVIDER to specify the inference provider to use:
export TURN_LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
turn run my_agent.tn
The provider path is resolved once at VM startup. All infer calls in the program use the same provider.
Official Providers
All official drivers are compiled to wasm32-unknown-unknown and available in the Turn repository under providers/:
Standard OpenAI
Connects to api.openai.com. Uses OpenAI's structured outputs (JSON Schema mode) for Cognitive Type Safety.
export TURN_LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o # optional, default: gpt-4o
Azure OpenAI
Connects to your Azure OpenAI deployment endpoint.
export TURN_LLM_PROVIDER=azure_openai
export AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com
export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_DEPLOYMENT=gpt-4o
Azure AI Foundry Anthropic
Connects to Anthropic's Claude via Azure AI Foundry (not the direct Anthropic API).
export TURN_LLM_PROVIDER=azure_anthropic
export AZURE_ANTHROPIC_ENDPOINT=https://my-foundry-resource.azure.com
export AZURE_ANTHROPIC_API_KEY=...
Anthropic
Connects directly to api.anthropic.com. Uses Anthropic's Messages API with structured system prompts for Cognitive Type Safety.
export TURN_LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_MODEL=claude-3-5-sonnet-20241022 # optional, default: claude-3-5-sonnet-20241022
Google Gemini
Connects to generativelanguage.googleapis.com. Uses Gemini's generateContent API with system_instruction support for structured prompts.
export TURN_LLM_PROVIDER=gemini
export GEMINI_API_KEY=...
export GEMINI_MODEL=gemini-1.5-pro # optional, default: gemini-1.5-pro
xAI Grok
Connects to api.x.ai using the OpenAI-compatible chat completions endpoint.
export TURN_LLM_PROVIDER=grok
export XAI_API_KEY=...
export GROK_MODEL=grok-3 # optional, default: grok-3
Ollama
Connects to a local Ollama server via /api/chat. No API key required. Ideal for local development and air-gapped deployments.
export TURN_LLM_PROVIDER=ollama
export OLLAMA_MODEL=llama3 # optional, default: llama3
export OLLAMA_HOST=http://localhost:11434 # optional, default: http://localhost:11434
turn run my_agent.tn
TIP
Ollama supports a wide range of open-weight models. Run ollama pull llama3 (or any model) before starting your Turn program.
The $env: Template Syntax
Wasm drivers use $env:VARIABLE_NAME placeholders in their HTTP config output. The Turn Host resolves these before executing the request:
{
"url": "https://api.openai.com/v1/chat/completions",
"method": "POST",
"headers": {
"Authorization": "Bearer $env:OPENAI_API_KEY",
"Content-Type": "application/json"
},
"body": { "model": "$env:OPENAI_MODEL", "messages": [...] }
}
After substitution, the HTTP request the Host sends uses your real credentials, but the .wasm file itself never contains or reads them.
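The substitution pass is easy to sketch. The function below is a hypothetical stand-in for Turn's actual host logic: it expands $env:NAME placeholders in a string, assuming names consist only of ASCII uppercase letters, digits, and underscores, and expands unset variables to the empty string:

```rust
use std::env;

// Hypothetical sketch of the host-side substitution pass (not Turn's real
// implementation). Expands every `$env:NAME` occurrence in `input`, where
// NAME is a run of ASCII uppercase letters, digits, and underscores.
fn substitute_env(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(idx) = rest.find("$env:") {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..idx]);
        let after = &rest[idx + "$env:".len()..];
        // The variable name ends at the first character outside [A-Z0-9_].
        let end = after
            .find(|c: char| !(c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..end];
        // Unset variables expand to the empty string in this sketch.
        out.push_str(&env::var(name).unwrap_or_default());
        rest = &after[end..];
    }
    out.push_str(rest);
    out
}
```

Running this over every string value in the driver's HTTP config (URL, headers, and body) yields the final request, so secrets exist only in the host process, never in the sandbox.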
Writing Your Own Provider
A Turn inference driver is a Rust cdylib compiled to wasm32-unknown-unknown. It must export exactly three C-ABI functions:
use serde_json::{json, Value};
// Memory management - the Turn host calls this to allocate space for JSON strings
#[no_mangle]
pub extern "C" fn alloc(len: u32) -> u32 {
let mut buf: Vec<u8> = Vec::with_capacity(len as usize);
let ptr = buf.as_mut_ptr();
std::mem::forget(buf);
ptr as usize as u32
}
// Pass 1: Turn Request → HTTP Config
// Input: JSON string (Turn Inference Request)
// Output: JSON string (HTTP Request Config with $env: templates)
// Returns: packed u64 = (ptr << 32) | len
#[no_mangle]
pub unsafe extern "C" fn transform_request(ptr: u32, len: u32) -> u64 {
let input = read_string(ptr, len);
let req: Value = serde_json::from_str(&input).unwrap();
let prompt = req["params"]["prompt"].as_str().unwrap_or("");
let schema = &req["params"]["schema"];
let body = json!({
"model": "$env:MY_MODEL",
"messages": [{"role": "user", "content": prompt}],
"response_format": { "type": "json_object", "schema": schema }
});
let config = json!({
"url": "https://my-llm-provider.com/v1/completions",
"method": "POST",
"headers": { "Authorization": "Bearer $env:MY_API_KEY" },
"body": body
});
pack_string(config.to_string())
}
// Pass 2: HTTP Response → Turn Result
// Input: JSON string (HTTP response: { status, headers, body })
// Output: JSON string (JSON-RPC result: { jsonrpc, id, result } or { error })
#[no_mangle]
pub unsafe extern "C" fn transform_response(ptr: u32, len: u32) -> u64 {
let input = read_string(ptr, len);
let http_res: Value = serde_json::from_str(&input).unwrap();
if http_res["status"].as_u64().unwrap_or(0) != 200 {
return pack_string(json!({
"jsonrpc": "2.0", "id": 1,
"error": format!("HTTP {}: {}", http_res["status"], http_res["body"])
}).to_string());
}
// Parse provider-specific response format
let response: Value = serde_json::from_str(
http_res["body"].as_str().unwrap_or("{}")
).unwrap_or(json!({}));
let content = response["choices"][0]["message"]["content"].as_str().unwrap_or("{}");
let result: Value = serde_json::from_str(content).unwrap_or(json!(content));
pack_string(json!({ "jsonrpc": "2.0", "id": 1, "result": result }).to_string())
}
// ── Helpers ──────────────────────────────────────────────────────────────────
unsafe fn read_string(ptr: u32, len: u32) -> String {
let buf = Vec::from_raw_parts(ptr as *mut u8, len as usize, len as usize);
String::from_utf8_lossy(&buf).into_owned()
}
fn pack_string(s: String) -> u64 {
let len = s.len() as u64;
let mut buf = s.into_bytes();
let ptr = buf.as_mut_ptr() as u64;
std::mem::forget(buf);
(ptr << 32) | len
}
Build it:
# Install the Wasm target if you haven't already
rustup target add wasm32-unknown-unknown
# Build
cargo build --target wasm32-unknown-unknown --release
# The driver is at:
ls target/wasm32-unknown-unknown/release/my_provider.wasm
# Use it
export TURN_LLM_PROVIDER=custom
TIP
Because providers are pure JSON transformers, they can target any HTTP API: a local Ollama server, Llama.cpp, a private inference cluster, or a custom gateway. The Wasm model means the Turn community can build and distribute drivers for any provider without touching the core VM.
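For instance, a driver targeting a local Llama.cpp server through its OpenAI-compatible endpoint might emit a config like the following (the URL, port, and model variable name are illustrative assumptions, not an official Turn provider):

```json
{
  "url": "http://localhost:8080/v1/chat/completions",
  "method": "POST",
  "headers": { "Content-Type": "application/json" },
  "body": {
    "model": "$env:LLAMACPP_MODEL",
    "messages": [{ "role": "user", "content": "..." }]
  }
}
```

Because no API key is involved, the config needs no $env: credential templates at all; the host simply forwards the request to the local server.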