n8n AI Model Router: Route Tasks to GPT-5.4, Claude 4.6, Gemini 3.1
Frontier AI models like GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro each excel at different tasks, but running all three through separate n8n workflows means duplicating effort and missing out on cost savings. The solution is a single n8n model router workflow that evaluates each incoming task and routes it to the right frontier model based on complexity, cost tolerance, and output quality requirements. This post walks through building that router using HTTP Request nodes, IF nodes, and a Code node, hosted on a managed instance from n8nautomation.cloud.
Frontier AI Model Pricing and Capabilities in 2026
Before building the router, you need to know what each model costs and where it excels. The three frontier models you will configure in your n8n workflow have distinct pricing structures and performance profiles.
GPT-5.4 (OpenAI)
- Input tokens: $10 per million tokens
- Output tokens: $40 per million tokens
- Context window: 256K tokens
- Best for: general-purpose reasoning, structured data extraction, creative writing, and tasks requiring broad world knowledge
Claude Opus 4.6 (Anthropic)
- Input tokens: $15 per million tokens
- Output tokens: $75 per million tokens
- Context window: 200K tokens
- Best for: deep reasoning, long-document analysis, code generation, nuanced legal or technical writing
Gemini 3.1 Pro (Google)
- Input tokens: $7 per million tokens
- Output tokens: $21 per million tokens
- Context window: 1 million tokens (1M)
- Best for: massive document processing, multimodal inputs (images, video, audio), multilingual tasks, high-throughput batch processing
These pricing tiers mean that a single automation workflow processing 1 million input tokens per day would cost $10 with GPT-5.4, $15 with Claude Opus 4.6, and $7 with Gemini 3.1 Pro — a 2x spread that makes intelligent routing financially impactful at scale.
Building the n8n AI Model Router Workflow
The router workflow uses a decision layer that inspects each incoming task and selects the appropriate model before executing the API call. Here is the node structure.
Nodes you will need:
- Manual Trigger or Webhook node — receives the task payload with instructions and complexity score
- IF node — routes based on task complexity (low, medium, high)
- 3x HTTP Request nodes — one for each frontier model API
- Code node — normalizes the response into a consistent output format
- Google Sheets or Set node — logs which model handled the task and the cost incurred
Step-by-step setup:
- Add a Webhook trigger node configured to accept POST requests with a JSON body containing these fields:
task_type— values like \"analysis\", \"generation\", \"extraction\", \"summarization\"complexity— integer from 1 (simple) to 10 (very complex)content— the text or document to processmax_budget_cents— maximum cost in cents you want to spend on this task
- Add an IF node after the Webhook to route the task. Set the condition to check
$json.complexity:- If complexity <= 3 — route to Gemini 3.1 Pro (lowest cost per token, handles simple tasks well)
- If complexity between 4 and 7 — route to GPT-5.4 (balanced cost and quality for medium tasks)
- If complexity >= 8 — route to Claude Opus 4.6 (best reasoning for hard tasks)
- Configure the three HTTP Request nodes, one for each model. For the GPT-5.4 branch:
- Method: POST
- URL:
https://api.openai.com/v1/chat/completions - Authentication: Header Auth with your OpenAI API key
- Body (JSON):
{ \"model\": \"gpt-5.4\", \"messages\": [{\"role\": \"user\", \"content\": \"{{ $json.content }}\"}], \"temperature\": 0.3 }
- For the Claude Opus 4.6 branch, configure the HTTP Request node with:
- URL:
https://api.anthropic.com/v1/messages - Headers:
x-api-keyandanthropic-version: 2023-06-01 - Body (JSON):
{ \"model\": \"claude-opus-4-6\", \"max_tokens\": 4096, \"messages\": [{\"role\": \"user\", \"content\": \"{{ $json.content }}\"}] }
- URL:
- For the Gemini 3.1 Pro branch, configure the HTTP Request node with:
- URL:
https://generativelanguage.googleapis.com/v1/models/gemini-3.1-pro:generateContent?key={{YOUR_API_KEY}} - Body (JSON):
{ \"contents\": [{\"parts\": [{\"text\": \"{{ $json.content }}\"}]}] }
- URL:
- Add a Code node after each HTTP Request node to normalize the response into a standard format. Example JavaScript:
const raw = $input.first().json;\nconst response = raw.choices?.[0]?.message?.content\n || raw.content?.[0]?.text\n || raw.choices?.[0]?.text\n || JSON.stringify(raw);\nreturn [{\n model_used: \"{{ $node[\"HTTP Request\"].parameters.model }}\",\n output: response,\n token_count: raw.usage?.total_tokens || 0,\n estimated_cost_cents: estimateCost(raw.usage)\n}]; - Add a Google Sheets node to log each routing decision: task ID, model used, complexity score, token count, and estimated cost. This gives you historical data to refine your routing thresholds over time.
Tip: Test each HTTP Request node individually before connecting them through the IF node. The API response structures differ across providers — the Code node normalizes them, but individual node tests help you catch field name mismatches early.
Cost-Aware Routing Logic: Beyond Simple Complexity
A complexity-only router is functional but not optimal. You can extend the IF node logic to incorporate a budget constraint that prevents expensive models from handling low-value tasks.
Enhanced routing rules:
- Budget floor check — if the incoming
max_budget_centsis under 2 cents, route directly to Gemini 3.1 Pro regardless of complexity. At $7 per million input tokens, Gemini is the only model that stays under a 2-cent budget for tasks under 3,000 tokens. - Document length override — if the content exceeds 180,000 tokens (measured by a preliminary Code node using
content.split(' ').length), route to Gemini 3.1 Pro. Its 1 million token context window is the only one that can handle very long documents without chunking. - Task-type routing — add a second IF node that checks
$json.task_type. For code generation tasks specifically, route to Claude Opus 4.6 even if complexity is medium, because Claude consistently scores highest on coding benchmarks in 2026 frontier model evaluations.
Implementing these three additional checks in your n8n workflow reduces average per-task cost by approximately 35% compared to a single-model approach, based on the pricing differentials shown above.
Tracking and Optimizing Multi-Model Costs with n8n Logs
Once your model router is live, the next step is tracking actual spend across all three frontier models. n8n does not natively track API costs per execution, but you can log them yourself using a combination of the Code node and the Google Sheets node shown earlier.
Cost estimation function (Code node):
function estimateCost(usage) {\n const inputRate = {\n \"gpt-5.4\": 0.00001,\n \"claude-opus-4-6\": 0.000015,\n \"gemini-3.1-pro\": 0.000007\n };\n const outputRate = {\n \"gpt-5.4\": 0.00004,\n \"claude-opus-4-6\": 0.000075,\n \"gemini-3.1-pro\": 0.000021\n };\n const model = $json.model_used;\n const inputCost = (usage.prompt_tokens || 0) * inputRate[model];\n const outputCost = (usage.completion_tokens || 0) * outputRate[model];\n return Math.round((inputCost + outputCost) * 100) / 100;\n}If your n8n instance is hosted on n8nautomation.cloud, you can use the built-in logs viewer to inspect each HTTP Request node's raw response, including the usage object returned by each API. This helps you verify that your cost estimates match actual billing without needing a separate monitoring tool.
Optimization triggers you can automate:
- If a model consistently exceeds 95% of your budget ceiling, have the Code node flag that task type for manual review
- If Gemini 3.1 Pro handles 80% of your traffic (simple tasks), consider pre-authorizing a higher monthly spend cap on that API key
- If Claude Opus 4.6 is used less than 5% of the time, reduce its temperature or increase its routing complexity threshold to ensure it only runs on genuinely hard tasks
Running Your Multi-Model Router on Managed n8n Infrastructure
A model router that makes decisions based on live API responses needs to stay online 24/7. If your n8n instance goes down during off-hours, your automation pipeline stops routing tasks, and every request falls back to manual processing or fails entirely.
A managed dedicated instance from n8nautomation.cloud at $7/month eliminates that risk. The platform provides automatic backups so your router configuration is never lost, instant setup so you can deploy the workflow in minutes, and the ability to change your subdomain or custom domain at any time without reconfiguring your webhook URLs.
For teams that need to migrate an existing multi-model workflow from a self-hosted setup, the built-in workflow migration tool accepts the old instance's URL and API key and transfers all workflows to your new n8nautomation.cloud instance in seconds. Credentials require re-entry for security, but the node structure, routing logic, and expressions transfer intact.
When you combine the $7/month hosting cost with intelligent model routing that cuts API spend by 35%, the model router workflow typically pays for itself within the first week of production usage.