AI tooling & token governance

Predictable cost at maximum throughput.

We don't just hand engineers Cursor and Claude. We run a governed AI-native environment where every token has a job and your monthly spend stays inside a number you approved.

Internal tools
our cost · not yours

These are the tools we use to assemble teams, accelerate our own engineers, and ship faster. They are part of how we operate, and they never appear on your invoice.

  • Internal RAG over our pattern library, past engagements, and reusable components.
  • Sourcing and vetting pipeline (CV parsing, async technical interviews, reference automation).
  • Code review and architectural critique agents trained on our standards.
  • Internal knowledge base across engineering, security, and compliance practice.

You are not paying for our productivity. You are paying for the outcome in your production environment. If our internal tooling lets a 6-week build ship in 4, the benefit goes to your timeline — not to a line item you need to justify to finance.

External tools
your cost · your perimeter · your contracts

LLM APIs, embedding models, vector stores, orchestration frameworks, evaluation platforms, and inference infrastructure deployed into your environment.

Model providers

OpenAI, Anthropic, Google, Mistral, plus open-weight models (Llama, DeepSeek, Qwen) deployed in your VPC where data residency or cost requires it.

Inference & serving

vLLM, TGI, Bedrock, Vertex, Azure OpenAI, or sovereign EU options (OVH, Scaleway, Aleph Alpha) selected against your NIS2 / GDPR / Gaia-X posture.

Orchestration & evals

LangGraph, LlamaIndex, Inspect, Promptfoo, Braintrust, Langfuse — chosen per workload, not per fashion.

Vector / retrieval

Postgres+pgvector by default. Qdrant / Weaviate / Elastic where the workload justifies the operational tax.

Coding agents in your SDLC

Cursor, Claude Code, Copilot, Windsurf, Cline — selected per role with a documented data-exfiltration and prompt-logging policy.

Token governance
included in FDE rate · capped · auditable

In 2026, intensive use of Cursor plus Claude Opus / Sonnet plus agentic workflows routinely runs into several thousand dollars per engineer per month. It is one of the most expensive and least transparent line items in AI delivery. We treat it as an engineering problem, not as a consumable.

  • Professional tooling included in the FDE rate.

    Cursor Teams, Claude, OpenAI, Gemini and the rest of the working set are part of what you pay an AI-native engineer for. No separate line item for developer subscriptions, no per-seat upsell.

  • Token routing framework.

    Simple tasks are routed to fast, cheap models. Hard tasks are routed to frontier models. The routing graph decides per call against a cost and quality budget, so the choice is not left to an engineer's gut feel at 2am.

  • Prompt compression, caching, RAG.

    Standard techniques applied as a baseline, not an afterthought. In our engagements they cut token spend by 35–50% against naive use of the same models, with no quality regression on the eval set.

  • Weekly token spend report.

    Where tokens went, on which workloads, against which outcomes. Same shape as a cloud cost report, only for inference. Your finance team can reconcile it; your CTO can act on it.

  • Transparent caps with pass-through opt-in.

    A baseline token volume is included in the fixed FDE rate. If a workload pushes past it, we optimise first. We only move to pass-through billing with your written consent, against a documented reason.
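
To make the routing, reporting, and cap rules above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our production framework: the model tiers, prices, quality scores, and the function names (route, weekly_report, cap_action) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                 # hypothetical tier label, not a real model ID
    usd_per_1k_tokens: float  # hypothetical blended price
    quality: float            # hypothetical eval-set score in [0, 1]


# Hypothetical catalogue, ordered cheapest first.
MODELS = [
    Model("fast-cheap", 0.0005, 0.70),
    Model("mid-tier", 0.0030, 0.85),
    Model("frontier", 0.0150, 0.95),
]


def route(min_quality: float, est_tokens: int, max_usd: float) -> Model:
    """Pick the cheapest model that meets the quality bar within the call budget."""
    for model in MODELS:  # cheapest first, so the first match is the cheapest
        cost = est_tokens / 1000 * model.usd_per_1k_tokens
        if model.quality >= min_quality and cost <= max_usd:
            return model
    raise ValueError("no model satisfies both the quality bar and the cost budget")


def weekly_report(calls):
    """Aggregate (workload, model_name, tokens) records into totals per workload."""
    prices = {m.name: m.usd_per_1k_tokens for m in MODELS}
    report = defaultdict(lambda: {"tokens": 0, "usd": 0.0})
    for workload, model_name, tokens in calls:
        report[workload]["tokens"] += tokens
        report[workload]["usd"] += tokens / 1000 * prices[model_name]
    return dict(report)


def cap_action(used_tokens: int, included_tokens: int,
               optimised: bool, written_consent: bool) -> str:
    """Decide the next step when usage is compared with the included baseline."""
    if used_tokens <= included_tokens:
        return "within-baseline"
    if not optimised:
        return "optimise"  # optimisation always comes before any billing change
    # Already optimised and still over: pass-through only with written consent.
    return "pass-through" if written_consent else "await-consent"
```

In this sketch an easy call such as route(0.6, 2000, 0.01) lands on the cheapest tier, while a call that needs a 0.9 quality score is sent to the frontier tier only if its budget allows it. A request that no tier can satisfy raises an error instead of silently overspending.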

What you get: predictable monthly cost, engineers on best-in-class tooling, and a healthier margin on AI work than teams whose entire tooling strategy is "we use Cursor".

No resale. No vendor margin.

The Discovery memo specifies each tool, its cost model, expected token economics, and its data residency. If a workflow only makes sense when model costs are ignored, it is not production-ready, and we will say so.
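
As an illustration of what "expected token economics" can look like for a single workload, the sketch below estimates a naive monthly cost and then applies the 35–50% reduction range quoted above. The workload figures and the per-token price are hypothetical placeholders, not numbers from a real engagement.

```python
def monthly_usd(calls_per_day: int, tokens_per_call: int,
                usd_per_1k_tokens: float, days: int = 30) -> float:
    """Naive monthly inference cost with no compression, caching, or routing."""
    return calls_per_day * tokens_per_call / 1000 * usd_per_1k_tokens * days


def with_reduction(naive_usd: float, reduction: float) -> float:
    """Cost after a fractional token reduction (0.35 means 35% fewer tokens)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return naive_usd * (1.0 - reduction)


# Hypothetical workload: 5,000 calls/day, 3,000 tokens/call, $0.003 per 1k tokens.
naive = monthly_usd(5_000, 3_000, 0.003)
best_case = with_reduction(naive, 0.50)   # upper end of the quoted range
worst_case = with_reduction(naive, 0.35)  # lower end of the quoted range
print(f"naive: ${naive:,.2f}/month")
print(f"optimised: ${best_case:,.2f} to ${worst_case:,.2f}/month")
```

If the optimised range still exceeds what the workload is worth, that is the signal that the workflow is not production-ready.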