Carina Ops metering and routing spec
For product boundaries, billing boundaries, and current plan surfaces, see: Capability inventory.
This document defines the metering model for Carina Ops so the platform stays economically viable when model switching is enabled. Current packaging is on carinaai.uk/pricing.
The rule is simple:
- Carina AI stays free for personal use.
- Carina Ops is a paid add-on for teams that need BYOK, audit, deployment control, and spend limits.
- Managed inference must never be unlimited. Every paid plan needs a measurable allowance and a hard stop.
Product goal
Carina Ops is not a model marketplace. It is an operations layer around Carina AI that lets teams:
- choose providers and models
- use their own keys or Carina-managed keys
- inspect cost before a run
- cap monthly spend
- review usage and audit history
- keep the business profitable
If a plan cannot absorb the cost of expensive models, the system must either:
- downgrade to a cheaper class
- require a top-up
- or require BYOK
Why the plan includes credits
Provider costs vary too much for a flat token promise to stay safe.
A single flat token rate cannot protect margin across that spread. Use credits as the customer-facing unit and keep the internal cost model weighted by model class.
Credit model
Define one internal credit as a standard billing unit, not as one literal token.
Recommended reference model:
- 1 standard credit equals a small, predictable amount of balanced-model work
- cheaper models burn fewer credits per token
- expensive models burn more credits per token
- cached input should be discounted
- tool calls should carry a separate fixed charge if they create real cost
Suggested billable formula
billable_credits =
input_tokens * input_weight
+ output_tokens * output_weight
+ cached_input_tokens * cache_weight
+ tool_calls * tool_weight
+ batch_jobs * batch_weight
Recommended starting weights:
| Class | Weight |
|---|---|
| Cheap | 0.75x |
| Balanced | 1.00x |
| Premium | 2.50x |
| Frontier | 5.00x |
Use these as plan-unit multipliers, not as literal provider pass-through. Update them when provider pricing changes.
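As a minimal sketch of the billable formula and the class multipliers above, the following Python shows one way the terms could be combined. The `RateCard` field names mirror ops_model_rate_cards. The `BALANCED` base weights, the `card_for_class` helper, and the choice to scale only token weights by class while keeping tool and batch charges fixed are assumptions made for illustration, not part of this spec. It also assumes `input_tokens` excludes cached input tokens.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RateCard:
    # Field names mirror ops_model_rate_cards; values below are illustrative only.
    input_weight: Decimal
    output_weight: Decimal
    cache_weight: Decimal
    tool_weight: Decimal
    batch_weight: Decimal


@dataclass(frozen=True)
class Usage:
    input_tokens: int          # assumed to exclude cached input tokens
    output_tokens: int
    cached_input_tokens: int
    tool_calls: int
    batch_jobs: int


# Class multipliers taken from the starting-weights table.
CLASS_MULTIPLIER = {
    "cheap": Decimal("0.75"),
    "balanced": Decimal("1.00"),
    "premium": Decimal("2.50"),
    "frontier": Decimal("5.00"),
}

# Hypothetical balanced-class base card (credits per token, per tool call, per batch job).
BALANCED = RateCard(
    input_weight=Decimal("0.001"),
    output_weight=Decimal("0.004"),
    cache_weight=Decimal("0.0002"),
    tool_weight=Decimal("0.5"),
    batch_weight=Decimal("0"),
)


def card_for_class(base: RateCard, model_class: str) -> RateCard:
    """Scale the token weights of a base card by the class multiplier.

    Assumption: tool and batch charges are fixed and do not scale with class.
    """
    m = CLASS_MULTIPLIER[model_class]
    return RateCard(
        input_weight=base.input_weight * m,
        output_weight=base.output_weight * m,
        cache_weight=base.cache_weight * m,
        tool_weight=base.tool_weight,
        batch_weight=base.batch_weight,
    )


def billable_credits(usage: Usage, card: RateCard) -> Decimal:
    """Apply the billable formula term by term."""
    return (
        usage.input_tokens * card.input_weight
        + usage.output_tokens * card.output_weight
        + usage.cached_input_tokens * card.cache_weight
        + usage.tool_calls * card.tool_weight
        + usage.batch_jobs * card.batch_weight
    )
```

`Decimal` is used rather than `float` so that ledger arithmetic stays exact. With the illustrative base card, a run of 2,000 input tokens, 500 output tokens, 1,000 cached input tokens, and 2 tool calls bills 5.2 credits on the balanced class and 11.5 on premium.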
Margin rule
Keep managed inference gross margin above 65 percent. If a model class would push a customer below that margin, do one of the following:
- block the class on that plan
- route to a cheaper class
- ask for a top-up
- require BYOK
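The margin rule could be checked with something like the sketch below. It assumes gross margin is computed as (revenue minus provider cost) divided by revenue, with both amounts in pence, and that a margin of exactly 65 percent is treated as acceptable. The function names and the `extra_cost_pence` parameter are hypothetical.

```python
from decimal import Decimal

MARGIN_FLOOR = Decimal("0.65")  # 65 percent, from the margin rule


def gross_margin(revenue_pence: int, provider_cost_pence: int) -> Decimal:
    """Gross margin as a fraction of revenue (assumed definition)."""
    if revenue_pence <= 0:
        raise ValueError("revenue must be positive to compute a margin")
    return Decimal(revenue_pence - provider_cost_pence) / Decimal(revenue_pence)


def breaches_floor(
    revenue_pence: int,
    provider_cost_pence: int,
    extra_cost_pence: int = 0,
) -> bool:
    """True if adding extra_cost_pence would take the margin below the floor."""
    projected = gross_margin(revenue_pence, provider_cost_pence + extra_cost_pence)
    return projected < MARGIN_FLOOR
```

For an org paying 10,000 pence a month with 3,000 pence of provider cost so far, the margin is 0.7. A further 600 pence of estimated cost would project 0.64, so the run would need one of the four responses listed above.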
Routing policy
Carina should route by policy, not by user guesswork.
Default routing order
- Cheap model for simple tasks
- Balanced model for normal work
- Premium model only when the plan allows it
- Frontier model only for Enterprise or explicit override
Routing inputs
The router should consider:
- requested model or model class
- task type
- token estimate
- current org balance
- current month burn
- current gross margin forecast
- whether the org is BYOK
Routing decisions
| Condition | Action |
|---|---|
| No allowance left | Block or require top-up |
| Margin forecast falls below floor | Downgrade or block |
| BYOK enabled | Allow broader model choice |
| Premium requested on Starter | Block or require top-up |
| Balanced requested on Team | Allow |
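The decision table could be expressed as a small policy function. This is a sketch under stated assumptions, not the router itself: the `PLAN_MAX_CLASS` entitlements (including whether Team may use premium), the order of the checks, the choice of "require top-up" over "block" when allowance runs out, and treating BYOK as "allow any class" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

CLASS_ORDER = ["cheap", "balanced", "premium", "frontier"]

# Hypothetical entitlements: highest managed class each plan may use.
PLAN_MAX_CLASS = {"starter": "balanced", "team": "premium", "enterprise": "frontier"}


class Action(Enum):
    ALLOW = "allow"
    DOWNGRADE = "downgrade"
    REQUIRE_TOPUP = "require_topup"
    BLOCK = "block"


@dataclass(frozen=True)
class RouteRequest:
    plan: str
    requested_class: str
    estimated_credits: float
    balance_credits: float
    margin_forecast: float        # forecast gross margin if the run executes, 0..1
    byok: bool = False
    explicit_override: bool = False


@dataclass(frozen=True)
class Decision:
    action: Action
    model_class: Optional[str]    # class to run on, or None when nothing runs
    reason: str


def route(req: RouteRequest, margin_floor: float = 0.65) -> Decision:
    # BYOK runs use the customer's key, so managed allowance and margin are skipped here.
    if req.byok:
        return Decision(Action.ALLOW, req.requested_class, "byok")

    # No allowance left: this sketch always asks for a top-up rather than blocking.
    if req.balance_credits < req.estimated_credits:
        return Decision(Action.REQUIRE_TOPUP, None, "no allowance left")

    wanted = CLASS_ORDER.index(req.requested_class)
    allowed = CLASS_ORDER.index(PLAN_MAX_CLASS[req.plan])
    if wanted > allowed and not req.explicit_override:
        return Decision(Action.BLOCK, None, "class not allowed on plan")

    # Margin floor: step down one class if possible, otherwise block.
    # The forecast is not recomputed for the cheaper class in this sketch.
    if req.margin_forecast < margin_floor:
        if wanted > 0:
            return Decision(Action.DOWNGRADE, CLASS_ORDER[wanted - 1], "margin forecast below floor")
        return Decision(Action.BLOCK, None, "margin forecast below floor")

    return Decision(Action.ALLOW, req.requested_class, "within plan")
```

A real router would likely re-run the estimate after a downgrade and would restrict `explicit_override` to the frontier case described above.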
Metering rules
Every run must record both the estimate and the actual usage.
Record on estimate
- org id
- user id
- provider
- model
- model class
- estimated input tokens
- estimated output tokens
- estimated credits
- estimated provider cost
- estimated margin
- routing decision
Record on completion
- actual input tokens
- actual output tokens
- cached input tokens
- tool call count
- batch flag
- actual provider cost
- actual billable credits
- remaining balance
- request id
- run status
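The two record shapes above could be modelled roughly as follows. Field names are derived from the lists. The pence units, the `Decimal` types, and the `credit_drift` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class EstimateRecord:
    org_id: str
    user_id: str
    provider: str
    model: str
    model_class: str
    estimated_input_tokens: int
    estimated_output_tokens: int
    estimated_credits: Decimal
    estimated_provider_cost_pence: int   # unit assumed to match provider_cost_pence
    estimated_margin: Decimal
    routing_decision: str


@dataclass(frozen=True)
class CompletionRecord:
    request_id: str
    run_status: str
    actual_input_tokens: int
    actual_output_tokens: int
    cached_input_tokens: int
    tool_call_count: int
    batch: bool
    actual_provider_cost_pence: int
    actual_billable_credits: Decimal
    remaining_balance: Decimal


def credit_drift(est: EstimateRecord, done: CompletionRecord) -> Decimal:
    """Actual minus estimated credits; positive means the estimate was too low."""
    return done.actual_billable_credits - est.estimated_credits
```

Keeping both records makes it possible to track how far estimates drift from actuals per model, which is one input to tuning the weights.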
Do not do this
- Do not bill raw tokens directly to customers without a model class multiplier.
- Do not allow unlimited managed usage on a fixed price plan.
- Do not let the UI display a token count as if all tokens cost the same.
- Do not rewrite historical usage when pricing changes.
Technical spec
Tables
Add or extend the following tables in the hosted billing layer:
- ops_plans
- ops_org_subscriptions
- ops_plan_entitlements
- ops_model_rate_cards
- ops_credit_ledger
- ops_usage_events
- ops_usage_daily
- ops_routing_policies
Minimum fields
ops_plans
- id
- name
- price_monthly_pence
- member_limit
- included_credits
- byok_only
- audit_retention_days
- created_at
- updated_at
ops_model_rate_cards
- id
- provider
- model
- model_class
- input_weight
- output_weight
- cache_weight
- tool_weight
- batch_weight
- active_from
- active_to
ops_credit_ledger
- id
- org_id
- event_type
- credits_delta
- currency_delta_pence
- provider
- model
- request_id
- created_at
ops_usage_events
- id
- org_id
- user_id
- provider
- model
- model_class
- input_tokens
- output_tokens
- cached_input_tokens
- tool_calls
- batch_jobs
- provider_cost_pence
- billable_credits
- estimated_credits
- request_id
- status
- created_at
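The active_from and active_to fields on ops_model_rate_cards suggest that rate cards are versioned by time, which is what lets pricing change without rewriting historical usage. A minimal lookup sketch follows. It assumes a half-open window [active_from, active_to) and that a missing active_to means the card is still active. Neither convention is stated in this spec. Only one weight is shown for brevity, and the provider and model names are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class RateCardRow:
    provider: str
    model: str
    input_weight: float            # other weights omitted for brevity
    active_from: datetime
    active_to: Optional[datetime]  # None means still active (assumed convention)


def card_at(
    cards: Iterable[RateCardRow],
    provider: str,
    model: str,
    when: datetime,
) -> Optional[RateCardRow]:
    """Return the card whose [active_from, active_to) window contains `when`."""
    for card in cards:
        if card.provider != provider or card.model != model:
            continue
        starts_before = card.active_from <= when
        ends_after = card.active_to is None or when < card.active_to
        if starts_before and ends_after:
            return card
    return None


# Illustrative data: a price change on 1 March closes the old card and opens a new one.
MARCH = datetime(2025, 3, 1, tzinfo=timezone.utc)
CARDS = [
    RateCardRow("example-provider", "example-model", 1.0,
                datetime(2025, 1, 1, tzinfo=timezone.utc), MARCH),
    RateCardRow("example-provider", "example-model", 1.2, MARCH, None),
]
```

A usage event from February would resolve to the first card and one from April to the second, so re-running a historical report yields the credits that were billed at the time.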
API contract
Expose these endpoints from the hosted ops API:
GET /api/carina-ops/models
POST /api/carina-ops/estimate
POST /api/carina-ops/workbench/run
POST /api/carina-ops/credits/topup
GET /api/carina-ops/usage
GET /api/carina-ops/billing
GET /api/carina-ops/audit
Workbench flow
- User selects provider and model class.
- UI requests an estimate before executing.
- Router checks plan allowance and current balance.
- If allowed, the run executes.
- Actual usage is written to the ledger.
- The usage page and billing page refresh from the ledger, not from ad hoc counters.
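The flow above can be sketched with an in-memory, append-only ledger in which the balance is always derived from ledger entries rather than from a separate counter. The `Ledger` class, the `event_type` values, the `run_with_metering` function, and its `execute` callback (which stands in for the provider call and returns actual billable credits) are hypothetical names for illustration. Routing and estimation are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, List


@dataclass(frozen=True)
class LedgerEntry:
    org_id: str
    event_type: str        # hypothetical values: "grant", "topup", "usage"
    credits_delta: Decimal
    request_id: str


@dataclass
class Ledger:
    entries: List[LedgerEntry] = field(default_factory=list)

    def append(self, entry: LedgerEntry) -> None:
        self.entries.append(entry)

    def balance(self, org_id: str) -> Decimal:
        # The balance is recomputed from entries, never stored as a counter.
        return sum(
            (e.credits_delta for e in self.entries if e.org_id == org_id),
            Decimal("0"),
        )


class InsufficientCredits(Exception):
    """Raised when the estimate exceeds the org's current balance."""


def run_with_metering(
    ledger: Ledger,
    org_id: str,
    request_id: str,
    estimated_credits: Decimal,
    execute: Callable[[], Decimal],
) -> Decimal:
    """Check the balance, execute the run, write actual usage, return the new balance."""
    if ledger.balance(org_id) < estimated_credits:
        raise InsufficientCredits(f"org {org_id} needs a top-up before {request_id}")
    actual_credits = execute()
    ledger.append(LedgerEntry(org_id, "usage", -actual_credits, request_id))
    return ledger.balance(org_id)
```

Note that the pre-check uses the estimate while the ledger entry uses the actual figure, so a run can leave the balance lower than the estimate implied. A production version would also need to handle concurrent runs against the same balance.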
Recommended implementation order
Phase 1
- Add rate cards and credit ledger tables.
- Add estimate and finalize logic.
- Add balance checks before execution.
- Add usage and cost reporting in the ops UI.
Phase 2
- Add plan limits and top-up packs.
- Add routing by model class.
- Add premium gating.
- Add batch and cache discounts.
Phase 3
- Add BYOK-first defaults for heavier teams.
- Add reconciliation jobs against provider invoices.
- Add monthly margin reporting by org and model.
Phase 4
- Add Enterprise plan controls.
- Add customer-facing usage alerts.
- Add admin override tools for rate cards and entitlements.
Commercial rules
The business stays safe if all of the following are true:
- Starter and Team are capped plans
- premium models cost more credits than balanced models
- BYOK is the default path for heavier users
- provider price changes only update future rate cards
- the router can block or downgrade a run before the provider call is made
If a plan cannot survive its own model mix, the answer is not to sell more volume. The answer is to change the plan, increase the price, or move the customer to BYOK.