Financial modeling for AI startups: a driver-based guide

Most AI founders inherit a SaaS template, plug in seats and ARPU, and discover three months later that compute spend has rewritten the P&L. This guide walks through the driver-based model I build for AI-native startups — one that treats token revenue, inference cost, and gross margin as the variables they actually are.

By Sergei Mochtchenkov, CFA · Fractional CFO for AI startups

1. Why SaaS templates break for AI

The standard B2B SaaS model assumes a stable, almost-zero marginal cost of serving one more customer. Pricing is per seat, gross margin is 75–85%, and once you cover hosting, every incremental dollar of revenue is mostly contribution. That assumption is doing a lot of work — and it doesn't hold for AI-native products.

In an AI startup, the cost of serving one more customer scales with their usage, not their seat count. A single power user hammering a chat interface can cost more in inference than ten dormant seats pay in subscription. When the founder forecasts ARR using a SaaS template, they're implicitly assuming gross margin is stable. It isn't — it's a function of product mix, model choice, and customer behavior, all of which change month to month.
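
To make the power-user point concrete, here is a minimal sketch. Every number in it (seat price, request volume, token counts, per-million prices) is a hypothetical assumption chosen for illustration, not a figure from any provider or client.

```python
# Hypothetical illustration: one heavy user vs. ten dormant seats.
# Every number below is an assumption chosen for the example.
SEAT_PRICE = 30.0             # $/seat/month (assumed)
INPUT_PRICE_PER_1M = 3.0      # $ per 1M input tokens (assumed)
OUTPUT_PRICE_PER_1M = 15.0    # $ per 1M output tokens (assumed)


def monthly_inference_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Inference cost for one user, pricing input and output tokens separately."""
    input_cost = requests * input_tokens * INPUT_PRICE_PER_1M / 1_000_000
    output_cost = requests * output_tokens * OUTPUT_PRICE_PER_1M / 1_000_000
    return input_cost + output_cost


# A power user sending 20,000 requests a month, 4,000 tokens in and 800 out per request.
power_user_cost = monthly_inference_cost(requests=20_000, input_tokens=4_000, output_tokens=800)
dormant_seat_revenue = 10 * SEAT_PRICE

print(f"power user inference cost:      ${power_user_cost:,.2f}")
print(f"revenue from ten dormant seats: ${dormant_seat_revenue:,.2f}")
```

Under these assumptions the single heavy user costs $480 a month to serve, against $300 of subscription revenue from the ten dormant seats.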

A driver-based financial model fixes this by treating the variables that actually move — tokens per request, model price per million tokens, requests per active user — as first-class inputs instead of burying them inside a single COGS line.

2. Driver-based thinking, in one diagram

A driver-based model is just a financial model where every revenue and cost line traces back to a small set of operational levers the business can pull. Instead of writing Revenue = 1.15 × prior month, you write Revenue = active_users × avg_requests × price_per_request. When something changes — pricing, behavior, model — you change the driver, and the financials reforecast themselves.

Drivers              →  Revenue & COGS              →  Statements

active users          ┐
requests per user     │
tokens per request    ├─→  usage volume   ─┐
price per request     ┘                     ├─→  Revenue   ─┐
                                            │               ├─→  P&L
model price (in/out)  ┐                     │               │
cache hit rate        ├─→  inference COGS ──┘ ─→  Gross margin
infra & ops overhead  ┘                                     │
                                                            └─→  Cash & runway

For an AI startup the interesting drivers are concentrated on the left side of that diagram. The rest of the model — payroll, S&M, R&D, taxes — looks like any other early-stage company. The moment you switch to driver-based thinking, three numbers do almost all the explaining: tokens per request, model unit cost, and active usage frequency.
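
The "change the driver, and the financials reforecast themselves" idea can be sketched in a few lines. This is a deliberately simplified illustration, not the full model: the `Drivers` fields, the single `cost_per_request` number, and all values are assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Drivers:
    """Operational levers for one month. Field names are illustrative."""
    active_users: int
    requests_per_user: float
    price_per_request: float   # $ charged per request
    cost_per_request: float    # $ of inference per request (collapsed to one number here)


def forecast(d: Drivers) -> dict:
    """Derive revenue, inference COGS, and gross margin from the drivers."""
    requests = d.active_users * d.requests_per_user
    revenue = requests * d.price_per_request
    cogs = requests * d.cost_per_request
    margin = (revenue - cogs) / revenue if revenue else 0.0
    return {"revenue": revenue, "cogs": cogs, "gross_margin": margin}


base = Drivers(active_users=1_000, requests_per_user=200,
               price_per_request=0.05, cost_per_request=0.02)

# Change one driver (a price cut) and the whole forecast recomputes.
price_cut = replace(base, price_per_request=0.04)

print(forecast(base))
print(forecast(price_cut))
```

Nothing downstream is hard-coded, so the price cut flows through revenue and gross margin without touching any formula.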

3. Revenue drivers for token-based pricing

Pricing in AI rarely fits one mold. Most startups land somewhere on a spectrum between three archetypes, and a usable model has to support all three at the same time:

  • Pure seat-based. Flat $/user/month. Easy to forecast, dangerous because heavy users silently erode gross margin.
  • Pure usage-based. Pay per request, per token, or per credit. Revenue tracks COGS by construction, but it's harder to forecast and customers hate surprise invoices.
  • Hybrid (seat + overage). A platform fee that bundles a usage allowance, with metered overage on top. This is where most serious AI products end up, and it's the model the spreadsheet must support.

In a driver-based model, revenue for a hybrid plan looks like this:

included_requests  = paid_seats × included_requests_per_seat
platform_revenue   = paid_seats × seat_price
overage_revenue    = max(0, total_requests − included_requests) × overage_price
revenue            = platform_revenue + overage_revenue

The included allowance is itself a driver — change it from 5,000 to 10,000 requests per seat and you'll see the impact on both overage revenue and inference cost flow through the statements. That's the point. The model should let the founder ask "what happens if I double the free allowance to reduce churn?" and get a credible answer in 30 seconds, not 30 hours of rebuild.
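
Here is a minimal sketch of the hybrid revenue formula, run with the allowance at 5,000 and then 10,000 requests per seat. The function name, the `included_requests_per_seat` parameter, and every input value are assumptions for illustration.

```python
def hybrid_revenue(paid_seats: int, seat_price: float, total_requests: int,
                   included_requests_per_seat: int, overage_price: float) -> dict:
    """Platform fee plus metered overage, computed on pooled (aggregate) usage."""
    included_requests = paid_seats * included_requests_per_seat
    platform_revenue = paid_seats * seat_price
    overage_requests = max(0, total_requests - included_requests)
    overage_revenue = overage_requests * overage_price
    return {
        "platform_revenue": platform_revenue,
        "overage_revenue": overage_revenue,
        "revenue": platform_revenue + overage_revenue,
    }


# Hypothetical inputs: 100 seats at $50, 800,000 requests in the month, $0.01 per overage request.
common = dict(paid_seats=100, seat_price=50.0, total_requests=800_000, overage_price=0.01)

allowance_5k = hybrid_revenue(included_requests_per_seat=5_000, **common)
allowance_10k = hybrid_revenue(included_requests_per_seat=10_000, **common)

print(allowance_5k)
print(allowance_10k)
```

One caveat on the design: pooling usage across all customers can understate overage, because a light customer's unused allowance offsets a heavy customer's excess. If contracts meter each customer separately, apply the same formula per customer and sum the results.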

4. Modeling inference cost as a variable driver

This is where AI startup models live or die. Inference cost is not "hosting" — it scales with the actual tokens passing through the system, and it has to be modeled with the same care as revenue.

The minimum-viable formula for inference COGS in any month is:

requests_billable  = total_requests × (1 − cache_hit_rate)

input_tokens       = requests_billable × avg_input_tokens
output_tokens      = requests_billable × avg_output_tokens

inference_cogs     = input_tokens  × input_price_per_1M  / 1_000_000
                   + output_tokens × output_price_per_1M / 1_000_000

Six drivers do the work: total requests, cache hit rate, average input tokens per request, average output tokens per request, and the model's per-million-token prices for input and for output. If your product mixes models (a cheap router for easy queries, GPT-class for hard ones), build the same block for each model and sum. The cache hit rate alone routinely moves gross margin by 10–20 points, so it deserves a row in the dashboard, not a footnote.
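
The inference block above translates directly into code. The sketch below builds one block per model and sums them; the model names, volumes, cache rates, and prices are all hypothetical, and like the formula it treats a cache hit as costing nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsage:
    """One month of traffic routed to a single model. All values are illustrative."""
    name: str
    total_requests: int
    cache_hit_rate: float        # share of requests served from cache (assumed free)
    avg_input_tokens: float
    avg_output_tokens: float
    input_price_per_1m: float    # $ per 1M input tokens
    output_price_per_1m: float   # $ per 1M output tokens


def inference_cogs(m: ModelUsage) -> float:
    """Tokens x per-million pricing, applied only to requests that miss the cache."""
    requests_billable = m.total_requests * (1 - m.cache_hit_rate)
    input_tokens = requests_billable * m.avg_input_tokens
    output_tokens = requests_billable * m.avg_output_tokens
    return (input_tokens * m.input_price_per_1m
            + output_tokens * m.output_price_per_1m) / 1_000_000


models = [
    ModelUsage("small-router", 900_000, 0.30, 1_000, 200, 0.50, 1.50),
    ModelUsage("large-model", 100_000, 0.10, 3_000, 600, 5.00, 15.00),
]

for m in models:
    print(f"{m.name}: ${inference_cogs(m):,.2f}")

total_inference_cogs = sum(inference_cogs(m) for m in models)
print(f"total: ${total_inference_cogs:,.2f}")
```

In this made-up mix the large model handles 10% of requests but drives most of the cost, which is why the block has to be built per model rather than on blended averages.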

Add a thin layer above inference for the rest of variable COGS: vector database queries, embedding generation, third-party retrieval APIs, payment processing, and the bandwidth/egress that actually shows up on the AWS bill. Together they form the variable cost stack — distinct from R&D headcount training the next model, which belongs in operating expense, not COGS.

5. The AI-adjusted gross margin stack

AI investors don't accept a single "gross margin" number anymore. They want the stack: what the margin looks like before inference, after inference, and after all variable infrastructure. Build it into the model explicitly so the conversation in the data room is about your numbers, not your definition.

  • Revenue. Net of refunds, credits, and revenue share to partners.
  • Less: inference cost. The block from section 4.
  • Less: other variable infrastructure. Vector DB, embeddings, retrieval, egress.
  • Less: third-party fees. Payment processing, marketplace cuts.
  • = Contribution margin. The honest number. This is what scales.
  • Less: hosting, support, customer success. Semi-fixed costs of serving customers in aggregate.
  • = GAAP-style gross margin. What you put on the slide.

The gap between contribution margin and GAAP gross margin tells the investor how much operating leverage is still on the table. A healthy AI business has contribution margin above 60% even when GAAP gross margin is 40–50% — that's the leverage they're underwriting.
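
The stack is simple enough to compute in one function. This sketch follows the order of the list above; the function name, the output keys, and the monthly figures are hypothetical.

```python
def margin_stack(revenue: float, inference: float, other_variable_infra: float,
                 third_party_fees: float, semi_fixed_serving: float) -> dict:
    """Walk revenue down the stack and report each margin as a share of revenue."""
    after_inference = revenue - inference
    contribution = after_inference - other_variable_infra - third_party_fees
    gross_profit = contribution - semi_fixed_serving
    return {
        "margin_after_inference": after_inference / revenue,
        "contribution_margin": contribution / revenue,
        "gaap_style_gross_margin": gross_profit / revenue,
    }


# Hypothetical month: $100k net revenue and the cost lines from the list above.
stack = margin_stack(
    revenue=100_000,
    inference=25_000,
    other_variable_infra=5_000,   # vector DB, embeddings, retrieval, egress
    third_party_fees=3_000,       # payment processing, marketplace cuts
    semi_fixed_serving=20_000,    # hosting, support, customer success
)

for name, value in stack.items():
    print(f"{name}: {value:.0%}")
```

With these inputs contribution margin comes out at 67% and GAAP-style gross margin at 47%, so the 20-point gap is the semi-fixed serving cost that should shrink as a share of revenue with scale.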

6. Cohorts, NRR, and usage expansion

For a usage-based product, net revenue retention is not just a cohort math exercise — it's the single most important growth variable in the model. Expansion happens automatically as customers run more workloads through the product, with no sales motion attached. That's the AI-native flywheel, and your model has to capture it.

The cleanest way to model it is per-cohort, with three drivers:

  • Logo retention — what fraction of a starting cohort still has at least one paid seat 12 months in.
  • Seat expansion — average paid seats per surviving customer over time.
  • Usage expansion — average requests per surviving seat over time. This is the AI-specific line that SaaS models don't have.

Multiply those three through a cohort table and you get a defensible NRR number that ties straight back to product behavior. Strong usage expansion lets you absorb meaningful logo churn and still print net dollar retention above 120%, as long as the product of the three drivers stays above 1.2. With 90% logo retention and flat seats, for example, that takes roughly 1.33× annual usage expansion. That's the story Series A and B investors are pricing in.
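
As a rough sketch, the cohort multiplication looks like this. It assumes revenue scales in proportion to seats × requests per seat, which is only strictly true for pure usage pricing; the three index series are hypothetical values for a single cohort.

```python
# Cohort drivers at months 0, 6, and 12, each indexed to 1.0 at the start.
# All values are hypothetical.
logo_retention = [1.00, 0.95, 0.90]    # share of starting customers still paying
seat_expansion = [1.00, 1.05, 1.10]    # paid seats per surviving customer
usage_expansion = [1.00, 1.12, 1.25]   # requests per surviving seat


def nrr_curve(logo: list, seats: list, usage: list) -> list:
    """Net revenue retention per period as the product of the three drivers."""
    return [l * s * u for l, s, u in zip(logo, seats, usage)]


curve = nrr_curve(logo_retention, seat_expansion, usage_expansion)

for month, nrr in zip([0, 6, 12], curve):
    print(f"month {month:>2}: NRR = {nrr:.1%}")
```

For a hybrid plan, where only usage above the allowance is billed, the cleaner approach is to run each period's seats and requests through the revenue formula and divide by the cohort's starting revenue.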

7. Scenarios investors actually want to see

Build the model once and run it under four scenarios. Fewer and you're presenting a single forecast; many more and you're presenting noise.

  • Base. What you actually believe, with current pricing, current model mix, and last-quarter retention.
  • Model-cost shock. Hold revenue flat, drop input and output token prices 50%. Shows what happens to margin when the next OpenAI / Anthropic price cut lands.
  • Usage shock. Hold price flat, double requests per active user. Stress-tests whether inference cost outruns revenue at scale.
  • Funding bridge. No new round. Shows the cash runway if you have to operate as-is for 18 months.

The model-cost and usage-shock scenarios are the two an AI-savvy partner will ask about within the first 15 minutes of a diligence call. Having them pre-built — with the driver inputs visible — changes the conversation from defensive to collaborative.
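
One way to keep all four scenarios on a single driver sheet is to express each as a small set of overrides on the base drivers. The sketch below assumes flat seat pricing so that the usage shock leaves revenue unchanged; the driver names and every value are hypothetical.

```python
# Base driver sheet (all values hypothetical).
BASE = {
    "active_users": 2_000,
    "seat_price": 40.0,            # $/user/month, flat
    "requests_per_user": 1_000,
    "avg_input_tokens": 2_000,
    "avg_output_tokens": 500,
    "input_price_per_1m": 3.0,
    "output_price_per_1m": 12.0,
    "monthly_opex": 100_000.0,
    "cash": 1_200_000.0,
}

# Each scenario is only a set of overrides, never a copy of the model.
SCENARIOS = {
    "base": {},
    "model_cost_shock": {"input_price_per_1m": 1.5, "output_price_per_1m": 6.0},
    "usage_shock": {"requests_per_user": 2_000},
    "funding_bridge": {},  # same operations as base; read runway against 18 months
}


def run(overrides: dict) -> dict:
    """Recompute margin, burn, and runway from the base drivers plus overrides."""
    d = {**BASE, **overrides}
    requests = d["active_users"] * d["requests_per_user"]
    revenue = d["active_users"] * d["seat_price"]
    cost_per_1m_requests = (d["avg_input_tokens"] * d["input_price_per_1m"]
                            + d["avg_output_tokens"] * d["output_price_per_1m"])
    cogs = requests * cost_per_1m_requests / 1_000_000
    gross_profit = revenue - cogs
    burn = d["monthly_opex"] - gross_profit
    runway = d["cash"] / burn if burn > 0 else float("inf")
    return {
        "gross_margin": gross_profit / revenue,
        "monthly_burn": burn,
        "runway_months": runway,
        "survives_18_months": runway >= 18,
    }


results = {name: run(overrides) for name, overrides in SCENARIOS.items()}
for name, r in results.items():
    print(name, r)
```

With these inputs the usage shock cuts gross margin from 70% to 40% and pulls runway below 18 months, which is exactly the kind of result a partner will probe.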

8. Common mistakes that blow up the model

  • Treating inference as a fixed line. If "AI costs" is one hard-coded number per month, the model can't answer any of the scenarios above.
  • Hiding R&D inside COGS. Salaries of the team training your next model are operating expense, not COGS. Mixing them inflates COGS and depresses reported gross margin — investors will unmix it anyway, and trust takes a hit.
  • Ignoring the input/output token split. Output tokens cost 3–5× input tokens at most foundation-model providers. A model that averages over both numbers loses the ability to predict margin when prompt patterns change.
  • Forecasting ARR off seats only. If 70% of your revenue is overage, forecasting ARR off paid seats is forecasting 30% of the business and praying for the rest.
  • One blended gross margin. Free-tier users and enterprise contracts have wildly different unit economics. Model them separately, then weight by mix.
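
The input/output split mistake is easy to see with numbers. In this sketch a blended per-token price is calibrated on one month's mix and then used to predict a month with the same total tokens but longer answers. The prices (output at 4× input) and token volumes are assumptions for illustration.

```python
INPUT_PRICE_PER_1M = 2.0    # $ per 1M input tokens (assumed)
OUTPUT_PRICE_PER_1M = 8.0   # $ per 1M output tokens (assumed, 4x input)


def split_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost with input and output tokens priced separately."""
    return (input_tokens * INPUT_PRICE_PER_1M
            + output_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000


def blended_cost(input_tokens: int, output_tokens: int, blended_price_per_1m: float) -> float:
    """Cost using a single average price across all tokens."""
    return (input_tokens + output_tokens) * blended_price_per_1m / 1_000_000


# Calibrate the blended price on last month's mix: 800M input, 200M output tokens.
last_in, last_out = 800_000_000, 200_000_000
blended_price = split_cost(last_in, last_out) / (last_in + last_out) * 1_000_000

# Next month: same total tokens, but answers get longer (600M input, 400M output).
next_in, next_out = 600_000_000, 400_000_000
predicted = blended_cost(next_in, next_out, blended_price)
actual = split_cost(next_in, next_out)

print(f"blended price per 1M tokens: ${blended_price:.2f}")
print(f"predicted by blended model:  ${predicted:,.2f}")
print(f"actual with split pricing:   ${actual:,.2f}")
```

Total token volume did not change, yet the blended model misses $1,200 of a $4,400 bill because the mix shifted toward the more expensive output tokens.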

9. A 10-line checklist before sending the model

  1. Every revenue line has at least one volume driver and one price driver.
  2. Inference COGS uses tokens × per-million pricing, by model.
  3. Cache hit rate is a visible, editable assumption.
  4. Free-tier users appear in COGS but not revenue.
  5. R&D headcount is in OpEx, not COGS.
  6. NRR is built from cohorts with logo, seat, and usage expansion.
  7. Four scenarios run from the same driver sheet, not copies.
  8. Contribution margin and GAAP gross margin are both reported.
  9. Cash runway updates automatically from the scenario you pick.
  10. A non-finance reader can change one driver and see the impact.
