How to Evaluate an AI Startup

The AI boom has produced thousands of startups, and a large share of them are little more than clever prompts wrapped around a model someone else built. Some of those will become real businesses; many won’t. This guide is a practical checklist for evaluating an AI startup, useful whether you’re considering investing, taking a job, or buying the product.

Start with the obvious question: what’s underneath?

Most AI startups sit on top of a foundation model from OpenAI, Anthropic, Google or an open-weight provider. That’s fine — but it raises the key question:

If a model provider shipped this as a feature tomorrow, would the company still exist?

The answer reveals whether you’re looking at a product or a temporary gap in the market. Durable companies have something the model alone doesn’t provide.

Where real defensibility comes from

A thin wrapper is easy to copy. Look for at least one of these moats:

Proprietary data. Access to data others can’t get, which makes the product better in a way a generic model can’t match.
Workflow integration. The product is wired deeply into how a business already operates, so switching is painful.
Distribution. An existing channel or customer base that’s hard for a newcomer to reach.
Domain depth. Hard-won expertise — in law, medicine, manufacturing — encoded into the product, not just the model.

The model is rarely the moat. What surrounds it usually is.

Read the unit economics carefully

AI products carry a marginal cost that traditional software mostly doesn’t: inference. Every query may call a paid model, so each additional unit of usage costs money in a way that serving a static web app never did.

Questions to ask:

What does it cost to serve one customer, and does that shrink with scale?
Are gross margins healthy, or is the company effectively subsidising compute to show growth?
If model prices keep falling, as they have been, does the business’s margin improve, or does the saving just flow through to customers?

A company growing fast while losing money on every query is buying revenue, not building a business.
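
The first two questions come down to simple arithmetic, and it can help to see it written out. The sketch below is illustrative only: the class name, the field names and every number are assumptions made up for this example, not figures from any real company.

```python
from dataclasses import dataclass


@dataclass
class CustomerEconomics:
    """Hypothetical monthly figures for one customer (all values are illustrative)."""
    monthly_revenue: float            # what the customer pays per month
    queries_per_month: int            # how many model calls the customer triggers
    cost_per_query: float             # blended inference cost per call
    other_serving_cost: float = 0.0   # hosting, support and similar per-customer costs

    def cost_to_serve(self) -> float:
        """Inference spend plus any other per-customer serving cost."""
        return self.queries_per_month * self.cost_per_query + self.other_serving_cost

    def gross_margin(self) -> float:
        """Gross margin as a fraction of revenue; negative means the customer loses money."""
        return (self.monthly_revenue - self.cost_to_serve()) / self.monthly_revenue


# Two customers on the same flat plan with very different usage.
heavy_user = CustomerEconomics(monthly_revenue=50.0, queries_per_month=4000, cost_per_query=0.02)
light_user = CustomerEconomics(monthly_revenue=50.0, queries_per_month=500, cost_per_query=0.02)

for name, customer in [("heavy", heavy_user), ("light", light_user)]:
    print(f"{name}: cost to serve {customer.cost_to_serve():.2f}, "
          f"gross margin {customer.gross_margin():.0%}")
```

With these made-up numbers the heavy user costs 80 a month to serve against 50 of revenue, a gross margin of -60%, while the light user sits at 80%. A flat-price plan with heavy users is exactly where “buying revenue” tends to hide.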

Look at the team and the data flywheel

The best AI startups improve as they’re used: more usage produces more data, which makes the product better, which attracts more usage. Ask whether such a flywheel exists, or whether the product is static.

On the team, you want people who understand both the domain and the technology. A brilliant ML team with no sense of the customer’s problem tends to build impressive things nobody needs.

Separate the demo from the deployment

AI demos are dangerously good. The gap between a polished demo and reliable production is where most startups stall.

Does it work on messy, real-world inputs, or only on clean examples?
What’s the accuracy on the boring 80% of cases, not the flashy 20%?
What happens when it’s wrong? Is there a safety net, or does a hallucination reach the customer?
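
One way to make the first two questions concrete is to ask for accuracy broken out by input type rather than as a single headline number. The following is a minimal sketch, assuming you have a hand-labelled test set where each result is tagged as “clean” or “messy”; the function name, the tags and the sample outcomes are all invented for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(results):
    """Return accuracy per slice.

    results: iterable of (slice_name, was_correct) pairs from a labelled test set.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, was_correct in results:
        totals[slice_name] += 1
        correct[slice_name] += int(was_correct)
    return {name: correct[name] / totals[name] for name in totals}


# Made-up outcomes: strong on demo-style inputs, much weaker on messy ones.
sample_results = (
    [("clean", True)] * 19 + [("clean", False)] * 1
    + [("messy", True)] * 12 + [("messy", False)] * 8
)

report = accuracy_by_slice(sample_results)
for slice_name, accuracy in report.items():
    print(f"{slice_name}: {accuracy:.0%}")
```

In this invented sample the product is right on 19 of 20 clean inputs and only 12 of 20 messy ones. A blended figure of about 78% would hide that gap, which is why the per-slice view is the one worth asking for.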

Watch the dependency risk

Building on one model provider means inheriting its pricing, availability and policy changes. Strong startups stay model-agnostic where they can, so they can switch providers as the market shifts. Total dependence on a single vendor is a structural risk worth pricing in.
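
In engineering terms, “model-agnostic” usually means the product code talks to a thin interface of its own rather than to one vendor’s SDK. Here is a rough sketch of that idea. Every name in it (TextModel, VendorAModel, VendorBModel, summarise_contract, complete_with_fallback) is hypothetical, and the vendor classes are stand-ins that return canned strings instead of calling any real API.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only contract the product code relies on (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class VendorAModel:
    """Stand-in for a wrapper around one provider's SDK; no real API is called."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"


class VendorBModel:
    """Stand-in for a second provider with the same shape."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


def summarise_contract(model: TextModel, contract_text: str) -> str:
    """Product logic depends on the TextModel interface, not on a vendor."""
    return model.complete(f"Summarise: {contract_text}")


def complete_with_fallback(models: list[TextModel], prompt: str) -> str:
    """Try each provider in order and return the first successful answer."""
    last_error = None
    for model in models:
        try:
            return model.complete(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


print(summarise_contract(VendorAModel(), "NDA between two parties"))
print(summarise_contract(VendorBModel(), "NDA between two parties"))
```

A startup built this way can change providers by writing one new adapter class. One whose prompts, parsing and pricing assumptions are all tied to a single vendor’s quirks faces a rewrite instead, and that difference is worth asking about directly.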

A quick scorecard

Signal	Healthy	Worrying
Defensibility	Data, workflow, distribution or domain moat	Pure prompt wrapper
Margins	Improving with scale	Eroded by inference cost
Demo vs reality	Works on messy inputs	Only shines in demos
Model dependence	Provider-flexible	Locked to one vendor
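
If you are comparing several companies, the scorecard can be kept as a small checklist in code. This is only a sketch: the signal keys mirror the four rows above, while the function name and the example assessment are made up, and the healthy-or-worrying call for each signal remains a human judgement.

```python
# The four signals from the scorecard, in the same order as the table rows.
SIGNALS = ["defensibility", "margins", "demo_vs_reality", "model_dependence"]


def worrying_signals(assessment: dict[str, bool]) -> list[str]:
    """Return the signals marked worrying.

    assessment maps each signal to True (healthy) or False (worrying).
    """
    missing = [signal for signal in SIGNALS if signal not in assessment]
    if missing:
        raise ValueError(f"unassessed signals: {missing}")
    return [signal for signal in SIGNALS if not assessment[signal]]


# Invented assessment of an imaginary startup.
example_assessment = {
    "defensibility": True,
    "margins": False,
    "demo_vs_reality": True,
    "model_dependence": False,
}

flags = worrying_signals(example_assessment)
print(f"{len(flags)} of {len(SIGNALS)} signals worrying: {', '.join(flags)}")
```

The function deliberately refuses to run when a signal has not been assessed, so a missing answer cannot quietly pass as a healthy one.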

The bottom line

The right question is never “is the AI impressive?” — most of it is. It’s “what does this company own that a model and a weekend hackathon can’t replicate?” Companies with a clear answer are worth your time. The rest are riding a wave, and waves recede.