How I would design an ad platform for LLMs
The LLM industry has a money problem. OpenAI is burning through cash at a rate that makes venture capitalists nervous. Anthropic is growing revenue fast but still operating at a loss. And the fundamental issue is that every single interaction costs real money in compute. Unlike traditional software where serving one more user costs essentially nothing, every LLM response has a meaningful marginal cost.
So naturally, the industry is turning to advertising. OpenAI rolled out "Sponsored Suggestions" in ChatGPT in February 2026. Google has been stuffing ads into AI Overviews for months. Startups like Kontext and Nexad have raised millions to build ad infrastructure specifically for AI chatbots.
I have spent years running Google and Facebook ad campaigns for my own products. I understand the ad ecosystem from the buyer side. And when I look at how LLM advertising is being implemented right now, I think there is a more interesting architectural approach that nobody is really talking about: a middleware layer that sits between the client and the LLM provider, intercepts the conversation, and handles ad injection transparently.
Here is how I would design it.
The architecture: a proxy layer
The core idea is simple. Instead of each LLM provider building their own ad system (which is what OpenAI is doing), you build a proxy that sits between any application and any LLM API. The proxy intercepts the request, analyzes the user's intent, matches it with relevant advertising, and then modifies the prompt or the response to include the ad content.
The flow looks like this:
- Client sends a request to what it thinks is the LLM API endpoint
- The proxy intercepts it, analyzes the conversation context and user intent
- Intent classification determines the ad category (shopping, travel, software, etc.)
- Ad auction runs in real time against the classified intent
- The proxy forwards the request to the actual LLM provider, potentially with modified system instructions that include the ad context
- The LLM generates a response that naturally incorporates the sponsored content
- The proxy passes the response back to the client, logging impression data for billing
```mermaid
sequenceDiagram
    participant Client
    participant Proxy
    participant AdAuction as Ad Auction
    participant LLM
    Client->>Proxy: Send request
    Proxy->>Proxy: Classify intent
    Proxy->>AdAuction: Query ads for intent
    AdAuction-->>Proxy: Winning ad
    Proxy->>LLM: Forward request + ad context
    LLM-->>Proxy: Response with ad content
    Proxy->>Proxy: Log impression
    Proxy-->>Client: Return response
```
This is not hypothetical architecture. LLM proxies like LiteLLM already exist and handle request/response transformation across 100+ providers. The infrastructure for intercepting and modifying LLM API calls is mature. Adding an ad layer on top of that is a natural extension.
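To make the flow concrete, here is a minimal sketch of the proxy's request path. Everything here is illustrative: `classify_intent`, `run_auction`, and the `llm_call` hook are hypothetical stand-ins for the components described above, not a real API.

```python
# Sketch of the proxy's request path. classify_intent, run_auction, and
# llm_call are hypothetical hooks, injected so the flow stays testable.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Ad:
    advertiser: str
    copy: str
    bid_cpm: float

def handle_request(messages: list, classify_intent: Callable,
                   run_auction: Callable, llm_call: Callable):
    """Intercept a chat request, inject ad context, forward to the LLM."""
    intent = classify_intent(messages)        # e.g. {"category": "software"}
    ad: Optional[Ad] = run_auction(intent)    # winning ad, or None to skip
    if ad is not None:
        # Prepend the ad context as a system message so the model can
        # incorporate it; the proxy never touches the user's own messages.
        ad_context = {"role": "system",
                      "content": f"Sponsored context (label as such): {ad.copy}"}
        messages = [ad_context] + messages
    response = llm_call(messages)             # forward to the real provider
    # The impression record feeds the billing pipeline described later.
    impression = {"advertiser": ad.advertiser if ad else None, "intent": intent}
    return response, impression
```

Passing the classifier, auction, and LLM client in as callables keeps the proxy itself provider-agnostic, which is the whole point of the middleware approach.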
Intent classification is the hard part
Traditional web advertising matches ads to keywords. You search for "running shoes" and you see shoe ads. It is crude but effective because the intent is explicit.
LLM conversations are different. The intent is often implicit, evolving, and multi-turn. Someone might start a conversation asking about knee pain, then shift to asking about exercises, and eventually get to "what running shoes are good for bad knees?" The ad platform needs to understand that entire trajectory, not just the current message.
This is where the proxy approach has an advantage over what OpenAI is doing internally. A middleware layer can build a dedicated intent classification model that is optimized purely for ad matching. It does not need to be a general-purpose LLM. It can be a smaller, faster model trained specifically on conversation-to-intent mapping. You run it on the conversation history, get an intent signal, and use that to query your ad inventory.
The classification needs to handle a few key dimensions:
- Commercial intent: Is the user in a buying mindset or just learning?
- Category: What product or service domain does this map to?
- Urgency: Is this a "right now" need or future planning?
- Specificity: Are they looking for a general category or a specific product?
A conversation about "best project management tools for a 10-person team" has high commercial intent, high specificity, and moderate urgency. That is a premium ad slot. A conversation about "explain how databases work" has low commercial intent. You either show a low-value educational ad or skip it entirely.
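As a toy illustration of two of these dimensions, here is a keyword heuristic. A real system would use the small trained model described above, run over the full conversation history; the cue lists are invented for the example.

```python
# Toy heuristic for commercial intent and urgency. A production classifier
# would be a small trained model, not substring matching; cue lists are
# illustrative assumptions.
COMMERCIAL_CUES = ["best", "buy", "price", "recommend", "cheapest", "tools"]
URGENT_CUES = ["today", "right now", "asap", "this week"]

def score_intent(conversation: list[str]) -> dict:
    """Return crude 0-1 scores from the joined conversation text."""
    text = " ".join(conversation).lower()
    commercial = sum(cue in text for cue in COMMERCIAL_CUES)
    urgency = sum(cue in text for cue in URGENT_CUES)
    return {
        "commercial": min(commercial / 3, 1.0),  # crude normalization
        "urgency": min(urgency / 2, 1.0),
    }
```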
The ad auction
Once you have an intent signal, you need to match it to advertisers. This is where the proxy layer gets interesting from a business perspective.
Traditional programmatic advertising runs real-time bidding (RTB) auctions that complete in under 100 milliseconds. LLM responses take 1-5 seconds to generate. That means you have significantly more time to run a sophisticated auction without adding perceived latency. The auction happens while the LLM is generating its response.
The auction model I would use:
- Second-price auction (the winner pays $0.01 more than the second-highest bid, not their own bid) to encourage honest bidding
- Quality score that factors in ad relevance to the conversation context
- Frequency controls so users do not see the same advertiser in every conversation
- Category exclusions for sensitive topics (medical, financial, legal advice)
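The first two bullets can be sketched together: rank bids by bid times quality score, and charge the winner just enough to beat the runner-up. The bid values and the scalar quality score are illustrative assumptions, not real pricing.

```python
# Sketch of a quality-weighted second-price auction. Bids are
# (advertiser, bid_cpm, quality 0-1); values are illustrative.
def run_auction(bids: list[tuple], min_increment: float = 0.01):
    """Return (winner, price) or None if there are no bids."""
    # Rank by effective bid: raw bid scaled by relevance quality.
    ranked = sorted(bids, key=lambda b: b[1] * b[2], reverse=True)
    if not ranked:
        return None
    winner = ranked[0]
    if len(ranked) == 1:
        return winner[0], min_increment  # no competition: pay the floor
    # Winner pays just enough to beat the runner-up's effective bid,
    # converted back through its own quality score.
    runner_up_effective = ranked[1][1] * ranked[1][2]
    price = runner_up_effective / winner[2] + min_increment
    return winner[0], round(price, 2)
```

Note the quality adjustment cuts both ways: a highly relevant ad can win with a lower raw bid, and it also pays less for the same position.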
Kontext, the startup that raised $10M to do exactly this, reports CPM rates around what OpenAI is charging (roughly $60). That is significantly higher than typical display advertising ($2-10 CPM) because the intent signal in a conversation is much stronger than a pageview.
Response injection: the ethical minefield
Here is where it gets uncomfortable. There are fundamentally two ways to inject ads into LLM responses:
Option 1: Clearly separated ads. The ad appears below or beside the LLM response, visually distinct and labeled as sponsored. This is what OpenAI does with their "Sponsored Suggestions" at the bottom of responses. It is honest, but the engagement is lower because users learn to ignore it, the same way banner blindness works on the web.
Option 2: Inline integration. The ad content is woven into the LLM's response itself. The system prompt instructs the model to naturally recommend a specific product when relevant. A response about project management tools might say "tools like Asana, Trello, and Monday.com (sponsored) are popular options." This gets higher engagement but blurs the line between organic recommendation and paid placement.
Research from a 2025 study published in ACM UbiComp found that when ads were embedded inline in LLM responses, only about 35% of users believed they could detect them. Users actually preferred the responses with hidden ads because the product mentions felt natural and helpful. But once users were told ads were present, they found them manipulative and their trust dropped significantly.
I would build the platform to support both options but strongly push advertisers toward Option 1. The short-term revenue gain from inline ads is not worth the long-term trust destruction. Perplexity learned this the hard way. They tested ads in late 2024, saw the trust erosion firsthand, and completely abandoned advertising by February 2026. Their head of ad sales quit. They decided that subscription revenue from users who trust the product is worth more than ad revenue from users who doubt every response.
The streaming challenge
Most modern LLM APIs use server-sent events (SSE) for streaming responses. Tokens come back one at a time, and the client renders them incrementally. This creates a technical challenge for ad injection.
If you are doing separated ads (Option 1), you can append the ad content after the stream completes. The proxy buffers the end-of-stream signal, injects the ad payload, and then sends the termination event.
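A minimal sketch of that buffering, assuming simplified SSE chunks with the common `data: [DONE]` sentinel (real provider streams carry JSON payloads, which this ignores):

```python
# Sketch: wrap an SSE stream and slip a labeled ad in before the
# termination event. Event format is simplified; real streams carry JSON.
from typing import Iterator

def stream_with_trailing_ad(upstream_events: Iterator[str],
                            ad_text: str) -> Iterator[str]:
    """Pass events through untouched; emit the ad just before [DONE]."""
    for event in upstream_events:
        if event.strip() == "data: [DONE]":
            # Hold back the termination event until the ad has gone out.
            yield f"data: [SPONSORED] {ad_text}\n\n"
            yield event
        else:
            yield event
```

Because the ad only goes out after the model's last token, this adds no perceived latency to the response itself.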
If you are doing inline injection (Option 2), you need to either:
- Modify the system prompt before the request goes to the LLM, so the model naturally includes the sponsored content in its generation
- Buffer the entire response, modify it, and re-stream it (which adds latency and defeats the purpose of streaming)
The system prompt approach is cleaner. You inject something like "When discussing [category], naturally mention [product] as an option and include a brief description of its relevant features" into the system message. The LLM handles the integration, and the streaming works normally.
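A sketch of that injection, with one deliberate policy choice layered on top: the instruction forces a "(sponsored)" disclosure. That disclosure line is my own addition for the example, not something any provider's API requires.

```python
# Illustrative builder for the ad-bearing system instruction described
# above. The forced "(sponsored)" disclosure is a policy choice added
# for this sketch, not part of any real API.
def build_ad_instruction(category: str, product: str, features: str) -> str:
    return (
        f"When discussing {category}, naturally mention {product} as one "
        f"option and briefly describe {features}. "
        "Always mark the mention as '(sponsored)' so the user can tell "
        "it is a paid placement."
    )

def inject_ad_context(messages: list, instruction: str) -> list:
    """Prepend the instruction as a system message; streaming is unaffected."""
    return [{"role": "system", "content": instruction}] + messages
```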
But this approach means the LLM's response quality is now influenced by advertising instructions, which creates all the trust issues I mentioned above.
Billing and attribution
The proxy layer handles billing naturally because it sees every request and response. You can track:
- Impressions: How many times an ad was served
- Clicks: If the ad includes a link, whether the user followed it
- Conversions: With advertiser-side pixel tracking, whether the user eventually purchased
- Engagement quality: Did the user ask follow-up questions about the advertised product?
That last metric is unique to conversational AI advertising. If a user sees a product mention and then asks "tell me more about [product]," that is a far stronger engagement signal than a click. It indicates genuine interest and gives the advertiser another opportunity to make their case through the LLM's response.
The proxy bills advertisers based on the auction model (CPC, CPM, or CPA) and takes a revenue share. Kontext takes 30% of generated ad revenue from their publisher partners. That seems about right for a platform that handles the entire ad stack.
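The CPM settlement math, with the 30% platform share mirroring Kontext's reported cut, plus a deliberately crude check for the follow-up-engagement signal (a real system would classify the follow-up, not substring-match it):

```python
# Sketch of CPM settlement plus the follow-up-engagement signal.
# PLATFORM_SHARE mirrors Kontext's reported 30% cut; the engagement
# check is a crude stand-in for a real classifier.
PLATFORM_SHARE = 0.30

def settle(impressions: int, cpm: float) -> dict:
    """CPM billing: advertisers pay per 1,000 impressions."""
    gross = impressions / 1000 * cpm
    return {
        "advertiser_cost": round(gross, 2),
        "platform_cut": round(gross * PLATFORM_SHARE, 2),
        "publisher_payout": round(gross * (1 - PLATFORM_SHARE), 2),
    }

def is_followup_engagement(next_user_message: str, product: str) -> bool:
    """Did the user's next message ask about the advertised product?"""
    return product.lower() in next_user_message.lower()
```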
Should this exist?
I have been describing how to build this, but I want to be honest about whether it should be built.
The core value proposition of LLMs is that they give you the best answer to your question. The moment financial incentives enter the response generation, that promise is compromised. It does not matter how clearly you label the ads or how sophisticated your quality controls are. The existence of an ad platform means that someone is paying to influence what the AI tells you.
Traditional search advertising works because everyone understands the deal. The top results are ads, the organic results are below, and you can make your own judgment. But LLM responses feel like advice from a knowledgeable person. We do not process them the same way we process a search results page. When your AI assistant recommends a product, there is an implicit trust that it is recommending the best option, not the one that paid the most.
OpenAI's approach of clearly separated ads at the bottom of responses is the least harmful version of this. But even that is a step down a path that leads somewhere uncomfortable. And the economics are pushing hard in that direction. When only 5% of your 800 million weekly users are paying subscribers and every interaction costs real compute, advertising is not just an option. It starts to look like the only option.
The better path, I think, is what Anthropic and Perplexity have landed on: make the product good enough that people pay for it directly. Perplexity hit $200M in annual recurring revenue through subscriptions alone. Anthropic is at roughly $19B annualized, mostly from enterprise customers. It is possible to build a sustainable LLM business without ads. But it requires a product that is clearly worth paying for, and that is a harder problem than selling ad space.