llms.txt: The Standard for AI-Readable Sites

llms.txt is a proposed, non-binding convention for giving AI crawlers and agents a curated, human-written map of a site’s most important content, typically published at /llms.txt. Unlike robots.txt, it carries no allow or disallow rules, and current evidence suggests its main value is improving retrieval precision and context for agentic tools, not changing search ranking or guaranteeing crawler behavior.^[2]^[4]^[6]

What llms.txt is

The idea was proposed by Jeremy Howard in 2024 as a Markdown-based file at the site root that summarizes what a site is about and points to the pages an AI system should prioritize.^[1]^[2]^[6] In practice, implementations describe it as a concise, curated index of canonical resources, often with short annotations for each link, so models and agents can find the right docs faster than by parsing complex HTML.^[1]^[2]^[4]

A typical file includes:

a site title or H1
a short description of the site
grouped links to key pages
brief descriptions or notes for each link^[6]
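The structure listed above can be illustrated with a small sketch. The sample file below is hypothetical (the site name, URLs, and notes are invented), and it assumes a common layout: an H1 title, a blockquote summary, and H2 sections holding Markdown link lists with an optional note after a colon. The parser is a minimal illustration under those assumptions, not a reference implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sample file; every name and URL here is invented for illustration.
SAMPLE = """\
# Example Docs

> Developer documentation for the hypothetical Example API.

## Guides

- [Quickstart](https://example.com/docs/quickstart.md): Install and make a first request
- [Authentication](https://example.com/docs/auth.md): API keys and token rotation

## Reference

- [REST API](https://example.com/docs/api.md)
"""

# Matches "- [Title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict = field(default_factory=dict)  # section name -> list of Link


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the assumed layout: H1, blockquote summary, H2 link groups."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["title"], match["url"], match["note"] or "")
                )
    return doc
```

Calling `parse_llms_txt(SAMPLE)` yields a title, a summary, and two link groups that an agent could walk in order. Lines that do not fit the assumed layout are ignored rather than rejected, which suits a convention with no enforced schema.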

What it does and does not do

llms.txt is best understood as a signal file, not an access-control mechanism.^[4]^[6] It can help compliant AI agents discover the right pages, but it does not force adherence, restrict unauthorized access, or replace robots.txt, auth, or paywalls.^[4]^[6]

It also does not appear to be used by major web search ranking systems, and claims of large hallucination reduction or traffic gains should be treated cautiously unless backed by site-specific measurements.^[6] For developers, the practical test is whether AI agents that support the convention fetch it and use it to improve page selection and context assembly.^[6]^[7]
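One way to run that practical test is to check server logs for requests to the file. The sketch below assumes access logs in the common "combined" format and counts successful fetches of /llms.txt and /llms-full.txt per user agent. The log lines and agent names are made up for illustration, and real logs may need a different pattern.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust the pattern for other formats.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

TRACKED_PATHS = ("/llms.txt", "/llms-full.txt")

# Hypothetical log lines; the IPs, timestamps, and agent names are invented.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleAgent/1.0"',
    '203.0.113.7 - - [01/Jan/2025:10:00:02 +0000] "GET /llms-full.txt?v=2 HTTP/1.1" 200 90210 "-" "ExampleAgent/1.0"',
    '198.51.100.4 - - [01/Jan/2025:10:01:00 +0000] "GET /docs/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [01/Jan/2025:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "OtherBot/0.3"',
]


def llms_txt_fetches(log_lines):
    """Count successful GETs of the tracked files, grouped by user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if (
            match["method"] == "GET"
            and path in TRACKED_PATHS
            and match["status"] == "200"
        ):
            hits[match["ua"]] += 1
    return hits
```

A fetch count only shows that an agent requested the file. Whether the agent then used it to choose pages has to be inferred from what it requests next, so this is a starting signal rather than proof of benefit.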

How AI agents use it

For AI agents, llms.txt functions like a front door: instead of starting from a broad crawl, an agent can first read a curated set of pages that reflect the publisher’s preferred interpretation of the site.^[4]^[6] That is especially useful for documentation-heavy sites, SaaS products, APIs, and enterprise knowledge bases where canonical pages matter more than exhaustive crawling.^[6]^[7]
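The "front door" behavior can be sketched as a selection step over the curated links rather than a crawl. The example below is a deliberately naive illustration: it scores each link by word overlap between the query and the link's title and note. The link data is hypothetical, and a real agent would more likely use embeddings or a language model for this step.

```python
# Hypothetical curated links as (title, url, note) tuples, e.g. taken from a site's llms.txt.
LINKS = [
    ("Quickstart", "https://example.com/docs/quickstart.md",
     "Install and make a first request"),
    ("Authentication", "https://example.com/docs/auth.md",
     "API keys and token rotation"),
    ("REST API", "https://example.com/docs/api.md", ""),
]


def rank_links(query, links, top_k=2):
    """Return up to top_k URLs whose title or note shares words with the query."""
    terms = set(query.lower().split())
    scored = []
    for title, url, note in links:
        words = set(f"{title} {note}".lower().split())
        score = len(terms & words)
        if score:
            scored.append((score, url))
    # sort() is stable, so links with equal scores keep the publisher's order.
    scored.sort(key=lambda pair: -pair[0])
    return [url for _, url in scored[:top_k]]
```

For the query "rotate api keys", this returns the authentication page first and the REST API page second. Keeping ties in file order is one simple way to respect the publisher's own prioritization.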

Some implementations also pair llms.txt with llms-full.txt, a broader Markdown bundle intended for agents that want more complete ingest in one request.^[6]^[7] This is most relevant when the downstream use case is agentic retrieval, code generation, or support workflows rather than classic search indexing.^[6]^[7]
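An agent that supports both files has to decide which one to load. The sketch below shows one possible policy, stated as an assumption and not as a documented behavior of any tool: prefer llms-full.txt when it fits a size budget, otherwise fall back to the llms.txt index. The `fetch` callable and the `max_chars` budget are hypothetical parameters, with `fetch` expected to return the body text or `None` when a file is missing.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin


def load_site_context(
    base_url: str,
    fetch: Callable[[str], Optional[str]],
    max_chars: int,
) -> Tuple[Optional[str], str]:
    """Pick the full bundle if it fits the budget, else the curated index."""
    full = fetch(urljoin(base_url, "/llms-full.txt"))
    if full is not None and len(full) <= max_chars:
        return "llms-full.txt", full

    index = fetch(urljoin(base_url, "/llms.txt"))
    if index is not None:
        return "llms.txt", index

    # Neither file exists; the caller falls back to ordinary crawling.
    return None, ""
```

Passing the fetcher in as a parameter keeps the policy testable without network access. For example, a dictionary's `get` method can stand in for an HTTP client.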

Where HTTP 402 and pay-per-crawl fit

llms.txt is about discovery and preference; HTTP 402 (the Payment Required status code) and pay-per-crawl are about economic access control. In a pay-per-crawl model, a crawler may need to authenticate, negotiate payment, or otherwise satisfy server-side policy before content is served, whereas llms.txt can only point the agent toward the content and describe preferred usage.^[2]^[4]^[6]

So the two concepts are complementary: llms.txt can advertise the best entry points and licensing/usage notes, while HTTP 402-style gating can enforce monetization or access terms on the server, at the HTTP layer. If you need hard control, billing, or entitlement checks, rely on server enforcement rather than the text file alone.
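The split between advertising and enforcement can be sketched as a server-side decision function. Everything specific here is an assumption for illustration: the free paths, the bearer-token check standing in for a real entitlement or payment check, and the JSON fields in the 402 body. No particular pay-per-crawl protocol is implied.

```python
import json

# Discovery files stay free to fetch; everything else is gated (assumed policy).
FREE_PATHS = {"/llms.txt", "/robots.txt"}


def gate(path, headers, paid_tokens):
    """Return (status, response_headers, body) for a request.

    paid_tokens is a hypothetical set of tokens that have already paid.
    """
    if path in FREE_PATHS:
        return 200, {"Content-Type": "text/plain"}, "free discovery file"

    auth = headers.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    if token and token in paid_tokens:
        return 200, {"Content-Type": "application/json"}, '{"data": "full content"}'

    # 402 Payment Required: the server, not llms.txt, enforces the terms.
    body = json.dumps({"error": "payment_required", "price_usd": "0.01"})
    return 402, {"Content-Type": "application/json"}, body
```

In this sketch an agent can always read /llms.txt to learn what exists, but a request for gated content without a paid token gets a 402 response regardless of what the text file says. The function could be called from any HTTP framework's request handler.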

Key takeaways

llms.txt is a curated, root-level Markdown convention for helping AI agents find a site’s most important content.^[1]^[2]^[6]
It is advisory, not enforceable; it does not block crawlers or replace robots.txt, auth, or paywalls.^[4]^[6]
Its strongest use case is agentic retrieval for docs, APIs, and other structured sites where canonical pages matter.^[6]^[7]
HTTP 402/pay-per-crawl addresses access and monetization, while llms.txt addresses discoverability and guidance.^[2]^[6]

Synthesized by the AISA LLM layer with live web sources (AISA Perplexity + Tavily APIs). 2026-06-15.