What Is llms.txt? The New Standard for AI Content Access
What llms.txt is, how it works, how it differs from robots.txt, the file format, current adoption, and how to create one for your site.
There is a new file joining the collection of well-known files that live at the root of your website. Alongside robots.txt, sitemap.xml, and favicon.ico, you will increasingly see llms.txt. Its purpose is straightforward: provide structured, LLM-friendly content that AI systems can consume directly.
This guide explains what llms.txt is, why it exists, how it differs from robots.txt, the file format, and whether you should create one for your site. For the complete robots.txt reference, see our robots.txt Guide.
The Problem llms.txt Solves
Large language models (LLMs) like GPT, Claude, and Gemini increasingly need to access website content. Sometimes this is for training data. Sometimes it is for retrieval-augmented generation (RAG), where the model looks up current information to answer a user's question. Sometimes it is for AI-powered search results.
The problem is that websites are designed for humans. They have navigation menus, sidebars, footers, cookie banners, JavaScript widgets, and all sorts of visual chrome that is useful for human visitors but is noise for an AI trying to extract the core content.
When an LLM processes a web page, it has to wade through all of that markup to find the actual content. This wastes tokens, introduces noise, and can lead to less accurate responses.
llms.txt provides a solution: a clean, structured file that tells AI systems what your site contains and where to find LLM-friendly versions of your content.
What llms.txt Is
llms.txt is a Markdown file placed at the root of your website (https://yoursite.com/llms.txt) that provides a structured overview of your site's content. It is designed to be consumed by LLMs, not by humans browsing your site or by search engine crawlers.
The file contains:
- A brief description of your site or organization
- Links to key pages with short descriptions
- Optionally, links to Markdown-formatted versions of your content that strip away HTML chrome
The format was proposed by Jeremy Howard (co-founder of fast.ai and Answer.AI) and has gained traction among developer-focused sites, documentation platforms, and content publishers.
How llms.txt Differs from robots.txt
This is the distinction people most often get wrong. robots.txt and llms.txt do fundamentally different things.
robots.txt controls access. It tells crawlers which pages they are and are not allowed to fetch. It is a gate. It says "you may enter here" or "you may not enter here." For the full breakdown, see what robots.txt does.
llms.txt provides content. It does not control access. It provides a structured guide to your content that is optimized for AI consumption. It says "here is what we have and here is the best way to read it."
| | robots.txt | llms.txt |
|---|---|---|
| Purpose | Control crawler access | Provide structured content for LLMs |
| Format | Custom plain text syntax | Markdown |
| Audience | All web crawlers | Large language models |
| Function | Gatekeeper (allow/deny) | Guide (here's our content) |
| Standard | RFC 9309 (formal standard) | Community proposal |
| Adoption | Universal | Growing |
They are complementary. You might use robots.txt to block AI training crawlers from scraping your entire site while simultaneously providing an llms.txt file that gives the AI systems you still allow a clean way to access specific content. Note that llms.txt itself approves or denies nothing; any allow/deny decision lives in robots.txt. See our guide on blocking AI crawlers for the robots.txt side of this.
The File Format
The llms.txt format is deliberately simple. It uses standard Markdown with a specific structure.
Basic structure
# Site Name
> Brief description of the site or organization.
## Main sections
- [Page Title](https://yoursite.com/page-url): Short description of what this page covers.
- [Another Page](https://yoursite.com/another-page): Description of this page.
## Optional: Detailed content
- [Page Title (Full)](https://yoursite.com/page-url.md): Markdown version of the page content.
A real-world example
# Acme Documentation
> Acme is a widget management platform. This file provides an overview of our documentation for AI systems.
## Getting Started
- [Quick Start Guide](https://docs.acme.com/quickstart): Set up Acme in 5 minutes.
- [Installation](https://docs.acme.com/install): System requirements and installation steps.
- [Configuration](https://docs.acme.com/config): Configuration options and environment variables.
## API Reference
- [REST API](https://docs.acme.com/api/rest): Complete REST API documentation.
- [GraphQL API](https://docs.acme.com/api/graphql): GraphQL schema and query examples.
- [Webhooks](https://docs.acme.com/api/webhooks): Webhook events and payload formats.
## Guides
- [Authentication](https://docs.acme.com/guides/auth): OAuth2, API keys, and JWT.
- [Rate Limiting](https://docs.acme.com/guides/rate-limits): Rate limit policies and best practices.
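To make the structure concrete, here is a minimal sketch of how a consumer might read a file like the one above. It assumes only the layout shown in this guide (one `#` title, one `>` summary line, `##` sections, and `- [Title](url): description` entries). The names `LlmsTxt` and `parse_llms_txt` are hypothetical, not part of any official tooling.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, description) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the simple llms.txt layout described in this guide."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.summary:
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return doc


SAMPLE = """# Acme Documentation
> Acme is a widget management platform.

## Getting Started
- [Quick Start Guide](https://docs.acme.com/quickstart): Set up Acme in 5 minutes.
- [Installation](https://docs.acme.com/install): System requirements and installation steps.
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title)                        # Acme Documentation
print(len(doc.sections["Getting Started"]))  # 2
```

The parser ignores anything it does not recognize, which keeps it tolerant of extra prose between sections. A stricter consumer might reject malformed entries instead.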
The llms-full.txt variant
Some sites also provide llms-full.txt, which contains the full text content of key pages rather than just links. This is useful for smaller sites where putting everything in one file is practical. For larger sites, linking to individual Markdown-formatted pages makes more sense.
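For a small site, one way to produce llms-full.txt is to concatenate the Markdown versions of your key pages under one heading each. The sketch below is an illustration only: the exact layout of llms-full.txt is an assumption here, and `build_llms_full` is a hypothetical helper name.

```python
def build_llms_full(site_name: str, summary: str, pages: list[tuple[str, str]]) -> str:
    """Concatenate pages into one Markdown document.

    `pages` is a list of (title, markdown_body) pairs, in the order
    they should appear in the output.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, body in pages:
        # One second-level heading per page, followed by its full text.
        parts += [f"## {title}", "", body.strip(), ""]
    return "\n".join(parts)


full_text = build_llms_full(
    "Acme Documentation",
    "Acme is a widget management platform.",
    [
        ("Quick Start Guide", "Install the CLI, then run the setup command.\n"),
        ("Configuration", "Set options through environment variables."),
    ],
)
print(full_text)
```

If the resulting file grows very large, that is a sign the link-based llms.txt with per-page Markdown files is the better fit.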
Who Created It
The llms.txt proposal came from Jeremy Howard, published on llmstxt.org. The motivation was practical: as LLMs became better at processing text but continued to struggle with complex HTML pages, there needed to be a way for site owners to provide clean, structured content specifically for AI consumption.
The proposal is not an RFC or formal internet standard. It is a community-driven convention, similar to how robots.txt started as an informal agreement before eventually becoming RFC 9309. Its strength comes from adoption rather than formal standardization.
Additional community resources and tooling have emerged around the proposal, documented at llms-txt.org.
Current Adoption
As of early 2026, llms.txt adoption is growing but far from universal. The sites most likely to have an llms.txt file are:
Developer documentation sites. These are a natural fit because they already have structured, text-heavy content that LLMs frequently need to access.
AI companies. Companies building AI products tend to adopt AI-friendly standards early. Many AI tool and API documentation sites have llms.txt files.
Technical blogs and knowledge bases. Sites that produce content frequently referenced by AI assistants have an incentive to provide clean versions of their content.
Open source projects. Many projects want their documentation to be easily accessible to AI coding assistants.
Mainstream corporate websites, e-commerce sites, and small businesses are largely not adopting llms.txt yet. As LLM usage continues to grow, this will likely change.
How to Create One
Creating an llms.txt file is straightforward.
Step 1: Identify your key content
List the pages that would be most useful for an LLM to know about. Focus on your most important and most frequently referenced content. You do not need to include every page on your site.
Step 2: Write the Markdown file
Follow the format above. Start with a heading and description, then organize your links into logical sections. Write clear, concise descriptions for each link.
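If your page list already lives in structured form (a CMS export, a docs index), you can generate the file rather than write it by hand. This is a minimal sketch that follows the format shown earlier; `render_llms_txt` and the shape of its `sections` argument are assumptions made for illustration.

```python
def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render llms.txt text.

    `sections` maps a section name to a list of
    (title, url, description) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for title, url, description in links:
            entry = f"- [{title}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


llms_txt = render_llms_txt(
    "Acme Documentation",
    "Acme is a widget management platform.",
    {
        "Getting Started": [
            ("Quick Start Guide", "https://docs.acme.com/quickstart",
             "Set up Acme in 5 minutes."),
        ],
        "API Reference": [
            ("REST API", "https://docs.acme.com/api/rest",
             "Complete REST API documentation."),
        ],
    },
)
print(llms_txt)
```

Generating the file from the same source as your navigation or sitemap also makes it easier to keep current.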
Step 3: Optionally create Markdown versions of pages
If you want to provide LLM-optimized versions of your content, create Markdown files that strip away HTML navigation, headers, footers, and other chrome, leaving just the core content.
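As a rough illustration of what stripping the chrome means, the sketch below uses Python's standard `html.parser` to drop `nav`, `header`, `footer`, `aside`, `script`, and `style` elements and keep headings, paragraphs, and list items. It is deliberately simplistic (no links, tables, or code blocks), and `ContentExtractor` and `html_to_markdown` are hypothetical names. A real site would more likely export Markdown from its source files or use a dedicated converter.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = {"p", "li"}


class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.prefix = ""      # Markdown prefix for the current block
        self.buffer = []      # text pieces of the current block
        self.blocks = []      # finished Markdown blocks

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in HEADINGS:
            self.flush()
            self.prefix = HEADINGS[tag]
        elif self.skip_depth == 0 and tag in BLOCKS:
            self.flush()
            self.prefix = "- " if tag == "li" else ""

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif self.skip_depth == 0 and (tag in HEADINGS or tag in BLOCKS):
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)


PAGE = """<html><body>
<nav><a href="/">Home</a><a href="/docs">Docs</a></nav>
<main><h1>Quick Start</h1><p>Install the   CLI, then run <code>acme init</code>.</p></main>
<footer>Copyright Acme</footer>
</body></html>"""

markdown = html_to_markdown(PAGE)
print(markdown)
```

The navigation links and footer text never reach the output, which is the point of publishing a Markdown version.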
Step 4: Deploy to your root URL
Place the file so it is accessible at https://yoursite.com/llms.txt. Make sure it returns a 200 status code and is served as text/markdown or text/plain.
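A quick way to confirm the deployment is to request the file and inspect the status code and Content-Type header. The following sketch uses only the standard library. The helper names `check_response` and `check_llms_txt` are hypothetical, and the accepted media types are simply the two named above.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = {"text/markdown", "text/plain"}


def check_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems; an empty list means the response looks fine."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or '(missing)'}")
    return problems


def check_llms_txt(base_url: str) -> list[str]:
    """Fetch <base_url>/llms.txt and report any problems found."""
    url = base_url.rstrip("/") + "/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return check_response(resp.status, resp.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; report them the same way.
        return check_response(err.code, err.headers.get("Content-Type", ""))
```

Calling `check_llms_txt("https://yoursite.com")` returns an empty list when everything is in order. Note that `urlopen` follows redirects, so this check reports on the final response rather than on any intermediate 301 or 302.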
Step 5: Keep it updated
Like a sitemap, llms.txt is only useful if it reflects your current content. Update it when you add, remove, or significantly change key pages.
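One lightweight way to catch drift is to compare the links in llms.txt against the URLs your site currently serves, for example the URLs in your sitemap. The sketch below assumes you can supply those sets yourself. `audit_llms_txt` is a hypothetical helper, and the "Old Setup" entry is invented for the example.

```python
import re

# Captures the URL part of every Markdown link: [text](url)
LINK_URL_RE = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def audit_llms_txt(llms_text: str, live_urls: set[str], key_urls: set[str]):
    """Return (stale, missing) link lists.

    stale:   URLs listed in llms.txt that the site no longer serves.
    missing: key pages that llms.txt does not list yet.
    """
    listed = set(LINK_URL_RE.findall(llms_text))
    stale = sorted(listed - live_urls)
    missing = sorted(key_urls - listed)
    return stale, missing


LLMS_TXT = """# Acme Documentation
## Getting Started
- [Quick Start Guide](https://docs.acme.com/quickstart): Set up Acme in 5 minutes.
- [Old Setup](https://docs.acme.com/setup-v1): Legacy setup steps.
"""

live = {"https://docs.acme.com/quickstart", "https://docs.acme.com/install"}
key = {"https://docs.acme.com/quickstart", "https://docs.acme.com/install"}

stale, missing = audit_llms_txt(LLMS_TXT, live, key)
print("stale:", stale)
print("missing:", missing)
```

Running a check like this in your deployment pipeline turns "keep it updated" into something that fails loudly instead of going unnoticed.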
Relationship with robots.txt and AI Crawler Blocking
Here is where the strategy gets interesting. You can use robots.txt and llms.txt together to create a nuanced AI content access policy.
Scenario 1: Block training, provide reference content. Use robots.txt to disallow AI training crawlers (GPTBot, ClaudeBot, etc.) across your entire site. Simultaneously provide an llms.txt that points to specific pages you want other AI systems to reference when answering user questions. This way crawlers that honor robots.txt do not collect your content for training, but it can still be cited in AI-powered responses.
Scenario 2: Full openness. Allow all crawlers in robots.txt and provide llms.txt for optimized access. AI systems can crawl your site normally and also use the clean Markdown versions from llms.txt for better results.
Scenario 3: Full lockdown. Block all AI crawlers in robots.txt and do not provide an llms.txt. This is the most restrictive approach.
The key point is that robots.txt handles the access control question, and llms.txt handles the content quality question. They address different needs. See the robots exclusion protocol for the formal standard that robots.txt is built on.
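To see how Scenario 1 plays out, you can feed a candidate robots.txt to Python's standard `urllib.robotparser` and ask what each agent may fetch. This is a sketch under assumptions: the rules below are one possible way to express the scenario, GPTBot and ClaudeBot are the agent tokens named above, and `ExampleAssistant` is a made-up agent standing in for any crawler you have not blocked.

```python
from urllib.robotparser import RobotFileParser

# One possible robots.txt for Scenario 1 (an illustration, not a recommendation).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://yoursite.com/docs/page"
llms = "https://yoursite.com/llms.txt"

# Blocked training crawlers may fetch neither regular pages nor llms.txt.
print(parser.can_fetch("GPTBot", page))            # False
print(parser.can_fetch("ClaudeBot", llms))         # False

# Agents without a matching block fall through to the "*" rules.
print(parser.can_fetch("ExampleAssistant", llms))  # True
```

One consequence worth noticing: a blanket `Disallow: /` for an agent also covers /llms.txt for that agent. If you want a blocked crawler to still read llms.txt, you would need an explicit allow rule for that path.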
llms.txt is voluntary, like robots.txt
There is no technical enforcement. An AI system can ignore your llms.txt and crawl your HTML pages directly (assuming robots.txt allows it). The value of llms.txt is that it provides a better option, not that it restricts anything.
Should You Create One?
Yes, if: your site has documentation, educational content, or reference material that AI systems frequently access. Providing clean versions of this content benefits you (better representation in AI responses) and AI users (more accurate answers).
Maybe, if: you are unsure how AI systems are using your content. Creating an llms.txt is low effort, so the risk is minimal.
Not yet, if: your site is primarily transactional (e-commerce product pages, booking systems) with no content that LLMs would reference. The format is most useful for content-heavy sites.
Regardless of whether you create an llms.txt, make sure your robots.txt accurately reflects your preferences about AI crawling. That is the more urgent and impactful configuration.
References
- llmstxt.org - The llms.txt Proposal
- llms-txt.org - Community Resources
- RFC 9309 - Robots Exclusion Protocol