Why a Checklist Matters in 2026
Most websites are invisible to AI search engines and their owners do not know it. They check Google rankings, ignore everything else, and wake up one day to find that ChatGPT, Perplexity, and Google AI Overviews never mention them.
This is not a vague risk. Over 60% of websites block at least one AI crawler accidentally, and the vast majority lack the structured data, citability signals, and entity clarity that AI models need to recommend a business confidently.
This checklist breaks down the 16 most important AI visibility checks for 2026. You can run them manually using the instructions below, or use our free AI Exposure audit to run all 16 in 60 seconds.
Category 1: Technical SEO (4 Checks)
The foundation. If AI crawlers cannot reach your site or parse it cleanly, nothing else matters.
☐ 1. robots.txt allows AI crawlers and references sitemap
Your robots.txt should not block GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or any other AI crawler. It should also reference your sitemap.
```
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
See our full guide to AI crawlers for details on all 11 major bots.
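If you want to verify this check programmatically, here is a minimal sketch using Python's standard-library robots.txt parser. The bot names are real crawler user-agent tokens; the sample rules and any URLs are illustrative placeholders.

```python
from urllib.robotparser import RobotFileParser

# Tier 1 AI crawler user-agent tokens (see the crawler guide for the full list).
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: str) -> list[str]:
    """Return the AI bots that may NOT fetch the site root under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, "/")]

# Example: GPTBot is explicitly disallowed; everything else falls under '*'.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_bots(sample))  # → ['GPTBot']
```

To check a live site, fetch `https://yoursite.com/robots.txt` and pass its text to `blocked_ai_bots`.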
☐ 2. sitemap.xml exists and lists all important pages
A valid /sitemap.xml with <lastmod> dates on every URL. Submit it to Google Search Console and Bing Webmaster Tools so crawlers discover updates fast.
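A minimal valid sitemap looks like this; the URLs and dates below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/pricing</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>
```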
☐ 3. Canonical URL set on every page
Every page should declare its canonical URL:
```html
<link rel="canonical" href="https://yoursite.com/page-path" />
```
This prevents duplicate-content confusion when AI models compare versions of your page.
☐ 4. Open Graph tags present
Helps social platforms and some AI engines understand your page identity:
```html
<meta property="og:title" content="..." />
<meta property="og:description" content="..." />
<meta property="og:image" content="..." />
```
Category 2: Content Quality (4 Checks)
AI models prefer content that is clear, factual, and citable. Marketing fluff gets ignored.
☐ 5. Exactly one H1 that describes the page
Each page should have a single <h1> that clearly describes what the page is about. Multiple H1s confuse AI parsing.
☐ 6. At least 1,000 words of informative content on key pages
Pages with under 300 words are routinely deprioritized by AI engines because there is not enough context to cite from. Aim for 1,000+ words on your homepage and key landing pages.
☐ 7. FAQ section with 5+ questions
A clear FAQ section, ideally with FAQPage schema, gives AI engines ready-made Q&A pairs to surface in their answers. This is one of the highest-ROI signals.
☐ 8. Marketing-to-information ratio under 2%
Pages dominated by marketing phrases like “world-class,” “industry-leading,” or “innovative solutions” are penalized. AI models reward fact-rich content with specific numbers, dates, and concrete claims.
Category 3: Structured Data (3 Checks)
Schema.org markup gives AI engines a machine-readable map of your business. See our structured data guide for full code examples.
☐ 9. Organization schema with sameAs links
JSON-LD Organization schema on your homepage with sameAs links to LinkedIn, Twitter, Crunchbase, Wikipedia, and any other authoritative profile. This is the single highest-impact addition for AI entity recognition.
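A minimal example of what this looks like in practice — the company name, logo path, and profile URLs below are placeholders you would replace with your own:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://twitter.com/exampleco",
    "https://www.crunchbase.com/organization/example-co"
  ]
}
</script>
```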
☐ 10. WebSite schema with SearchAction
A WebSite schema with a SearchAction lets AI engines understand how to send users to a search on your site. Especially valuable for content-heavy sites.
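A sketch of the markup, assuming your site search lives at `/search?q=` — adjust the `urlTemplate` to match your actual search URL:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://yoursite.com",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://yoursite.com/search?q={search_term_string}"
    },
    "query-input": "required name=search_term_string"
  }
}
</script>
```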
☐ 11. FAQPage schema on FAQ content
If you have an FAQ section (check 7), wrap it in FAQPage JSON-LD so AI engines can pull individual Q&A pairs directly into their answers.
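The shape of the markup, with one illustrative Q&A pair (repeat the `Question` object for each entry in your FAQ):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does Example Co do?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Example Co is an analytics platform that helps e-commerce teams track revenue attribution."
      }
    }
  ]
}
</script>
```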
Category 4: GEO Readiness (3 Checks)
Generative Engine Optimization signals specific to AI search — these are what differentiate a site that gets cited from one that gets ignored.
☐ 12. llms.txt file at /llms.txt
A machine-readable summary of your site at yoursite.com/llms.txt. It acts as an “elevator pitch” that AI models can fall back on. See our llms.txt guide for templates.
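A minimal sketch following the llms.txt proposal's markdown format — an H1 with the site name, a blockquote summary, then sections of annotated links. The company, description, and URLs here are placeholders:

```markdown
# Example Co

> Example Co is an analytics platform that helps e-commerce teams track revenue attribution.

## Key pages

- [Pricing](https://yoursite.com/pricing): Plans and pricing details
- [Docs](https://yoursite.com/docs): Product documentation and integration guides
```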
☐ 13. Clear entity description in the first section of the homepage
AI engines need to understand who you are in one sentence. Your homepage should clearly state: “X is a [type] that helps [audience] to [benefit].” No marketing fluff — just a clean factual definition.
☐ 14. At least 5 citable blocks (facts, statistics, definitions)
Pages should contain self-contained, fact-rich paragraphs (130-170 words each) with specific numbers, dates, or definitions. These are what AI models quote when answering user questions.
Category 5: AI Crawler Access (2 Checks)
Even with perfect content, blocked crawlers mean zero visibility.
☐ 15. All Tier 1 AI bots explicitly allowed
The most important bots to check individually:
| Bot | Company | Role |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training + browsing |
| OAI-SearchBot | OpenAI | ChatGPT search results |
| ChatGPT-User | OpenAI | Live ChatGPT browsing |
| ClaudeBot | Anthropic | Claude content access |
| PerplexityBot | Perplexity | Perplexity citations |
None of these should appear under Disallow in your robots.txt.
☐ 16. Google-Extended and major Tier 2 bots allowed
Google-Extended controls whether your content appears in Google AI Overviews and Gemini. Blocking it has zero impact on Google Search rankings but kills your AI Overviews visibility. Also check Applebot-Extended (Siri), Bytespider (TikTok AI), and CCBot (Common Crawl, used by many models).
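If you want to be explicit rather than rely on a wildcard rule, you can allow these bots by name in robots.txt:

```
User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /
```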
How to Run This Checklist in 60 Seconds
You can go through these 16 checks manually — open robots.txt, inspect your HTML, validate schemas, count citable paragraphs — but it takes a few hours per site.
Or you can run a free AI Exposure audit and get all 16 results in under a minute, plus a prioritized action plan with step-by-step fixes and code examples for everything that fails.
What the Best Sites Get Right
The websites that AI engines consistently cite share five traits:
- They were intentional about GEO from day one instead of bolting it on later
- They publish structured data on every important page
- They include an llms.txt file describing their business clearly
- They never block AI crawlers — see our full crawler guide
- They write fact-rich content with specific numbers and citations
You do not need to be a Fortune 500 company to get cited by AI. You need to be discoverable, citable, and clearly scoped to your topic.
Want to know exactly which of these 16 checks your site passes or fails? Run a free AI Exposure audit — get your score across all 16 checks in 60 seconds, with a prioritized action plan including step-by-step fixes.