# Configuring My Site for AI Discoverability

Published: April 20, 2026
Tags: geo, seo, cloudflare, llms, web-development

How I set up this site for GEO. Raw Markdown, llms.txt, Content-Signal, and the Cloudflare bits that tie it all together.

A growing share of web traffic doesn't come from people anymore. It comes from models reading on their behalf. ChatGPT, Claude, Perplexity, Copilot. They fetch a handful of pages, summarize, and ship the answer back. If your site isn't readable by those agents, you don't exist to them.

People are calling this [GEO](https://wikipedia.org/wiki/Generative_engine_optimization), short for Generative Engine Optimization. It overlaps with SEO but the priorities are different. Agents don't care about your layout. They care about your prose, your metadata, and how many tokens it costs them to read you.

This post covers how I configured this site for GEO. The first half is framework-agnostic. The second half is specific to my setup on Cloudflare, and includes a deliberate choice that fails a popular GEO audit. I'll explain why.

## Part 1: general GEO techniques

### Serve raw Markdown alongside HTML

The single biggest GEO win is giving agents a version of each page without the navigation, styling, and scripts. HTML is designed for browsers. Markdown is designed for readers, human or otherwise. Agents spend their context window on your prose, not your DOM.

Every blog post on this site has a mirror URL with a `.md` suffix:

- `/blog/my-post` is the full HTML page for humans
- `/blog/my-post.md` is the raw Markdown, served as `text/markdown`

In Astro, this is a short route at `src/pages/blog/[slug].md.ts`:

```ts {4}
export const GET = async ({ params }) => {
  const post = await getPostById(params.slug);
  return new Response(formatPostMarkdown(post), {
    headers: { "Content-Type": "text/markdown; charset=utf-8" },
  });
};
```

Both variants are pre-generated at build time. Same content, **roughly half the tokens** for an agent to consume.
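`formatPostMarkdown` in that route is a helper of mine; a minimal sketch of what it plausibly does (the field names here are assumptions, not my actual schema):

```ts
// Hypothetical sketch of the formatPostMarkdown helper used above.
// Title and description go up top so the .md variant stands alone,
// then the raw body follows untouched.
type Post = {
  title: string;
  description: string;
  body: string;
};

function formatPostMarkdown(post: Post): string {
  return [`# ${post.title}`, "", `> ${post.description}`, "", post.body].join("\n");
}
```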

### Advertise the Markdown version in `<head>`

Agents landing on the HTML need to know the Markdown exists. A single `<link>` in the head does it:

```html
<link rel="alternate" type="text/markdown" href="/blog/my-post.md" />
```

Browsers ignore this tag. Agents that parse the head follow it.

### Publish an `llms.txt` index

[`llms.txt`](https://llmstxt.org/) is a convention for a Markdown file at the root of your site listing your content with short descriptions and links. Think of it as a sitemap an LLM can actually read.

I ship two variants:

- `/llms.txt` is the index. Title, description, one line per post with a link to its `.md` version.
- `/llms-full.txt` is the full corpus. Every post body concatenated into a single response.

Why both? An agent researching a specific topic can fetch `llms.txt`, pick the relevant links, and pull them. An agent doing deep research on the site as a whole fetches `llms-full.txt` once and has everything it needs in one request. Either way there's no crawling.
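For concreteness, a trimmed sketch of the index variant, following the convention's shape of an H1, a blockquote summary, and link lists (titles, slugs, and descriptions here are placeholders, not my real entries):

```txt
# morello.dev

> Blog posts, one line each, linking to the token-cheap .md variants.

## Posts

- [Configuring My Site for AI Discoverability](https://morello.dev/blog/example-post.md): how this site is set up for GEO
- [Some Other Post](https://morello.dev/blog/another-post.md): one-line description
```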

### Declare your AI stance in `robots.txt`

`robots.txt` can also carry a `Content-Signal` directive, a newer convention for declaring how AI systems may use your content. Mine reads:

```txt {2}
User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /
Sitemap: https://morello.dev/sitemap-index.xml
```

Three independent knobs:

- `search=yes` lets search engines index
- `ai-train=no` says my content is not for training data
- `ai-input=yes` says my content _can_ be retrieved and used as input for AI answers

This is the stance I'm comfortable with. I want to show up when someone asks Claude about something I've written; I just don't want my posts absorbed into the next base model.

> Whether any given operator actually honors this is another question. The signal's there regardless, and I'd rather be on record than silent about it.

### Add structured data that actually describes the content

Most blogs ship JSON-LD schema by reflex. Few of them include the fields that help a generative engine decide whether your article is worth fetching.

On each post I emit a `BlogPosting` graph with:

- `wordCount` and `timeRequired` (ISO 8601 duration), so an agent can estimate how much context it'll spend before fetching
- `articleBody`, the full post text in machine-readable form, no HTML parsing required
- `author` linked to a `Person` node with `knowsAbout` so the entity is grounded in real topics
- `BreadcrumbList` for site hierarchy

All of it goes into a single `@graph` per page rather than scattered `<script>` tags, which makes it cheaper for an engine to walk from post to author to site without cross-referencing.
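Trimmed down, the shape of that graph looks something like this (the counts, names, and topics below are illustrative, not my actual values):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "BlogPosting",
      "headline": "Configuring My Site for AI Discoverability",
      "wordCount": 1800,
      "timeRequired": "PT8M",
      "articleBody": "…",
      "author": { "@id": "#author" }
    },
    {
      "@type": "Person",
      "@id": "#author",
      "name": "…",
      "knowsAbout": ["web development", "…"]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Blog", "item": "https://morello.dev/blog" }
      ]
    }
  ]
}
</script>
```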

### A sitemap that actually tracks freshness

If you regenerate your sitemap once and never look at it again, you're wasting a signal. Every URL in mine carries a `lastmod` timestamp pulled from the post's `updatedDate` frontmatter, falling back to `pubDate`. When I edit an old post, its `lastmod` moves forward and crawlers reprioritize it.
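In the generated XML that comes out as one `<url>` entry per page (URL and date illustrative):

```xml
<url>
  <loc>https://morello.dev/blog/my-post</loc>
  <lastmod>2026-03-01</lastmod>
</url>
```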

### Validate with real tools

Two tools I found useful while iterating on all of the above:

- [isitagentready.com](https://isitagentready.com/) audits across five categories: discoverability, content accessibility, bot access control, protocol discovery, and commerce. The bot access control checks (`Content-Signal`, Web Bot Auth, AI bot rules) are the part that actually influences how agents treat your content.
- [acceptmarkdown.com](https://acceptmarkdown.com/) has a narrower focus. It checks whether your site responds to `Accept: text/markdown` with a Markdown body, includes `Vary: Accept`, returns `406` for unsupported types, and parses q-values correctly.

I'll come back to the second one at the end of the post, because my site deliberately fails it.

## Part 2: the Cloudflare-specific setup

General GEO gets you most of the way there. The rest is delivery. How fast you respond, whether the edge caches correctly, and how you advertise your agent-facing resources without waiting for someone to parse your HTML.

### Static assets, zero Worker invocations

My `wrangler.jsonc` points [Cloudflare's assets deployment](https://developers.cloudflare.com/workers/static-assets/) at the `./dist` directory, with no `main` entry:

```jsonc
{
  "name": "morellodev",
  "compatibility_date": "2026-04-18",
  "assets": {
    "directory": "./dist",
    "html_handling": "drop-trailing-slash",
    "not_found_handling": "404-page",
  },
}
```

Every request is served straight from the edge asset cache. HTML, Markdown, `llms.txt`, sitemap, RSS. Same path for all of them, and no Worker ever runs. On the Workers Free tier this matters. A crawler sweep that would otherwise eat into 100k daily invocations now costs me nothing. Agents, for better or worse, don't crawl politely.

### Advertise discovery endpoints in a `Link` header

Cloudflare's [`_headers` file](https://developers.cloudflare.com/workers/static-assets/headers/) lets you ship response headers without any server code. I use it to tell every response, not just HTML ones, where the agent-facing files live:

```txt
/*
  Link: </sitemap-index.xml>; rel="sitemap"
  Link: </rss.xml>; rel="alternate"; type="application/rss+xml"; title="RSS"
  Link: </llms.txt>; rel="describedby"; type="text/plain"
  Link: </llms-full.txt>; rel="describedby"; type="text/plain"
```

A crawler doing a `HEAD` against any URL on the site sees all four links before it parses a single byte of HTML. **One round-trip, no body, full discovery.**

### Long-lived cache for hashed assets

Astro emits fingerprinted filenames under `/_astro/`, so those can sit in cache for a year:

```txt
/_astro/*
  Cache-Control: public, max-age=31536000, immutable
```

Faster first paint for humans, cheaper crawls for agents. Same lever.

### Why I skipped `Accept: text/markdown` content negotiation

[acceptmarkdown.com](https://acceptmarkdown.com/) will tell you this site doesn't do content negotiation. No `Vary: Accept`, no `406`, no Markdown from the canonical URL. That's not an oversight. I tried it, shipped it briefly, and rolled it back.

The reason is Cloudflare's free plan. Custom cache keys are Enterprise-only, and [their docs are explicit](https://developers.cloudflare.com/cache/concepts/cache-control/) that `Vary: Accept` is ignored for caching decisions. The edge collapses every variant of `/blog/my-post` into one cache entry, so the first requester's format **poisons the cache for everyone else** until TTL expires.

The workaround is a Worker that bypasses the edge cache. But now every `/blog/*` request burns a Worker invocation, humans included, and the [Workers Free plan](https://developers.cloudflare.com/workers/platform/pricing/) gives you 100k per day and 10ms of CPU each. That's a real budget to share across humans and bots, for no functional gain over a static `.md` URL.

So I deleted the Worker. The only thing I lost is `curl -H "Accept: text/markdown" …/blog/my-post` returning Markdown. Between `llms.txt`, `<link rel="alternate">`, and the `/blog/[slug].md` convention, no mainstream agent I've seen actually needs `Accept` negotiation. Content negotiation is the more elegant protocol; alternate URLs are the more robust one on a free-tier CDN. On a paid plan I'd probably do both.
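For completeness, the negotiation logic itself is small. A sketch of the q-value comparison the rolled-back Worker performed, based on my reading of RFC 9110 (illustrative, not Cloudflare-specific code):

```ts
// Decide whether a request prefers text/markdown over text/html,
// honoring q-values and wildcard specificity (exact > type/* > */*).
function qValue(accept: string, type: string): number {
  let best = -1; // specificity of the best match so far (-1 = none)
  let q = 0;
  for (const part of accept.split(",")) {
    const [media, ...params] = part.trim().split(";");
    const m = media.trim().toLowerCase();
    const spec =
      m === type ? 2 : m === type.split("/")[0] + "/*" ? 1 : m === "*/*" ? 0 : -1;
    if (spec <= best) continue; // an equally or more specific range already matched
    const qp = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    best = spec;
    q = qp ? parseFloat(qp.slice(2)) : 1; // q defaults to 1 when absent
  }
  return q;
}

function prefersMarkdown(accept: string): boolean {
  return qValue(accept, "text/markdown") > qValue(accept, "text/html");
}
```

A typical browser `Accept` header still resolves to HTML; `text/markdown;q=0.9, text/html;q=0.8` resolves to Markdown. The parsing was never the problem; the cache-key collision was.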

## Where this leaves things

Every page exists in two forms, both served from the edge. Agent-facing resources are advertised in response headers on every request, before any HTML gets parsed. Structured data tells engines what the article is and how much context it takes to read. `robots.txt` says what I'll allow and what I won't.

GEO is still very new. The standards are half-drafted, the tools disagree with each other, and half the signals I described above didn't exist two years ago. I fully expect to be rewriting parts of this post within six months, probably with a different opinion about Accept-based negotiation, once I've either moved off the free plan or found a workaround that doesn't involve a Worker. But for now: serve agents a version they can cheaply consume, be explicit about what you'll allow, and accept that the defaults aren't on your side.

If you're reading this via a summary from some assistant, hi. Thanks for the traffic.
