How to Crawl Any Website with Claude Code
Crawly's built-in MCP server connects your crawl data directly to Claude Code. Here's how to set it up and what you can ask it.
12 May 2026 · 6 min read
Most SEO crawlers give you a table of data and leave you to figure out what it means. Crawly does something different: it ships with a built-in Model Context Protocol (MCP) server that connects your crawl data directly to Claude Code. For the full list of MCP tools and capabilities, see the MCP integration feature page. Instead of exporting a CSV and filtering it manually, you just ask Claude what you want to know.
This guide walks through how to set it up and what you can do with it.
What the MCP integration actually does
When Crawly crawls a site, it stores every URL in a local SQLite database on your Mac - titles, meta descriptions, H1s, status codes, response times, images, hreflang tags, security headers, and more.
The MCP server sits on top of that database and exposes it as a set of tools Claude Code can call. When you ask Claude a question like "which pages are missing a meta description?", Claude calls Crawly's get_pages tool with the right filter, reads the results, and answers in plain English. No SQL. No spreadsheet. No intermediate steps.
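On the wire, that question becomes an ordinary MCP tool call. As a rough sketch - the tools/call envelope is standard MCP JSON-RPC, but the exact argument names Crawly expects are an assumption here - the request looks something like this:

```jsonc
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_pages",
    // "filter" is an illustrative argument name, not Crawly's documented schema
    "arguments": { "filter": "missing_meta" }
  }
}
```

You never write this yourself - Claude constructs the call from your question - but every query in this guide maps to a tool call of this shape.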
Prerequisites
- Crawly installed on your Mac - download it here (free, no page cap)
- Claude Code set up and working in your terminal
Step 1: Run a crawl in Crawly
Open Crawly, click New Crawl, and enter the URL you want to audit. You can configure the crawl depth, concurrency, user agent, and whether to respect robots.txt. For most sites, the defaults work fine.
Hit start and let it run. Crawly will spider the site and store everything locally. You can watch pages come in live in the Pages tab.
Once the crawl completes, the data is ready to query. You do not need to do anything else inside the app.
Step 2: Add Crawly to your Claude Code MCP config
Claude Code reads MCP server configuration from a settings file. Open your Claude Code settings (or run claude mcp add from your terminal) and add Crawly as a server. Crawly's documentation covers the exact config path for your version of the app.
Once added, restart Claude Code. You should see Crawly listed as an available tool when you run /mcp in the Claude Code terminal.
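As a reference point, MCP server entries in Claude Code's JSON settings follow the standard mcpServers shape. The sketch below uses a placeholder server name and binary path - substitute the actual launch command from Crawly's documentation:

```jsonc
{
  "mcpServers": {
    "crawly": {
      // Placeholder path - use the real MCP launch command from Crawly's docs
      "command": "/Applications/Crawly.app/Contents/MacOS/crawly-mcp"
    }
  }
}
```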
Step 3: Ask your first question
Open Claude Code and start a new session. Try a simple question first to confirm the connection is working:
What crawls have been run recently?
Claude will call the list_crawls tool and return a list of your recent crawls with their URLs, page counts, and dates.
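The response is a simple list. Roughly - the field names here are illustrative, but the tool returns exactly the URL, page count, and date described above:

```jsonc
[
  { "id": 42, "url": "https://example.com", "page_count": 318, "date": "2026-05-11" },
  { "id": 41, "url": "https://example.org", "page_count": 95, "date": "2026-05-09" }
]
```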
Now try something more specific:
Which pages on the last crawl are missing a meta description?
Claude identifies the most recent crawl, calls get_pages with the missing_meta filter, and returns a list of URLs - with their titles and H1s for context.
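Sketched with illustrative field names, each result looks something like this:

```jsonc
[
  {
    "url": "https://example.com/pricing",
    "title": "Pricing",
    "h1": "Simple, transparent pricing",
    "meta_description": null // missing - which is why this page matched the filter
  }
]
```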
Example queries and what they return
Here are queries that work well, along with what Crawly returns:
Find SEO issues
- "What are the top SEO issues on this crawl?" - calls
get_issues, returns counts grouped by category (title, meta, heading, errors, security) - "Show me all pages with duplicate H1s" - filters pages by
duplicate_h1and groups them by shared heading text - "Which pages have titles over 60 characters?" - filters by
title_too_long, returns URLs with their full title text
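To give a feel for the first of those, here is a sketch of the grouped counts get_issues might return. The category names come from the list above; the issue keys and JSON shape are assumptions:

```jsonc
{
  "title": { "title_too_long": 14 },
  "meta": { "missing_meta": 37 },
  "heading": { "duplicate_h1": 6 },
  "errors": { "broken_links": 9 },      // hypothetical key
  "security": { "missing_headers": 2 }  // hypothetical key
}
```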
Diagnose technical problems
- "Show me all 4xx errors and the pages that link to them" - calls
get_broken_with_inlinks, returns each broken URL alongside its inbound links so you know exactly where to fix the source - "Which pages are non-indexable?" - filters by
non_indexable, includes the reason (noindex tag, canonical pointing elsewhere, etc.) - "What redirect chains exist on this site?" - surfaces URLs with multi-hop redirect sequences
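The broken-link results pair each error with its sources. A sketch, with illustrative field names:

```jsonc
[
  {
    "url": "https://example.com/old-page",
    "status": 404,
    // The pages that link to the broken URL - these are where you fix it
    "inlinks": [
      "https://example.com/blog/launch-post",
      "https://example.com/docs/getting-started"
    ]
  }
]
```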
Audit images
- "List all pages with images missing alt text" - filters by
missing_alt, returns page URLs and image counts - "How many images are missing alt text across the whole site?" - calls
get_summarywhich includes the aggregate image stats
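The aggregate image stats in a get_summary response might look like this - field names are illustrative:

```jsonc
{
  "images_total": 1240,
  "images_missing_alt": 86
}
```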
Read page content
- "What does the homepage actually say?" - calls
get_page_contentfor the root URL, returns the full Markdown-rendered body text so Claude can read and reason about the content itself
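A sketch of the result, assuming an illustrative shape:

```jsonc
{
  "url": "https://example.com/",
  // Body text rendered as Markdown so Claude can read it directly
  "markdown": "# Example Inc.\n\nWe build tools for..."
}
```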
Triggering a new crawl from Claude Code
You can also start a crawl directly from Claude Code without opening the Crawly app:
Crawl https://example.com and tell me what the main SEO issues are.
Claude calls the crawl_site tool, waits for it to complete, then immediately queries the results and gives you a summary. The crawl also shows up in the Crawly app as normal - the app and the MCP server share one database, so crawls started on either side are visible to both.
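Behind that prompt, Claude issues a crawl_site call along these lines (the argument name is an assumption):

```jsonc
{
  "name": "crawl_site",
  "arguments": { "url": "https://example.com" } // argument name is illustrative
}
```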
Practical use cases
Client site audits
Crawl a client's site, then ask Claude to write a structured summary of the issues found. You get a first-draft audit report in seconds rather than spending an hour filtering tables and writing up findings.
Pre- and post-migration checks
Crawl the site before a migration, then crawl again after. Ask Claude to compare the two crawls: "What changed between the two most recent crawls?" Crawly's get_issue_diff tool returns added, removed, and changed URLs with field-level diffs.
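A sketch of what that diff might return - the tool is described as returning added, removed, and changed URLs with field-level diffs; everything else here is an illustrative shape:

```jsonc
{
  "added": ["https://example.com/new-landing-page"],
  "removed": ["https://example.com/retired-promo"],
  "changed": [
    {
      "url": "https://example.com/about",
      "field": "title", // field-level diff: what changed and how
      "before": "About | Example",
      "after": "About Us - Example Inc."
    }
  ]
}
```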
Competitor research
Crawly crawls any site, not just ones you own. Crawl a competitor's site and ask Claude what their site structure looks like, how deep their content goes, or what schema types they are using. For pulling specific data points from every page, use Crawly's custom extraction feature alongside the MCP queries.
What the tools cannot do
Crawly's MCP server is read-only - it queries crawl data but does not make changes to any website. It also cannot render JavaScript (it crawls raw HTML), so pages that rely heavily on client-side rendering may show incomplete data.
For most standard websites and the typical SEO audit workflow, neither of these limitations matters.
The combination of a fast native crawler and a natural language interface removes the biggest friction point in technical SEO: getting from raw crawl data to actionable insight. Instead of spending time in spreadsheets, you spend it fixing problems.
Download Crawly and give it a try.