Crawl Configuration
Understanding Crawly's crawl settings so you get accurate data every time.
Starting a crawl
Open Crawly and click New Crawl. Enter a root URL - this is the starting point Crawly spiders from. In spider mode, Crawly follows every internal link it finds and builds a complete picture of the site. You can also switch to list mode for targeted audits (see below).
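Crawly handles all of this for you, but if you are curious what "following every internal link" means mechanically, here is a minimal Python sketch. The library choices (requests, BeautifulSoup) are illustrative only, not Crawly's actual implementation:

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def internal_links(page_url: str) -> set[str]:
    """Return the set of same-host links found on one page."""
    host = urlparse(page_url).netloc
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        url = urljoin(page_url, anchor["href"])  # resolve relative hrefs
        if urlparse(url).netloc == host:         # keep internal links only
            links.add(url.split("#")[0])         # ignore fragments
    return links
```

Spidering is just this step applied repeatedly: every newly discovered internal URL is queued and processed the same way until nothing new turns up.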
Crawl depth
Crawl depth controls how many link hops Crawly will follow from the seed URL. The seed itself is depth 0: a depth of 1 crawls only pages directly linked from the seed, and a depth of 5 crawls pages up to five clicks deep.
For most site audits, leave depth unlimited. Restricting depth is useful when you want a quick overview of the top-level structure without waiting for a full crawl of a very large site.
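To make the depth semantics concrete, here is a rough breadth-first sketch that reuses internal_links() from the previous example. With max_depth=1, only the seed and its directly linked pages are recorded:

```python
from collections import deque

def crawl(seed_url: str, max_depth: int | None = None) -> dict[str, int]:
    """Breadth-first spider recording each URL's hop distance from the seed."""
    seen = {seed_url: 0}      # URL -> depth; the seed is depth 0
    queue = deque([seed_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if max_depth is not None and depth >= max_depth:
            continue          # at the limit: record the URL, don't expand it
        for link in internal_links(url):  # from the previous sketch
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```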
Concurrency
Concurrency is how many pages Crawly fetches simultaneously. The default is 10 - this is fast enough to crawl most sites in a few minutes without putting significant load on the server.
Lower the concurrency (to 1-3) when crawling:
- Shared hosting environments that are sensitive to load
- Sites with rate limiting aggressive enough to return 429 errors at the default concurrency
- Staging or development servers with limited resources
A rising proportion of 5xx errors during a crawl means the server is being overwhelmed - that is your signal to reduce concurrency and re-crawl.
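Under the hood, a concurrency setting is simply a cap on in-flight requests. Here is a minimal sketch of the idea using a thread pool - an illustration of the concept, not Crawly's internals:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url: str) -> tuple[str, int]:
    response = requests.get(url, timeout=10)
    return url, response.status_code

def fetch_all(urls: list[str], concurrency: int = 10) -> list[tuple[str, int]]:
    # max_workers plays the role of the concurrency setting: at most
    # `concurrency` requests are in flight at once. Drop it to 1-3 for
    # fragile servers, as described above.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, urls))
```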
robots.txt: respect vs bypass
When Respect robots.txt is on, Crawly honours the disallow rules in your robots.txt file and will not crawl blocked paths - this mirrors how Googlebot behaves.
When to turn it off:
- Auditing your own staging or development site (robots.txt often disallows all crawlers on staging)
- Checking what is actually behind a disallow rule - useful for verifying that blocked paths contain only what you expect
- Crawling a site where you want a complete picture of all URLs regardless of robots directives
Do not bypass robots.txt on third-party sites without permission. For auditing client sites, always use the respect setting unless the client has specifically asked you to check blocked areas.
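The respect setting corresponds to a check like the following before each fetch. This sketch uses Python's standard library robotparser; the "Crawly" agent token is a placeholder assumption - the product may match a different token against robots.txt rules:

```python
from urllib.parse import urlparse, urlunparse
from urllib.robotparser import RobotFileParser

def allowed(url: str, user_agent: str = "Crawly") -> bool:
    parts = urlparse(url)
    robots_url = urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))
    parser = RobotFileParser(robots_url)
    parser.read()  # fetch and parse the live robots.txt
    # A real crawler caches the parser per host rather than
    # re-reading robots.txt for every URL.
    return parser.can_fetch(user_agent, url)
```

Turning respect off simply skips this check, so every discovered URL is fetched regardless of disallow rules.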
User agent picker
The user agent Crawly sends affects what the server returns. Some sites serve different content to different crawlers or browsers. The options in Crawly:
- Googlebot - use this to see how the site appears to Google. If a site uses user agent detection to serve different content to Googlebot (cloaking), this will surface it. Good default for SEO audits.
- Google Smartphone - Googlebot's mobile user agent. Use this on sites where mobile and desktop serve different content or have separate URLs.
- Bingbot - useful if you are auditing for Bing specifically or suspect the site treats Bingbot differently.
- Chrome 128 - a real browser user agent. Use this when you want to see what a typical user's browser receives, rather than what the site returns to crawlers.
- Custom - paste any user agent string. Useful for testing specific crawler behaviours or checking how the site responds to your own bot.
For most audits, Googlebot is the right choice. Switch to Chrome 128 if you are seeing discrepancies between what the crawler finds and what users see.
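If you want to reproduce a discrepancy outside Crawly, fetching the same URL with two user agents and comparing the responses is a quick test. The strings below are published user agents, but verify them against current documentation before relying on them:

```python
import requests

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
CHROME_128 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36")

def compare_agents(url: str) -> None:
    for name, agent in [("Googlebot", GOOGLEBOT), ("Chrome 128", CHROME_128)]:
        response = requests.get(url, headers={"User-Agent": agent}, timeout=10)
        # A large gap in status code or body size between agents suggests
        # user agent detection (possible cloaking).
        print(f"{name}: {response.status_code}, {len(response.content)} bytes")
```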
List mode
List mode lets you paste a set of URLs and audit only those - Crawly does not follow any links. It fetches each URL directly and records its status code, title, meta description, H1, and all other on-page data, exactly as in a spider crawl.
When list mode is most useful:
- Post-migration redirect checks. Export your old URLs, paste them into list mode, and verify every one returns a 301 to the correct new URL. Pair this with the Redirect Checker for individual spot-checks.
- Targeted audits after content changes. If you updated titles or meta descriptions on 50 pages, paste those 50 URLs and verify the changes went live without re-crawling the entire site.
- Auditing a specific section. Export pages from a particular folder or tag from your CMS and audit only those.
List mode results appear in the same tabs as a normal crawl and are compatible with the Issues dashboard - issues are detected and flagged just as they would be in a full spider crawl.
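To make the redirect scenario concrete, here is a sketch of the check list mode automates for you. The old-to-new URL mapping is hypothetical - in practice you would export it from your own migration plan:

```python
import requests

def check_redirects(mapping: dict[str, str]) -> None:
    """mapping: old URL -> expected new URL (from your migration plan)."""
    for old_url, new_url in mapping.items():
        # allow_redirects=False mirrors list mode: fetch the URL itself,
        # don't follow where it leads.
        response = requests.get(old_url, allow_redirects=False, timeout=10)
        location = response.headers.get("Location", "")
        # Note: some servers emit relative Location headers; normalise
        # with urljoin before comparing if yours does.
        ok = response.status_code == 301 and location == new_url
        print(f"{'OK  ' if ok else 'FAIL'} {old_url} -> {response.status_code} {location}")
```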
Once you have the right configuration in place, follow the technical SEO audit guide to work through each tab systematically and export your findings.