
What is Page Indexability?

Indexability determines whether Google can discover, crawl, and add a page to its index. Here is every factor that can prevent a page from being indexed.

16 May 2026 · 7 min read

Page indexability refers to whether a search engine can discover, crawl, and add a page to its index. An indexable page is eligible to appear in search results. A non-indexable page is not, regardless of its content quality or backlink profile.

A page can fail to be indexed for many different reasons. Some are deliberate (a noindex tag on a login page), and some are unintentional (a misconfigured robots.txt blocking Googlebot from your entire site). Identifying and fixing unintentional indexability issues is a core part of technical SEO.

The indexability pipeline

For a page to appear in search results, it must pass through several stages:

  1. Discovery: Google must know the URL exists. It finds URLs through sitemaps, internal links, and external backlinks.
  2. Crawling: Googlebot must be permitted to visit the URL. If the URL is blocked by robots.txt, it cannot be crawled.
  3. Rendering: Google processes the page's HTML and, for JavaScript-heavy sites, executes JavaScript to see the fully rendered content.
  4. Indexing: Google decides whether to add the page to its index. Several signals affect this decision: content quality, canonicalisation, noindex directives, and whether the page is a duplicate.

A page can fail at any of these stages.

Common reasons a page is not indexable

Blocked by robots.txt

A Disallow rule in your robots.txt file stops Googlebot from crawling the URL. If Google cannot crawl the page, it cannot read the page's content, so that content cannot be indexed.

User-agent: Googlebot
Disallow: /private/

Any URL whose path begins with /private/ is blocked for Googlebot, so pages under this path will not be crawled.

Note: robots.txt blocks crawling, not indexing. A blocked URL can still appear in search results (without a snippet) if other pages link to it. To remove a page from the index, use a noindex tag rather than robots.txt, and leave the page crawlable so that Google can read the tag.
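As a rough illustration of the rule above, the following sketch uses Python's standard urllib.robotparser module to test URLs against the same two-line robots.txt. The helper name is_crawl_allowed and the example.com URLs are assumptions made for this example. Python's parser applies the first matching rule, which can differ from Google's matching on more complex files, so treat it as an approximation rather than a reproduction of Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the text in memory
    and makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/blog/post.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the rule group names Googlebot only, a different user agent is not affected by it in this sketch.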

Noindex directive

A <meta name="robots" content="noindex" /> tag in the page's <head> tells Google not to include the page in its index. Google must be able to crawl the page to read this tag.

The X-Robots-Tag HTTP response header achieves the same effect and also works for non-HTML resources such as PDFs and images, which cannot carry a meta tag.
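A minimal sketch of how an audit script might detect either form of the directive, using only Python's standard library. The names RobotsMetaParser and has_noindex are hypothetical, and the header handling is simplified: it assumes a plain comma-separated value and ignores user-agent prefixes such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """Return True if the HTML or the X-Robots-Tag header carries noindex.

    Simplifying assumption: the header is a plain comma-separated list.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    header_directives = {t.strip().lower() for t in header_value.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    found = parser.directives | header_directives
    return "noindex" in found or "none" in found


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print(has_noindex(page, {}))
```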

Canonical tag pointing elsewhere

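If a page's canonical tag points to a different URL, Google usually treats that other URL as the preferred version and does not index the page carrying the tag. The canonical tag is a strong hint rather than a directive, so Google can occasionally choose a different canonical.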

This is intentional behaviour when dealing with duplicate content. It becomes a problem when canonical tags are misconfigured and point to the wrong URL, accidentally suppressing pages you want indexed.
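To spot misconfigured canonicals, an audit can compare each page's own URL with the URL its canonical tag resolves to. The sketch below is one way to do that with Python's standard library. CanonicalParser and canonical_points_elsewhere are hypothetical names, and the comparison is deliberately strict (exact string match after resolving relative URLs), which is an assumption rather than how Google normalises URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def canonical_points_elsewhere(page_url: str, html: str) -> bool:
    """Return True if the page declares a canonical URL other than itself."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return False  # no canonical tag, nothing to flag
    # Resolve relative hrefs such as "/shoes" against the page URL.
    target = urljoin(page_url, parser.canonical)
    return target != page_url


if __name__ == "__main__":
    html = '<head><link rel="canonical" href="/shoes"></head>'
    print(canonical_points_elsewhere("https://example.com/shoes?colour=red", html))
```

A True result is not necessarily an error: it is expected on deliberate duplicates and only a problem on pages you want indexed.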

Redirect

A page that redirects to another URL is not itself indexed. Google indexes the destination of the redirect. If the destination is also a redirect (a redirect chain), Google follows the chain to the final URL and indexes that.
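The chain-following behaviour can be sketched over an in-memory map of redirects, such as one collected from crawl data. The function name resolve_redirects, the dictionary format, and the max_hops default of 10 are all assumptions for illustration; the limit is not a figure taken from Google.

```python
def resolve_redirects(start_url: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects from start_url and return the full chain of URLs.

    `redirects` maps a redirecting URL to its destination (hypothetical
    crawl data). Raises ValueError on a loop or an over-long chain.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise ValueError(f"redirect loop detected at {current}")
        chain.append(current)
        seen.add(current)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirect hops from {start_url}")
    return chain


if __name__ == "__main__":
    crawl_data = {"/old": "/interim", "/interim": "/new"}
    chain = resolve_redirects("/old", crawl_data)
    print(" -> ".join(chain))
```

The last element of the returned chain is the URL that internal links and sitemap entries would ideally point to directly.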

4xx or 5xx response code

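Pages returning 404 (Not Found), 410 (Gone), 500 (Internal Server Error), or other 4xx and 5xx codes cannot be indexed. Google requests the URL, receives an error, and does not index it. A previously indexed page that keeps returning errors is eventually dropped from the index.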

Thin or duplicate content

Even when a page is technically crawlable and has no explicit noindex tag, Google may choose not to index it if it determines the content is too thin, duplicates another page, or provides no value to users.

This is a soft failure: the page is technically indexable but Google elects not to index it. Canonicalisation, content improvement, or consolidation are the relevant fixes.

Soft 404

A soft 404 is a page that returns a 200 HTTP status code but displays a "not found" or empty results message. Google recognises these and may treat them as 404s for indexing purposes. Common on search results pages, empty category pages, and out-of-stock product pages.
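Google does not publish the exact rules it uses to detect soft 404s, so the following is only a rough heuristic of the kind an audit script might apply. The phrase list, the 20-word threshold, and the function name looks_like_soft_404 are all assumptions chosen for this example.

```python
import re

# Hypothetical phrases that often signal an error or empty-results page.
NOT_FOUND_PHRASES = (
    "page not found",
    "no results found",
    "0 results",
    "no longer available",
)


def looks_like_soft_404(status_code: int, visible_text: str, min_words: int = 20) -> bool:
    """Heuristically flag a page that returns 200 but reads like an error page.

    Flags the page if it contains a known "not found" phrase or has
    fewer than `min_words` words of visible text.
    """
    if status_code != 200:
        return False  # a real error status is not a *soft* 404
    text = visible_text.lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(re.findall(r"\w+", text)) < min_words


if __name__ == "__main__":
    print(looks_like_soft_404(200, "Sorry, page not found."))
```

A heuristic like this will produce false positives on legitimately short pages, so flagged URLs are best reviewed by hand.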

JavaScript-only content

If page content is rendered exclusively via JavaScript and Google cannot execute it fully, the indexed version may be empty or near-empty. Google does render JavaScript, but rendering can be delayed and has some limitations. Critical content should be in the server-rendered HTML wherever possible.
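One simple way to check how much content exists before any JavaScript runs is to count the words in the raw HTML response while ignoring script and style blocks. The sketch below does this with Python's standard library. VisibleTextCounter and server_rendered_word_count are hypothetical names, and a low count is only a hint that content depends on client-side rendering.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.word_count += len(data.split())


def server_rendered_word_count(raw_html: str) -> int:
    """Return the number of visible words present in the unrendered HTML."""
    counter = VisibleTextCounter()
    counter.feed(raw_html)
    return counter.word_count


if __name__ == "__main__":
    shell = '<html><body><div id="root"></div><script>render("all content")</script></body></html>'
    print(server_rendered_word_count(shell))
```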

How to check if a page is indexed

The quickest check for a specific URL is the site: operator in Google:

site:example.com/the-page/

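If the page appears in results, it is indexed. If it does not, it may be non-indexable or simply not yet crawled and indexed. The site: operator is only a quick check and its results are not exhaustive, so confirm anything important in Google Search Console.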

For a complete picture of your site's indexability, use Google Search Console's Pages report. It groups all your URLs by indexing status and explains why each non-indexed page was excluded.

For a site-wide audit without Search Console access, use an SEO crawler. Crawly crawls every URL on your site and reports the status code, noindex status, canonical tag, and robots.txt rules that apply to each one, giving you a complete indexability map of your site.
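To show how those per-URL signals can be combined, here is a minimal sketch that turns a status code, noindex flag, canonical URL, and robots.txt result into a list of indexability issues. It is not Crawly's implementation; PageSignals and indexability_issues are hypothetical names, and the checks cover only the explicit blockers described in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSignals:
    """Per-URL crawl data (hypothetical structure for illustration)."""
    url: str
    status_code: int
    blocked_by_robots: bool = False
    noindex: bool = False
    canonical: Optional[str] = None


def indexability_issues(page: PageSignals) -> list[str]:
    """Return the explicit indexability blockers found for one page."""
    if page.blocked_by_robots:
        # A blocked page is never fetched, so no other signal is reliable.
        return ["blocked by robots.txt"]
    issues = []
    if 300 <= page.status_code < 400:
        issues.append("redirect")
    elif page.status_code >= 400:
        issues.append(f"{page.status_code} error")
    if page.noindex:
        issues.append("noindex")
    if page.canonical and page.canonical != page.url:
        issues.append("canonical points elsewhere")
    return issues


if __name__ == "__main__":
    page = PageSignals(
        url="https://example.com/shoes?colour=red",
        status_code=200,
        canonical="https://example.com/shoes",
    )
    print(indexability_issues(page))
```

An empty list means no explicit blocker was found. It does not guarantee indexing, since Google may still skip thin or duplicate pages.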

Fixing indexability issues

| Issue | Fix |
| --- | --- |
| Blocked by robots.txt | Remove the Disallow rule (if unintentional) or accept the block (if intentional) |
| Noindex tag | Remove the tag if the page should be indexed |
| Wrong canonical | Correct the canonical to point to the right URL |
| Redirect | Update internal links and sitemap to point to the final URL |
| 4xx error | Restore the page, redirect to a relevant URL, or remove all internal links to it |
| Thin content | Expand and improve the content, or consolidate with a more complete page |

After applying fixes, request indexing for the URL via Google Search Console's URL Inspection tool. Indexing changes typically take days to weeks to propagate.


Indexability underpins everything else in SEO. A page that is not in Google's index cannot rank for anything. Download Crawly to audit every page on your site and surface indexability issues across your full URL set.
