Web Data
The Crawly Index
Our web index powers every backlink, authority score, and spam signal in Crawly's tools and API. 124M domains. 4.75B backlinks. Processed, scored, and updated every month.
What is the Crawly Index?
The Crawly Index is our processed web dataset covering 124 million domains and 4.75 billion links between them. We build and maintain it from open web crawl data - the same kind of publicly available, openly licensed data that underpins many of the web's most widely used datasets.
From that raw crawl, we run our own processing pipeline to extract the link graph, calculate authority signals, rank every domain by harmonic centrality and PageRank, and produce the scores and metrics exposed across Crawly's free tools, REST API, and MCP server.
The index is refreshed monthly. Every tool, every API call, and every MCP tool response is backed by the same underlying dataset.
How authority scores are calculated
Our scoring pipeline runs in three stages after each monthly index refresh.
Link graph extraction
We extract all domain-level links from the crawl - who links to whom, with what anchor text, from how many pages. This produces the raw link graph.
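As a rough illustration of this stage, the sketch below collapses page-level links into domain-level edges, keeping the link count, the linking pages, and the anchor texts for each edge. The input row shape `(source_url, target_url, anchor_text)`, the function names, and the use of the hostname as a stand-in for the registered domain are all assumptions made for the example, not a description of Crawly's pipeline.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, without a leading 'www.'.

    Simplification: a production pipeline would map hostnames to registered
    domains using a public suffix list.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_domain_graph(page_links):
    """Aggregate (source_url, target_url, anchor_text) rows into domain edges.

    Returns {(source_domain, target_domain): {"pages", "anchors", "links"}}.
    """
    edges = defaultdict(lambda: {"pages": set(), "anchors": set(), "links": 0})
    for source_url, target_url, anchor in page_links:
        src, dst = host_of(source_url), host_of(target_url)
        if not src or not dst or src == dst:
            continue  # skip malformed URLs and internal links
        edge = edges[(src, dst)]
        edge["pages"].add(source_url)      # which pages carry the link
        edge["anchors"].add(anchor.strip())  # distinct anchor texts
        edge["links"] += 1                 # every individual link counts
    return dict(edges)


# Made-up sample rows, for illustration only.
sample_links = [
    ("https://www.a.example/post-1", "https://b.example/", "B home"),
    ("https://a.example/post-2", "https://b.example/docs", "B docs"),
    ("https://a.example/post-2", "https://a.example/about", "About"),
    ("https://c.example/", "https://b.example/", "B home"),
]
graph = build_domain_graph(sample_links)
```

In this sample, `b.example` ends up with two referring domains and three backlinks, which is the distinction the Referring Domains and Total Backlinks data points draw.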
Harmonic centrality ranking
We run harmonic centrality over the full link graph to rank every domain by how central it is to the web's overall link structure. A domain ranks higher when many other domains can reach it in only a few link hops, so links from sites that are themselves well-linked count for more.
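For readers unfamiliar with the measure: the harmonic centrality of a node is the sum of 1/d over every other node that can reach it, where d is the shortest-path distance in link hops. The sketch below computes it exactly on a toy graph using one breadth-first search per node. The function name and the sample edges are illustrative assumptions; an exact per-node search like this would not scale to a graph of 124 million domains, where an approximation method would be needed.

```python
from collections import deque


def harmonic_centrality(edges):
    """Compute inbound harmonic centrality for each node of a directed graph.

    edges: iterable of (source, target) pairs, meaning source links to target.
    """
    # Reverse adjacency: for each node, the set of nodes linking to it.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, set()).add(src)
        reverse.setdefault(src, set())

    scores = {}
    for node in reverse:
        # BFS over reversed edges yields the distance from each
        # other node to `node` along forward links.
        dist = {node: 0}
        queue = deque([node])
        total = 0.0
        while queue:
            current = queue.popleft()
            for prev in reverse[current]:
                if prev not in dist:
                    dist[prev] = dist[current] + 1
                    total += 1.0 / dist[prev]
                    queue.append(prev)
        scores[node] = total  # unreachable nodes contribute nothing
    return scores


# Toy graph: a and b link to hub, c links to a, hub links to d.
edges = [("a", "hub"), ("b", "hub"), ("c", "a"), ("hub", "d")]
scores = harmonic_centrality(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `hub` scores 1 + 1 + 1/2 = 2.5: two direct linkers at distance 1, plus `c` at distance 2 by way of `a`.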
Score normalisation
We map each domain's harmonic rank to a 0-100 authority score using a logarithmic formula. This ensures scores are spread across the full range - mid-tier domains score in the 20-50 band rather than clustering near zero.
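The exact formula is not given here, so the following is only one plausible shape for such a mapping: a score that falls linearly with the logarithm of the rank, so that rank 1 scores 100 and the last-ranked domain scores 0. The function name, the formula, and the rounding are assumptions for illustration.

```python
import math

TOTAL_DOMAINS = 124_000_000  # index size stated on this page


def authority_score(harmonic_rank: int, total: int = TOTAL_DOMAINS) -> float:
    """Map a 1-based rank to a 0-100 score on a log scale.

    Hypothetical formula: 100 * (1 - ln(rank) / ln(total)).
    """
    if not 1 <= harmonic_rank <= total:
        raise ValueError("rank must be between 1 and total")
    score = 100.0 * (1.0 - math.log(harmonic_rank) / math.log(total))
    return round(score, 1)


# Each tenfold drop in rank costs the same number of points.
for rank in (1, 1_000, 100_000, 1_000_000, TOTAL_DOMAINS):
    print(rank, authority_score(rank))
```

Under this assumed formula, a domain ranked around one million lands in the 20-50 band, consistent with the behaviour described above.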
Data points in the index
Every domain in the index has some or all of the following data points available.
Authority Score
A 0-100 score calculated from harmonic centrality across the full link graph. Higher means more authoritative.
Referring Domains
The number of unique domains linking to a given domain. The single most important signal for authority.
Total Backlinks
The total count of individual inbound links, including multiple links from the same domain.
Harmonic Rank
A domain's global rank by harmonic centrality - a measure of how central it is to the web's link structure.
PageRank Rank
A domain's global rank by PageRank - the original link-based ranking algorithm, still a strong authority signal.
Host Count
The number of unique IP hosts linking to a domain. Used as an IP diversity signal in spam scoring.
Spam Score
A 0-100 risk score derived from link density, IP diversity, and authority signals. Lower is better.
Link Quality Rating
Each linking domain is rated High, Medium or Low quality based on its own authority signals.
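Because a domain may carry only some of these data points, client code benefits from treating each one as optional. The record type below is a minimal sketch of that idea. The class name, field names, and the `links_per_referring_domain` helper are hypothetical and do not reflect the actual field names returned by the API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DomainRecord:
    """One domain's data points; any metric may be absent (None)."""

    domain: str
    authority_score: Optional[int] = None    # 0-100, higher is better
    referring_domains: Optional[int] = None  # unique linking domains
    total_backlinks: Optional[int] = None    # all individual inbound links
    harmonic_rank: Optional[int] = None      # global rank, 1 is best
    pagerank_rank: Optional[int] = None      # global rank, 1 is best
    host_count: Optional[int] = None
    spam_score: Optional[int] = None         # 0-100, lower is better

    def missing(self) -> list[str]:
        """Names of the data points this record does not have."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def links_per_referring_domain(self) -> Optional[float]:
        """Average backlinks per linking domain, or None if unknown."""
        if not self.referring_domains or self.total_backlinks is None:
            return None
        return self.total_backlinks / self.referring_domains


# Made-up values, for illustration only.
record = DomainRecord(
    "example.com",
    authority_score=42,
    referring_domains=200,
    total_backlinks=1000,
)
```

Calling `record.missing()` lists the data points that were not supplied, which lets a caller decide whether a record is complete enough for its purpose.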
How to access the data
Crawly Index data is available in three ways depending on how you work: the REST API, the MCP server, and the free tools.
REST API
Programmatic access to domain authority, backlinks, and spam scores. 100 requests/day free.
MCP Server
Query the index in plain English from Claude Code, Cursor, Windsurf or Codex.
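When scripting against the free REST API tier, it can help to track the 100 requests/day allowance on the client side so that a batch job stops before it is refused. The helper below is a minimal sketch of such a counter. The class name is hypothetical, and the assumption that the allowance resets at midnight UTC is a guess, since the reset time is not stated here.

```python
from datetime import date, datetime, timezone
from typing import Optional

FREE_DAILY_LIMIT = 100  # free-tier allowance stated above


class DailyBudget:
    """Client-side counter for a per-day request allowance."""

    def __init__(self, limit: int = FREE_DAILY_LIMIT):
        self.limit = limit
        self.day: Optional[date] = None
        self.used = 0

    def try_spend(self, today: Optional[date] = None) -> bool:
        """Reserve one request; return False once today's budget is used up."""
        # Assumption: the allowance resets at midnight UTC.
        today = today or datetime.now(timezone.utc).date()
        if today != self.day:
            self.day, self.used = today, 0  # new day, fresh budget
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


budget = DailyBudget()
if budget.try_spend():
    pass  # safe to send one API request here
```

The `today` parameter exists only so the counter can be exercised with fixed dates; normal callers would omit it.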
Tools powered by the Crawly Index
All free, no login required. Also see the Top 1000 Sites by Domain Authority.