
Python Backlink Checker: Pull Backlink Data with the RankParse API
Build a Python backlink checker using the RankParse SDK. Working script, pagination, bulk analysis, and error handling — tested against real domains.
You want backlink data in a Python script. Not in a dashboard. Not downloaded as a CSV. In a script, as a Python dict, so you can pipe it into whatever you're building.
Most backlink APIs make this harder than it should be. OAuth flows, SDKs you have to install, rate limits that aren't documented until you hit them. The RankParse Python SDK is designed to get out of your way: install it, pass your API key, call a method.
This tutorial covers exactly that. We'll build a working backlink checker, handle pagination for large domains, run a bulk competitor audit, and deal with the errors you'll actually hit. All tested against real domains.
What you need before starting
- Python 3.8+: f-strings and context managers are the only requirements.
- rankparse: pip install rankparse
- A RankParse API key: sign up here. No credit card required. The free tier gives you 100 credits, or 50 calls at 2 credits each. That covers the single-domain script and the bulk audit with room to spare, but a full paginated run of a large domain can use all 100 on its own.
Store your key as an environment variable. Never put it in source code.
export RANKPARSE_API_KEY="rp_your_key_here"
The working script

Here's the complete script. Everything else in this post builds on this foundation.
import os
from rankparse import RankParseClient

DOMAIN = "stripe.com"

# The context manager closes the underlying HTTP client when the block exits.
with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
    result = client.backlinks(DOMAIN, limit=100)

backlinks = result["data"]
print(f"{len(backlinks)} backlinks returned")

# Show the first five links as "source → target".
for link in backlinks[:5]:
    print(link["from_url"], "→", link["to_url"])

We ran this against stripe.com and got back 100 results immediately, pulling from Common Crawl's quarterly snapshot. The top results came from Hacker News, dev.to, TechCrunch, and GitHub, which is exactly the kind of signal you'd expect for Stripe.
Three things worth understanding:
The context manager. with RankParseClient(...) as client: handles connection setup and teardown automatically. The underlying HTTP client (httpx) is closed cleanly when the block exits, even if an exception is raised. For scripts, this is the right pattern. For a long-running server, you'd instantiate RankParseClient once and call client.close() on shutdown.
The domain parameter. Pass stripe.com, not https://stripe.com or stripe.com/. The SDK strips the protocol and trailing slash, but being explicit prevents subtle bugs when you're looping over a list.
Empty results aren't errors. If a domain has no backlinks in the dataset, you get a 200 with "data": [] — never a 404. That's the correct behavior for a bulk pipeline where missing data shouldn't crash the loop.
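If your domain list comes from a spreadsheet or a crawl, it rarely arrives clean. Here's a minimal sketch of a normalizer you could run before calling the SDK. The normalize_domain helper is hypothetical, not part of rankparse, and it assumes plain hostnames (no IPv6 literals).

```python
from urllib.parse import urlparse

def normalize_domain(raw: str) -> str:
    """Reduce a URL or loosely formatted host to a bare domain like 'stripe.com'."""
    value = raw.strip().lower()
    # urlparse only fills in netloc when a scheme (or a leading //) is present.
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).netloc
    # Drop any credentials or port that came along with the host.
    host = host.rsplit("@", 1)[-1].split(":", 1)[0]
    return host.rstrip(".")

domains = ["https://stripe.com", "stripe.com/", " Vercel.com/pricing ", "http://supabase.com:443"]
cleaned = [normalize_domain(d) for d in domains]
print(cleaned)
```

It deliberately leaves a leading www. alone, since whether www.example.com and example.com should count as the same target is your call, not the helper's.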
What each backlink looks like
[Figure: API response shape for /v1/backlinks, with key fields]
The full response envelope:
{
  "data": [
    {
      "from_url": "https://news.ycombinator.com/item?id=29382910",
      "to_url": "https://stripe.com/blog/payment-api-design",
      "anchor_text": "payment API design",
      "link_type": "follow",
      "domain_authority": 91
    }
  ],
  "meta": {
    "total": 4821,
    "offset": 0,
    "limit": 100
  }
}

meta.total is an approximation: it's computed from index statistics, not a live count. For a domain like stripe.com the true number of backlinks in the quarterly snapshot is in that range, but don't use total as a precise figure. Use it to decide how many pages to fetch.
link_type is either "follow" or "nofollow". In our testing across 50 domains, roughly 70–80% of backlinks in the Common Crawl dataset are follow links. The ratio varies significantly by industry — software documentation pages tend to attract more follow links than news sites.
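To check the follow/nofollow split for your own domains, you can tally link_type over whatever the API returned. A minimal sketch, run here on made-up sample rows rather than real results; link_type_breakdown is a hypothetical helper name.

```python
from collections import Counter

def link_type_breakdown(backlinks):
    """Return the share of each link_type value in a list of backlink dicts."""
    counts = Counter(link.get("link_type", "unknown") for link in backlinks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in counts.items()}

# Made-up sample rows, shaped like the "data" entries above.
sample = [
    {"from_url": "https://example.com/a", "link_type": "follow"},
    {"from_url": "https://example.com/b", "link_type": "follow"},
    {"from_url": "https://example.org/c", "link_type": "nofollow"},
    {"from_url": "https://example.net/d", "link_type": "follow"},
]
breakdown = link_type_breakdown(sample)
print(f"follow share: {breakdown['follow']:.0%}")
```

Keep in mind the ratio describes only the rows you fetched. With limit=100 that is a sample, not the whole link profile.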
Handling pagination for large domains
One API call returns at most 1000 results. For a domain like stripe.com with thousands of backlinks, you need to page through using offset.
import os
from rankparse import RankParseClient

LIMIT = 100

def fetch_all_backlinks(domain, max_pages=50):
    all_links = []
    offset = 0
    with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
        while True:
            body = client.backlinks(domain, limit=LIMIT, offset=offset)
            batch = body["data"]
            # An empty batch is the stop signal; meta.total is only an estimate.
            if not batch:
                break
            all_links.extend(batch)
            offset += len(batch)
            total = body["meta"]["total"]
            print(f"  {len(all_links)} / ~{total}")
            # Safety cap so a very large domain can't drain your credits.
            if len(all_links) >= max_pages * LIMIT:
                break
    return all_links

links = fetch_all_backlinks("stripe.com")
print(f"Done. {len(links)} backlinks.")

Two details that matter:
The loop terminates when the SDK returns an empty batch — that's the reliable signal. meta.total is an estimate, so using it as the exact stop condition can leave you one page short or one page over.
The SDK sets a default timeout of 30 seconds per request. You can override it at construction time: RankParseClient(api_key=..., timeout=60.0). Typical response times are under 500ms — 30 seconds is conservative and should never fire under normal conditions.
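You can check the stop logic without spending credits by swapping the client for a stand-in. The sketch below is one way to do that: paginate and make_fake_fetcher are hypothetical helpers, and fetch_page stands in for a call like client.backlinks(domain, limit=..., offset=...) that returns the same envelope.

```python
def paginate(fetch_page, limit=100, max_pages=50):
    """Collect items page by page until an empty batch or the page cap.

    fetch_page(limit, offset) must return a dict with a "data" list.
    """
    items = []
    offset = 0
    for _ in range(max_pages):
        batch = fetch_page(limit, offset)["data"]
        if not batch:  # an empty page is the stop signal
            break
        items.extend(batch)
        offset += len(batch)
    return items


def make_fake_fetcher(n_items, reported_total):
    """Build a fake endpoint holding n_items rows that reports a (possibly wrong) total."""
    rows = [{"from_url": f"https://example.com/{i}"} for i in range(n_items)]
    calls = []

    def fetch_page(limit, offset):
        calls.append(offset)  # record each request so the test can inspect it
        return {"data": rows[offset:offset + limit], "meta": {"total": reported_total}}

    return fetch_page, calls


# The fake reports 200 links but actually holds 250.
fetch_page, calls = make_fake_fetcher(n_items=250, reported_total=200)
links = paginate(fetch_page, limit=100)
print(len(links), "links in", len(calls), "calls")
```

Because the loop ignores the reported total, it still collects all 250 rows. The price is one extra request at the end that comes back empty.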
Each call costs 2 credits. A full paginated run of stripe.com (about 50 pages) costs 100 credits.
[Figure: Paginating a large domain with the offset pattern (stripe.com, ~5,000 backlinks, limit=100)]
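Before starting a long run, it's worth a rough credit estimate from meta.total. This sketch uses the 2-credits-per-call figure quoted in this post. Whether the final empty request is billed is an assumption here, so it's a flag; estimate_calls is a hypothetical helper.

```python
import math

CREDITS_PER_CALL = 2  # figure quoted in this post; check current pricing

def estimate_calls(total, limit=100, trailing_empty_call=True):
    """Estimate how many requests a full paginated run needs.

    total is the approximate meta.total value. trailing_empty_call adds the
    last request that returns an empty batch when the loop stops on that signal.
    """
    pages = math.ceil(total / limit)
    return pages + (1 if trailing_empty_call else 0)

calls = estimate_calls(4821)
print(calls, "calls,", calls * CREDITS_PER_CALL, "credits")
```

For the sample total of 4821 this gives 50 calls and 100 credits, in line with the figure above. Since total is itself approximate, treat the result as a budget, not a bill.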
Bulk competitor analysis
If you have a list of domains to audit, this script fetches a backlink summary for each one and writes the results to CSV.
import csv
import os
from rankparse import RankParseClient

DOMAINS = ["stripe.com", "vercel.com", "supabase.com", "railway.app"]

def get_backlink_summary(client, domain):
    body = client.backlinks(domain, limit=100)
    backlinks = body["data"]
    total = body["meta"]["total"]
    # Average over the returned sample; guard against an empty result.
    avg_da = (
        sum(link["domain_authority"] for link in backlinks) / len(backlinks)
        if backlinks else 0
    )
    return {
        "domain": domain,
        "total": total,
        "avg_da": round(avg_da, 1),
        "sample": len(backlinks),
    }

results = []
with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
    for domain in DOMAINS:
        print(f"Checking {domain}...")
        summary = get_backlink_summary(client, domain)
        results.append(summary)
        print(f"  {summary['total']} backlinks, avg DA {summary['avg_da']}")

with open("backlink_summary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["domain", "total", "avg_da", "sample"])
    writer.writeheader()
    writer.writerows(results)

print("Saved to backlink_summary.csv")

Notice that the single RankParseClient instance is shared across all domain calls. One context manager, one HTTP connection pool: more efficient than creating a new client per domain.
We ran this against four developer infrastructure companies. Results from the most recent Common Crawl snapshot:

[Table: Bulk backlink summary for 4 domains (stripe.com, vercel.com, supabase.com, railway.app). avg_da = average domain authority of the first 100 linking domains]
Four domains, 8 credits, about 3 seconds. That's a competitive backlink audit for a one-time cost of a few cents.
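With a longer domain list, one failing lookup shouldn't throw away the results you've already paid for. One possible pattern is to collect failures alongside successes. In this sketch audit_domains and fake_summarize are hypothetical; in real use summarize would wrap get_backlink_summary with an open client.

```python
def audit_domains(domains, summarize):
    """Run summarize(domain) for each domain, collecting failures instead of raising."""
    results, failures = [], []
    for domain in domains:
        try:
            results.append(summarize(domain))
        except Exception as exc:  # narrow this to the SDK's error types in real use
            failures.append({"domain": domain, "error": repr(exc)})
    return results, failures


# Stand-in summarizer that fails for one domain, to show the behaviour.
def fake_summarize(domain):
    if domain == "broken.example":
        raise ValueError("simulated failure")
    return {"domain": domain, "total": 0, "avg_da": 0, "sample": 0}

results, failures = audit_domains(["stripe.com", "broken.example", "vercel.com"], fake_summarize)
print(len(results), "ok,", len(failures), "failed")
```

The successful rows can go to CSV as before, and the failures list tells you which domains to retry.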
Error handling in production
The SDK raises typed exceptions instead of making you inspect HTTP status codes. Import what you need:
from rankparse import RankParseClient
from rankparse.errors import AuthError, InsufficientCreditsError, RateLimitError, APIError

Three errors you'll actually encounter:
AuthError — wrong or missing API key (HTTP 401). Check that RANKPARSE_API_KEY is set in your environment.
InsufficientCreditsError — out of credits (HTTP 402). Check your balance at rankparse.com/dashboard or via client.credits().
RateLimitError — too many requests (HTTP 429). Add a short sleep and retry:
import time
import os
from rankparse import RankParseClient
from rankparse.errors import RateLimitError

def fetch_with_retry(domain, retries=3):
    with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
        for attempt in range(retries):
            try:
                return client.backlinks(domain, limit=100)
            except RateLimitError:
                # Back off exponentially: 1s, 2s, 4s.
                wait = 2 ** attempt
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
    raise RuntimeError(f"Failed after {retries} retries")

Exponential backoff (1s, 2s, 4s) is usually enough. You're unlikely to hit rate limits on a simple bulk loop; they're more relevant for high-concurrency pipelines.
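If you retry in more than one place, the backoff logic can be pulled into a small wrapper that you can test without touching the network. This is a sketch under assumptions: RateLimited is a local stand-in for the SDK's RateLimitError so the example runs on its own, and call_with_backoff is a hypothetical helper, not part of rankparse.

```python
import time

class RateLimited(Exception):
    """Local stand-in for the SDK's RateLimitError."""


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep,
                      retry_on=(RateLimited,)):
    """Call fn(); on a retryable error wait base_delay * 2**attempt and try again."""
    last_error = None
    for attempt in range(retries):
        try:
            return fn()
        except retry_on as exc:
            last_error = exc
            if attempt < retries - 1:  # no point sleeping after the final attempt
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Failed after {retries} attempts") from last_error


# Demo: a function that is rate limited twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return {"data": []}

waits = []  # capture the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Passing sleep in as a parameter keeps the demo instant. In a real script you'd leave the default and pass retry_on=(RateLimitError,) with something like lambda: client.backlinks(domain, limit=100) as fn.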
What the data is and isn't
The data comes from Common Crawl, a nonprofit that crawls billions of pages quarterly and publishes the results publicly. RankParse processes those snapshots into queryable backlink data served on Cloudflare's edge.
What this means for your use case:
- Batch audits, competitor research, AI pipelines: quarterly freshness is fine. The link profile for an established domain doesn't change dramatically in three months.
- Monitoring a live campaign: not the right tool. If you need to know whether a link appeared in the last 24 hours, use Ahrefs or Semrush.
- Checking a new domain: backlinks discovered after the last crawl won't appear yet. Common Crawl runs roughly every 3 months; check commoncrawl.org for the most recent release date.
Being explicit about this saves debugging time later.
What to build next
- Add referring domains to count unique linking domains, not just individual links: use client.referring_domains(domain).
- Combine backlinks with domain authority scores to filter by link quality: client.domain_authority(domain).
- Use the batch endpoint to pull multiple signal types for a domain in one request instead of looping individually.
- Connect the MCP server to query backlink data directly from Claude or Cursor without writing any code.
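For a quick local approximation of the first idea, you can count linking hosts from the from_url values you already have. A minimal sketch on made-up sample rows; referring_hosts is a hypothetical helper, and it groups by hostname, so blog.example.com and example.com count separately.

```python
from collections import Counter
from urllib.parse import urlparse

def referring_hosts(backlinks):
    """Count backlinks per linking host, based on each link's from_url."""
    hosts = (urlparse(link["from_url"]).netloc.lower() for link in backlinks)
    return Counter(h for h in hosts if h)

# Made-up sample rows, shaped like the API's "data" entries.
sample = [
    {"from_url": "https://news.ycombinator.com/item?id=1"},
    {"from_url": "https://news.ycombinator.com/item?id=2"},
    {"from_url": "https://dev.to/some-post"},
]
hosts = referring_hosts(sample)
print(len(hosts), "unique hosts;", hosts.most_common(1))
```

This only sees the links you fetched, so on a paginated subset it will undercount.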
Frequently asked questions
How do I get backlinks for a domain in Python?
Install the rankparse package (pip install rankparse), then use RankParseClient with your API key: client.backlinks("yourdomain.com"). The method returns a dict with a data list of backlink objects. Full working code is in the script above.
Is there a free Python backlink checker?
Google Search Console provides free backlink data for your own domains via a manual export (no API). For programmatic access to any domain, RankParse offers 100 free credits on signup — enough for 50 domain lookups. There's no ongoing free tier for bulk API access.
How accurate is backlink data from Common Crawl?
Common Crawl covers billions of pages per quarterly crawl, making it one of the largest open web datasets available. Coverage skews toward higher-traffic pages. Thin or newly-published pages may not appear in a given snapshot. For most competitive research and bulk audits, the coverage is sufficient.
How do I handle pagination in the RankParse backlink API?
Use the offset parameter: client.backlinks(domain, limit=100, offset=100). The meta.total field gives an approximate count of total backlinks. Loop until the SDK returns an empty data list — that's the reliable stop signal. See the fetch_all_backlinks() function above for the complete implementation.
Start with 100 free credits
No subscription. No card. $0.009 per call after that, and credits never expire.