
Find Your Competitor's Best Backlinks (Python + API)
Use the RankParse competitor-gap API to find domains linking to your competitor but not you. Working Python script, DA filtering, real stripe.com example.
Backlink gap analysis is one of those tasks that everyone agrees is valuable and almost nobody automates. You open Ahrefs, type in a competitor, type in yourself, export a CSV, sort by DR in Excel, and start emailing. A week later the data is stale and you do it again.
Ahrefs tells you what links exist. It does not tell you which ones to go get. That second step — turning the gap into a prioritized outreach list — is where most of the time goes, and it is exactly what a script should be doing.
This post walks through the RankParse competitor-gap endpoint: what it returns, how to call it from Python with the SDK, and how to turn the result into a ranked target list. We tested it against stripe.com and built a real outreach pool from the output. There's also a section further down on doing the same analysis through Claude or ChatGPT without writing any code, which is useful if your SEO lead doesn't want to learn Python.
What the competitor-gap endpoint returns
GET /v1/competitor-gap?domain=YOU&vs=COMPETITOR returns every referring domain that links to your competitor but does not link to you, sorted by the domain authority of the linking site.
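If you call the endpoint over plain HTTP instead of through the SDK, the request is just those two query parameters. Here is a minimal standard-library sketch that builds the path and query string. The host name and authentication header are left out because they aren't covered here. Passing `limit` in the query string is an assumption that mirrors the SDK's `limit` argument, and `competitor_gap_path` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def competitor_gap_path(you: str, competitor: str, limit: Optional[int] = None) -> str:
    """Build the path and query for GET /v1/competitor-gap.

    Assumption: `limit` is accepted as a query parameter, mirroring the SDK argument.
    """
    params = {"domain": you, "vs": competitor}
    if limit is not None:
        params["limit"] = limit  # urlencode stringifies the int
    return "/v1/competitor-gap?" + urlencode(params)


print(competitor_gap_path("railway.app", "stripe.com", limit=100))
# /v1/competitor-gap?domain=railway.app&vs=stripe.com&limit=100
```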
A trimmed sample response for domain=railway.app&vs=stripe.com:
```json
{
  "domain": "railway.app",
  "vs": "stripe.com",
  "data": [
    {
      "from_domain": "github.com",
      "total_links": 4128,
      "from_domain_score": 97
    },
    {
      "from_domain": "developer.mozilla.org",
      "total_links": 312,
      "from_domain_score": 93
    },
    {
      "from_domain": "smashingmagazine.com",
      "total_links": 41,
      "from_domain_score": 88
    }
  ],
  "total": 18742,
  "returned": 50
}
```

Three fields per row, and they each tell you something different:
- `from_domain`: the site that links to your competitor.
- `total_links`: how many distinct pages on that site link to the competitor. A site that links 40 times is a stronger signal than one that links once.
- `from_domain_score`: the domain authority of the linking site (0–100), the same score returned by `/v1/domain-authority`.
total is the size of the full gap pool — how many distinct referring domains exist in the gap before the response was sliced. returned is how many came back in data after the limit and sort.
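A typed view of the response makes those fields explicit in your own code. Here is a minimal sketch using standard-library dataclasses. The class and function names are hypothetical, and the field meanings come from the descriptions above. The one-row payload is constructed for illustration, so `returned` is set to 1.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GapRow:
    from_domain: str        # site linking to the competitor
    total_links: int        # distinct pages on that site linking to the competitor
    from_domain_score: int  # domain authority of the linking site, 0-100


@dataclass(frozen=True)
class GapResult:
    domain: str
    vs: str
    total: int      # size of the full gap pool before slicing
    returned: int   # rows in `data` after limit and sort
    rows: List[GapRow]


def parse_gap_response(payload: dict) -> GapResult:
    rows = [
        GapRow(r["from_domain"], int(r["total_links"]), int(r["from_domain_score"]))
        for r in payload.get("data", [])
    ]
    return GapResult(
        domain=payload["domain"],
        vs=payload["vs"],
        total=int(payload["total"]),
        returned=int(payload["returned"]),
        rows=rows,
    )


sample = json.loads("""
{"domain": "railway.app", "vs": "stripe.com",
 "data": [{"from_domain": "github.com", "total_links": 4128, "from_domain_score": 97}],
 "total": 18742, "returned": 1}
""")
result = parse_gap_response(sample)
print(f"{result.returned} of {result.total} gap domains returned")
```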
One detail that matters: the server already sorts results by from_domain_score descending before slicing. It scores a pool of up to 2000 candidates from the gap and returns the top limit of those by DA. So the first row is always the highest-DA opportunity in that scored pool, not the most-linked one. You don't need to re-sort the response, but you do need to keep this in mind when you write filtering logic.
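Because the server returns rows in DA order, any logic that assumes "most-linked first" will quietly pick the wrong rows. Here is a small sketch that checks the order you received and, if you want link volume to lead instead, re-sorts client-side. The function names are hypothetical, and the fourth row is a made-up domain added only to show the reordering.

```python
from typing import Dict, List


def is_sorted_by_da(rows: List[Dict]) -> bool:
    """True if rows are in non-increasing order of from_domain_score."""
    scores = [r["from_domain_score"] for r in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def by_link_count(rows: List[Dict]) -> List[Dict]:
    """Return a copy ordered by total_links, most-linked first.

    sorted() is stable (also with reverse=True), so rows with equal
    total_links keep the server's DA order.
    """
    return sorted(rows, key=lambda r: r["total_links"], reverse=True)


rows = [
    {"from_domain": "github.com", "total_links": 4128, "from_domain_score": 97},
    {"from_domain": "developer.mozilla.org", "total_links": 312, "from_domain_score": 93},
    {"from_domain": "smashingmagazine.com", "total_links": 41, "from_domain_score": 88},
    {"from_domain": "niche-blog.example", "total_links": 500, "from_domain_score": 42},  # made-up row
]
print(is_sorted_by_da(rows))
print([r["from_domain"] for r in by_link_count(rows)])
```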
The endpoint costs 5 credits per call regardless of how many gaps it returns. One competitor, one 5-credit charge.
The working Python script

Install the SDK and set your key:
```bash
pip install rankparse
export RANKPARSE_API_KEY="rp_your_key_here"
```

The minimal call:
```python
import os

from rankparse import RankParseClient

YOU = "railway.app"
VS = "stripe.com"

with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
    result = client.competitor_gap(YOU, vs=VS, limit=100)

gaps = result["data"]
print(f"{result['total']} domains in gap pool, {len(gaps)} returned")
for row in gaps[:10]:
    print(f" DA {row['from_domain_score']:>2} {row['from_domain']:<40} ({row['total_links']} links)")
```

Running this for railway.app vs stripe.com against the most recent Common Crawl snapshot returned 18,742 gap domains in the pool, with the top 100 by DA in the response. The first ten rows were all DA 85+: major developer publications, university sites, and government domains, all linking to Stripe but not to Railway.
A few things worth understanding about this snippet:
The argument order. competitor_gap(domain, vs=...) puts your domain first and the competitor as a keyword argument. The mental model is "find gaps for me, versus them" — the first argument is whose link profile you want to fill.
limit defaults to 50, max is 200. Passing limit=500 gets you 200; the API clamps silently. If you need more than 200 prioritized targets in one pass, you have bigger sourcing problems than a tool can solve.
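Because the clamp is silent, a script that asks for 500 rows and then logs "500 targets requested" is misleading. Here is a minimal client-side sketch that mirrors the documented default and cap, so your own logging reflects what the server will actually return. The function name is hypothetical, and rejecting values below 1 is a local choice, not documented API behavior.

```python
from typing import Optional

DEFAULT_LIMIT = 50  # documented default
MAX_LIMIT = 200     # documented cap; larger values are clamped server-side


def effective_limit(requested: Optional[int] = None) -> int:
    """Return the number of rows the server should send back for a given limit."""
    if requested is None:
        return DEFAULT_LIMIT
    if requested < 1:
        # Server behavior for zero or negative values is not documented; fail early.
        raise ValueError("limit must be at least 1")
    return min(requested, MAX_LIMIT)


print(effective_limit(500))  # 200
```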
Empty data is normal. If every domain that links to the competitor already links to you too (rare, but possible in very small niches), data comes back as []. No exception, no 404. Always check len(gaps) before indexing.
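A small guard makes that check explicit and avoids an IndexError on `gaps[0]`. This is a sketch: `best_gap` is a hypothetical helper, and the empty payload below assumes the usual response envelope with an empty `data` list.

```python
from typing import Dict, Optional


def best_gap(result: Dict) -> Optional[Dict]:
    """Return the first (highest-DA) gap row, or None when `data` is empty or missing."""
    rows = result.get("data") or []
    if not rows:
        return None
    return rows[0]  # the server already sorts by from_domain_score descending


# Assumption: an empty gap still comes back with the usual envelope
empty = {"domain": "railway.app", "vs": "stripe.com", "data": [], "total": 0, "returned": 0}
top = best_gap(empty)
if top is None:
    print("No gap domains for this pair.")
```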
Sorting and filtering by domain authority
Because the API already returns results sorted by DA descending, "sort by DA" is the default. What you actually want in practice is to bucket the results into outreach tiers so you can prioritize where your time goes.
A site at DA 90 needs a different pitch than a site at DA 35. Lumping them into one CSV and emailing the same template to all of them is what makes backlink outreach the spam-flagged mess that it is.
```python
import os
from collections import defaultdict

from rankparse import RankParseClient


def bucket_by_tier(gaps):
    tiers = defaultdict(list)
    for row in gaps:
        da = row["from_domain_score"]
        if da >= 80:
            tier = "tier_1_premium"   # DA 80+: editorial, hard but high value
        elif da >= 60:
            tier = "tier_2_strong"    # DA 60-79: industry publications, realistic
        elif da >= 40:
            tier = "tier_3_relevant"  # DA 40-59: niche blogs, easier wins
        else:
            tier = "tier_4_skip"      # DA <40: usually not worth the email
        tiers[tier].append(row)
    return tiers


with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
    result = client.competitor_gap("railway.app", vs="stripe.com", limit=200)

tiers = bucket_by_tier(result["data"])
for tier_name in ["tier_1_premium", "tier_2_strong", "tier_3_relevant"]:
    rows = tiers[tier_name]
    print(f"\n{tier_name}: {len(rows)} targets")
    for row in rows[:5]:
        print(f" DA {row['from_domain_score']:>2} {row['from_domain']}")
```

On the same railway.app vs stripe.com run, the breakdown was:
- Tier 1 (DA 80+): 31 domains. GitHub, MDN, Mozilla, W3C, the BBC, Wikipedia, several .edu and .gov sites.
- Tier 2 (DA 60–79): 78 domains. Dev.to, Smashing Magazine, CSS-Tricks, Hacker Noon, several industry blogs.
- Tier 3 (DA 40–59): 64 domains. Smaller dev blogs, niche newsletters, agency sites.
- Tier 4 (DA under 40): 27 domains. Mostly thin SEO blogs and parked-looking domains. Skipped.
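To produce a per-tier breakdown like this from your own run, count rows per tier instead of eyeballing the printout. Here is a sketch using `collections.Counter`. The table-driven thresholds are the same cut-offs as `bucket_by_tier`, the function names are hypothetical, and the six DA scores in the demo are made up.

```python
from collections import Counter
from typing import Dict, Iterable

# (minimum DA, label) pairs, highest first; same cut-offs as bucket_by_tier
TIERS = [(80, "tier_1_premium"), (60, "tier_2_strong"), (40, "tier_3_relevant"), (0, "tier_4_skip")]


def tier_for(da: int) -> str:
    for floor, label in TIERS:
        if da >= floor:
            return label
    return "tier_4_skip"  # defensive: scores below 0 should not occur


def tier_summary(rows: Iterable[Dict]) -> Dict[str, int]:
    counts = Counter(tier_for(r["from_domain_score"]) for r in rows)
    # Report every tier, including empty ones, in tier order
    return {label: counts.get(label, 0) for _, label in TIERS}


demo = [{"from_domain_score": s} for s in (97, 93, 88, 65, 45, 30)]
print(tier_summary(demo))
```

With a real response, call `tier_summary(result["data"])`.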
That's a real, prioritized prospect list for a single 5-credit API call. Doing the same thing in Ahrefs requires a $129/month plan and CSV juggling.
You can also filter to a hard minimum threshold instead of bucketing:
```python
strong_targets = [row for row in result["data"] if row["from_domain_score"] >= 60]
strong_targets_with_volume = [r for r in strong_targets if r["total_links"] >= 5]
```

That second filter (total_links >= 5) excludes one-off mentions. A site that links to your competitor five or more times has a topical reason to, so they're more likely to add another link if you give them a reason. A site that links once might have just embedded a tweet.
A realistic end-to-end example
Here's the full script we used to build a CSV outreach list for a hypothetical developer tools company benchmarking against Stripe. It calls the gap endpoint once, filters and tiers the result, and writes a CSV ready for an outreach tool.
```python
import csv
import os

from rankparse import RankParseClient

YOU = "railway.app"
COMPETITORS = ["stripe.com"]  # could also be ["stripe.com", "vercel.com", "render.com"]


def fetch_gaps(client, you, competitor, limit=200):
    result = client.competitor_gap(you, vs=competitor, limit=limit)
    return [
        {
            "linking_domain": row["from_domain"],
            "linking_da": row["from_domain_score"],
            "links_to_them": row["total_links"],
            "competitor": competitor,
        }
        for row in result["data"]
    ]


def tier_label(da):
    if da >= 80:
        return "T1_premium"
    if da >= 60:
        return "T2_strong"
    if da >= 40:
        return "T3_relevant"
    return "T4_skip"


rows = []
with RankParseClient(api_key=os.environ["RANKPARSE_API_KEY"]) as client:
    for competitor in COMPETITORS:
        print(f"Pulling gaps vs {competitor}...")
        gaps = fetch_gaps(client, YOU, competitor)
        for row in gaps:
            row["tier"] = tier_label(row["linking_da"])
            rows.append(row)

# Filter and sort for outreach: drop tier 4 and one-off mentions,
# then order by tier (T1 first) and by highest DA within each tier
rows = [r for r in rows if r["tier"] != "T4_skip" and r["links_to_them"] >= 2]
rows.sort(key=lambda r: (r["tier"], -r["linking_da"]))

with open("outreach_targets.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "tier", "linking_domain", "linking_da", "links_to_them", "competitor",
    ])
    writer.writeheader()
    writer.writerows(rows)

print(f"\nSaved {len(rows)} targets to outreach_targets.csv")
```

On a real run against stripe.com, this produced 173 prioritized outreach targets in about 4 seconds, costing 5 credits, roughly four cents on the pay-as-you-go tier. The CSV opens straight into Lemlist, Instantly, or any spreadsheet.
If you want to widen the search, add more competitors to the COMPETITORS list. Each call costs 5 credits, and you can dedupe the resulting domains (a site that links to all three competitors but not you is a much warmer lead than one that links to only one).
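The script appends one row per competitor, so a domain that appears in several gap lists shows up several times. Here is a sketch of the dedupe step. It collapses rows by linking domain, records which competitors each domain links to, and ranks domains that link to more of them first. Field names follow the CSV script. `merge_gaps` and the `overlap` column are hypothetical additions, and the example domains are made up.

```python
from typing import Dict, List


def merge_gaps(gaps_by_competitor: Dict[str, List[Dict]]) -> List[Dict]:
    """Collapse per-competitor gap rows into one row per linking domain.

    gaps_by_competitor maps a competitor domain to that call's result["data"].
    """
    merged: Dict[str, Dict] = {}
    for competitor, rows in gaps_by_competitor.items():
        for row in rows:
            entry = merged.setdefault(row["from_domain"], {
                "linking_domain": row["from_domain"],
                "linking_da": row["from_domain_score"],
                "links_to_them": 0,   # summed across all competitors
                "competitors": set(),
            })
            # DA describes the linking site, so it should match across calls; keep the max defensively
            entry["linking_da"] = max(entry["linking_da"], row["from_domain_score"])
            entry["links_to_them"] += row["total_links"]
            entry["competitors"].add(competitor)
    out = []
    for entry in merged.values():
        competitors = sorted(entry["competitors"])
        out.append({**entry, "competitors": competitors, "overlap": len(competitors)})
    # Warmest leads first: linked to the most competitors, then highest DA, then name
    out.sort(key=lambda e: (-e["overlap"], -e["linking_da"], e["linking_domain"]))
    return out


gaps = {
    "stripe.com": [
        {"from_domain": "github.com", "total_links": 4128, "from_domain_score": 97},
        {"from_domain": "devblog.example", "total_links": 12, "from_domain_score": 55},
    ],
    "vercel.com": [
        {"from_domain": "devblog.example", "total_links": 8, "from_domain_score": 55},
        {"from_domain": "news.example", "total_links": 3, "from_domain_score": 70},
    ],
}
for e in merge_gaps(gaps):
    print(e["overlap"], e["linking_da"], e["linking_domain"], e["competitors"])
```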
Doing it through Claude or ChatGPT (no code)
If you don't want to write a script — or you want a non-technical teammate to be able to run gap analysis on their own — point an LLM at the RankParse MCP server and ask in plain English.
MCP (Model Context Protocol) is an open standard, introduced by Anthropic, that much of the industry is converging on for connecting tools to LLMs. RankParse exposes every endpoint, including competitor-gap, as MCP tools. Claude Desktop, Claude Code, Cursor, and the ChatGPT MCP connectors all support it.
Once the server is connected, the gap analysis becomes a single prompt:
Find domains that link to stripe.com but not to railway.app. Only show me ones with DA 60 or higher, and group them by tier so I know which ones are realistic outreach targets versus aspirational.
Claude or ChatGPT will call the competitor-gap tool, get back the same JSON the Python script above sees, and format the result as a tiered list with a paragraph of context per tier. The credit cost is identical — 5 credits per call — because the LLM is just a friendlier interface over the same endpoint.
Where this is genuinely useful:
- Iterative exploration. "Now do the same vs vercel.com. Which domains appear in both gap lists?" — three follow-up turns, no script edits.
- Combining endpoints. "For the top 5 DA 80+ gaps, pull their anchor text profile for stripe.com so I know what context they link in." Claude chains competitor-gap → anchor-text automatically.
- Onboarding. Someone who doesn't write Python can do real backlink research the same week they discover the tool.
Where it's not better than a script:
- Repeatable pipelines. If you're running the same analysis weekly and dumping to a CSV, a cron'd Python script is faster and cheaper than asking an LLM to re-derive the same workflow each time.
- Auditable outputs. Scripts produce identical CSVs every run. LLM responses vary in formatting. Pick the right tool for the job.
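That reproducibility only holds if the sort key leaves no ties. In the end-to-end script, two rows with the same tier and DA keep whatever order they arrived in, so identical data could still produce CSVs that differ line by line. Here is a sketch of a fully specified ordering that writes the same columns as that script. `to_stable_csv` is a hypothetical helper, and the two rows are made up.

```python
import csv
import io
from typing import Dict, Iterable

FIELDS = ["tier", "linking_domain", "linking_da", "links_to_them", "competitor"]


def to_stable_csv(rows: Iterable[Dict]) -> str:
    """Serialize outreach rows with a total ordering, so the same data yields the same CSV text."""
    ordered = sorted(
        rows,
        # tier, then DA descending, then domain and competitor as tie-breakers
        key=lambda r: (r["tier"], -r["linking_da"], r["linking_domain"], r["competitor"]),
    )
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buf.getvalue()


rows = [
    {"tier": "T2_strong", "linking_domain": "b.example", "linking_da": 70, "links_to_them": 3, "competitor": "stripe.com"},
    {"tier": "T2_strong", "linking_domain": "a.example", "linking_da": 70, "links_to_them": 9, "competitor": "stripe.com"},
]
print(to_stable_csv(rows))
```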
For most teams the answer is both — a Python pipeline for the recurring outreach list, and Claude or ChatGPT for the ad-hoc "wait, what about this competitor?" questions.
Honest limitations
A few things to know before you build a critical pipeline on this.
The data is quarterly. RankParse processes Common Crawl snapshots, which run roughly every three months. A link that appeared on Hacker News last Tuesday won't be in the dataset until the next crawl release. For an outreach pool, this is fine — established sites are stable across quarters. For "did my campaign hit yet?" monitoring, use a real-time tool.
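Since the dataset only changes when a new crawl release is processed, one pull per release compared against the previous pull is a sensible cadence. Here is a minimal sketch using plain set operations; the function name and domains are hypothetical. A domain in `dropped` is ambiguous. It may now link to you, it may have stopped linking to the competitor, or it may simply have fallen outside the returned rows or the crawl's coverage. Treat it as a prompt to check, not a result.

```python
from typing import Dict, Iterable, Set


def diff_gap_pulls(previous: Iterable[str], current: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare linking-domain sets from two gap pulls, e.g. one per crawl release."""
    prev, curr = set(previous), set(current)
    return {
        "new": curr - prev,        # in the gap now, not last time
        "dropped": prev - curr,    # in the gap last time, not now (ambiguous, see above)
        "unchanged": prev & curr,
    }


last_quarter = ["github.com", "smashingmagazine.com", "old-blog.example"]
this_quarter = ["github.com", "smashingmagazine.com", "new-newsletter.example"]
print(diff_gap_pulls(last_quarter, this_quarter))
```

To compare two saved outreach CSVs, read the `linking_domain` column from each with `csv.DictReader` and pass those lists in.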
Coverage skews toward higher-traffic pages. Common Crawl is a public dataset, not an exhaustive index. Thin or newly-published pages may not be present in a given snapshot. The Common Crawl FAQ is honest about this. For competitive analysis at the domain level, the coverage is more than sufficient. For finding every link to every URL, no public dataset is.
The gap pool is sorted by DA, which is not the only signal that matters. A DA 35 niche blog with a hyper-relevant audience can outperform a DA 85 generalist who buries your link in a roundup. Use total_links and your own topical filter on the linking domain — don't outsource that judgment to a score.
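One way to keep that judgment in the loop without reading every row is a first-pass keyword screen on the linking domain, combined with a `total_links` floor. This is a rough sketch. The keyword list is yours to define, domain-name matching misses relevant sites with generic names, and `topical_filter` is a hypothetical helper, not part of the SDK. The `cooking.example` row is made up.

```python
from typing import Dict, Iterable, List


def topical_filter(rows: Iterable[Dict], keywords: Iterable[str], min_links: int = 1) -> List[Dict]:
    """Keep rows whose linking domain contains any keyword and meets a link-count floor.

    Matching on the domain name alone is a crude stand-in for real topical review.
    """
    kws = [k.lower() for k in keywords]
    return [
        r for r in rows
        if r["total_links"] >= min_links and any(k in r["from_domain"].lower() for k in kws)
    ]


rows = [
    {"from_domain": "github.com", "total_links": 4128, "from_domain_score": 97},
    {"from_domain": "developer.mozilla.org", "total_links": 312, "from_domain_score": 93},
    {"from_domain": "smashingmagazine.com", "total_links": 41, "from_domain_score": 88},
    {"from_domain": "cooking.example", "total_links": 50, "from_domain_score": 85},  # made-up row
]
print([r["from_domain"] for r in topical_filter(rows, ["dev", "git"], min_links=5)])
```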
For reference, Google's own helpful content guidance is clear that relevance and intent matter more than authority signals alone. A backlink gap tool surfaces opportunities; it doesn't decide which ones are worth pursuing.
What to build next
- Combine competitor-gap with /v1/anchor-text on each prospect to see what anchors they typically use, so you can pitch the right context.
- Use /v1/similar-domains to find competitors you didn't think of, then feed those into the gap loop.
- Pull /v1/top-pages for the competitor to understand what content earns them links, then build your own version before pitching.
- Connect the MCP server to query gaps directly from Claude or Cursor: "find me 20 DA 60+ domains linking to stripe.com but not railway.app" runs in one shot, no script needed.
Frequently asked questions
What does the RankParse competitor-gap endpoint return?
It returns referring domains that link to a competitor but not to you, with each domain's authority score and total link count. The response is sorted by DA descending and capped at 200 rows per call, and each call costs 5 credits regardless of result size.
How is this different from Ahrefs Link Intersect?
Ahrefs Link Intersect does the same conceptual analysis with a subscription and a dashboard. RankParse exposes it as a pay-per-call API with quarterly Common Crawl data, no subscription, and credits that never expire. Ahrefs is the right tool if you need real-time data and a UI; RankParse is the right tool if you want to script gap analysis into your own workflow.
How fresh is the data?
The data comes from Common Crawl, which publishes a new snapshot roughly every three months. The active release is the most recent processed snapshot. For batch outreach research this freshness is fine — for live link monitoring it isn't.
Can I run this against multiple competitors at once?
Each call covers one domain (you) and one vs (the competitor). Loop over competitors in Python and dedupe the linking domains. At five credits per competitor, comparing against five competitors costs 25 credits, about 20 cents.
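The arithmetic generalizes to any batch. Each competitor is one call, so the credit total is 5 times the number of competitors. Here is a tiny sketch for budgeting a run before you start it. The per-credit dollar price is a parameter rather than a hard-coded value because it depends on your plan, and the function names are hypothetical.

```python
CREDITS_PER_GAP_CALL = 5  # flat charge per competitor-gap call, as stated above


def gap_credits(num_competitors: int) -> int:
    """Credits for one competitor-gap call per competitor."""
    if num_competitors < 0:
        raise ValueError("num_competitors must be non-negative")
    return CREDITS_PER_GAP_CALL * num_competitors


def gap_cost_usd(num_competitors: int, usd_per_credit: float) -> float:
    """Dollar cost for a batch; supply the per-credit price from your own plan."""
    return gap_credits(num_competitors) * usd_per_credit


print(gap_credits(5))  # 25
```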
Does the API sort by domain authority automatically?
Yes. The endpoint scores a pool of up to 2000 candidate gap domains and returns the top limit rows sorted by from_domain_score descending. You do not need to re-sort the response.
Start with 100 free credits
No subscription. No card. $0.009 per credit after that, and credits never expire.
Get your free API key