How to Clean a URL Export from Ahrefs or Search Console
Every SEO tool exports beautifully and cleans up nothing. Pull a list of URLs from Ahrefs, Semrush, or Search Console and you get exactly what the tool stored: duplicates, full URLs when you wanted domains, tracking parameters, mixed protocols, and the occasional stray row. Before you can analyze that list — or open it, or compare it to another — it needs cleaning.
This guide is a fast, repeatable routine for turning a raw export into a clean, analysis-ready list, using free browser-based tools. The whole thing takes under a minute once you know the order.
The Four Cleaning Passes
A messy export usually needs some combination of four passes. Run only the ones your list needs:
- Extract — get the URLs out of a report or HTML blob into a clean column.
- Strip tracking — remove UTM and click-ID parameters.
- Trim — reduce to root domains when you care about sites, not pages.
- Dedupe — collapse the inevitable repeats into a unique list.
Order matters: extract first, strip or trim next, and dedupe last, so the final pass catches the duplicates that stripping and trimming create.
Pass 1: Extract the URLs
If your export is a clean CSV, just copy the URL column — it pastes as one URL per line, ready to go.
If the URLs are embedded in a report, a PDF copy-paste, or a block of HTML, they won’t come out as a tidy column. Paste the whole blob into a URL extractor and it isolates every valid link, dropping the surrounding text. Now you have a real list.
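If you would rather script this pass, here is a minimal sketch of the same idea in Python. It is not how any particular extractor tool works; the pattern and the `extract_urls` name are assumptions made for illustration, and the pattern only looks for http and https links.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace, a quote, an angle
# bracket, or a closing bracket. URLs that contain ")" will be cut short.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>)\]]+""", re.IGNORECASE)

def extract_urls(blob: str) -> list[str]:
    """Return every http(s) URL found in the text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(blob):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        urls.append(match.rstrip(".,;:!?"))
    return urls

blob = 'See <a href="https://example.com/a?x=1">this</a> and https://example.org/b.'
for url in extract_urls(blob):
    print(url)
```

A regex like this is deliberately loose. It is good enough for pulling links out of a pasted report, but it does not validate that each result is a well-formed URL.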
Pass 2: Strip Tracking Parameters
Exports of landing pages, campaign URLs, or referral data are often littered with ?utm_source=…, fbclid, and gclid. Those tags fragment your data — the same page shows up as several “different” URLs — and add noise to any analysis.
Remove the UTM and tracking parameters in one pass. The page’s real address stays intact; only the tracking junk is removed. After deduping, this single step often reveals that your “200 unique landing pages” are really 60.
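For a scripted version of this pass, the sketch below uses Python's standard `urllib.parse` module. The list of parameters to drop is an assumption based on the examples above (anything starting with `utm_`, plus `fbclid` and `gclid`); extend it to match whatever your exports actually contain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust for your own campaigns.
TRACKING_PARAMS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)

def strip_tracking(url: str) -> str:
    """Remove tracking parameters and keep every other part of the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the surviving parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://example.com/page?utm_source=news&id=7&gclid=abc"))
# prints https://example.com/page?id=7
```

One caveat: `urlencode` re-encodes the parameters it keeps, so unusual escaping in the original query string may come back in a slightly different form.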
Pass 3: Trim to Root Domain (When You Want Sites)
For backlink analysis, disavow prep, or any “how many sites are here?” question, you want domains, not pages. Trim the list to root domain to strip the protocol, subdomain, path, and query from every line at once. https://blog.example.com/post?ref=1 becomes example.com.
Skip this pass when you genuinely need page-level URLs (a content audit, a redirect check). It’s only for when the site is the unit of analysis.
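The trimming step can be approximated in a few lines of Python. This is a rough sketch under one loud assumption: it treats the last two labels of the hostname as the root domain. That works for example.com but gives the wrong answer for suffixes like co.uk and for bare IP addresses; a robust version would consult the Public Suffix List. The `root_domain` name is hypothetical.

```python
from urllib.parse import urlsplit

def root_domain(url: str) -> str:
    """Naively reduce a URL to its last two hostname labels."""
    # urlsplit only recognizes the host when "//" is present,
    # so add it for lines like "example.com/path".
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Assumption: the registrable domain is the last two labels.
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

print(root_domain("https://blog.example.com/post?ref=1"))
# prints example.com
```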
Pass 4: Deduplicate
This is the last pass, because the earlier ones create new duplicates: stripping parameters and trimming to domain both collapse once-distinct URLs into identical lines. Remove the duplicates to get your final, unique list — and an accurate count you can actually trust.
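Deduplication is the simplest pass to script. The sketch below keeps the first occurrence of each line and preserves the original order; trimming whitespace and skipping blank lines are assumptions about what a pasted export usually needs, and the `dedupe` name is illustrative.

```python
def dedupe(lines: list[str]) -> list[str]:
    """Return unique, non-empty lines in their original order."""
    seen = set()
    unique = []
    for line in lines:
        item = line.strip()  # stray spaces would otherwise hide duplicates
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

rows = ["example.com", "example.org", "example.com ", ""]
print(dedupe(rows))
# prints ['example.com', 'example.org']
```

Note that this comparison is exact and case-sensitive, so it only catches duplicates that the earlier passes have already made identical.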
Putting It Together: Two Common Recipes
Cleaning a backlink export for domain-level analysis:
Extract (if needed) → Trim to root domain → Dedupe. Result: a clean list of unique referring domains.
Cleaning a landing-page / campaign export for reporting:
Extract (if needed) → Strip UTM parameters → Dedupe. Result: real page URLs with the tracking noise gone and repeats collapsed.
In both cases, once the list is clean you can open it in bulk to spot-check the pages, or feed it straight into your analysis.
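Both recipes share one shape: apply the per-line passes in order, then dedupe once at the end. Here is a small sketch of that shape, with hypothetical names throughout. The `drop_query` function is a crude stand-in for a real stripping or trimming pass, included only so the example runs on its own.

```python
from typing import Callable, Iterable

def run_recipe(lines: Iterable[str], passes: list[Callable[[str], str]]) -> list[str]:
    """Apply each per-line pass in order, then dedupe once at the end."""
    cleaned = []
    for line in lines:
        line = line.strip()
        for transform in passes:
            line = transform(line)
        if line:
            cleaned.append(line)
    # dict.fromkeys keeps the first occurrence of each line, in order.
    return list(dict.fromkeys(cleaned))

def drop_query(url: str) -> str:
    """Stand-in pass: removes the whole query string (demonstration only)."""
    return url.split("?", 1)[0]

raw = [
    "https://example.com/pricing?utm_source=newsletter",
    "https://example.com/pricing?utm_source=twitter",
    "https://example.com/about",
]
result = run_recipe(raw, [drop_query])
print(len(raw), "->", len(result))
# prints 3 -> 2
```

Running the dedupe before `drop_query` would have left all three lines in place, which is why it belongs at the end of either recipe.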
Why Bother Cleaning First?
A dirty list quietly corrupts everything downstream:
- Counts are wrong. Duplicates and parameter variants inflate every total.
- Analysis is skewed. The same domain weighted five times distorts your view of a backlink profile.
- Time is wasted. Opening or checking duplicate URLs means doing the same review twice.
Sixty seconds of cleaning prevents an hour of chasing numbers that were never real.
Frequently Asked Questions
Why does my export have the same URL multiple times?
Tools store a row per data point — a backlink export lists every linking page, a crawl lists every internal reference. The same destination naturally repeats. Deduping collapses them.
Should I dedupe before or after trimming?
After. Trimming to root domain turns many distinct page URLs into identical domain lines, and stripping tracking parameters does the same to page URLs, so dedupe last to catch them all.
Are these tools safe for client exports?
Yes. Extract, trim, dedupe, and the UTM remover all run locally in your browser — nothing is uploaded or stored — so client URLs stay private.
Do I need to install anything?
No. Every step here is a free, web-based tool. Paste, click, copy.
Clean your next export in under a minute. Start with the extract, trim, and dedupe tools — all free, no login, nothing stored.
