Find URLs in a list

Extract URLs from messy logs, scraped HTML, CSV cells, or mixed text - paste the list and keep only lines containing a URL. Default regex `https?://\S+|www\.\S+` catches `http://`, `https://`, and `www.`-prefixed hosts. For just the URL substring, chain Replace with a capture group.


A regex filter pre-tuned for URLs

The default pattern `https?://\S+|www\.\S+` matches greedily up to the next whitespace, so a URL followed by a space or end-of-line is captured cleanly. Embedded URLs within longer text (log lines with timestamps, sentences with "visit https://..." phrasing) also match, because the filter keeps any line where the pattern is present.
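The line-keep behavior can be sketched in plain JavaScript (the function name `keepUrlLines` is illustrative, not the tool's internals):

```javascript
// Keep every line where the default URL pattern matches anywhere.
const URL_PATTERN = /https?:\/\/\S+|www\.\S+/i;

function keepUrlLines(text) {
  return text.split("\n").filter((line) => URL_PATTERN.test(line));
}

const sample = [
  "2024-01-02 GET https://example.com/page",
  "plain prose, no link",
  "visit www.example.org today",
].join("\n");

keepUrlLines(sample);
// → first and third lines survive; the middle one is dropped
```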

For stricter matching, anchor the pattern: `^(https?://\S+|www\.\S+)$` rejects lines where the URL is not the sole content. For broader matching, add schemes: `(?:https?|ftp|file|mailto):\S+`. All customization happens in the Match pattern field - no separate toggles needed.

Invert flips the filter from keep-matching to drop-matching - useful for stripping URLs from a text dump before running word-frequency analysis on the remaining prose.
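Invert can be modeled as negating the same per-line test (again an illustrative sketch, not the tool's source):

```javascript
const URL_PATTERN = /https?:\/\/\S+|www\.\S+/i;

// invert = true drops URL-bearing lines instead of keeping them.
function filterLines(text, invert = false) {
  return text
    .split("\n")
    .filter((line) => URL_PATTERN.test(line) !== invert);
}

const dump = "intro text\nsee https://example.com\nclosing remark";
filterLines(dump, true);
// → ["intro text", "closing remark"]
```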

How to use Find URLs in a list

  1. Paste your mixed text / log / scrape output into the input panel
  2. Default regex matches `http://`, `https://`, `www.` URLs
  3. Customize the Match pattern for stricter or broader matching
  4. Toggle Invert to drop URL-containing lines instead of keeping them
  5. For just the URL substring, chain Replace with a capture group

Keyboard shortcuts

Drive ListShift without touching the mouse.

| Shortcut | Action |
| --- | --- |
| Ctrl Z | Undo last input change |
| Ctrl Shift Z | Redo |
| Ctrl Shift Enter | Toggle fullscreen focus on the editor |
| Esc | Exit fullscreen |
| Ctrl K | Open the command palette to jump to any tool |
| Ctrl S | Save current pipeline draft (Plus) |
| Ctrl P | Run a saved pipeline (Plus) |

What this tool actually does

Line-level URL filter. Three knobs: pattern, invert, case sensitivity.

Keeps lines containing `http(s)://` or `www.`

The default regex catches embedded URLs too - a line like `Visit https://x.com today` is kept because the pattern matches anywhere in the line. To force whole-line-is-a-URL, anchor with `^` and `$`.
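The difference between unanchored and anchored matching, sketched with the same patterns:

```javascript
const embedded = /https?:\/\/\S+|www\.\S+/;
const wholeLine = /^(https?:\/\/\S+|www\.\S+)$/;

embedded.test("Visit https://x.com today");  // true: matches anywhere in the line
wholeLine.test("Visit https://x.com today"); // false: extra text around the URL
wholeLine.test("https://x.com");             // true: the URL is the whole line
```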

Works on scraped data and logs

Paste raw HTML dumps, Nginx access logs, Slack message exports - anywhere URLs are mixed with other text. The line-filter approach leaves surrounding context intact, unlike bare extraction tools.

Invert for URL stripping

Toggle Invert to REMOVE URL lines, keeping only the surrounding prose. Useful for word-frequency analysis (Find most frequent) where URLs would drown out meaningful terms.

Custom patterns for custom schemes

Need `ftp://` or `mailto:`? Change Match pattern to `(?:https?|ftp|mailto):\S+`. Pattern is any JavaScript regex - no limitations beyond what the engine supports.
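A quick check of the broadened pattern (scheme list taken from the text above; the test strings are made up):

```javascript
const multiScheme = /(?:https?|ftp|mailto):\S+/;

multiScheme.test("ftp://files.example.com/archive.zip"); // true
multiScheme.test("mailto:team@example.com");             // true
multiScheme.test("no links in this line");               // false
```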

For just the URL text, chain Replace

This tool keeps whole lines. To extract only the URL substring, run this filter first, then Replace with regex `.*?(https?://\S+|www\.\S+).*` and replacement `$1`.
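The two-step chain (filter, then Replace with a capture group) can be sketched as a single function; `extractUrls` is an illustrative name:

```javascript
const URL_PATTERN = /https?:\/\/\S+|www\.\S+/;

// Step 1: keep only URL-bearing lines (what this tool does).
// Step 2: replace each kept line with its first captured URL.
function extractUrls(text) {
  return text
    .split("\n")
    .filter((line) => URL_PATTERN.test(line))
    .map((line) => line.replace(/.*?(https?:\/\/\S+|www\.\S+).*/, "$1"));
}

extractUrls("Homepage: https://example.com\nno link here\nSee also: www.partner.com");
// → ["https://example.com", "www.partner.com"]
```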

Common use cases

Concrete scenarios where a URL filter pays off vs. ad-hoc grep.

Scraped data dumps

Dumped HTML or JSON from a scraper into a flat-line list? Run this to pull out every line that contains a link - then chain Dedupe to collapse repeat URLs across pages.

Server and app logs

Nginx access logs, application traces, and crash dumps often embed URLs in request lines and referer fields. Paste a log segment and this filter isolates URL-bearing lines for audit or outbound-link analysis.

Email and message dumps

Exported inboxes, Slack archive dumps, and support ticket threads mix URLs with signatures and free-text. Filter keeps only the URL-containing lines; Invert drops them if you want URL-free prose for sentiment or topic analysis.

Markdown and documentation sweeps

Pipe a documentation file through and the filter surfaces every line containing a link reference. Handy for link-rot audits - feed the output into Dedupe, then hit each URL.

Worked example

Keeps lines containing `http(s)://` or `www.`. The `[email protected]` line is dropped (has `@example.com` but no URL scheme).

Input
Homepage: https://example.com
Contact: [email protected]
Mirror: http://another-example.org
See also: www.partner.com
No web address on this line
Output
Homepage: https://example.com
Mirror: http://another-example.org
See also: www.partner.com

Settings reference

How each option shapes the output using the sample above.

| Setting | What it does | Effect on the sample |
| --- | --- | --- |
| Match pattern: `https?://\S+\|www\.\S+` (default), Mode: Regex | Keeps lines containing `http://`, `https://`, or `www.` | 3 of 5 lines kept |
| Pattern: `https://\S+` (stricter) | Only HTTPS URLs | Would drop the `http://` and `www.` lines |
| Invert: on | Drops URL lines | Keeps `Contact: [email protected]` and `No web address on this line` |
| Pattern: `\.(com\|org\|net)$` | Lines ending with a common TLD | Different match criteria - catches `www.partner.com` but not lines with trailing text |

FAQ

What URL shapes does the default pattern match?
Three: `http://...`, `https://...`, and `www.`-prefixed hosts. The greedy `\S+` stops at whitespace, so a URL ending at a space or line end is captured. Embedded URLs within longer text are also caught.
Does it extract just the URL or keep the whole line?
Keeps the whole line. To extract only the URL portion, use Replace with a regex capture group, or pre-split words with Split.
Does it match FTP, file://, or other protocols?
No - only `http(s)://` and `www.` by default. To match other schemes, change the pattern to `(?:https?|ftp|file)://\S+`.
How do I keep only lines where the URL is the entire content?
Change the pattern to `^(https?://\S+|www\.\S+)$` with `^` and `$` anchors. A line like `Visit https://x.com today` would no longer match.
Is the match case-sensitive?
Default is off, so `HTTPS://EXAMPLE.COM` also matches. Toggle Case sensitive on for stricter matching - useful for data where URL shape is canonical.
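In regex terms, the default corresponds to the `i` flag (a minimal sketch of the two modes):

```javascript
const caseInsensitive = /https?:\/\/\S+|www\.\S+/i; // default: Case sensitive off
const caseSensitive = /https?:\/\/\S+|www\.\S+/;    // Case sensitive on

caseInsensitive.test("HTTPS://EXAMPLE.COM"); // true
caseSensitive.test("HTTPS://EXAMPLE.COM");   // false
```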