A regex filter pre-tuned for URLs
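The default pattern `https?://\S+|www\.\S+` matches greedily up to the next whitespace character, so a URL followed by a space or the end of the line is captured cleanly. Punctuation attached directly to the URL (a trailing comma, period, or closing parenthesis) is captured as part of the match. Embedded URLs within longer text (log lines with timestamps, sentences with "visit https://..." phrasing) also match, because the filter keeps any line where the pattern is present.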
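A minimal TypeScript sketch of that keep-if-present behavior, assuming the filter runs a JavaScript `RegExp` against each line. `keepMatching` and the sample lines are hypothetical, not ListShift's own code:

```typescript
// Default pattern from the description. No "g" flag on purpose: RegExp.test()
// with "g" is stateful and advances lastIndex between calls.
const DEFAULT_URL_PATTERN = /https?:\/\/\S+|www\.\S+/;

// Hypothetical helper: keep every line where the pattern matches anywhere.
function keepMatching(lines: string[], pattern: RegExp): string[] {
  return lines.filter((line) => pattern.test(line));
}

const sentence = "Please visit https://example.com/docs, then log in.";
const input = [
  "2024-01-01T10:00:00Z GET https://example.com/a",
  sentence,
  "no link here",
];

// Keeps the log line and the sentence; drops "no link here".
console.log(keepMatching(input, DEFAULT_URL_PATTERN));

// \S+ stops only at whitespace, so the trailing comma is part of the match.
const hit = sentence.match(DEFAULT_URL_PATTERN);
console.log(hit ? hit[0] : null); // logs the URL including its trailing comma
```

Leaving off the `g` flag matters here: a global regex remembers `lastIndex` between `test()` calls, which can make some lines fail to match.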
For stricter matching, anchor the pattern: `^(https?://\S+|www\.\S+)$` rejects lines where the URL is not the sole content. For broader matching, add schemes: `(?:https?|ftp|file|mailto):\S+|www\.\S+` (keep the `www\.\S+` alternative, or bare `www.` lines stop matching). All customization happens in the Match pattern field - no separate toggles needed.
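To see how the anchored and scheme-based patterns differ, here is a small TypeScript comparison on made-up lines, assuming Match pattern behaves like a JavaScript `RegExp`. The `schemes+www` entry is the scheme pattern with `www\.\S+` appended, added for illustration:

```typescript
// Patterns from the text, as JavaScript RegExp literals ("/" escaped).
const patterns: Array<[string, RegExp]> = [
  ["default", /https?:\/\/\S+|www\.\S+/],
  ["anchored", /^(https?:\/\/\S+|www\.\S+)$/],
  ["schemes", /(?:https?|ftp|file|mailto):\S+/],
  ["schemes+www", /(?:https?|ftp|file|mailto):\S+|www\.\S+/],
];

// Made-up sample lines.
const samples = [
  "https://example.com",
  "Visit https://example.com today",
  "ftp://files.example.org/pub",
  "mailto:someone@example.com",
  "www.example.com",
];

// One row per line: "Y" where a pattern keeps it, "-" where it drops it,
// in the column order default, anchored, schemes, schemes+www.
const rows = samples.map((line) =>
  patterns.map(([, re]) => (re.test(line) ? "Y" : "-")).join(" "),
);

samples.forEach((line, i) => console.log(rows[i], line));
```

The last line shows the trade-off: the scheme-only pattern drops `www.example.com`, while the anchored pattern drops the sentence with an embedded URL.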
Invert flips the filter from keep-matching to drop-matching - useful for stripping URLs from a text dump before running word-frequency analysis on the remaining prose.
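Invert amounts to negating the per-line test. A hypothetical `filterLines` sketch in TypeScript (not the tool's source):

```typescript
const URL_LINE = /https?:\/\/\S+|www\.\S+/;

// Hypothetical sketch of the Invert toggle: comparing the test result with
// `invert` keeps matches when invert is false and non-matches when it is true.
function filterLines(lines: string[], pattern: RegExp, invert = false): string[] {
  return lines.filter((line) => pattern.test(line) !== invert);
}

const dump = [
  "Release notes are at https://example.com/notes",
  "The parser is now twice as fast.",
  "Mirror: www.example.org",
  "Memory use dropped as well.",
];

console.log(filterLines(dump, URL_LINE));       // the two URL lines
console.log(filterLines(dump, URL_LINE, true)); // the two prose lines
```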
How to use Find URLs in a list
1. Paste your mixed text, log, or scrape output into the input panel.
2. The default regex matches `http://`, `https://`, and `www.` URLs.
3. Customize the Match pattern for stricter or broader matching.
4. Toggle Invert to drop URL-containing lines instead of keeping them.
5. For just the URL substring, chain Replace with a capture group.
Keyboard shortcuts
Drive ListShift without touching the mouse.
What this tool actually does
Line-level URL filter. Three knobs: pattern, invert, case sensitivity.
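The three knobs can be modeled as a small options object. This TypeScript sketch assumes the case-sensitivity setting maps to the JavaScript `i` flag; `UrlFilterOptions` and `runUrlFilter` are hypothetical names:

```typescript
// Hypothetical options object mirroring the three knobs named above.
interface UrlFilterOptions {
  pattern: string;        // regex source text, as typed into Match pattern
  invert: boolean;        // true = drop matching lines instead of keeping them
  caseSensitive: boolean; // false = compile with the "i" flag (assumed mapping)
}

function runUrlFilter(lines: string[], opts: UrlFilterOptions): string[] {
  const re = new RegExp(opts.pattern, opts.caseSensitive ? "" : "i");
  return lines.filter((line) => re.test(line) !== opts.invert);
}

const mixedCase = ["See WWW.EXAMPLE.COM", "HTTPS://EXAMPLE.ORG/PAGE", "see www.example.com"];
const defaults = { pattern: "https?://\\S+|www\\.\\S+", invert: false };

console.log(runUrlFilter(mixedCase, { ...defaults, caseSensitive: true }));  // lowercase line only
console.log(runUrlFilter(mixedCase, { ...defaults, caseSensitive: false })); // all three lines
```

With case sensitivity on, uppercase `WWW.` and `HTTPS://` slip past the default pattern.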
Keeps lines containing `http(s)://` or `www.`
The default regex catches embedded URLs too - a line like `Visit https://x.com today` is kept because the pattern matches anywhere in the line. To require the whole line to be a URL, anchor with `^` and `$`.
Works on scraped data and logs
Paste raw HTML dumps, Nginx access logs, Slack message exports - anywhere URLs are mixed with other text. The line-filter approach leaves surrounding context intact, unlike bare extraction tools.
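The difference between line filtering and bare extraction, sketched in TypeScript with made-up lines. Extraction here uses `String.prototype.match` with the `g` flag, which returns every matching substring:

```typescript
// Line filter: non-global regex, so test() carries no lastIndex state.
const FILTER_RE = /https?:\/\/\S+|www\.\S+/;
// Extraction: with "g", String.prototype.match returns every matching substring.
const EXTRACT_RE = /https?:\/\/\S+|www\.\S+/g;

const logLines = [
  "2024-05-01 12:00 fetched https://example.com/start in 120ms",
  "user joined the channel",
  "links: https://a.example.com and www.b.example.com",
];

// Filter keeps whole lines, context and all.
const filtered = logLines.filter((line) => FILTER_RE.test(line));

// Extraction keeps only the URL substrings and loses the surrounding text.
const extracted: string[] = [];
for (const line of logLines) {
  const hits = line.match(EXTRACT_RE);
  if (hits) extracted.push(...hits);
}

console.log(filtered);  // lines 1 and 3, unchanged
console.log(extracted); // three bare URLs
```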
Invert for URL stripping
Toggle Invert to REMOVE URL lines, keeping only the surrounding prose. Useful for word-frequency analysis (Find most frequent) where URLs would drown out meaningful terms.
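A rough TypeScript sketch of why dropping URL lines first helps a frequency count. The `wordFrequencies` tokenizer is a hypothetical stand-in, not how Find most frequent actually splits words:

```typescript
const URL_PATTERN = /https?:\/\/\S+|www\.\S+/;

// Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
function wordFrequencies(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    for (const word of line.toLowerCase().split(/[^a-z0-9']+/)) {
      if (word) counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

const messages = [
  "check https://example.com/a/b/c",
  "the build is green",
  "https://example.com/x https://example.com/y",
  "the deploy is done",
];

// Without Invert, URL fragments like "com" and "https" outnumber "the".
console.log(wordFrequencies(messages).get("com")); // 3

// With Invert, only the prose lines are counted.
const prose = messages.filter((line) => !URL_PATTERN.test(line));
const freq = wordFrequencies(prose);
console.log(freq.get("the"), freq.get("com")); // 2 undefined
```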
Custom patterns for custom schemes
Need `ftp://` or `mailto:`? Change Match pattern to `(?:https?|ftp|mailto):\S+|www\.\S+` (without the `www\.\S+` alternative, bare `www.` lines stop matching). The pattern can be any JavaScript regex, limited only by what the engine supports.
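Because the pattern is ordinary JavaScript regex text, an invalid pattern surfaces as a `SyntaxError` from the `RegExp` constructor. A hypothetical TypeScript sketch that compiles a custom-scheme pattern (with `www\.\S+` kept) and catches a broken one; `compilePattern` and `keepWith` are illustrative names:

```typescript
// Hypothetical helper: compile the Match pattern text, returning the
// error instead of throwing when the regex is invalid.
function compilePattern(source: string): RegExp | Error {
  try {
    return new RegExp(source);
  } catch (err) {
    return err instanceof Error ? err : new Error(String(err));
  }
}

function keepWith(lines: string[], re: RegExp): string[] {
  return lines.filter((line) => re.test(line));
}

// Custom schemes plus the original www. alternative.
const custom = compilePattern("(?:https?|ftp|mailto):\\S+|www\\.\\S+");
const broken = compilePattern("(?:https?|ftp"); // unterminated group

const lines = ["ftp://files.example.org/pub", "mailto:someone@example.com", "www.example.com", "plain text"];
const kept = custom instanceof RegExp ? keepWith(lines, custom) : [];

console.log(kept);                          // the first three lines
console.log(broken instanceof SyntaxError); // true
```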
For just the URL text, chain Replace
This tool keeps whole lines. To extract only the URL substring, run this filter first, then Replace with regex `.*?(https?://\S+|www\.\S+).*` and replacement `$1`. Only the first URL on each line survives; any later URLs on the same line are discarded.
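The Replace step can be checked with `String.prototype.replace` in TypeScript, assuming Replace applies the regex per line the way JavaScript does; `toUrl` is a hypothetical name:

```typescript
// Lazy prefix, captured URL, then the rest of the line; "$1" keeps the capture.
const EXTRACT_FIRST = /.*?(https?:\/\/\S+|www\.\S+).*/;

function toUrl(line: string): string {
  return line.replace(EXTRACT_FIRST, "$1");
}

const urlLines = [
  "Visit https://x.com today",
  "See also: www.partner.com",
  "two links: https://a.example and https://b.example",
];

console.log(urlLines.map(toUrl));
// [ "https://x.com", "www.partner.com", "https://a.example" ]
```

Lines that don't match come back unchanged, which is why the URL filter runs first.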
Common use cases
Concrete scenarios where a URL filter pays off vs. ad-hoc grep.
Scraped data dumps
Flattened a scraper's HTML or JSON output into a one-item-per-line list? Run this to pull out every line that contains a link - then chain Dedupe to collapse repeat URLs across pages.
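A minimal order-preserving dedupe in TypeScript, assuming Dedupe compares whole lines exactly; `dedupe` and the scraped HTML lines are hypothetical:

```typescript
const URL_ANYWHERE = /https?:\/\/\S+|www\.\S+/;

// Hypothetical Dedupe: keep the first copy of each exact line, preserving order.
function dedupe(lines: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.add(line);
      out.push(line);
    }
  }
  return out;
}

const scraped = [
  '<a href="https://example.com/about">About</a>',
  "<p>Welcome to the site</p>",
  '<a href="https://example.com/about">About</a>',
  '<a href="https://example.com/blog">Blog</a>',
  '<footer><a href="https://example.com/about">About us</a></footer>',
];

const linkLines = scraped.filter((line) => URL_ANYWHERE.test(line)); // 4 lines
const unique = dedupe(linkLines);                                    // 3 lines
console.log(unique);
```

The footer link survives even though it points at `/about` again, because its line text differs. Reducing each line to its URL with Replace before Dedupe would collapse it too.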
Server and app logs
Nginx access logs, application traces, and crash dumps often embed URLs in request lines and referer fields. Paste a log segment and this filter isolates URL-bearing lines for audit or outbound-link analysis.
Email and message dumps
Exported inboxes, Slack archive dumps, and support ticket threads mix URLs with signatures and free text. The filter keeps only the URL-containing lines; Invert drops them if you want URL-free prose for sentiment or topic analysis.
Markdown and documentation sweeps
Pipe a documentation file through and the filter surfaces every line containing a link reference. Handy for link-rot audits - feed the output into Dedupe, then check that each URL still resolves.
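Because `\S+` runs until whitespace, Markdown links like `[text](https://...)` and sentence-ending periods leave trailing characters on the match. A hypothetical TypeScript cleanup step that trims them and collects unique URLs to check; the trailing-character list is an assumption, and the actual HTTP checking is left out:

```typescript
const FIRST_URL = /https?:\/\/\S+|www\.\S+/;
// Characters assumed to be punctuation rather than part of the URL.
const TRAILING = /[).,;:!?>"'\]]+$/;

// Hypothetical helper: first URL on the line, minus trailing punctuation.
function urlForCheck(line: string): string | null {
  const m = line.match(FIRST_URL);
  return m ? m[0].replace(TRAILING, "") : null;
}

const docLines = [
  "See the [setup guide](https://example.com/guide).",
  "Mirror at www.example.org.",
  "Raw file: <https://example.com/raw.txt>",
  "Again: [guide](https://example.com/guide)",
  "No links in this paragraph.",
];

// Unique, cleaned URLs in first-seen order, ready for a link checker.
const toCheck: string[] = [];
for (const line of docLines) {
  const url = urlForCheck(line);
  if (url !== null && toCheck.indexOf(url) === -1) toCheck.push(url);
}
console.log(toCheck);
```

Trimming `)` will break URLs that genuinely end in a parenthesis, such as some Wikipedia article links, so review those by hand.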
Worked example
Keeps lines containing `http(s)://` or `www.`. The `Contact:` line is dropped: it has an `@example.com` email address but no URL scheme.
Input:
Homepage: https://example.com
Contact: [email protected]
Mirror: http://another-example.org
See also: www.partner.com
No web address on this line
Output:
Homepage: https://example.com
Mirror: http://another-example.org
See also: www.partner.com
Settings reference
How each option changes the output for the sample above.
| Setting | What it does | Effect on the sample |
|---|---|---|
| Match pattern: `https?://\S+\|www\.\S+` (default), Mode: Regex | Keeps lines containing `http://`, `https://`, or `www.` | 3 of 5 lines kept |
| Pattern: `https://\S+` (stricter) | Only HTTPS URLs | 1 of 5 lines kept (`Homepage:`); the `http://` and `www.` lines are dropped |
| Invert: on | Drops URL lines | 2 of 5 lines kept: the `Contact:` line and `No web address on this line` |
| Pattern: `\.(com\|org\|net)$` | Lines ending with a common TLD | 4 of 5 lines kept: the three URL lines plus the `Contact:` line, whose email address ends in `.com`; a URL followed by trailing text would be missed |
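The counts in the table can be reproduced with a short TypeScript sketch, assuming the filter behaves like JavaScript `RegExp.test` per line. `someone@example.com` is a hypothetical stand-in for the sample's email address, and `applySetting` is an illustrative name:

```typescript
// Worked-example input; "someone@example.com" is a hypothetical stand-in.
const sample = [
  "Homepage: https://example.com",
  "Contact: someone@example.com",
  "Mirror: http://another-example.org",
  "See also: www.partner.com",
  "No web address on this line",
];

// Keep matching lines, or non-matching lines when invert is true.
function applySetting(pattern: RegExp, invert = false): string[] {
  return sample.filter((line) => pattern.test(line) !== invert);
}

const rows: Array<[string, string[]]> = [
  ["default", applySetting(/https?:\/\/\S+|www\.\S+/)],
  ["https only", applySetting(/https:\/\/\S+/)],
  ["invert", applySetting(/https?:\/\/\S+|www\.\S+/, true)],
  ["ends with TLD", applySetting(/\.(com|org|net)$/)],
];

for (const [label, keptRows] of rows) {
  console.log(label, keptRows.length, keptRows);
}
```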