A real-DOM HTML extractor
The parser feeds your input into the browser's native `DOMParser`, which handles nested tags, entity decoding (`&amp;` → `&`), and forgiving HTML the way a browser would. It then runs `querySelectorAll` on your chosen tag and pulls the `.textContent` of each match - so `<li>Apples &amp; pears</li>` gives you `Apples & pears`, and `<li>Oranges <em>(Valencia)</em></li>` gives you `Oranges (Valencia)`.
The Tag to extract field is free text. Type `li` (default), `p`, `h2`, `h3`, `td`, `a`, or any custom element name. CSS-selector syntax beyond a plain tag name is not supported - for class or attribute selectors, pre-process the HTML externally or use a dedicated scraper.
If `DOMParser` fails (extremely malformed input), the tool falls back to a regex that extracts `<tag>content</tag>` pairs and strips inner HTML. The regex fallback is less accurate with deeply nested content, but most input never reaches it because the `DOMParser` path usually succeeds.
How to use Convert HTML to a List
1. Paste your HTML into the input panel
2. Set Tag to extract to the element you want (default `li`)
3. The text content of every matching element becomes one output line
4. Toggle Trim to control whitespace stripping, Dedupe to drop duplicates
5. Copy or download the list; pair with convert-a-list-to-html to round-trip
What this extractor actually does
DOMParser-based, tag-targeted, text-only output.
Real-DOM parsing, not regex
Input goes through `new DOMParser().parseFromString(input, "text/html")`, which handles nested tags, self-closing elements, HTML entities, and forgiving syntax the way a browser would. Regex-based extractors choke on these; this one does not.
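A minimal sketch of the DOMParser path described above, assuming a browser environment. The function name `extractTextByTag` is hypothetical and chosen for illustration; it is not the tool's actual source.

```javascript
// Sketch only: parse HTML with the browser's DOMParser and collect text per tag.
// `extractTextByTag` is a hypothetical name, not the tool's real function.
function extractTextByTag(html, tag = "li") {
  const doc = new DOMParser().parseFromString(html, "text/html");
  // textContent returns entity-decoded text with inner markup flattened away.
  return Array.from(doc.querySelectorAll(tag), (el) => el.textContent ?? "");
}

// Browser-only usage; DOMParser is not a Node.js global, hence the guard.
if (typeof DOMParser !== "undefined") {
  const sample =
    "<ul><li>Apples &amp; pears</li><li>Oranges <em>(Valencia)</em></li></ul>";
  console.log(extractTextByTag(sample));
  // In a browser: ["Apples & pears", "Oranges (Valencia)"]
}
```

Because the work is delegated to the browser's own parser, the sketch needs no entity table and no tag-stripping logic of its own.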
Automatic entity decoding
HTML entities inside elements decode automatically. `&amp;` becomes `&`, `&lt;` becomes `<`, `&quot;` becomes `"`. The output is clean plain text regardless of how the source was encoded.
Any tag, not just <li>
Default is `li`; change Tag to extract to `p`, `h2`, `td`, `a`, or any element name. The tool uses `querySelectorAll(tag)`, so any valid plain tag name works. CSS selectors with classes or attributes are not supported.
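One way to tell a plain tag name from selector syntax is a simple pattern check. The sketch below is an assumption about how such a guard could look; `isPlainTagName` is a hypothetical helper, and the tool may validate the field differently or not at all.

```javascript
// Hypothetical guard: accept plain tag names, reject CSS selector syntax.
function isPlainTagName(value) {
  // A letter first, then letters, digits, or hyphens
  // (hyphens cover custom elements such as my-item).
  return /^[a-z][a-z0-9-]*$/i.test(value.trim());
}

console.log(isPlainTagName("li"));        // true
console.log(isPlainTagName("my-item"));   // true
console.log(isPlainTagName("li.active")); // false (class selector)
console.log(isPlainTagName("a[href]"));   // false (attribute selector)
```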
Inner markup is flattened to text
`<li>Oranges <em>(Valencia)</em></li>` becomes `Oranges (Valencia)` - the `<em>` tags are discarded, their text kept. That is `.textContent` behaviour. If you need the HTML intact, this is not the right tool.
Regex fallback if DOMParser fails
On the rare malformed input that makes `DOMParser` throw, the tool falls back to a regex that captures `<tag>…</tag>` pairs and strips inner HTML. The fallback is less accurate with nested elements, so if results look wrong, simplify your input.
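The fallback can be pictured roughly as follows. This is a sketch under assumptions, not the tool's actual pattern: `extractWithRegex` and `escapeRegExp` are hypothetical names, and unlike the DOMParser path this version does not decode entities.

```javascript
// Escape regex metacharacters so the tag name is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical fallback: capture <tag>…</tag> pairs, then strip inner tags.
function extractWithRegex(html, tag = "li") {
  const name = escapeRegExp(tag);
  // Opening tag with optional attributes, lazy body, matching closing tag.
  const pair = new RegExp(
    `<${name}(?:\\s[^>]*)?>([\\s\\S]*?)<\\/${name}\\s*>`,
    "gi"
  );
  return Array.from(html.matchAll(pair), (m) => m[1].replace(/<[^>]*>/g, ""));
}

console.log(extractWithRegex("<li>Oranges <em>(Valencia)</em></li><li>Cherries</li>"));
// ["Oranges (Valencia)", "Cherries"]
```

The nesting weakness shows up quickly with a pattern like this: for `<li>a<ul><li>b</li></ul></li>` the lazy match stops at the first `</li>`, so the sketch returns a single item `ab`, whereas `querySelectorAll("li")` would report the outer and inner items separately.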
Worked example
Default `li` extraction with entity decoding and nested `<em>` flattening.
Input:
<ul>
  <li>Apples &amp; pears</li>
  <li>Oranges <em>(Valencia)</em></li>
  <li>Cherries</li>
</ul>
Output:
Apples & pears
Oranges (Valencia)
Cherries
Settings reference
How each option shapes the extracted list using the sample above.
| Setting | What it does | Effect on the sample |
|---|---|---|
| Tag to extract: li (default) | Extracts text content of every `<li>` in the document | Three items emitted |
| Tag to extract: p | Switches to paragraphs | No `<p>` in sample - output empty |
| Tag to extract: em | Extracts only inline `<em>` elements | One item: `(Valencia)` |
| Entity decoding (automatic) | HTML entities decoded via DOMParser | `&amp;` becomes `&` in the first item |
| Trim: on (default) | Strips leading/trailing whitespace per item | No visible change on the sample |
| Dedupe: on | Drops duplicate values case-insensitively | No duplicates in sample - no change |
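
The Trim and Dedupe rows can be sketched as a small post-processing step over the extracted items. This is an illustration only: `postProcess` is a hypothetical name, and the choices to keep the first occurrence of a duplicate and to leave Dedupe off by default are assumptions, since the table does not state them.

```javascript
// Hypothetical post-processing: optional trim, then case-insensitive dedupe.
function postProcess(items, { trim = true, dedupe = false } = {}) {
  let out = trim ? items.map((s) => s.trim()) : items.slice();
  if (dedupe) {
    const seen = new Set();
    out = out.filter((s) => {
      const key = s.toLowerCase(); // compare without regard to case
      if (seen.has(key)) return false;
      seen.add(key); // assumption: the first occurrence is the one kept
      return true;
    });
  }
  return out;
}

console.log(postProcess([" Apples ", "apples", "Cherries"], { dedupe: true }));
// ["Apples", "Cherries"]
```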