Convert HTML to a list

Paste HTML and get the text content of every matching tag as a list, one item per line. Defaults to `<li>` but you can target any tag - `<p>`, `<h2>`, `<td>`, or a custom element. Uses the browser's `DOMParser` so nested markup and HTML entities decode cleanly.

A real-DOM HTML extractor

The parser feeds your input into the browser's native `DOMParser`, which handles nested tags, entity decoding (`&amp;` → `&`), and forgiving HTML the way a browser would. It then runs `querySelectorAll` on your chosen tag and pulls the `.textContent` of each match - so `<li>Apples &amp; pears</li>` gives you `Apples & pears`, and `<li>Oranges <em>(Valencia)</em></li>` gives you `Oranges (Valencia)`.
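The flow described above can be sketched in a few lines. This is an illustration, not the tool's actual source: the function name `extractWithDom` and the injectable `Parser` argument are assumptions added here so the sketch can also be exercised outside a browser, where `DOMParser` is not defined.

```javascript
// Hypothetical sketch of the DOMParser path; names are illustrative only.
function extractWithDom(html, tag = "li", Parser = globalThis.DOMParser) {
  if (typeof Parser !== "function") {
    throw new Error("DOMParser is not available in this environment");
  }
  // Parse as HTML so entities are decoded and broken markup is recovered.
  const doc = new Parser().parseFromString(html, "text/html");
  // One output item per matching element, inner markup flattened to text.
  return Array.from(doc.querySelectorAll(tag), (el) => el.textContent);
}
```

In a browser console, `extractWithDom("<ul><li>Apples &amp; pears</li></ul>")` should return a one-item array containing `Apples & pears`.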

Tag is free-text. Type `li` (default), `p`, `h2`, `h3`, `td`, `a`, or any custom element. CSS-selector syntax beyond a plain tag name is not supported - for class or attribute selectors, pre-process the HTML externally or use a dedicated scraper.

If `DOMParser` throws (rare, because HTML parsing is designed to recover from malformed input), the tool falls back to a regex that extracts `<tag>content</tag>` pairs and strips inner HTML. The regex fallback is less accurate with deeply nested content, so the `DOMParser` path is used whenever it succeeds.

How to use Convert HTML to a list

  1. Paste your HTML into the input panel
  2. Set Tag to extract to the element you want (default `li`)
  3. The text content of every matching element becomes one output line
  4. Toggle Trim to control whitespace stripping and Dedupe to drop duplicates
  5. Copy or download the list; pair with convert-a-list-to-html to round-trip

Keyboard shortcuts

Drive ListShift without touching the mouse.

| Shortcut | Action |
| --- | --- |
| Ctrl+Z | Undo last input change |
| Ctrl+Shift+Z | Redo |
| Ctrl+Shift+Enter | Toggle fullscreen focus on the editor |
| Esc | Exit fullscreen |
| Ctrl+K | Open the command palette to jump to any tool |
| Ctrl+S | Save current pipeline draft (Plus) |
| Ctrl+P | Run a saved pipeline (Plus) |

What this extractor actually does

DOMParser-based, tag-targeted, text-only output.

Real-DOM parsing, not regex

Input goes through `new DOMParser().parseFromString(input, "text/html")`, which handles nested tags, self-closing elements, HTML entities, and forgiving syntax the way a browser would. Regex-based extractors choke on these; this one does not.

Automatic entity decoding

HTML entities inside elements decode automatically. `&amp;` becomes `&`, `&lt;` becomes `<`, `&quot;` becomes `"`. The output is clean plain text regardless of how the source was encoded.
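To make the decoding concrete, here is a deliberately simplified sketch of what it amounts to for a handful of entities. The tool itself relies on `DOMParser`, which knows the full named-entity table; the `decodeEntities` function and its short `NAMED` map below are assumptions for illustration only.

```javascript
// Hypothetical, simplified decoder: a few named entities plus numeric ones.
const NAMED = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
]);

function decodeEntities(text) {
  // A single pass avoids double-decoding input such as "&amp;lt;".
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named entities are case-sensitive; unknown names are left as-is.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}
```

With this sketch, `decodeEntities("Apples &amp; pears")` returns `Apples & pears`, and `&#39;` and `&#x27;` both decode to an apostrophe.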

Any tag, not just <li>

Default is `li`; change Tag to extract to `p`, `h2`, `td`, `a`, or any element name. The tool uses `querySelectorAll(tag)`, so any valid plain tag name works. CSS selectors with classes or attributes are not supported.
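One way the plain-tag-name restriction could be enforced before the value reaches `querySelectorAll` is sketched below. The function name `normalizeTagName` and the exact pattern are assumptions, not the tool's documented behaviour; the pattern is intentionally narrower than the full set of characters allowed in custom element names.

```javascript
// Hypothetical validator: accept plain tag names, reject CSS selector syntax.
function normalizeTagName(input, fallback = "li") {
  const name = input.trim().toLowerCase();
  if (name === "") return fallback; // empty field falls back to the default tag
  // A letter first, then letters, digits, or hyphens (covers names like "my-card").
  if (!/^[a-z][a-z0-9-]*$/.test(name)) {
    throw new Error(`Not a plain tag name: ${input}`);
  }
  return name;
}
```

Under these assumptions `normalizeTagName(" H2 ")` returns `h2`, while `.item` and `li[data-x]` are rejected with an error.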

Inner markup is flattened to text

`<li>Oranges <em>(Valencia)</em></li>` becomes `Oranges (Valencia)` - the `<em>` tags are discarded, their text kept. That is `.textContent` behaviour. If you need the HTML intact, this is not the right tool.

Regex fallback if DOMParser fails

On the rare malformed input that makes `DOMParser` throw, the tool falls back to a regex that captures `<tag>…</tag>` pairs and strips inner HTML. Less accurate with nesting - if results look wrong, simplify your input.
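A fallback of this kind might look roughly like the sketch below. The name `extractWithRegex` and the exact pattern are assumptions, since the tool's real regex is not shown here, and this version does not decode entities.

```javascript
// Hypothetical fallback: capture <tag ...>content</tag> pairs, then strip inner markup.
function extractWithRegex(html, tag = "li") {
  // Escape the tag name so it is matched literally inside the pattern.
  const name = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lazy match from an opening tag to the first closing tag of the same name.
  const pair = new RegExp(`<${name}(?:\\s[^>]*)?>([\\s\\S]*?)</${name}\\s*>`, "gi");
  const items = [];
  for (const match of html.matchAll(pair)) {
    items.push(match[1].replace(/<[^>]*>/g, "")); // drop inner tags, keep their text
  }
  return items;
}
```

The nesting weakness is easy to see: for `<li>a<ul><li>b</li></ul></li>` this sketch stops at the first `</li>` and returns the single item `ab`, whereas `querySelectorAll("li")` on a real DOM would match both the outer and the inner element.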

Worked example

Default `li` extraction with entity decoding and nested `<em>` flattening.

Input
<ul>
  <li>Apples &amp; pears</li>
  <li>Oranges <em>(Valencia)</em></li>
  <li>Cherries</li>
</ul>
Output
Apples & pears
Oranges (Valencia)
Cherries

Settings reference

How each option shapes the extracted list using the sample above.

| Setting | What it does | Effect on the sample |
| --- | --- | --- |
| Tag to extract: `li` (default) | Extracts text content of every `<li>` in the document | Three items emitted |
| Tag to extract: `p` | Switches to paragraphs | No `<p>` in sample, so output is empty |
| Tag to extract: `em` | Extracts only inline `<em>` elements | One item: `(Valencia)` |
| Entity decoding (automatic) | HTML entities decoded via DOMParser | `&amp;` becomes `&` in the first item |
| Trim: on (default) | Strips leading/trailing whitespace per item | No visible change on the sample |
| Dedupe: on | Drops duplicate values case-insensitively | No duplicates in sample, so no change |
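The Trim and Dedupe options can be pictured as a small post-processing step over the extracted items. The sketch below is an assumption about how such a step might work; in particular, the `postProcess` name, keeping the first occurrence of a duplicate, and comparing trimmed values are choices made here, not documented behaviour.

```javascript
// Hypothetical post-processing step for the Trim and Dedupe toggles.
function postProcess(items, { trim = true, dedupe = false } = {}) {
  const seen = new Set();
  const out = [];
  for (const raw of items) {
    const item = trim ? raw.trim() : raw;
    if (dedupe) {
      // Case-insensitive comparison; the first spelling encountered is kept.
      const key = item.toLowerCase();
      if (seen.has(key)) continue;
      seen.add(key);
    }
    out.push(item);
  }
  return out;
}
```

For example, `postProcess(["  Apples ", "apples", "Cherries"], { dedupe: true })` returns `["Apples", "Cherries"]`.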

FAQ

Can I use CSS selectors like `.item` or `li[data-x]`?
No, only a plain tag name. The Tag field feeds into `querySelectorAll(tag)`, which accepts full CSS selectors, but we deliberately restrict input to tag names to keep the behaviour predictable and documented. For attribute/class selectors, pre-filter the HTML externally.
What happens to inline tags inside a matched element?
They are flattened to text. `<li>Oranges <em>(Valencia)</em></li>` produces `Oranges (Valencia)` - the `<em>` tags disappear, their contents kept. That is `.textContent` behaviour from the DOM.
Does it decode HTML entities?
Yes, automatically. `&amp;`, `&lt;`, `&gt;`, `&quot;`, numeric entities (`&#39;`, `&#x27;`) - all decoded by the browser's DOMParser before the text is extracted.
What if my HTML is malformed?
DOMParser is forgiving and handles most malformed HTML by closing tags for you. In the extremely rare case that it throws, the tool falls back to a regex-based extraction, which is less accurate with nesting. If results look wrong, clean up the HTML first.
Can I build HTML from a list?
Yes, use convert-a-list-to-html - wrap each line in `<li>`, `<ol>`, or a definition list, with automatic entity escaping.