Convert HTML to List - Free Online HTML Tag Extractor

A real-DOM HTML extractor

The parser feeds your input into the browser's native `DOMParser`, which handles nested tags, entity decoding (`&amp;` → `&`), and forgiving HTML the way a browser would. It then runs `querySelectorAll` with your chosen tag and takes the `.textContent` of each match, so `<li>Apples &amp; pears</li>` gives you `Apples & pears`, and `<li>Oranges <em>(Valencia)</em></li>` gives you `Oranges (Valencia)`.

The Tag field is free text. Type `li` (the default), `p`, `h2`, `h3`, `td`, `a`, or any custom element name. CSS selector syntax beyond a plain tag name is not supported; for class or attribute selectors, pre-process the HTML externally or use a dedicated scraper.

If `DOMParser` fails (on extremely malformed input), the tool falls back to a regex that extracts `<tag>content</tag>` pairs and strips any inner HTML. The regex fallback is less accurate with deeply nested content; in practice the DOMParser path handles almost every input.
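Here is a minimal sketch of that DOMParser-first flow with a regex fallback. It assumes the tag has already been restricted to a plain name, since the name is interpolated into the regex unescaped. The function names `extractTagText` and `regexFallback` are hypothetical; this is not the tool's actual source. Outside a browser, for instance in Node.js, `DOMParser` is not defined, so `new DOMParser()` throws a `ReferenceError` and the sketch takes the fallback path.

```javascript
// Hypothetical sketch: try DOMParser first, fall back to a regex if it throws.
function regexFallback(input, tag) {
  // Lazily capture <tag ...>...</tag> pairs; \b stops "li" from matching "<link".
  const pairRe = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  const items = [];
  for (const match of input.matchAll(pairRe)) {
    // Strip any inner markup, keeping only the text between the tags.
    items.push(match[1].replace(/<[^>]*>/g, ""));
  }
  return items;
}

function extractTagText(input, tag = "li") {
  try {
    // Browser path: parse the way a browser would, then read .textContent.
    const doc = new DOMParser().parseFromString(input, "text/html");
    return Array.from(doc.querySelectorAll(tag), (el) => el.textContent);
  } catch {
    // DOMParser threw (or does not exist, as in Node.js): use the regex.
    return regexFallback(input, tag);
  }
}
```

For entity-free input like `<li>Oranges <em>(Valencia)</em></li>` both paths return `Oranges (Valencia)`; only the DOMParser path decodes entities such as `&amp;`.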

How to convert HTML to a list

1. Paste your HTML into the input panel.
2. Set Tag to extract to the element you want (default `li`).
3. The text content of every matching element becomes one output line.
4. Toggle Trim to strip leading and trailing whitespace, and Dedupe to drop duplicate lines.
5. Copy or download the list; pair it with convert-a-list-to-html to round-trip.

Keyboard shortcuts

Drive ListShift without touching the mouse.

Shortcut	Action
`Ctrl` `Z`	Undo last input change
`Ctrl` `Shift` `Z`	Redo
`Ctrl` `Shift` `Enter`	Toggle fullscreen focus on the editor
`Esc`	Exit fullscreen
`Ctrl` `K`	Open the command palette to jump to any tool
`Ctrl` `S`	Save current pipeline draft (Plus)
`Ctrl` `P`	Run a saved pipeline (Plus)

What this extractor actually does

DOMParser-based, tag-targeted, text-only output.

Real-DOM parsing, not regex

Input goes through `new DOMParser().parseFromString(input, "text/html")`, which handles nested tags, self-closing elements, HTML entities, and forgiving syntax the way a browser would. Regex-based extractors tend to break on exactly these cases.

Automatic entity decoding

HTML entities inside elements are decoded automatically: `&amp;` becomes `&`, `&lt;` becomes `<`, and `&quot;` becomes `"`. The output is clean plain text regardless of how the source was encoded.
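DOMParser does this decoding for you and knows every HTML entity. To make the mapping concrete, here is a hand-rolled decoder that covers only a few named entities plus decimal and hex numeric references. It is an illustration with a deliberately tiny table, not what the tool runs, and `decodeEntities` is a hypothetical name.

```javascript
// Illustration only: the tool relies on DOMParser, not on a table like this.
const NAMED = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (whole, ref) => {
    if (ref[0] === "#") {
      // Numeric reference: &#39; (decimal) or &#x27; (hex).
      const code = ref[1].toLowerCase() === "x"
        ? parseInt(ref.slice(2), 16)
        : parseInt(ref.slice(1), 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    // Named entities outside this small table are left as-is.
    return NAMED.get(ref) ?? whole;
  });
}
```

A `Map` is used rather than a plain object so that input such as `&constructor;` cannot pick up inherited object properties.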

Any tag, not just <li>

Default is `li`; change Tag to extract to `p`, `h2`, `td`, `a`, or any element name. The tool uses `querySelectorAll(tag)`, so any valid plain tag name works. CSS selectors with classes or attributes are not supported.
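The page does not say how the plain-tag-name restriction is enforced. One way to do it, sketched here with a hypothetical `isPlainTagName` helper, is to accept only a letter followed by letters, digits, or hyphens (which covers custom elements such as `my-widget`) and to reject anything containing selector syntax such as `.`, `#`, `[`, or spaces.

```javascript
// Hypothetical check: this pattern is an assumption, not the tool's actual rule.
// Accepts "li", "h2", "my-widget"; rejects ".item", "li[data-x]", "ul li".
const PLAIN_TAG = /^[a-z][a-z0-9-]*$/i;

function isPlainTagName(tag) {
  // Surrounding whitespace is ignored; anything else must match the pattern.
  return PLAIN_TAG.test(tag.trim());
}
```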

Inner markup is flattened to text

`<li>Oranges <em>(Valencia)</em></li>` becomes `Oranges (Valencia)`: the `<em>` tags are discarded and their text is kept. That is `.textContent` behaviour. If you need the inner HTML intact, this is not the right tool.

Regex fallback if DOMParser fails

On the rare malformed input that makes `DOMParser` throw, the tool falls back to a regex that captures `<tag>…</tag>` pairs and strips inner HTML. This fallback is less accurate with nested elements; if results look wrong, simplify your input.
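The nesting weakness is easy to reproduce. In this sketch the `lazyPairs` helper is hypothetical and its pattern is only assumed to resemble the tool's fallback. A lazy `<li>…</li>` pattern stops at the first closing tag it meets, so a nested item is swallowed into its parent's match and never reported on its own. The DOM path would also list the nested `Apples` item as a separate line.

```javascript
// Hypothetical lazy-regex extractor, assumed similar to the fallback.
function lazyPairs(input, tag) {
  const pairRe = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  // Each match ends at the FIRST closing tag, whichever element it belongs to.
  return Array.from(input.matchAll(pairRe), (m) => m[1].replace(/<[^>]*>/g, ""));
}

const nested = "<ul><li>Fruit<ul><li>Apples</li></ul></li><li>Cherries</li></ul>";
console.log(lazyPairs(nested, "li"));
// → [ 'FruitApples', 'Cherries' ]
```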

Worked example

Default `li` extraction with entity decoding and nested `<em>` flattening.

Input

<ul>
  <li>Apples &amp; pears</li>
  <li>Oranges <em>(Valencia)</em></li>
  <li>Cherries</li>
</ul>

Output

Apples & pears
Oranges (Valencia)
Cherries

Settings reference

How each option affects the extracted list, using the sample above.

Setting	What it does	Effect on the sample
Tag to extract: li (default)	Extracts text content of every `<li>` in the document	Three items emitted
Tag to extract: p	Switches to paragraphs	No `<p>` in sample - output empty
Tag to extract: em	Extracts only inline `<em>` elements	One item: `(Valencia)`
Entity decoding (automatic)	HTML entities decoded via DOMParser	`&amp;` becomes `&` in the first item
Trim: on (default)	Strips leading/trailing whitespace per item	No visible change on the sample
Dedupe: on	Drops duplicate values case-insensitively	No duplicates in sample - no change
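Trim and Dedupe are plain post-processing steps on the extracted strings. Here is a minimal sketch. It assumes Dedupe compares items after trimming and keeps the first occurrence of each value; the page only says duplicates are dropped case-insensitively. `postProcess` and its option defaults are hypothetical.

```javascript
// Hypothetical post-processing; option names mirror the Trim and Dedupe toggles.
function postProcess(items, { trim = true, dedupe = false } = {}) {
  let out = trim ? items.map((s) => s.trim()) : items.slice();
  if (dedupe) {
    const seen = new Set();
    // Case-insensitive: "Apples" and "apples" count as the same value,
    // and the first spelling encountered is the one kept.
    out = out.filter((s) => {
      const key = s.toLowerCase();
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }
  return out;
}
```

For example, `postProcess(["  Apples ", "apples", "Cherries"], { dedupe: true })` returns `["Apples", "Cherries"]`.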

FAQ

Can I use CSS selectors like `.item` or `li[data-x]`?

No, only a plain tag name. The Tag field feeds into `querySelectorAll(tag)`, which accepts full CSS selectors, but we deliberately restrict input to tag names to keep the behaviour predictable and documented. For attribute/class selectors, pre-filter the HTML externally.

What happens to inline tags inside a matched element?

They are flattened to text. `<li>Oranges <em>(Valencia)</em></li>` produces `Oranges (Valencia)`: the `<em>` tags disappear and their contents are kept. That is the DOM's `.textContent` behaviour.

Does it decode HTML entities?

Yes, automatically. `&amp;`, `&lt;`, `&gt;`, `&quot;`, and numeric entities (`&#39;`, `&#x27;`) are all decoded by the browser's DOMParser before the text is extracted.

What if my HTML is malformed?

DOMParser is forgiving and handles most malformed HTML by closing tags for you. In the extremely rare case that it throws, the tool falls back to a regex-based extraction, which is less accurate with nesting. If results look wrong, clean up the HTML first.

Can I build HTML from a list?

Yes, use convert-a-list-to-html: it wraps each line in `<li>` inside a `<ul>` or `<ol>`, or builds a definition list, with automatic entity escaping.
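Round-tripping only works if the list-to-HTML side escapes special characters; otherwise an item such as `Apples & pears` or `a < b` would produce invalid or different markup. Here is a minimal sketch of that escaping step using hypothetical `escapeHtml` and `toHtmlList` helpers; it is not the convert-a-list-to-html tool's actual code.

```javascript
// Hypothetical sketch: escape each line, then wrap it in <li> inside a <ul>.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toHtmlList(lines) {
  const items = lines.map((line) => `  <li>${escapeHtml(line)}</li>`);
  return ["<ul>", ...items, "</ul>"].join("\n");
}

console.log(toHtmlList(["Apples & pears", "Cherries"]));
// <ul>
//   <li>Apples &amp; pears</li>
//   <li>Cherries</li>
// </ul>
```

Feeding that output back into this extractor via the DOMParser path gives `Apples & pears` and `Cherries` again.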

Also known as

HTML list converter, HTML to list tool, HTML list extraction, HTML markup to list, HTML parser to list, HTML list processor, HTML to structured list
