A leaf-element XML extractor
The extractor matches elements whose content is plain text with no child elements, i.e. leaves. Given `<items><item>Apple</item><item>Banana</item></items>`, you get `Apple` and `Banana` out; the enclosing `<items>` is skipped because its content contains other tags, not just text. This is the common case for list-shaped XML (the `<title>` and `<link>` leaves inside an RSS `<item>`, the `<loc>` inside a sitemap `<url>`, configuration key-value pairs) and keeps the output predictable.
The Only tag option filters the results by exact tag name, case-insensitively. Type `item` to match `<item>`, `<Item>`, and `<ITEM>`; leave blank to extract every leaf element. The Extract mode switch flips what gets returned: Text content (default) emits the inner text of each matched element; Element names emits the tag name itself (useful for auditing what elements exist in a document).
Attributes are ignored - the extractor reads the opening tag but does not parse or return attribute values. Namespace prefixes are part of the tag name, not stripped: if your XML has `<ns:item>` elements, type `ns:item` into the Only-tag filter to match them (typing just `item` will NOT match them). Trim and Dedupe run after extraction as standard post-processing.
How to convert XML to a list
1. Paste your XML into the input panel
2. Set Only tag to filter to one element name (e.g. `item`), or leave blank for every leaf
3. Pick Extract: Text content (default) for element values, or Element names to audit the tag shape of the document
4. Toggle Trim to strip leading/trailing whitespace from each value
5. Toggle Dedupe to drop repeated values
6. Copy the extracted list or feed it into the next tool - pair with convert-a-list-to-xml to round-trip
What this extractor actually does
Leaf-only, one regex pass, four options: a tag filter, an extract mode, Trim, and Dedupe.
Leaf elements only (no parent tags)
The matcher requires text-only content inside the tag - `<tag>text</tag>` matches, `<parent><child>text</child></parent>` only yields the inner `<child>`, not the parent. That keeps the output predictable on list-shaped XML and avoids emitting concatenated child text as a single line.
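The leaf rule can be sketched with a single regular expression. This is a minimal illustration in Python, not the tool's actual pattern (which the page does not show): the `LEAF` pattern and the `leaf_texts` helper are hypothetical names, and the sketch assumes a leaf must contain at least one character of text.

```python
import re

# Hypothetical pattern (an assumption, not the tool's real regex):
# a tag name, optional attributes up to the first '>', text that contains
# no '<', then a closing tag with the same name.
LEAF = re.compile(r"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*)?>([^<]+)</\1\s*>")

def leaf_texts(xml: str) -> list:
    """Return the inner text of every leaf element, in document order."""
    return [m.group(2) for m in LEAF.finditer(xml)]

flat = "<items><item>Apple</item><item>Banana</item></items>"
nested = "<parent><child>text</child></parent>"

print(leaf_texts(flat))    # ['Apple', 'Banana']
print(leaf_texts(nested))  # ['text']
```

Because the content group excludes `<`, a parent whose content starts with a child tag can never match, so only the innermost elements are returned.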
Case-insensitive exact-tag-name filter
Only tag compares the full captured tag name against your filter case-insensitively. `Item`, `item`, and `ITEM` all match a filter of `item`. Namespace prefixes are part of the name - `<ns:item>` requires typing `ns:item`; a bare `item` filter will not match it. Leave the filter blank to extract every leaf, namespace prefixes and all.
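A rough sketch of the filter, assuming the same kind of leaf regex as described on this page. The `LEAF` pattern and `filter_by_tag` are illustrative names rather than the tool's internals; the comparison lowercases both the captured tag name and the filter and compares the full name, prefix included.

```python
import re

# Hypothetical leaf pattern (an assumption): group 1 = tag name, group 2 = text.
LEAF = re.compile(r"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*)?>([^<]+)</\1\s*>")

def filter_by_tag(xml: str, only_tag: str = "") -> list:
    """Keep leaf text whose full tag name equals only_tag, ignoring case.

    A blank only_tag keeps every leaf.
    """
    wanted = only_tag.lower()
    kept = []
    for m in LEAF.finditer(xml):
        name, text = m.group(1), m.group(2)
        if not wanted or name.lower() == wanted:
            kept.append(text)
    return kept

xml = "<r><Item>A</Item><ITEM>B</ITEM><ns:item>C</ns:item><other>D</other></r>"

print(filter_by_tag(xml, "item"))     # ['A', 'B']
print(filter_by_tag(xml, "ns:item"))  # ['C']
print(filter_by_tag(xml))             # ['A', 'B', 'C', 'D']
```

Since the prefix is never stripped, `item` and `ns:item` are simply two different names to the filter.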
Two extract modes
Text content (default) returns what is between the opening and closing tags. Element names returns the tag name itself - useful when you want to answer "what elements appear in this document" rather than "what values do they contain."
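In regex terms, the two modes amount to choosing which capture group to emit. The sketch below assumes a hypothetical `LEAF` pattern with the tag name in group 1 and the text in group 2; the `extract` function and its `mode` values (`"text"`, `"names"`) are made up for illustration.

```python
import re

# Hypothetical leaf pattern (an assumption): group 1 = tag name, group 2 = text.
LEAF = re.compile(r"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*)?>([^<]+)</\1\s*>")

def extract(xml: str, mode: str = "text") -> list:
    """Emit inner text (mode='text') or tag names (mode='names') per leaf."""
    group = 1 if mode == "names" else 2
    return [m.group(group) for m in LEAF.finditer(xml)]

xml = "<user><name>Ada</name><email>ada@example.com</email></user>"

print(extract(xml))           # ['Ada', 'ada@example.com']
print(extract(xml, "names"))  # ['name', 'email']
```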
Attributes are ignored
The regex reads up to the first `>` to grab the opening tag plus its attributes, but attribute values are not parsed or returned. If you need attribute values, grep your XML separately or use a dedicated XPath tool.
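To make the limitation concrete, the sketch below runs a hypothetical leaf pattern (`LEAF`, an assumption rather than the tool's real regex) over a document with an attribute, then reads the same attribute with Python's standard `xml.etree.ElementTree` parser. ElementTree is a stand-in here for the "dedicated XPath tool" mentioned above, not something the extractor itself uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical leaf pattern (an assumption). The attribute section
# `(?:\s[^<>]*)?` is consumed but not captured, so its values are lost.
LEAF = re.compile(r"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*)?>([^<]+)</\1\s*>")

doc = '<items><item category="fruit">Banana</item><item>Cherry</item></items>'

# Regex pass: only the tag name and the text survive.
pairs = [m.groups() for m in LEAF.finditer(doc)]
print(pairs)  # [('item', 'Banana'), ('item', 'Cherry')]

# Separate pass with a real XML parser to recover attribute values.
categories = [el.get("category") for el in ET.fromstring(doc).iter("item")]
print(categories)  # ['fruit', None]
```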
Standard trim + dedupe post-processing
After extraction, Trim strips leading/trailing whitespace per item, and Dedupe removes duplicates using a case-insensitive compare. Both are toggleable, and both run after extraction, so they behave the same in either extract mode.
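A minimal sketch of the two post-processing steps. The page does not say which of the two runs first or which duplicate is kept, so this version makes two labeled assumptions: Trim runs before Dedupe, and the first occurrence wins. The `post_process` name is hypothetical.

```python
def post_process(values, trim=True, dedupe=True):
    """Apply Trim then Dedupe to an already-extracted list of strings."""
    if trim:
        values = [v.strip() for v in values]
    if not dedupe:
        return list(values)
    seen = set()
    unique = []
    for v in values:
        key = v.lower()          # case-insensitive compare
        if key not in seen:      # assumption: keep the first occurrence
            seen.add(key)
            unique.append(v)
    return unique

print(post_process(["  Apple ", "apple", "Banana", "APPLE"]))
# ['Apple', 'Banana']
```

Under these assumptions, turning Trim off can leave values that differ only by whitespace as separate entries, because the dedupe key is built from the untrimmed string.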
Worked example
Three `<item>` leaves extracted; the `<items>` parent is skipped because its content contains child tags.
Input:
<items>
  <item>Apple</item>
  <item category="fruit">Banana</item>
  <item>Cherry</item>
</items>

Output:
Apple
Banana
Cherry
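The whole flow on the sample above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's source: the `LEAF` pattern, the `xml_to_list` function, and its parameter names are all hypothetical, and Trim is assumed to run before Dedupe.

```python
import re

# Hypothetical leaf pattern (an assumption): group 1 = tag name, group 2 = text.
LEAF = re.compile(r"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*)?>([^<]+)</\1\s*>")

def xml_to_list(xml, only_tag="", mode="text", trim=True, dedupe=False):
    """Extract leaves, filter by tag name, pick a mode, then post-process."""
    wanted = only_tag.lower()
    values = []
    for m in LEAF.finditer(xml):
        name, text = m.group(1), m.group(2)
        if wanted and name.lower() != wanted:
            continue
        values.append(name if mode == "names" else text)
    if trim:
        values = [v.strip() for v in values]
    if dedupe:
        seen, unique = set(), []
        for v in values:
            if v.lower() not in seen:
                seen.add(v.lower())
                unique.append(v)
        values = unique
    return values

sample = """<items>
  <item>Apple</item>
  <item category="fruit">Banana</item>
  <item>Cherry</item>
</items>"""

print(xml_to_list(sample))                              # ['Apple', 'Banana', 'Cherry']
print(xml_to_list(sample, mode="names", dedupe=True))   # ['item']
```

The `<items>` wrapper never matches because its content begins with whitespace followed by a child tag, so no closing `</items>` directly follows the text.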
Settings reference
How each option shapes the extracted list using the sample above.
| Setting | What it does | Effect on the sample |
|---|---|---|
| Only tag: (blank) | Extracts every leaf element regardless of name | Three `<item>` values emitted |
| Only tag: item | Restricts to elements named `item` (case-insensitive) | Same three `<item>` values; any other leaves would be dropped |
| Extract: Text content (default) | Emits the text between opening and closing tags | `Apple`, `Banana`, `Cherry` |
| Extract: Element names | Emits the tag name of each matched leaf instead of its content | `item`, `item`, `item` |
| Trim: on | Strips leading/trailing whitespace per item | No visible change on the sample (already trimmed) |
| Dedupe: on | Drops duplicate values after extraction | If Extract = Element names, output collapses from 3 `item`s to 1 |