How to use the command xmllint (with examples)

xmllint is a command-line tool from the libxml2 project for parsing, checking, formatting, and validating XML files. Its --xpath option evaluates XPath expressions, a syntax for querying and traversing XML trees. This article walks through common xmllint use cases with examples.

Use case 1: Return all nodes (tags) named “foo”

Code:

xmllint --xpath "//foo" source_file.xml

Motivation: This use case is helpful when we want to extract all the nodes with a specific tag name from an XML file.

Explanation:

  • --xpath evaluates the given XPath expression against the document and prints the result.
  • "//foo" is the XPath expression that selects every element named “foo”, at any depth in the document.
  • source_file.xml is the path to the XML file.

Example output:

<foo>...</foo>
<foo>...</foo>
...
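To try this without an existing file, the following minimal sketch writes a hypothetical sample document to a temporary directory (it assumes a POSIX shell and xmllint on the PATH). It shows that //foo also matches a foo element nested inside another element, and that wrapping the expression in count() returns the number of matches instead of the nodes themselves.

```shell
# Hypothetical sample document in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/source_file.xml" <<'EOF'
<root>
  <foo>one</foo>
  <bar>
    <foo>two</foo>
  </bar>
  <foo>three</foo>
</root>
EOF

# Print every <foo> element, including the one nested inside <bar>.
xmllint --xpath '//foo' "$dir/source_file.xml"

# count() turns the node-set into a number, which xmllint prints as-is.
foo_count=$(xmllint --xpath 'count(//foo)' "$dir/source_file.xml")
echo "foo elements: $foo_count"
```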

Use case 2: Return the contents of the first node named “foo” as a string

Code:

xmllint --xpath "string(//foo)" source_file.xml

Motivation: This use case is useful when we need to extract the inner text of a specific node in an XML file.

Explanation:

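  • --xpath evaluates the given XPath expression against the document and prints the result.
  • "string(//foo)" converts the node-set //foo to a string. XPath's string() function uses only the first matching node in document order and returns its string value, which is the concatenated text of that node and all of its descendants.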
  • source_file.xml is the path to the XML file.

Example output:

foo content
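The sketch below makes the "first node" rule visible, using hypothetical sample data and assuming xmllint is on the PATH. string(//foo) ignores the second foo element, and the string value of the first one includes the text inside its child element. Indexing the parenthesized node-set, as in (//foo)[2], selects a later match.

```shell
dir=$(mktemp -d)
cat > "$dir/source_file.xml" <<'EOF'
<root>
  <foo>first <b>bold</b> text</foo>
  <foo>second</foo>
</root>
EOF

# string() keeps only the first <foo> in document order; its string
# value concatenates all descendant text, including the text in <b>.
first=$(xmllint --xpath 'string(//foo)' "$dir/source_file.xml")
echo "$first"

# Parenthesize and index the node-set to pick another match.
second=$(xmllint --xpath 'string((//foo)[2])' "$dir/source_file.xml")
echo "$second"
```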

Use case 3: Return the href attribute of the second anchor element in an HTML file

Code:

xmllint --html --xpath "string((//a)[2]/@href)" webpage.xhtml

Motivation: This use case is valuable when we want to extract a specific attribute value from an HTML file.

Explanation:

  • --html tells xmllint to use its HTML parser, which tolerates markup that is not well-formed XML.
  • --xpath evaluates the given XPath expression against the parsed document and prints the result.
  • "string((//a)[2]/@href)" selects the second anchor element in the whole document ((//a)[2]) and returns the value of its href attribute (/@href). The parentheses matter: without them, //a[2] selects every anchor that is the second a child of its own parent, which is not necessarily the second anchor in the file.
  • webpage.xhtml is the path to the HTML file.

Example output:

https://www.example.com/link2
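A small, hypothetical page shows how //a[2] differs from (//a)[2], the expression for the second anchor in the whole document. Assuming xmllint is on the PATH, the sketch compares both on a page where the first paragraph holds one link and the second paragraph holds two.

```shell
dir=$(mktemp -d)
cat > "$dir/webpage.xhtml" <<'EOF'
<html><body>
  <p><a href="https://www.example.com/link1">one</a></p>
  <p><a href="https://www.example.com/link2">two</a>
     <a href="https://www.example.com/link3">three</a></p>
</body></html>
EOF

# (//a)[2]: the second anchor in the whole document.
doc_wide=$(xmllint --html --xpath 'string((//a)[2]/@href)' "$dir/webpage.xhtml")
# //a[2]: anchors that are the second <a> child of their parent;
# string() then takes the first of those in document order.
per_parent=$(xmllint --html --xpath 'string(//a[2]/@href)' "$dir/webpage.xhtml")

echo "(//a)[2] -> $doc_wide"
echo "//a[2]   -> $per_parent"
```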

Use case 4: Return human-readable (indented) XML from file

Code:

xmllint --format source_file.xml

Motivation: This use case is useful when we want to format an XML file for better readability.

Explanation:

  • --format reindents the output so that each nested element starts on its own indented line. The result is written to standard output; the file itself is not modified.
  • source_file.xml is the path to the XML file.

Example output:

<?xml version="1.0"?>
<root>
  <element>
    ...
  </element>
  ...
</root>
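As a self-contained sketch (hypothetical sample; assumes xmllint on the PATH), the commands below pretty-print a one-line document and show two ways to save the result: reading standard input via "-" with shell redirection, and the --output option. Neither changes the input file.

```shell
dir=$(mktemp -d)
# A compact, single-line document (hypothetical sample).
printf '<root><element><child>x</child></element></root>\n' > "$dir/source_file.xml"

# Pretty-print to standard output.
xmllint --format "$dir/source_file.xml"

# Read from standard input ("-") and redirect the result to a file...
xmllint --format - < "$dir/source_file.xml" > "$dir/pretty.xml"
# ...or let xmllint write the file itself with --output.
xmllint --format --output "$dir/pretty2.xml" "$dir/source_file.xml"
```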

Use case 5: Check that an XML file meets the requirements of its DOCTYPE declaration

Code:

xmllint --valid source_file.xml

Motivation: This use case checks whether an XML file is not only well-formed but also valid according to the DTD referenced or embedded in its DOCTYPE declaration.

Explanation:

  • --valid tells xmllint to validate the XML file against its DOCTYPE declaration.
  • source_file.xml is the path to the XML file.

Example output:

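On success, xmllint prints no validation message: it echoes the parsed document to standard output (add --noout to suppress it) and exits with status 0. If the file violates its DTD, xmllint reports each validity error on standard error and exits with a nonzero status.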
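In scripts, the exit status is usually more convenient than the output. The minimal sketch below assumes xmllint is on the PATH; the note vocabulary is a hypothetical example. It embeds a DTD in the DOCTYPE declaration, derives an invalid copy that drops the required to element, and records each exit status with --noout set.

```shell
dir=$(mktemp -d)
cat > "$dir/good.xml" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ann</to><body>Hi</body></note>
EOF
# Same DTD, but <to> is removed, violating the (to, body) content model.
sed 's|<to>Ann</to>||' "$dir/good.xml" > "$dir/bad.xml"

# --noout suppresses the echoed document; "|| var=$?" keeps the
# failing exit status without aborting scripts that use set -e.
good_status=0
xmllint --noout --valid "$dir/good.xml" || good_status=$?
bad_status=0
xmllint --noout --valid "$dir/bad.xml" || bad_status=$?

echo "good.xml: $good_status, bad.xml: $bad_status"
```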

Use case 6: Validate XML against DTD schema hosted online

Code:

xmllint --dtdvalid URL source_file.xml

Motivation: This use case is useful when the document has no DOCTYPE declaration, or when we want to validate it against a different DTD, such as one published online.

Explanation:

  • --dtdvalid URL tells xmllint to validate the file against the DTD at URL rather than against the document's own DOCTYPE declaration.
  • URL is the location of the DTD; a local file path works as well.
  • source_file.xml is the path to the XML file.

Example output:

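As with --valid, success produces no validation message: the document is echoed to standard output (suppress it with --noout) and the exit status is 0. On failure, xmllint reports the validity errors and exits with a nonzero status.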
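Because --dtdvalid takes the DTD location on the command line, it also works for documents with no DOCTYPE at all. The sketch below uses a local DTD file so that it runs offline; the file names and the note vocabulary are hypothetical, and it assumes xmllint is on the PATH. Substituting a URL for the path gives the online form of this use case.

```shell
dir=$(mktemp -d)
# Standalone DTD (hypothetical); --dtdvalid also accepts a URL here.
cat > "$dir/note.dtd" <<'EOF'
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
EOF
# Neither document declares a DOCTYPE; the DTD comes from the command line.
printf '<note><to>Ann</to><body>Hi</body></note>\n' > "$dir/source_file.xml"
printf '<note><body>Hi</body></note>\n' > "$dir/invalid.xml"

ok_status=0
xmllint --noout --dtdvalid "$dir/note.dtd" "$dir/source_file.xml" || ok_status=$?
bad_status=0
xmllint --noout --dtdvalid "$dir/note.dtd" "$dir/invalid.xml" || bad_status=$?

echo "source_file.xml: $ok_status, invalid.xml: $bad_status"
```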

Conclusion:

The xmllint command is a versatile tool for parsing, querying, formatting, and validating XML files. Its --xpath option extracts nodes, string values, and attribute values; --format pretty-prints documents; and --valid and --dtdvalid check documents against a DTD. With the use cases illustrated in this article, users can handle many everyday XML processing tasks directly from the command line.
