Jesper Tverskov, November 4, 2005

Transform XHTML to XHTML with XSLT

Considering how useful it could be to transform the XHTML-based web to another format or to use XHTML as an XML data store, it is surprisingly tricky to transform XHTML. Most XSLT developers need to be told the secrets of XHTML transformation before they can do it.

In this article I will show you the two secrets of transforming XHTML. In the example we transform XHTML to XHTML using the identity template and two additional templates that add new elements to the head and body sections.

1. Default namespace is the problem

An XHTML document, XHTML 1.1 in our example, well-formed and valid, has at the top of the document a DOCTYPE declaration, and the outermost element, <html>, has a default namespace declaration, i.e. a namespace declaration without a prefix. It could look like this:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<!-- rest of html goes here -->
</html>

The default namespace, xmlns="http://www.w3.org/1999/xhtml", is the big troublemaker when trying to transform an XHTML input document. In XPath 1.0 an unprefixed name in a match pattern or select expression always means a name in no namespace, so match="head" never matches the head element of an XHTML document. In other words, there is no easy way to get to the nodes in a namespace if the input document declares that namespace as default, that is, without a prefix. [1]
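
The trap can be sketched with the template most of us write first. This is an illustration for XSLT 1.0, not part of the stylesheet developed later in this article, and the comment it tries to add is made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- the identity template copies everything node for node -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- Meant to add a comment to the head section, but it never fires for
     XHTML input: the unprefixed name "head" means "head in no namespace",
     while the head element of the input document is in the namespace
     http://www.w3.org/1999/xhtml. -->
<xsl:template match="head">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:comment>this comment never reaches the output</xsl:comment>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Run against an XHTML document, the head element is simply copied unchanged by the identity template, and the processor reports no error. That silence is what makes the problem so hard to spot.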

2. First secret: use prefix

The first secret of the solution is to leave the input document as it is with the default namespace declaration (without a prefix), but to declare the same namespace in the XSLT stylesheet with a prefix, e.g. "xhtml":

xmlns:xhtml="http://www.w3.org/1999/xhtml"

3. Second secret: use both

The second secret of the solution is to use the "xhtml" prefix only when getting to the nodes of the XHTML input document, and to use the prefix-less default namespace for the XHTML markup you add or change during the transformation.

In order to use both the prefixed "xmlns:xhtml" namespace and the default namespace, "xmlns", both must be declared in the XSLT stylesheet’s top-element, and they must point to the same URI, "http://www.w3.org/1999/xhtml":

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- etc. -->

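If you only use one of the two declarations, things go wrong in one of two ways. With only the default declaration you will not get to the nodes in the XHTML input document. With only the prefixed declaration, the elements you add are in no namespace, and you will get a lot of nasty xmlns="" in the XHTML output document.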
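
The second failure can be reproduced with a small sketch. It assumes XSLT 1.0 and an XHTML input document with a head element. The stylesheet declares the XHTML namespace with a prefix only, so the template matches, but the element it adds ends up in no namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xhtml">

<!-- the identity template copies everything node for node -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- The prefix lets us match the head element of the input document. -->
<xsl:template match="xhtml:head">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <!-- No default namespace is declared in this stylesheet, so this
         literal result element is in no namespace at all. -->
    <link rel="stylesheet" href="xhtml_test.css" type="text/css"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because the copied head element is in the XHTML namespace and the added link element is not, the serializer has to undeclare the default namespace on it. The output contains something along the lines of <link xmlns="" rel="stylesheet" href="xhtml_test.css" type="text/css"/>, which is not XHTML.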

4. Code example: XHTML to XHTML

Below I show you a simple XHTML input document and an XSLT stylesheet to transform it to a new XHTML document. In this example we want to keep everything in the input document but we also want to add a CSS stylesheet to the head section and a navigation menu to the top of the body section.

We will use the identity template to copy everything from the input document to the new document node for node, and we will use two additional templates that match the head section and the body section in order to override the copying of the identity template for those two elements.

The templates for "head" and "body" use xsl:apply-templates to hand their attributes and child nodes back to the identity template, which restarts the node for node copying at the right time in the right context.

5. The XHTML input document

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> [2]
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Test document</title>
  </head>
  <body>
    <h1>Test document</h1>
    <p>This paragraph indicates the rest of the document.</p>
  </body>
</html>

6. The XSLT stylesheet

<?xml version="1.0"?>

<xsl:stylesheet version="2.0" [3]
  xmlns:xhtml="http://www.w3.org/1999/xhtml" [4]
  xmlns="http://www.w3.org/1999/xhtml" [5]
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xhtml xsl xs"> [6]

<xsl:output method="xml" version="1.0" encoding="UTF-8" doctype-public="-//W3C//DTD XHTML 1.1//EN" doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" indent="yes"/> [7]

<!-- the identity template -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- template for the head section. Only needed if we want to change, delete or add nodes. In our case we need it to add a link element pointing to an external CSS stylesheet. -->

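<xsl:template match="xhtml:head">
  <xsl:copy>
    <!-- copy attributes first: they cannot be added once a child element exists -->
    <xsl:apply-templates select="@*"/>
    <link rel="StyleSheet" href="xhtml_test.css" type="text/css"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>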

<!-- template for the body section. Only needed if we want to change, delete or add nodes. In our case we need it to add a div element containing a menu of navigation. -->

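<xsl:template match="xhtml:body">
  <xsl:copy>
    <!-- copy attributes first: they cannot be added once a child element exists -->
    <xsl:apply-templates select="@*"/>
    <div class="menu">
      <p><a href="home">Homepage</a> &gt; <strong>Test document</strong></p>
    </div>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>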
</xsl:stylesheet>

7. The XHTML output document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <link rel="StyleSheet" href="xhtml_test.css" type="text/css" />
    <title>Test document</title>
  </head>
  <body>
    <div class="menu">
      <p><a href="home">Homepage</a> &gt; <strong>Test document</strong></p>
    </div>
    <h1>Test document</h1>
    <p>This paragraph indicates the rest of the document.</p>
  </body>
</html>

8. It is easy in XSLT 2.0

Everything said so far is true for both XSLT 1.0 and XSLT 2.0. But XSLT 2.0 offers an even better method for the second namespace declaration: a new attribute, "xpath-default-namespace", made to solve the problem of input XML markup being in a default namespace.

When transforming XHTML to XHTML we still need to declare the XHTML namespace twice in the XSLT stylesheet. The first time without a prefix for the output. The second time we use the new "xpath-default-namespace" attribute to make it possible to get to the XHTML input.

xpath-default-namespace="http://www.w3.org/1999/xhtml"

By using the "xpath-default-namespace" attribute we can get to XHTML in the input document without the need for a prefix, making our XPath expressions easier to write and read.
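
A minimal sketch of how the stylesheet from section 6 could look with the new attribute, assuming an XSLT 2.0 processor (an XSLT 1.0 processor will not honour it). The templates add the same link and menu as before; only the namespace declarations and the match patterns change.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.w3.org/1999/xhtml">

<xsl:output method="xml" version="1.0" encoding="UTF-8"
  doctype-public="-//W3C//DTD XHTML 1.1//EN"
  doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
  indent="yes"/>

<!-- the identity template -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- "head" now means "head in the XHTML namespace": no prefix needed -->
<xsl:template match="head">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <link rel="StyleSheet" href="xhtml_test.css" type="text/css"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

<!-- the same goes for "body" -->
<xsl:template match="body">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <div class="menu">
      <p><a href="home">Homepage</a> &gt; <strong>Test document</strong></p>
    </div>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Since no prefix is declared for the XHTML namespace, there is nothing to list in exclude-result-prefixes either. The attribute only affects unprefixed element names in XPath expressions and patterns; the literal result elements still get their namespace from the ordinary xmlns declaration.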

9. Conclusion

In order to get to the nodes in a default namespace declared in the XHTML input document, we must declare the default namespace twice in the XSLT stylesheet. In XSLT 1.0 we must declare it with a prefix in order to get to it, and we must declare it as default (without a prefix) in order to avoid xmlns="" in the XHTML output document.

Both namespace declarations must point to the same URI.

This trickery is not only true for XHTML transformed to XHTML. It is generic and a must-know, in the sense that we also need it when transforming XHTML to some other XML application or when transforming any XML markup in a default namespace.

In XSLT 2.0 we still need to declare the default namespace twice, but the second time we can use the "xpath-default-namespace" attribute, eliminating the need for a prefix.
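
As a sketch of the generic case, the stylesheet below turns XHTML into another XML vocabulary. The target format, an outline element with one entry per h1, is invented for this example. The output here is in no namespace, so only the prefixed declaration is needed to get to the input nodes; a default namespace declaration would be added only if the target vocabulary had a namespace of its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xhtml">

<xsl:output method="xml" encoding="UTF-8" indent="yes"/>

<!-- Build the hypothetical outline format from the headings of the
     XHTML input. The xhtml prefix is what lets us reach the h1 elements. -->
<xsl:template match="/">
  <outline>
    <xsl:for-each select="//xhtml:h1">
      <entry><xsl:value-of select="."/></entry>
    </xsl:for-each>
  </outline>
</xsl:template>

</xsl:stylesheet>
```

Without exclude-result-prefixes the outline element would carry an unused xmlns:xhtml declaration in the output.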

Footnotes

[1]

The identity template works because it uses "@*|node()", but as soon as the XSLT stylesheet author wants to get to some specific node in order to override the identity template, there is no direct way of getting to prefix-less nodes in a default namespace.

[2]

In the code examples I use XHTML 1.1 but could just as well have used XHTML 1.0 Strict. The former should be served with the MIME type application/xhtml+xml; for the latter we are also allowed to use text/html.

[3]

You can use version 1.0 if you like.

[4]

Here we have added a prefix to the default namespace used in the input XHTML document. We need it in order to get to the nodes.

[5]

Here we have the same default namespace without a prefix (defaults never have) as in the input XHTML document. We need it to avoid xmlns="" in the output document.

[6]

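If we don't do this, the "xhtml" and "xs" namespace declarations show up in the output document. (The XSLT namespace itself is never copied to the output, so listing "xsl" is harmless but not required.) Our XHTML document is not valid with namespace declarations that are not mentioned in the DTD.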

[7]

In XSLT 1.0 we could only choose between "xml", "html" and "text" as the output method. Since XHTML is an XML application, we just use "xml" when outputting XHTML. In XSLT 2.0 we can also choose "xhtml". This option is for XHTML served as text/html to very old browsers in need of, e.g., a <br /> with a space before the slash instead of a <br/>.

The XHTML output option could have been useful in XSLT 1.0 but there are no legitimate old browsers not supporting XHTML in a proper way around anymore. I used the XML option for XHTML in XSLT 1.0 and have never encountered any problems. Now, 5 years later, it is absolutely safe always to use the "XML" option for XHTML.

Updated 2009-08-06