Jesper Tverskov, July 15, 2011

Disable-output-escaping and xsl:character-map

DOE is short for "disable-output-escaping". DOE is the name of an attribute we can use in xsl:value-of and in xsl:text. XSLT processors are not required to support it, and in XSLT 2.0 DOE is deprecated. We should use xsl:character-map instead. "Character-map" is a general method for replacing a character with a string when output is serialized.

1. DOE is necessary to make an HTML5 doctype in XSLT 1.0

Normally a DOCTYPE declaration is made with the "doctype-public" and "doctype-system" attributes of the xsl:output element. But the short HTML5 doctype has neither a public nor a system identifier, so in XSLT 1.0 it can only be created as text with xsl:value-of or xsl:text and the "disable-output-escaping" attribute, as in the following example:

  <xsl:value-of select="'&lt;!DOCTYPE html>'" disable-output-escaping="yes"/>
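
To show where the instruction sits, here is a minimal sketch of a complete XSLT 1.0 stylesheet. The page content is a placeholder, and the sketch assumes a processor that supports DOE, which is optional.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html" encoding="UTF-8"/>
   <xsl:template match="/">
      <!-- Written as raw text, first thing in the result tree -->
      <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#10;</xsl:text>
      <html>
         <head>
            <title>Placeholder title</title>
         </head>
         <body>
            <p>Placeholder content</p>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>
```

Where DOE is not available, one alternative is doctype-system="about:legacy-compat" on xsl:output. It produces the longer doctype that HTML5 permits for generators unable to emit the short form.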

2. HTML5 doctype made with xsl:character-map

In XSLT 2.0 we can use xsl:character-map instead of DOE:

  <xsl:character-map name="a">
     <xsl:output-character character="&lt;" string="&lt;"/>
     <xsl:output-character character="&gt;" string="&gt;"/>
  </xsl:character-map>
  <xsl:output method="xhtml" use-character-maps="a"/>
  <!-- and inside some template: -->
  <xsl:value-of select="'&lt;!DOCTYPE html>'"/>

Note that a character-map has a name. We can make many character-maps with different names and refer to them from xsl:output and xsl:result-document. Making an HTML5 doctype with a character-map takes about six times as much code as with "disable-output-escaping".
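
A minimal sketch of several named character-maps. The map names "nbsp", "copyright" and "both", the file name "extra.xml" and the sample content are made up for illustration. It assumes an XSLT 2.0 processor that supports secondary result documents.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- First map: write the no-break space as an entity reference -->
   <xsl:character-map name="nbsp">
      <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
   </xsl:character-map>
   <!-- Second map: replace the copyright sign with plain text -->
   <xsl:character-map name="copyright">
      <xsl:output-character character="&#169;" string="(c)"/>
   </xsl:character-map>
   <!-- A map can include other maps by name -->
   <xsl:character-map name="both" use-character-maps="nbsp copyright"/>
   <!-- The principal output uses only the first map -->
   <xsl:output method="xml" use-character-maps="nbsp"/>
   <xsl:template match="/">
      <doc>a&#160;b &#169; 2011</doc>
      <!-- The secondary document (hypothetical file name) uses both maps -->
      <xsl:result-document href="extra.xml" use-character-maps="both">
         <doc>a&#160;b &#169; 2011</doc>
      </xsl:result-document>
   </xsl:template>
</xsl:stylesheet>
```

The "both" map has no mappings of its own and only combines the other two. This keeps each mapping defined in one place.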

3. HTML5 doctype made with xsl:character-map (better)

The character-map in the previous example is often risky, because the output document may contain other instances of &lt; and &gt; that must stay escaped. The trick is to use other characters as stand-ins for &lt; and &gt;. It is best practice to use characters from the Unicode Private Use Area (range E000–F8FF), which have no assigned meaning, such as &#xE801;, &#xE802; and &#xE803;.

  <xsl:character-map name="a">
     <xsl:output-character character="&#xE801;" string="&lt;"/>
     <xsl:output-character character="&#xE802;" string="&gt;"/>
  </xsl:character-map>
  <xsl:output method="xhtml" use-character-maps="a"/>
  <!-- and inside some template: -->
  <xsl:value-of select="'&#xE801;!DOCTYPE html&#xE802;'"/>

4. DOE can be necessary to make a CDATA section

CDATA sections can be written directly in an XSLT stylesheet, but since the purpose of a CDATA section is "to escape blocks of text containing characters which would otherwise be recognized as markup", the block of text simply ends up in output with markup escaped! The CDATA section is gone! What do we do if we want the CDATA section to make it to output as a CDATA section?

The xsl:output element and the xsl:result-document element have an attribute that can be used to create CDATA sections, "cdata-section-elements". The value is a space-separated list of the names of elements whose text content should be wrapped in CDATA sections.
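
A minimal sketch of "cdata-section-elements" in use. The element names "script", "style" and "p" and the sample content are only illustrations.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- Text inside every script and style element is wrapped in CDATA -->
   <xsl:output method="xml" cdata-section-elements="script style"/>
   <xsl:template match="/">
      <html>
         <script>if (a &lt; b &amp;&amp; b &lt; c) { run(); }</script>
         <!-- p is not listed, so its text is escaped as usual -->
         <p>1 &lt; 2</p>
      </html>
   </xsl:template>
</xsl:stylesheet>
```

The text of the "script" element should be serialized inside &lt;![CDATA[ ... ]]&gt; with "&lt;" and "&amp;" written literally, while the "p" element keeps its "&lt;" escaped.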

Most often the "cdata-section-elements" attribute is enough, but it can be too crude if we have several output elements with the same name and not all of them should have a CDATA section. In that situation DOE is also a must in XSLT 1.0.

  <xsl:value-of select="concat('&lt;![CDATA[', $x, ']]&gt;')" disable-output-escaping="yes"/>

The $x variable contains a block of text with markup that should be regarded as text by the XML processor. In XSLT 2.0 we should use "character-map" instead.

  <xsl:character-map name="a">
     <xsl:output-character character="&#xE801;" string="&lt;"/>
     <xsl:output-character character="&#xE802;" string="&gt;"/>
  </xsl:character-map>
  <xsl:output method="xhtml" use-character-maps="a"/>
  <!-- and inside some template: -->
  <xsl:value-of select="concat('&#xE801;![CDATA[', $x, ']]&#xE802;')"/>

Once again we use characters from the Unicode Private Use Area as stand-ins for "&lt;" and "&gt;" in order to avoid conflicts. We don't want to disable escaping of all instances of "&lt;" and "&gt;", only of those that make up the CDATA section. Note one difference from DOE: the character-map only affects the mapped characters, so any "&lt;" or "&amp;" inside $x is still escaped by the serializer.

5. When DOE is very bad because it is not necessary

DOE is sometimes used to hack an element together when a programmer cannot figure out how to do it right. The following input file is a little difficult for a beginner in XSLT. The "item" elements are items of a list, but there is no explicit list container.

  <doc>
     <para>Some text aaa.</para>
     <item list="1">bread</item>
     <item list="1">milk</item>
     <item list="1">butter</item>
     <para>Some text bbb.</para>
     <item list="2">paper</item>
     <item list="2">ink</item>
     <item list="2">book</item>
     <para>Some text ccc.</para>
  </doc>

A list without a container, consisting only of items, is not as rare as one might think. The above example is actually a simplified version of how lists are made in MS Word's WordprocessingML. It is quite a challenge to transform to, say, XHTML, at least the first time.

Beginners in XSLT often end up with the following DOE hack (we only show the "bad" template):

  <xsl:template match="item">
     <xsl:choose>
        <xsl:when test="local-name(preceding::*[1]) != 'item'">
           <xsl:value-of select="'&lt;ol&gt;'" disable-output-escaping="yes"/>
           <li>
              <xsl:value-of select="."/>
           </li>
        </xsl:when>
        <xsl:when test="local-name(following::*[1]) != 'item'">
           <li>
              <xsl:value-of select="."/>
           </li>
           <xsl:value-of select="'&lt;/ol&gt;'" disable-output-escaping="yes"/>
        </xsl:when>
        <xsl:otherwise>
           <li>
              <xsl:value-of select="."/>
           </li>
        </xsl:otherwise>
     </xsl:choose>
  </xsl:template>

Above we match all item elements and make an "xsl:choose" switch. If the item element is the first in a list we create the "ol" list container's start tag (!) and the first "li" element. If the item element is the last in a list, we create the closing "</ol>" container tag and the last "li" element. For all other item elements in the list, we just create the "li" element.

Since the stylesheet would not be well-formed if we wrote the start tag and the end tag of the "ol" element inside different "xsl:when" elements, the hack has to write them as text with DOE. But we can do without DOE if we know how to create the "ol" element the correct way:

  <xsl:template match="item[not(local-name(preceding::*[1]) = 'item')]">
     <xsl:variable name="list" select="@list"/>
     <ol>
        <xsl:for-each select="../item[@list = $list]">
           <li>
              <xsl:value-of select="."/>
           </li>
        </xsl:for-each>
     </ol>
  </xsl:template>
  <xsl:template match="item[local-name(preceding::*[1]) = 'item']"/>

In the much better solution above, we need two templates. One that matches all item elements being the first in a list, and another one matching the rest of the item elements. Since all item elements are handled by the xsl:for-each inside the first template, the second template should do nothing: all item elements have already been dealt with.

If we leave out the second template, the XSLT processor's built-in templates take over, and the text nodes of the item elements not matched by the first template are copied to output a second time, outside the list.
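
In XSLT 2.0 the same grouping can also be done with xsl:for-each-group. This is a different technique from the two-template solution above, shown here only as a sketch. It assumes that only "item" elements carry a "list" attribute, and the "body" and "p" output elements are illustrative choices.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>
   <xsl:template match="doc">
      <body>
         <!-- Adjacent children with the same list value form one group;
              elements without a list attribute get the empty string as key -->
         <xsl:for-each-group select="*" group-adjacent="string(@list)">
            <xsl:choose>
               <!-- The context item is the first member of the current group -->
               <xsl:when test="self::item">
                  <ol>
                     <xsl:for-each select="current-group()">
                        <li>
                           <xsl:value-of select="."/>
                        </li>
                     </xsl:for-each>
                  </ol>
               </xsl:when>
               <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
               </xsl:otherwise>
            </xsl:choose>
         </xsl:for-each-group>
      </body>
   </xsl:template>
   <xsl:template match="para">
      <p>
         <xsl:value-of select="."/>
      </p>
   </xsl:template>
</xsl:stylesheet>
```

Because each "ol" element is created as a whole inside one branch, the stylesheet stays well-formed and DOE is not needed.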

6. A general xsl:character-map example

In the following small XML file of one line, we would like to change the Danish "ø" to the international "o" in output. Also we would like to change &#169; to "Copyright ©" in output.

6.1 Input file

  <test>&#169; 1850 by Søren Kierkegaard</test>

6.2 XSLT stylesheet

  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:character-map name="a">
        <xsl:output-character character="ø" string="o"/>
        <xsl:output-character character="&#169;" string="Copyright &#169;"/>
     </xsl:character-map>
     <xsl:output use-character-maps="a"/>
     <xsl:template match="/">
        <test>
           <xsl:value-of select="test"/>
        </test>
     </xsl:template>
  </xsl:stylesheet>

6.3 Output file

  <test>Copyright © 1850 by Soren Kierkegaard</test>

7. To DOE or not to DOE

DOE is necessary for making an HTML5 doctype in XSLT 1.0 and can be the only option for making a CDATA section in XSLT 1.0. The only other case where DOE is necessary in XSLT 1.0 could be when we for some reason want to output HTML that is not well-formed. [1]

DOE is deprecated in XSLT 2.0, and even in XSLT 1.0 it is only an optional feature of an XSLT processor. We cannot create an HTML5 doctype in client-side XSLT 1.0 in a browser like Firefox (it works in IE, Opera, Chrome and Safari)! Even where it is implemented, we can only use DOE in xsl:value-of and xsl:text when they write directly to the result tree. If they are used for a temporary tree, as inside a variable, the XSLT processor must ignore DOE or throw an error.
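
A minimal sketch of the safe pattern, assuming a processor that supports DOE. The variable name "doctype" and the page content are made up for illustration. The variable holds a plain string, and DOE is applied only on the instruction that writes to the result tree.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>
   <!-- A plain string: no DOE here, because a variable is not the result tree -->
   <xsl:variable name="doctype" select="'&lt;!DOCTYPE html&gt;'"/>
   <xsl:template match="/">
      <!-- DOE belongs on the instruction that writes to the result tree -->
      <xsl:value-of select="$doctype" disable-output-escaping="yes"/>
      <html>
         <body>
            <p>Placeholder content</p>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>
```

Putting disable-output-escaping="yes" on an xsl:text inside the variable body instead would be the pattern the paragraph above warns against.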

For an HTML5 doctype and a CDATA section, DOE is much easier to use than a "character-map". For that reason DOE will probably survive for a while even in XSLT 2.0. All the hard facts about DOE are defined in the XSLT 1.0 spec: http://www.w3.org/TR/xslt#disable-output-escaping.

Footnotes

[1]

In XSLT we can only create HTML that is well-formed, but a few things end up not well-formed when we use the serialization method="html". For example, when we make single-tag HTML elements like "<br/>" in the stylesheet, we must terminate them with "/", but the "/" is removed during the serialization step. If we for some reason want other things to be not well-formed in output, we could achieve it with DOE.
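
A minimal sketch of the "br" case, with made-up sample content:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- The html method writes empty HTML elements without an end tag or "/" -->
   <xsl:output method="html"/>
   <xsl:template match="/">
      <!-- In the stylesheet the element must be well-formed: <br/> -->
      <p>line one<br/>line two</p>
   </xsl:template>
</xsl:stylesheet>
```

With method="html" the paragraph should be serialized with a bare "<br>" between the two lines, which is valid HTML but not well-formed XML.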

Updated: 2011-08-09