Jesper Tverskov, June 12, 2009

Google's Writely and XSLT for web pages

This article is written with Google's Writely and can at any time be edited by me at Google Docs. This XHTML webpage, on the other hand, is at my own website. It's a transformation of the page at Google Docs using XSLT. A script, called from my document at Google Docs, takes care of publication at my website.

1. Web based wordprocessing

When I write articles, reports and tutorials for publication at my website, www.xmlplease.com, I use headings, paragraphs, lists, images, links, tables, drawings, footnotes and a few other things, over and over again. For writing of that kind, almost any WYSIWYG wordprocessor will do.

When I write, I don't want to be concerned about markup and webdesign. I want to write, and for writing nothing beats a dedicated wordprocessor like MS Word. When writing, I don't want to work in the source code. I want to focus on writing and add the necessary formatting in the simplest possible way that gets the job done.

I prefer to use a browser-based online wordprocessor like Google's Writely. I also prefer to save my documents to some online harddisk I don't need to care about. I want something like Google Docs. I like the idea that I don't need to install, maintain and update the wordprocessor, and that I have easy access to the wordprocessor and my documents from any online computer. [1]

2. Own website and Google Docs

For web publishing our needs vary. Some can do with Facebook and Twitter, others can do with a blog or another small website CMS (Content Management System). The use case I'm interested in, and for which Google Docs could provide half the solution, is longer articles, tutorials and reports like this very article.

My www.xmlplease.com website is where I promote my training courses in XML technologies like XML Schema and XSLT. That is, I can't do with a blog for my writing. I need to publish at my own website to build a collection of articles, tutorials and reports that helps make my small XML training business credible.

3. Google's Writely

Writely saves into one of the worst code-soup versions of HTML ever seen. We can use styles for heading levels and normal paragraphs, e.g. we can make an "h1" heading, replacing a "DIV" element with an "H1" element. The "normal paragraph" style replaces a "DIV" element with a "P" element, but the result looks exactly like a "DIV" element, so the user has no incentive to apply the style consistently.

Using Writely, we end up with a mixture of cases for element names. Some single-tag elements are terminated with a forward slash but most are not, and some elements have a closing end-tag while others do not. Attributes are sometimes quoted but just as often not, and we have things like <BR> and <br/> in the same document. [2]

Apart from the lousy markup, the most obvious problems with Google's Writely are:

Any programming language with string handling like Regular Expressions can transform Writely's junk markup to XHTML. One of the best programming languages for such a task is XSLT 2.0. Here we only need the cumbersome Regular Expressions to make the input document well-formed (XML). For the finer details we can use XSLT templates transforming unspecified XML to XHTML.
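As an illustration of the Regular Expression approach, here is a minimal sketch. It is written in Python rather than XSLT 2.0, the function and variable names are made up for the example, and it is not the stylesheet used for this page. It only lowercases element and attribute names, quotes bare attribute values and closes empty elements such as <BR>. It does not supply missing end-tags, handle attributes without a value, or cope with ">" inside attribute values, which is where a real HTML parser becomes necessary.

```python
import re

# Elements without content; a partial list, enough for the illustration.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

# A tag: optional "/", a name, the raw attribute text, an optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>")
# One attribute: name = value, where the value may be quoted or bare.
ATTR = re.compile(r"""([A-Za-z_:][-\w:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attribute(match):
    """Lowercase the attribute name and quote the value if it is bare."""
    name, value = match.group(1).lower(), match.group(2)
    if value[0] not in "\"'":
        value = '"' + value + '"'
    return name + "=" + value


def _normalize_tag(match):
    """Rewrite a single start-tag or end-tag in XHTML style."""
    closing, name, attrs, slash = match.groups()
    name = name.lower()
    if closing:
        return "</" + name + ">"
    attrs = ATTR.sub(_quote_attribute, attrs).strip()
    head = "<" + name + (" " + attrs if attrs else "")
    # Empty elements, and anything already self-closed, get " />".
    return head + (" />" if slash or name in VOID_ELEMENTS else ">")


def normalize_tags(html):
    """Apply the tag rewrite to every tag found in the text."""
    return TAG.sub(_normalize_tag, html)


print(normalize_tags('<P CLASS=intro>Hi<BR>there<br/></P>'))
```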

4. Publishing at Google Docs

Google provides three easy ways to publish documents written with Writely at Google Docs:

  1. As a shared document at Google Docs.
  2. As an entry to your Google blog, if you have one.
  3. As a standalone web page with a URL provided by Google.

Except for the URL to the document, none of the above ways of using Google Docs interest me. I only want to use the URL as input for a transformation to a more useful XHTML web page at my own website. But how can I publish the Google document as XHTML at my website? [3]

I have made a script, called from a webpage at my website, to do the job. When I finish a new document at Google Docs, I publish it as a standalone webpage to get the URL. I then use the URL as an ID and put it into a URL to my webpage that can do the transformation. I place this URL in a small "config" table at the top of my Google Docs document.

When I have edited the document at Google Docs, I click the link to my website. The link activates the script at my website. The transformation takes place and the new XHTML page is created at the right location at my website. Whenever I click the link at the top of the Google Docs document my XHTML webpage is updated. When I add an extra parameter, entrance pages at my website are also updated with a link to the new resource or I can delete it all again.
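The link mechanism can be sketched as follows. This is a minimal Python illustration under stated assumptions: the endpoint address, the parameter names "doc" and "action", and the example document URL are all hypothetical, since the article does not give the real ones.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address of the publishing script; the real one is not given.
PUBLISH_ENDPOINT = "https://www.example.com/publish"


def build_publish_link(doc_url, action=None):
    """Build the link placed in the "config" table of the document.

    doc_url is the URL of the published Google Docs page, used as an ID.
    action is the optional extra parameter (name and values are assumptions).
    """
    params = {"doc": doc_url}
    if action is not None:
        params["action"] = action
    return PUBLISH_ENDPOINT + "?" + urlencode(params)


def read_publish_request(link):
    """What the script at the website would extract from the clicked link."""
    query = parse_qs(urlparse(link).query)
    doc_url = query["doc"][0]
    action = query.get("action", [None])[0]
    return doc_url, action


link = build_publish_link("https://docs.example.com/View?docid=abc123", "update-index")
print(link)
print(read_publish_request(link))
```

Because the document URL is percent-encoded before it is embedded, its own query string survives the round trip intact.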

5. Transforming HTML codesoup

It is a major task to make an XSLT stylesheet that can transform any Google Docs document to XHTML. I thought that a handful of Regular Expressions and a few additional XSLT templates would be enough, but I soon realized that Writely's documents are not only dirty but can be extremely nasty. I ended up using David Carlisle's htmlparse.xsl stylesheet, "Step 2", to make Writely's documents well-formed (XML). See David's blog for more about htmlparse.xsl and for other libraries, like Tidy, that can help us make HTML codesoup well-formed. [4]

So here is what I do. I have made a pipeline of four transformations. A server-side webpage script glues it all together:

Step  Purpose                                       XSLT
1     Pre-cleaning, mostly white space problems.    w2x-cleaning.xsl
2     Making input well-formed.                     w2x-htmlparse.xsl
3     Making well-formed into XHTML data.           w2x-xhtml.xsl
4     Making XHTML data into XHTML webpage.         w2x-webpage.xsl
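How a script might glue the four steps together can be sketched like this. It is a minimal Python illustration, not the actual server-side script: only the stylesheet names come from the table above, while the function names and the idea of passing in the XSLT call as a parameter are assumptions made for the example.

```python
# Stylesheet names come from the table above; the runner itself is hypothetical.
PIPELINE = [
    "w2x-cleaning.xsl",   # step 1: pre-cleaning
    "w2x-htmlparse.xsl",  # step 2: make the input well-formed
    "w2x-xhtml.xsl",      # step 3: well-formed XML to XHTML data
    "w2x-webpage.xsl",    # step 4: XHTML data to XHTML webpage
]


def run_pipeline(document, apply_stylesheet, stylesheets=PIPELINE):
    """Run the steps in order, feeding each result into the next step.

    apply_stylesheet(stylesheet_name, text) -> text is supplied by the caller,
    for example a wrapper around whatever XSLT 2.0 processor is installed.
    """
    for stylesheet in stylesheets:
        document = apply_stylesheet(stylesheet, document)
    return document
```

Keeping the processor call outside the runner means the order of the steps can be checked without an XSLT processor at hand, and a step can be added or dropped by editing the list.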

It is not my job this time around to make the ultimate writely2xhtml XSLT stylesheet. I'm satisfied if my solution can handle how I use Writely. I have not tested for all sorts of weirdness some user of Writely might come up with, especially in "Edit HTML" mode. I will improve the stylesheet as problems arise.

6. Some writely2xhtml problems

This article is only to demonstrate proof of concept, and to explore some of the most relevant issues involved. Let me mention a few of the problems of transformation:

6.1 Tables

Writely's table tool cannot join cells. A very experienced user can go to Writely's "Edit HTML" mode, edit the table and insert "colspan" etc.

In Writely you can align a table and even use float (I haven't managed to make float work). Aligning a table is useful but a little difficult in XHTML: a table is a block-level element, which a "p" element cannot contain. But a table can be set to "display:inline-table" inside a div element having e.g. "text-align:center".

To make it possible to use table headers, I have come up with the following solution: In Writely place the table inside the second row of another table of one column and two rows. In the first row use the word "xmlplease:th". [5]
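The "xmlplease:th" convention could be handled along these lines. This is a minimal Python sketch working on input that is already well-formed, not the XSLT template used for this page. The function names are invented for the example, and it assumes that the first row of the inner table is the one that should become header cells.

```python
import xml.etree.ElementTree as ET


def child_rows(table):
    """Rows that belong to this table itself, with or without a <tbody>."""
    return table.findall("tr") + table.findall("tbody/tr")


def promote_header_row(outer):
    """Return the inner table with a <th> first row, or None for ordinary tables."""
    rows = child_rows(outer)
    if len(rows) != 2:
        return None
    marker_cell = rows[0].find("td")
    if marker_cell is None:
        return None
    if "".join(marker_cell.itertext()).strip() != "xmlplease:th":
        return None
    inner = rows[1].find("td/table")
    if inner is None:
        return None
    inner_rows = child_rows(inner)
    if inner_rows:
        # Assumption: the first row of the inner table holds the headers.
        for cell in inner_rows[0].findall("td"):
            cell.tag = "th"
    return inner
```

The outer table is thrown away and only the inner table, now with header cells, goes into the output.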

6.2 Indent

Writely has an indent tool but the markup is blockquote elements inside blockquote elements of all horrors! In a traditional indent tool only left margin is indented (in left-to-right languages). Advanced word-processors will also have options for both margins, hanging indent, etc.

Since indents can be useful, e.g. for long quotations if blockquote alone doesn't give enough indentation, a table of one column and two rows can be used. The first row must only contain the word xmlplease:indent-1 or xmlplease:indent-2, etc. What is inside the second row of the table will be indented 1 to x em inside a div. [6]
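A minimal sketch of the indent convention, in Python and on well-formed input, could look like this. The function name is invented, and the use of "margin-left" on the div is an assumption: the text above only says that the content ends up indented 1 to x em inside a div.

```python
import re
import xml.etree.ElementTree as ET

# The marker word from the first row, e.g. "xmlplease:indent-2".
MARKER = re.compile(r"xmlplease:indent-(\d+)")


def indent_table_to_div(table):
    """Return a <div> for a two-row marker table, or None for ordinary tables."""
    rows = table.findall("tr") + table.findall("tbody/tr")
    if len(rows) != 2:
        return None
    marker_cell, content_cell = rows[0].find("td"), rows[1].find("td")
    if marker_cell is None or content_cell is None:
        return None
    match = MARKER.fullmatch("".join(marker_cell.itertext()).strip())
    if match is None:
        return None
    # Assumption: the indent is expressed as a left margin in em.
    div = ET.Element("div", {"style": "margin-left: %sem" % match.group(1)})
    div.text = content_cell.text
    div.extend(list(content_cell))
    return div


table = ET.fromstring(
    "<table><tr><td>xmlplease:indent-2</td></tr>"
    "<tr><td><p>Quoted text</p></td></tr></table>"
)
print(ET.tostring(indent_table_to_div(table), encoding="unicode"))
```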

6.3 User-defined CSS

In Writely a user can add their own CSS section, and a dialog for document-level CSS also exists. The problem with these options is that most users think they know about webdesign but they don't. They wouldn't test if their user-defined CSS works in all relevant browsers, in many window sizes, on a mobile phone, for keyboard users, for elderly users, for blind users, etc.

Since users are not expected to know much about webdesign, user-defined CSS and changes made at document level are not respected.

6.4 Code

Writely has no "code" style for the XHTML "code" element. Such a style must be available to make the document more accessible when read aloud in a text reader. I have decided that a pink background-color, (255, 0, 255), (#ff00ff), means a "code" element. A similar solution could be used for other needed XHTML elements like "abbr". [7]
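The color convention can be sketched as below, in Python on well-formed input. It assumes that Writely writes the color on a "span" element as an inline "style" attribute, either as rgb(255, 0, 255) or as #ff00ff. The exact markup Writely produces is not shown in this article, so treat the pattern and the function name as illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Assumed ways the pink background could appear in an inline style.
PINK = re.compile(
    r"background-color\s*:\s*(#ff00ff|rgb\(\s*255\s*,\s*0\s*,\s*255\s*\))",
    re.IGNORECASE,
)


def mark_code_spans(root):
    """Turn every pink-background <span> into a plain <code> element."""
    for element in root.iter("span"):
        if PINK.search(element.get("style", "")):
            element.tag = "code"
            del element.attrib["style"]
    return root


paragraph = ET.fromstring(
    '<p>Use <span style="background-color: rgb(255, 0, 255)">xsl:copy</span>, '
    'not <span style="color: red">this</span>.</p>'
)
print(ET.tostring(mark_code_spans(paragraph), encoding="unicode"))
```

Spans with any other style are left untouched, so colors used for ordinary emphasis do not turn into code.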

For longer blocks of code a table of one column and two rows can be used. The first row must only contain the word "xmlplease:code". The second row is for your code. You must do the indentation and coloring in Writely. Let us take the "identity" template as an example. If I do the following in Google's Writely:

It will end up like this when transformed to XHTML:

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

7. Proof of the pudding

When I have done a little more testing, I will make it possible for you to publish at one of my websites from your own Google Docs to give you a better understanding of how things work.

8. Alternatives to Writely

Google's Writely is powerful enough for my purposes and Google Docs is sufficient as document management system. I would have liked some extra styles especially the markup for HTML code, <code>, but I use a color instead. Hopefully Writely will improve its markup over time. [8]

Interesting alternatives to using Google's Writely could be the FCKeditor, outputting XHTML, or the Xopus editor, outputting XML. I could integrate one of them at my own website and use its WYSIWYG functionality. But in addition I would need a licence and I would have to implement some document management system.

I prefer to use something like Google Docs. The whole idea is to make use of services already in place.

Update, 2011-08-08. I have dropped Google Docs for the time being and have returned to Microsoft. The file formats of Google Docs are simply too far out. I might return the day Google Docs saves into an XML format.

Footnotes

 [1]
At the moment, all webpages except this one at my website are written in MS Word, saved to WordprocessingML and transformed to XHTML with a homemade XSLT stylesheet.
 [2]
I must admit that the markup created by Writely is too far out. I will go many extra miles if an alternative shows up creating well-formed XHTML output.
 [3]
At Google Sites, a tool to make websites, it is possible to replace the URL for a Google Sites webpage with a URL based on the domain name of one of your own websites. That is, your own URL is redirected. This option does not exist for Writely at the moment.
 [4]
When I get the time I would like to test Tidy together with XSLT. Until now I have found htmlparse.xsl more than powerful enough to handle the codesoup of Writely.
 [5]
Such a "table" method could be used to make other things possible that cannot be done in Writely directly.
 [6]
Note the use of a qualifying namespace, "xmlplease", in order not to limit the use of tables to make tables.
 [7]
I prefer to explain terms when needed in plain language. The "abbr" and "acronym" elements in XHTML are gadgets distracting the user with no benefits. We have real online dictionaries all over the place.
 [8]
Apart from Writely at Google Docs, Google also has a tool to make websites called Google Sites. The editor is not as powerful as Writely. Among other things we cannot make footnotes and there is no spell checking.

Updated: 2011-08-08