A Cross-Browser Method for Embedding Images in Self-Contained HTML Documents
Two notes:
-BPH 3/18/2007
One potential application for DHTML is as a lightweight, freely editable, self-contained document format. It can be used as a flow-based replacement for PDF. In fact, it is possible to create documents that are self-contained, but significantly richer than what PDF offers. One of the more difficult issues to resolve in putting DHTML to this application is that of embedding image data in a document in a manner that is compatible across browsers.
Netscape, Mozilla, and Firefox support data URLs, but they only handle minuscule amounts of data. Internet Explorer has no support for this at all. In order to embed and display images in a self-contained document you have to fool the rendering engine into doing what you want. You need to encode your image data in the file in some manner so that you can feed it to the renderer and see an image for your troubles. You would want this encoding to be of a reasonable size - it is only marginally useful to create a self-contained document that bloats up to several times the size of its component pieces. It would also be nice if you could re-use the same image in many places in the document instead of having to duplicate the data at each point of usage.
The obvious place to start looking is the table. If you create a table of 1px by 1px cells with no cell padding or cell spacing and set the background color of each cell to the appropriate color, you'll have an image. This approach works, but it is slow, and the overhead for the table is enormous. That rules it out as a compact storage arrangement. It also rules out image re-use. You could get clever and replace runs of N repetitions of the same color with cells that span N columns. This would save some space, but still not enough. You'd still have so many table cell definitions that the bytes per pixel would be unacceptably large. And you'd still be unable to re-use the image data.
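To make the overhead concrete, here is a minimal sketch of the static-table approach with column spanning. The function names (rowToRuns, imageToTableHtml) and the exact attributes and styles are assumptions made for illustration; they are not taken from the original source code.

```javascript
// Illustrative sketch only: names and markup details are assumptions.

// Collapse one row of CSS color strings into { color, span } runs.
function rowToRuns(pixels) {
  var runs = [];
  for (var i = 0; i < pixels.length; i++) {
    var last = runs[runs.length - 1];
    if (last && last.color === pixels[i]) {
      last.span += 1;                       // extend the current run
    } else {
      runs.push({ color: pixels[i], span: 1 });
    }
  }
  return runs;
}

// Build table markup for an image given as an array of pixel rows.
function imageToTableHtml(rows) {
  var html = '<table cellpadding="0" cellspacing="0">';
  for (var y = 0; y < rows.length; y++) {
    var runs = rowToRuns(rows[y]);
    html += "<tr>";
    for (var r = 0; r < runs.length; r++) {
      html += '<td colspan="' + runs[r].span + '" style="background-color:' +
        runs[r].color + ";width:" + runs[r].span + 'px;height:1px"></td>';
    }
    html += "</tr>";
  }
  return html + "</table>";
}
```

Calling imageToTableHtml([["#000", "#000", "#fff"]]) produces well over a hundred bytes of markup for three pixels, which is the storage problem described above.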
The next obvious direction to turn is client-side JavaScript and DOM manipulation. This reduces the problem to encoding the image data in a reasonably compact manner that can be decoded on the fly in an acceptable amount of time. Once you have created the DOM structure from the encoded data, you can re-use it in as many places as you like. So not only can the data be re-used, it need only be decoded once. The encoding that makes the most sense is a run-length encoding algorithm. Since you would like to minimize the amount of DOM structure created (and thus make rendering time as fast as possible) you'll hang on to that idea of spanning columns for sequences of same-color pixels. For gray images you need only encode the gray values. For color images, you have to palettize to keep images to a size that does not exceed your design goals. The run-length encoded data can be stored as a base 64 JavaScript string. That string is the data member for a class that will de-base64, run-length decode, and generate the DOM hierarchy for the colored table.
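The decoding pipeline for a gray image might look roughly like the sketch below. The byte layout is an assumption made for illustration (pairs of run length and gray value, with the image width supplied separately), as are the names decodeBase64, runsToRows, and buildTable; the actual encoding in the commented source may differ.

```javascript
// Illustrative sketch only: the byte layout and all names are assumptions.
var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode a base64 string into an array of byte values (0-255).
function decodeBase64(text) {
  var bytes = [];
  var buffer = 0;
  var bits = 0;
  for (var i = 0; i < text.length; i++) {
    var value = B64.indexOf(text.charAt(i));
    if (value < 0) { continue; }          // skip "=" padding and whitespace
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1;          // keep only the unread low bits
    }
  }
  return bytes;
}

// Turn (count, gray) byte pairs into rows of { span, gray } cells,
// splitting any run that crosses a row boundary.
function runsToRows(bytes, width) {
  if (width < 1) { throw new Error("width must be at least 1"); }
  var rows = [];
  var row = [];
  var filled = 0;
  for (var i = 0; i + 1 < bytes.length; i += 2) {
    var count = bytes[i];
    var gray = bytes[i + 1];
    while (count > 0) {
      var span = Math.min(count, width - filled);
      row.push({ span: span, gray: gray });
      filled += span;
      count -= span;
      if (filled === width) {             // row complete, start the next one
        rows.push(row);
        row = [];
        filled = 0;
      }
    }
  }
  return rows;
}

// Build the table with DOM calls; doc is normally the global document.
function buildTable(doc, rows) {
  var table = doc.createElement("table");
  var tbody = doc.createElement("tbody"); // IE needs an explicit tbody
  table.cellPadding = "0";
  table.cellSpacing = "0";
  for (var y = 0; y < rows.length; y++) {
    var tr = doc.createElement("tr");
    for (var r = 0; r < rows[y].length; r++) {
      var td = doc.createElement("td");
      var g = rows[y][r].gray;
      td.colSpan = rows[y][r].span;
      td.style.backgroundColor = "rgb(" + g + "," + g + "," + g + ")";
      td.style.width = rows[y][r].span + "px";
      td.style.height = "1px";
      tr.appendChild(td);
    }
    tbody.appendChild(tr);
  }
  table.appendChild(tbody);
  return table;
}
```

In a page, you would call buildTable(document, runsToRows(decodeBase64(data), width)) once and then place copies made with cloneNode(true) wherever the image is needed, so the data is decoded only once.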
A final touch is to allow for printing. Instead of using the background color of the cells, you use the border color of a div element in the cell. Most browsers, by default, do not print background color but do print border color. By making use of the border color you make it easy to print the image without requiring the user to change browser settings. Since you are adding another DOM node per pixel, this does slow rendering down, however. For documents that are printed rarely or not at all, background color is the way to go.
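The two cell styles can be compared side by side in a small sketch. The function name cellHtml and the specific CSS (a zero-height div whose top border supplies the color) are assumptions chosen for illustration; the original source may build the printable cell differently.

```javascript
// Illustrative sketch only: the name and the CSS details are assumptions.

// Return markup for one cell covering `span` pixels of `color`.
// printable = false: color comes from the cell background (fewer nodes).
// printable = true:  color comes from the border of a div inside the cell.
function cellHtml(color, span, printable) {
  var open = '<td colspan="' + span + '" style="padding:0';
  if (!printable) {
    return open + ";background-color:" + color + ";width:" + span +
      'px;height:1px"></td>';
  }
  return open + '"><div style="width:' + span +
    "px;height:0;font-size:0;border-top:1px solid " + color +
    '"></div></td>';
}
```

The printable form adds one extra element per cell, which is where the rendering slowdown comes from.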
Since posting a message about this method and the availability of source code about six months ago, I have received a steady stream of requests from a diverse group of people in several different countries. It would appear that more than a few people share the idea that HTML is a useful format for transmitting and storing self-contained documents - if they can only get around that issue of imagery. There are probably one or two optimizations that could wring a bit more efficiency out of the encoded size and decoding time, but the real solution would be the creation of a W3C standard for embedding non-text data (raster, SVG, audio) into HTML documents. Until then, JavaScript is your friend.
In addition to the commented source code of this article, further examples of this technique can be found at:
Benn Herrera March 25, 2005 benn@bennherrera.com