
Translation from source to target

Media professionals use databases with conversion filters that generate specific types of Layout Format designed for different media such as books, CD-ROMs, or the Web. The software industry has produced increasingly powerful products that can now handle very large databases as well as high-volume querying and conversion. Anyone who has consulted the website of a large media company, or indeed of any other large company, will have seen automatically generated documents of this kind.
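The idea of one database feeding several conversion filters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table name `entries`, its columns, the sample rows, and the two filter functions are hypothetical and do not describe any particular commercial product.

```python
import sqlite3
from html import escape


def build_demo_db():
    """Create a tiny in-memory source database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (headword TEXT, gloss TEXT)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [
            ("target", "the generated document"),
            ("source", "the stored content"),
        ],
    )
    conn.commit()
    return conn


def load_entries(conn):
    """Query the source database; returns (headword, gloss) rows sorted by headword."""
    return conn.execute(
        "SELECT headword, gloss FROM entries ORDER BY headword"
    ).fetchall()


def to_html(rows):
    """Conversion filter for the Web: render the rows as an HTML definition list."""
    items = "".join(
        f"<dt>{escape(head)}</dt><dd>{escape(gloss)}</dd>" for head, gloss in rows
    )
    return f"<dl>{items}</dl>"


def to_plain_text(rows):
    """Conversion filter for a plain-text medium: one 'headword: gloss' line per row."""
    return "\n".join(f"{head}: {gloss}" for head, gloss in rows)


if __name__ == "__main__":
    rows = load_entries(build_demo_db())
    print(to_html(rows))
    print(to_plain_text(rows))
```

The same query result is passed to both filters, so the content is stored once and only the Layout Format differs per medium.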

On a smaller scale, a number of specific conversion programs are available, which assume that

  1. the source document is not a database but a conventional hierarchically structured (tree-structured) text,
  2. the formatting of the source document is explicitly defined in terms of style-sheets.
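The two assumptions above can be made concrete with a minimal Python sketch. The node layout `(style, text, children)`, the style names, and the `STYLE_SHEET` mapping are assumptions chosen for illustration; real converters such as latex2html work on much richer input.

```python
from html import escape

# Hypothetical style sheet: every named style is explicitly mapped to an HTML element.
STYLE_SHEET = {"title": "h1", "heading": "h2", "body": "p"}

# Hypothetical tree-structured source: each node is (style, text, children).
SOURCE = (
    "title",
    "Translation from source to target",
    [
        (
            "heading",
            "Conversion programs",
            [("body", "latex2html converts LaTeX to HTML.", [])],
        ),
    ],
)


def convert(node, style_sheet):
    """Walk the tree depth-first and emit one HTML element per node."""
    style, text, children = node
    # A style missing from the style sheet raises KeyError: the converter
    # relies on the formatting being explicitly defined (assumption 2).
    tag = style_sheet[style]
    html = f"<{tag}>{escape(text)}</{tag}>"
    return html + "".join(convert(child, style_sheet) for child in children)


if __name__ == "__main__":
    print(convert(SOURCE, STYLE_SHEET))
```

If a node uses a style that the style sheet does not define, the sketch fails instead of guessing, which mirrors why ad hoc manual formatting in a source document tends to defeat converters of this kind.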

One program of this type is latex2html, which takes source documents in LaTeX format; another is RTFtoHTML, which takes source documents formatted with MS-Word styles. Both produce HTML target documents. In fact, the present document was produced with latex2html. Standard word processors such as MS-Word or StarOffice also convert their own formats into HTML, but they restrict the conversion to single-page output, which falls short of the state of the art. In addition, MS-Word HTML output contains many kinds of extraneous style information intended to permit reconstruction of the original document. This is in general not a property of converters, which are designed to convert one media-specific Layout Format into another media-specific Layout Format, and not to force the Layout Format designed for one medium (paper) onto another medium.

A technique which has been under development for many years, and which is gradually becoming standard, is to format the source documents in XML and to use special filtering languages (DSSSL, XSL, ...) to convert them into other formats.
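As a rough illustration of the XML-based approach, the following Python sketch converts a small XML source into HTML using only the standard library. It is a stand-in for a real filtering language such as DSSSL or XSL, not an implementation of either; the element names in `SOURCE_XML` and the `RULES` mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source document with purely structural markup.
SOURCE_XML = """<document>
  <title>Translation from source to target</title>
  <section>
    <heading>Conversion</heading>
    <para>Source documents are stored in XML.</para>
  </section>
</document>"""

# Hypothetical filter rules: source element name -> HTML element name.
RULES = {
    "document": "body",
    "title": "h1",
    "section": "div",
    "heading": "h2",
    "para": "p",
}


def transform(elem, rules):
    """Recursively rebuild the tree, renaming each element according to the rules."""
    out = ET.Element(rules[elem.tag])
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(transform(child, rules))
    return out


def xml_to_html(xml_text, rules=RULES):
    """Parse the XML source, apply the rules, and serialize the result as HTML."""
    root = transform(ET.fromstring(xml_text), rules)
    return ET.tostring(root, encoding="unicode", method="html")


if __name__ == "__main__":
    print(xml_to_html(SOURCE_XML))
```

Because the source carries only structural markup, a different rule table could in principle produce a different target format from the same document without touching the source.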

Final tasks for the term:



Dafydd Gibbon, Thu Jul 19 17:47:45 MET DST 2001