The Holy Grail of Technical Writing
Single sourcing documentation has long been the Holy Grail of technical writing. Single source publishing means maintaining one source of content while producing multiple output formats.
Before computers were common there was only one output format: dead trees. Single sourcing was not a challenge then. Documents were written by hand or typed on typewriters. The draft was sent to a different department or company for document design. The final paper output was produced in a printer’s shop on a printing press.
Early computers were available only to a few people, but with them a new output format appeared: the screen.
Computer text editors existed but provided no meaningful page layout or formatting support. Some formatting, such as bold and italics, could be approximated in plain ASCII text, but little more.
One early text formatting challenge was the Unix manual page (man page). Text processing tools such as RUNOFF and its descendant roff produced the desired formatting from commands embedded in the text. This gave birth to the idea of digital typesetting and word processing.
Another early effort to format text was the TeX typesetting system.
These tools introduced the idea of content markup.
The concept behind a markup language is that the content is output agnostic. The markup provides only structure. Other software produces the final output.
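As a rough illustration of that separation, the sketch below keeps one structured source and hands it to two independent renderers. The element names (`title`, `para`), the sample text, and both functions are hypothetical and not taken from any real markup language or tool.

```python
from html import escape

# Hypothetical source document: it records structure only, never presentation.
DOCUMENT = [
    ("title", "Single Sourcing"),
    ("para", "One source, many outputs."),
    ("para", "Use <b> sparingly."),
]


def to_html(doc):
    """Render the structured source as HTML."""
    tags = {"title": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[kind]}>{escape(text)}</{tags[kind]}>" for kind, text in doc
    )


def to_text(doc):
    """Render the same source as plain text with an underlined title."""
    lines = []
    for kind, text in doc:
        if kind == "title":
            lines.append(text.upper())
            lines.append("=" * len(text))
        else:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(DOCUMENT))
    print()
    print(to_text(DOCUMENT))
```

The source never says how a title should look. Each renderer decides that for its own output, so supporting another format means adding another renderer rather than editing the content.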
Early efforts to create output agnostic content included the Standard Generalized Markup Language (SGML). SGML became an ISO standard in 1986. Hypertext Markup Language (HTML) was originally defined as an application of SGML, and eXtensible Markup Language (XML) was later developed as a simplified subset of SGML.
Then computers became popular and usable for a majority of people. Dot matrix printers allowed people to print without depending on movable type or a printing press. The first Hewlett-Packard LaserJets appeared. For the first time in human history many people could self-publish, although the primary output was the same dead trees.
The advent of personal computers saw the rise of word processor software such as WordStar and WordPerfect. Despite previous efforts to create output agnostic content, word processor software used proprietary file formats rather than a common markup language.
Early word processor software mostly targeted personal and business users rather than publishing. The primary goal was content formatting rather than extensive page layout.
Along with the introduction of the Macintosh computer in 1984, that absence of page layout features gave rise to desktop publishing software. Early software in this genre included Interleaf and FrameMaker. FrameMaker supported its own markup language called Maker Interchange Format (MIF) as well as SGML.
Combining text formatting and page layout introduced a concept known as What You See Is What You Get (WYSIWYG) publishing. What appears on the screen is what appears in the final output.
The World Wide Web was proposed in 1989 and brought with it a new output format, HTML.
In 1993 the specification appeared for the Portable Document Format (PDF). With PDF, XML, and HTML, people could produce document outputs other than dead trees.
Modern output now includes wikis. Most wikis display output in HTML but many use their own markup language.
Also new on the scene is ePub.
Because of these multiple storage and output formats, people pursued the idea of single source publishing. While WYSIWYG was popular, structured writing targeted the idea of What You See Is What You Mean (WYSIWYM). Similar to TeX, tools such as DocBook were developed to support structure rather than final format.
Today markup languages such as Markdown and AsciiDoc exist to help people write output agnostic content. Additional software is still needed to produce the desired output.
Along the way came an XML authoring process known as Darwin Information Typing Architecture (DITA), a framework whose goal is writing exchangeable, reusable chunks of information. Because the content is stored as XML and therefore focuses on structure, the files are exchangeable and output agnostic.
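The idea of reusable chunks can be sketched in a few lines. The example below is a simplified illustration, not real DITA: the element names, topic ids, and the `assemble` helper are all assumptions made for this sketch. It stores three small topics as XML and builds two different deliverables from them.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store. This is NOT the real DITA vocabulary,
# only a simplified stand-in to show chunk reuse.
TOPICS_XML = """
<topics>
  <topic id="install"><title>Installing</title><p>Run the installer.</p></topic>
  <topic id="config"><title>Configuring</title><p>Edit the settings file.</p></topic>
  <topic id="api"><title>API Reference</title><p>Call the endpoints.</p></topic>
</topics>
"""


def load_topics(xml_text):
    """Parse the XML and index each topic element by its id attribute."""
    root = ET.fromstring(xml_text)
    return {topic.get("id"): topic for topic in root.findall("topic")}


def assemble(topics, topic_ids):
    """Return titles and paragraphs for the chosen topics, in the given order."""
    out = []
    for topic_id in topic_ids:
        topic = topics[topic_id]
        out.append(topic.findtext("title"))
        out.extend(p.text for p in topic.findall("p"))
    return out


topics = load_topics(TOPICS_XML)

# Two deliverables share the "install" chunk without copying its text.
user_guide = assemble(topics, ["install", "config"])
developer_guide = assemble(topics, ["install", "api"])
```

The installation topic is written once and appears in both guides. A correction to that topic reaches every deliverable that includes it.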
One of the benefits of output agnostic content is the ability to choose both the final output and the tools that produce it.
Many software vendors use proprietary file formats. Often publishing means being locked into a specific vendor’s software. That makes single sourcing challenging. Converting files from one format to another is painful. In the early days of word processor and desktop publishing software, this led to the introduction of file conversion software such as Word for Word by Mastersoft.
File format conversion remains a challenge today. Modern software such as Pandoc helps but is imperfect.
Even if a perfect tool chain existed that was free of proprietary encumbrance, single sourcing would remain a challenge.
Readability depends on the output format. A long-standing debate is which fonts to use. A common convention is to use serif fonts on paper and sans serif fonts on screen. Adding to the confusion, PDF files are often laid out as though for paper but are actually read on screen.
While there has been notable progress in the past decades and there are many ways to produce output agnostic content, the primary challenges of single sourcing have remained unchanged through the years: proprietary file formats and the back end processing tools that transform output agnostic content into the desired format.
Nobody has yet truly resolved the challenge of single source publishing.