[lug] Typesetting Programs

Thu Dec 12 14:04:55 MST 2002

>>>>> "David" == David Morris <lists at morris-clan.net> writes:

David> The core processing code for creating the documentation is all
David> very similar, just the method that data is gathered is
David> different....all I really need is to find some set of tags I
David> can insert into a text stream that will then get fed into a
David> formatting engine.

I think you're handwaving a bit hard here, and that it will come back
to haunt you.  Then again, I'm a noted pessimist, so... :)

When you use a phrase like "As an example, creating reports from the
content of a database [...] and creating a documentation file from
it", I really worry about what you're going to come up with.  Sure,
you can get shiny output that's equivalent to "DESCRIBE my_table", but
that's all it is -- and while some people like that sort of thing, I
find it annoying.

Basically, magically deriving semantic markup ("this is a header",
"this is a link", "this table describes street addresses and is
capable of handling international postcodes") is *hard*.

The best solutions I know of involve having a way to put the
documentation right next to the code or objects that it describes, at
least for low-level API issues. This is where JavaDoc and Doxygen come
into play.  A slightly more comprehensive approach would be Knuth's
WEB system (weave/tangle) and its literate programming style.
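
To make the "documentation next to the code" idea concrete, here is a
minimal sketch in Python (not JavaDoc or Doxygen themselves, just the
same principle using docstrings).  The function format_address and the
helper describe are made-up names for illustration only.

```python
import inspect


def format_address(street, city, postcode):
    """Return a one-line mailing address.

    The postcode is passed through unchanged, so international
    formats are accepted as-is.
    """
    return f"{street}, {city} {postcode}"


def describe(obj):
    """Collect the name, signature, and docstring of a live object.

    This is a hypothetical helper: it only shows that the text a
    formatter needs can be read straight off the code it documents.
    """
    return {
        "name": obj.__name__,
        "signature": str(inspect.signature(obj)),
        "doc": inspect.getdoc(obj),
    }


if __name__ == "__main__":
    info = describe(format_address)
    print(info["name"] + info["signature"])
    print(info["doc"])
```

The point is that the semantic information (what this thing is, what
it is for) was written by a human sitting next to the code; the tool
only collects it and hands it to a formatter.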

For raw data streams ... I don't see how to apply formatting
"blindly"; see comments above about AI research, knowledge
representation, etc, etc.  If the data is already structured, you
should be able to use a transformation to turn the existing structure
into something that is palatable to an output engine (DocBook, LaTeX,
*roff, TeX, etc).  If the data doesn't have useful semantic markup,
though, you're going to have a tough time of it.
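
For the already-structured case, the transformation can be quite
small.  The following is only a sketch, assuming the data arrives as
column names plus rows and that LaTeX is the output engine; the names
escape_latex and to_latex_table are mine, not from any library.

```python
# Characters that LaTeX treats specially, mapped to safe replacements.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(value):
    """Escape one cell, character by character, to avoid double-escaping."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(value))


def to_latex_table(columns, rows):
    """Turn column names and rows into a plain LaTeX tabular environment."""
    header = " & ".join(escape_latex(c) for c in columns) + r" \\"
    body = [" & ".join(escape_latex(v) for v in row) + r" \\" for row in rows]
    lines = [
        r"\begin{tabular}{" + "l" * len(columns) + "}",
        header,
        r"\hline",
        *body,
        r"\end{tabular}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_latex_table(["name", "zip_code"], [["A & B", "84101"]]))
```

Note what this does not do: it knows the columns only because the
input already said so.  Nothing here figures out what the data means.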

(This is speaking from a position of some experience.  A few years
back, I was working on a project that would go through PDF files and
try to reconstruct a table of contents, based on font size and style,
position on page, etc.  It worked, but not without a lot of struggle.)
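
To give a flavor of that kind of heuristic, here is a toy sketch.  It
is not the code from that project; it assumes the text has already
been extracted as (text, font_size) pairs in reading order, and it
uses font size alone, ignoring style and position.  The function name
guess_outline is hypothetical.

```python
from collections import Counter


def guess_outline(spans):
    """Guess heading levels from font sizes.

    spans: list of (text, font_size) tuples in reading order.
    Returns a list of (level, text) for spans that look like headings,
    where level 1 is the largest font size seen.
    """
    if not spans:
        return []

    # Assume the body font is the size that covers the most characters.
    weight = Counter()
    for text, size in spans:
        weight[size] += len(text)
    body_size = weight.most_common(1)[0][0]

    # Anything larger than body text is treated as a heading;
    # bigger fonts get shallower (more important) levels.
    heading_sizes = sorted({s for _, s in spans if s > body_size}, reverse=True)
    level = {s: i + 1 for i, s in enumerate(heading_sizes)}

    return [(level[size], text) for text, size in spans if size in level]


if __name__ == "__main__":
    sample = [
        ("Intro", 18),
        ("Some long body text here that goes on.", 10),
        ("Details", 14),
        ("More body text that is fairly long too.", 10),
    ]
    for lvl, title in guess_outline(sample):
        print("  " * (lvl - 1) + title)
```

Even this toy version breaks as soon as a document uses large type
for pull quotes or captions, which is exactly the sort of struggle I
mean.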

David> The most important part is that the documentation *must* look
David> good in a printed format

These are mostly physical formatting issues.  Any of the technologies
mentioned in my original reply can generate great-looking output in
all the media you like (including best-effort HTML publishing).  To do
it well, however, the input must have sufficient semantic markup and
good stylesheets.

Having said all that, I do wish you good luck, and I'd be interested
to hear how it works out.  :)

t.