Accreditation of ideas behind structured text

Stx and its implementation, stx2any, have not been developed as casual hacks. They benefit from the traditions (in some cases not more than a few years old) of typography, markup languages, plaintext markup, Unix, functional programming, literate programming, and web authoring. This document tries to record the origins of most ideas.

For a rationale of many plain text formatting conventions (that I was not aware of when I designed stx2any), see Project Gutenberg's Volunteers' FAQ. You'll be surprised how close their recommendations are to mine — at least I was :)

using structured plaintext as a generic authoring tool

This idea definitely comes from wikis. They are a prime example of how convenient and fast authoring can be.

emphasis rules

These were developed from my earlier work, a heuristic text-to-HTML converter. I think these proper, traditional emphasis rules are one of the most important aspects of Stx. The literal emphasis was motivated by Zope structured text.

header blocks

RFC822 mail headers.

explicit markup

This arose naturally from the choice of using m4 as an implementation language. This explicit markup thing I deem one of the biggest advantages over other rich structured texts, which tend to be really syntax-heavy. We use abbreviation syntax for things that have natural abbreviations or are written a lot, explicit markup for other things.

environment and counter system

From LaTeX, of course.

indentation system and processing blocks line by line

The way of using indentation comes partially from Python and partially from PikiPiki (which probably got it from Python). Sub-character indents were a late addition to accommodate the complex structure of lists.

Processing blocks line by line also comes from PikiPiki. It is a great idea, freeing us from complicated analysis in the parsing phase, and leaving empty lines for what they are good for, that is, denoting paragraphs. If we processed block types paragraph by paragraph, we would have to require empty lines in all kinds of places where they might be unnatural (such as the end of a list).
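
The difference can be illustrated with a small sketch. This is not stx2any's code (which is sed and m4), and the "- " list marker and the function names are assumptions made up for the illustration. It only shows the principle: each line decides its own block type, so a list can end without an empty line after it, while an empty line still separates paragraphs.

```python
def classify(line):
    """Decide the block type of a single line, looking at nothing else."""
    if not line.strip():
        return "blank"
    if line.startswith("- "):  # hypothetical list marker, not necessarily Stx syntax
        return "list"
    return "para"


def split_blocks(text):
    """Group consecutive lines of the same type into (kind, lines) blocks."""
    blocks = []
    previous = "blank"
    for line in text.splitlines():
        kind = classify(line)
        if kind == "blank":
            # An empty line only closes the current block.
            previous = kind
            continue
        if kind == previous:
            blocks[-1][1].append(line)
        else:
            blocks.append((kind, [line]))
        previous = kind
    return blocks


sample = "Intro text\n- one\n- two\nAfter the list\n\nNew paragraph"
for kind, lines in split_blocks(sample):
    print(kind, lines)
```

Note that "After the list" starts a new paragraph although no empty line precedes it; a paragraph-by-paragraph splitter would have lumped it together with the list items.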

using sed and m4 for the implementation

Well, this was indirectly motivated by a discussion with Dirk A. Froembgen, where I talked about how the ultimate macro processor would have custom-syntax (regexp?) macros and Dirk stated that this was actually not the job of the macro processor but possibly rather a separate parsing phase. (Macro processors are good for building text, let them stay so.)

For simple text production (even if stx2any is not that simple), macro processors are superb. They are concise, efficient, natural and easy to program. They are not very good as programming languages, but their naturally functional character (= resemblance to functional languages) makes them higher-level than, say, C or Java.

But why exactly sed and m4? Why not Perl and cpp (or why not make it all in Perl)? Well, first, sed and m4 are practically everywhere. Besides, m4 is a really good macro processor, and gives us mechanisms of abstraction (such as diversions and runtime combining of definition files) which would require work to implement in many other languages. m4 is also a consistent and relatively elegant language, and gives the users of stx2any rich facilities for defining their own markup. Not to mention that Perl is actually a bloated-off piece of shit. (Except for those who actually use it…)

As for sed, I have to admit that I like its clumsiness, weird looks, and unbelievably weird semantics.

using pipes

Pipes are a beautiful programming construct, and their power is well demonstrated by the Unix tradition and probably especially the family of troff tools. Of course, pipes were a natural choice because of the two-phase processing in stx2any. stx2any may well start some 8 processes when invoked!

Pipes also make it easier to debug the program when something goes amiss. It is nice to be able to inspect the result of the conversion in different phases of processing. That is part of the power of plain text (not as opposed to binary data, but as opposed to intelligent data such as objects).
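
As a rough sketch of why this helps, assume each phase is simply a function from text to text. The two phases below and the macro name w_ellipsis are invented for the illustration and are not taken from stx2any; the point is only that every intermediate result is ordinary text that can be looked at.

```python
def expand_abbreviations(text):
    # Hypothetical phase 1: rewrite a shorthand into explicit markup.
    return text.replace("...", "w_ellipsis")


def render(text):
    # Hypothetical phase 2: expand the explicit markup into output text.
    return text.replace("w_ellipsis", "\u2026")


def run_phases(text, phases):
    """Apply the phases in order and keep every intermediate result."""
    results = [text]
    for phase in phases:
        results.append(phase(results[-1]))
    return results


for stage in run_phases("Wait ...", [expand_abbreviations, render]):
    print(repr(stage))
```

With real pipes the same inspection is a matter of cutting the pipeline short after the phase of interest.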

templating system and diversions

These come from my earlier product, a website building engine that was also implemented in m4. The diversion subsystem is a really useful abstraction layer over m4 diversions, which suck somewhat.

the w_ prefix of Stx macros

From the website building engine mentioned in the previous item. That system had no abbreviations, and I used something like w_p for marking paragraphs! I think it strikes a good balance between brevity and syntax-share.

passing HTML, LaTeX etc. directly through

This is a natural consequence of m4 as an implementation language, and was also present in the aforementioned website templating system. I noticed how handy it is when I used that system. Linuxdoc's wiki markup does this too.

special characters: quotes, long dashes, ellipses, …

Mostly from LaTeX, but some were motivated by Textile. The rules for quotes and long dashes were my invention, and I'm very content with them.

linking macros and link abbreviations

These are from my experience with web authoring and wikis. There are many kinds of links, and providing semantically sensible categorisation of them requires careful thought about what we are trying to do with the link. Different link categories also tend to produce different results in different output formats.

The idea of gather_stx_titles is based on the aforementioned website building system.

Link resolution in link abbreviations arose from reflection on the way people process links when they read plaintext.

The idea of the syntax of link abbreviations mostly comes from Markdown, except label syntax comes from AFT, and automatic labeling of headings is a great idea stolen from ReST.

syntax for line breaks

An earlier project of modifying PikiPiki, which got it from LaTeX and poetry.

syntax for footnotes

Mostly plaintext tradition, and the act of noticing that footnotes are like parentheses, only stronger.

floats

LaTeX. But making margin paragraphs floats, and the float type resolution system, are my additions and I think they are very good.

syntax for tables

LaTeX. This, too, I deem one of the big advantages of Stx over other rich structured texts: you don't have any weird syntax for tables, you just write them as they seem natural. Very nice. The || marker is a natural extension of the // syntax, and sure looks better (and uses less syntax space) than &.

paragraph system

Plaintext tradition, LaTeX, troff. I came up with the idea that sometimes even a lone newline might be relevant markup (this is not used very much in Stx). It would also make sense to translate paragraphs to different stuff based on context — for example, within tables, into table groups.

block system

numbered list syntax, heading syntax

From my work on PikiPiki. These were tried out and found good.

preformatted (literal) block syntax

From PikiPiki. This, too, is a great advantage, because it allows us to write e.g. program code as in a normal source file. Indented literal blocks are good for short passages, but hell for proper literate programming etc.

literate programming

Donald Knuth.

regression testing

tee-hee…