Credits for the ideas behind structured text
Stx and its implementation, stx2any, were not developed as
casual hacks. They benefit from the traditions (in some cases no more
than a few years old) of typography, markup languages, plaintext markup,
Unix, functional programming, literate programming, and web authoring.
This document tries to record the origins of most ideas.
For a rationale of many plain text formatting conventions (that I was
not aware of when I designed stx2any), see Project Gutenberg's
Volunteers' FAQ. You'll be surprised how close their recommendations
are to mine — at least I was :)
- using structured plaintext as a generic authoring tool
- This idea definitely comes from wikis. They are a prime example
of how convenient and fast authoring can be.
- emphasis rules
- These were developed from my earlier work, a heuristic
text-to-HTML converter. I think these proper, traditional
emphasis rules are one of the most important aspects of Stx.
The literal emphasis was motivated by Zope structured text.
- header blocks
- RFC822 mail headers.
- explicit markup
- This arose naturally from the choice of using m4 as an
implementation language. I consider explicit markup one of the
biggest advantages over other rich structured texts, which tend
to be really syntax-heavy. We use abbreviation syntax for things
that have natural abbreviations or are written a lot, and
explicit markup for everything else.
- environment and counter system
- From LaTeX, of course.
- indentation system and processing blocks line by line
- The way of using indentation comes partially from Python and
partially from PikiPiki (which probably got it from Python).
Sub-character indents were a late addition to accommodate the
complex structure of lists.
Processing blocks line by line also comes from PikiPiki. It is
a great idea, freeing us from complicated analysis in the
parsing phase, and leaving empty lines for what they are good
for, that is, denoting paragraphs. If we processed block types
paragraph by paragraph, we would have to require empty lines in
all kinds of places where they might be unnatural (such as the
end of a list).
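The line-by-line idea can be sketched in a few lines of Python. This is only an illustration, not how stx2any actually does it: the function name split_blocks, the "- " item marker and the "indented line continues the item" rule are assumptions made for the example. The point is that each line decides its own block type, so a list ends as soon as an unindented text line follows it, and empty lines are needed only between paragraphs.

```python
def split_blocks(text):
    """Group lines into ("para", lines) and ("list", items) blocks.

    Hypothetical sketch: every line is classified on its own, so no
    empty line is needed to end a list.
    """
    blocks = []
    current = None  # the (kind, lines) block being built, if any

    def close():
        nonlocal current
        if current is not None:
            blocks.append(current)
            current = None

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # An empty line only ever separates blocks (paragraphs).
            close()
        elif stripped.startswith("- "):
            # An item line opens a list or adds an item to the open one.
            if current is None or current[0] != "list":
                close()
                current = ("list", [])
            current[1].append(stripped[2:])
        elif current is not None and current[0] == "list" and line[:1].isspace():
            # An indented line inside a list continues the last item.
            current[1][-1] += " " + stripped
        else:
            # Any other line is paragraph text; it implicitly ends a list.
            if current is None or current[0] != "para":
                close()
                current = ("para", [])
            current[1].append(stripped)
    close()
    return blocks
```

Feeding it a paragraph, a two-item list and more prose with no empty lines in between yields three separate blocks; a paragraph-by-paragraph parser would have seen a single block there.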
- using sed and m4 for the implementation
- Well, this was indirectly motivated by a discussion with Dirk A.
Froembgen, where I talked about how the ultimate macro processor
would have custom-syntax (regexp?) macros and Dirk stated that
this was actually not the job of the macro processor but
possibly rather a separate parsing phase. (Macro processors are
good for building text, let them stay so.)
For simple text production (even if stx2any is not that
simple), macro processors are superb. They are concise,
efficient, natural and easy to program. They are not very good
as programming languages, but their natural
semblance to functional languages makes them higher-level than,
say, C or Java.
But why exactly sed and m4? Why not Perl and cpp
(or why not make it all in Perl)? Well, first, sed and
m4 are practically everywhere. Besides, m4 is a really
good macro processor, and gives us mechanisms of abstraction
(such as diversions and runtime combining of definition files)
which would require work to implement in many other languages.
m4 is also a consistent and relatively elegant language, and
gives the users of stx2any rich facilities for defining
their own markup. Not to mention that Perl is actually a
bloated-off piece of shit. (Except for those who actually use it.)
As for sed, I have to admit that I like its clumsiness,
weird looks, and unbelievably weird semantics.
- using pipes
- Pipes are a beautiful programming construct, and their power is
well demonstrated by the Unix tradition and probably especially
the family of troff tools. Of course, pipes were a natural
choice because of the two-phase processing in stx2any.
stx2any may well start some 8 processes when invoked!
Pipes also make it easier to debug the program when something
goes amiss. It is nice to be able to inspect the result of the
conversion in different phases of processing. That is part of
the power of plain text (not as opposed to binary data, but as
opposed to intelligent data such as objects).
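The debugging benefit can be shown with a small Python sketch. It is only a stand-in for the real pipeline: the two phase functions, the *word* abbreviation and the EMPH(...) macro name are hypothetical, chosen to mimic a rewriting stage followed by a macro-expansion stage. Because every phase takes plain text and returns plain text, the intermediate result of each phase can be kept and read directly.

```python
import re


def parse_phase(text):
    # Stand-in for a sed-like stage: rewrite an abbreviation into an
    # explicit macro call. EMPH is a made-up macro name.
    return re.sub(r"\*(\w+)\*", r"EMPH(\1)", text)


def expand_phase(text):
    # Stand-in for an m4-like stage: expand the macro call into output.
    return re.sub(r"EMPH\((\w+)\)", r"<em>\1</em>", text)


def run_pipeline(text, phases):
    """Run text through each phase, keeping a snapshot after every step."""
    snapshots = [("input", text)]
    for phase in phases:
        text = phase(text)
        snapshots.append((phase.__name__, text))
    return snapshots


if __name__ == "__main__":
    for name, snapshot in run_pipeline("a *b* c", [parse_phase, expand_phase]):
        print(f"{name}: {snapshot}")
```

When the final output looks wrong, reading the snapshots in order shows which phase introduced the problem, much like inserting tee between the stages of a shell pipeline.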
- templating system and diversions
- These come from my earlier product, a website building engine
that was also implemented in m4. The diversion subsystem
is a really useful abstraction layer over m4 diversions,
which suck somewhat.
- the w_ prefix of Stx macros
- From the website building engine mentioned in the previous item.
That system had no abbreviations, and I used something like
w_p for marking paragraphs! I think it strikes a good balance
between brevity and syntax-share.
- passing HTML, LaTeX etc. directly through
- This is a natural consequence of m4 as an implementation
language, and was also present in the aforementioned website
templating system. I noticed how handy it is when I used that
system. Linuxdoc's wiki markup does this too.
- special characters: quotes, long dashes, ellipses, …
- Mostly from LaTeX, but some were motivated by Textile. The
rules for quotes and long dashes were my invention, and I'm very
content with them.
- linking macros and link abbreviations
- These are from my experience with web authoring and wikis.
There are many kinds of links, and providing a semantically
sensible categorisation of them requires careful thought about
what we are trying to do with the link. Different link
categories also tend to produce different results in different
The idea of gather_stx_titles is based on the aforementioned
website building system.
Link resolution in link abbreviations arose from reflection on
the way people process links when they read plaintext.
The idea of the syntax of link abbreviations mostly comes from
Markdown, except label syntax comes from AFT, and automatic
labeling of headings is a great idea stolen from ReST.
- syntax for line breaks
- An earlier project of modifying PikiPiki, which got it from
LaTeX and poetry.
- syntax for footnotes
- Mostly plaintext tradition, and the act of noticing that
footnotes are like parentheses, only stronger.
- LaTeX. But the addition of making margin paragraphs floats, and
the float type resolution system, are my additions and I think
they are very good.
- syntax for tables
- LaTeX. This, too, I deem one of the big advantages of Stx over
rich structured texts: you don't have any weird syntax
for tables, you just write them as they seem natural. Very
|| marker is a natural extension of the
syntax, and sure looks better (and uses less syntax space) than
- paragraph system
- Plaintext tradition, LaTeX, troff. I came up with the idea
that sometimes even a lone newline might be relevant markup
(this is not used very much in Stx). It would also make sense
to translate paragraphs to different stuff based on context —
for example, within tables, into table groups.
- block system
- numbered list syntax, heading syntax
- From my work on PikiPiki. These were tried out and found good.
- preformatted (literal) block syntax
- From PikiPiki. This, too, is a great advantage, because it
allows us to write e.g. program code as in a normal source file.
Indented literal blocks are good for short passages, but hell
for proper literate programming etc.
- literate programming
- Donald Knuth.
- regression testing