- using structured plaintext as a generic authoring tool
- This idea definitely comes from wikis. They are a prime example
of how convenient and fast authoring can be.
- emphasis rules
- These were developed from my earlier work, a heuristic
text-to-HTML converter. I think these proper, traditional
emphasis rules are one of the most important aspects of Stx.
The literal emphasis was motivated by Zope structured text.
- header blocks
- From RFC 822 mail headers.
- explicit markup
- This arose naturally from the choice of using m4 as an
implementation language. This explicit markup thing I deem one
of the biggest advantages over other rich structured texts,
which tend to be really syntax-heavy. We use abbreviation
syntax for things that have natural abbreviations or are written
a lot, explicit markup for other things.
- environment and counter system
- From LaTeX, of course.
- indentation system and processing blocks line by line
- The way of using indentation comes partially from Python and
partially from PikiPiki (which probably got it from Python).
Sub-character indents were a late addition to accommodate the
complex structure of lists.
Processing blocks line by line also comes from PikiPiki. It is
a great idea, freeing us from complicated analysis in the
parsing phase, and leaving empty lines for what they are good
for, that is, denoting paragraphs. If we processed block types
paragraph by paragraph, we would have to require empty lines in
all kinds of places where they might be unnatural (such as the
end of a list).
- using sed and m4 for the implementation
- Well, this was indirectly motivated by a discussion with Dirk A.
Froembgen, where I talked about how the ultimate macro processor
would have custom-syntax (regexp?) macros and Dirk stated that
this was actually not the job of the macro processor but
possibly rather a separate parsing phase. (Macro processors are
good for building text, let them stay so.)
For simple text production (even if stx2any is not that
simple), macro processors are superb. They are concise,
efficient, natural and easy to program. They are not very good
as programming languages, but their natural functionality
(that is, their resemblance to functional languages) makes them
higher-level than, say, C or Java.
But why exactly sed and m4? Why not Perl and cpp
(or why not make it all in Perl)? Well, first, sed and
m4 are practically everywhere. Besides, m4 is a really
good macro processor, and gives us mechanisms of abstraction
(such as diversions and runtime combining of definition files)
which would require work to implement in many other languages.
m4 is also a consistent and relatively elegant language, and
gives the users of stx2any rich facilities for defining
their own markup. Not to mention that Perl is actually a
bloated-off piece of shit. (Except for those who actually use
it…)
As for sed, I have to admit that I like its clumsiness,
weird looks, and unbelievably weird semantics.
- using pipes
- Pipes are a beautiful programming construct, and their power is
well demonstrated by the Unix tradition and probably especially
the family of troff tools. Of course, pipes were a natural
choice because of the two-phase processing in stx2any.
stx2any may well start some 8 processes when invoked!
Pipes also make it easier to debug the program when something
goes amiss. It is nice to be able to inspect the result of the
conversion in different phases of processing. That is part of
the power of plain text (not as opposed to binary data, but as
opposed to intelligent data such as objects).
- templating system and diversions
- These come from my earlier product, a website building engine
that was also implemented in m4. The diversion subsystem
is a really useful abstraction layer over m4 diversions,
which suck somewhat.
- the w_ prefix of Stx macros
- From the website building engine mentioned in the previous item.
That system had no abbreviations, and I used something like
w_p for marking paragraphs! I think the w_ prefix strikes a good
balance between brevity and syntax-share.
- passing HTML, LaTeX etc. directly through
- This is a natural consequence of m4 as an implementation
language, and was also present in the aforementioned website
templating system. I noticed how handy it is when I used that
system. Linuxdoc's wiki markup does this too.
- special characters: quotes, long dashes, ellipses, …
- Mostly from LaTeX, but some were motivated by Textile. The
rules for quotes and long dashes were my invention, and I'm very
content with them.
- linking macros and link abbreviations
- These are from my experience with web authoring and wikis.
There are many kinds of links, and providing semantically
sensible categorisation of them requires careful thought about
what we are trying to do with the link. Different link
categories also tend to produce different results in different
output formats.
The idea of gather_stx_titles is based on the aforementioned
website building system.
Link resolution in link abbreviations arose from reflection on
the way people process links when they read plaintext.
The idea of the syntax of link abbreviations mostly comes from
Markdown, except label syntax comes from AFT, and automatic
labeling of headings is a great idea stolen from ReST.
- syntax for line breaks
- From an earlier project of modifying PikiPiki, which got it from
LaTeX and poetry.
- syntax for footnotes
- Mostly plaintext tradition, and the act of noticing that
footnotes are like parentheses, only stronger.
- floats
- LaTeX. But making margin paragraphs floats, and the float type
resolution system, are my additions, and I think they are very
good.
- syntax for tables
- LaTeX. This, too, I deem one of the big advantages of Stx over
other rich structured texts: you don't have any weird syntax
for tables, you just write them as they seem natural. Very
nice. The || marker is a natural extension of the // syntax,
and sure looks better (and uses less syntax space) than &.
- paragraph system
- Plaintext tradition, LaTeX, troff. I came up with the idea
that sometimes even a lone newline might be relevant markup
(this is not used very much in Stx). It would also make sense
to translate paragraphs to different stuff based on context —
for example, within tables, into table groups.
- block system
- numbered list syntax, heading syntax
- From my work on PikiPiki. These were tried out and found good.
- preformatted (literal) block syntax
- From PikiPiki. This, too, is a great advantage, because it
allows us to write e.g. program code as in a normal source file.
Indented literal blocks are good for short passages, but hell
for proper literate programming etc.
- literate programming
- Donald Knuth.
- regression testing
- tee-hee…
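
The idea of processing blocks line by line, described under the indentation item above, can be illustrated with a small sketch. This is not how stx2any is implemented (that is sed and m4) and it does not use real Stx syntax. It assumes a toy markup in which "- " starts a list item, deeper-indented lines continue it, and empty lines do nothing except separate paragraphs. The names indent_of and parse_blocks are made up for this illustration.

```python
def indent_of(line):
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def parse_blocks(text):
    """Walk the text one line at a time and return (kind, text) events.

    Toy markup assumed for this sketch (not real Stx):
      "- foo"     a list item at its own indentation level
      empty line  ends the current paragraph, nothing else
      other text  starts or continues a paragraph
    """
    events = []
    open_lists = []       # indents of the lists that are currently open
    in_paragraph = False  # True while consecutive text lines belong together

    for line in text.splitlines():
        if not line.strip():
            # An empty line only ends the current paragraph.
            in_paragraph = False
            continue

        indent = indent_of(line)
        body = line.strip()
        is_item = body.startswith("- ")

        # Indentation alone closes lists: a shallower line, or a non-item
        # line at the marker's own indent, ends the innermost open list.
        while open_lists and (
            indent < open_lists[-1]
            or (indent == open_lists[-1] and not is_item)
        ):
            open_lists.pop()
            events.append(("list_end", ""))
            in_paragraph = False

        if is_item:
            if not open_lists or indent > open_lists[-1]:
                open_lists.append(indent)
                events.append(("list_start", ""))
            events.append(("item", body[2:]))
            in_paragraph = True  # deeper-indented lines continue this item
        else:
            if not in_paragraph:
                events.append(("para_start", ""))
                in_paragraph = True
            events.append(("text", body))

    # Close whatever is still open at the end of the input.
    for _ in open_lists:
        events.append(("list_end", ""))
    return events
```

Calling parse_blocks on a text where an unindented line directly follows a list item shows the point made above: the list is closed by the indentation of that line, so no empty line is needed at the end of the list, and empty lines are left to mark paragraphs only.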