Stx2any

This page discusses the piece of software called stx2any. I haven't been adding hordes of new features to it, but I (and other people) continue to use it, so I'm maintaining it regularly. To see what has changed, look at the changelog (http://sange.fi/~atehwa/Stx/debian/changelog).

What is it?

stx2any is a conversion utility from structured text (Stx) into HTML, man, LaTeX and DocBook XML formats. It comes with an emacs mode (extension to the text editor emacs, for authoring documents in Stx).

What is structured text?

Structured text (Stx) is a kind of plaintext with unambiguous markup for formatting constructs. It is designed to adhere to the tradition and readability of plaintext, with _emphasis_ marked by underscores, paragraph breaks by empty lines, and so on.

Stx is meant to be as good an authoring format as possible. It combines the benefits of structured markup (mid-level semantic markup such as DocBook or HTML) with the ease and readability of plaintext.

Here is an example of structured text:

What, when, *why*?  This is
the important question.  You
should ponder it thoroughly.

Otherwise, the consequences
will be... dire.

When converted to HTML, it will look like this (depending on the stylesheet):

What, when, why? This is the important question. You should ponder it thoroughly.

Otherwise, the consequences will be... dire.

For a longer and more complicated example, see http://sange.fi/~atehwa/Stx/examples/Stx-doc.html .
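
To make the idea behind the short example concrete, here is a minimal sketch of a toy converter. It is not stx2any and does not reflect how stx2any is implemented; it only handles underscore emphasis, asterisk-marked text (assumed here to mean strong emphasis) and blank-line paragraph breaks. The function name toy_stx_to_html is made up for this illustration.

```shell
#!/bin/sh
# Toy converter: NOT stx2any, only the conventions described above.
#   *text*     -> <strong>text</strong>  (assumed meaning of asterisks)
#   _text_     -> <em>text</em>
#   blank line -> paragraph break
toy_stx_to_html() {
    sed -e 's|\*\([^*]*\)\*|<strong>\1</strong>|g' \
        -e 's|_\([^_]*\)_|<em>\1</em>|g' |
    # RS="" puts awk in paragraph mode: one record per blank-line-separated block.
    awk 'BEGIN { RS = "" } { gsub(/\n/, " "); print "<p>" $0 "</p>" }'
}

# Run the sample text through the toy converter.
printf '%s\n' \
    'What, when, *why*?  This is' \
    'the important question.  You' \
    'should ponder it thoroughly.' \
    '' \
    'Otherwise, the consequences' \
    'will be... dire.' | toy_stx_to_html
```

This prints one p element per paragraph. A real converter has to be far more careful: the naive underscore rule above would, for instance, mangle identifiers that contain two underscores.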

What kind of structured text formats are there?

Currently, there are several projects similar to this one, each trying to come up with an optimal format for authoring. The ones I know of are (in order of my preference):

txt2tags (http://txt2tags.sourceforge.net/)
reStructuredText (ReST) (http://docutils.sourceforge.net/rst.html)
Creole (http://www.wikicreole.org/)
deplate (http://deplate.sourceforge.net/)
Grutatxt (http://www.triptico.com/software/grutatxt.html)
Markdown (http://daringfireball.net/projects/markdown/) (see http://www.wilfred.me.uk/blog/2012/07/30/why-markdown-is-not-my-favourite-language/)
Textile (http://www.textism.com/tools/textile/)
Simple Document Format (SDF) (http://cpansearch.perl.org/src/IANC/sdf-2.001/doc/paper/sdfintro.html)
Almost Free Text (AFT) (http://www.maplefish.com/todd/aft.html)
asciiDoc (http://www.methods.co.nz/asciidoc/)
Structured Text, as available in Zope (http://www.zope.org/)
Almost Plain Text (APT) (http://www.xmlmind.com/aptconvert.html)

See below ("How do the different structured texts compare?") for a discussion of differences between these and Stx.

Who should use structured text (and why?)

Stx is meant to be a general-purpose documentation and authoring format. This means that it is intended as a format for writing documents (and that's quite a broad area of application). The benefits of writing your miscellaneous documents in Stx include:

you get to control the exact content of the document. (With ordinary word processors, you never know what exactly your document contains - it might have remnants of previous versions, some invisible formatting, etc.)
your documents are readily convertible to many different media, such as web publishing, printing on paper, and sending as e-mail.
your documents become eternal, not depending on any particular program or version of a program. Any text editor will do.
Stx is faster to write than conventional markup languages or text in a word processor. After a while, you'll probably feel relieved at not having to meddle with formatting.

There is a provocative essay about word processing vs. text processing at http://ricardo.ecn.wfu.edu/~cottrell/wp.html .

What are the design goals?

The design goals of Stx include:

simple, intuitive, traditional syntax
unrendered, it should look aesthetic and concise
the markup should be pretty powerful, i.e. you should be able to use Stx to make relatively complex documents.
it should not get in your way: it should allow for expansion with your own markup, as well as writing inline destination format (e.g. HTML) markup in the document

The design goals of stx2any include:

clean, small and moderately efficient
to produce clean, simple output that is preferably typical of the output format
to preserve the original document's formatting when it comes to whitespace
to provide means for extension by, for instance, add-on macro packages and new output formats.

Where is this stuff documented?

http://sange.fi/~atehwa/Stx/README.html contains pointers to the documentation.

Stx is documented in:

stx2any is documented in:

The internals of stx2any are documented in:

Where can I download the software?

That's http://sange.fi/~atehwa/Stx.tar.gz .

John Magolske has also made a Vim file mode for stx2any (thanks!), which is available at http://b79.net/code/stx2any.vim .

What does it need to run?

stx2any is a typical Unix program. I reckon it will run under Cygwin; it will require sh (or a sh-compatible shell, such as ksh or bash), sed, and m4. All these programs are available on your average Unix box. Note that stx2any does not require a C compiler to build (it isn't written in C).

To use the emacs editing mode, you need emacs (doh). Conversion to plaintext uses w3m; conversion to XHTML uses tidy; and html2stx is written in Python.
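
A quick way to see whether a machine has the tools listed above is a small check script. This is only a sketch: the function name check_tools is made up for this illustration, and the Python interpreter needed by html2stx is left out because its executable name varies between systems.

```shell
#!/bin/sh
# Print one line per tool and return non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

echo "Required by stx2any itself:"
check_tools sh sed m4 || echo "=> install the missing tools first"

echo "Optional (plaintext output, XHTML output, emacs mode):"
check_tools w3m tidy emacs || echo "=> some optional features will be unavailable"
```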

How do I install it?

Well,

$ make && su -c 'make install'

or, if you are in a NetBSD box, for example,

# gmake PREFIX=/usr/local EMACSDIR=/usr/pkg/share/emacs/site-lisp install

On a Debian box, you can build a Debian package out of the source. That's

$ fakeroot debian/rules binary && dpkg -i ../stx2any_*.deb

There is also an APT repository available at:

http://sange.fi/~atehwa/ debian/

Add that to your /etc/apt/sources.list and run apt-get update; apt-get install stx2any.

Could you tell me some highlights about Stx and stx2any?

You can extend it in a number of ways. For example, I can easily imagine someone making a webpage templating system upon stx2any. (I might.)
It provides several facilities for writing your own markup: an environment system, a diversion system, counters and such niceties.
It comes with a simple literate programming tool, strip_stx.
It has a utility for converting html to Stx.
You can make slide shows with it.
Constructs that don't have sensible abbreviations are commonly done by m4 markup. For example, you do links by writing: w_link(destination, Link text).
The regular expressions of some formatting are so complicated that old versions of GNU libc practically freeze on them.
It is carefully designed not to irritate you, get in your way or second-guess you.

These are some properties of Stx that are likely to irritate somebody (but have good reasons from my point of view):

Heading syntax (begin lines with exclamation points) and literal block syntax (triple-braces) are unnatural for some.
Link abbreviations (not enabled by default) practically force you to quote _any_ square brackets that are _not_ links.
The fact that we do miscellaneous things with markup like w_something(blah) is bound to irritate someone.
The fact that you can write miscellaneous (HTML, man, LaTeX, DocBook) markup in the document means that you can produce syntactically invalid documents in Stx. (There is a command line option to prevent this.)
Syntactic quoting is done with m4-style `quotes', which is thought ugly by many. But you very seldom need quotes.
Tables don't look very "tablish" in the source.

How do the different structured texts compare?

Let the reader be warned that the following comparison is mostly based on autumn 2005 impressions. It has been updated a couple of times since, and most notes still stand.

(In the following assessments, "extensible" means that users can, within the document, specify their own markup and definitions, even change how the "native" constructs work. That is, extensibility on the document (not implementation) level. For instance, if I want my document to use en dashes instead of em dashes (common in some Finnish literature), I can say define(`w_emdash',`w_endash') to stx2any.)
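
The redefinition mechanism can be tried out with plain m4, without stx2any. In the sketch below, the first two definitions are stand-ins invented for the demonstration (stx2any ships its own definitions of w_endash and w_emdash, which are not reproduced here); only the third line is the redefinition quoted above. The function name expand_demo is likewise made up.

```shell
#!/bin/sh
# Feed a tiny document to plain m4 (not stx2any).
# The first two define() lines are invented stand-ins for this demo;
# the third is the document-level redefinition discussed above.
expand_demo() {
    m4 <<'EOF'
define(`w_endash', `&ndash;')dnl
define(`w_emdash', `&mdash;')dnl
define(`w_emdash', `w_endash')dnl
pages 10 w_emdash 20
EOF
}

if command -v m4 >/dev/null 2>&1; then
    expand_demo
else
    echo "m4 is not installed; nothing to demonstrate"
fi
```

After the redefinition, w_emdash expands to w_endash, which m4 rescans and expands in turn, so the document ends up with the en dash stand-in wherever it asked for an em dash.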

Creole is "markup design done right". It is simple, unobtrusive, practice-backed and quite constrained. Also, they've put goals before principles, principles before specification, and specification before implementation. This makes sense for an application area that has been prototyped over 300 times. They've made some stupid choices in their assessment of markup, such as seemingly treating all wiki engines as equal in spite of their differing popularity, but all in all, they've done great work.

Deplate is about the only one that seems to be able to do almost everything stx2any is able to do. It is extensible, has "enough" conversion targets, is feature-rich, and is written in Ruby. Sadly, it doesn't currently convert to man. It seems to have its own built-in programming language, which is not Ruby[1]. Markup, which is mostly a matter of preference and custom, is somewhat, but not too, different. Emphasis requires two characters, enumerated lists have this '1.' look which I don't like, and tables have to be written "lined up". But chances are that you like exactly those choices.

[1] stx2any's programming language is m4, which is also stx2any's implementation language. deplate also seems to provide escapes to Ruby in a somewhat similar fashion. The unification of extension and implementation language is necessary for documents to be able to change the behaviour of builtin constructs.

Deplate clearly tries to do many things I wouldn't even consider for a program like this: I see structured texts as light wrappers around real formatting engines like browsers or TeX, so things like converting inline images to ascii fall outside the scope of stx2any. Consequently, deplate and its dependencies are quite big compared to stx2any.

Grutatxt and txt2tags, about which I found out rather late, are strikingly similar to stx2any in their "attitude". However, neither of them is as construct-rich as stx2any or as extensible. txt2tags has more target formats than stx2any, though.

txt2tags has been around for quite a long time, converts to many formats, and seems nice all around. As I understand it, it quotes the input text (as most other tools do), so it's hard to write target language markup if you're only interested in one conversion target. It has some facilities to poke at the internals (options, macros and substitutions), but is not very extensible after all. txt2tags' formatting constructs are even less obtrusive than stx2any's (which means they take a bit more to write). txt2tags comes with a Tk GUI (yecch if you ask me), and is written in Python (good).

Grutatxt produces HTML, man, me (another troff macro package), and LaTeX output. The most important differences are: (1) Grutatxt is not extensible, (2) it has "aesthetic" rather than "handy" markup for tables and headings, (3) it is more minimalistic than stx2any. Grutatxt is written in Perl.

ReST has a kind of serious-document-ish feel to it. It is relatively technically oriented and the conversion utilities are unforgiving of errors, but it is backed up by a strong community and the markup is quite well thought out. It is also extensible -- but not quite as extensible as Stx. ReST states unambiguity as its requirement, which leads to it having tons of syntax. (I don't quite understand what they mean by ambiguity, by the way. Maybe the situation where one construct eats the syntax space of another? But every construct eats the syntax space of ordinary text. Deciding which rule takes precedence is what makes the markup unambiguous again.)

ReST currently converts into HTML and LaTeX, but the LaTeX support is somewhat limited due to its late addition. Overall, ReST has taken curiously long to produce important utilities, given the size of the project (in people). ReST is implemented in Python.

Markdown also feels well thought out. However, it is quite HTML-centric, and not extensible as far as I know. It has many ease-of-use niceties. Markdown is a Perl script, AFAIK. Markdown is nowadays immensely popular, and its lack of extensibility and design is starting to show very badly. Also, it doesn't have markup for description lists, which are one of the coolest constructions around. (This, unlike other things in this comparison, is indisputable. :))

(Markdown has a (Python) utility to produce Markdown documents out of HTML. I expect HTML to become some kind of lingua franca among structured texts in the long run.)

Textile is very interesting. It has a lot of markup for "small but important things", such as acronyms and special characters. Textile is in some ways superior to Markdown, and in some ways, inferior. It is also HTML-centric, and not extensible.

AFT has been around for ages. It is relatively practical and somewhat extensible, but there is a definite arbitrariness to the markup and many wiki-reminiscent markup conventions that are both counterintuitive and ugly to me: dependence on specific indent levels, emphasis by apostrophes. Written in Perl.

asciiDoc seems to have been conjured up as a frontend for writing DocBook documents. For some reason, it does not allow inline DocBook markup, and as DocBook is quite an extensive standard, it has lots of syntax (probably even more than ReST!). The emphasis syntaxes are a little bit weird, and its list markup is not indentation sensitive for some reason -- in practice, you need to indent anyway if you want readable source. Because of all this, it takes up so much syntax space that false positives for markup are quite common.

However, asciiDoc is fairly extensible. It is written in Python. Curiously, the language it gives the user to write extension logic in is not Python but an ad-hoc programming-like language. The configuration for producing documents is not embedded in the documents but put in a separate configuration file, which causes extra hassle when sending documents between authors.

Zope's structured text is about as technically-oriented as ReST, but in addition, it is not as well thought out and not extensible. Moreover, I'm not sure it has command-line support at all. Zope is written in Python.

APT is big and has many features, but its markup is even more arbitrary than AFT's and its output is not as minimalistic as I would like. It is not extensible, as far as I know. Written in Java (ouch by my standards).

Conclusion

The overriding goal of Stx is being a good authoring tool. That's why, for example, headings are denoted by exclamation points instead of putting a line of dashes over/under the heading: it's much nicer to write. That's why you can write inline HTML/man/LaTeX/DocBook in the document. The point is: it should make document writing fast, it should not get in your way, it should let you simply state what you want.

In ReST, unambiguity and generality (and sometimes aesthetics of source text) override this goal. In Markdown, aesthetics of the source text override this goal. Some of the other structured texts have not been very thoroughly thought out at all: just "being better than explicit markup languages" has been enough. Anyway, Stx is the authoring tool of choice for me.


Last modified: 13.05.2020 00:57