
hwechtla-tl: Stx2any



This page discusses the piece of software called stx2any. I haven't been adding hordes of new features to it, but I (and other people) continue to use it, so I'm maintaining it regularly. To see what has changed, look at the changelog (http://sange.fi/~atehwa/Stx/debian/changelog).

What is it?

stx2any is a conversion utility from structured text (Stx) into HTML, man, LaTeX and DocBook XML formats. It comes with an Emacs mode (an extension to the Emacs text editor for authoring documents in Stx).

What is structured text?

Structured text (Stx) is a kind of plaintext with unambiguous markup for formatting constructs. It is designed to adhere to the tradition and readability of plaintext, with _emphasis_ marked by underscores, paragraph breaks by empty lines, and so on.

Stx is meant to be as good an authoring format as possible. It combines the benefits of structured markup (mid-level semantic markup such as DocBook or HTML) with the ease and readability of plaintext.

Here is an example of structured text:

What, when, *why*?  This is
the important question.  You
should ponder it thoroughly.

Otherwise, the consequences
will be... dire.

When converted to HTML, it will look something like this (depending on the stylesheet):


What, when, why? This is the important question. You should ponder it thoroughly.

Otherwise, the consequences will be... dire.
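To make the conversion above concrete, here is a minimal Python sketch for a tiny subset only: paragraphs are separated by blank lines, wrapped lines are joined, _text_ becomes emphasis and *text* becomes strong emphasis. Mapping asterisks to strong emphasis is an assumption of this sketch, and the function name stx_to_html is hypothetical; stx2any itself is written in sh, sed and m4 and handles far more constructs. In keeping with Stx's goal of allowing inline destination-format markup, the sketch does not escape < or &.

```python
import re

# Toy subset (assumption): _text_ -> <em>, *text* -> <strong>.
EMPHASIS = re.compile(r"_([^_\s][^_]*?)_")
STRONG = re.compile(r"\*([^*\s][^*]*?)\*")


def stx_to_html(source):
    """Convert a tiny Stx-like subset into HTML paragraphs."""
    paragraphs = re.split(r"\n\s*\n", source.strip())  # blank line = paragraph break
    out = []
    for para in paragraphs:
        text = " ".join(para.split())  # join wrapped lines, collapse spacing
        text = STRONG.sub(r"<strong>\1</strong>", text)
        text = EMPHASIS.sub(r"<em>\1</em>", text)
        out.append(f"<p>{text}</p>")
    return "\n".join(out)


example = """What, when, *why*?  This is
the important question.  You
should ponder it thoroughly.

Otherwise, the consequences
will be... dire."""

print(stx_to_html(example))
```

Note that the sketch joins wrapped lines and collapses double spaces, which a real converter aiming to preserve whitespace might handle differently.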


For a longer and more complicated example, see http://sange.fi/~atehwa/Stx/examples/Stx-doc.html .

What kind of structured text formats are there?

Currently, there are several projects similar to this one that try to come up with an optimal authoring format. The ones I know include (in order of my preference):

See below ("How do the different structured texts compare?") for a discussion of differences between these and Stx.

Who should use structured text (and why)?

Stx is meant to be a general-purpose documentation and authoring format. This means that it is intended as a format for writing documents of all kinds (quite a broad area of application). The benefits of writing your miscellaneous documents in Stx include:

There is a provocative essay about word processing vs. text processing at http://ricardo.ecn.wfu.edu/~cottrell/wp.html .

What are the design goals?

The design goals of Stx include:

  1. simple, intuitive, traditional syntax
  2. unrendered, it should look aesthetically pleasing and concise
  3. the markup should be pretty powerful, i.e. you should be able to use Stx to make relatively complex documents.
  4. it should not get in your way: it should allow for expansion with your own markup, as well as writing inline destination-format (e.g. HTML) markup in the document
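Allowing inline destination-format markup is essentially a decision about quoting. The following Python sketch (the render function and its quote_input flag are hypothetical, not part of stx2any) shows the trade-off on one line of input: escaping everything always yields valid HTML but turns inline markup into literal text, while passing the text through keeps the markup working and leaves characters such as a bare & to the author.

```python
import html


def render(text, quote_input):
    """Render one line of text for an HTML target.

    quote_input=True:  escape <, > and & so the output is always valid HTML,
                       but any inline HTML in the source shows up literally.
    quote_input=False: pass the text through untouched so inline HTML works.
    """
    return html.escape(text, quote=False) if quote_input else text


sample = 'Written in <abbr title="structured text">Stx</abbr> & m4'
print(render(sample, quote_input=True))
print(render(sample, quote_input=False))
```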

The design goals of stx2any include:

  1. clean, small and moderately efficient
  2. to produce clean, simple output that is preferably typical of the output format
  3. to preserve the original document's formatting when it comes to whitespace
  4. to provide means for extension by, for instance, add-on macro packages and new output formats.

Where is this stuff documented?

http://sange.fi/~atehwa/Stx/README.html contains pointers to the documentation.

Stx is documented in:

stx2any is documented in:

The internals of stx2any are documented in:

Where can I download the software?

That's http://sange.fi/~atehwa/Stx.tar.gz .

John Magolske has also made a Vim file mode for stx2any (thanks!), which is available at http://b79.net/code/stx2any.vim .

What does it need to run?

stx2any is a typical Unix program. It requires sh (or a sh-compatible shell, such as ksh or bash), sed, and m4, all of which are available on your average Unix box; I reckon it will also run under Cygwin. Note that stx2any does not need a C compiler to build (it isn't written in C).

To use the Emacs editing mode, you need Emacs (doh). Conversion to plaintext uses w3m; conversion to XHTML uses tidy; and html2stx is written in Python.
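A quick way to check these prerequisites is to look each tool up on PATH. This is a hedged Python sketch using the standard shutil.which; the tool lists are taken from the two paragraphs above, except that the executable name python3 for Python is an assumption (some systems call it python).

```python
import shutil

# Needed by stx2any itself.
REQUIRED = ["sh", "sed", "m4"]

# Needed only for particular features (python3 as the executable name is an assumption).
OPTIONAL = {
    "emacs": "the Emacs editing mode",
    "w3m": "conversion to plaintext",
    "tidy": "conversion to XHTML",
    "python3": "html2stx",
}


def missing_tools(names, which=shutil.which):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print(f"required tool not found: {name}")
    for name in missing_tools(OPTIONAL):
        print(f"optional tool not found: {name} (needed for {OPTIONAL[name]})")
```

The which parameter is there only so the lookup can be replaced, for example in a test.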

How do I install it?

Well,

$ make && su -c 'make install'

or, if you are in a NetBSD box, for example,

# gmake PREFIX=/usr/local EMACSDIR=/usr/pkg/share/emacs/site-lisp install

On a debian box, you can build a debian package out of the source. That's

$ fakeroot debian/rules binary && su -c 'dpkg -i ../stx2any_*.deb'

There is also an APT repository available at:

http://sange.fi/~atehwa/ debian/

Add it to your /etc/apt/sources.list as a deb line (deb http://sange.fi/~atehwa/ debian/), then run apt-get update and apt-get install stx2any.

Could you tell me some highlights about Stx and stx2any?

These are some properties of Stx that are likely to irritate somebody (but have good reasons from my point of view):

How do the different structured texts compare?

Let the reader be warned that the following comparison is based mostly on impressions from autumn 2005.

(In the following assessments, "extensible" means that users can, within the document, specify their own markup and definitions, even change how the "native" constructs work. That is, extensibility on the document (not implementation) level. For instance, if I want my document to use en dashes instead of em dashes (common in some Finnish literature), I can say define(`w_emdash',`w_endash') to stx2any.)
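The define(`w_emdash',`w_endash') trick works because m4 rescans a macro's expansion: w_emdash now expands to the text w_endash, which is itself a macro and gets expanded in turn. The following Python sketch imitates that behaviour with a macro table the document can modify. It is a simplification (real m4 has quoting, arguments and other rules), the function name expand is hypothetical, and the HTML entity expansions are assumptions rather than stx2any's actual definitions.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text, macros, depth=0):
    """Replace each known macro name with its body, then rescan that body."""
    if depth > 100:
        raise RecursionError("macro expansion nested too deeply")

    def substitute(match):
        name = match.group(0)
        if name in macros:
            # Rescanning: the body may itself contain macro names.
            return expand(macros[name], macros, depth + 1)
        return name

    return NAME.sub(substitute, text)


# Hypothetical HTML expansions of the two dash macros.
macros = {"w_emdash": "&mdash;", "w_endash": "&ndash;"}
print(expand("now w_emdash then", macros))  # now &mdash; then

# The document redefines a built-in construct, as in the m4 example above.
macros["w_emdash"] = "w_endash"
print(expand("now w_emdash then", macros))  # now &ndash; then
```

Because the macro table is shared between the "implementation" and the document, the document can change how a built-in construct renders, which is the point made in footnote [1].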

Deplate is about the only one that seems able to do almost everything stx2any can do. It's extensible, has "enough" conversion targets, is feature-rich, and is written in Ruby. Sadly, it currently doesn't convert into man. It seems to have its own built-in programming language, which is not Ruby[1]. Its markup, which is mostly a matter of preference and custom, is somewhat, but not too, different: emphasis requires two characters, enumerated lists have this '1.' look which I don't like, and tables have to be written "lined up". But chances are that you like exactly those choices.

[1] stx2any's programming language is m4, which is also stx2any's implementation language. deplate also seems to provide escapes to Ruby in a somewhat similar fashion. Unifying the extension and implementation languages is necessary for documents to be able to change the behaviour of built-in constructs.

Deplate clearly tries to do many things I wouldn't even consider for a program like this: I see structured texts as light wrappers around real formatting engines like browsers or TeX, so things like converting inline images to ASCII fall outside the scope of stx2any. Consequently, deplate and its dependencies are quite big compared to stx2any.

Grutatxt and txt2tags, about which I found out rather late, are strikingly similar to stx2any in their "attitude". However, neither of them is as construct-rich as stx2any or as extensible. txt2tags has more target formats than stx2any, though.

txt2tags has been around for rather long, converts to many formats, and seems nice all around. As I understand it, it quotes the input text (as most other tools do), so it's hard to write target-language markup if you're only interested in one conversion target. It has some facilities to poke at the internals (options, macros and substitutions), but is not very extensible after all. txt2tags' formatting constructs are even less obtrusive than stx2any's (which means they're a bit more to write). txt2tags comes with a Tk GUI (yecch if you ask me), and is written in Python (good).

Grutatxt produces HTML, man, me (another troff macro package), and LaTeX output. The most important differences are: (1) Grutatxt is not extensible, (2) it has "aesthetical" rather than "handy" markup for tables and headings, (3) it is more minimalistic than stx2any. Grutatxt is written in Perl.

ReST has a kind of serious-document-ish feel to it. It is relatively technically oriented and the conversion utilities are unforgiving of errors, but it is backed by a strong community and the markup is quite well thought out. It is also extensible -- but not quite as extensible as Stx. ReST states unambiguity as a requirement, which leads to it having tons of syntax. (I don't quite understand what they mean by ambiguity, by the way. Maybe the situation where one construct eats the syntax space of another? But every construct eats the syntax space of ordinary text. Deciding which rule takes precedence is what makes the markup unambiguous again.)
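The point about precedence can be shown with a small Python sketch. The rule names and patterns are hypothetical (they are not ReST's or Stx's actual rules): a sentence that happens to start with a year followed by a full stop looks like an enumerated list item, so that rule eats part of ordinary text's syntax space, and only the order in which rules are tried decides what the line means.

```python
import re

# Hypothetical block rules, tried in order; the first match wins.
# The catch-all "paragraph" rule matches every line, so it must come last.
RULES = [
    ("enumerated_item", r"\d+\. "),
    ("paragraph", r""),
]


def classify(line, rules=RULES):
    """Return the name of the first rule whose pattern matches the start of line."""
    for name, pattern in rules:
        if re.match(pattern, line):
            return name
    raise ValueError("no rule matched")


line = "1984. It was a cold year."
print(classify(line))             # enumerated_item
print(classify(line, RULES[::-1]))  # paragraph
```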

ReST currently converts into HTML and LaTeX, but the LaTeX support is somewhat limited due to its late addition. Overall, ReST has taken curiously long to produce important utilities, given the size of the project (in people). ReST is implemented in Python.

Markdown also feels well thought out. However, it is quite HTML-centric, and not extensible as far as I know. It has many ease-of-use niceties. Markdown is a Perl script, AFAIK.

(Markdown has a (Python) utility to produce Markdown documents out of HTML. I expect HTML to become some kind of lingua franca among structured texts in the long run.)
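Converting HTML back into structured text can be surprisingly small for simple documents. The following Python sketch uses the standard html.parser module and handles only p, em and strong; it is a hypothetical illustration, not html2stx or Markdown's utility, and it reuses the Stx conventions of underscores for emphasis and (as an assumption) asterisks for strong emphasis.

```python
from html.parser import HTMLParser


class ToStx(HTMLParser):
    """Toy HTML-to-Stx converter for <p>, <em> and <strong> only."""

    MARKS = {"em": "_", "strong": "*"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.current = []

    def flush(self):
        """Finish the current paragraph, collapsing whitespace."""
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKS:
            self.current.append(self.MARKS[tag])
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.MARKS:
            self.current.append(self.MARKS[tag])
        elif tag == "p":
            self.flush()

    def handle_data(self, data):
        self.current.append(data)


def html_to_stx(source):
    parser = ToStx()
    parser.feed(source)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.paragraphs)  # blank line between paragraphs


print(html_to_stx("<p>What, when, <strong>why</strong>?</p>\n<p>It is <em>dire</em>.</p>"))
```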

Textile is very interesting. It has a lot of markup for "small but important things", such as acronyms and special characters. Textile is in some ways superior to Markdown, and in some ways, inferior. It is also HTML-centric, and not extensible.

AFT has been around for ages. It is relatively practical and somewhat extensible, but there is a definite arbitrariness to the markup, and many wiki-reminiscent markup conventions that are both counterintuitive and ugly to me: dependence on specific indent levels, emphasis by apostrophes. Written in Perl.

asciiDoc seems to have been conjured up as a frontend for writing DocBook documents. For some reason, it does not allow inline DocBook markup, and as DocBook is quite an extensive standard, it has lots of syntax (probably even more than ReST!). The emphasis syntaxes are a little bit weird, and its list markup is not indentation sensitive for some reason -- in practice, you need to indent anyway if you want readable source. Because of all this, it takes up so much syntax space that false positives for markup are quite common.

However, asciiDoc is fairly extensible. It is written in Python. Curiously, the language it gives the user to write extension logic in is not Python but an ad-hoc programming-like language. The configuration for producing documents is not embedded in the documents but put in a separate configuration file, which causes extra hassle when sending documents between authors.

Zope's structured text is about as technically-oriented as ReST, but in addition, it is not as well thought out and not extensible. Moreover, I'm not sure it has command-line support at all. Zope is written in Python.

APT is big and has many features, but its markup is even more arbitrary than AFT's and its output is not as minimalistic as I would like. It is not extensible, as far as I know. Written in Java (ouch by my standards).

Conclusion

The overwhelming goal of Stx is being a good authoring tool. That's why, for example, headings are denoted by exclamation points instead of putting a line of dashes over/under the heading: it's much nicer to write. That's why you can write inline HTML/man/LaTeX/DocBook in the document. The point is: it should make document writing fast, it should not get in your way, it should let you simply state what you want.

In ReST, unambiguity and generality (and sometimes aesthetics of source text) override this goal. In Markdown, aesthetics of the source text override this goal. Some of the other structured texts have not been very thoroughly thought out at all: just "being better than explicit markup languages" has been enough. Anyway, Stx is the authoring tool of choice for me.

category: projects




(last modified 13.10.2011 09:15)