Stx markup reference

1 Preliminary note
2 Command line options
3 Formatting markup
3.1 Markup with abbreviations
3.1.1 Structural markup
3.1.2 Special characters
3.1.3 Other markup with abbreviations
3.2 Link abbreviations
3.3 Macro-based markup
3.3.1 Macros for document metadata
3.3.2 Macros for links and references
3.3.3 Diversions
3.3.4 Whitespace
3.4 Environments
4 Extension markup
4.1 Definitions
4.2 Counters
4.3 Hooks
5 Quoting
5.1 m4 quoting and abbreviation quoting
5.2 Quote quoting
5.3 Output format quoting
6 Compatibility

1 Preliminary note

Please note that reading this document is not the proper way to learn Stx. You should never spend time learning tools you don't have to learn. Just start writing Stx (probably with the help of the Stx quickie guide), and check this document if and when the need to do something complicated arises.

This document is arranged roughly by how often the material is useful: frequently needed things come first, less frequently needed ones later.

2 Command line options

These are documented on the manual page of stx2any.

3 Formatting markup

3.1 Markup with abbreviations

3.1.1 Structural markup

empty line
w_paragraph
causes a paragraph break
/text/, _text_
w_techemph(text), w_emph(text)
normal emphasis. Slashes (/) produce technical emphasis, underscores (_) semantic emphasis.
*text*
w_strong(text)
strong emphasis
''text''
w_literal(text)
literal formatting (short excerpts of text copied from somewhere, some program names, commands, abbreviations, etc.)
[[ text ]], [[-text-]]
w_footnote(text)
footnotes
// (at the end of line)
w_linebr
hard linebreaks
--, ---, ----, … (alone on a line)
w_sectbreak(number)
a section break (transition). The number (of dashes) determines the strength of the section break.
{{{, }}} (alone on a line)
begin and end a preformatted block, respectively. In preformatted blocks, all kinds of other constructs lose their special meaning.
!, !!, !!!, … (at the beginning of a line)
w_headl(number, text)
causes the line to be made into a heading. The number (of exclamation points) determines the level of the heading.
*, - (at the beginning of a line, with optional indentation)
a list item
# (at the beginning of a line, with optional indentation)
a numbered list item
:: (at the end of a line, line optionally indented)
makes the line a definition term. The corresponding definition follows on the next lines, indented.

More indentation causes lists to become nested. If a paragraph is indented without being on the top level of a list item, it is made into a citation block.
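
As an illustration, the constructs above could be combined as in the following sketch. It assumes that heading and list markers are followed by a space and that two or more spaces count as indentation; neither detail is spelled out above, and the text itself is made up.

```stx
! Release notes

This release is *much* faster. See ''README'' for the _reasons_.

    An indented paragraph outside any list becomes a citation block.

- first point
- second point
  # a numbered item nested under the second point
  # another one

Term::
  The definition of the term, indented on the following line.

{{{
*not* emphasised inside a preformatted block
}}}
```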

3.1.2 Special characters

"text"
w_quotation(text)
quotation marks
item--item, text -- text
w_endash, w_emdash
en dashes and em dashes
text...
w_ellipsis
ellipsis (a pause or omission marker)
text -> text, text <- text
w_rarrow, w_larrow
right and left pointing arrows
(c) text
w_copyrightsign
a copyright sign
(tm)
w_trademarksign
a trademark sign

3.1.3 Other markup with abbreviations

Link abbreviations are discussed in a separate section.

At the beginning of a file (before the first empty line in the document), you can give document metadata by writing lines of the form metadata_type: value. For example, put the following lines at the beginning of a file:

title: Very important document
author: me
language: english

All metadata can be set in this way. The different kinds of metadata are listed in the section Macros for document metadata; just leave off the w_ prefix.

3.2 Link abbreviations

You can use link abbreviations if you request them separately.[1] If you enable them, almost all brackets become special. Link abbreviations require two-pass processing for indirect link references and automatic labeling, so they are not well suited for using stx2any as part of a pipeline. However, stx2any will do what it can if it gets link abbreviations from the standard input.

[1] stx2any has a command line switch to activate link abbreviations.

There are also indirect link references, which are gathered in separate link data blocks. Link data blocks do not affect the rendering of your document in any way except by providing information for linking. You can group them e.g. at the beginning of your document, at the end, or after every paragraph (which I think looks the neatest).

[+some text+]
This makes a label. The text some text is inserted at this point, and the label can be later referenced by the name some text.

When link abbreviations are enabled, headings also produce labels automatically.

http://nice-to-meet-you/blah/blah
URLs are marked as URLs without any special syntax. Recognised URL schemes are http, https, ftp, ftps, gopher, file, nntp, mailto and news.
text[link-id]
[text][link-id]
Produces a link to link-id with text. In the first form, text cannot contain spaces, whereas in the second form it can. The interpretation of link-id is explained below.
[link-id]
Produces a nameless link to link-id. The text for the link is the label text for labels, the foreign document title (or filename) for cross links, the URL itself for URLs, and an empty string for the alternative text of inline images.
[link-id] another link-id
put in separate paragraphs (link data blocks), lines like these make indirect link references, declaring link-id to be equal to another link-id.

For every link-id, different interpretations are tried in order until an appropriate one is found. This is the order:

  1. if it is an indirect link reference, it is rewritten and the new link id is tried;
  2. the link is made a cross reference, if there is a label corresponding to the link id;
  3. the link is made a cross link (between documents), if there is a document with corresponding document id or file name;
  4. if the link id begins with img:, the link is made into an inline image to the file given by the rest of the link id (+ a suffix; see the documentation of w_img);
  5. the link is made into an ordinary link if the link id looks like a URL;
  6. if all else fails, the link id just becomes footnote text at the point where the link was. But the footnote text has to originate from an indirect link reference. This is to protect a miswritten URL, cross reference or link tag from becoming a footnote by accident.
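
A small sketch of how these forms might fit together, assuming link abbreviations have been enabled on the command line. The label text, the link id home and the example.org URL are placeholders.

```stx
[+Installation+]

Fetch the sources from [the project page][home], then return to [Installation].

[home] http://example.org/stx/
```

Going by the order above, home should first be rewritten through the indirect link reference and then become an ordinary link, because the result looks like a URL, while Installation should match the label and become a cross reference.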

3.3 Macro-based markup

All Stx macros begin with a prefix w_. Normal m4 macros are also available, with the exception of GNU m4 format, which is too common a word to be left as a macro.

w_beg(env [, env args]), w_end(env)
begin and end the environment env, respectively. A list of available environments is in the next section. Some environments take arguments (env args above); these are described with the environments.
w_use(package-or-file), include(file)
use the definitions in package in the document, or include the contents of file in the input. These two are almost the same thing, but w_use makes sure that the file is only included once, and adds the suffix .m4 to its name. Use w_use when in doubt.

The included file is not subject to Stx abbreviations, but goes through m4 processing. Included files are a good way to add templates to your HTML pages (just dump some HTML markup into the diversions frontmatter and backmatter), to share content between documents, or to insert long sections of content where abbreviation processing is not to take place. If you want to concatenate many Stx documents, it's better to give them all as arguments to stx2any; that way they go through abbreviation processing.

w_man_desc([name,] short description)
This helper macro is for writing properly formatted NAME sections for man. The calls should be at the beginning of your man page, one per line. name defaults to w_title, and on most pages, you need only one w_man_desc.

3.3.1 Macros for document metadata

w_title(text)
w_title
w_gettitle
set or get the title of the document. If the argument text is present, the title of the document is set to text; if it is absent, the macro expands to the (previously set) title. w_gettitle is the old name for w_title without arguments.
w_doc_id(text)
set the unique id of the document. This can be used to refer to the document; currently w_crosslink supports it.
w_char_coding(charset [, long-charset-name])
declare the input text to be in character coding charset. Supported values are (currently) ascii, latin1, latin9 and utf8. The default is utf8.[2] The man output format is unable to carry this piece of metadata.

[2] This has been changed; the default was latin1 up to and including stx2any 1.57.

The optional parameter long-charset-name can be used to make documents in character sets that are not natively supported. The charset is used for LaTeX (and possibly internally); long-charset-name is used for HTML and DocBook.

w_author(text)
w_author
set or get the author of the document.
w_date(text)
w_date
w_getdate
set or get the date of modifying / releasing the document. w_getdate is the old name for w_date without arguments. Note that Stx provides no magic for managing modification dates; it is up to you to keep the date correct, fetch it automatically from the file system, or to use e.g. features provided by version control systems to manage it. The meaning of the date of a document is somewhat ambiguous; as a consequence, Stx doesn't try to guess what you use it for.
w_section(number)
for man pages, set the section into which the man page belongs
w_language(language [, langcode])
set the primary language the document is written in. language is the LaTeX-style, full language name (all lowercase); langcode is the ISO-style, two-letter language code, and defaults to the first two letters of language. (Hey, I had to come up with something!)
w_documentclass(class)
w_documentclass
(LaTeX only) set or get the document class of the document. (Default: article)
w_slideheader(text)
w_slidefooter(text)
Set the header and footer text, respectively, for slides, if you use the slide environment.
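
The metadata header shown in the section Other markup with abbreviations could presumably also be written with the macros themselves. This is only a sketch: the values are placeholders, and dnl (see the section Whitespace) is used on the assumption that the setter macros leave nothing behind but their trailing newline.

```stx
w_title(`Very important document, second draft')dnl
w_author(me)dnl
w_language(english)dnl
```

An argument that contains a comma, like the title here, has to be m4 quoted, because m4 otherwise splits it into two arguments. With only one argument, w_language should derive the language code en from the first two letters of english.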

3.3.2 Macros for links and references

These macros have shorthands in Link abbreviations.

w_img(basename, text)
put in markup for including an inline image in the document. The name of the image is given by appending a dot (.) and a suffix to the given basename. This allows you to automatically produce the pictures for different formats: no picture format works for every output format. The suffix is given by a command line option; see the manual page of stx2any. The base of a relative filename can be altered by defining w_base.

The text in the second parameter is always displayed as an alternative to the image, never in addition to it.

w_link(url, text)
produce a link to url
w_url(url)
put url into the document. In HTML, it also becomes a link.
w_crosslink(doc-id [, text])
produce a cross-link from this file to the other document. doc-id is the target document's unique id (as set by w_doc_id in that document) or, failing that, its filename. The link text will be the destination document's title, as gathered by gather_stx_titles or, if it has no title, its filename. The base of the relative link can be altered by defining w_base; use it if your documents are in different directories.

If the second argument is present, its text will be used as the link text instead of the destination document's title / filename.

w_label(label, text)
produce a label (a place that can be referenced later)
w_refer(label, text)
produce a cross reference to the label label
w_autolabel(text)
w_autorefer(text, link text)
The same as w_label and w_refer, except that the label is automatically generated from text. If link text is not present, text is used for the link text.
w_index(div, [ marker, ] text)
put the given text into diversion div (the index) as well as at the current point in document. The current point is cross referenced from the index. This command is useful for creating lists-of-tables and such stuff, but is not very well thought out yet. The text of marker, if given, is put into the index but not made part of the link.
w_indexword(div, word)
mark the word word so that it always produces an index entry in div wherever it occurs in the text
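
A sketch of the explicit macros in use. The label name results, the image basename pipeline and all texts are made up; the image file actually referenced would be pipeline plus a dot and the suffix chosen on the command line.

```stx
w_label(results, Results)

The numbers are discussed under w_refer(results, the results section).

w_img(pipeline, Diagram of the conversion pipeline)
```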

3.3.3 Diversions

w_begdiv(div), w_enddiv(div)
begin and end outputting text into diversion div, respectively. Diversions can be used for rearranging input. The following diversions are currently used:
body
Body text. This is the default diversion: all text initially goes into the body.
ingr
Ingress. This diversion is placed immediately under the document title.
frontmatter
placed before any other content in the document. Can be used e.g. for making document templates.
backmatter
placed after any other content in the document.
metas
placed in the header of HTML documents. You need this if you want to e.g. include a stylesheet in your document.
preamble
placed in the preamble of LaTeX documents. This is a good place for your own LaTeX declarations, importing packages, etc.
defs
this is the trashcan diversion. It is useful for including stx-level comments, making macro definitions without producing extra whitespace in the output, and other things like that.
footnote
this diversion is used internally for gathering footnotes in output formats that don't natively support them.
w_dumpdiv(div)
place text gathered thus far in diversion div at this spot.
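
A sketch of two common uses, based only on the descriptions above: the defs diversion as a place for comments and definitions, and backmatter for text that should come last. The macro projname is a made-up example.

```stx
w_begdiv(defs)
Text written here is thrown away, so it can serve as a comment.
define(`projname', `Stx')
w_enddiv(defs)

w_begdiv(backmatter)
End of the projname manual.
w_enddiv(backmatter)
```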

3.3.4 Whitespace

dnl
Eats everything up to and including following newline. Good for placing comments in the document and deleting spurious newlines from the output.
w_nl
Causes a newline in the output.

3.4 Environments

Environments must be properly nested, i.e. w_beg(foo) must be closed by w_end(foo). Moreover, these abbreviated constructs are internally environments and must therefore be properly nested with explicit environments:

abstract
the abstract of the document. This text is placed in the diversion ingr.
admonition
Admonitions are short notes that should stand out from the rest of the text. They are usually one or a few paragraphs long. The environment takes a parameter, the admonition type; for example, w_beg(admonition, Note). Special admonition types supported by DocBook XML are Note, Tip, Warning, Caution, and Important.
center
the text in the environment becomes centered.
citation
This is similar to an ordinary citation block (which is indicated by mere indentation), but has an attribution (who said the quoted stuff) given as a parameter, e.g. w_beg(citation, Chuck Moore).
comment
The text in the environment becomes commented in the output, i.e. it becomes a comment in whatever language the output language is. This environment can be used in the middle of a line.
compactlist
This environment makes a list without indents or list markers. Every line becomes one item in the list. The reason this type of list has such a long name is that it should not be used except in special cases: most output formats do not have semantic markup for this kind of list, and a list that does not have the ordinary look of a list is somewhat confusing. The environment is mostly useful for building navigation menus etc. where space is a scarce resource and putting list markup in is null-semantic.
float
Floats are blocks of text or other content that are separated from the normal flow of body text. They are used for lengthy tables, figures and pictures, and notes that relate generally to the subject at hand. This environment makes the enclosed text a float.

The environment takes two parameters, as in: w_beg(float, pos, caption). The first parameter pos determines the placement of the float and is composed of one-character placement hints, in order of preference of the placement of the float. (First character tells the most preferred placement and so on.) Some placements may not be available in some formats. The meanings of the characters are as follows:

h
place the float here, in the running text.
n
place the float near, for example after the paragraph or on the same page.
f
place the float far. It may be e.g. several pages away.
m
place the float in the margin. This is a way to make margin notes.

The second parameter, caption, tells the caption text of the float, if any.
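
For instance, a float that should preferably appear here, or failing that near this point, might be written as follows. The caption and the content are placeholders.

```stx
w_beg(float, hn, Disk usage by directory)
Text, a table or a picture goes here.
w_end(float)
```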

ifeq
The text in the environment is only inserted if the (two) environment parameters match. This environment can be used in the middle of a line.

For example, the following only puts the included text in when producing a LaTeX document: w_beg(ifeq, w_outputfmt, latex) text… w_end(ifeq)

slide
To make slides, divide the text into slide environments following each other. You can set the slide header and footer with w_slideheader and w_slidefooter. For LaTeX, the slide show is implemented with the seminar document class.

For HTML, the result is an S5 slide show, about which you can read more at http://www.meyerweb.com/eric/tools/s5/. You need to fetch the style sheets and javascript file yourself. You can define w_s5url to specify the directory where S5 files reside (default is ui/default/).

For man and DocBook XML, the slides simply suck.

table
The text in the environment is a table with columns separated by || and rows closed by // at the end of a line.

Column types are given as parameters to the table environment, as in: w_beg(table, r, p). The meanings of column types are as follows:

l
produce a left-aligned column.
r
produce a right-aligned column.
c
produce a centered column.
p
produce a column where text may be put on multiple lines.
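
A minimal sketch of a two-column table with a left-aligned and a right-aligned column. The cell contents are made up, and the spaces around || and before // are an assumption made for readability.

```stx
w_beg(table, l, r)
Item || Price //
Coffee || 2.50 //
Bun || 1.20 //
w_end(table)
```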

4 Extension markup

All macros beginning with a w_ (or @w_) prefix are reserved for stx2any. You can redefine them, of course, if you want to change the operation of stx2any. For your own macros, you can choose your own prefix or use macros without a prefix at all. Environments, diversions and counters have their own namespaces under @w_, and you should not worry about them.

4.1 Definitions

define(`macro', `expansion')
define your own macro. Further occurrences of macro will be replaced by the expansion.
w_def_in_fmt(format, `macro', `expansion')
define a macro, but only for output format format. In addition to making output format specific macros, you can use this to add some neatness that can be expressed only in some output formats to a word: for example, w_def_in_fmt(docbook-xml, Dr, <Honorific> ``Dr''</Honorific>)
w_define_env(environment, `beginstuff', `endstuff')
define a new environment. beginstuff will be executed at the beginning of the environment, and endstuff will be executed at the end. Formal parameters ($1 etc) will give the environment parameters in both beginstuff and endstuff.
w_derive_env(env, base-env, num, `pre-begin', `post-begin', `pre-end', `post-end')
this command is deprecated. It will continue to work, but there is no need for it.
w_define_div(diversion)
declare a new diversion for gathering text. After declaration, you can use w_begdiv, w_enddiv and w_dumpdiv on that diversion.
w_outputfmt
this macro has expansion html, man, docbook-xml or latex depending on which output format we're converting to. You can use it for writing your own format-aware macros.
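
The following sketch defines a small environment and a new diversion, going only by the descriptions above. The names remark and todo and all texts are made up, and it is an assumption that $1 in beginstuff receives the first argument given after the environment name in w_beg.

```stx
w_define_env(remark, `Remark by $1: ', `')dnl
w_define_div(todo)dnl

w_beg(remark, the editor)
This section still needs a diagram.
w_end(remark)

w_begdiv(todo)
Draw the diagram for the section on floats.
w_enddiv(todo)

w_dumpdiv(todo)
```

The text gathered in todo should appear only where w_dumpdiv is called, not where it was written.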

4.2 Counters

w_newcounter(counter [, refcounter])
w_setcounter(counter, value)
w_delcounter(counter)
create, set the value of, or delete counter, respectively. If a counter has a refcounter (as specified in w_newcounter), the refcounter is reset every time the counter changes value.

Calls to w_newcounter and w_delcounter can be nested for a counter. w_newcounter resets the counter to zero and w_delcounter returns the counter to the value it had before w_newcounter.

w_stepcounter(counter [, step])
Increment the value of counter by step. If the step argument is not present, the counter is incremented by 1.
w_counter_arabic(counter)
w_counter_alpha(counter)
w_counter_Alpha(counter)
give the value of counter as an arabic number, a lowercase letter, or an uppercase letter, respectively.
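
A sketch of a simple running number, assuming a made-up counter called example. Since w_newcounter resets the counter to zero and w_stepcounter adds one by default, the two paragraphs should come out numbered 1 and 2.

```stx
w_newcounter(example)dnl
w_stepcounter(example)dnl
Example w_counter_arabic(example) shows the simple case.

w_stepcounter(example)dnl
Example w_counter_arabic(example) shows a harder one.
w_delcounter(example)dnl
```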

4.3 Hooks

pushdef(`macro', `expansion')
popdef(`macro')
temporarily change the definition of a macro. pushdef gives the macro a new definition, popdef sets it back to the old one. define always changes the most recent definition, never the earlier ones.
w_push_env(environment)
w_pop_env(environment)
temporarily change the definition of environment. (This is not for the faint of heart.) w_push_env changes the environment to a null environment that does nothing; after that, you can change the definition with w_define_env or w_derive_env. w_pop_env reinstates the old definition.
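
The pushdef and popdef pair can be illustrated with an ordinary user macro; greeting is a made-up name. This relies only on standard m4 behaviour: the first sentence should end in goodbye and the second in hello.

```stx
define(`greeting', `hello')dnl
pushdef(`greeting', `goodbye')dnl
The inner definition says greeting.
popdef(`greeting')dnl
The outer definition says greeting.
```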

The following macros are good candidates for temporary redefinition.

w_softbr
called at the beginning of an ordinary text line.
w_eline
called at the end of a text line.
w_softpara
called at empty lines.
w_linebr
called for hard linebreaks (//), at the end of a line but before w_eline.
w_paragraph
called at the beginning of a paragraph block.
w_horizbr
called at the horizontal separator (||).
w_indent_hook
called after an increased indent.
w_dedent_hook
called before a decreased indent.

The following environments are good candidates for temporary redefinition.

text
ordinary text.
q
citation blocks.
footnote
footnotes.
litblock
literal blocks.

5 Quoting

Warning: don't read this section unless you really need to or you want your head to explode. You have been warned.

Quoting means the act of making literal text from something that would have been considered markup ordinarily. For instance, if you need a word surrounded by *asterisks* not to be converted to strong emphasis, you need some kind of quoting.

Simple rule of quoting: don't quote unless you have to. stx2any has been built so that most of the time it gets the writer's intention correct. When it doesn't, you might have to resort to quoting.

stx2any has four types of quoting:

  1. m4 quoting
  2. abbreviation quoting
  3. quote quoting
  4. destination format quoting (not actually the job of stx2any)

The reason for having so many is partly the implementation of stx2any, partly its philosophy, and partly the design of m4.

5.1 m4 quoting and abbreviation quoting

stx2any uses the native m4 quoting mechanism for its ordinary quoting job. The m4 quoting mechanism is actually quite elegant: quoted text begins with a backquote (`) and ends with an apostrophe ('). Quotes can be nested, so if you need literal quotes `like this', all you have to do is write ``like this''. With m4 quoting, you can quote macro calls, as in `w_paragraph'.

Abbreviations (the heart of Stx) are more problematic, because they are processed before the m4 processing phase takes place. But all abbreviations are defined to be highly contextual: for instance, the begin emphasis markup is required to have a left separator (space, open parenthesis, …) on its left side and a nonblank character on its right side. Because the backquote (`) is not considered to be a left separator, you can quote emphasis markup as if quoting m4 macro calls, and get e.g. /usr by writing `/'usr. All emphasis constructs also put quotes around the emphasised text, so literal /usr works without any quoting. Alternately, you can put markup out of context: for example, writing // in the middle of the line does not cause a line break.

Link abbreviations can be quoted by quoting the last square bracket in them.
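
A few lines sketching the quoting forms described in this section. The first two are taken from the examples above; the third applies the same idea to the asterisks mentioned at the start of the chapter and is an untested assumption.

```stx
The macro `w_paragraph' is named here without being expanded.
The directory `/'usr is written with a quoted slash.
A word in `*'asterisks`*' keeps its asterisks.
```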

5.2 Quote quoting

When you have a quoting mechanism, you have to have a quote-quoting mechanism: otherwise, you have no way to include a literal quote in your text. Simple instances of quote quoting are handled by the nesting of quotes: whenever you write quotes inside quotes, the inner quotes are preserved.

Unmatched quotes are more problematic. Especially the apostrophe (i.e. m4 closing quote) is a common character. stx2any strives for keeping your apostrophes from interfering with its m4 machinery (which makes heavy use of quoting) by quoting them into calls to w_apo, which are eventually converted back into apostrophes. But there is the problem that sometimes your apostrophes are meant to be m4 quotes. stx2any takes this into account by applying the w_apo rule only to those apostrophes that do not have a matching backquote (i.e. m4 opening quote). This, in turn, means that if you need an unmatched apostrophe within quotes, you have no option but to write w_apo there yourself.

Another problem entirely is the lone opening quote, which cannot be written at all without changing the m4 quoting rules. If there is an unmatched backquote in the source, stx2any reports this as an error. The macro w_bq temporarily changes m4 quotes and puts a literal backquote in the output. So if you need a backquote (`) in the output, write w_bq (or `'w_bq`' if there are adjacent words) instead. And thank god[3] that you don't need backquotes all that often.

[3] or Gaea, destiny, what have you

5.3 Output format quoting

To enable easy mixing of direct output format code and stx2any markup, stx2any by default does not perform any quoting whatsoever for constructs in the output language. There is a command line option, --quote, which will quote all characters that are somehow magical in the output format so that they will appear literally in the output.

The magical constructs are, by output format:

HTML and DocBook
less-than and greater-than signs (< and >), ampersands (&).
man
lines beginning with dots (.) or apostrophes ('), backslashes.
LaTeX
words beginning with backslashes (\), and any curly braces ({ and }), ampersands, underscores (_), carets (^), tildes (~), dollar signs ($), hashes (#), percent signs (%) and probably a few others I've forgotten…

The --quote option works by converting these special characters[4] into macro calls, which eventually get converted back to the literal representation of that character in the requested output format. Often this is just the character itself, if it happens not to be magical in that output format.

[4] except underscores and dollar signs, which are quoted by the separate option --quote-me-harder.

If you decide not to use automatic output format quoting, you can call these macros yourself every time you need some character that might be magical. The macros are:

--quote-me-harder is a separate option because underscores and dollar signs are especially problematic. An underscore is a valid character in a macro name, and blindly quoting underscores will break your markup seriously. At the moment, w_ macro calls are protected against underscore quoting. Dollar signs are also sometimes used in macro definitions, and quoting them will break the macro definition.

6 Compatibility

As every creator has high hopes for his thought-child, I, too, would like to see Stx taken into widespread use and implemented many times.[5] There is, however, the problem that because stx2any allows (in fact, encourages) extending the vocabulary with your own m4 definitions, correctly reimplementing Stx would require you to build an m4 interpreter in the process, and if you did that, there would seem to be little point in reimplementing Stx.[6]

[5] The chances of this actually happening are quite small, because there are many competing formats out there, and everybody seems to have their own personal preferences about what the syntax of structured text should be like.
[6] A similar situation applies to LaTeX, whose proper reimplementation will require reimplementing at least part of TeX, to allow for user-defined commands, environments and the like.

Because of this, I suggest compatibility levels, which describe the degree to which a particular implementation supports Stx. They're meant to make a progression from more important things to less important ones. I would deem implementation on the first level (abbrev level) to be quite sufficient for, for example, wiki markup, and the second level (markup level) sufficient for most purposes. On the fourth level, the syntax supported is mostly equivalent to that of stx2any.

Support for Link abbreviations is orthogonal to these compatibility levels and regarded as an optional extension to them.

abbrev level

This level has support for abbreviation-based markup: paragraphs, different kinds of emphasis, headings, different kinds of lists, preformatted blocks, hard linebreaks, section breaks, and metadata headers (but support for footnotes and special characters such as long dashes, ellipses or copyright signs is not required).

Emphasis constructs should only be recognised when there is an opening and closing emphasis mark. Emphasis should not be allowed to span paragraph boundaries. An emphasis marker (asterisk, slash, double-apostrophe, etc) is recognised as an opening mark when preceded by a left separator and followed by a non-blank character; and as a closing mark when preceded by a non-blank character and followed by a right separator.[7] Emphasis must not be empty but can be only one character long.

[7] Separators include blanks, dashes, apostrophes and quotation marks, in addition to which different kinds of opening parentheses are left separators, and punctuation marks and different kinds of closing parentheses are right separators.
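
The following lines sketch how these rules play out; only the first contains emphasis. They are derived from the rules as stated here, not from testing any particular implementation.

```stx
Here *x* is strong emphasis, even though it is one character long.
In a*b*c the first asterisk has no left separator before it.
In 2 * 3 * 4 every asterisk is followed by a blank.
```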

markup level

This level has support for (almost) all non-extension-oriented markup: all the constructs on abbrev level, plus footnotes and everything in the sections Environments and Macro-based markup except w_crosslink, w_index and w_indexword. Support for m4 quoting is required, though only one level (that is, no need to implement nested quotes), as well as w_apo, w_bq and the rest of the special characters described in the section Output format quoting.

When (part of) something is m4 quoted, it should not be considered for macro expansion or other kinds of processing.

extension level

This level is meant to support users using their own extensions, as long as they don't do anything complicated. All the constructs on markup level are supported, plus proper m4 (nestable) quoting and everything in the section Extension markup. In short, everything described in this document, with the exception of w_crosslink, w_index, w_indexword and some esoteric subtleties of abbreviation quoting, should be supported.

Implementing w_define_env, define and friends will require at least some kind of macro processing capability. On this level, you don't have to implement arguments to macros; that's on the next level.

full compatibility level

All m4 constructs not already mentioned should be supported, in addition to the stuff in extension level. This includes a lot of stuff, and means you end up reimplementing m4 after all.