Whitespace: Spaces, Tabs and Newlines

The goal for the handling of normal whitespace (spaces, tabs and newlines) in Markua is for everything to just work.

There are four principles of Markua’s whitespace handling:

  1. You should be able to look at a Markua document and know what is produced. Invisible formatting is frowned upon.
  2. Paragraphs and sentences should be handled consistently, regardless of indentation and spaces after periods.
  3. Manual whitespace formatting should be discouraged.
  4. Newlines are newlines; spaces are spaces. These are different things.

These simple principles have far-reaching consequences:

  1. Whitespace at the end of a line or file is ignored.
  2. It doesn’t matter how many spaces you add after a sentence.
  3. All consecutive blank lines after the first blank line are ignored when separating paragraphs, and all consecutive blank lines after the second blank line are ignored when separating lists.
  4. You can’t manually wrap text by using newlines as though they were spaces, but you can add forced line breaks without hacks.

Whitespace handling is the largest difference between Markua and Markdown, so it’s discussed here, instead of in the appendix.

Newlines

Single Newline = Forced Line Break

In Markdown, you can manually wrap headings, paragraphs, lists and blockquotes with single newlines with no effect on the HTML output. Markdown, like HTML, treats single newlines as equivalent to single spaces.

In Markua, however, a forced line break in the input is a forced line break in all output formats. This is true for paragraphs, lists, blockquotes, asides and blurbs.

In ancient history, some text editors did not automatically wrap lines of text, so manually wrapping plain text files was a good thing to do. Computer programmers also still do not wrap their text when programming. For writing, however, automatic wrapping of paragraphs is essential for staying in the flow, and for being able to edit your text without needing to re-wrap every line in a paragraph. This is one decision that even Microsoft Word gets right.

The decision in Markua to treat single newlines as forced line breaks means that Markua does not need to use the horrible hack that Markdown uses to output forced line breaks. In Markdown, to output a forced line break (a <br/> tag in HTML), you need to add two spaces at the end of the line, followed by a single newline. This means that it is impossible to look at a Markdown document with single newlines in it and understand what they mean: you need to check for invisible formatting characters at the end of each line to know whether the newline means “newline” or “single space”. Yuck!

Worse, some text editors (like Emacs, the editor I use) can be configured to remove trailing spaces at the ends of lines automatically when a file is saved. So, it’s possible for me to dramatically modify a Markdown document by simply opening it and saving it unedited. Yikes!

The following is an example of Markua’s single newlines:

I'm paragraph one. Yay!

This is paragraph two.
This is *still* in paragraph two, preceded by a forced line break.
This is also in paragraph two, also preceded by a forced line break.

This is paragraph three.
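
As a rough illustration of the rule, here is a minimal Python sketch of how a renderer might turn one paragraph into HTML. The function name `render_paragraph` is hypothetical and is not part of Markua or Leanpub; inline formatting such as emphasis is deliberately left out.

```python
import html


def render_paragraph(paragraph: str) -> str:
    """Render one paragraph as HTML (hypothetical helper, not Leanpub's code).

    Every single newline inside the paragraph becomes a forced line break.
    """
    # Escape &, < and > so the text is safe to place inside HTML.
    lines = [html.escape(line.strip(), quote=False) for line in paragraph.split("\n")]
    return "<p>" + "<br/>\n".join(lines) + "</p>"


paragraph_two = "This is paragraph two.\nThis is also in paragraph two."
print(render_paragraph(paragraph_two))
```

A Markdown renderer would instead join the two lines with a single space.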

Three or More Newlines = Two Newlines = One Blank Line

Markua handles two consecutive newlines identically to Markdown: they produce a blank line, which separates block elements like paragraphs from each other.

Similarly, Markua handles three or more consecutive newlines identically to Markdown: they produce one blank line, as though only two newlines had been typed.
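
A small Python sketch of this behaviour for paragraphs, assuming the whole manuscript is available as one string. The name `split_blocks` is hypothetical, and the sketch ignores the slightly different rule for lists mentioned earlier.

```python
import re


def split_blocks(text: str) -> list[str]:
    """Split text into blocks separated by one or more blank lines.

    Two, three or more consecutive newlines all act as a single separator.
    Lines that contain only spaces or tabs count as blank lines here.
    """
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    return [block for block in blocks if block.strip()]


two_newlines = "First paragraph.\n\nSecond paragraph."
five_newlines = "First paragraph.\n\n\n\n\nSecond paragraph."
print(split_blocks(two_newlines) == split_blocks(five_newlines))
```

A single newline does not match the separator pattern, so it stays inside its block as a forced line break.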

If you absolutely must insert a bunch of newlines in a row, you can do this by starting a code block (with three tildes or backticks) and doing so:

...the end of a paragraph.

~~~



~~~

That empty code block is 3 lines long, so it adds three blank lines of code to the output.

The question of backticks versus tildes just determines which font your blank lines of code will use, since it sets the default language to guess (with backticks) or text (with tildes).

One Blank Line Is Added When Concatenating Manuscript Files

A Markua document can be written in one file or multiple manuscript files. If a manuscript is written in multiple files, these files are concatenated together by Leanpub to produce one temporary manuscript file, and that one file is used as the input.

Importantly, the files are not simply concatenated together unchanged: two newlines (i.e. one blank line) are added between the end of each file and the beginning of the next file.

This blank line separates the content of the two files, which avoids a number of bugs and prevents a number of surprises for authors. Note that because of this rule, a paragraph (or any other block element) cannot span multiple manuscript files.
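
The concatenation step can be sketched in a few lines of Python. This assumes the file contents have already been read into strings; `concatenate_manuscript` is a hypothetical name, not Leanpub's actual implementation.

```python
def concatenate_manuscript(file_contents: list[str]) -> str:
    """Join manuscript files with two newlines (one blank line) between them."""
    return "\n\n".join(file_contents)


chapter_one = "The last paragraph of the first file."
chapter_two = "The first paragraph of the second file."
combined = concatenate_manuscript([chapter_one, chapter_two])
print(combined)
```

Because a blank line always sits between the two files, the last block of one file and the first block of the next can never merge into one paragraph.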

All Blank Lines at the Beginning and End of a File are Removed

Since a blank line is added when concatenating multiple manuscript files, there is no good reason to support blank lines, or lines containing only whitespace, at the beginning or end of a file. So, all blank lines and all whitespace-only lines at the beginning or end of a file are removed.

This is especially important with the whitespace at the end of a file: trailing whitespace at the end of a file is invisible to the author, and supporting invisible formatting, whether at the end of a line or the end of a file, is insanity.
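
A minimal Python sketch of this trimming, assuming it is applied to each file before concatenation. The name `trim_file` is hypothetical.

```python
def trim_file(text: str) -> str:
    """Remove blank and whitespace-only lines from both ends of a file."""
    lines = text.split("\n")
    # Drop leading lines that are empty or contain only whitespace.
    while lines and not lines[0].strip():
        lines.pop(0)
    # Drop trailing lines that are empty or contain only whitespace.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


raw = "\n  \nHello\n\nWorld\n\t\n\n"
print(repr(trim_file(raw)))  # 'Hello\n\nWorld'
```

Blank lines in the middle of the file are left alone, since they still separate blocks.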

Spaces and Tabs

Spaces and Tabs at the Beginning of a Line are Only to Determine List Containment, and Extra Spaces are Removed

Spaces and tabs at the beginning of a line are only used to determine whether the content is contained in a list item or, in the case of a nested list, which list contains the list item.

Besides this, in a paragraph, any manual indentation (using spaces or tabs at the beginning of a line) is just removed. This is true even after a forced line break (a single newline).
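
For paragraphs, this removal can be sketched as follows in Python. The helper name `strip_paragraph_indentation` is hypothetical, and the sketch assumes the text has already been identified as a paragraph rather than a list item, since indentation is meaningful for lists.

```python
def strip_paragraph_indentation(paragraph: str) -> str:
    """Remove leading spaces and tabs from every line of a paragraph."""
    return "\n".join(line.lstrip(" \t") for line in paragraph.split("\n"))


indented = "    An indented first line.\n\t  An indented line after a forced line break."
print(strip_paragraph_indentation(indented))
```

The single newline is kept, so the forced line break survives while the indentation after it does not.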

Spaces and Tabs at the End of a Line are Removed

Unlike Markdown, Markua ignores all spaces and tabs at the end of a line. This way, there is no reliance on invisible formatting to produce forced line breaks, and editors which strip trailing whitespace have no effect on a Markua document.
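
The following Python sketch shows why this makes a document stable under editors that strip trailing whitespace. The function name `strip_trailing_whitespace` is hypothetical.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs from the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


as_typed = "Line one.  \nLine two.\t"
after_editor_cleanup = "Line one.\nLine two."

# Both versions normalize to the same text, so saving the file in an
# editor that strips trailing whitespace cannot change the output.
print(strip_trailing_whitespace(as_typed) == strip_trailing_whitespace(after_editor_cleanup))
```

In Markdown, by contrast, the two trailing spaces after "Line one." would be the difference between a forced line break and a plain space.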

Internal Spaces are Collapsed to One Space, Except At the End of Sentences

Markua handles internal whitespace in a paragraph in a similar way to Markdown:

First, in Markua and Markdown, multiple internal spaces or tabs in the middle of a sentence are all collapsed to one space.
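
A minimal Python sketch of the collapsing step, using a hypothetical helper named `collapse_internal_whitespace`. It treats every run of spaces and tabs the same way and does not attempt any sentence detection.

```python
import re


def collapse_internal_whitespace(line: str) -> str:
    """Collapse each run of spaces and tabs inside a line to one space."""
    return re.sub(r"[ \t]+", " ", line)


print(collapse_internal_whitespace("too   many\t\tspaces in   here"))
# too many spaces in here
```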

However, Leanpub should be smart about interpreting what is the end of a sentence, and handle that specially.

At the end of sentences that aren’t followed by newlines, Leanpub may output one space, one and a half spaces, two spaces or some other amount of space. (Yes, one and a half spaces at the end of a sentence is a real thing, and it is arguably the one true amount of space at the end of a sentence.)

Whatever amount of space is chosen, Leanpub must output it at the end of every sentence which isn’t followed by a newline, regardless of whether the author typed one or two spaces there.

But what’s the end of a sentence?

This would be a lot easier to determine if all authors typed two spaces at the end of their sentences! This way, Leanpub could easily determine that something like “Mr. Armstrong” did not, in fact, contain the end of a sentence.

However, many authors type one space at the end of their sentences. So, Leanpub should use heuristics to determine what is the end of a sentence and what is not.

Regardless of how it made the determination about whether a sentence has ended, if Leanpub decides that something is, in fact, the end of a sentence, it must output the same amount of space every time. This can be one space, one and a half spaces, two spaces or some other amount.
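
One possible heuristic, sketched in Python purely for illustration. Nothing here is Leanpub's actual algorithm: the abbreviation list, the regular expression, the function name `normalize_sentence_spacing` and the `gap` parameter are all assumptions. The `gap` string stands in for whatever amount of space the output format uses.

```python
import re

# Hypothetical, deliberately short list of abbreviations that end with a
# period but do not end a sentence.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st", "vs", "etc"}

# A word, then sentence punctuation, then spaces or tabs, followed by a
# capital letter (checked with a lookahead so it is not consumed).
SENTENCE_GAP = re.compile(r"(\w+)([.!?])[ \t]+(?=[A-Z])")


def normalize_sentence_spacing(line: str, gap: str = "  ") -> str:
    """Put the same gap after every detected sentence end in a line."""

    def replace(match: re.Match) -> str:
        word, mark = match.group(1), match.group(2)
        if mark == "." and word.lower() in ABBREVIATIONS:
            # Not a sentence end: keep an ordinary single space.
            return f"{word}{mark} "
        return f"{word}{mark}{gap}"

    return SENTENCE_GAP.sub(replace, line)


text = "Mr. Armstrong landed. He walked.  Then he left."
print(normalize_sentence_spacing(text))
```

With the default `gap` of two spaces, both real sentence ends get the same spacing whether the author typed one space or two, while “Mr. Armstrong” keeps a single space. A heuristic like this will still misjudge some cases, such as a sentence that genuinely ends with an abbreviation.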