X-Git-Url: https://gerrit.simantics.org/r/gitweb?p=simantics%2Fplatform.git;a=blobdiff_plain;f=tests%2Forg.simantics.scl.compiler.tests%2Fsrc%2Forg%2Fsimantics%2Fscl%2Fcompiler%2Ftests%2Fmarkdown%2Fspec.txt;h=92faa7302258e797e2519f906d035d534f58567f;hp=bdaed436dd9e20ae200ea1e49be4b10c679c730f;hb=a8758de5bc19e5adb3f618d3038743a164f09912;hpb=12d9af17384d960b75d58c3935d2b7b46d93e87b diff --git a/tests/org.simantics.scl.compiler.tests/src/org/simantics/scl/compiler/tests/markdown/spec.txt b/tests/org.simantics.scl.compiler.tests/src/org/simantics/scl/compiler/tests/markdown/spec.txt index bdaed436d..92faa7302 100644 --- a/tests/org.simantics.scl.compiler.tests/src/org/simantics/scl/compiler/tests/markdown/spec.txt +++ b/tests/org.simantics.scl.compiler.tests/src/org/simantics/scl/compiler/tests/markdown/spec.txt @@ -1,8 +1,8 @@ --- title: CommonMark Spec author: John MacFarlane -version: 0.25 -date: '2016-03-24' +version: 0.26 +date: '2016-07-15' license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' ... @@ -13,12 +13,90 @@ license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' Markdown is a plain text format for writing structured documents, based on conventions used for indicating formatting in email and usenet posts. It was developed in 2004 by John Gruber, who wrote -the first Markdown-to-HTML converter in perl, and it soon became -widely used in websites. By 2014 there were dozens of -implementations in many languages. Some of them extended basic -Markdown syntax with conventions for footnotes, definition lists, -tables, and other constructs, and some allowed output not just in -HTML but in LaTeX and many other formats. +the first Markdown-to-HTML converter in Perl, and it soon became +ubiquitous. In the next decade, dozens of implementations were +developed in many languages. Some extended the original +Markdown syntax with conventions for footnotes, tables, and +other document elements. 
Some allowed Markdown documents to be +rendered in formats other than HTML. Websites like Reddit, +StackOverflow, and GitHub had millions of people using Markdown. +And Markdown started to be used beyond the web, to author books, +articles, slide shows, letters, and lecture notes. + +What distinguishes Markdown from many other lightweight markup +syntaxes, which are often easier to write, is its readability. +As Gruber writes: + +> The overriding design goal for Markdown's formatting syntax is +> to make it as readable as possible. The idea is that a +> Markdown-formatted document should be publishable as-is, as +> plain text, without looking like it's been marked up with tags +> or formatting instructions. +> () + +The point can be illustrated by comparing a sample of +[AsciiDoc](http://www.methods.co.nz/asciidoc/) with +an equivalent sample of Markdown. Here is a sample of +AsciiDoc from the AsciiDoc manual: + +``` +1. List item one. ++ +List item one continued with a second paragraph followed by an +Indented block. ++ +................. +$ ls *.sh +$ mv *.sh ~/tmp +................. ++ +List item continued with a third paragraph. + +2. List item two continued with an open block. ++ +-- +This paragraph is part of the preceding list item. + +a. This list is nested and does not require explicit item +continuation. ++ +This paragraph is part of the preceding list item. + +b. List item b. + +This paragraph belongs to item two of the outer list. +-- +``` + +And here is the equivalent in Markdown: +``` +1. List item one. + + List item one continued with a second paragraph followed by an + Indented block. + + $ ls *.sh + $ mv *.sh ~/tmp + + List item continued with a third paragraph. + +2. List item two continued with an open block. + + This paragraph is part of the preceding list item. + + 1. This list is nested and does not require explicit item continuation. + + This paragraph is part of the preceding list item. + + 2. List item b. 
+ + This paragraph belongs to item two of the outer list. +``` + +The AsciiDoc version is, arguably, easier to write. You don't need +to worry about indentation. But the Markdown version is much easier +to read. The nesting of list items is apparent to the eye in the +source, not just in the processed document. ## Why is a spec needed? @@ -258,9 +336,14 @@ the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. ## Tabs Tabs in lines are not expanded to [spaces]. However, -in contexts where indentation is significant for the -document's structure, tabs behave as if they were replaced -by spaces with a tab stop of 4 characters. +in contexts where whitespace helps to define block structure, +tabs behave as if they were replaced by spaces with a tab stop +of 4 characters. + +Thus, for example, a tab can be used instead of four spaces +in an indented code block. (Note, however, that internal +tabs are passed through as literal tabs, not expanded to +spaces.) ```````````````````````````````` example →foo→baz→→bim @@ -269,7 +352,6 @@ by spaces with a tab stop of 4 characters. ```````````````````````````````` - ```````````````````````````````` example →foo→baz→→bim . @@ -277,7 +359,6 @@ by spaces with a tab stop of 4 characters. ```````````````````````````````` - ```````````````````````````````` example a→a ὐ→a @@ -287,6 +368,9 @@ by spaces with a tab stop of 4 characters. ```````````````````````````````` +In the following example, a continuation paragraph of a list +item is indented with a tab; this has exactly the same effect +as indentation with four spaces would: ```````````````````````````````` example - foo @@ -315,6 +399,15 @@ by spaces with a tab stop of 4 characters. ```````````````````````````````` +Normally the `>` that begins a block quote may be followed +optionally by a space, which is not considered part of the +content. In the following case `>` is followed by a tab, +which is treated as if it were expanded into spaces. 
+Since one of these spaces is considered part of the +delimiter, `foo` is considered to be indented six spaces +inside the block quote context, so we get an indented +code block starting with two spaces. + ```````````````````````````````` example >→→foo . @@ -363,6 +456,17 @@ bar ```````````````````````````````` +```````````````````````````````` example +#→Foo +. +

Foo

+```````````````````````````````` + +```````````````````````````````` example +*→*→*→ +. +
+```````````````````````````````` ## Insecure characters @@ -701,15 +805,6 @@ headings: ```````````````````````````````` -A tab will not work: - -```````````````````````````````` example -#→foo -. -

#→foo

-```````````````````````````````` - - This is not a heading, because the first `#` is escaped: ```````````````````````````````` example @@ -1890,7 +1985,7 @@ by their start and end conditions. The block begins with a line that meets a [start condition](@) (after up to three spaces optional indentation). It ends with the first subsequent line that meets a matching [end condition](@), or the last line of -the document, if no line is encountered that meets the +the document (or other [container block]), if no line is encountered that meets the [end condition]. If the first line meets both the [start condition] and the [end condition], the block will contain just that line. @@ -1920,7 +2015,8 @@ followed by one of the strings (case-insensitive) `address`, `article`, `aside`, `base`, `basefont`, `blockquote`, `body`, `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, -`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, +`footer`, `form`, `frame`, `frameset`, +`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`, `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `section`, `source`, `summary`, `table`, `tbody`, `td`, @@ -2224,6 +2320,7 @@ import Text.HTML.TagSoup main :: IO () main = print $ parseTags tags +okay .

 import Text.HTML.TagSoup
@@ -2231,6 +2328,7 @@ import Text.HTML.TagSoup
 main :: IO ()
 main = print $ parseTags tags
 
+

okay

```````````````````````````````` @@ -2242,12 +2340,14 @@ A script tag (type 1): document.getElementById("demo").innerHTML = "Hello JavaScript!"; +okay . +

okay

```````````````````````````````` @@ -2260,6 +2360,7 @@ h1 {color:red;} p {color:blue;} +okay . +

okay

```````````````````````````````` @@ -2355,11 +2457,13 @@ A comment (type 2): bar baz --> +okay . +

okay

```````````````````````````````` @@ -2372,12 +2476,14 @@ A processing instruction (type 3): echo '>'; ?> +okay . '; ?> +

okay

```````````````````````````````` @@ -2405,6 +2511,7 @@ function matchwo(a,b) } } ]]> +okay . +

okay

```````````````````````````````` @@ -3162,8 +3270,8 @@ Four spaces gives us a code block: ```````````````````````````````` -The Laziness clause allows us to omit the `>` before a -paragraph continuation line: +The Laziness clause allows us to omit the `>` before +[paragraph continuation text]: ```````````````````````````````` example > # Foo @@ -3269,8 +3377,8 @@ foo ```````````````````````````````` -Note that in the following case, we have a paragraph -continuation line: +Note that in the following case, we have a [lazy +continuation line]: ```````````````````````````````` example > foo @@ -3292,7 +3400,7 @@ To see why, note that in the `- bar` is indented too far to start a list, and can't be an indented code block because indented code blocks cannot -interrupt paragraphs, so it is a [paragraph continuation line]. +interrupt paragraphs, so it is [paragraph continuation text]. A block quote can be empty: @@ -3521,7 +3629,7 @@ The following rules define [list items]: 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of blocks *Bs* starting with a [non-whitespace character] and not separated from each other by more than one blank line, and *M* is a list - marker of width *W* followed by 0 < *N* < 5 spaces, then the result + marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result of prepending *M* and the following spaces to the first line of *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a list item with *Bs* as its contents. The type of the list item @@ -3529,6 +3637,12 @@ The following rules define [list items]: If the list item is ordered, then it is also assigned a start number, based on the ordered list marker. + Exceptions: When the first list item in a [list] interrupts + a paragraph---that is, when it starts on a line that would + otherwise count as [paragraph continuation text]---then (a) + the lines *Ls* must not begin with a blank line, and (b) if + the list item is ordered, the start number must be 1. 
+ For example, let *Ls* be the lines ```````````````````````````````` example @@ -3703,66 +3817,20 @@ any following content, so these are not list items: ```````````````````````````````` -A list item may not contain blocks that are separated by more than -one blank line. Thus, two blank lines will end a list, unless the -two blanks are contained in a [fenced code block]. +A list item may contain blocks that are separated by more than +one blank line. ```````````````````````````````` example -- foo - - bar - - foo bar - -- ``` - foo - - - bar - ``` - -- baz - - + ``` - foo - - - bar - ``` . -

bar

- ```````````````````````````````` @@ -3795,15 +3863,14 @@ A list item may contain any kind of block: A list item that contains an indented code block will preserve -empty lines within the code block verbatim, unless there are two -or more empty lines in a row (since as described above, two -blank lines end the list): +empty lines within the code block verbatim. ```````````````````````````````` example - Foo bar + baz . -```````````````````````````````` - -```````````````````````````````` example -- Foo - - bar - - - baz -. - -
  baz
-
```````````````````````````````` - Note that ordered list start numbers must be nine digits or less: ```````````````````````````````` example @@ -4164,6 +4211,20 @@ A list may start or end with an empty list item: ```````````````````````````````` +However, an empty list item cannot interrupt a paragraph: + +```````````````````````````````` example +foo +* + +foo +1. +. +

foo +*

+

foo +1.

+```````````````````````````````` 4. **Indentation.** If a sequence of lines *Ls* constitutes a list item @@ -4361,13 +4422,18 @@ So, in this case we need two spaces indent: - foo - bar - baz + - boo . ```````````````````````````````` - `Markdown.pl` does not allow this, through fear of triggering a list via a numeral in a hard-wrapped line: -```````````````````````````````` example +``` markdown The number of windows in my house is 14. The number of doors is 6. -. -

The number of windows in my house is

-
    -
  1. The number of doors is 6.
  2. -
-```````````````````````````````` - +``` -Oddly, `Markdown.pl` *does* allow a blockquote to interrupt a paragraph, -even though the same considerations might apply. We think that the two -cases should be treated the same. Here are two reasons for allowing -lists to interrupt paragraphs: +Oddly, though, `Markdown.pl` *does* allow a blockquote to +interrupt a paragraph, even though the same considerations might +apply. -First, it is natural and not uncommon for people to start lists without -blank lines: +In CommonMark, we do allow lists to interrupt paragraphs, for +two reasons. First, it is natural and not uncommon for people +to start lists without blank lines: - I need to buy - - new shoes - - a coat - - a plane ticket +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` Second, we are attracted to a @@ -4777,37 +4839,61 @@ Second, we are attracted to a (Indeed, the spec for [list items] and [block quotes] presupposes this principle.) This principle implies that if - * I need to buy - - new shoes - - a coat - - a plane ticket +``` markdown + * I need to buy + - new shoes + - a coat + - a plane ticket +``` is a list item containing a paragraph followed by a nested sublist, as all Markdown implementations agree it is (though the paragraph may be rendered without `

` tags, since the list is "tight"), then - I need to buy - - new shoes - - a coat - - a plane ticket +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` by itself should be a paragraph followed by a nested sublist. -Our adherence to the [principle of uniformity] -thus inclines us to think that there are two coherent packages: +Since it is well established Markdown practice to allow lists to +interrupt paragraphs inside list items, the [principle of +uniformity] requires us to allow this outside list items as +well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) +takes a different approach, requiring blank lines before lists +even inside other list items.) -1. Require blank lines before *all* lists and blockquotes, - including lists that occur as sublists inside other list items. +In order to solve the problem of unwanted lists in paragraphs with +hard-wrapped numerals, we allow only lists starting with `1` to +interrupt paragraphs. Thus, -2. Require blank lines in none of these places. +```````````````````````````````` example +The number of windows in my house is +14. The number of doors is 6. +. +

The number of windows in my house is +14. The number of doors is 6.

+```````````````````````````````` -[reStructuredText](http://docutils.sourceforge.net/rst.html) takes -the first approach, for which there is much to be said. But the second -seems more consistent with established practice with Markdown. +We may still get an unintended result in cases like -There can be blank lines between items, but two blank lines end -a list: +```````````````````````````````` example +The number of windows in my house is +1. The number of doors is 6. +. +

The number of windows in my house is

+
    +
  1. The number of doors is 6.
  2. +
+```````````````````````````````` + +but this rule should prevent most spurious list captures. + +There can be any number of blank lines between items: ```````````````````````````````` example - foo @@ -4824,36 +4910,12 @@ a list:
  • bar

  • - - -```````````````````````````````` - - -As illustrated above in the section on [list items], -two blank lines between blocks *within* a list item will also end a -list: - -```````````````````````````````` example -- foo - - - bar -- baz -. - -

    bar

    - ```````````````````````````````` - -Indeed, two blank lines will end *all* containing lists: - ```````````````````````````````` example - foo - bar @@ -4867,26 +4929,28 @@ Indeed, two blank lines will end *all* containing lists: -
      bim
    -
    ```````````````````````````````` -Thus, two blank lines can be used to separate consecutive lists of -the same type, or to separate a list from an indented code block -that would otherwise be parsed as a subparagraph of the final list -item: +To separate consecutive lists of the same type, or to separate a +list from an indented code block that would otherwise be parsed +as a subparagraph of the final list item, you can insert a blank HTML +comment: ```````````````````````````````` example - foo - bar + - baz - bim @@ -4895,6 +4959,7 @@ item:
  • foo
  • bar
  • + +
    code
     
    ```````````````````````````````` @@ -5611,6 +5678,16 @@ single spaces, just as they would be by a browser: ```````````````````````````````` +Not all [Unicode whitespace] (for instance, non-breaking space) is +collapsed, however: + +```````````````````````````````` example +`a  b` +. +

    a  b

    +```````````````````````````````` + + Q: Why not just leave the spaces, since browsers will collapse them anyway? A: Because we might be targeting a non-HTML format, and we shouldn't rely on HTML-specific rendering assumptions. @@ -5867,18 +5944,22 @@ The following rules define emphasis and strong emphasis: 9. Emphasis begins with a delimiter that [can open emphasis] and ends with a delimiter that [can close emphasis], and that uses the same - character (`_` or `*`) as the opening delimiter. There must - be a nonempty sequence of inlines between the open delimiter - and the closing delimiter; these form the contents of the emphasis - inline. + character (`_` or `*`) as the opening delimiter. The + opening and closing delimiters must belong to separate + [delimiter runs]. If one of the delimiters can both + open and close emphasis, then the sum of the lengths of the + delimiter runs containing the opening and closing delimiters + must not be a multiple of 3. 10. Strong emphasis begins with a delimiter that [can open strong emphasis] and ends with a delimiter that [can close strong emphasis], and that uses the same character - (`_` or `*`) as the opening delimiter. - There must be a nonempty sequence of inlines between the open - delimiter and the closing delimiter; these form the contents of - the strong emphasis inline. + (`_` or `*`) as the opening delimiter. The + opening and closing delimiters must belong to separate + [delimiter runs]. If one of the delimiters can both open + and close strong emphasis, then the sum of the lengths of + the delimiter runs containing the opening and closing + delimiters must not be a multiple of 3. 11. A literal `*` character cannot occur at the beginning or end of `*`-delimited emphasis or `**`-delimited strong emphasis, unless it @@ -5902,9 +5983,7 @@ the following principles resolve ambiguity: so that the second begins before the first ends and ends after the first ends, the first takes precedence. 
Thus, for example, `*foo _bar* baz_` is parsed as `foo _bar baz_` rather - than `*foo bar* baz`. For the same reason, - `**foo*bar**` is parsed as `foobar*` - rather than `foo*bar`. + than `*foo bar* baz`. 16. When there are two potential emphasis or strong emphasis spans with the same closing delimiter, the shorter one (the one that @@ -6077,10 +6156,8 @@ A newline also counts as whitespace: *foo bar * . -

    *foo bar

    - +

    *foo bar +*

    ```````````````````````````````` @@ -6484,18 +6561,30 @@ __foo_ bar_

    foo bar baz

    ```````````````````````````````` - -But note: - ```````````````````````````````` example *foo**bar**baz* . -

    foobarbaz

    +

    foobarbaz

    ```````````````````````````````` +Note that in the preceding case, the interpretation + +``` markdown +

    foobarbaz

    +``` + + +is precluded by the condition that a delimiter that +can both open and close (like the `*` after `foo`) +cannot form emphasis if the sum of the lengths of +the delimiter runs containing the opening and +closing delimiters is a multiple of 3. + +The same condition ensures that the following +cases are all strong emphasis nested inside +emphasis, even when the interior spaces are +omitted: -The difference is that in the preceding case, the internal delimiters -[can close emphasis], while in the cases with spaces, they cannot. ```````````````````````````````` example ***foo** bar* @@ -6511,18 +6600,13 @@ The difference is that in the preceding case, the internal delimiters ```````````````````````````````` -Note, however, that in the following case we get no strong -emphasis, because the opening delimiter is closed by the first -`*` before `bar`: - ```````````````````````````````` example *foo**bar*** . -

    foobar**

    +

    foobar

    ```````````````````````````````` - Indefinite levels of nesting are possible: ```````````````````````````````` example @@ -6615,18 +6699,13 @@ ____foo__ bar__ ```````````````````````````````` -But note: - ```````````````````````````````` example **foo*bar*baz** . -

    foobarbaz**

    +

    foobarbaz

    ```````````````````````````````` -The difference is that in the preceding case, the internal delimiters -[can close emphasis], while in the cases with spaces, they cannot. - ```````````````````````````````` example ***foo* bar** . @@ -6941,13 +7020,6 @@ Rule 15: ```````````````````````````````` -```````````````````````````````` example -**foo*bar** -. -

    foobar*

    -```````````````````````````````` - - ```````````````````````````````` example *foo __bar *baz bim__ bam* . @@ -7300,6 +7372,16 @@ may be used in titles: ```````````````````````````````` +Titles must be separated from the link using a [whitespace]. +Other [Unicode whitespace] like non-breaking space doesn't work. + +```````````````````````````````` example +[link](/url "title") +. +

    link

    +```````````````````````````````` + + Nested balanced quotes are not allowed without escaping: ```````````````````````````````` example @@ -7878,7 +7960,7 @@ consists of a [link label] that [matches] a [link reference definition] elsewhere in the document and is not followed by `[]` or a link label. The contents of the first link label are parsed as inlines, -which are used as the link's text. the link's URI and title +which are used as the link's text. The link's URI and title are provided by the matching link reference definition. Thus, `[foo]` is equivalent to `[foo][]`. @@ -7964,7 +8046,8 @@ following closing bracket: ```````````````````````````````` -Full references take precedence over shortcut references: +Full and compact references take precedence over shortcut +references: ```````````````````````````````` example [foo][bar] @@ -7975,6 +8058,31 @@ Full references take precedence over shortcut references:

    foo

    ```````````````````````````````` +```````````````````````````````` example +[foo][] + +[foo]: /url1 +. +

    foo

    +```````````````````````````````` + +Inline links also take precedence: + +```````````````````````````````` example +[foo]() + +[foo]: /url1 +. +

    foo

    +```````````````````````````````` + +```````````````````````````````` example +[foo](not a link) + +[foo]: /url1 +. +

    foo(not a link)

    +```````````````````````````````` In the following case `[bar][baz]` is parsed as a reference, `[foo]` as normal text: @@ -8853,7 +8961,7 @@ foo A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces or a backslash is parsed as a -softbreak. (A softbreak may be rendered in HTML either as a +[softbreak](@). (A softbreak may be rendered in HTML either as a [line ending] or as a space. The result will be the same in browsers. In the examples here, a [line ending] will be used.) @@ -8984,7 +9092,7 @@ blocks. But we cannot close unmatched blocks yet, because we may have a [lazy continuation line]. 2. Next, after consuming the continuation markers for existing -blocks, we look for new block starts (e.g. `>` for a block quote. +blocks, we look for new block starts (e.g. `>` for a block quote). If we encounter a new block start, we close any blocks unmatched in step 1 before creating the new block as a child of the last matched block.