3 author: John MacFarlane
6 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
Markdown is a plain text format for writing structured documents,
based on conventions used for indicating formatting in email and
Usenet posts. It was developed in 2004 by John Gruber, who wrote
the first Markdown-to-HTML converter in Perl, and it soon became
widely used on websites. By 2014 there were dozens of
implementations in many languages. Some of them extended basic
Markdown syntax with conventions for footnotes, definition lists,
tables, and other constructs, and some allowed output not just in
HTML but in LaTeX and many other formats.
23 ## Why is a spec needed?
25 John Gruber's [canonical description of Markdown's
26 syntax](http://daringfireball.net/projects/markdown/syntax)
27 does not specify the syntax unambiguously. Here are some examples of
28 questions it does not answer:
30 1. How much indentation is needed for a sublist? The spec says that
31 continuation paragraphs need to be indented four spaces, but is
32 not fully explicit about sublists. It is natural to think that
33 they, too, must be indented four spaces, but `Markdown.pl` does
34 not require that. This is hardly a "corner case," and divergences
35 between implementations on this issue often lead to surprises for
36 users in real documents. (See [this comment by John
37 Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).)
39 2. Is a blank line needed before a block quote or heading?
40 Most implementations do not require the blank line. However,
41 this can lead to unexpected results in hard-wrapped text, and
42 also to ambiguities in parsing (note that some implementations
43 put the heading inside the blockquote, while others do not).
44 (John Gruber has also spoken [in favor of requiring the blank
45 lines](http://article.gmane.org/gmane.text.markdown.general/2146).)
47 3. Is a blank line needed before an indented code block?
48 (`Markdown.pl` requires it, but this is not mentioned in the
49 documentation, and some implementations do not require it.)
56 4. What is the exact rule for determining when list items get
57 wrapped in `<p>` tags? Can a list be partially "loose" and partially
58 "tight"? What should we do with a list like this?
77 (There are some relevant comments by John Gruber
78 [here](http://article.gmane.org/gmane.text.markdown.general/2554).)
80 5. Can list markers be indented? Can ordered list markers be right-aligned?
88 6. Is this one list with a thematic break in its second item,
89 or two lists separated by a thematic break?
7. When list markers change from numbers to bullets, do we have
two lists or one? (The Markdown syntax description suggests two,
but the Perl scripts and many other implementations produce one.)
108 8. What are the precedence rules for the markers of inline structure?
109 For example, is the following a valid link, or does the code span
113 [a backtick (`)](/url) and [another backtick (`)](/url).
116 9. What are the precedence rules for markers of emphasis and strong
117 emphasis? For example, how should the following be parsed?
123 10. What are the precedence rules between block-level and inline-level
124 structure? For example, how should the following be parsed?
127 - `a long code span can contain a hyphen like this
128 - and it can screw things up`
131 11. Can list items include section headings? (`Markdown.pl` does not
132 allow this, but does allow blockquotes to include headings.)
138 12. Can list items be empty?
146 13. Can link references be defined inside block quotes or list items?
154 14. If there are multiple definitions for the same reference, which takes
164 In the absence of a spec, early implementers consulted `Markdown.pl`
165 to resolve these ambiguities. But `Markdown.pl` was quite buggy, and
166 gave manifestly bad results in many cases, so it was not a
167 satisfactory replacement for a spec.
Because there is no unambiguous spec, implementations have diverged
considerably. As a result, users are often surprised to find that
a document that renders one way on one system (say, a GitHub wiki)
renders differently on another (say, converting to DocBook using
pandoc). To make matters worse, because nothing in Markdown counts
as a "syntax error," the divergence often isn't discovered right away.
176 ## About this document
178 This document attempts to specify Markdown syntax unambiguously.
179 It contains many examples with side-by-side Markdown and
180 HTML. These are intended to double as conformance tests. An
181 accompanying script `spec_tests.py` can be used to run the tests
182 against any Markdown program:
184 python test/spec_tests.py --spec spec.txt --program PROGRAM
186 Since this document describes how Markdown is to be parsed into
187 an abstract syntax tree, it would have made sense to use an abstract
188 representation of the syntax tree instead of HTML. But HTML is capable
189 of representing the structural distinctions we need to make, and the
190 choice of HTML for the tests makes it possible to run the tests against
191 an implementation without writing an abstract syntax tree renderer.
193 This document is generated from a text file, `spec.txt`, written
194 in Markdown with a small extension for the side-by-side tests.
195 The script `tools/makespec.py` can be used to convert `spec.txt` into
196 HTML or CommonMark (which can then be converted into other formats).
198 In the examples, the `→` character is used to represent tabs.
202 ## Characters and lines
204 Any sequence of [characters] is a valid CommonMark
207 A [character](@) is a Unicode code point. Although some
208 code points (for example, combining accents) do not correspond to
209 characters in an intuitive sense, all code points count as characters
210 for purposes of this spec.
This spec does not specify an encoding; it treats lines as sequences
of [characters] rather than bytes. A conforming parser may be limited
to a certain encoding.
216 A [line](@) is a sequence of zero or more [characters]
217 other than newline (`U+000A`) or carriage return (`U+000D`),
218 followed by a [line ending] or by the end of file.
220 A [line ending](@) is a newline (`U+000A`), a carriage return
221 (`U+000D`) not followed by a newline, or a carriage return and a
224 A line containing no characters, or a line containing only spaces
225 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@).
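The definitions of lines, line endings, and blank lines can be sketched in Python as follows. This is only an illustration, not part of the spec: the names `split_lines` and `is_blank` are hypothetical, and dropping the empty piece after a final line ending is an assumption made here for convenience.

```python
import re

# A line ending is CR LF, a lone CR, or a lone LF.
# CR LF is listed first so that it is consumed as a single line ending.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines, without their line endings (hypothetical helper)."""
    lines = LINE_ENDING.split(text)
    # Assumption: a final line ending terminates the last line rather
    # than starting a new, empty one.
    if lines[-1] == "":
        lines.pop()
    return lines


def is_blank(line):
    """A blank line contains no characters, or only spaces and tabs."""
    return all(ch in " \t" for ch in line)
```

For instance, `split_lines("a\r\nb\rc\nd")` yields four lines, because the carriage return followed by a newline counts as one line ending.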
227 The following definitions of character classes will be used in this spec:
229 A [whitespace character](@) is a space
230 (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),
231 form feed (`U+000C`), or carriage return (`U+000D`).
233 [Whitespace](@) is a sequence of one or more [whitespace
236 A [Unicode whitespace character](@) is
237 any code point in the Unicode `Zs` class, or a tab (`U+0009`),
238 carriage return (`U+000D`), newline (`U+000A`), or form feed
241 [Unicode whitespace](@) is a sequence of one
242 or more [Unicode whitespace characters].
244 A [space](@) is `U+0020`.
246 A [non-whitespace character](@) is any character
247 that is not a [whitespace character].
249 An [ASCII punctuation character](@)
250 is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
251 `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
252 `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`.
254 A [punctuation character](@) is an [ASCII
255 punctuation character] or anything in
256 the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
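As a rough sketch, the punctuation definition can be checked with Python's standard `unicodedata` module. The function name `is_punctuation` is hypothetical; the sketch relies on the fact that `string.punctuation` contains the same 32 ASCII characters listed above.

```python
import string
import unicodedata

# The 32 ASCII punctuation characters listed in the definition above.
ASCII_PUNCTUATION = set(string.punctuation)

# Unicode general categories named in the definition.
UNICODE_PUNCTUATION_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_punctuation(ch):
    """Return True if the single character ch is a punctuation character."""
    return (
        ch in ASCII_PUNCTUATION
        or unicodedata.category(ch) in UNICODE_PUNCTUATION_CLASSES
    )
```

Note that some ASCII punctuation characters, such as `$`, belong to Unicode symbol classes rather than punctuation classes, which is why the ASCII list is checked separately.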
260 Tabs in lines are not expanded to [spaces]. However,
261 in contexts where indentation is significant for the
262 document's structure, tabs behave as if they were replaced
263 by spaces with a tab stop of 4 characters.
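One way to picture the tab rule is to compute the column reached by a line's leading spaces and tabs, with each tab advancing to the next multiple of 4. The following Python sketch is illustrative only; `indent_width` is a hypothetical helper name.

```python
TAB_STOP = 4


def indent_width(line):
    """Return the column reached after the leading spaces and tabs of line."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # A tab advances to the next tab stop, not by a fixed amount.
            col += TAB_STOP - (col % TAB_STOP)
        else:
            break
    return col
```

Under this rule, two spaces followed by a tab give a width of 4, not 6, while four spaces followed by a tab give a width of 8.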
265 ```````````````````````````````` example
268 <pre><code>foo→baz→→bim
270 ````````````````````````````````
273 ```````````````````````````````` example
276 <pre><code>foo→baz→→bim
278 ````````````````````````````````
281 ```````````````````````````````` example
288 ````````````````````````````````
291 ```````````````````````````````` example
302 ````````````````````````````````
304 ```````````````````````````````` example
316 ````````````````````````````````
318 ```````````````````````````````` example
325 ````````````````````````````````
327 ```````````````````````````````` example
336 ````````````````````````````````
339 ```````````````````````````````` example
346 ````````````````````````````````
348 ```````````````````````````````` example
364 ````````````````````````````````
368 ## Insecure characters
370 For security reasons, the Unicode character `U+0000` must be replaced
371 with the REPLACEMENT CHARACTER (`U+FFFD`).
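In Python, this replacement could be done as a preprocessing step before parsing. The function name `replace_insecure` is hypothetical.

```python
def replace_insecure(text):
    """Replace U+0000 with the REPLACEMENT CHARACTER (U+FFFD)."""
    return text.replace("\u0000", "\ufffd")
```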
375 We can think of a document as a sequence of
376 [blocks](@)---structural elements like paragraphs, block
377 quotations, lists, headings, rules, and code blocks. Some blocks (like
378 block quotes and list items) contain other blocks; others (like
379 headings and paragraphs) contain [inline](@) content---text,
380 links, emphasized text, images, code, and so on.
384 Indicators of block structure always take precedence over indicators
385 of inline structure. So, for example, the following is a list with
386 two items, not a list with one item containing a code span:
388 ```````````````````````````````` example
396 ````````````````````````````````
399 This means that parsing can proceed in two steps: first, the block
400 structure of the document can be discerned; second, text lines inside
401 paragraphs, headings, and other block constructs can be parsed for inline
402 structure. The second step requires information about link reference
403 definitions that will be available only at the end of the first
404 step. Note that the first step requires processing lines in sequence,
405 but the second can be parallelized, since the inline parsing of
406 one block element does not affect the inline parsing of any other.
408 ## Container blocks and leaf blocks
410 We can divide blocks into two types:
411 [container block](@)s,
412 which can contain other blocks, and [leaf block](@)s,
417 This section describes the different kinds of leaf block that make up a
422 A line consisting of 0-3 spaces of indentation, followed by a sequence
423 of three or more matching `-`, `_`, or `*` characters, each followed
424 optionally by any number of spaces, forms a
427 ```````````````````````````````` example
435 ````````````````````````````````
440 ```````````````````````````````` example
444 ````````````````````````````````
447 ```````````````````````````````` example
451 ````````````````````````````````
454 Not enough characters:
456 ```````````````````````````````` example
464 ````````````````````````````````
An indent of one to three spaces is allowed:
469 ```````````````````````````````` example
477 ````````````````````````````````
480 Four spaces is too many:
482 ```````````````````````````````` example
487 ````````````````````````````````
490 ```````````````````````````````` example
496 ````````````````````````````````
499 More than three characters may be used:
501 ```````````````````````````````` example
502 _____________________________________
505 ````````````````````````````````
508 Spaces are allowed between the characters:
510 ```````````````````````````````` example
514 ````````````````````````````````
517 ```````````````````````````````` example
521 ````````````````````````````````
524 ```````````````````````````````` example
528 ````````````````````````````````
531 Spaces are allowed at the end:
533 ```````````````````````````````` example
537 ````````````````````````````````
540 However, no other characters may occur in the line:
542 ```````````````````````````````` example
552 ````````````````````````````````
555 It is required that all of the [non-whitespace characters] be the same.
556 So, this is not a thematic break:
558 ```````````````````````````````` example
562 ````````````````````````````````
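The line-level conditions for a thematic break described so far can be approximated with a regular expression. This Python sketch is only an illustration: `is_thematic_break` is a hypothetical name, the function looks at a single line in isolation, and it ignores the precedence rules involving setext headings and list items.

```python
import re

# 0-3 spaces of indentation, then three or more matching -, _, or *
# characters, each optionally followed by spaces.
THEMATIC_BREAK = re.compile(r" {0,3}([-_*])(?: *\1){2,} *")


def is_thematic_break(line):
    """Return True if line, taken on its own, has the form of a thematic break."""
    return THEMATIC_BREAK.fullmatch(line) is not None
```

The backreference `\1` enforces that every non-whitespace character on the line is the same as the first one.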
565 Thematic breaks do not need blank lines before or after:
567 ```````````````````````````````` example
579 ````````````````````````````````
582 Thematic breaks can interrupt a paragraph:
584 ```````````````````````````````` example
592 ````````````````````````````````
595 If a line of dashes that meets the above conditions for being a
596 thematic break could also be interpreted as the underline of a [setext
597 heading], the interpretation as a
598 [setext heading] takes precedence. Thus, for example,
599 this is a setext heading, not a paragraph followed by a thematic break:
601 ```````````````````````````````` example
608 ````````````````````````````````
611 When both a thematic break and a list item are possible
612 interpretations of a line, the thematic break takes precedence:
614 ```````````````````````````````` example
626 ````````````````````````````````
629 If you want a thematic break in a list item, use a different bullet:
631 ```````````````````````````````` example
641 ````````````````````````````````
647 consists of a string of characters, parsed as inline content, between an
648 opening sequence of 1--6 unescaped `#` characters and an optional
649 closing sequence of any number of unescaped `#` characters.
650 The opening sequence of `#` characters must be followed by a
651 [space] or by the end of line. The optional closing sequence of `#`s must be
652 preceded by a [space] and may be followed by spaces only. The opening
653 `#` character may be indented 0-3 spaces. The raw contents of the
654 heading are stripped of leading and trailing spaces before being parsed
655 as inline content. The heading level is equal to the number of `#`
656 characters in the opening sequence.
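The rules in this paragraph can be sketched in Python as a function that recognizes an ATX heading line and returns its level and raw contents. This is an illustrative sketch rather than a reference implementation: `parse_atx_heading` is a hypothetical name, tabs are not handled, and the raw contents are returned before any inline parsing (so backslash escapes are still present).

```python
import re

# 0-3 spaces of indentation, then 1-6 '#' characters that must be
# followed by a space or by the end of the line.
ATX_OPENING = re.compile(r" {0,3}(#{1,6})(?= |$)")


def parse_atx_heading(line):
    """Return (level, raw_contents) for an ATX heading line, else None."""
    m = ATX_OPENING.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    raw = line[m.end():].rstrip(" ")
    # An optional closing sequence is a run of '#' characters that is
    # preceded by a space (or makes up the whole remainder).
    without_closer = raw.rstrip("#")
    if without_closer != raw and (
        without_closer == "" or without_closer.endswith(" ")
    ):
        raw = without_closer
    return level, raw.strip(" ")
```

Because the closing sequence must be preceded by a space, a line such as `# foo#` keeps the trailing `#` as part of its contents, and so does a closing run that directly follows a backslash.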
660 ```````````````````````````````` example
674 ````````````````````````````````
677 More than six `#` characters is not a heading:
679 ```````````````````````````````` example
683 ````````````````````````````````
686 At least one space is required between the `#` characters and the
687 heading's contents, unless the heading is empty. Note that many
688 implementations currently do not require the space. However, the
689 space was required by the
690 [original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
691 and it helps prevent things like the following from being parsed as
694 ```````````````````````````````` example
701 ````````````````````````````````
706 ```````````````````````````````` example
710 ````````````````````````````````
713 This is not a heading, because the first `#` is escaped:
715 ```````````````````````````````` example
719 ````````````````````````````````
722 Contents are parsed as inlines:
724 ```````````````````````````````` example
727 <h1>foo <em>bar</em> *baz*</h1>
728 ````````````````````````````````
731 Leading and trailing blanks are ignored in parsing inline content:
733 ```````````````````````````````` example
737 ````````````````````````````````
One to three spaces of indentation are allowed:
742 ```````````````````````````````` example
750 ````````````````````````````````
Four spaces are too many:
755 ```````````````````````````````` example
760 ````````````````````````````````
763 ```````````````````````````````` example
769 ````````````````````````````````
772 A closing sequence of `#` characters is optional:
774 ```````````````````````````````` example
780 ````````````````````````````````
783 It need not be the same length as the opening sequence:
785 ```````````````````````````````` example
786 # foo ##################################
791 ````````````````````````````````
794 Spaces are allowed after the closing sequence:
796 ```````````````````````````````` example
800 ````````````````````````````````
803 A sequence of `#` characters with anything but [spaces] following it
804 is not a closing sequence, but counts as part of the contents of the
807 ```````````````````````````````` example
811 ````````````````````````````````
814 The closing sequence must be preceded by a space:
816 ```````````````````````````````` example
820 ````````````````````````````````
823 Backslash-escaped `#` characters do not count as part
824 of the closing sequence:
826 ```````````````````````````````` example
834 ````````````````````````````````
837 ATX headings need not be separated from surrounding content by blank
838 lines, and they can interrupt paragraphs:
840 ```````````````````````````````` example
848 ````````````````````````````````
851 ```````````````````````````````` example
859 ````````````````````````````````
862 ATX headings can be empty:
864 ```````````````````````````````` example
872 ````````````````````````````````
877 A [setext heading](@) consists of one or more
878 lines of text, each containing at least one [non-whitespace
879 character], with no more than 3 spaces indentation, followed by
880 a [setext heading underline]. The lines of text must be such
881 that, were they not followed by the setext heading underline,
882 they would be interpreted as a paragraph: they cannot be
883 interpretable as a [code fence], [ATX heading][ATX headings],
884 [block quote][block quotes], [thematic break][thematic breaks],
885 [list item][list items], or [HTML block][HTML blocks].
A [setext heading underline](@) is a sequence of
`=` characters or a sequence of `-` characters, with no more than 3
spaces of indentation and any number of trailing spaces. If a line
containing a single `-` can be interpreted as an
empty [list item][list items], it should be interpreted this way
and not as a [setext heading underline].
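A minimal Python sketch of the underline test follows. The name `setext_level` is hypothetical, and the function looks only at the underline line itself: it does not check that the preceding lines form a paragraph, nor does it resolve the empty list item case. The level mapping (`=` for level 1, `-` for level 2) is the one this spec uses.

```python
import re

# Up to 3 spaces of indentation, an unbroken run of '=' or of '-',
# then any number of trailing spaces.
SETEXT_UNDERLINE = re.compile(r" {0,3}(=+|-+) *")


def setext_level(line):
    """Return 1 or 2 if line has the form of a setext underline, else None."""
    m = SETEXT_UNDERLINE.fullmatch(line)
    if m is None:
        return None
    return 1 if m.group(1)[0] == "=" else 2
```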
894 The heading is a level 1 heading if `=` characters are used in
895 the [setext heading underline], and a level 2 heading if `-`
896 characters are used. The contents of the heading are the result
897 of parsing the preceding lines of text as CommonMark inline
900 In general, a setext heading need not be preceded or followed by a
901 blank line. However, it cannot interrupt a paragraph, so when a
902 setext heading comes after a paragraph, a blank line is needed between
907 ```````````````````````````````` example
914 <h1>Foo <em>bar</em></h1>
915 <h2>Foo <em>bar</em></h2>
916 ````````````````````````````````
The content of the heading may span more than one line:
921 ```````````````````````````````` example
928 ````````````````````````````````
931 The underlining can be any length:
933 ```````````````````````````````` example
935 -------------------------
942 ````````````````````````````````
945 The heading content can be indented up to three spaces, and need
946 not line up with the underlining:
948 ```````````````````````````````` example
961 ````````````````````````````````
An indent of four spaces is too much:
966 ```````````````````````````````` example
979 ````````````````````````````````
982 The setext heading underline can be indented up to three spaces, and
983 may have trailing spaces:
985 ```````````````````````````````` example
990 ````````````````````````````````
993 Four spaces is too much:
995 ```````````````````````````````` example
1001 ````````````````````````````````
1004 The setext heading underline cannot contain internal spaces:
1006 ```````````````````````````````` example
1017 ````````````````````````````````
1020 Trailing spaces in the content line do not cause a line break:
1022 ```````````````````````````````` example
1027 ````````````````````````````````
1030 Nor does a backslash at the end:
1032 ```````````````````````````````` example
1037 ````````````````````````````````
1040 Since indicators of block structure take precedence over
1041 indicators of inline structure, the following are setext headings:
1043 ```````````````````````````````` example
1054 <h2><a title="a lot</h2>
1055 <p>of dashes"/></p>
1056 ````````````````````````````````
1059 The setext heading underline cannot be a [lazy continuation
1060 line] in a list item or block quote:
1062 ```````````````````````````````` example
1070 ````````````````````````````````
1073 ```````````````````````````````` example
1083 ````````````````````````````````
1086 ```````````````````````````````` example
1094 ````````````````````````````````
1097 A blank line is needed between a paragraph and a following
1098 setext heading, since otherwise the paragraph becomes part
1099 of the heading's content:
1101 ```````````````````````````````` example
1108 ````````````````````````````````
1111 But in general a blank line is not required before or after
1114 ```````````````````````````````` example
1126 ````````````````````````````````
1129 Setext headings cannot be empty:
1131 ```````````````````````````````` example
1136 ````````````````````````````````
1139 Setext heading text lines must not be interpretable as block
1140 constructs other than paragraphs. So, the line of dashes
1141 in these examples gets interpreted as a thematic break:
1143 ```````````````````````````````` example
1149 ````````````````````````````````
1152 ```````````````````````````````` example
1160 ````````````````````````````````
1163 ```````````````````````````````` example
1170 ````````````````````````````````
1173 ```````````````````````````````` example
1181 ````````````````````````````````
1184 If you want a heading with `> foo` as its literal text, you can
1185 use backslash escapes:
1187 ```````````````````````````````` example
1192 ````````````````````````````````
1195 **Compatibility note:** Most existing Markdown implementations
1196 do not allow the text of setext headings to span multiple lines.
1197 But there is no consensus about how to interpret
1206 One can find four different interpretations:
1208 1. paragraph "Foo", heading "bar", paragraph "baz"
1209 2. paragraph "Foo bar", thematic break, paragraph "baz"
1210 3. paragraph "Foo bar --- baz"
1211 4. heading "Foo bar", paragraph "baz"
We find interpretation 4 most natural, and it also
increases the expressive power of CommonMark by allowing
multiline headings. Authors who want interpretation 1 can
put a blank line after the first paragraph:
1218 ```````````````````````````````` example
1228 ````````````````````````````````
1231 Authors who want interpretation 2 can put blank lines around
1234 ```````````````````````````````` example
1246 ````````````````````````````````
1249 or use a thematic break that cannot count as a [setext heading
1252 ```````````````````````````````` example
1262 ````````````````````````````````
1265 Authors who want interpretation 3 can use backslash escapes:
1267 ```````````````````````````````` example
1277 ````````````````````````````````
1280 ## Indented code blocks
1282 An [indented code block](@) is composed of one or more
1283 [indented chunks] separated by blank lines.
1284 An [indented chunk](@) is a sequence of non-blank lines,
1285 each indented four or more spaces. The contents of the code block are
1286 the literal contents of the lines, including trailing
1287 [line endings], minus four spaces of indentation.
1288 An indented code block has no [info string].
1290 An indented code block cannot interrupt a paragraph, so there must be
1291 a blank line between a paragraph and a following indented code block.
1292 (A blank line is not needed, however, between a code block and a following
1295 ```````````````````````````````` example
1302 ````````````````````````````````
1305 If there is any ambiguity between an interpretation of indentation
1306 as a code block and as indicating that material belongs to a [list
1307 item][list items], the list item interpretation takes precedence:
1309 ```````````````````````````````` example
1320 ````````````````````````````````
1323 ```````````````````````````````` example
1336 ````````````````````````````````
1340 The contents of a code block are literal text, and do not get parsed
1343 ```````````````````````````````` example
1349 <pre><code><a/>
1354 ````````````````````````````````
1357 Here we have three chunks separated by blank lines:
1359 ```````````````````````````````` example
1376 ````````````````````````````````
1379 Any initial spaces beyond four will be included in the content, even
1380 in interior blank lines:
1382 ```````````````````````````````` example
1391 ````````````````````````````````
1394 An indented code block cannot interrupt a paragraph. (This
1395 allows hanging indents and the like.)
1397 ```````````````````````````````` example
1404 ````````````````````````````````
1407 However, any non-blank line with fewer than four leading spaces ends
1408 the code block immediately. So a paragraph may occur immediately
1409 after indented code:
1411 ```````````````````````````````` example
1418 ````````````````````````````````
1421 And indented code can occur immediately before and after other kinds of
1424 ```````````````````````````````` example
1439 ````````````````````````````````
1442 The first line can be indented more than four spaces:
1444 ```````````````````````````````` example
1451 ````````````````````````````````
1454 Blank lines preceding or following an indented code block
1455 are not included in it:
1457 ```````````````````````````````` example
1466 ````````````````````````````````
1469 Trailing spaces are included in the code block's content:
1471 ```````````````````````````````` example
1476 ````````````````````````````````
1480 ## Fenced code blocks
1482 A [code fence](@) is a sequence
1483 of at least three consecutive backtick characters (`` ` ``) or
1484 tildes (`~`). (Tildes and backticks cannot be mixed.)
1485 A [fenced code block](@)
1486 begins with a code fence, indented no more than three spaces.
1488 The line with the opening code fence may optionally contain some text
1489 following the code fence; this is trimmed of leading and trailing
1490 spaces and called the [info string](@).
1491 The [info string] may not contain any backtick
1492 characters. (The reason for this restriction is that otherwise
1493 some inline code would be incorrectly interpreted as the
1494 beginning of a fenced code block.)
1496 The content of the code block consists of all subsequent lines, until
1497 a closing [code fence] of the same type as the code block
1498 began with (backticks or tildes), and with at least as many backticks
1499 or tildes as the opening code fence. If the leading code fence is
1500 indented N spaces, then up to N spaces of indentation are removed from
1501 each line of the content (if present). (If a content line is not
1502 indented, it is preserved unchanged. If it is indented less than N
1503 spaces, all of the indentation is removed.)
The closing code fence may be indented up to three spaces, and may be
followed only by spaces, which are ignored. If the end of the
containing block (or document) is reached and no closing code fence
has been found, the code block contains all of the lines after the
opening code fence until the end of the containing block (or
document). (An alternative spec would require backtracking in the
event that a closing code fence is not found. But this makes parsing
much less efficient, and there seems to be no real downside to the
behavior described here.)
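The fence rules above can be sketched in Python as three small helpers. All names here (`parse_open_fence`, `is_closing_fence`, `strip_content_indent`) are hypothetical, and the sketch works on single lines without tracking containing blocks. It applies the backtick restriction on info strings to both fence types, following the wording above.

```python
import re

OPEN_FENCE = re.compile(r"( {0,3})(`{3,}|~{3,})(.*)")
CLOSE_FENCE = re.compile(r" {0,3}(`{3,}|~{3,}) *")


def parse_open_fence(line):
    """Return (indent, fence_char, fence_len, info_string), or None."""
    m = OPEN_FENCE.fullmatch(line)
    if m is None:
        return None
    indent, fence, rest = m.groups()
    info = rest.strip(" ")
    if "`" in info:
        # The info string may not contain backticks.
        return None
    return len(indent), fence[0], len(fence), info


def is_closing_fence(line, fence_char, fence_len):
    """True if line closes a block opened with fence_len fence_char characters."""
    m = CLOSE_FENCE.fullmatch(line)
    if m is None:
        return False
    return m.group(1)[0] == fence_char and len(m.group(1)) >= fence_len


def strip_content_indent(line, n):
    """Remove up to n leading spaces from a content line."""
    i = 0
    while i < n and i < len(line) and line[i] == " ":
        i += 1
    return line[i:]
```

A parser built on these helpers would collect lines after the opening fence, passing each through `strip_content_indent`, until `is_closing_fence` returns true or the containing block ends.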
1515 A fenced code block may interrupt a paragraph, and does not require
1516 a blank line either before or after.
1518 The content of a code fence is treated as literal text, not parsed
1519 as inlines. The first word of the [info string] is typically used to
1520 specify the language of the code sample, and rendered in the `class`
1521 attribute of the `code` tag. However, this spec does not mandate any
1522 particular treatment of the [info string].
1524 Here is a simple example with backticks:
1526 ```````````````````````````````` example
1535 ````````````````````````````````
1540 ```````````````````````````````` example
1549 ````````````````````````````````
1552 The closing code fence must use the same character as the opening
1555 ```````````````````````````````` example
1564 ````````````````````````````````
1567 ```````````````````````````````` example
1576 ````````````````````````````````
1579 The closing code fence must be at least as long as the opening fence:
1581 ```````````````````````````````` example
1590 ````````````````````````````````
1593 ```````````````````````````````` example
1602 ````````````````````````````````
1605 Unclosed code blocks are closed by the end of the document
1606 (or the enclosing [block quote][block quotes] or [list item][list items]):
1608 ```````````````````````````````` example
1611 <pre><code></code></pre>
1612 ````````````````````````````````
1615 ```````````````````````````````` example
1625 ````````````````````````````````
1628 ```````````````````````````````` example
1639 ````````````````````````````````
1642 A code block can have all empty lines as its content:
1644 ```````````````````````````````` example
1653 ````````````````````````````````
1656 A code block can be empty:
1658 ```````````````````````````````` example
1662 <pre><code></code></pre>
1663 ````````````````````````````````
1666 Fences can be indented. If the opening fence is indented,
1667 content lines will have equivalent opening indentation removed,
1670 ```````````````````````````````` example
1679 ````````````````````````````````
1682 ```````````````````````````````` example
1693 ````````````````````````````````
1696 ```````````````````````````````` example
1707 ````````````````````````````````
Four spaces of indentation produce an indented code block:
1712 ```````````````````````````````` example
1721 ````````````````````````````````
1724 Closing fences may be indented by 0-3 spaces, and their indentation
1725 need not match that of the opening fence:
1727 ```````````````````````````````` example
1734 ````````````````````````````````
1737 ```````````````````````````````` example
1744 ````````````````````````````````
1747 This is not a closing fence, because it is indented 4 spaces:
1749 ```````````````````````````````` example
1757 ````````````````````````````````
1761 Code fences (opening and closing) cannot contain internal spaces:
1763 ```````````````````````````````` example
1769 ````````````````````````````````
1772 ```````````````````````````````` example
1780 ````````````````````````````````
1783 Fenced code blocks can interrupt paragraphs, and can be followed
1784 directly by paragraphs, without a blank line between:
1786 ```````````````````````````````` example
1797 ````````````````````````````````
1800 Other blocks can also occur before and after fenced code blocks
1801 without an intervening blank line:
1803 ```````````````````````````````` example
1815 ````````````````````````````````
An [info string] can be provided after the opening code fence.
Leading and trailing spaces will be stripped, and the first word, prefixed
with `language-`, is used as the value for the `class` attribute of the
`code` element within the enclosing `pre` element.
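A renderer might derive the opening `code` tag from the info string roughly as follows. This is a sketch of the convention used in the examples, not a requirement of the spec; the function name `code_open_tag` is hypothetical, and escaping the word with `html.escape` is an assumption about what a renderer would want.

```python
import html


def code_open_tag(info):
    """Build the opening <code> tag from an already-trimmed info string."""
    words = info.split()
    if not words:
        # No info string: no class attribute.
        return "<code>"
    # Only the first word is used; the rest of the info string is ignored.
    language = html.escape(words[0], quote=True)
    return '<code class="language-{}">'.format(language)
```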
1823 ```````````````````````````````` example
1830 <pre><code class="language-ruby">def foo(x)
1834 ````````````````````````````````
1837 ```````````````````````````````` example
1838 ~~~~ ruby startline=3 $%@#$
1844 <pre><code class="language-ruby">def foo(x)
1848 ````````````````````````````````
1851 ```````````````````````````````` example
1855 <pre><code class="language-;"></code></pre>
1856 ````````````````````````````````
1859 [Info strings] for backtick code blocks cannot contain backticks:
1861 ```````````````````````````````` example
1867 ````````````````````````````````
1870 Closing code fences cannot have [info strings]:
1872 ```````````````````````````````` example
1879 ````````````````````````````````
1885 An [HTML block](@) is a group of lines that is treated
1886 as raw HTML (and will not be escaped in HTML output).
1888 There are seven kinds of [HTML block], which can be defined
1889 by their start and end conditions. The block begins with a line that
1890 meets a [start condition](@) (after up to three spaces
1891 optional indentation). It ends with the first subsequent line that
1892 meets a matching [end condition](@), or the last line of
1893 the document, if no line is encountered that meets the
1894 [end condition]. If the first line meets both the [start condition]
1895 and the [end condition], the block will contain just that line.
1897 1. **Start condition:** line begins with the string `<script`,
1898 `<pre`, or `<style` (case-insensitive), followed by whitespace,
1899 the string `>`, or the end of the line.\
1900 **End condition:** line contains an end tag
1901 `</script>`, `</pre>`, or `</style>` (case-insensitive; it
1902 need not match the start tag).
1904 2. **Start condition:** line begins with the string `<!--`.\
1905 **End condition:** line contains the string `-->`.
1907 3. **Start condition:** line begins with the string `<?`.\
1908 **End condition:** line contains the string `?>`.
1910 4. **Start condition:** line begins with the string `<!`
1911 followed by an uppercase ASCII letter.\
1912 **End condition:** line contains the character `>`.
1914 5. **Start condition:** line begins with the string
1916 **End condition:** line contains the string `]]>`.
1918 6. **Start condition:** line begins with the string `<` or `</`
1919 followed by one of the strings (case-insensitive) `address`,
1920 `article`, `aside`, `base`, `basefont`, `blockquote`, `body`,
1921 `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`,
1922 `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`,
1923 `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
1924 `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
1925 `meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
1926 `section`, `source`, `summary`, `table`, `tbody`, `td`,
1927 `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
1928 by [whitespace], the end of the line, the string `>`, or
1930 **End condition:** line is followed by a [blank line].
1932 7. **Start condition:** line begins with a complete [open tag]
1933 or [closing tag] (with any [tag name] other than `script`,
1934 `style`, or `pre`) followed only by [whitespace]
1935 or the end of the line.\
1936 **End condition:** line is followed by a [blank line].
1938 All types of [HTML blocks] except type 7 may interrupt
1939 a paragraph. Blocks of type 7 may not interrupt a paragraph.
1940 (This restriction is intended to prevent unwanted interpretation
1941 of long tags inside a wrapped paragraph as starting HTML blocks.)
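The start and end conditions for the first four kinds of [HTML block] can be sketched with regular expressions. This is a partial illustration only: it covers types 1 to 4, the names `html_block_type` and `html_block_end` are hypothetical, and the sketch treats the input as a whole document, so it does not account for an enclosing block quote or list item ending the block early.

```python
import re

# Start conditions, each allowing up to three spaces of indentation.
START_CONDITIONS = [
    (1, re.compile(r"^ {0,3}<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)),
    (2, re.compile(r"^ {0,3}<!--")),
    (3, re.compile(r"^ {0,3}<\?")),
    (4, re.compile(r"^ {0,3}<![A-Z]")),
]

# End conditions: the line only has to *contain* the closing string.
END_CONDITIONS = {
    1: re.compile(r"</(?:script|pre|style)>", re.IGNORECASE),
    2: re.compile(r"-->"),
    3: re.compile(r"\?>"),
    4: re.compile(r">"),
}


def html_block_type(line):
    """Return 1-4 if `line` meets one of the start conditions, else None."""
    for kind, pattern in START_CONDITIONS:
        if pattern.match(line):
            return kind
    return None


def html_block_end(lines, start, kind):
    """Return the index of the last line of the block starting at `start`."""
    # The first line itself may meet the end condition.
    for i in range(start, len(lines)):
        if END_CONDITIONS[kind].search(lines[i]):
            return i
    return len(lines) - 1  # no end condition met: run to the last line
```

Note that the end condition for type 1 does not look at which start tag opened the block, matching the remark that the end tag need not match the start tag.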
1943 Some simple examples follow. Here are some basic HTML blocks
1946 ```````````````````````````````` example
1965 ````````````````````````````````
1968 ```````````````````````````````` example
1976 ````````````````````````````````
1979 A block can also start with a closing tag:
1981 ```````````````````````````````` example
1987 ````````````````````````````````
1990 Here we have two HTML blocks with a Markdown paragraph between them:
1992 ```````````````````````````````` example
2000 <p><em>Markdown</em></p>
2002 ````````````````````````````````
2005 The tag on the first line can be partial, as long
2006 as it is split where there would be whitespace:
2008 ```````````````````````````````` example
2016 ````````````````````````````````
2019 ```````````````````````````````` example
2020 <div id="foo" class="bar
2024 <div id="foo" class="bar
2027 ````````````````````````````````
2030 An open tag need not be closed:
2031 ```````````````````````````````` example
2040 ````````````````````````````````
2044 A partial tag need not even be completed (garbage
2047 ```````````````````````````````` example
2053 ````````````````````````````````
2056 ```````````````````````````````` example
2062 ````````````````````````````````
2065 The initial tag doesn't even need to be a valid
2066 tag, as long as it starts like one:
2068 ```````````````````````````````` example
2074 ````````````````````````````````
2077 In type 6 blocks, the initial tag need not be on a line by
2080 ```````````````````````````````` example
2081 <div><a href="bar">*foo*</a></div>
2083 <div><a href="bar">*foo*</a></div>
2084 ````````````````````````````````
2087 ```````````````````````````````` example
2095 ````````````````````````````````
2098 Everything until the next blank line or end of document
2099 gets included in the HTML block. So, in the following
2100 example, what looks like a Markdown code block
2101 is actually part of the HTML block, which continues until a blank
2102 line or the end of the document is reached:
2104 ```````````````````````````````` example
2114 ````````````````````````````````
2117 To start an [HTML block] with a tag that is *not* in the
2118 list of block-level tags in (6), you must put the tag by
2119 itself on the first line (and it must be complete):
2121 ```````````````````````````````` example
2129 ````````````````````````````````
2132 In type 7 blocks, the [tag name] can be anything:
2134 ```````````````````````````````` example
2142 ````````````````````````````````
2145 ```````````````````````````````` example
2153 ````````````````````````````````
2156 ```````````````````````````````` example
2162 ````````````````````````````````
2165 These rules are designed to allow us to work with tags that
2166 can function as either block-level or inline-level tags.
2167 The `<del>` tag is a nice example. We can surround content with
2168 `<del>` tags in three different ways. In this case, we get a raw
2169 HTML block, because the `<del>` tag is on a line by itself:
2171 ```````````````````````````````` example
2179 ````````````````````````````````
2182 In this case, we get a raw HTML block that just includes
2183 the `<del>` tag (because it ends with the following blank
2184 line). So the contents get interpreted as CommonMark:
2186 ```````````````````````````````` example
2196 ````````````````````````````````
2199 Finally, in this case, the `<del>` tags are interpreted
2200 as [raw HTML] *inside* the CommonMark paragraph. (Because
2201 the tag is not on a line by itself, we get inline HTML
2202 rather than an [HTML block].)
2204 ```````````````````````````````` example
2207 <p><del><em>foo</em></del></p>
2208 ````````````````````````````````
2211 HTML tags designed to contain literal content
2212 (`script`, `style`, `pre`), comments, processing instructions,
2213 and declarations are treated somewhat differently.
2214 Instead of ending at the first blank line, these blocks
2215 end at the first line containing a corresponding end tag.
2216 As a result, these blocks can contain blank lines:
2220 ```````````````````````````````` example
2221 <pre language="haskell"><code>
2222 import Text.HTML.TagSoup
2225 main = print $ parseTags tags
2228 <pre language="haskell"><code>
2229 import Text.HTML.TagSoup
2232 main = print $ parseTags tags
2234 ````````````````````````````````
2237 A script tag (type 1):
2239 ```````````````````````````````` example
2240 <script type="text/javascript">
2241 // JavaScript example
2243 document.getElementById("demo").innerHTML = "Hello JavaScript!";
2246 <script type="text/javascript">
2247 // JavaScript example
2249 document.getElementById("demo").innerHTML = "Hello JavaScript!";
2251 ````````````````````````````````
2254 A style tag (type 1):
2256 ```````````````````````````````` example
2270 ````````````````````````````````
2273 If there is no matching end tag, the block will end at the
2274 end of the document (or the enclosing [block quote][block quotes]
2275 or [list item][list items]):
2277 ```````````````````````````````` example
2287 ````````````````````````````````
2290 ```````````````````````````````` example
2301 ````````````````````````````````
2304 ```````````````````````````````` example
2314 ````````````````````````````````
2317 The end tag can occur on the same line as the start tag:
2319 ```````````````````````````````` example
2320 <style>p{color:red;}</style>
2323 <style>p{color:red;}</style>
2325 ````````````````````````````````
2328 ```````````````````````````````` example
2334 ````````````````````````````````
2337 Note that anything on the last line after the
2338 end tag will be included in the [HTML block]:
2340 ```````````````````````````````` example
2348 ````````````````````````````````
2353 ```````````````````````````````` example
2363 ````````````````````````````````
2367 A processing instruction (type 3):
2369 ```````````````````````````````` example
2381 ````````````````````````````````
2384 A declaration (type 4):
2386 ```````````````````````````````` example
2390 ````````````````````````````````
2395 ```````````````````````````````` example
2397 function matchwo(a,b)
2399 if (a < b && a < 0) then {
2410 function matchwo(a,b)
2412 if (a < b && a < 0) then {
2421 ````````````````````````````````
2424 The opening tag can be indented 1-3 spaces, but not 4:
2426 ```````````````````````````````` example
2432 <pre><code><!-- foo -->
2434 ````````````````````````````````
2437 ```````````````````````````````` example
2443 <pre><code><div>
2445 ````````````````````````````````
2448 An HTML block of types 1--6 can interrupt a paragraph, and need not be
2449 preceded by a blank line.
2451 ```````````````````````````````` example
2461 ````````````````````````````````
2464 However, a following blank line is needed, except at the end of
2465 a document, and except for blocks of types 1--5, above:
2467 ```````````````````````````````` example
2477 ````````````````````````````````
2480 HTML blocks of type 7 cannot interrupt a paragraph:
2482 ```````````````````````````````` example
2490 ````````````````````````````````
2493 This rule differs from John Gruber's original Markdown syntax
2494 specification, which says:
2496 > The only restrictions are that block-level HTML elements —
2497 > e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from
2498 > surrounding content by blank lines, and the start and end tags of the
2499 > block should not be indented with tabs or spaces.
2501 In some ways Gruber's rule is more restrictive than the one given
2504 - It requires that an HTML block be preceded by a blank line.
2505 - It does not allow the start tag to be indented.
2506 - It requires a matching end tag, which it also does not allow to
2509 Most Markdown implementations (including some of Gruber's own) do not
2510 respect all of these restrictions.
2512 There is one respect, however, in which Gruber's rule is more liberal
2513 than the one given here, since it allows blank lines to occur inside
2514 an HTML block. There are two reasons for disallowing them here.
2515 First, it removes the need to parse balanced tags, which is
2516 expensive and can require backtracking from the end of the document
2517 if no matching end tag is found. Second, it provides a very simple
2518 and flexible way of including Markdown content inside HTML tags:
2519 simply separate the Markdown from the HTML using blank lines:
2523 ```````````````````````````````` example
2531 <p><em>Emphasized</em> text.</p>
2533 ````````````````````````````````
2536 ```````````````````````````````` example
2544 ````````````````````````````````
2547 Some Markdown implementations have adopted a convention of
2548 interpreting content inside tags as text if the open tag has
2549 the attribute `markdown=1`. The rule given above seems a simpler and
2550 more elegant way of achieving the same expressive power, which is also
2551 much simpler to parse.
2553 The main potential drawback is that one can no longer paste HTML
2554 blocks into Markdown documents with 100% reliability. However,
2555 *in most cases* this will work fine, because the blank lines in
2556 HTML are usually followed by HTML block tags. For example:
2558 ```````````````````````````````` example
2578 ````````````````````````````````
2581 There are problems, however, if the inner tags are indented
2582 *and* separated by blank lines, as then they will be interpreted as
2583 an indented code block:
2585 ```````````````````````````````` example
2600 <pre><code><td>
2606 ````````````````````````````````
2609 Fortunately, blank lines are usually not necessary and can be
2610 deleted. The exception is inside `<pre>` tags, but as described
2611 above, raw HTML blocks starting with `<pre>` *can* contain blank
2614 ## Link reference definitions
2616 A [link reference definition](@)
2617 consists of a [link label], indented up to three spaces, followed
2618 by a colon (`:`), optional [whitespace] (including up to one
2619 [line ending]), a [link destination],
2620 optional [whitespace] (including up to one
2621 [line ending]), and an optional [link
2622 title], which if it is present must be separated
2623 from the [link destination] by [whitespace].
2624 No further [non-whitespace characters] may occur on the line.
2626 A [link reference definition]
2627 does not correspond to a structural element of a document. Instead, it
2628 defines a label which can be used in [reference links]
2629 and reference-style [images] elsewhere in the document. [Link
2630 reference definitions] can come either before or after the links that use
2633 ```````````````````````````````` example
2638 <p><a href="/url" title="title">foo</a></p>
2639 ````````````````````````````````
2642 ```````````````````````````````` example
2649 <p><a href="/url" title="the title">foo</a></p>
2650 ````````````````````````````````
2653 ```````````````````````````````` example
2654 [Foo*bar\]]:my_(url) 'title (with parens)'
2658 <p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p>
2659 ````````````````````````````````
2662 ```````````````````````````````` example
2669 <p><a href="my%20url" title="title">Foo bar</a></p>
2670 ````````````````````````````````
2673 The title may extend over multiple lines:
2675 ```````````````````````````````` example
2684 <p><a href="/url" title="
2689 ````````````````````````````````
2692 However, it may not contain a [blank line]:
2694 ```````````````````````````````` example
2701 <p>[foo]: /url 'title</p>
2702 <p>with blank line'</p>
2704 ````````````````````````````````
2707 The title may be omitted:
2709 ```````````````````````````````` example
2715 <p><a href="/url">foo</a></p>
2716 ````````````````````````````````
2719 The link destination may not be omitted:
2721 ```````````````````````````````` example
2728 ````````````````````````````````
2731 Both title and destination can contain backslash escapes
2732 and literal backslashes:
2734 ```````````````````````````````` example
2735 [foo]: /url\bar\*baz "foo\"bar\baz"
2739 <p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p>
2740 ````````````````````````````````
2743 A link can come before its corresponding definition:
2745 ```````````````````````````````` example
2750 <p><a href="url">foo</a></p>
2751 ````````````````````````````````
2754 If there are several matching definitions, the first one takes
2757 ```````````````````````````````` example
2763 <p><a href="first">foo</a></p>
2764 ````````````````````````````````
2767 As noted in the section on [Links], matching of labels is
2768 case-insensitive (see [matches]).
2770 ```````````````````````````````` example
2775 <p><a href="/url">Foo</a></p>
2776 ````````````````````````````````
2779 ```````````````````````````````` example
2784 <p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
2785 ````````````````````````````````
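The precedence and matching rules for labels can be sketched as a small lookup table. This is a hedged illustration: the names `normalize_label` and `collect_definitions` are hypothetical, and the normalization shown (Unicode case folding plus collapsing runs of whitespace) is an assumption taken from the description of label matching in the section on [Links].

```python
def normalize_label(label):
    """Normalize a link label for matching (assumed rule: case fold,
    collapse internal whitespace to single spaces)."""
    return " ".join(label.split()).casefold()


def collect_definitions(definitions):
    """Build a lookup table from (label, destination) pairs.

    Pairs are assumed to arrive in document order, so setdefault
    keeps the first definition for each normalized label.
    """
    table = {}
    for label, destination in definitions:
        table.setdefault(normalize_label(label), destination)
    return table


table = collect_definitions([("foo", "first"), ("foo", "second")])
print(table[normalize_label("FOO")])  # first
```

Case folding, rather than simple lowercasing, is what lets labels written in non-ASCII scripts match regardless of case.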
2788 Here is a link reference definition with no corresponding link.
2789 It contributes nothing to the document.
2791 ```````````````````````````````` example
2794 ````````````````````````````````
2797 Here is another one:
2799 ```````````````````````````````` example
2806 ````````````````````````````````
2809 This is not a link reference definition, because there are
2810 [non-whitespace characters] after the title:
2812 ```````````````````````````````` example
2813 [foo]: /url "title" ok
2815 <p>[foo]: /url "title" ok</p>
2816 ````````````````````````````````
2819 This is a link reference definition, but it has no title:
2821 ```````````````````````````````` example
2825 <p>"title" ok</p>
2826 ````````````````````````````````
2829 This is not a link reference definition, because it is indented
2832 ```````````````````````````````` example
2837 <pre><code>[foo]: /url "title"
2840 ````````````````````````````````
2843 This is not a link reference definition, because it occurs inside
2846 ```````````````````````````````` example
2853 <pre><code>[foo]: /url
2856 ````````````````````````````````
2859 A [link reference definition] cannot interrupt a paragraph.
2861 ```````````````````````````````` example
2870 ````````````````````````````````
2873 However, it can directly follow other block elements, such as headings
2874 and thematic breaks, and it need not be followed by a blank line.
2876 ```````````````````````````````` example
2881 <h1><a href="/url">Foo</a></h1>
2885 ````````````````````````````````
2888 Several [link reference definitions]
2889 can occur one after another, without intervening blank lines.
2891 ```````````````````````````````` example
2892 [foo]: /foo-url "foo"
2901 <p><a href="/foo-url" title="foo">foo</a>,
2902 <a href="/bar-url" title="bar">bar</a>,
2903 <a href="/baz-url">baz</a></p>
2904 ````````````````````````````````
2907 [Link reference definitions] can occur
2908 inside block containers, like lists and block quotations. They
2909 affect the entire document, not just the container in which they
2912 ```````````````````````````````` example
2917 <p><a href="/url">foo</a></p>
2920 ````````````````````````````````
2926 A sequence of non-blank lines that cannot be interpreted as other
2927 kinds of blocks forms a [paragraph](@).
2928 The contents of the paragraph are the result of parsing the
2929 paragraph's raw content as inlines. The paragraph's raw content
2930 is formed by concatenating the lines and removing initial and final
2933 A simple example with two paragraphs:
2935 ```````````````````````````````` example
2942 ````````````````````````````````
2945 Paragraphs can contain multiple lines, but no blank lines:
2947 ```````````````````````````````` example
2958 ````````````````````````````````
2961 Multiple blank lines between paragraphs have no effect:
2963 ```````````````````````````````` example
2971 ````````````````````````````````
2974 Leading spaces are skipped:
2976 ```````````````````````````````` example
2982 ````````````````````````````````
2985 Lines after the first may be indented any amount, since indented
2986 code blocks cannot interrupt paragraphs.
2988 ```````````````````````````````` example
2996 ````````````````````````````````
2999 However, the first line may be indented at most three spaces,
3000 or an indented code block will be triggered:
3002 ```````````````````````````````` example
3008 ````````````````````````````````
3011 ```````````````````````````````` example
3018 ````````````````````````````````
3021 Final spaces are stripped before inline parsing, so a paragraph
3022 that ends with two or more spaces will not end with a [hard line
3025 ```````````````````````````````` example
3031 ````````````````````````````````
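The handling of leading and final spaces described in this section can be sketched as follows. This is a minimal illustration, assuming the lines of the paragraph have already been identified; the function name `paragraph_raw_content` is hypothetical.

```python
def paragraph_raw_content(lines):
    """Join paragraph lines into the raw content handed to inline parsing.

    Assumption: `lines` holds the lines of one paragraph, without
    line endings.
    """
    # Leading spaces on every line are skipped.
    joined = "\n".join(line.lstrip(" \t") for line in lines)
    # Final whitespace is stripped, so trailing spaces on the last line
    # cannot produce a hard line break.
    return joined.rstrip()
```

Trailing spaces on interior lines are deliberately left in place, since those are what inline parsing later inspects when looking for hard line breaks.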
3036 [Blank lines] between block-level elements are ignored,
3037 except for the role they play in determining whether a [list]
3038 is [tight] or [loose].
3040 Blank lines at the beginning and end of the document are also ignored.
3042 ```````````````````````````````` example
3054 ````````````````````````````````
3060 A [container block] is a block that has other
3061 blocks as its contents. There are two basic kinds of container blocks:
3062 [block quotes] and [list items].
3063 [Lists] are meta-containers for [list items].
3065 We define the syntax for container blocks recursively. The general
3066 form of the definition is:
3068 > If X is a sequence of blocks, then the result of
3069 > transforming X in such-and-such a way is a container of type Y
3070 > with these blocks as its content.
3072 So, we explain what counts as a block quote or list item by explaining
3073 how these can be *generated* from their contents. This should suffice
3074 to define the syntax, although it does not give a recipe for *parsing*
3075 these constructions. (A recipe is provided below in the section entitled
3076 [A parsing strategy](#appendix-a-parsing-strategy).)
3080 A [block quote marker](@)
3081 consists of 0-3 spaces of initial indent, plus (a) the character `>` together
3082 with a following space, or (b) a single character `>` not followed by a space.
3084 The following rules define [block quotes]:
3086 1. **Basic case.** If a string of lines *Ls* constitute a sequence
3087 of blocks *Bs*, then the result of prepending a [block quote
3088 marker] to the beginning of each line in *Ls*
3089 is a [block quote](#block-quotes) containing *Bs*.
3091 2. **Laziness.** If a string of lines *Ls* constitute a [block
3092 quote](#block-quotes) with contents *Bs*, then the result of deleting
3093 the initial [block quote marker] from one or
3094 more lines in which the next [non-whitespace character] after the [block
3095 quote marker] is [paragraph continuation
3096 text] is a block quote with *Bs* as its content.
3097 [Paragraph continuation text](@) is text
3098 that will be parsed as part of the content of a paragraph, but does
3099 not occur at the beginning of the paragraph.
3101 3. **Consecutiveness.** A document cannot contain two [block
3102 quotes] in a row unless there is a [blank line] between them.
3104 Nothing else counts as a [block quote](#block-quotes).
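The definition of a [block quote marker] translates directly into a pattern. The following is a sketch only, with a hypothetical function name, showing how a marker could be recognized and removed from a single line:

```python
import re

# 0-3 spaces of indent, the character `>`, and at most one following space.
BLOCK_QUOTE_MARKER = re.compile(r"^ {0,3}> ?")


def strip_block_quote_marker(line):
    """Return `line` without its block quote marker, or None if it has none."""
    m = BLOCK_QUOTE_MARKER.match(line)
    if m is None:
        return None
    return line[m.end():]


print(repr(strip_block_quote_marker("> # Foo")))     # '# Foo'
print(repr(strip_block_quote_marker(">bar")))        # 'bar'
print(repr(strip_block_quote_marker("    > # Foo"))) # None
```

Since the marker consumes at most one space after the `>`, any further spaces remain part of the content and count toward its indentation.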
3106 Here is a simple example:
3108 ```````````````````````````````` example
3118 ````````````````````````````````
3121 The spaces after the `>` characters can be omitted:
3123 ```````````````````````````````` example
3133 ````````````````````````````````
3136 The `>` characters can be indented 1-3 spaces:
3138 ```````````````````````````````` example
3148 ````````````````````````````````
3151 Four spaces give us a code block:
3153 ```````````````````````````````` example
3158 <pre><code>> # Foo
3162 ````````````````````````````````
3165 The Laziness clause allows us to omit the `>` before a
3166 paragraph continuation line:
3168 ```````````````````````````````` example
3178 ````````````````````````````````
3181 A block quote can contain some lazy and some non-lazy
3184 ```````````````````````````````` example
3194 ````````````````````````````````
3197 Laziness only applies to lines that would have been continuations of
3198 paragraphs had they been prepended with [block quote markers].
3199 For example, the `> ` cannot be omitted in the second line of
3206 without changing the meaning:
3208 ```````````````````````````````` example
3216 ````````````````````````````````
3219 Similarly, if we omit the `> ` in the second line of
3226 then the block quote ends after the first line:
3228 ```````````````````````````````` example
3240 ````````````````````````````````
3243 For the same reason, we can't omit the `> ` in front of
3244 subsequent lines of an indented or fenced code block:
3246 ```````````````````````````````` example
3256 ````````````````````````````````
3259 ```````````````````````````````` example
3265 <pre><code></code></pre>
3268 <pre><code></code></pre>
3269 ````````````````````````````````
3272 Note that in the following case, we have a paragraph
3275 ```````````````````````````````` example
3283 ````````````````````````````````
3286 To see why, note that in
3293 the `- bar` is indented too far to start a list, and can't
3294 be an indented code block because indented code blocks cannot
3295 interrupt paragraphs, so it is a [paragraph continuation line].
3297 A block quote can be empty:
3299 ```````````````````````````````` example
3304 ````````````````````````````````
3307 ```````````````````````````````` example
3314 ````````````````````````````````
3317 A block quote can have initial or final blank lines:
3319 ```````````````````````````````` example
3327 ````````````````````````````````
3330 A blank line always separates block quotes:
3332 ```````````````````````````````` example
3343 ````````````````````````````````
3346 (Most current Markdown implementations, including John Gruber's
3347 original `Markdown.pl`, will parse this example as a single block quote
3348 with two paragraphs. But it seems better to allow the author to decide
3349 whether two block quotes or one are wanted.)
3351 Consecutiveness means that if we put these block quotes together,
3352 we get a single block quote:
3354 ```````````````````````````````` example
3362 ````````````````````````````````
3365 To get a block quote with two paragraphs, use:
3367 ```````````````````````````````` example
3376 ````````````````````````````````
3379 Block quotes can interrupt paragraphs:
3381 ```````````````````````````````` example
3389 ````````````````````````````````
3392 In general, blank lines are not needed before or after block
3395 ```````````````````````````````` example
3407 ````````````````````````````````
3410 However, because of laziness, a blank line is needed between
3411 a block quote and a following paragraph:
3413 ```````````````````````````````` example
3421 ````````````````````````````````
3424 ```````````````````````````````` example
3433 ````````````````````````````````
3436 ```````````````````````````````` example
3445 ````````````````````````````````
3448 It is a consequence of the Laziness rule that any number
3449 of initial `>`s may be omitted on a continuation line of a
3452 ```````````````````````````````` example
3464 ````````````````````````````````
3467 ```````````````````````````````` example
3481 ````````````````````````````````
3484 When including an indented code block in a block quote,
3485 remember that the [block quote marker] includes
3486 both the `>` and a following space. So *five spaces* are needed after
3489 ```````````````````````````````` example
3501 ````````````````````````````````
3507 A [list marker](@) is a
3508 [bullet list marker] or an [ordered list marker].
3510 A [bullet list marker](@)
3511 is a `-`, `+`, or `*` character.
3513 An [ordered list marker](@)
3514 is a sequence of 1--9 arabic digits (`0-9`), followed by either a
3515 `.` character or a `)` character. (The reason for the length
3516 limit is that with 10 digits we start seeing integer overflows
3519 The following rules define [list items]:
3521 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
3522 blocks *Bs* starting with a [non-whitespace character] and not separated
3523 from each other by more than one blank line, and *M* is a list
3524 marker of width *W* followed by 0 < *N* < 5 spaces, then the result
3525 of prepending *M* and the following spaces to the first line of
3526 *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
3527 list item with *Bs* as its contents. The type of the list item
3528 (bullet or ordered) is determined by the type of its list marker.
3529 If the list item is ordered, then it is also assigned a start
3530 number, based on the ordered list marker.
3532 For example, let *Ls* be the lines
3534 ```````````````````````````````` example
3544 <pre><code>indented code
3547 <p>A block quote.</p>
3549 ````````````````````````````````
3552 And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says
3553 that the following is an ordered list item with start number 1,
3554 and the same contents as *Ls*:
3556 ```````````````````````````````` example
3568 <pre><code>indented code
3571 <p>A block quote.</p>
3575 ````````````````````````````````
3578 The most important thing to notice is that the position of
3579 the text after the list marker determines how much indentation
3580 is needed in subsequent blocks in the list item. If the list
3581 marker takes up two spaces, and there are three spaces between
3582 the list marker and the next [non-whitespace character], then blocks
3583 must be indented five spaces in order to fall under the list
3586 Here are some examples showing how far content must be indented to be
3587 put under the list item:
3589 ```````````````````````````````` example
3598 ````````````````````````````````
3601 ```````````````````````````````` example
3612 ````````````````````````````````
3615 ```````````````````````````````` example
3625 ````````````````````````````````
3628 ```````````````````````````````` example
3639 ````````````````````````````````
3642 It is tempting to think of this in terms of columns: the continuation
3643 blocks must be indented at least to the column of the first
3644 [non-whitespace character] after the list marker. However, that is not quite right.
3645 The spaces after the list marker determine how much relative indentation
3646 is needed. Which column this indentation reaches will depend on
3647 how the list item is embedded in other constructions, as shown by
3650 ```````````````````````````````` example
3665 ````````````````````````````````
3668 Here `two` occurs in the same column as the list marker `1.`,
3669 but is actually contained in the list item, because there is
3670 sufficient indentation after the last containing blockquote marker.
3672 The converse is also possible. In the following example, the word `two`
3673 occurs far to the right of the initial text of the list item, `one`, but
3674 it is not considered part of the list item, because it is not indented
3675 far enough past the blockquote marker:
3677 ```````````````````````````````` example
3690 ````````````````````````````````
3693 Note that at least one space is needed between the list marker and
3694 any following content, so these are not list items:
3696 ```````````````````````````````` example
3703 ````````````````````````````````
3706 A list item may not contain blocks that are separated by more than
3707 one blank line. Thus, two blank lines will end a list, unless the
3708 two blanks are contained in a [fenced code block].
3710 ```````````````````````````````` example
3767 ````````````````````````````````
3770 A list item may contain any kind of block:
3772 ```````````````````````````````` example
3794 ````````````````````````````````
3797 A list item that contains an indented code block will preserve
3798 empty lines within the code block verbatim, unless there are two
3799 or more empty lines in a row (since as described above, two
3800 blank lines end the list):
3802 ```````````````````````````````` example
3818 ````````````````````````````````
3821 ```````````````````````````````` example
3838 ````````````````````````````````
3841 Note that ordered list start numbers must be nine digits or fewer:
3843 ```````````````````````````````` example
3846 <ol start="123456789">
3849 ````````````````````````````````
3852 ```````````````````````````````` example
3855 <p>1234567890. not ok</p>
3856 ````````````````````````````````
3859 A start number may begin with 0s:
3861 ```````````````````````````````` example
3867 ````````````````````````````````
3870 ```````````````````````````````` example
3876 ````````````````````````````````
3879 A start number may not be negative:
3881 ```````````````````````````````` example
3885 ````````````````````````````````
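The constraints on list markers illustrated above (bullet characters, one to nine digits, a `.` or `)` delimiter, and at least one following space) can be sketched as a small recognizer. The function name and the shape of its return value are hypothetical, and the input is assumed to have had its leading indentation removed already.

```python
import re

# A marker must be followed by a space; end of line is also accepted here
# to allow for items that start with a blank line.
BULLET = re.compile(r"^([-+*])(?= |$)")
ORDERED = re.compile(r"^([0-9]{1,9})([.)])(?= |$)")


def parse_list_marker(text):
    """Return a dict describing the list marker at the start of `text`,
    or None if `text` does not begin with one."""
    m = BULLET.match(text)
    if m:
        return {"type": "bullet", "char": m.group(1), "width": 1}
    m = ORDERED.match(text)
    if m:
        return {
            "type": "ordered",
            "start": int(m.group(1)),  # leading zeros are dropped by int()
            "delimiter": m.group(2),
            "width": len(m.group(0)),
        }
    return None


print(parse_list_marker("003. ok")["start"])      # 3
print(parse_list_marker("1234567890. not ok"))    # None
print(parse_list_marker("-1. not ok"))            # None
```

The pattern uses the explicit class `[0-9]` rather than `\d`, because in Python `\d` also matches non-ASCII digits.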
3889 2. **Item starting with indented code.** If a sequence of lines *Ls*
3890 constitute a sequence of blocks *Bs* starting with an indented code
3891 block and not separated from each other by more than one blank line,
3892 and *M* is a list marker of width *W* followed by
3893 one space, then the result of prepending *M* and the following
3894 space to the first line of *Ls*, and indenting subsequent lines of
3895 *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
3896 If a line is empty, then it need not be indented. The type of the
3897 list item (bullet or ordered) is determined by the type of its list
3898 marker. If the list item is ordered, then it is also assigned a
3899 start number, based on the ordered list marker.
3901 An indented code block will have to be indented four spaces beyond
3902 the edge of the region where text will be included in the list item.
3903 In the following case that is 6 spaces:
3905 ```````````````````````````````` example
3917 ````````````````````````````````
3920 And in this case it is 11 spaces:
3922 ```````````````````````````````` example
3934 ````````````````````````````````
3937 If the *first* block in the list item is an indented code block,
3938 then by rule #2, the contents must be indented *one* space after the
3941 ```````````````````````````````` example
3948 <pre><code>indented code
3951 <pre><code>more code
3953 ````````````````````````````````
3956 ```````````````````````````````` example
3965 <pre><code>indented code
3968 <pre><code>more code
3972 ````````````````````````````````
3975 Note that an additional space indent is interpreted as space
3976 inside the code block:
3978 ```````````````````````````````` example
3987 <pre><code> indented code
3990 <pre><code>more code
3994 ````````````````````````````````
3997 Note that rules #1 and #2 only apply to two cases: (a) cases
3998 in which the lines to be included in a list item begin with a
3999 [non-whitespace character], and (b) cases in which
4000 they begin with an indented code
4001 block. In a case like the following, where the first block begins with
4002 a three-space indent, the rules do not allow us to form a list item by
4003 indenting the whole thing and prepending a list marker:
4005 ```````````````````````````````` example
4012 ````````````````````````````````
4015 ```````````````````````````````` example
4024 ````````````````````````````````
4027 This is not a significant restriction, because when a block begins
4028 with 1-3 spaces indent, the indentation can always be removed without
4029 a change in interpretation, allowing rule #1 to be applied. So, in
4032 ```````````````````````````````` example
4043 ````````````````````````````````
4046 3. **Item starting with a blank line.** If a sequence of lines *Ls*
4047 starting with a single [blank line] constitutes a (possibly empty)
4048 sequence of blocks *Bs*, not separated from each other by more than
4049 one blank line, and *M* is a list marker of width *W*,
4050 then the result of prepending *M* to the first line of *Ls*, and
4051 indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
4052 item with *Bs* as its contents.
4053 If a line is empty, then it need not be indented. The type of the
4054 list item (bullet or ordered) is determined by the type of its list
4055 marker. If the list item is ordered, then it is also assigned a
4056 start number, based on the ordered list marker.
4058 Here are some list items that start with a blank line but are not empty:
4060 ```````````````````````````````` example
4081 ````````````````````````````````
4083 When the list item starts with a blank line, the number of spaces
4084 following the list marker doesn't change the required indentation:
4086 ```````````````````````````````` example
4093 ````````````````````````````````
4096 A list item can begin with at most one blank line.
4097 In the following example, `foo` is not part of the list
4100 ```````````````````````````````` example
4109 ````````````````````````````````
4112 Here is an empty bullet list item:
4114 ```````````````````````````````` example
4124 ````````````````````````````````
4127 It does not matter whether there are spaces following the [list marker]:
4129 ```````````````````````````````` example
4139 ````````````````````````````````
4142 Here is an empty ordered list item:
4144 ```````````````````````````````` example
4154 ````````````````````````````````
4157 A list may start or end with an empty list item:
4159 ```````````````````````````````` example
4165 ````````````````````````````````
4169 4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
4170 according to rule #1, #2, or #3, then the result of indenting each line
4171 of *Ls* by 1-3 spaces (the same for each line) also constitutes a
4172 list item with the same contents and attributes. If a line is
4173 empty, then it need not be indented.
4177 ```````````````````````````````` example
4189 <pre><code>indented code
4192 <p>A block quote.</p>
4196 ````````````````````````````````
4199 Indented two spaces:
4201 ```````````````````````````````` example
4213 <pre><code>indented code
4216 <p>A block quote.</p>
4220 ````````````````````````````````
4223 Indented three spaces:
4225 ```````````````````````````````` example
4237 <pre><code>indented code
4240 <p>A block quote.</p>
4244 ````````````````````````````````
4247 Four spaces indent gives a code block:
4249 ```````````````````````````````` example
4257 <pre><code>1. A paragraph
4264 ````````````````````````````````
4268 5. **Laziness.** If a string of lines *Ls* constitutes a [list
4269 item](#list-items) with contents *Bs*, then the result of deleting
4270 some or all of the indentation from one or more lines in which the
4271 next [non-whitespace character] after the indentation is
4272 [paragraph continuation text] is a
4273 list item with the same contents and attributes. The unindented
4275 [lazy continuation line](@)s.
4277 Here is an example with [lazy continuation lines]:
4279 ```````````````````````````````` example
4291 <pre><code>indented code
4294 <p>A block quote.</p>
4298 ````````````````````````````````
4301 Indentation can be partially deleted:
4303 ```````````````````````````````` example
4309 with two lines.</li>
4311 ````````````````````````````````
4314 These examples show how laziness can work in nested structures:
4316 ```````````````````````````````` example
4330 ````````````````````````````````
4333 ```````````````````````````````` example
4347 ````````````````````````````````
4351 6. **That's all.** Nothing that is not counted as a list item by rules
4352 #1--5 counts as a [list item](#list-items).
4354 The rules for sublists follow from the general rules above. A sublist
4355 must be indented the same number of spaces a paragraph would need to be
4356 in order to be included in the list item.
4358 So, in this case we need two spaces indent:
4360 ```````````````````````````````` example
4376 ````````````````````````````````
4381 ```````````````````````````````` example
4391 ````````````````````````````````
4394 Here we need four, because the list marker is wider:
4396 ```````````````````````````````` example
4407 ````````````````````````````````
4410 Three is not enough:
4412 ```````````````````````````````` example
4422 ````````````````````````````````
4425 A list may be the first block in a list item:
4427 ```````````````````````````````` example
4437 ````````````````````````````````
4440 ```````````````````````````````` example
4454 ````````````````````````````````
4457 A list item can contain a heading:
4459 ```````````````````````````````` example
4473 ````````````````````````````````
4478 John Gruber's Markdown spec says the following about list items:
4480 1. "List markers typically start at the left margin, but may be indented
4481 by up to three spaces. List markers must be followed by one or more
4484 2. "To make lists look nice, you can wrap items with hanging indents....
4485 But if you don't want to, you don't have to."
4487 3. "List items may consist of multiple paragraphs. Each subsequent
4488 paragraph in a list item must be indented by either 4 spaces or one
4491 4. "It looks nice if you indent every line of the subsequent paragraphs,
4492 but here again, Markdown will allow you to be lazy."
4494 5. "To put a blockquote within a list item, the blockquote's `>`
4495 delimiters need to be indented."
4497 6. "To put a code block within a list item, the code block needs to be
4498 indented twice — 8 spaces or two tabs."
4500 These rules specify that a paragraph under a list item must be indented
4501 four spaces (presumably, from the left margin, rather than the start of
4502 the list marker, but this is not said), and that code under a list item
4503 must be indented eight spaces instead of the usual four. They also say
4504 that a block quote must be indented, but not by how much; however, the
4505 example given has four spaces indentation. Although nothing is said
4506 about other kinds of block-level content, it is certainly reasonable to
4507 infer that *all* block elements under a list item, including other
4508 lists, must be indented four spaces. This principle has been called the
4511 The four-space rule is clear and principled, and if the reference
4512 implementation `Markdown.pl` had followed it, it probably would have
4513 become the standard. However, `Markdown.pl` allowed paragraphs and
4514 sublists to start with only two spaces indentation, at least on the
4515 outer level. Worse, its behavior was inconsistent: a sublist of an
4516 outer-level list needed two spaces indentation, but a sublist of this
4517 sublist needed three spaces. It is not surprising, then, that different
4518 implementations of Markdown have developed very different rules for
4519 determining what comes under a list item. (Pandoc and python-Markdown,
4520 for example, stuck with Gruber's syntax description and the four-space
4521 rule, while discount, redcarpet, marked, PHP Markdown, and others
4522 followed `Markdown.pl`'s behavior more closely.)
4524 Unfortunately, given the divergences between implementations, there
4525 is no way to give a spec for list items that will be guaranteed not
4526 to break any existing documents. However, the spec given here should
4527 correctly handle lists formatted with either the four-space rule or
4528 the more forgiving `Markdown.pl` behavior, provided they are laid out
4529 in a way that is natural for a human to read.
4531 The strategy here is to let the width and indentation of the list marker
4532 determine the indentation necessary for blocks to fall under the list
4533 item, rather than having a fixed and arbitrary number. The writer can
4534 think of the body of the list item as a unit which gets indented to the
4535 right enough to fit the list marker (and any indentation on the list
4536 marker). (The laziness rule, #5, then allows continuation lines to be
4537 unindented if needed.)
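As a rough illustration of this strategy, the sketch below computes how far subsequent lines must be indented to fall under a list item, given only the item's first line. It assumes that one to four spaces after the marker count toward the indentation (as in rule #1), and that an item starting with indented code or with a blank line counts exactly one space after the marker (rules #2 and #3). The name `content_indent` is hypothetical, and the sketch ignores thematic breaks, tabs, and container blocks.

```python
import re

# Optional indentation (0-3 spaces), a bullet or ordered list marker,
# the spaces after the marker, and the rest of the line.
_ITEM_START = re.compile(r"^( {0,3})([-+*]|[0-9]{1,9}[.)])( *)(.*)$")

def content_indent(first_line):
    """Return the column at which the item's content starts, or None
    if the line does not begin a list item (hypothetical helper)."""
    m = _ITEM_START.match(first_line)
    if not m:
        return None
    indent, marker, spaces, rest = m.groups()
    if rest == "":
        n = 1                 # rule #3: item starts with a blank line
    elif len(spaces) == 0:
        return None           # the marker must be followed by a space
    elif len(spaces) >= 5:
        n = 1                 # rule #2: item starts with indented code
    else:
        n = len(spaces)       # rule #1: one to four spaces
    return len(indent) + len(marker) + n
```

For `- foo` this gives 2, so an indented code block inside the item needs 2 + 4 = 6 spaces; for `  10.  foo` it gives 7, so the code block needs 11.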
4539 This rule is superior, we claim, to any rule requiring a fixed level of
4540 indentation from the margin. The four-space rule is clear but
4541 unnatural. It is quite unintuitive that
4551 should be parsed as two lists with an intervening paragraph,
4563 as the four-space rule demands, rather than a single list,
4577 The choice of four spaces is arbitrary. It can be learned, but it is
4578 not likely to be guessed, and it trips up beginners regularly.
4580 Would it help to adopt a two-space rule? The problem is that such
4581 a rule, together with the rule allowing 1--3 spaces indentation of the
4582 initial list marker, allows text that is indented *less than* the
4583 original list marker to be included in the list item. For example,
4584 `Markdown.pl` parses
4592 as a single list item, with `two` a continuation paragraph:
4624 This is extremely unintuitive.
4626 Rather than requiring a fixed indent from the margin, we could require
4627 a fixed indent (say, two spaces, or even one space) from the list marker (which
4628 may itself be indented). This proposal would remove the last anomaly
4629 discussed. Unlike the spec presented above, it would count the following
4630 as a list item with a subparagraph, even though the paragraph `bar`
4631 is not indented as far as the first paragraph `foo`:
4639 Arguably this text does read like a list item with `bar` as a subparagraph,
4640 which may count in favor of the proposal. However, under this proposal indented
4641 code would have to be indented six spaces after the list marker. And this
4642 would break a lot of existing Markdown, which has the pattern:
4650 where the code is indented eight spaces. The spec above, by contrast, will
4651 parse this text as expected, since the code block's indentation is measured
4652 from the beginning of `foo`.
4654 The one case that needs special treatment is a list item that *starts*
4655 with indented code. How much indentation is required in that case, since
4656 we don't have a "first paragraph" to measure from? Rule #2 simply stipulates
4657 that in such cases, we require one space indentation from the list marker
4658 (and then the normal four spaces for the indented code). This will match the
4659 four-space rule in cases where the list marker plus its initial indentation
4660 takes four spaces (a common case), but diverge in other cases.
4664 A [list](@) is a sequence of one or more
4665 list items [of the same type]. The list items
4666 may be separated by single [blank lines], but two
4667 blank lines end all containing lists.
4669 Two list items are [of the same type](@)
4670 if they begin with a [list marker] of the same type.
4671 Two list markers are of the
4672 same type if (a) they are bullet list markers using the same character
4673 (`-`, `+`, or `*`) or (b) they are ordered list numbers with the same
4674 delimiter (either `.` or `)`).
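The definition of "same type" can be written down directly. The sketch below is illustrative only and takes the list marker text by itself (for example `-` or `12.`); the names `marker_type` and `same_type` are hypothetical.

```python
def marker_type(marker):
    """Classify a list marker as ('bullet', character) or
    ('ordered', delimiter). Raises ValueError for anything else."""
    if marker in ("-", "+", "*"):
        return ("bullet", marker)
    digits, delimiter = marker[:-1], marker[-1:]
    if digits.isascii() and digits.isdigit() and delimiter in (".", ")"):
        return ("ordered", delimiter)
    raise ValueError("not a list marker: %r" % (marker,))

def same_type(a, b):
    """Two markers are of the same type if they are bullets using the
    same character, or ordered markers using the same delimiter."""
    return marker_type(a) == marker_type(b)
```

Note that the number in an ordered marker plays no role in the comparison: `1.` and `7.` are of the same type, while `1.` and `1)` are not.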
4676 A list is an [ordered list](@)
4677 if its constituent list items begin with
4678 [ordered list markers], and a
4679 [bullet list](@) if its constituent list
4680 items begin with [bullet list markers].
4682 The [start number](@)
4683 of an [ordered list] is determined by the list number of
4684 its initial list item. The numbers of subsequent list items are
4687 A list is [loose](@) if any of its constituent
4688 list items are separated by blank lines, or if any of its constituent
4689 list items directly contain two block-level elements with a blank line
4690 between them. Otherwise a list is [tight](@).
4691 (The difference in HTML output is that paragraphs in a loose list are
4692 wrapped in `<p>` tags, while paragraphs in a tight list are not.)
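The loose/tight distinction can be sketched over a simplified model of an already-parsed list. The model is an assumption made for illustration: each item is given as the ordered sequence of its direct children, with a `BLANK` entry wherever a blank line separates two of them, and `blank_between_items` records whether a blank line separates each pair of adjacent items. Blank lines nested deeper (inside a code block or a sublist) simply do not appear in this model.

```python
BLANK = object()  # stands for a blank line between two direct children

def is_loose(items, blank_between_items):
    """Return True if the list is loose under the definition above.

    items: one sequence per list item, holding its direct children
           in order, with BLANK entries for separating blank lines.
    blank_between_items: one boolean per gap between adjacent items.
    """
    if any(blank_between_items):
        return True
    for children in items:
        # Look for block, BLANK, block among the direct children.
        for prev, cur, nxt in zip(children, children[1:], children[2:]):
            if cur is BLANK and prev is not BLANK and nxt is not BLANK:
                return True
    return False
```

A renderer following the parenthetical remark above would wrap paragraphs in `<p>` tags only when `is_loose` returns true for the list that directly contains them.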
4694 Changing the bullet or ordered list delimiter starts a new list:
4696 ```````````````````````````````` example
4708 ````````````````````````````````
4711 ```````````````````````````````` example
4723 ````````````````````````````````
4726 In CommonMark, a list can interrupt a paragraph. That is,
4727 no blank line is needed to separate a paragraph from a following
4730 ```````````````````````````````` example
4740 ````````````````````````````````
4743 `Markdown.pl` does not allow this, for fear of triggering a list
4744 via a numeral in a hard-wrapped line:
4746 ```````````````````````````````` example
4747 The number of windows in my house is
4748 14. The number of doors is 6.
4750 <p>The number of windows in my house is</p>
4752 <li>The number of doors is 6.</li>
4754 ````````````````````````````````
4757 Oddly, `Markdown.pl` *does* allow a blockquote to interrupt a paragraph,
4758 even though the same considerations might apply. We think that the two
4759 cases should be treated the same. Here are two reasons for allowing
4760 lists to interrupt paragraphs:
4762 First, it is natural and not uncommon for people to start lists without
4770 Second, we are attracted to a
4772 > [principle of uniformity](@):
4773 > if a chunk of text has a certain
4774 > meaning, it will continue to have the same meaning when put into a
4775 > container block (such as a list item or blockquote).
4777 (Indeed, the spec for [list items] and [block quotes] presupposes
4778 this principle.) This principle implies that if
4785 is a list item containing a paragraph followed by a nested sublist,
4786 as all Markdown implementations agree it is (though the paragraph
4787 may be rendered without `<p>` tags, since the list is "tight"),
4795 by itself should be a paragraph followed by a nested sublist.
4797 Our adherence to the [principle of uniformity]
4798 thus inclines us to think that there are two coherent packages:
4800 1. Require blank lines before *all* lists and blockquotes,
4801 including lists that occur as sublists inside other list items.
4803 2. Require blank lines in none of these places.
4805 [reStructuredText](http://docutils.sourceforge.net/rst.html) takes
4806 the first approach, for which there is much to be said. But the second
4807 seems more consistent with established practice with Markdown.
4809 There can be blank lines between items, but two blank lines end
4812 ```````````````````````````````` example
4831 ````````````````````````````````
4834 As illustrated above in the section on [list items],
4835 two blank lines between blocks *within* a list item will also end a
4838 ```````````````````````````````` example
4852 ````````````````````````````````
4855 Indeed, two blank lines will end *all* containing lists:
4857 ```````````````````````````````` example
4878 ````````````````````````````````
4881 Thus, two blank lines can be used to separate consecutive lists of
4882 the same type, or to separate a list from an indented code block
4883 that would otherwise be parsed as a subparagraph of the final list
4886 ```````````````````````````````` example
4902 ````````````````````````````````
4905 ```````````````````````````````` example
4926 ````````````````````````````````
4929 List items need not be indented to the same level. The following
4930 list items will be treated as items at the same list level,
4931 since none is indented enough to belong to the previous list
4934 ```````````````````````````````` example
4956 ````````````````````````````````
4959 ```````````````````````````````` example
4977 ````````````````````````````````
4980 This is a loose list, because there is a blank line between
4981 two of the list items:
4983 ```````````````````````````````` example
5000 ````````````````````````````````
5003 So is this, with an empty second item:
5005 ```````````````````````````````` example
5020 ````````````````````````````````
5023 These are loose lists, even though there is no space between the items,
5024 because one of the items directly contains two block-level elements
5025 with a blank line between them:
5027 ```````````````````````````````` example
5046 ````````````````````````````````
5049 ```````````````````````````````` example
5067 ````````````````````````````````
5070 This is a tight list, because the blank lines are in a code block:
5072 ```````````````````````````````` example
5091 ````````````````````````````````
5094 This is a tight list, because the blank line is between two
5095 paragraphs of a sublist. So the sublist is loose while
5096 the outer list is tight:
5098 ```````````````````````````````` example
5116 ````````````````````````````````
5119 This is a tight list, because the blank line is inside the
5122 ```````````````````````````````` example
5136 ````````````````````````````````
5139 This list is tight, because the consecutive block elements
5140 are not separated by blank lines:
5142 ```````````````````````````````` example
5160 ````````````````````````````````
5163 A single-paragraph list is tight:
5165 ```````````````````````````````` example
5171 ````````````````````````````````
5174 ```````````````````````````````` example
5185 ````````````````````````````````
5188 This list is loose, because of the blank line between the
5189 two block elements in the list item:
5191 ```````````````````````````````` example
5205 ````````````````````````````````
5208 Here the outer list is loose, the inner list tight:
5210 ```````````````````````````````` example
5225 ````````````````````````````````
5228 ```````````````````````````````` example
5253 ````````````````````````````````
5258 Inlines are parsed sequentially from the beginning of the character
5259 stream to the end (left to right, in left-to-right languages).
5260 Thus, for example, in
5262 ```````````````````````````````` example
5265 <p><code>hi</code>lo`</p>
5266 ````````````````````````````````
5269 `hi` is parsed as code, leaving the backtick at the end as a literal
5272 ## Backslash escapes
5274 Any ASCII punctuation character may be backslash-escaped:
5276 ```````````````````````````````` example
5277 \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~
5279 <p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~</p>
5280 ````````````````````````````````
5283 Backslashes before other characters are treated as literal
5286 ```````````````````````````````` example
5289 <p>\→\A\a\ \3\φ\«</p>
5290 ````````````````````````````````
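The two rules so far (a backslash before ASCII punctuation escapes it; a backslash before anything else is literal) can be sketched as a single substitution. This is a minimal illustration with a hypothetical function name. It applies the rule to a whole string and does not model the contexts where escapes are inactive, nor the treatment of a backslash at the end of a line.

```python
import re
import string

# string.punctuation holds the 32 ASCII punctuation characters.
_ESCAPE = re.compile("\\\\([" + re.escape(string.punctuation) + "])")

def unescape(text):
    """Drop the backslash from every backslash-escaped ASCII
    punctuation character; leave other backslashes untouched."""
    return _ESCAPE.sub(r"\1", text)
```

Because the substitution scans left to right and consumes both characters of each escape, an escaped backslash protects nothing after it: a doubled backslash followed by `*` becomes a single literal backslash followed by an ordinary `*`.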
5293 Escaped characters are treated as regular characters and do
5294 not have their usual Markdown meanings:
5296 ```````````````````````````````` example
5304 \[foo]: /url "not a reference"
5307 &lt;br/&gt; not a tag
5313 [foo]: /url &quot;not a reference&quot;</p>
5314 ````````````````````````````````
5317 If a backslash is itself escaped, the following character is not:
5319 ```````````````````````````````` example
5322 <p>\<em>emphasis</em></p>
5323 ````````````````````````````````
5326 A backslash at the end of the line is a [hard line break]:
5328 ```````````````````````````````` example
5334 ````````````````````````````````
5337 Backslash escapes do not work in code blocks, code spans, autolinks, or
5340 ```````````````````````````````` example
5343 <p><code>\[\`</code></p>
5344 ````````````````````````````````
5347 ```````````````````````````````` example
5352 ````````````````````````````````
5355 ```````````````````````````````` example
5362 ````````````````````````````````
5365 ```````````````````````````````` example
5366 <http://example.com?find=\*>
5368 <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
5369 ````````````````````````````````
5372 ```````````````````````````````` example
5376 ````````````````````````````````
5379 But they work in all other contexts, including URLs and link titles,
5380 link references, and [info strings] in [fenced code blocks]:
5382 ```````````````````````````````` example
5383 [foo](/bar\* "ti\*tle")
5385 <p><a href="/bar*" title="ti*tle">foo</a></p>
5386 ````````````````````````````````
5389 ```````````````````````````````` example
5392 [foo]: /bar\* "ti\*tle"
5394 <p><a href="/bar*" title="ti*tle">foo</a></p>
5395 ````````````````````````````````
5398 ```````````````````````````````` example
5403 <pre><code class="language-foo+bar">foo
5405 ````````````````````````````````
5409 ## Entity and numeric character references
5411 All valid HTML entity references and numeric character
5412 references, except those occurring in code blocks and code spans,
5413 are recognized as such and treated as equivalent to the
5414 corresponding Unicode characters. Conforming CommonMark parsers
5415 need not store information about whether a particular character
5416 was represented in the source using a Unicode character or
5417 an entity reference.
5419 [Entity references](@) consist of `&` + any of the valid
5420 HTML5 entity names + `;`. The
5421 document <https://html.spec.whatwg.org/multipage/entities.json>
5422 is used as an authoritative source for the valid entity
5423 references and their corresponding code points.
5425 ```````````````````````````````` example
5426 &nbsp; &amp; &copy; &AElig; &Dcaron;
5427 &frac34; &HilbertSpace; &DifferentialD;
5428 &ClockwiseContourIntegral; &ngE;
5433 ````````````````````````````````
5436 [Decimal numeric character
5438 consist of `&#` + a string of 1--8 arabic digits + `;`. A
5439 numeric character reference is parsed as the corresponding
5440 Unicode character. Invalid Unicode code points will be replaced by
5441 the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons,
5442 the code point `U+0000` will also be replaced by `U+FFFD`.
5444 ```````````````````````````````` example
5445 &#35; &#1234; &#992; &#98765432; &#0;
5448 ````````````````````````````````
5451 [Hexadecimal numeric character
5452 references](@) consist of `&#` +
5453 either `X` or `x` + a string of 1--8 hexadecimal digits + `;`.
5454 They too are parsed as the corresponding Unicode character (this
5455 time specified with a hexadecimal numeral instead of decimal).
5457 ```````````````````````````````` example
5458 &#X22; &#XD06; &#xcab;
5461 ````````````````````````````````
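The three kinds of reference described above can be sketched with Python's standard `html.entities.html5` table, which is assumed here to mirror the HTML5 entity list the spec treats as authoritative. The function names are hypothetical, and treating surrogates and values above `U+10FFFF` as the "invalid" code points is an assumption of this sketch.

```python
import re
from html.entities import html5  # keys look like "amp;" and "copy;"

_REFERENCE = re.compile(
    r"&(?:#([0-9]{1,8})"            # decimal: 1-8 digits
    r"|#[Xx]([0-9A-Fa-f]{1,8})"     # hexadecimal: 1-8 hex digits
    r"|([A-Za-z][A-Za-z0-9]*));"    # named entity
)

def _char_for(code_point):
    # U+0000 and invalid code points become U+FFFD.
    if code_point == 0 or code_point > 0x10FFFF \
            or 0xD800 <= code_point <= 0xDFFF:
        return "\uFFFD"
    return chr(code_point)

def decode_references(text):
    """Replace entity and numeric character references with the
    characters they denote; leave anything else unchanged."""
    def replace(m):
        decimal, hexadecimal, name = m.groups()
        if decimal is not None:
            return _char_for(int(decimal))
        if hexadecimal is not None:
            return _char_for(int(hexadecimal, 16))
        # Unknown names are left as literal text.
        return html5.get(name + ";", m.group(0))
    return _REFERENCE.sub(replace, text)
```

The trailing `;` is required by the pattern, so a name without a semicolon is never treated as a reference in this sketch.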
5464 Here are some nonentities:
5466 ```````````````````````````````` example
5468 &ThisIsNotDefined; &hi?;
5470 <p>&amp;nbsp &amp;x; &amp;#; &amp;#x;
5471 &amp;ThisIsNotDefined; &amp;hi?;</p>
5472 ````````````````````````````````
5475 Although HTML5 does accept some entity references
5476 without a trailing semicolon (such as `&copy`), these are not
5477 recognized here, because it makes the grammar too ambiguous:
5479 ```````````````````````````````` example
5483 ````````````````````````````````
5486 Strings that are not on the list of HTML5 named entities are not
5487 recognized as entity references either:
5489 ```````````````````````````````` example
5492 <p>&amp;MadeUpEntity;</p>
5493 ````````````````````````````````
5496 Entity and numeric character references are recognized in any
5497 context besides code spans or code blocks, including
5498 URLs, [link titles], and [fenced code block][] [info strings]:
5500 ```````````````````````````````` example
5501 <a href="&ouml;&ouml;.html">
5503 <a href="&ouml;&ouml;.html">
5504 ````````````````````````````````
5507 ```````````````````````````````` example
5508 [foo](/f&ouml;&ouml; "f&ouml;&ouml;")
5510 <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
5511 ````````````````````````````````
5514 ```````````````````````````````` example
5517 [foo]: /f&ouml;&ouml; "f&ouml;&ouml;"
5519 <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
5520 ````````````````````````````````
5523 ```````````````````````````````` example
5528 <pre><code class="language-föö">foo
5530 ````````````````````````````````
5533 Entity and numeric character references are treated as literal
5534 text in code spans and code blocks:
5536 ```````````````````````````````` example
5539 <p><code>f&amp;ouml;&amp;ouml;</code></p>
5540 ````````````````````````````````
5543 ```````````````````````````````` example
5546 <pre><code>f&amp;ouml;f&amp;ouml;
5548 ````````````````````````````````
5553 A [backtick string](@)
5554 is a string of one or more backtick characters (`` ` ``) that is neither
5555 preceded nor followed by a backtick.
5557 A [code span](@) begins with a backtick string and ends with
5558 a backtick string of equal length. The contents of the code span are
5559 the characters between the two backtick strings, with leading and
5560 trailing spaces and [line endings] removed, and
5561 [whitespace] collapsed to single spaces.
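The normalization of a code span's contents can be sketched as two steps, strip then collapse. The sketch below takes the raw text between the two backtick strings; the function name is hypothetical, and the whitespace set (space, tab, newline, line tabulation, form feed, carriage return) is assumed to be what [whitespace] refers to.

```python
import re

# Runs of whitespace characters inside the code span.
_WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")

def code_span_content(raw):
    """Normalize the text found between the delimiting backtick strings."""
    # Step 1: remove leading and trailing spaces and line endings.
    stripped = raw.strip(" \r\n")
    # Step 2: collapse each remaining run of whitespace to one space.
    return _WHITESPACE_RUN.sub(" ", stripped)
```

Doing this in the parser rather than leaving it to the browser keeps the result the same for non-HTML output formats.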
5563 This is a simple code span:
5565 ```````````````````````````````` example
5568 <p><code>foo</code></p>
5569 ````````````````````````````````
5572 Here two backticks are used, because the code contains a backtick.
5573 This example also illustrates stripping of leading and trailing spaces:
5575 ```````````````````````````````` example
5578 <p><code>foo ` bar</code></p>
5579 ````````````````````````````````
5582 This example shows the motivation for stripping leading and trailing
5585 ```````````````````````````````` example
5588 <p><code>``</code></p>
5589 ````````````````````````````````
5592 [Line endings] are treated like spaces:
5594 ```````````````````````````````` example
5599 <p><code>foo</code></p>
5600 ````````````````````````````````
5603 Interior spaces and [line endings] are collapsed into
5604 single spaces, just as they would be by a browser:
5606 ```````````````````````````````` example
5610 <p><code>foo bar baz</code></p>
5611 ````````````````````````````````
5614 Q: Why not just leave the spaces, since browsers will collapse them
5615 anyway? A: Because we might be targeting a non-HTML format, and we
5616 shouldn't rely on HTML-specific rendering assumptions.
5618 (Existing implementations differ in their treatment of internal
5619 spaces and [line endings]. Some, including `Markdown.pl` and
5620 `showdown`, convert an internal [line ending] into a
5621 `<br />` tag. But this makes things difficult for those who like to
5622 hard-wrap their paragraphs, since a line break in the midst of a code
5623 span will cause an unintended line break in the output. Others just
5624 leave internal spaces as they are, which is fine if only HTML is being
5627 ```````````````````````````````` example
5630 <p><code>foo `` bar</code></p>
5631 ````````````````````````````````
5634 Note that backslash escapes do not work in code spans. All backslashes
5635 are treated literally:
5637 ```````````````````````````````` example
5640 <p><code>foo\</code>bar`</p>
5641 ````````````````````````````````
5644 Backslash escapes are never needed, because one can always choose a
5645 string of *n* backtick characters as delimiters, where the code does
5646 not contain any strings of exactly *n* backtick characters.
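A writer (or a tool that generates Markdown) can apply this observation mechanically. The sketch below picks the smallest *n* for which the code contains no run of exactly *n* backticks, and pads with a space when the code itself begins or ends with a backtick so that it does not merge with the delimiter. The function name is hypothetical, and the sketch assumes non-empty code whose own leading and trailing spaces do not need to be preserved.

```python
import re

def wrap_code_span(code):
    """Wrap `code` in backtick delimiters that cannot be confused
    with any backtick run inside it."""
    run_lengths = {len(run) for run in re.findall(r"`+", code)}
    n = 1
    while n in run_lengths:
        n += 1
    fence = "`" * n
    # The padding spaces are stripped again when the span is parsed.
    pad = " " if code.startswith("`") or code.endswith("`") else ""
    return fence + pad + code + pad + fence
```

Since the code contains only finitely many runs, the loop always terminates, which is why backslash escapes are never needed.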
5648 Code span backticks have higher precedence than any other inline
5649 constructs except HTML tags and autolinks. Thus, for example, this is
5650 not parsed as emphasized text, since the second `*` is part of a code
5653 ```````````````````````````````` example
5656 <p>*foo<code>*</code></p>
5657 ````````````````````````````````
5660 And this is not parsed as a link:
5662 ```````````````````````````````` example
5663 [not a `link](/foo`)
5665 <p>[not a <code>link](/foo</code>)</p>
5666 ````````````````````````````````
5669 Code spans, HTML tags, and autolinks have the same precedence.
5672 ```````````````````````````````` example
5675 <p><code>&lt;a href=&quot;</code>&quot;&gt;`</p>
5676 ````````````````````````````````
5679 But this is an HTML tag:
5681 ```````````````````````````````` example
5684 <p><a href="`">`</p>
5685 ````````````````````````````````
5690 ```````````````````````````````` example
5691 `<http://foo.bar.`baz>`
5693 <p><code>&lt;http://foo.bar.</code>baz&gt;`</p>
5694 ````````````````````````````````
5697 But this is an autolink:
5699 ```````````````````````````````` example
5700 <http://foo.bar.`baz>`
5702 <p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p>
5703 ````````````````````````````````
5706 When a backtick string is not closed by a matching backtick string,
5707 we just have literal backticks:
5709 ```````````````````````````````` example
5713 ````````````````````````````````
5716 ```````````````````````````````` example
5720 ````````````````````````````````
5723 ## Emphasis and strong emphasis
5725 John Gruber's original [Markdown syntax
5726 description](http://daringfireball.net/projects/markdown/syntax#em) says:
5728 > Markdown treats asterisks (`*`) and underscores (`_`) as indicators of
5729 > emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML
5730 > `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>`
5733 This is enough for most users, but these rules leave much undecided,
5734 especially when it comes to nested emphasis. The original
5735 `Markdown.pl` test suite makes it clear that triple `***` and
5736 `___` delimiters can be used for strong emphasis, and most
5737 implementations have also allowed the following patterns:
5741 ***strong** in emph*
5742 ***emph* in strong**
5743 **in strong *emph***
5744 *in emph **strong***
5747 The following patterns are less widely supported, but the intent
5748 is clear and they are useful (especially in contexts like bibliography
5752 *emph *with emph* in it*
5753 **strong **with strong** in it**
5756 Many implementations have also restricted intraword emphasis to
5757 the `*` forms, to avoid unwanted emphasis in words containing
5758 internal underscores. (It is best practice to put these in code
5759 spans, but users often do not.)
5762 internal emphasis: foo*bar*baz
5763 no emphasis: foo_bar_baz
5766 The rules given below capture all of these patterns, while allowing
5767 for efficient parsing strategies that do not backtrack.
5769 First, some definitions. A [delimiter run](@) is either
5770 a sequence of one or more `*` characters that is not preceded or
5771 followed by a `*` character, or a sequence of one or more `_`
5772 characters that is not preceded or followed by a `_` character.
5774 A [left-flanking delimiter run](@) is
5775 a [delimiter run] that is (a) not followed by [Unicode whitespace],
5776 and (b) either not followed by a [punctuation character], or
5777 preceded by [Unicode whitespace] or a [punctuation character].
5778 For purposes of this definition, the beginning and the end of
5779 the line count as Unicode whitespace.
5781 A [right-flanking delimiter run](@) is
5782 a [delimiter run] that is (a) not preceded by [Unicode whitespace],
5783 and (b) either not preceded by a [punctuation character], or
5784 followed by [Unicode whitespace] or a [punctuation character].
5785 For purposes of this definition, the beginning and the end of
5786 the line count as Unicode whitespace.
5788 Here are some examples of delimiter runs.
5790 - left-flanking but not right-flanking:
5799 - right-flanking but not left-flanking:
5808 - Both left and right-flanking:
5815 - Neither left nor right-flanking:
5822 (The idea of distinguishing left-flanking and right-flanking
5823 delimiter runs based on the character before and the character
5824 after comes from Roopesh Chander's
5825 [vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags).
5826 vfmd uses the terminology "emphasis indicator string" instead of "delimiter
5827 run," and its rules for distinguishing left- and right-flanking runs
5828 are a bit more complex than the ones given here.)
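
The definitions of delimiter runs and of left- and right-flanking runs can be sketched in code. The following Python is only an illustration, not a reference implementation: the names `RUN`, `is_whitespace`, `is_punctuation`, `flanking`, and `delimiter_runs` are made up for this sketch, the punctuation test is an approximation (ASCII punctuation plus the Unicode `P*` categories), and backslash escapes, code spans, and other inline constructs are ignored.

```python
import re
import string
import unicodedata

# Maximal runs of `*` or of `_`. Greedy matching means a run is never
# preceded or followed by the same character.
RUN = re.compile(r"\*+|_+")

def is_whitespace(ch):
    # "" stands for the beginning or end of the line, which counts as whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def is_punctuation(ch):
    # ASCII punctuation, or any Unicode "P*" general category (an approximation).
    return ch != "" and (ch in string.punctuation
                         or unicodedata.category(ch).startswith("P"))

def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = (not is_whitespace(after)) and (
        not is_punctuation(after)
        or is_whitespace(before) or is_punctuation(before))
    right = (not is_whitespace(before)) and (
        not is_punctuation(before)
        or is_whitespace(after) or is_punctuation(after))
    return left, right

def delimiter_runs(line):
    """List (run, left_flanking, right_flanking) for each run in one line."""
    return [(m.group(), *flanking(line, m.start(), m.end()))
            for m in RUN.finditer(line)]
```

For instance, under these assumptions `delimiter_runs("a * b")` reports a single run that is neither left- nor right-flanking, because it has whitespace on both sides.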
5830 The following rules define emphasis and strong emphasis:
5832 1. A single `*` character [can open emphasis](@)
5833 iff (if and only if) it is part of a [left-flanking delimiter run].
5835 2. A single `_` character [can open emphasis] iff
5836 it is part of a [left-flanking delimiter run]
5837 and either (a) not part of a [right-flanking delimiter run]
5838 or (b) part of a [right-flanking delimiter run]
5839 preceded by punctuation.
5841 3. A single `*` character [can close emphasis](@)
5842 iff it is part of a [right-flanking delimiter run].
5844 4. A single `_` character [can close emphasis] iff
5845 it is part of a [right-flanking delimiter run]
5846 and either (a) not part of a [left-flanking delimiter run]
5847 or (b) part of a [left-flanking delimiter run]
5848 followed by punctuation.
5850 5. A double `**` [can open strong emphasis](@)
5851 iff it is part of a [left-flanking delimiter run].
5853 6. A double `__` [can open strong emphasis] iff
5854 it is part of a [left-flanking delimiter run]
5855 and either (a) not part of a [right-flanking delimiter run]
5856 or (b) part of a [right-flanking delimiter run]
5857 preceded by punctuation.
5859 7. A double `**` [can close strong emphasis](@)
5860 iff it is part of a [right-flanking delimiter run].
5862 8. A double `__` [can close strong emphasis] iff
5863 it is part of a [right-flanking delimiter run]
5864 and either (a) not part of a [left-flanking delimiter run]
5865 or (b) part of a [left-flanking delimiter run]
5866 followed by punctuation.
5868 9. Emphasis begins with a delimiter that [can open emphasis] and ends
5869 with a delimiter that [can close emphasis], and that uses the same
5870 character (`_` or `*`) as the opening delimiter. There must
5871 be a nonempty sequence of inlines between the open delimiter
5872 and the closing delimiter; these form the contents of the emphasis
5875 10. Strong emphasis begins with a delimiter that
5876 [can open strong emphasis] and ends with a delimiter that
5877 [can close strong emphasis], and that uses the same character
5878 (`_` or `*`) as the opening delimiter.
5879 There must be a nonempty sequence of inlines between the open
5880 delimiter and the closing delimiter; these form the contents of
5881 the strong emphasis inline.
5883 11. A literal `*` character cannot occur at the beginning or end of
5884 `*`-delimited emphasis or `**`-delimited strong emphasis, unless it
5885 is backslash-escaped.
5887 12. A literal `_` character cannot occur at the beginning or end of
5888 `_`-delimited emphasis or `__`-delimited strong emphasis, unless it
5889 is backslash-escaped.
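
Rules 1--8 reduce to two small predicates once the flanking status of a delimiter run is known. The sketch below is illustrative only: the function names and the boolean parameters (`left`, `right`, `preceded_by_punct`, `followed_by_punct`) are assumptions of this example, and the caller is expected to compute them from the definitions above. The same predicates serve for single and double delimiters, since rules 5--8 repeat the conditions of rules 1--4.

```python
def can_open(char, left, right, preceded_by_punct):
    """Rules 1, 2, 5, 6: may a delimiter of this run open (strong) emphasis?"""
    if char == "*":
        return left
    # `_` must be left-flanking and either not right-flanking,
    # or right-flanking but preceded by punctuation.
    return left and (not right or preceded_by_punct)

def can_close(char, left, right, followed_by_punct):
    """Rules 3, 4, 7, 8: may a delimiter of this run close (strong) emphasis?"""
    if char == "*":
        return right
    # `_` must be right-flanking and either not left-flanking,
    # or left-flanking but followed by punctuation.
    return right and (not left or followed_by_punct)
```

The asymmetry between `*` and `_` is what forbids intraword emphasis with `_`: a run inside a word is both left- and right-flanking, so `_` can neither open nor close there, while `*` can do both.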
5891 Where rules 1--12 above are compatible with multiple parsings,
5892 the following principles resolve ambiguity:
5894 13. The number of nestings should be minimized. Thus, for example,
5895 an interpretation `<strong>...</strong>` is always preferred to
5896 `<em><em>...</em></em>`.
5898 14. An interpretation `<strong><em>...</em></strong>` is always
5899 preferred to `<em><strong>...</strong></em>`.
5901 15. When two potential emphasis or strong emphasis spans overlap,
5902 so that the second begins before the first ends and ends after
5903 the first ends, the first takes precedence. Thus, for example,
5904 `*foo _bar* baz_` is parsed as `<em>foo _bar</em> baz_` rather
5905 than `*foo <em>bar* baz</em>`. For the same reason,
5906 `**foo*bar**` is parsed as `<em><em>foo</em>bar</em>*`
5907 rather than `<strong>foo*bar</strong>`.
5909 16. When there are two potential emphasis or strong emphasis spans
5910 with the same closing delimiter, the shorter one (the one that
5911 opens later) takes precedence. Thus, for example,
5912 `**foo **bar baz**` is parsed as `**foo <strong>bar baz</strong>`
5913 rather than `<strong>foo **bar baz</strong>`.
5915 17. Inline code spans, links, images, and HTML tags group more tightly
5916 than emphasis. So, when there is a choice between an interpretation
5917 that contains one of these elements and one that does not, the
5918 former always wins. Thus, for example, `*[foo*](bar)` is
5919 parsed as `*<a href="bar">foo*</a>` rather than as
5920 `<em>[foo</em>](bar)`.
5922 These rules can be illustrated through a series of examples.
5926 ```````````````````````````````` example
5929 <p><em>foo bar</em></p>
5930 ````````````````````````````````
5933 This is not emphasis, because the opening `*` is followed by
5934 whitespace, and hence not part of a [left-flanking delimiter run]:
5936 ```````````````````````````````` example
5940 ````````````````````````````````
5943 This is not emphasis, because the opening `*` is preceded
5944 by an alphanumeric and followed by punctuation, and hence
5945 not part of a [left-flanking delimiter run]:
5947 ```````````````````````````````` example
5950 <p>a*"foo"*</p>
5951 ````````````````````````````````
5954 Unicode nonbreaking spaces count as whitespace, too:
5956 ```````````````````````````````` example
5960 ````````````````````````````````
5963 Intraword emphasis with `*` is permitted:
5965 ```````````````````````````````` example
5968 <p>foo<em>bar</em></p>
5969 ````````````````````````````````
5972 ```````````````````````````````` example
5975 <p>5<em>6</em>78</p>
5976 ````````````````````````````````
5981 ```````````````````````````````` example
5984 <p><em>foo bar</em></p>
5985 ````````````````````````````````
5988 This is not emphasis, because the opening `_` is followed by whitespace:
5991 ```````````````````````````````` example
5995 ````````````````````````````````
5998 This is not emphasis, because the opening `_` is preceded
5999 by an alphanumeric and followed by punctuation:
6001 ```````````````````````````````` example
6004 <p>a_"foo"_</p>
6005 ````````````````````````````````
6008 Emphasis with `_` is not allowed inside words:
6010 ```````````````````````````````` example
6014 ````````````````````````````````
6017 ```````````````````````````````` example
6021 ````````````````````````````````
6024 ```````````````````````````````` example
6025 пристаням_стремятся_
6027 <p>пристаням_стремятся_</p>
6028 ````````````````````````````````
6031 Here `_` does not generate emphasis, because the first delimiter run
6032 is right-flanking and the second left-flanking:
6034 ```````````````````````````````` example
6037 <p>aa_"bb"_cc</p>
6038 ````````````````````````````````
6041 This is emphasis, even though the opening delimiter is
6042 both left- and right-flanking, because it is preceded by punctuation:
6045 ```````````````````````````````` example
6048 <p>foo-<em>(bar)</em></p>
6049 ````````````````````````````````
6054 This is not emphasis, because the closing delimiter does
6055 not match the opening delimiter:
6057 ```````````````````````````````` example
6061 ````````````````````````````````
6064 This is not emphasis, because the closing `*` is preceded by whitespace:
6067 ```````````````````````````````` example
6071 ````````````````````````````````
6074 A newline also counts as whitespace:
6076 ```````````````````````````````` example
6084 ````````````````````````````````
6087 This is not emphasis, because the second `*` is
6088 preceded by punctuation and followed by an alphanumeric
6089 (hence it is not part of a [right-flanking delimiter run]):
6091 ```````````````````````````````` example
6095 ````````````````````````````````
6098 The point of this restriction is more easily appreciated with this example:
6101 ```````````````````````````````` example
6104 <p><em>(<em>foo</em>)</em></p>
6105 ````````````````````````````````
6108 Intraword emphasis with `*` is allowed:
6110 ```````````````````````````````` example
6113 <p><em>foo</em>bar</p>
6114 ````````````````````````````````
6120 This is not emphasis, because the closing `_` is preceded by whitespace:
6123 ```````````````````````````````` example
6127 ````````````````````````````````
6130 This is not emphasis, because the second `_` is
6131 preceded by punctuation and followed by an alphanumeric:
6133 ```````````````````````````````` example
6137 ````````````````````````````````
6140 This is emphasis within emphasis:
6142 ```````````````````````````````` example
6145 <p><em>(<em>foo</em>)</em></p>
6146 ````````````````````````````````
6149 Intraword emphasis is disallowed for `_`:
6151 ```````````````````````````````` example
6155 ````````````````````````````````
6158 ```````````````````````````````` example
6159 _пристаням_стремятся
6161 <p>_пристаням_стремятся</p>
6162 ````````````````````````````````
6165 ```````````````````````````````` example
6168 <p><em>foo_bar_baz</em></p>
6169 ````````````````````````````````
6172 This is emphasis, even though the closing delimiter is
6173 both left- and right-flanking, because it is followed by punctuation:
6176 ```````````````````````````````` example
6179 <p><em>(bar)</em>.</p>
6180 ````````````````````````````````
6185 ```````````````````````````````` example
6188 <p><strong>foo bar</strong></p>
6189 ````````````````````````````````
6192 This is not strong emphasis, because the opening delimiter is
6193 followed by whitespace:
6195 ```````````````````````````````` example
6199 ````````````````````````````````
6202 This is not strong emphasis, because the opening `**` is preceded
6203 by an alphanumeric and followed by punctuation, and hence
6204 not part of a [left-flanking delimiter run]:
6206 ```````````````````````````````` example
6209 <p>a**"foo"**</p>
6210 ````````````````````````````````
6213 Intraword strong emphasis with `**` is permitted:
6215 ```````````````````````````````` example
6218 <p>foo<strong>bar</strong></p>
6219 ````````````````````````````````
6224 ```````````````````````````````` example
6227 <p><strong>foo bar</strong></p>
6228 ````````````````````````````````
6231 This is not strong emphasis, because the opening delimiter is
6232 followed by whitespace:
6234 ```````````````````````````````` example
6238 ````````````````````````````````
6241 A newline counts as whitespace:
6242 ```````````````````````````````` example
6248 ````````````````````````````````
6251 This is not strong emphasis, because the opening `__` is preceded
6252 by an alphanumeric and followed by punctuation:
6254 ```````````````````````````````` example
6257 <p>a__"foo"__</p>
6258 ````````````````````````````````
6261 Intraword strong emphasis is forbidden with `__`:
6263 ```````````````````````````````` example
6267 ````````````````````````````````
6270 ```````````````````````````````` example
6274 ````````````````````````````````
6277 ```````````````````````````````` example
6278 пристаням__стремятся__
6280 <p>пристаням__стремятся__</p>
6281 ````````````````````````````````
6284 ```````````````````````````````` example
6285 __foo, __bar__, baz__
6287 <p><strong>foo, <strong>bar</strong>, baz</strong></p>
6288 ````````````````````````````````
6291 This is strong emphasis, even though the opening delimiter is
6292 both left- and right-flanking, because it is preceded by punctuation:
6295 ```````````````````````````````` example
6298 <p>foo-<strong>(bar)</strong></p>
6299 ````````````````````````````````
6305 This is not strong emphasis, because the closing delimiter is preceded by whitespace:
6308 ```````````````````````````````` example
6312 ````````````````````````````````
6315 (Nor can it be interpreted as an emphasized `*foo bar *`, because of Rule 11.)
6318 This is not strong emphasis, because the second `**` is
6319 preceded by punctuation and followed by an alphanumeric:
6321 ```````````````````````````````` example
6325 ````````````````````````````````
6328 The point of this restriction is more easily appreciated
6329 with these examples:
6331 ```````````````````````````````` example
6334 <p><em>(<strong>foo</strong>)</em></p>
6335 ````````````````````````````````
6338 ```````````````````````````````` example
6339 **Gomphocarpus (*Gomphocarpus physocarpus*, syn.
6340 *Asclepias physocarpa*)**
6342 <p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn.
6343 <em>Asclepias physocarpa</em>)</strong></p>
6344 ````````````````````````````````
6347 ```````````````````````````````` example
6350 <p><strong>foo "<em>bar</em>" foo</strong></p>
6351 ````````````````````````````````
6356 ```````````````````````````````` example
6359 <p><strong>foo</strong>bar</p>
6360 ````````````````````````````````
6365 This is not strong emphasis, because the closing delimiter is
6366 preceded by whitespace:
6368 ```````````````````````````````` example
6372 ````````````````````````````````
6375 This is not strong emphasis, because the second `__` is
6376 preceded by punctuation and followed by an alphanumeric:
6378 ```````````````````````````````` example
6382 ````````````````````````````````
6385 The point of this restriction is more easily appreciated with this example:
6388 ```````````````````````````````` example
6391 <p><em>(<strong>foo</strong>)</em></p>
6392 ````````````````````````````````
6395 Intraword strong emphasis is forbidden with `__`:
6397 ```````````````````````````````` example
6401 ````````````````````````````````
6404 ```````````````````````````````` example
6405 __пристаням__стремятся
6407 <p>__пристаням__стремятся</p>
6408 ````````````````````````````````
6411 ```````````````````````````````` example
6414 <p><strong>foo__bar__baz</strong></p>
6415 ````````````````````````````````
6418 This is strong emphasis, even though the closing delimiter is
6419 both left- and right-flanking, because it is followed by punctuation:
6422 ```````````````````````````````` example
6425 <p><strong>(bar)</strong>.</p>
6426 ````````````````````````````````
6431 Any nonempty sequence of inline elements can be the contents of an emphasized span.
6434 ```````````````````````````````` example
6437 <p><em>foo <a href="/url">bar</a></em></p>
6438 ````````````````````````````````
6441 ```````````````````````````````` example
6447 ````````````````````````````````
6450 In particular, emphasis and strong emphasis can be nested inside emphasis:
6453 ```````````````````````````````` example
6456 <p><em>foo <strong>bar</strong> baz</em></p>
6457 ````````````````````````````````
6460 ```````````````````````````````` example
6463 <p><em>foo <em>bar</em> baz</em></p>
6464 ````````````````````````````````
6467 ```````````````````````````````` example
6470 <p><em><em>foo</em> bar</em></p>
6471 ````````````````````````````````
6474 ```````````````````````````````` example
6477 <p><em>foo <em>bar</em></em></p>
6478 ````````````````````````````````
6481 ```````````````````````````````` example
6484 <p><em>foo <strong>bar</strong> baz</em></p>
6485 ````````````````````````````````
6490 ```````````````````````````````` example
6493 <p><em>foo</em><em>bar</em><em>baz</em></p>
6494 ````````````````````````````````
6497 The difference is that in the preceding case, the internal delimiters
6498 [can close emphasis], while in the cases with spaces, they cannot.
6500 ```````````````````````````````` example
6503 <p><em><strong>foo</strong> bar</em></p>
6504 ````````````````````````````````
6507 ```````````````````````````````` example
6510 <p><em>foo <strong>bar</strong></em></p>
6511 ````````````````````````````````
6514 Note, however, that in the following case we get no strong
6515 emphasis, because the opening delimiter is closed by the first
6518 ```````````````````````````````` example
6521 <p><em>foo</em><em>bar</em>**</p>
6522 ````````````````````````````````
6526 Indefinite levels of nesting are possible:
6528 ```````````````````````````````` example
6529 *foo **bar *baz* bim** bop*
6531 <p><em>foo <strong>bar <em>baz</em> bim</strong> bop</em></p>
6532 ````````````````````````````````
6535 ```````````````````````````````` example
6538 <p><em>foo <a href="/url"><em>bar</em></a></em></p>
6539 ````````````````````````````````
6542 There can be no empty emphasis or strong emphasis:
6544 ```````````````````````````````` example
6545 ** is not an empty emphasis
6547 <p>** is not an empty emphasis</p>
6548 ````````````````````````````````
6551 ```````````````````````````````` example
6552 **** is not an empty strong emphasis
6554 <p>**** is not an empty strong emphasis</p>
6555 ````````````````````````````````
6561 Any nonempty sequence of inline elements can be the contents of a
6562 strongly emphasized span.
6564 ```````````````````````````````` example
6567 <p><strong>foo <a href="/url">bar</a></strong></p>
6568 ````````````````````````````````
6571 ```````````````````````````````` example
6577 ````````````````````````````````
6580 In particular, emphasis and strong emphasis can be nested
6581 inside strong emphasis:
6583 ```````````````````````````````` example
6586 <p><strong>foo <em>bar</em> baz</strong></p>
6587 ````````````````````````````````
6590 ```````````````````````````````` example
6593 <p><strong>foo <strong>bar</strong> baz</strong></p>
6594 ````````````````````````````````
6597 ```````````````````````````````` example
6600 <p><strong><strong>foo</strong> bar</strong></p>
6601 ````````````````````````````````
6604 ```````````````````````````````` example
6607 <p><strong>foo <strong>bar</strong></strong></p>
6608 ````````````````````````````````
6611 ```````````````````````````````` example
6614 <p><strong>foo <em>bar</em> baz</strong></p>
6615 ````````````````````````````````
6620 ```````````````````````````````` example
6623 <p><em><em>foo</em>bar</em>baz**</p>
6624 ````````````````````````````````
6627 The difference is that in the preceding case, the internal delimiters
6628 [can close emphasis], while in the cases with spaces, they cannot.
6630 ```````````````````````````````` example
6633 <p><strong><em>foo</em> bar</strong></p>
6634 ````````````````````````````````
6637 ```````````````````````````````` example
6640 <p><strong>foo <em>bar</em></strong></p>
6641 ````````````````````````````````
6644 Indefinite levels of nesting are possible:
6646 ```````````````````````````````` example
6650 <p><strong>foo <em>bar <strong>baz</strong>
6651 bim</em> bop</strong></p>
6652 ````````````````````````````````
6655 ```````````````````````````````` example
6656 **foo [*bar*](/url)**
6658 <p><strong>foo <a href="/url"><em>bar</em></a></strong></p>
6659 ````````````````````````````````
6662 There can be no empty emphasis or strong emphasis:
6664 ```````````````````````````````` example
6665 __ is not an empty emphasis
6667 <p>__ is not an empty emphasis</p>
6668 ````````````````````````````````
6671 ```````````````````````````````` example
6672 ____ is not an empty strong emphasis
6674 <p>____ is not an empty strong emphasis</p>
6675 ````````````````````````````````
6681 ```````````````````````````````` example
6685 ````````````````````````````````
6688 ```````````````````````````````` example
6691 <p>foo <em>*</em></p>
6692 ````````````````````````````````
6695 ```````````````````````````````` example
6698 <p>foo <em>_</em></p>
6699 ````````````````````````````````
6702 ```````````````````````````````` example
6706 ````````````````````````````````
6709 ```````````````````````````````` example
6712 <p>foo <strong>*</strong></p>
6713 ````````````````````````````````
6716 ```````````````````````````````` example
6719 <p>foo <strong>_</strong></p>
6720 ````````````````````````````````
6723 Note that when delimiters do not match evenly, Rule 11 determines
6724 that the excess literal `*` characters will appear outside of the
6725 emphasis, rather than inside it:
6727 ```````````````````````````````` example
6730 <p>*<em>foo</em></p>
6731 ````````````````````````````````
6734 ```````````````````````````````` example
6737 <p><em>foo</em>*</p>
6738 ````````````````````````````````
6741 ```````````````````````````````` example
6744 <p>*<strong>foo</strong></p>
6745 ````````````````````````````````
6748 ```````````````````````````````` example
6751 <p>***<em>foo</em></p>
6752 ````````````````````````````````
6755 ```````````````````````````````` example
6758 <p><strong>foo</strong>*</p>
6759 ````````````````````````````````
6762 ```````````````````````````````` example
6765 <p><em>foo</em>***</p>
6766 ````````````````````````````````
6772 ```````````````````````````````` example
6776 ````````````````````````````````
6779 ```````````````````````````````` example
6782 <p>foo <em>_</em></p>
6783 ````````````````````````````````
6786 ```````````````````````````````` example
6789 <p>foo <em>*</em></p>
6790 ````````````````````````````````
6793 ```````````````````````````````` example
6797 ````````````````````````````````
6800 ```````````````````````````````` example
6803 <p>foo <strong>_</strong></p>
6804 ````````````````````````````````
6807 ```````````````````````````````` example
6810 <p>foo <strong>*</strong></p>
6811 ````````````````````````````````
6814 ```````````````````````````````` example
6817 <p>_<em>foo</em></p>
6818 ````````````````````````````````
6821 Note that when delimiters do not match evenly, Rule 12 determines
6822 that the excess literal `_` characters will appear outside of the
6823 emphasis, rather than inside it:
6825 ```````````````````````````````` example
6828 <p><em>foo</em>_</p>
6829 ````````````````````````````````
6832 ```````````````````````````````` example
6835 <p>_<strong>foo</strong></p>
6836 ````````````````````````````````
6839 ```````````````````````````````` example
6842 <p>___<em>foo</em></p>
6843 ````````````````````````````````
6846 ```````````````````````````````` example
6849 <p><strong>foo</strong>_</p>
6850 ````````````````````````````````
6853 ```````````````````````````````` example
6856 <p><em>foo</em>___</p>
6857 ````````````````````````````````
6860 Rule 13 implies that if you want emphasis nested directly inside
6861 emphasis, you must use different delimiters:
6863 ```````````````````````````````` example
6866 <p><strong>foo</strong></p>
6867 ````````````````````````````````
6870 ```````````````````````````````` example
6873 <p><em><em>foo</em></em></p>
6874 ````````````````````````````````
6877 ```````````````````````````````` example
6880 <p><strong>foo</strong></p>
6881 ````````````````````````````````
6884 ```````````````````````````````` example
6887 <p><em><em>foo</em></em></p>
6888 ````````````````````````````````
6891 However, strong emphasis within strong emphasis is possible without
6892 switching delimiters:
6894 ```````````````````````````````` example
6897 <p><strong><strong>foo</strong></strong></p>
6898 ````````````````````````````````
6901 ```````````````````````````````` example
6904 <p><strong><strong>foo</strong></strong></p>
6905 ````````````````````````````````
6909 Rule 13 can be applied to arbitrarily long sequences of
6912 ```````````````````````````````` example
6915 <p><strong><strong><strong>foo</strong></strong></strong></p>
6916 ````````````````````````````````
6921 ```````````````````````````````` example
6924 <p><strong><em>foo</em></strong></p>
6925 ````````````````````````````````
6928 ```````````````````````````````` example
6931 <p><strong><strong><em>foo</em></strong></strong></p>
6932 ````````````````````````````````
6937 ```````````````````````````````` example
6940 <p><em>foo _bar</em> baz_</p>
6941 ````````````````````````````````
6944 ```````````````````````````````` example
6947 <p><em><em>foo</em>bar</em>*</p>
6948 ````````````````````````````````
6951 ```````````````````````````````` example
6952 *foo __bar *baz bim__ bam*
6954 <p><em>foo <strong>bar *baz bim</strong> bam</em></p>
6955 ````````````````````````````````
6960 ```````````````````````````````` example
6963 <p>**foo <strong>bar baz</strong></p>
6964 ````````````````````````````````
6967 ```````````````````````````````` example
6970 <p>*foo <em>bar baz</em></p>
6971 ````````````````````````````````
6976 ```````````````````````````````` example
6979 <p>*<a href="/url">bar*</a></p>
6980 ````````````````````````````````
6983 ```````````````````````````````` example
6986 <p>_foo <a href="/url">bar_</a></p>
6987 ````````````````````````````````
6990 ```````````````````````````````` example
6991 *<img src="foo" title="*"/>
6993 <p>*<img src="foo" title="*"/></p>
6994 ````````````````````````````````
6997 ```````````````````````````````` example
7000 <p>**<a href="**"></p>
7001 ````````````````````````````````
7004 ```````````````````````````````` example
7007 <p>__<a href="__"></p>
7008 ````````````````````````````````
7011 ```````````````````````````````` example
7014 <p><em>a <code>*</code></em></p>
7015 ````````````````````````````````
7018 ```````````````````````````````` example
7021 <p><em>a <code>_</code></em></p>
7022 ````````````````````````````````
7025 ```````````````````````````````` example
7026 **a<http://foo.bar/?q=**>
7028 <p>**a<a href="http://foo.bar/?q=**">http://foo.bar/?q=**</a></p>
7029 ````````````````````````````````
7032 ```````````````````````````````` example
7033 __a<http://foo.bar/?q=__>
7035 <p>__a<a href="http://foo.bar/?q=__">http://foo.bar/?q=__</a></p>
7036 ````````````````````````````````
7042 A link contains [link text] (the visible text), a [link destination]
7043 (the URI that is the link destination), and optionally a [link title].
7044 There are two basic kinds of links in Markdown. In [inline links] the
7045 destination and title are given immediately after the link text. In
7046 [reference links] the destination and title are defined elsewhere in the document.
7049 A [link text](@) consists of a sequence of zero or more
7050 inline elements enclosed by square brackets (`[` and `]`). The
7051 following rules apply:
7053 - Links may not contain other links, at any level of nesting. If
7054 multiple otherwise valid link definitions appear nested inside each
7055 other, the inner-most definition is used.
7057 - Brackets are allowed in the [link text] only if (a) they
7058 are backslash-escaped or (b) they appear as a matched pair of brackets,
7059 with an open bracket `[`, a sequence of zero or more inlines, and
7060 a close bracket `]`.
7062 - Backtick [code spans], [autolinks], and raw [HTML tags] bind more tightly
7063 than the brackets in link text. Thus, for example,
7064 `` [foo`]` `` could not be a link text, since the second `]`
7065 is part of a code span.
7067 - The brackets in link text bind more tightly than markers for
7068 [emphasis and strong emphasis]. Thus, for example, `*[foo*](url)` is a link.
7070 A [link destination](@) consists of either
7072 - a sequence of zero or more characters between an opening `<` and a
7073 closing `>` that contains no spaces, line breaks, or unescaped
7074 `<` or `>` characters, or
7076 - a nonempty sequence of characters that does not include
7077 ASCII space or control characters, and includes parentheses
7078 only if (a) they are backslash-escaped or (b) they are part of
7079 a balanced pair of unescaped parentheses that is not itself
7080 inside a balanced pair of unescaped parentheses.
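
As an illustration of the two forms just described, here is a rough Python check of whether a candidate string is a [link destination]. It is a sketch under simplifying assumptions, not a parser: `is_link_destination` is a hypothetical helper, it is handed the candidate string directly rather than finding where a destination ends inside an inline link, and any string starting with `<` is treated as the pointy-bracket form.

```python
import string

def _escaped(s, i):
    # A backslash escapes the following character only if that character
    # is ASCII punctuation.
    return s[i] == "\\" and i + 1 < len(s) and s[i + 1] in string.punctuation

def _valid_pointy(s):
    i = 1  # skip the opening `<`
    while i < len(s):
        if _escaped(s, i):
            i += 2
            continue
        if s[i] == ">":
            return i == len(s) - 1  # the first unescaped `>` must end the string
        if s[i] in " \n\r<":
            return False
        i += 1
    return False  # no unescaped closing `>`

def _valid_bare(s):
    if not s:
        return False  # the bare form must be nonempty
    depth, i = 0, 0
    while i < len(s):
        if _escaped(s, i):
            i += 2
            continue
        ch = s[i]
        if ch == " " or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False  # ASCII space or control character
        if ch == "(":
            depth += 1
            if depth > 1:
                return False  # only one level of unescaped parentheses
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # unbalanced closing parenthesis
        i += 1
    return depth == 0

def is_link_destination(s):
    return _valid_pointy(s) if s.startswith("<") else _valid_bare(s)
```

With this sketch, `(foo)and(bar)` is accepted, `foo(and(bar))` is rejected because of the nested parentheses, and `<foo(and(bar))>` is accepted because the pointy-bracket form places no restriction on parentheses.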
7082 A [link title](@) consists of either
7084 - a sequence of zero or more characters between straight double-quote
7085 characters (`"`), including a `"` character only if it is
7086 backslash-escaped, or
7088 - a sequence of zero or more characters between straight single-quote
7089 characters (`'`), including a `'` character only if it is
7090 backslash-escaped, or
7092 - a sequence of zero or more characters between matching parentheses
7093 (`(...)`), including a `)` character only if it is backslash-escaped.
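
The three title forms differ only in their delimiters, so one loop can check all of them. The following Python sketch uses a hypothetical helper name, `is_link_title`, and covers only the delimiter and escaping conditions listed above; it does not check any restriction on multi-line titles.

```python
import string

# Opening delimiter -> closing delimiter for the three title forms.
_CLOSERS = {'"': '"', "'": "'", "(": ")"}

def is_link_title(s):
    if len(s) < 2 or s[0] not in _CLOSERS:
        return False
    closer = _CLOSERS[s[0]]
    i = 1
    while i < len(s):
        # A backslash followed by ASCII punctuation is an escape; skip both.
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in string.punctuation:
            i += 2
            continue
        if s[i] == closer:
            # The first unescaped closing delimiter must be the last character.
            return i == len(s) - 1
        i += 1
    return False  # never closed
```

Under these assumptions `"title "and" title"` is rejected, since the second double quote closes the title early, while `'title "and" title'` is accepted.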
7095 Although [link titles] may span multiple lines, they may not contain a [blank line].
7098 An [inline link](@) consists of a [link text] followed immediately
7099 by a left parenthesis `(`, optional [whitespace], an optional
7100 [link destination], an optional [link title] separated from the link
7101 destination by [whitespace], optional [whitespace], and a right
7102 parenthesis `)`. The link's text consists of the inlines contained
7103 in the [link text] (excluding the enclosing square brackets).
7104 The link's URI consists of the link destination, excluding enclosing
7105 `<...>` if present, with backslash-escapes in effect as described
7106 above. The link's title consists of the link title, excluding its
7107 enclosing delimiters, with backslash-escapes in effect as described above.
7110 Here is a simple inline link:
7112 ```````````````````````````````` example
7113 [link](/uri "title")
7115 <p><a href="/uri" title="title">link</a></p>
7116 ````````````````````````````````
7119 The title may be omitted:
7121 ```````````````````````````````` example
7124 <p><a href="/uri">link</a></p>
7125 ````````````````````````````````
7128 Both the title and the destination may be omitted:
7130 ```````````````````````````````` example
7133 <p><a href="">link</a></p>
7134 ````````````````````````````````
7137 ```````````````````````````````` example
7140 <p><a href="">link</a></p>
7141 ````````````````````````````````
7144 The destination cannot contain spaces or line breaks,
7145 even if enclosed in pointy brackets:
7147 ```````````````````````````````` example
7150 <p>[link](/my uri)</p>
7151 ````````````````````````````````
7154 ```````````````````````````````` example
7157 <p>[link](</my uri>)</p>
7158 ````````````````````````````````
7161 ```````````````````````````````` example
7167 ````````````````````````````````
7170 ```````````````````````````````` example
7176 ````````````````````````````````
7178 Parentheses inside the link destination may be escaped:
7180 ```````````````````````````````` example
7183 <p><a href="(foo)">link</a></p>
7184 ````````````````````````````````
7186 One level of balanced parentheses is allowed without escaping:
7188 ```````````````````````````````` example
7189 [link]((foo)and(bar))
7191 <p><a href="(foo)and(bar)">link</a></p>
7192 ````````````````````````````````
7194 However, if you have parentheses within parentheses, you need to escape
7195 or use the `<...>` form:
7197 ```````````````````````````````` example
7198 [link](foo(and(bar)))
7200 <p>[link](foo(and(bar)))</p>
7201 ````````````````````````````````
7204 ```````````````````````````````` example
7205 [link](foo(and\(bar\)))
7207 <p><a href="foo(and(bar))">link</a></p>
7208 ````````````````````````````````
7211 ```````````````````````````````` example
7212 [link](<foo(and(bar))>)
7214 <p><a href="foo(and(bar))">link</a></p>
7215 ````````````````````````````````
7218 Parentheses and other symbols can also be escaped, as usual
7221 ```````````````````````````````` example
7224 <p><a href="foo):">link</a></p>
7225 ````````````````````````````````
7228 A link can contain fragment identifiers and queries:
7230 ```````````````````````````````` example
7233 [link](http://example.com#fragment)
7235 [link](http://example.com?foo=3#frag)
7237 <p><a href="#fragment">link</a></p>
7238 <p><a href="http://example.com#fragment">link</a></p>
7239 <p><a href="http://example.com?foo=3#frag">link</a></p>
7240 ````````````````````````````````
7243 Note that a backslash before a non-escapable character is just a backslash:
7246 ```````````````````````````````` example
7249 <p><a href="foo%5Cbar">link</a></p>
7250 ````````````````````````````````
7253 URL-escaping should be left alone inside the destination, as all
7254 URL-escaped characters are also valid URL characters. Entity and
7255 numerical character references in the destination will be parsed
7256 into the corresponding Unicode code points, as usual. These may
7257 be optionally URL-escaped when written as HTML, but this spec
7258 does not enforce any particular policy for rendering URLs in
7259 HTML or other formats. Renderers may make different decisions
7260 about how to escape or normalize URLs in the output.
7262 ```````````````````````````````` example
7263 [link](foo%20b&auml;)
7265 <p><a href="foo%20b%C3%A4">link</a></p>
7266 ````````````````````````````````
7269 Note that, because titles can often be parsed as destinations,
7270 if you try to omit the destination and keep the title, you'll
7271 get unexpected results:
7273 ```````````````````````````````` example
7276 <p><a href="%22title%22">link</a></p>
7277 ````````````````````````````````
7280 Titles may be in single quotes, double quotes, or parentheses:
7282 ```````````````````````````````` example
7283 [link](/url "title")
7284 [link](/url 'title')
7285 [link](/url (title))
7287 <p><a href="/url" title="title">link</a>
7288 <a href="/url" title="title">link</a>
7289 <a href="/url" title="title">link</a></p>
7290 ````````````````````````````````
7293 Backslash escapes and entity and numeric character references
7294 may be used in titles:
7296 ```````````````````````````````` example
7297 [link](/url "title \"&quot;")
7299 <p><a href="/url" title="title &quot;&quot;">link</a></p>
7300 ````````````````````````````````
7303 Nested balanced quotes are not allowed without escaping:
7305 ```````````````````````````````` example
7306 [link](/url "title "and" title")
7308 <p>[link](/url &quot;title &quot;and&quot; title&quot;)</p>
7309 ````````````````````````````````
7312 But it is easy to work around this by using a different quote type:
7314 ```````````````````````````````` example
7315 [link](/url 'title "and" title')
7317 <p><a href="/url" title="title &quot;and&quot; title">link</a></p>
7318 ````````````````````````````````
7321 (Note: `Markdown.pl` did allow double quotes inside a double-quoted
7322 title, and its test suite included a test demonstrating this.
7323 But it is hard to see a good rationale for the extra complexity this
7324 brings, since there are already many ways---backslash escaping,
7325 entity and numeric character references, or using a different
7326 quote type for the enclosing title---to write titles containing
7327 double quotes. `Markdown.pl`'s handling of titles has a number
7328 of other strange features. For example, it allows single-quoted
7329 titles in inline links, but not reference links. And, in
7330 reference links but not inline links, it allows a title to begin
7331 with `"` and end with `)`. `Markdown.pl` 1.0.1 even allows
7332 titles with no closing quotation mark, though 1.0.2b8 does not.
7333 It seems preferable to adopt a simple, rational rule that works
7334 the same way in inline links and link reference definitions.)
7336 [Whitespace] is allowed around the destination and title:
7338 ```````````````````````````````` example
7342 <p><a href="/uri" title="title">link</a></p>
7343 ````````````````````````````````
7346 But it is not allowed between the link text and the
7347 following parenthesis:
7349 ```````````````````````````````` example
7352 <p>[link] (/uri)</p>
7353 ````````````````````````````````
7356 The link text may contain balanced brackets, but not unbalanced ones,
7357 unless they are escaped:
7359 ```````````````````````````````` example
7360 [link [foo [bar]]](/uri)
7362 <p><a href="/uri">link [foo [bar]]</a></p>
7363 ````````````````````````````````
7366 ```````````````````````````````` example
7369 <p>[link] bar](/uri)</p>
7370 ````````````````````````````````
7373 ```````````````````````````````` example
7376 <p>[link <a href="/uri">bar</a></p>
7377 ````````````````````````````````
7380 ```````````````````````````````` example
7383 <p><a href="/uri">link [bar</a></p>
7384 ````````````````````````````````
7387 The link text may contain inline content:
7389 ```````````````````````````````` example
7390 [link *foo **bar** `#`*](/uri)
7392 <p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p>
7393 ````````````````````````````````
7396 ```````````````````````````````` example
7397 [![moon](moon.jpg)](/uri)
7399 <p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p>
7400 ````````````````````````````````
7403 However, links may not contain other links, at any level of nesting.
7405 ```````````````````````````````` example
7406 [foo [bar](/uri)](/uri)
7408 <p>[foo <a href="/uri">bar</a>](/uri)</p>
7409 ````````````````````````````````
7412 ```````````````````````````````` example
7413 [foo *[bar [baz](/uri)](/uri)*](/uri)
7415 <p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p>
7416 ````````````````````````````````
7419 ```````````````````````````````` example
7420 ![[[foo](uri1)](uri2)](uri3)
7422 <p><img src="uri3" alt="[foo](uri2)" /></p>
7423 ````````````````````````````````
7426 These cases illustrate the precedence of link text grouping over emphasis grouping:
7429 ```````````````````````````````` example
7432 <p>*<a href="/uri">foo*</a></p>
7433 ````````````````````````````````
7436 ```````````````````````````````` example
7439 <p><a href="baz*">foo *bar</a></p>
7440 ````````````````````````````````
7443 Note that brackets that *aren't* part of links do not take precedence:
7446 ```````````````````````````````` example
7449 <p><em>foo [bar</em> baz]</p>
7450 ````````````````````````````````
7453 These cases illustrate the precedence of HTML tags, code spans,
7454 and autolinks over link grouping:
7456 ```````````````````````````````` example
7457 [foo <bar attr="](baz)">
7459 <p>[foo <bar attr="](baz)"></p>
7460 ````````````````````````````````
7463 ```````````````````````````````` example
7466 <p>[foo<code>](/uri)</code></p>
7467 ````````````````````````````````
7470 ```````````````````````````````` example
7471 [foo<http://example.com/?search=](uri)>
7473 <p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p>
7474 ````````````````````````````````
7477 There are three kinds of [reference link](@)s:
7478 [full](#full-reference-link), [collapsed](#collapsed-reference-link),
7479 and [shortcut](#shortcut-reference-link).
7481 A [full reference link](@)
7482 consists of a [link text] immediately followed by a [link label]
7483 that [matches] a [link reference definition] elsewhere in the document.
7485 A [link label](@) begins with a left bracket (`[`) and ends
7486 with the first right bracket (`]`) that is not backslash-escaped.
7487 Between these brackets there must be at least one [non-whitespace character].
7488 Unescaped square bracket characters are not allowed in
7489 [link labels]. A link label can have at most 999
7490 characters inside the square brackets.
7492 One label [matches](@)
7493 another just in case their normalized forms are equal. To normalize a
7494 label, perform the *Unicode case fold* and collapse consecutive internal
7495 [whitespace] to a single space. If there are multiple
7496 matching reference link definitions, the one that comes first in the
7497 document is used. (It is desirable in such cases to emit a warning.)
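The label rules and the matching procedure above can be sketched in a few lines. This is a minimal illustration rather than the reference implementation: the helper names are hypothetical, and stripping leading and trailing whitespace before collapsing is an assumption that goes slightly beyond the wording above.

```python
import re

MAX_LABEL_LENGTH = 999
WHITESPACE = " \t\n\v\f\r"


def is_valid_label_content(inner: str) -> bool:
    """Check the text between the brackets of a link label."""
    if len(inner) > MAX_LABEL_LENGTH:
        return False
    if not inner.strip(WHITESPACE):
        return False  # at least one non-whitespace character is required
    # Drop backslash-escaped characters, then reject any bracket left over.
    unescaped = re.sub(r"\\.", "", inner, flags=re.DOTALL)
    return "[" not in unescaped and "]" not in unescaped


def normalize_label(inner: str) -> str:
    """Collapse whitespace runs to one space and apply the Unicode case fold."""
    # Assumption: leading and trailing whitespace is stripped first.
    collapsed = re.sub(r"[ \t\n\v\f\r]+", " ", inner.strip(WHITESPACE))
    return collapsed.casefold()


def add_definition(definitions: dict, inner: str, destination: str) -> None:
    """Record a definition; the first one for a given label wins."""
    definitions.setdefault(normalize_label(inner), destination)
```

Two labels match when `normalize_label` returns the same string for both, and `setdefault` keeps the earliest definition when several match.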
7499 The contents of the first link label are parsed as inlines, which are
7500 used as the link's text. The link's URI and title are provided by the
7501 matching [link reference definition].
7503 Here is a simple example:
7505 ```````````````````````````````` example
7510 <p><a href="/url" title="title">foo</a></p>
7511 ````````````````````````````````
7514 The rules for the [link text] are the same as with
7515 [inline links]. Thus:
7517 The link text may contain balanced brackets, but not unbalanced ones,
7518 unless they are escaped:
7520 ```````````````````````````````` example
7521 [link [foo [bar]]][ref]
7525 <p><a href="/uri">link [foo [bar]]</a></p>
7526 ````````````````````````````````
7529 ```````````````````````````````` example
7534 <p><a href="/uri">link [bar</a></p>
7535 ````````````````````````````````
7538 The link text may contain inline content:
7540 ```````````````````````````````` example
7541 [link *foo **bar** `#`*][ref]
7545 <p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p>
7546 ````````````````````````````````
7549 ```````````````````````````````` example
7550 [![moon](moon.jpg)][ref]
7554 <p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p>
7555 ````````````````````````````````
7558 However, links may not contain other links, at any level of nesting.
7560 ```````````````````````````````` example
7561 [foo [bar](/uri)][ref]
7565 <p>[foo <a href="/uri">bar</a>]<a href="/uri">ref</a></p>
7566 ````````````````````````````````
7569 ```````````````````````````````` example
7570 [foo *bar [baz][ref]*][ref]
7574 <p>[foo <em>bar <a href="/uri">baz</a></em>]<a href="/uri">ref</a></p>
7575 ````````````````````````````````
7578 (In the examples above, we have two [shortcut reference links]
7579 instead of one [full reference link].)
7581 The following cases illustrate the precedence of link text grouping over emphasis grouping:
7584 ```````````````````````````````` example
7589 <p>*<a href="/uri">foo*</a></p>
7590 ````````````````````````````````
7593 ```````````````````````````````` example
7598 <p><a href="/uri">foo *bar</a></p>
7599 ````````````````````````````````
7602 These cases illustrate the precedence of HTML tags, code spans,
7603 and autolinks over link grouping:
7605 ```````````````````````````````` example
7606 [foo <bar attr="][ref]">
7610 <p>[foo <bar attr="][ref]"></p>
7611 ````````````````````````````````
7614 ```````````````````````````````` example
7619 <p>[foo<code>][ref]</code></p>
7620 ````````````````````````````````
7623 ```````````````````````````````` example
7624 [foo<http://example.com/?search=][ref]>
7628 <p>[foo<a href="http://example.com/?search=%5D%5Bref%5D">http://example.com/?search=][ref]</a></p>
7629 ````````````````````````````````
7632 Matching is case-insensitive:
7634 ```````````````````````````````` example
7639 <p><a href="/url" title="title">foo</a></p>
7640 ````````````````````````````````
7643 Unicode case fold is used:
7645 ```````````````````````````````` example
7646 [Толпой][Толпой] is a Russian word.
7650 <p><a href="/url">Толпой</a> is a Russian word.</p>
7651 ````````````````````````````````
7654 Consecutive internal [whitespace] is treated as one space for
7655 purposes of determining matching:
7657 ```````````````````````````````` example
7663 <p><a href="/url">Baz</a></p>
7664 ````````````````````````````````
7667 No [whitespace] is allowed between the [link text] and the [link label]:
7670 ```````````````````````````````` example
7675 <p>[foo] <a href="/url" title="title">bar</a></p>
7676 ````````````````````````````````
7679 ```````````````````````````````` example
7686 <a href="/url" title="title">bar</a></p>
7687 ````````````````````````````````
7690 This is a departure from John Gruber's original Markdown syntax
7691 description, which explicitly allows whitespace between the link
7692 text and the link label. It brings reference links in line with
7693 [inline links], which (according to both original Markdown and
7694 this spec) cannot have whitespace after the link text. More
7695 importantly, it prevents inadvertent capture of consecutive
7696 [shortcut reference links]. If whitespace were allowed between the
7697 link text and the link label, two shortcut reference links on
7698 consecutive lines would be parsed as a single reference link, not as the two shortcut reference links intended.
7709 (Note that [shortcut reference links] were introduced by Gruber
7710 himself in a beta version of `Markdown.pl`, but never included
7711 in the official syntax description. Without shortcut reference
7712 links, it is harmless to allow space between the link text and
7713 link label; but once shortcut references are introduced, it is
7714 too dangerous to allow this, as it frequently leads to
7715 unintended results.)
7717 When there are multiple matching [link reference definitions], the first is used:
7720 ```````````````````````````````` example
7727 <p><a href="/url1">bar</a></p>
7728 ````````````````````````````````
7731 Note that matching is performed on normalized strings, not parsed
7732 inline content. So the following does not match, even though the
7733 labels define equivalent inline content:
7735 ```````````````````````````````` example
7741 ````````````````````````````````
7744 [Link labels] cannot contain brackets, unless they are backslash-escaped:
7747 ```````````````````````````````` example
7754 ````````````````````````````````
7757 ```````````````````````````````` example
7762 <p>[foo][ref[bar]]</p>
7763 <p>[ref[bar]]: /uri</p>
7764 ````````````````````````````````
7767 ```````````````````````````````` example
7773 <p>[[[foo]]]: /url</p>
7774 ````````````````````````````````
7777 ```````````````````````````````` example
7782 <p><a href="/uri">foo</a></p>
7783 ````````````````````````````````
7786 Note that in this example `]` is not backslash-escaped:
7788 ```````````````````````````````` example
7793 <p><a href="/uri">bar\</a></p>
7794 ````````````````````````````````
7797 A [link label] must contain at least one [non-whitespace character]:
7799 ```````````````````````````````` example
7806 ````````````````````````````````
7809 ```````````````````````````````` example
7820 ````````````````````````````````
7823 A [collapsed reference link](@)
7824 consists of a [link label] that [matches] a
7825 [link reference definition] elsewhere in the
7826 document, followed by the string `[]`.
7827 The contents of the first link label are parsed as inlines,
7828 which are used as the link's text. The link's URI and title are
7829 provided by the matching reference link definition. Thus,
7830 `[foo][]` is equivalent to `[foo][foo]`.
7832 ```````````````````````````````` example
7837 <p><a href="/url" title="title">foo</a></p>
7838 ````````````````````````````````
7841 ```````````````````````````````` example
7844 [*foo* bar]: /url "title"
7846 <p><a href="/url" title="title"><em>foo</em> bar</a></p>
7847 ````````````````````````````````
7850 The link labels are case-insensitive:
7852 ```````````````````````````````` example
7857 <p><a href="/url" title="title">Foo</a></p>
7858 ````````````````````````````````
7862 As with full reference links, [whitespace] is not
7863 allowed between the two sets of brackets:
7865 ```````````````````````````````` example
7871 <p><a href="/url" title="title">foo</a>
7873 ````````````````````````````````
7876 A [shortcut reference link](@)
7877 consists of a [link label] that [matches] a
7878 [link reference definition] elsewhere in the
7879 document and is not followed by `[]` or a link label.
7880 The contents of the first link label are parsed as inlines,
7881 which are used as the link's text. The link's URI and title
7882 are provided by the matching link reference definition.
7883 Thus, `[foo]` is equivalent to `[foo][]`.
7885 ```````````````````````````````` example
7890 <p><a href="/url" title="title">foo</a></p>
7891 ````````````````````````````````
7894 ```````````````````````````````` example
7897 [*foo* bar]: /url "title"
7899 <p><a href="/url" title="title"><em>foo</em> bar</a></p>
7900 ````````````````````````````````
7903 ```````````````````````````````` example
7906 [*foo* bar]: /url "title"
7908 <p>[<a href="/url" title="title"><em>foo</em> bar</a>]</p>
7909 ````````````````````````````````
7912 ```````````````````````````````` example
7917 <p>[[bar <a href="/url">foo</a></p>
7918 ````````````````````````````````
7921 The link labels are case-insensitive:
7923 ```````````````````````````````` example
7928 <p><a href="/url" title="title">Foo</a></p>
7929 ````````````````````````````````
7932 A space after the link text should be preserved:
7934 ```````````````````````````````` example
7939 <p><a href="/url">foo</a> bar</p>
7940 ````````````````````````````````
7943 If you just want bracketed text, you can backslash-escape the
7944 opening bracket to avoid links:
7946 ```````````````````````````````` example
7952 ````````````````````````````````
7955 Note that this is a link, because a link label ends with the first
7956 following closing bracket:
7958 ```````````````````````````````` example
7963 <p>*<a href="/url">foo*</a></p>
7964 ````````````````````````````````
7967 Full references take precedence over shortcut references:
7969 ```````````````````````````````` example
7975 <p><a href="/url2">foo</a></p>
7976 ````````````````````````````````
7979 In the following case `[bar][baz]` is parsed as a reference,
7980 `[foo]` as normal text:
7982 ```````````````````````````````` example
7987 <p>[foo]<a href="/url">bar</a></p>
7988 ````````````````````````````````
7991 Here, though, `[foo][bar]` is parsed as a reference, since `[bar]` is defined:
7994 ```````````````````````````````` example
8000 <p><a href="/url2">foo</a><a href="/url1">baz</a></p>
8001 ````````````````````````````````
8004 Here `[foo]` is not parsed as a shortcut reference, because it
8005 is followed by a link label (even though `[bar]` is not defined):
8007 ```````````````````````````````` example
8013 <p>[foo]<a href="/url1">bar</a></p>
8014 ````````````````````````````````
8020 Syntax for images is like the syntax for links, with one
8021 difference. Instead of [link text], we have an
8022 [image description](@). The rules for this are the
8023 same as for [link text], except that (a) an
8024 image description starts with `![` rather than `[`, and
8025 (b) an image description may contain links.
8026 An image description has inline elements
8027 as its contents. When an image is rendered to HTML,
8028 this is standardly used as the image's `alt` attribute.
8030 ```````````````````````````````` example
8031 ![foo](/url "title")
8033 <p><img src="/url" alt="foo" title="title" /></p>
8034 ````````````````````````````````
8037 ```````````````````````````````` example
8040 [foo *bar*]: train.jpg "train & tracks"
8042 <p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p>
8043 ````````````````````````````````
8046 ```````````````````````````````` example
8047 ![foo ![bar](/url)](/url2)
8049 <p><img src="/url2" alt="foo bar" /></p>
8050 ````````````````````````````````
8053 ```````````````````````````````` example
8054 ![foo [bar](/url)](/url2)
8056 <p><img src="/url2" alt="foo bar" /></p>
8057 ````````````````````````````````
8060 Though this spec is concerned with parsing, not rendering, it is
8061 recommended that in rendering to HTML, only the plain string content
8062 of the [image description] be used. Note that in
8063 the above example, the alt attribute's value is `foo bar`, not `foo
8064 [bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string
8065 content is rendered, without formatting.
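One way to picture the recommendation above is a function that flattens an image description into its plain string content. This is a sketch over a hypothetical node shape, `(kind, payload)` tuples, which is not something this spec defines; a real parser will have its own tree types.

```python
def plain_text(node) -> str:
    """Flatten a (kind, payload) inline node into its plain string content."""
    kind, payload = node
    if kind in ("text", "code"):
        return payload  # leaf nodes carry their literal text
    # Container nodes (emphasis, strong, link, image) carry a list of children.
    return "".join(plain_text(child) for child in payload)


# Hypothetical tree for the description of ![foo [bar](/url)](/url2)
description = ("image", [("text", "foo "), ("link", [("text", "bar")])])
alt = plain_text(description)
```

The link destination and any emphasis markers are simply dropped, so only the text survives in the `alt` value.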
8067 ```````````````````````````````` example
8070 [foo *bar*]: train.jpg "train & tracks"
8072 <p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p>
8073 ````````````````````````````````
8076 ```````````````````````````````` example
8077 ![foo *bar*][foobar]
8079 [FOOBAR]: train.jpg "train & tracks"
8081 <p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p>
8082 ````````````````````````````````
8085 ```````````````````````````````` example
8088 <p><img src="train.jpg" alt="foo" /></p>
8089 ````````````````````````````````
8092 ```````````````````````````````` example
8093 My ![foo bar](/path/to/train.jpg "title" )
8095 <p>My <img src="/path/to/train.jpg" alt="foo bar" title="title" /></p>
8096 ````````````````````````````````
8099 ```````````````````````````````` example
8102 <p><img src="url" alt="foo" /></p>
8103 ````````````````````````````````
8106 ```````````````````````````````` example
8109 <p><img src="/url" alt="" /></p>
8110 ````````````````````````````````
8115 ```````````````````````````````` example
8120 <p><img src="/url" alt="foo" /></p>
8121 ````````````````````````````````
8124 ```````````````````````````````` example
8129 <p><img src="/url" alt="foo" /></p>
8130 ````````````````````````````````
8135 ```````````````````````````````` example
8140 <p><img src="/url" alt="foo" title="title" /></p>
8141 ````````````````````````````````
8144 ```````````````````````````````` example
8147 [*foo* bar]: /url "title"
8149 <p><img src="/url" alt="foo bar" title="title" /></p>
8150 ````````````````````````````````
8153 The labels are case-insensitive:
8155 ```````````````````````````````` example
8160 <p><img src="/url" alt="Foo" title="title" /></p>
8161 ````````````````````````````````
8164 As with reference links, [whitespace] is not allowed
8165 between the two sets of brackets:
8167 ```````````````````````````````` example
8173 <p><img src="/url" alt="foo" title="title" />
8175 ````````````````````````````````
8180 ```````````````````````````````` example
8185 <p><img src="/url" alt="foo" title="title" /></p>
8186 ````````````````````````````````
8189 ```````````````````````````````` example
8192 [*foo* bar]: /url "title"
8194 <p><img src="/url" alt="foo bar" title="title" /></p>
8195 ````````````````````````````````
8198 Note that link labels cannot contain unescaped brackets:
8200 ```````````````````````````````` example
8203 [[foo]]: /url "title"
8206 <p>[[foo]]: /url &quot;title&quot;</p>
8207 ````````````````````````````````
8210 The link labels are case-insensitive:
8212 ```````````````````````````````` example
8217 <p><img src="/url" alt="Foo" title="title" /></p>
8218 ````````````````````````````````
8221 If you just want bracketed text, you can backslash-escape the
8222 opening `!` and `[`:
8224 ```````````````````````````````` example
8230 ````````````````````````````````
8233 If you want a link after a literal `!`, backslash-escape the `!`:
8236 ```````````````````````````````` example
8241 <p>!<a href="/url" title="title">foo</a></p>
8242 ````````````````````````````````
8247 [Autolink](@)s are absolute URIs and email addresses inside
8248 `<` and `>`. They are parsed as links, with the URL or email address as the link label.
8251 A [URI autolink](@) consists of `<`, followed by an
8252 [absolute URI] not containing `<`, followed by `>`. It is parsed as
8253 a link to the URI, with the URI as the link's label.
8255 An [absolute URI](@),
8256 for these purposes, consists of a [scheme] followed by a colon (`:`)
8257 followed by zero or more characters other than ASCII
8258 [whitespace] and control characters, `<`, and `>`. If
8259 the URI includes these characters, they must be percent-encoded
8260 (e.g. `%20` for a space).
8262 For purposes of this spec, a [scheme](@) is any sequence
8263 of 2--32 characters beginning with an ASCII letter and followed
8264 by any combination of ASCII letters, digits, or the symbols plus
8265 ("+"), period ("."), or hyphen ("-").
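The definitions of [scheme] and [absolute URI] above translate fairly directly into a regular expression. The sketch below is illustrative only; the names `URI_AUTOLINK` and `uri_autolink` are hypothetical, and the function expects the whole candidate string, including the angle brackets.

```python
import re

# Scheme: an ASCII letter plus 1-31 more characters (2-32 in total),
# then ":", then any characters except ASCII whitespace, controls, "<", ">".
URI_AUTOLINK = re.compile(
    r"<([A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*)>"
)


def uri_autolink(text: str):
    """Return the URI if text is a URI autolink, otherwise None."""
    match = URI_AUTOLINK.fullmatch(text)
    return match.group(1) if match else None
```

Because the check is purely syntactic, strings with unregistered schemes are accepted as long as they fit the pattern.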
8267 Here are some valid autolinks:
8269 ```````````````````````````````` example
8270 <http://foo.bar.baz>
8272 <p><a href="http://foo.bar.baz">http://foo.bar.baz</a></p>
8273 ````````````````````````````````
8276 ```````````````````````````````` example
8277 <http://foo.bar.baz/test?q=hello&id=22&boolean>
8279 <p><a href="http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean">http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean</a></p>
8280 ````````````````````````````````
8283 ```````````````````````````````` example
8284 <irc://foo.bar:2233/baz>
8286 <p><a href="irc://foo.bar:2233/baz">irc://foo.bar:2233/baz</a></p>
8287 ````````````````````````````````
8290 Uppercase is also fine:
8292 ```````````````````````````````` example
8293 <MAILTO:FOO@BAR.BAZ>
8295 <p><a href="MAILTO:FOO@BAR.BAZ">MAILTO:FOO@BAR.BAZ</a></p>
8296 ````````````````````````````````
8299 Note that many strings that count as [absolute URIs] for
8300 purposes of this spec are not valid URIs, because their
8301 schemes are not registered or because of other problems with their syntax:
8304 ```````````````````````````````` example
8307 <p><a href="a+b+c:d">a+b+c:d</a></p>
8308 ````````````````````````````````
8311 ```````````````````````````````` example
8312 <made-up-scheme://foo,bar>
8314 <p><a href="made-up-scheme://foo,bar">made-up-scheme://foo,bar</a></p>
8315 ````````````````````````````````
8318 ```````````````````````````````` example
8321 <p><a href="http://../">http://../</a></p>
8322 ````````````````````````````````
8325 ```````````````````````````````` example
8326 <localhost:5001/foo>
8328 <p><a href="localhost:5001/foo">localhost:5001/foo</a></p>
8329 ````````````````````````````````
8332 Spaces are not allowed in autolinks:
8334 ```````````````````````````````` example
8335 <http://foo.bar/baz bim>
8337 <p>&lt;http://foo.bar/baz bim&gt;</p>
8338 ````````````````````````````````
8341 Backslash-escapes do not work inside autolinks:
8343 ```````````````````````````````` example
8344 <http://example.com/\[\>
8346 <p><a href="http://example.com/%5C%5B%5C">http://example.com/\[\</a></p>
8347 ````````````````````````````````
8350 An [email autolink](@)
8351 consists of `<`, followed by an [email address],
8352 followed by `>`. The link's label is the email address,
8353 and the URL is `mailto:` followed by the email address.
8355 An [email address](@),
8356 for these purposes, is anything that matches
8357 the [non-normative regex from the HTML5
8358 spec](https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)):
8360 /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
8361 (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
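As a rough illustration, the regex above can be used in Python to recognize an email autolink and build its `mailto:` URL. The function name `email_autolink` and its return shape are assumptions made for this sketch.

```python
import re

# The same pattern as above, split across lines; fullmatch() supplies the anchors.
EMAIL_ADDRESS = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def email_autolink(text: str):
    """Return (url, label) if text is an email autolink, otherwise None."""
    if len(text) < 2 or text[0] != "<" or text[-1] != ">":
        return None
    address = text[1:-1]
    if EMAIL_ADDRESS.fullmatch(address) is None:
        return None
    return ("mailto:" + address, address)
```

A backslash is not in the allowed character set, which is why backslash-escapes cannot appear inside an email autolink.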
8363 Examples of email autolinks:
8365 ```````````````````````````````` example
8366 <foo@bar.example.com>
8368 <p><a href="mailto:foo@bar.example.com">foo@bar.example.com</a></p>
8369 ````````````````````````````````
8372 ```````````````````````````````` example
8373 <foo+special@Bar.baz-bar0.com>
8375 <p><a href="mailto:foo+special@Bar.baz-bar0.com">foo+special@Bar.baz-bar0.com</a></p>
8376 ````````````````````````````````
8379 Backslash-escapes do not work inside email autolinks:
8381 ```````````````````````````````` example
8382 <foo\+@bar.example.com>
8384 <p>&lt;foo+@bar.example.com&gt;</p>
8385 ````````````````````````````````
8388 These are not autolinks:
8390 ```````````````````````````````` example
8394 ````````````````````````````````
8397 ```````````````````````````````` example
8400 <p>&lt; http://foo.bar &gt;</p>
8401 ````````````````````````````````
8404 ```````````````````````````````` example
8407 <p>&lt;m:abc&gt;</p>
8408 ````````````````````````````````
8411 ```````````````````````````````` example
8414 <p>&lt;foo.bar.baz&gt;</p>
8415 ````````````````````````````````
8418 ```````````````````````````````` example
8421 <p>http://example.com</p>
8422 ````````````````````````````````
8425 ```````````````````````````````` example
8428 <p>foo@bar.example.com</p>
8429 ````````````````````````````````
8434 Text between `<` and `>` that looks like an HTML tag is parsed as a
8435 raw HTML tag and will be rendered in HTML without escaping.
8436 Tag and attribute names are not limited to current HTML tags,
8437 so custom tags (and even, say, DocBook tags) may be used.
8439 Here is the grammar for tags:
8441 A [tag name](@) consists of an ASCII letter
8442 followed by zero or more ASCII letters, digits, or hyphens (`-`).
8445 An [attribute](@) consists of [whitespace],
8446 an [attribute name], and an optional
8447 [attribute value specification].
8449 An [attribute name](@)
8450 consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII
8451 letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML
8452 specification restricted to ASCII. HTML5 is laxer.)
8454 An [attribute value specification](@)
8455 consists of optional [whitespace],
8456 a `=` character, optional [whitespace], and an [attribute value].
8459 An [attribute value](@)
8460 consists of an [unquoted attribute value],
8461 a [single-quoted attribute value], or a [double-quoted attribute value].
8463 An [unquoted attribute value](@)
8464 is a nonempty string of characters not
8465 including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``.
8467 A [single-quoted attribute value](@)
8468 consists of `'`, zero or more
8469 characters not including `'`, and a final `'`.
8471 A [double-quoted attribute value](@)
8472 consists of `"`, zero or more
8473 characters not including `"`, and a final `"`.
8475 An [open tag](@) consists of a `<` character, a [tag name],
8476 zero or more [attributes], optional [whitespace], an optional `/`
8477 character, and a `>` character.
8479 A [closing tag](@) consists of the string `</`, a
8480 [tag name], optional [whitespace], and the character `>`.
8482 An [HTML comment](@) consists of `<!--` + *text* + `-->`,
8483 where *text* does not start with `>` or `->`, does not end with `-`,
8484 and does not contain `--`. (See the
8485 [HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).)
8487 A [processing instruction](@)
8488 consists of the string `<?`, a string
8489 of characters not including the string `?>`, and the string `?>`.
8492 A [declaration](@) consists of the
8493 string `<!`, a name consisting of one or more uppercase ASCII letters,
8494 [whitespace], a string of characters not including the
8495 character `>`, and the character `>`.
8497 A [CDATA section](@) consists of
8498 the string `<![CDATA[`, a string of characters not including the string
8499 `]]>`, and the string `]]>`.
8501 An [HTML tag](@) consists of an [open tag], a [closing tag],
8502 an [HTML comment], a [processing instruction], a [declaration],
8503 or a [CDATA section].
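Part of the grammar above can be sketched with regular expressions built up from the same named pieces. This is an illustration under stated assumptions, not the reference implementation: the constant and function names are hypothetical, the unquoted-value pattern excludes all whitespace characters rather than only spaces, and only open tags, closing tags, and comments are covered.

```python
import re

WS = r"[ \t\n\r\f\v]"
TAG_NAME = r"[A-Za-z][A-Za-z0-9-]*"
ATTR_NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"
UNQUOTED = r"""[^ \t\n\r\f\v"'=<>`]+"""  # assumption: any whitespace is excluded
SINGLE_QUOTED = r"'[^']*'"
DOUBLE_QUOTED = r'"[^"]*"'
ATTR_VALUE = f"(?:{UNQUOTED}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})"
ATTR_VALUE_SPEC = f"(?:{WS}*={WS}*{ATTR_VALUE})"
ATTRIBUTE = f"(?:{WS}+{ATTR_NAME}{ATTR_VALUE_SPEC}?)"

OPEN_TAG = re.compile(f"<{TAG_NAME}{ATTRIBUTE}*{WS}*/?>")
CLOSING_TAG = re.compile(f"</{TAG_NAME}{WS}*>")


def is_open_or_closing_tag(text: str) -> bool:
    """True if the whole string is an open tag or a closing tag."""
    return bool(OPEN_TAG.fullmatch(text) or CLOSING_TAG.fullmatch(text))


def is_html_comment(text: str) -> bool:
    """True if the whole string is an HTML comment under the rule above."""
    if len(text) < 7 or not text.startswith("<!--") or not text.endswith("-->"):
        return False
    body = text[4:-3]
    if body.startswith(">") or body.startswith("->") or body.endswith("-"):
        return False
    return "--" not in body
```

Building the pattern from named fragments keeps each piece next to the definition it mirrors, which makes it easier to compare against the prose.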
8505 Here are some simple open tags:
8507 ```````````````````````````````` example
8510 <p><a><bab><c2c></p>
8511 ````````````````````````````````
8516 ```````````````````````````````` example
8520 ````````````````````````````````
8523 [Whitespace] is allowed:
8525 ```````````````````````````````` example
8531 ````````````````````````````````
8536 ```````````````````````````````` example
8537 <a foo="bar" bam = 'baz <em>"</em>'
8538 _boolean zoop:33=zoop:33 />
8540 <p><a foo="bar" bam = 'baz <em>"</em>'
8541 _boolean zoop:33=zoop:33 /></p>
8542 ````````````````````````````````
8545 Custom tag names can be used:
8547 ```````````````````````````````` example
8548 Foo <responsive-image src="foo.jpg" />
8550 <p>Foo <responsive-image src="foo.jpg" /></p>
8551 ````````````````````````````````
8554 Illegal tag names, not parsed as HTML:
8556 ```````````````````````````````` example
8559 <p>&lt;33&gt; &lt;__&gt;</p>
8560 ````````````````````````````````
8563 Illegal attribute names:
8565 ```````````````````````````````` example
8568 <p>&lt;a h*#ref=&quot;hi&quot;&gt;</p>
8569 ````````````````````````````````
8572 Illegal attribute values:
8574 ```````````````````````````````` example
8575 <a href="hi'> <a href=hi'>
8577 <p>&lt;a href=&quot;hi'&gt; &lt;a href=hi'&gt;</p>
8578 ````````````````````````````````
8581 Illegal [whitespace]:
8583 ```````````````````````````````` example
8588 foo&gt;&lt;bar/ &gt;</p>
8589 ````````````````````````````````
8592 Missing [whitespace]:
8594 ```````````````````````````````` example
8595 <a href='bar'title=title>
8597 <p>&lt;a href='bar'title=title&gt;</p>
8598 ````````````````````````````````
8603 ```````````````````````````````` example
8607 ````````````````````````````````
8610 Illegal attributes in closing tag:
8612 ```````````````````````````````` example
8615 <p>&lt;/a href=&quot;foo&quot;&gt;</p>
8616 ````````````````````````````````
8621 ```````````````````````````````` example
8623 comment - with hyphen -->
8625 <p>foo <!-- this is a
8626 comment - with hyphen --></p>
8627 ````````````````````````````````
8630 ```````````````````````````````` example
8631 foo <!-- not a comment -- two hyphens -->
8633 <p>foo &lt;!-- not a comment -- two hyphens --&gt;</p>
8634 ````````````````````````````````
8639 ```````````````````````````````` example
8644 <p>foo &lt;!--&gt; foo --&gt;</p>
8645 <p>foo &lt;!-- foo---&gt;</p>
8646 ````````````````````````````````
8649 Processing instructions:
8651 ```````````````````````````````` example
8652 foo <?php echo $a; ?>
8654 <p>foo <?php echo $a; ?></p>
8655 ````````````````````````````````
8660 ```````````````````````````````` example
8661 foo <!ELEMENT br EMPTY>
8663 <p>foo <!ELEMENT br EMPTY></p>
8664 ````````````````````````````````
8669 ```````````````````````````````` example
8672 <p>foo <![CDATA[>&<]]></p>
8673 ````````````````````````````````
8676 Entity and numeric character references are preserved in HTML attributes:
8679 ```````````````````````````````` example
8680 foo <a href="&ouml;">
8682 <p>foo <a href="&ouml;"></p>
8683 ````````````````````````````````
8686 Backslash escapes do not work in HTML attributes:
8688 ```````````````````````````````` example
8691 <p>foo <a href="\*"></p>
8692 ````````````````````````````````
8695 ```````````````````````````````` example
8698 <p>&lt;a href=&quot;&quot;&quot;&gt;</p>
8699 ````````````````````````````````
8704 A line break (not in a code span or HTML tag) that is preceded
8705 by two or more spaces and does not occur at the end of a block
8706 is parsed as a [hard line break](@) (rendered
8707 in HTML as a `<br />` tag):
8709 ```````````````````````````````` example
8715 ````````````````````````````````
8718 For a more visible alternative, a backslash before the
8719 [line ending] may be used instead of two spaces:
8721 ```````````````````````````````` example
8727 ````````````````````````````````
8730 More than two spaces can be used:
8732 ```````````````````````````````` example
8738 ````````````````````````````````
8741 Leading spaces at the beginning of the next line are ignored:
8743 ```````````````````````````````` example
8749 ````````````````````````````````
8752 ```````````````````````````````` example
8758 ````````````````````````````````
8761 Line breaks can occur inside emphasis, links, and other constructs
8762 that allow inline content:
8764 ```````````````````````````````` example
8770 ````````````````````````````````
8773 ```````````````````````````````` example
8779 ````````````````````````````````
8782 Line breaks do not occur inside code spans:
8784 ```````````````````````````````` example
8788 <p><code>code span</code></p>
8789 ````````````````````````````````
8792 ```````````````````````````````` example
8796 <p><code>code\ span</code></p>
8797 ````````````````````````````````
8802 ```````````````````````````````` example
8808 ````````````````````````````````
8811 ```````````````````````````````` example
8817 ````````````````````````````````
8820 Hard line breaks are for separating inline content within a block.
8821 Neither syntax for hard line breaks works at the end of a paragraph or
8822 other block element:
8824 ```````````````````````````````` example
8828 ````````````````````````````````
8831 ```````````````````````````````` example
8835 ````````````````````````````````
8838 ```````````````````````````````` example
8842 ````````````````````````````````
8845 ```````````````````````````````` example
8849 ````````````````````````````````
8854 A regular line break (not in a code span or HTML tag) that is not
8855 preceded by two or more spaces or a backslash is parsed as a
8856 softbreak. (A softbreak may be rendered in HTML either as a
8857 [line ending] or as a space. The result will be the same in
8858 browsers. In the examples here, a [line ending] will be used.)
8860 ```````````````````````````````` example
8866 ````````````````````````````````
8869 Spaces at the end of the line and beginning of the next line are removed:
8872 ```````````````````````````````` example
8878 ````````````````````````````````
8881 A conforming parser may render a soft line break in HTML either as a
8882 line break or as a space.
8884 A renderer may also provide an option to render soft line breaks
8885 as hard line breaks.
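The distinction between hard and soft line breaks can be sketched in a few lines of Python. This is an illustration only: the function name `classify_line_endings` is hypothetical, and the sketch assumes plain paragraph text, so code spans and raw HTML tags (where these rules do not apply) are not recognized.

```python
def classify_line_endings(paragraph: str) -> list[str]:
    """Label the line ending after each non-final line as 'hard' or 'soft'.

    Assumption: `paragraph` is the raw text of one paragraph with no
    code spans or HTML tags in it.
    """
    lines = paragraph.split("\n")
    kinds = []
    # The last line ends the block, so neither break syntax applies to it.
    for line in lines[:-1]:
        # An odd number of trailing backslashes leaves one unescaped
        # backslash directly before the line ending.
        trailing_backslashes = len(line) - len(line.rstrip("\\"))
        if line.endswith("  ") or trailing_backslashes % 2 == 1:
            kinds.append("hard")
        else:
            kinds.append("soft")
    return kinds


print(classify_line_endings("foo  \nbar\nbaz"))  # ['hard', 'soft']
```

A renderer built on this would emit `<br />` for each `hard` entry and a line ending or a space for each `soft` entry.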
8889 Any characters not given an interpretation by the above rules will
8890 be parsed as plain textual content.
8892 ```````````````````````````````` example
8895 <p>hello $.;'there</p>
8896 ````````````````````````````````
8899 ```````````````````````````````` example
8903 ````````````````````````````````
8906 Internal spaces are preserved verbatim:
8908 ```````````````````````````````` example
8911 <p>Multiple     spaces</p>
8912 ````````````````````````````````
8917 # Appendix: A parsing strategy
8919 In this appendix we describe some features of the parsing strategy
8920 used in the CommonMark reference implementations.
8924 Parsing has two phases:
8926 1. In the first phase, lines of input are consumed and the block
8927 structure of the document---its division into paragraphs, block quotes,
8928 list items, and so on---is constructed. Text is assigned to these
8929 blocks but not parsed. Link reference definitions are parsed and a
8930 map of links is constructed.
8932 2. In the second phase, the raw text contents of paragraphs and headings
8933 are parsed into sequences of Markdown inline elements (strings,
8934 code spans, links, emphasis, and so on), using the map of link
8935 references constructed in phase 1.
8937 At each point in processing, the document is represented as a tree of
8938 **blocks**. The root of the tree is a `document` block. The `document`
8939 may have any number of other blocks as **children**. These children
8940 may, in turn, have other blocks as children. The last child of a block
8941 is normally considered **open**, meaning that subsequent lines of input
8942 can alter its contents. (Blocks that are not open are **closed**.)
8943 Here, for example, is a possible document tree, with the open blocks marked by arrows:
8950 "Lorem ipsum dolor\nsit amet."
8951 -> list (type=bullet tight=true bullet_char=-)
8954 "Qui *quodsi iracundia*"
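One way to model such a tree is sketched below in Python. The class `Block`, its fields, and the helper `open_path` are hypothetical names chosen for this illustration, not the data structures of the reference implementations. The sketch assumes that adding a new child closes the previous last child, so that only the last child of a block can be open.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                      # e.g. "document", "block_quote", "paragraph"
    is_open: bool = True
    text: str = ""                 # raw text, parsed into inlines later
    children: list["Block"] = field(default_factory=list)

    def close(self) -> None:
        """Close this block and everything below it."""
        for child in self.children:
            child.close()
        self.is_open = False

    def add_child(self, child: "Block") -> "Block":
        """Append a child; the previous last child can no longer be open."""
        if self.children:
            self.children[-1].close()
        self.children.append(child)
        return child


def open_path(root: Block) -> list[Block]:
    """Return the chain of open blocks from the root to the deepest one."""
    path = [root]
    while path[-1].children and path[-1].children[-1].is_open:
        path.append(path[-1].children[-1])
    return path
```

The blocks returned by `open_path` correspond to the ones marked with arrows in the tree notation.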
8960 ## Phase 1: block structure
8962 Each line that is processed has an effect on this tree. The line is
8963 analyzed and, depending on its contents, the document may be altered
8964 in one or more of the following ways:
8966 1. One or more open blocks may be closed.
8967 2. One or more new blocks may be created as children of the last open block.
8969 3. Text may be added to the last (deepest) open block remaining on the tree.
8972 Once a line has been incorporated into the tree in this way,
8973 it can be discarded, so input can be read in a stream.
8975 For each line, we follow this procedure:
8977 1. First we iterate through the open blocks, starting with the
8978 root document, and descending through last children down to the last
8979 open block. Each block imposes a condition that the line must satisfy
8980 if the block is to remain open. For example, a block quote requires a
8981 `>` character. A paragraph requires a non-blank line.
8982 In this phase we may match all or just some of the open
8983 blocks. But we cannot close unmatched blocks yet, because we may have a
8984 [lazy continuation line].
8986 2. Next, after consuming the continuation markers for existing
8987 blocks, we look for new block starts (e.g. `>` for a block quote).
8988 If we encounter a new block start, we close any blocks unmatched
8989 in step 1 before creating the new block as a child of the last matched block.
8992 3. Finally, we look at the remainder of the line (after block
8993 markers like `>`, list markers, and indentation have been consumed).
8994 This is text that can be incorporated into the last open
8995 block (a paragraph, code block, heading, or raw HTML).
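The three steps above can be sketched in Python for a deliberately tiny subset of Markdown: block quotes as the only container and paragraphs as the only leaf. The names `Node`, `strip_quote_marker`, and `add_line` are hypothetical, and lists, headings, code blocks, and link reference definitions are all left out, so this illustrates the control flow rather than a conforming parser.

```python
class Node:
    def __init__(self, kind):
        self.kind = kind        # "document", "block_quote" or "paragraph"
        self.children = []
        self.lines = []         # raw text lines (paragraphs only)
        self.open = True


def strip_quote_marker(text):
    """Return the text after a `>` marker and one optional space, else None."""
    stripped = text.lstrip(" ")
    if len(text) - len(stripped) > 3 or not stripped.startswith(">"):
        return None
    stripped = stripped[1:]
    return stripped[1:] if stripped.startswith(" ") else stripped


def close_children(node):
    for child in node.children:
        child.open = False
        close_children(child)


def deepest_open(node):
    while node.children and node.children[-1].open:
        node = node.children[-1]
    return node


def add_line(doc, line):
    # Step 1: descend through open block quotes while the line matches them.
    container, rest = doc, line
    while (container.children and container.children[-1].open
           and container.children[-1].kind == "block_quote"):
        after = strip_quote_marker(rest)
        if after is None:
            break               # unmatched, but not closed yet (maybe lazy)
        container, rest = container.children[-1], after
    tip = deepest_open(doc)     # deepest open block before this line
    # Step 2: new block starts close the unmatched blocks first.
    while (after := strip_quote_marker(rest)) is not None:
        close_children(container)
        quote = Node("block_quote")
        container.children.append(quote)
        container, rest = quote, after
    # Step 3: incorporate the remaining text.
    if not rest.strip():
        close_children(container)        # a blank line ends open paragraphs
    elif tip.kind == "paragraph" and tip.open:
        tip.lines.append(rest.strip())   # continuation, possibly lazy
    else:
        close_children(container)
        paragraph = Node("paragraph")
        paragraph.lines.append(rest.strip())
        container.children.append(paragraph)
```

Feeding the lines `> Lorem ipsum dolor` and `sit amet.` to `add_line` leaves a single paragraph with both lines inside the block quote, because the second line fails the `>` check in step 1 but is accepted as a lazy continuation in step 3.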
8997 Setext headings are formed when we see a line of a paragraph
8998 that is a [setext heading underline].
9000 Reference link definitions are detected when a paragraph is closed;
9001 the accumulated text lines are parsed to see if they begin with
9002 one or more reference link definitions. Any remainder becomes a normal paragraph.
9005 We can see how this works by considering how the tree above is
9006 generated by four lines of Markdown:
9011 > - Qui *quodsi iracundia*
9015 At the outset, our document model is just
9021 The first line of our text,
9027 causes a `block_quote` block to be created as a child of our
9028 open `document` block, and a `paragraph` block as a child of
9029 the `block_quote`. Then the text is added to the last open
9030 block, the `paragraph`:
9045 is a "lazy continuation" of the open `paragraph`, so it gets added
9046 to the paragraph's text:
9052 "Lorem ipsum dolor\nsit amet."
9058 > - Qui *quodsi iracundia*
9061 causes the `paragraph` block to be closed, and a new `list` block
9062 opened as a child of the `block_quote`. A `list_item` is also
9063 added as a child of the `list`, and a `paragraph` as a child of
9064 the `list_item`. The text is then added to the new `paragraph`:
9070 "Lorem ipsum dolor\nsit amet."
9071 -> list (type=bullet tight=true bullet_char=-)
9074 "Qui *quodsi iracundia*"
9083 causes the `list_item` (and its child the `paragraph`) to be closed,
9084 and a new `list_item` opened up as child of the `list`. A `paragraph`
9085 is added as a child of the new `list_item`, to contain the text.
9086 We thus obtain the final tree:
9092 "Lorem ipsum dolor\nsit amet."
9093 -> list (type=bullet tight=true bullet_char=-)
9096 "Qui *quodsi iracundia*"
9102 ## Phase 2: inline structure
9104 Once all of the input has been parsed, all open blocks are closed.
9106 We then "walk the tree," visiting every node, and parse raw
9107 string contents of paragraphs and headings as inlines. At this
9108 point we have seen all the link reference definitions, so we can
9109 resolve reference links as we go.
9115 str "Lorem ipsum dolor"
9118 list (type=bullet tight=true bullet_char=-)
9123 str "quodsi iracundia"
9129 Notice how the [line ending] in the first paragraph has
9130 been parsed as a `softbreak`, and the asterisks in the first list item
9131 have become an `emph`.
9133 ### An algorithm for parsing nested emphasis and links
9135 By far the trickiest part of inline parsing is handling emphasis,
9136 strong emphasis, links, and images. This is done using the following algorithm.
9139 When we're parsing inlines and we hit either
9141 - a run of `*` or `_` characters, or
- a `[` or `![`
9144 we insert a text node with these symbols as its literal content, and we
9145 add a pointer to this text node to the [delimiter stack](@).
9147 The [delimiter stack] is a doubly linked list. Each
9148 element contains a pointer to a text node, plus information about
9150 - the type of delimiter (`[`, `![`, `*`, `_`)
9151 - the number of delimiters,
9152 - whether the delimiter is "active" (all are active to start), and
9153 - whether the delimiter is a potential opener, a potential closer,
9154 or both (which depends on what sort of characters precede
9155 and follow the delimiters).
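A delimiter stack element with the fields listed above might look like the following Python sketch. The class and attribute names (`TextNode`, `Delimiter`, `DelimiterStack`, `can_open`, `can_close`) are assumptions made for illustration and are not taken from the reference implementations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextNode:
    literal: str


@dataclass(eq=False)  # compare stack entries by identity, not by field values
class Delimiter:
    node: TextNode                  # text node holding the delimiter run
    kind: str                       # one of "[", "![", "*", "_"
    count: int                      # number of delimiter characters
    active: bool = True             # all delimiters start out active
    can_open: bool = False          # potential opener
    can_close: bool = False         # potential closer
    prev: Optional["Delimiter"] = None
    next: Optional["Delimiter"] = None


class DelimiterStack:
    """Doubly linked list; `top` is the most recently pushed delimiter."""

    def __init__(self) -> None:
        self.top: Optional[Delimiter] = None

    def push(self, delim: Delimiter) -> None:
        delim.prev = self.top
        if self.top is not None:
            self.top.next = delim
        self.top = delim

    def remove(self, delim: Delimiter) -> None:
        """Unlink `delim`, which is assumed to be on this stack."""
        if delim.prev is not None:
            delim.prev.next = delim.next
        if delim.next is not None:
            delim.next.prev = delim.prev
        else:
            self.top = delim.prev   # `delim` was the top element
        delim.prev = delim.next = None
```

A doubly linked list suits this job because the procedures below both walk backwards from the top and remove elements from the middle of the stack.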
9157 When we hit a `]` character, we call the *look for link or image*
9158 procedure (see below).
9160 When we hit the end of the input, we call the *process emphasis*
9161 procedure (see below), with `stack_bottom` = NULL.
9163 #### *look for link or image*
9165 Starting at the top of the delimiter stack, we look backwards
9166 through the stack for an opening `[` or `![` delimiter.
9168 - If we don't find one, we return a literal text node `]`.
9170 - If we do find one, but it's not *active*, we remove the inactive
9171 delimiter from the stack, and return a literal text node `]`.
9173 - If we find one and it's active, then we parse ahead to see if
9174 we have an inline link/image, reference link/image, collapsed reference
9175 link/image, or shortcut reference link/image.
9177 + If we don't, then we remove the opening delimiter from the
9178 delimiter stack and return a literal text node `]`.
+ If we do, then:
9182 * We return a link or image node whose children are the inlines
9183 after the text node pointed to by the opening delimiter.
9185 * We run *process emphasis* on these inlines, with the `[` opener as `stack_bottom`.
9188 * We remove the opening delimiter.
9190 * If we have a link (and not an image), we also set all
9191 `[` delimiters before the opening delimiter to *inactive*. (This
9192 will prevent us from getting links within links.)
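The stack bookkeeping in *look for link or image* can be sketched as follows. This is a simplified Python illustration: the stack is a plain list of dictionaries rather than a linked list, the function name `close_bracket` is hypothetical, and the boolean `is_link_ahead` stands in for the real work of parsing ahead for a link or image. Running *process emphasis* on the enclosed inlines is modelled only by dropping the delimiters above the opener.

```python
def close_bracket(stack, is_link_ahead):
    """Handle a `]`; return "text", "link" or "image" and update `stack`.

    `stack` is a list of dicts with keys "kind" and "active", bottom first.
    `is_link_ahead` is a stand-in for parsing ahead after the `]`.
    """
    # Look backwards from the top for an opening `[` or `![`.
    for i in range(len(stack) - 1, -1, -1):
        if stack[i]["kind"] in ("[", "!["):
            break
    else:
        return "text"               # no opener: literal `]`
    opener = stack[i]
    if not opener["active"] or not is_link_ahead:
        del stack[i]                # drop the unusable opener
        return "text"
    # Emphasis inside the brackets would be processed here; afterwards the
    # opener and every delimiter above it are gone from the stack.
    del stack[i:]
    if opener["kind"] == "[":
        # A link deactivates earlier `[` openers: no links within links.
        for delim in stack:
            if delim["kind"] == "[":
                delim["active"] = False
        return "link"
    return "image"
```

Only `[` openers are deactivated, so an image can still enclose a link and a link can still enclose an image.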
9194 #### *process emphasis*
9196 Parameter `stack_bottom` sets a lower bound to how far we
9197 descend in the [delimiter stack]. If it is NULL, we can
9198 go all the way to the bottom. Otherwise, we stop before
9199 visiting `stack_bottom`.
9201 Let `current_position` point to the element on the [delimiter stack]
9202 just above `stack_bottom` (or the first element if `stack_bottom` is NULL).
9205 We keep track of the `openers_bottom` for each delimiter
9206 type (`*`, `_`). Initialize this to `stack_bottom`.
9208 Then we repeat the following until we run out of potential closers:
9211 - Move `current_position` forward in the delimiter stack (if needed)
9212 until we find the first potential closer with delimiter `*` or `_`.
9213 (This will be the potential closer closest
9214 to the beginning of the input -- the first one in parse order.)
9216 - Now, look back in the stack (staying above `stack_bottom` and
9217 the `openers_bottom` for this delimiter type) for the
9218 first matching potential opener ("matching" means same delimiter).
- If one is found:
9222 + Figure out whether we have emphasis or strong emphasis:
9223 if both closer and opener spans have length >= 2, we have
9224 strong, otherwise regular.
9226 + Insert an emph or strong emph node accordingly, after
9227 the text node corresponding to the opener.
9229 + Remove any delimiters between the opener and closer from
9230 the delimiter stack.
9232 + Remove 1 (for regular emph) or 2 (for strong emph) delimiters
9233 from the opening and closing text nodes. If they become empty
9234 as a result, remove them and remove the corresponding element
9235 of the delimiter stack. If the closing node is removed, reset
9236 `current_position` to the next element in the stack.
- If none is found:
9240 + Set `openers_bottom` to the element before `current_position`.
9241 (We know that there are no openers for this kind of closer up to and
9242 including this point, so this puts a lower bound on future searches.)
9244 + If the closer at `current_position` is not a potential opener,
9245 remove it from the delimiter stack (since we know it can't
9246 be a closer either).
9248 + Advance `current_position` to the next element in the stack.
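The length rule applied when an opener is matched can be isolated in a small Python helper. The name `match_pair` is hypothetical, and the sketch covers only the choice between emphasis and strong emphasis and the delimiter counts left over, not the stack traversal around it.

```python
def match_pair(opener_count: int, closer_count: int) -> tuple[str, int, int]:
    """Decide the node type for a matched opener/closer pair.

    Returns the node kind and the number of delimiter characters that
    remain in the opening and closing text nodes afterwards.
    """
    strong = opener_count >= 2 and closer_count >= 2
    used = 2 if strong else 1       # characters consumed on each side
    kind = "strong" if strong else "emph"
    return kind, opener_count - used, closer_count - used


print(match_pair(3, 2))  # ('strong', 1, 0)
```

A remaining count of zero means that text node is removed along with its stack element. A non-zero count leaves the delimiter available for a later match, which is how a run such as `***` can open both a strong and a regular emphasis.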
9250 After we're done, we remove all delimiters above `stack_bottom` from the