At the moment, the C runtime engine for the parser is not really suited to
the inexperienced C programmer, mainly because documentation on its use is
lacking; this will be corrected shortly. The C runtime code itself is,
however, well documented with doxygen-style comments, and a reasonably
experienced C programmer should be able to piece it together. You can
visit the documentation at: http://www.antlr.org/api/C/index.html
The general makeup is that everything is implemented as a pseudo class/object
initialized with pointers to its 'member' functions and data. All objects are
(usually) created by factories, which automatically manage memory allocation
and release and generally make life easier. If you remember this rule,
everything should fall into place.
18 Jim Idle - Portland Oregon, Jan 2008
21 ===============================================================================
23 Terence Parr, parrt at cs usfca edu
24 ANTLR project lead and supreme dictator for life
25 University of San Francisco
29 Welcome to ANTLR v3! I've been working on this for nearly 4 years and it's
30 almost ready! I plan no feature additions between this beta and first
31 3.0 release. I have lots of features to add later, but this will be
32 the first set. Ultimately, I need to rewrite ANTLR v3 in itself (it's
33 written in 2.7.7 at the moment and also needs StringTemplate 3.0 or
36 You should use v3 in conjunction with ANTLRWorks:
38 http://www.antlr.org/works/index.html
40 WARNING: We have bits of documentation started, but nothing super-complete
41 yet. The book will be printed May 2007:
43 http://www.pragmaticprogrammer.com/titles/tpantlr/index.html
45 but we should have a beta PDF available on that page in Feb 2007.
47 You also have the examples plus the source to guide you.
51 http://www.antlr.org/wiki/display/ANTLR3/ANTLR+v3+FAQ
55 http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
57 Please help add/update FAQ entries.
59 I have made very little effort at this point to deal well with
60 erroneous input (e.g., bad syntax might make ANTLR crash). I will clean
61 this up after I've rewritten v3 in v3.
63 Per the license in LICENSE.txt, this software is not guaranteed to
64 work and might even destroy all life on this planet:
66 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
67 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
68 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
69 DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
70 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
71 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
72 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
73 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
74 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
75 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
76 POSSIBILITY OF SUCH DAMAGE.
80 ANTLR v3 sample grammars:
82 http://www.antlr.org/download/examples-v3.tar.gz
84 contains the following examples: LL-star, cminus, dynamic-scope,
85 fuzzy, hoistedPredicates, island-grammar, java, python, scopes,
86 simplecTreeParser, treeparser, tweak, xmlLexer.
88 Also check out Mantra Programming Language for a prototype (work in
91 http://www.linguamantra.org/
93 ----------------------------------------------------------------------
97 ANTLR stands for (AN)other (T)ool for (L)anguage (R)ecognition and was
98 originally known as PCCTS. ANTLR is a language tool that provides a
99 framework for constructing recognizers, compilers, and translators
100 from grammatical descriptions containing actions. Target language list:
102 http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets
104 ----------------------------------------------------------------------
106 How is ANTLR v3 different than ANTLR v2?
109 http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
ANTLR v3 has a far superior parsing algorithm called LL(*) that
handles many more grammars than v2 does. In practice, it means you
can throw almost any grammar at ANTLR that is non-left-recursive and
unambiguous (an ambiguous grammar lets the same input be matched by
multiple rules); the cost is perhaps a tiny bit of backtracking, but
with a DFA not a full parser.
116 You can manually set the max lookahead k as an option for any decision
117 though. The LL(*) algorithm ramps up to use more lookahead when it
118 needs to and is much more efficient than normal LL backtracking. There
119 is support for syntactic predicate (full LL backtracking) when LL(*)
122 Lexers are much easier due to the LL(*) algorithm as well. Previously
123 these two lexer rules would cause trouble because ANTLR couldn't
124 distinguish between them with finite lookahead to see the decimal
128 FLOAT : INT '.' INT ;
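To see why unbounded lookahead matters for these rules, here is a small
hand-written Java sketch. It is not what ANTLR generates; the class
IntOrFloat and its classify method are hypothetical, and it assumes INT
means one or more digits. The point is only that the scanner has to look
past an arbitrary number of digits before it can see the decimal point.

```java
class IntOrFloat {
    /** Classify the token starting at position p as "INT", "FLOAT" or "NONE". */
    public static String classify(String input, int p) {
        int i = p;
        // Scan ahead over as many digits as there are (unbounded lookahead).
        while (i < input.length() && Character.isDigit(input.charAt(i))) {
            i++;
        }
        if (i == p) {
            return "NONE"; // no leading digit, so neither rule applies
        }
        // FLOAT : INT '.' INT needs a '.' followed by at least one digit.
        boolean dotThenDigit = i + 1 < input.length()
                && input.charAt(i) == '.'
                && Character.isDigit(input.charAt(i + 1));
        return dotThenDigit ? "FLOAT" : "INT";
    }

    public static void main(String[] args) {
        System.out.println(classify("12345", 0));
        System.out.println(classify("123.45", 0));
    }
}
```

With a fixed lookahead of k characters, any input with more than k leading
digits would hide the decimal point from the decision.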
130 The syntax is almost identical for features in common, but you should
131 note that labels are always '=' not ':'. So do id=ID not id:ID.
You can do combined lexer/parser grammars again (a la PCCTS), where both
lexer and parser rules are defined in the same file. See the examples.
135 Really nice. You can reference strings and characters in the grammar
136 and ANTLR will generate the lexer for you.
138 The attribute structure has been enhanced. Rules may have multiple
139 return values, for example. Further, there are dynamically scoped
140 attributes whereby a rule may define a value usable by any rule it
141 invokes directly or indirectly w/o having to pass a parameter all the
144 ANTLR v3 tree construction is far superior--it provides tree rewrite
145 rules where the right hand side is simply the tree grammar fragment
146 describing the tree you want to build:
149 : typename declarator (',' typename declarator )*
150 -> ^(ARG typename declarator)+
153 That builds tree sequences like:
155 ^(ARG int v1) ^(ARG int v2)
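As a rough picture of what the '+' on the rewrite does, this Java sketch
emits one ARG tree per (typename, declarator) pair. It only builds the
printed form of the trees; the class ArgRewriteSketch and its rewrite
method are made up for illustration and are not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArgRewriteSketch {
    /** Each pair is {typename, declarator}; returns the sequence of ARG trees as text. */
    public static String rewrite(List<String[]> pairs) {
        List<String> trees = new ArrayList<>();
        for (String[] pair : pairs) {
            // One ^(ARG typename declarator) subtree per matched pair.
            trees.add("^(ARG " + pair[0] + " " + pair[1] + ")");
        }
        return String.join(" ", trees);
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"int", "v1"},
                new String[] {"int", "v2"});
        System.out.println(rewrite(pairs));
    }
}
```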
157 ANTLR v3 also incorporates StringTemplate:
159 http://www.stringtemplate.org
161 just like AST support. It is useful for generating output. For
162 example this rule creates a template called 'import' for each import
163 definition found in the input stream:
171 : 'import' identifierStar SEMI
172 -> import(name={$identifierStar.st},
173 begin={$identifierStar.start},
174 end={$identifierStar.stop})
177 The attributes are set via assignments in the argument list. The
178 arguments are actions with arbitrary expressions in the target
179 language. The .st label property is the result template from a rule
180 reference. There is a nice shorthand in actions too:
182 %foo(a={},b={},...) ctor
183 %({name-expr})(a={},...) indirect template ctor reference
184 %{string-expr} anonymous template from string expr
185 %{expr}.y = z; template attribute y of StringTemplate-typed expr to z
186 %x.y = z; set template attribute y of x (always set never get attr)
187 to z [languages like python without ';' must still use the
188 ';' which the code generator is free to remove during code gen]
189 Same as '(x).setAttribute("y", z);'
For ANTLR v3 I decided to make the most common tasks easy by default.
This means that some of the basic objects are heavier weight
than some speed demons would like, but they are free to pare it down
leaving most programmers the luxury of having it "just work." For
example, reading in some input, tweaking it, and writing it back out
with whitespace preserved is easy in v3.
198 The ANTLR source code is much prettier. You'll also note that the
199 run-time classes are conveniently encapsulated in the
200 org.antlr.runtime package.
202 ----------------------------------------------------------------------
204 How do I install this damn thing?
206 Just untar and you'll get:
208 antlr-3.0b6/README.txt (this file)
209 antlr-3.0b6/LICENSE.txt
210 antlr-3.0b6/src/org/antlr/...
211 antlr-3.0b6/lib/stringtemplate-3.0.jar (3.0b6 needs 3.0)
212 antlr-3.0b6/lib/antlr-2.7.7.jar
213 antlr-3.0b6/lib/antlr-3.0b6.jar
215 Then you need to add all the jars in lib to your CLASSPATH.
217 ----------------------------------------------------------------------
219 How do I use ANTLR v3?
221 [I am assuming you are only using the command-line (and not the
224 Running ANTLR with no parameters shows you:
226 ANTLR Parser Generator Early Access Version 3.0b6 (Jan 31, 2007) 1989-2007
227 usage: java org.antlr.Tool [args] file.g [file2.g file3.g ...]
228 -o outputDir specify output directory where all output is generated
229 -lib dir specify location of token files
230 -report print out a report about the grammar(s) processed
231 -print print out the grammar without actions
232 -debug generate a parser that emits debugging events
233 -profile generate a parser that computes profiling information
234 -nfa generate an NFA for each rule
235 -dfa generate a DFA for each decision point
236 -message-format name specify output style for messages
237 -X display extended argument list
239 For example, consider how to make the LL-star example from the examples
240 tarball you can get at http://www.antlr.org/download/examples-v3.tar.gz
242 $ cd examples/java/LL-star
243 $ java org.antlr.Tool simplec.g
251 int foo(int y, char d) {
253 for (i=0; i<3; i=i+1) {
259 you will see output as follows:
265 What if I want to test my parser without generating code? Easy. Just
266 run ANTLR in interpreter mode. It can't execute your actions, but it
267 can create a parse tree from your input to show you how it would be
268 matched. Use the org.antlr.tool.Interp main class. In the following,
269 I interpret simplec.g on t.c, which contains "int x;"
271 $ java org.antlr.tool.Interp simplec.g WS program t.c
276 ( type [@0,0:2='int',<14>,1:0] )
277 ( declarator [@2,4:4='x',<2>,1:4] )
284 where I have formatted the output to make it more readable. I have
285 told it to ignore all WS tokens.
287 ----------------------------------------------------------------------
289 How do I rebuild ANTLR v3?
291 Make sure the following two jars are in your CLASSPATH
293 antlr-3.0b6/lib/stringtemplate-3.0.jar
294 antlr-3.0b6/lib/antlr-2.7.7.jar
295 junit.jar [if you want to build the test directories]
297 then jump into antlr-3.0b6/src directory and then type:
299 $ javac -d . org/antlr/Tool.java org/antlr/*/*.java org/antlr/*/*/*.java
Takes 9 seconds on my 1GHz laptop or 4 seconds with jikes. Later I'll
have a real build mechanism, though I must admit the one-liner appeals
to me. I use IntelliJ, so I never actually type anything to build.
305 There is also an ANT build.xml file, but I know nothing of ANT; contributed
306 by others (I'm opposed to any tool with an XML interface for Humans).
308 -----------------------------------------------------------------------
311 1. Auto-generated lexers do not inherit parent parser's @namespace
312 {...} value. Use @lexer::namespace{...}.
314 -----------------------------------------------------------------------
320 * Jonathan DeKlotz updated C# templates to be 3.0b6 current
324 * Manually-specified (...)=> force backtracking eval of that predicate.
325 backtracking=true mode does not however. Added unit test.
329 * Fixed bug in lexer where ~T didn't compute the set from rule T.
* Added -Xnoinlinedfa to make all DFA use tables; no inline prediction with IFs
333 * Fixed http://www.antlr.org:8888/browse/ANTLR-80.
334 Sem pred states didn't define lookahead vars.
336 * Fixed http://www.antlr.org:8888/browse/ANTLR-91.
337 When forcing some acyclic DFA to be state tables, they broke.
338 Forcing all DFA to be state tables should give same results.
342 * setTokenSource in CommonTokenStream didn't clear tokens list.
343 setCharStream calls reset in Lexer.
345 * Altered -depend. No longer printing grammar files for multiple input
346 files with -depend. Doesn't show T__.g temp file anymore. Added
347 TLexer.tokens. Added .h files if defined.
351 * Added -depend command-line option that, instead of processing files,
352 it shows you what files the input grammar(s) depend on and what files
353 they generate. For combined grammar T.g:
355 $ java org.antlr.Tool -depend T.g
Now, assuming U.g is a tree grammar that refs T's tokens:
365 $ java org.antlr.Tool -depend T.g U.g
374 Handles spaces by escaping them. Pays attention to -o, -fo and -lib.
375 Dir 'x y' is a valid dir in current dir.
377 $ java org.antlr.Tool -depend -lib /usr/local/lib -o 'x y' T.g U.g
378 x\ y/TParser.java : T.g
381 U.g: /usr/local/lib/T.tokens
385 You have API access via org.antlr.tool.BuildDependencyGenerator class:
386 getGeneratedFileList(), getDependenciesFileList(). You can also access
387 the output template: getDependencies(). The file
388 org/antlr/tool/templates/depend.stg contains the template. You can
389 modify as you want. File objects go in so you can play with path etc...
393 * no more .gl files generated. All .g all the time.
395 * changed @finally to be @after and added a finally clause to the
396 exception stuff. I also removed the superfluous "exception"
397 keyword. Here's what the new syntax looks like:
400 @after { System.out.println("ick"); }
403 catch[RecognitionException e] { System.out.println("foo"); }
404 catch[IOException e] { System.out.println("io"); }
405 finally { System.out.println("foobar"); }
407 @after executes after bookkeeping to set $rule.stop, $rule.tree but
408 before scopes pop and any memoization happens. Dynamic scopes and
409 memoization are still in generated finally block because they must
410 exec even if error in rule. The @after action and tree setting
411 stuff can technically be skipped upon syntax error in rule. [Later
412 we might add something to finally to stick an ERROR token in the
413 tree and set the return value.] Sequence goes: set $stop, $tree (if
414 any), @after (if any), pop scopes (if any), memoize (if needed),
grammar finally clause. Last 3 are in generated code's finally block.
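To make that ordering concrete, here is a minimal Java sketch of how a
generated rule method might be laid out. It is not actual ANTLR output;
the class RuleEpilogueSketch, the exception type, and the log strings are
stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RuleEpilogueSketch {
    /** Returns the order in which the rule's bookkeeping steps ran. */
    public static List<String> runRule(boolean syntaxError) {
        List<String> log = new ArrayList<>();
        try {
            log.add("match rule body");
            if (syntaxError) {
                // Stand-in for a RecognitionException thrown while matching.
                throw new IllegalStateException("syntax error");
            }
            log.add("set $stop, $tree"); // skipped on error
            log.add("@after");           // skipped on error
        } catch (IllegalStateException e) {
            log.add("catch: report and recover");
        } finally {
            // These three always run, even after an error in the rule.
            log.add("pop scopes");
            log.add("memoize");
            log.add("grammar finally clause");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runRule(false));
        System.out.println(runRule(true));
    }
}
```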
418 3.0b6 - January 31, 2007
422 * Fixed bug in IntervalSet.and: it returned the same empty set all the time
423 rather than new empty set. Code altered the same empty set.
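The same class of bug is easy to reproduce with plain java.util sets. The
sketch below is not the IntervalSet code; EmptySetSketch, andBuggy and
andFixed are hypothetical names. It shows how handing out one shared
"empty" result lets a caller's later mutation leak into every other
result.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class EmptySetSketch {
    // One mutable instance reused for every empty intersection: the bug.
    private static final Set<Integer> SHARED_EMPTY = new HashSet<>();

    public static Set<Integer> setOf(Integer... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    public static Set<Integer> andBuggy(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new HashSet<>(a);
        r.retainAll(b);
        return r.isEmpty() ? SHARED_EMPTY : r;
    }

    public static Set<Integer> andFixed(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new HashSet<>(a);
        r.retainAll(b);
        return r; // always a fresh set, even when empty
    }

    public static void main(String[] args) {
        Set<Integer> first = andBuggy(setOf(1), setOf(2));
        first.add(42); // caller alters what it believes is its own set
        System.out.println(andBuggy(setOf(3), setOf(4)).contains(42));
        System.out.println(andFixed(setOf(3), setOf(4)).contains(42));
    }
}
```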
425 * Made analysis terminate faster upon a decision that takes too long;
426 it seemed to keep doing work for a while. Refactored some names
427 and updated comments. Also made it terminate when it realizes it's
428 non-LL(*) due to recursion. just added terminate conditions to loop
431 * Sometimes fatal non-LL(*) messages didn't appear; instead you got
432 "antlr couldn't analyze", which is actually untrue. I had the
433 order of some prints wrong in the DecisionProbe.
435 * The code generator incorrectly detected when it could use a fixed,
436 acyclic inline DFA (i.e., using an IF). Upon non-LL(*) decisions
437 with predicates, analysis made cyclic DFA. But this stops
438 the computation detecting whether they are cyclic. I just added
439 a protection in front of the acyclic DFA generator to avoid if
440 non-LL(*). Updated comments.
444 * Made tree node streams use adaptor to create navigation nodes.
445 Thanks to Emond Papegaaij.
449 * Added lexer rule properties: start, stop
453 * analysis failsafe is back on; if a decision takes too long, it bails out
458 * += labels for rules only work for output option; previously elements
459 of list were the return value structs, but are now either the tree or
460 StringTemplate return value. You can label different rules now
465 * Allow \" to work correctly in "..." template.
469 * errors that are now warnings: missing AST label type in trees.
470 Also "no start rule detected" is warning.
472 * tree grammars also can do rewrite=true for output=template.
473 Only works for alts with single node or tree as alt elements.
474 If you are going to use $text in a tree grammar or do rewrite=true
475 for templates, you must use in your main:
477 nodes.setTokenStream(tokens);
479 * You get a warning for tree grammars that do rewrite=true and
480 output=template and have -> for alts that are not simple nodes
481 or simple trees. new unit tests in TestRewriteTemplates at end.
485 * Error message appears when you use -> in tree grammar with
486 output=template and rewrite=true for alt that is not simple
489 * no more $stop attribute for tree parsers; meaningless/useless.
490 Removed from TreeRuleReturnScope also.
492 * rule text attribute in tree parser must pull from token buffer.
493 Makes no sense otherwise. added getTokenStream to TreeNodeStream
494 so rule $text attr works. CommonTreeNodeStream etc... now let
495 you set the token stream so you can access later from tree parser.
496 $text is not well-defined for rules like
500 because stat is not a single node nor rooted with a single node.
501 $slist.text will get only first stat. I need to add a warning about
504 * Fixed http://www.antlr.org:8888/browse/ANTLR-76 for Java.
505 Enhanced TokenRewriteStream so it accepts any object; converts
506 to string at last second. Allows you to rewrite with StringTemplate
509 * added rewrite option that makes -> template rewrites do replace ops for
510 TokenRewriteStream input stream. In output=template and rewrite=true mode
511 same as before 'cept that the parser does
513 ((TokenRewriteStream)input).replace(
514 ((Token)retval.start).getTokenIndex(),
515 input.LT(-1).getTokenIndex(),
518 after each rewrite so that the input stream is altered. Later refs to
519 $text will have rewrites. Here's a sample test program for grammar Rew.
521 FileReader groupFileR = new FileReader("Rew.stg");
522 StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
523 ANTLRInputStream input = new ANTLRInputStream(System.in);
524 RewLexer lexer = new RewLexer(input);
525 TokenRewriteStream tokens = new TokenRewriteStream(lexer);
526 RewParser parser = new RewParser(tokens);
527 parser.setTemplateLib(templates);
529 System.out.println(tokens.toString());
534 * BaseTree.dupTree didn't dup recursively.
538 * Cleaned up some comments and removed field treeNode
539 from MismatchedTreeNodeException class. It is "node" in
540 RecognitionException.
542 * Changed type from Object to BitSet for expecting fields in
543 MismatchedSetException and MismatchedNotSetException
545 * Cleaned up error printing in lexers and the messages that it creates.
547 * Added this to TreeAdaptor:
548 /** Return the token object from which this node was created.
549 * Currently used only for printing an error message.
550 * The error display routine in BaseRecognizer needs to
* display where in the input the error occurred. If your
* tree implementation does not store information that can
553 * lead you to the token, you can create a token filled with
554 * the appropriate information and pass that back. See
555 * BaseRecognizer.getErrorMessage().
557 public Token getToken(Object t);
561 * made BaseRecognizer.displayRecognitionError nonstatic so people can
562 override it. Not sure why it was static before.
564 * Removed state/decision message that comes out of no
565 viable alternative exceptions, as that was too much.
566 removed the decision number from the early exit exception
567 also. During development, you can simply override
568 displayRecognitionError from BaseRecognizer to add the stuff
571 * made output go to an output method you can override: emitErrorMessage()
573 * general cleanup of the error emitting code in BaseRecognizer. Lots
574 more stuff you can override: getErrorHeader, getTokenErrorDisplay,
575 emitErrorMessage, getErrorMessage.
579 * Altered Tree.Parser.matchAny() so that it skips entire trees if
580 node has children otherwise skips one node. Now this works to
581 skip entire body of function if single-rooted subtree:
582 ^(FUNC name=ID arg=ID .)
584 * Added "reverse index" from node to stream index. Override
585 fillReverseIndex() in CommonTreeNodeStream if you want to change.
586 Use getNodeIndex(node) to find stream index for a specific tree node.
587 See getNodeIndex(), reverseIndex(Set tokenTypes),
588 reverseIndex(int tokenType), fillReverseIndex(). The indexing
589 costs time and memory to fill, but pulling stuff out will be lots
590 faster as it can jump from a node ptr straight to a stream index.
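A reverse index of this kind can be sketched with an identity-keyed map.
The method names getNodeIndex and fillReverseIndex come from the note
above, but the class ReverseIndexSketch and everything inside it are
assumptions for illustration, not the CommonTreeNodeStream implementation.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class ReverseIndexSketch {
    private final List<Object> nodes = new ArrayList<>();
    private Map<Object, Integer> reverseIndex; // built lazily, on first lookup

    public void add(Object node) {
        nodes.add(node);
        reverseIndex = null; // buffer changed, so the index is stale
    }

    /** Stream index of this exact node object, or -1 if it is not in the stream. */
    public int getNodeIndex(Object node) {
        if (reverseIndex == null) {
            fillReverseIndex();
        }
        Integer i = reverseIndex.get(node);
        return i == null ? -1 : i;
    }

    /** Costs time and memory once; lookups afterwards are map hits. */
    protected void fillReverseIndex() {
        // Identity, not equals(): two distinct nodes with equal text stay distinct.
        reverseIndex = new IdentityHashMap<>();
        for (int i = 0; i < nodes.size(); i++) {
            reverseIndex.put(nodes.get(i), i);
        }
    }

    public static void main(String[] args) {
        ReverseIndexSketch stream = new ReverseIndexSketch();
        String a = new String("ID");
        String b = new String("ID");
        stream.add(a);
        stream.add(b);
        System.out.println(stream.getNodeIndex(b));
    }
}
```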
592 * Added TreeNodeStream.get(index) to make it easier for interpreters to
593 jump around in tree node stream.
595 * New CommonTreeNodeStream buffers all nodes in stream for fast jumping
596 around. It now has push/pop methods to invoke other locations in
597 the stream for building interpreters.
599 * Moved CommonTreeNodeStream to UnBufferedTreeNodeStream and removed
600 Iterator implementation. moved toNodesOnlyString() to TestTreeNodeStream
602 * [BREAKS ANY TREE IMPLEMENTATION]
603 made CommonTreeNodeStream work with any tree node type. TreeAdaptor
604 now implements isNil so must add; trivial, but does break back
609 * Added traceIn/Out methods to recognizers so that you can override them;
610 previously they were in-line print statements. The message has also
611 been slightly improved.
613 * Factored BuildParseTree into debug package; cleaned stuff up. Fixed
618 * [BREAKS ANY TREE IMPLEMENTATION]
619 org.antlr.runtime.tree.Tree; needed to add get/set for token start/stop
620 index so CommonTreeAdaptor can assume Tree interface not CommonTree
621 implementation. Otherwise, no way to create your own nodes that satisfy
622 Tree because CommonTreeAdaptor was doing
624 public int getTokenStartIndex(Object t) {
625 return ((CommonTree)t).startIndex;
630 /** What is the smallest token index (indexing from 0) for this node
633 int getTokenStartIndex();
635 void setTokenStartIndex(int index);
637 /** What is the largest token index (indexing from 0) for this node
640 int getTokenStopIndex();
642 void setTokenStopIndex(int index);
646 * Added org.antlr.runtime.tree.DOTTreeGenerator so you can generate DOT
647 diagrams easily from trees.
649 CharStream input = new ANTLRInputStream(System.in);
650 TLexer lex = new TLexer(input);
651 CommonTokenStream tokens = new CommonTokenStream(lex);
652 TParser parser = new TParser(tokens);
653 TParser.e_return r = parser.e();
654 Tree t = (Tree)r.tree;
655 System.out.println(t.toStringTree());
656 DOTTreeGenerator gen = new DOTTreeGenerator();
657 StringTemplate st = gen.toDOT(t);
658 System.out.println(st);
660 * Changed the way mark()/rewind() work in CommonTreeNode stream to mirror
661 more flexible solution in ANTLRStringStream. Forgot to set lastMarker
662 anyway. Now you can rewind to non-most-recent marker.
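Rewinding to a marker that is not the most recent one can be sketched as
below. This is a simplified stand-in, not the ANTLRStringStream or tree
node stream code; MarkRewindSketch and its fields are hypothetical, and
only the stream position is saved at each mark.

```java
import java.util.ArrayList;
import java.util.List;

class MarkRewindSketch {
    private final String data;
    private int p = 0;
    private final List<Integer> markers = new ArrayList<>(); // saved positions

    public MarkRewindSketch(String data) {
        this.data = data;
    }

    public void consume() {
        if (p < data.length()) {
            p++;
        }
    }

    public int index() {
        return p;
    }

    /** Save the current position; markers are numbered from 1. */
    public int mark() {
        markers.add(p);
        return markers.size();
    }

    /** Rewind to any live marker, releasing it and every marker made after it. */
    public void rewind(int marker) {
        p = markers.get(marker - 1);
        while (markers.size() >= marker) {
            markers.remove(markers.size() - 1);
        }
    }

    public static void main(String[] args) {
        MarkRewindSketch in = new MarkRewindSketch("abcdef");
        int m1 = in.mark();
        in.consume();
        in.consume();
        in.mark(); // a second, more recent marker
        in.consume();
        in.rewind(m1); // skip over the newer marker
        System.out.println(in.index());
    }
}
```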
666 * Temp lexer now end in .gl (T__.gl, for example)
668 * TreeParser suffix no longer generated for tree grammars
670 * Defined reset for lexer, parser, tree parser; rewinds the input stream also
674 * Made Grammar.abortNFAToDFAConversion() abort in middle of a DFA.
678 * fixed bug in OrderedHashSet.add(). It didn't track elements correctly.
682 * updated build.xml for future Ant compatibility, thanks to Matt Benson.
684 * various tests in TestRewriteTemplate and TestSyntacticPredicateEvaluation
685 were using the old 'channel' vs. new '$channel' notation.
686 TestInterpretedParsing didn't pick up an earlier change to CommonToken.
687 Reported by Matt Benson.
689 * fixed platform dependent test failures in TestTemplates, supplied by Matt
694 * optimized semantic predicate evaluation so that p||!p yields true.
698 * fixed bug that prevented var = $rule.some_retval from working in anything
699 but the first alternative of a rule or subrule.
701 * attribute names containing digits were not allowed, this is now fixed,
702 allowing attributes like 'name1' but not '1name1'.
706 * Removed LeftRecursionMessage and apparatus because it seems that I check
707 for left recursion upfront before analysis and everything gets specified as
708 recursion cycles at this point.
712 * TokenRewriteStream.replace was not passing programName to next method.
716 * updated DOT files for DFA generation to make smaller circles.
718 * made epsilon edges italics in the NFA diagrams.
720 3.0b5 - November 15, 2006
722 The biggest thing is that your grammar file names must match the grammar name
723 inside (your generated class names will also be different) and we use
724 $channel=HIDDEN now instead of channel=99 inside lexer actions.
725 Should be compatible other than that. Please look at complete list of
730 * Force token index to be -1 for CommonIndex in case not set.
734 * getUniqueID for TreeAdaptor now uses identityHashCode instead of hashCode.
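The difference matters when a node type overrides hashCode. In this
sketch (UniqueIdSketch and its Node class are invented for illustration)
two nodes with the same text are equal and share a hashCode, while
System.identityHashCode is computed per object. Identity hash codes are
typically, though not strictly guaranteed to be, distinct.

```java
class UniqueIdSketch {
    /** A node type whose equals/hashCode depend only on its text. */
    public static final class Node {
        final String text;

        public Node(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }
    }

    /** ID based on object identity, ignoring any hashCode override. */
    public static int getUniqueID(Object node) {
        return System.identityHashCode(node);
    }

    public static void main(String[] args) {
        Node a = new Node("ID");
        Node b = new Node("ID");
        System.out.println(a.hashCode() == b.hashCode()); // same by construction
        System.out.println(getUniqueID(a) + " vs " + getUniqueID(b));
    }
}
```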
738 * No grammar nondeterminism warning now when wildcard '.' is final alt.
748 : '//' (options {greedy=false;} : .)* '\r'? '\n'
752 : '//' (options {greedy=false;} : 'x'|.)* '\r'? '\n'
758 * Syntactic predicates did not get hoisting properly upon non-LL(*) decision. Other hoisting issues fixed. Cleaned up code.
* Removed failsafe that checks to see if I'm spending too much time on a single DFA; I don't think we need it anymore.
764 * $text, $line, etc... were not working in assignments. Fixed and added
767 * $label.text translated to label.getText in lexer even if label was on a char
771 * Added error if you don't specify what the AST type is; actions in tree
772 grammar won't work without it.
776 a : ID {String s = $ID.text;} ;
778 ANTLR Parser Generator Early Access Version 3.0b5 (??, 2006) 1989-2006
779 error: x.g:0:0: (152) tree grammar x has no ASTLabelType option
783 * $text, $line, etc... were not working properly within lexer rule.
* Finally actions now execute before dynamic scopes are popped in the
rule. Previously it was not possible to access the rule's scoped variables
793 * Altered ActionTranslator to emit errors on setting read-only attributes
794 such as $start, $stop, $text in a rule. Also forbid setting any attributes
795 in rules/tokens referenced by a label or name.
Setting dynamic scopes' attributes and your own parameter attributes
801 * Altered how ANTLR figures out what decision is associated with which
802 block of grammar. Makes ANTLRWorks correctly find DFA for a block.
806 * Fixed bug where EOT transitions led to no NFA configs in a DFA state,
807 yielding an error in DFA table generation.
809 * renamed action.g to ActionTranslator.g
810 the ActionTranslator class is now called ActionTranslatorLexer, as ANTLR
811 generates this classname now. Fixed rest of codebase accordingly.
813 * added rules recognizing setting of scopes' attributes to ActionTranslator.g
814 the Objective C target needed access to the right-hand side of the assignment
815 in order to generate correct code
817 * changed ANTLRCore.sti to reflect the new mandatory templates to support the above
818 namely: scopeSetAttributeRef, returnSetAttributeRef and the ruleSetPropertyRef_*
819 templates, with the exception of ruleSetPropertyRef_text. we cannot set this attribute
823 * Fixed 2 bugs in DFA conversion that caused exceptions.
824 altered functionality of getMinElement so it ignores elements<0.
828 * moved resetStateNumbersToBeContiguous() to after issuing of warnings;
829 an internal error in that routine should make more sense as issues
830 with decision will appear first.
832 * fixed cut/paste bug I introduced when fixed EOF in min/max
833 bug. Prevented C grammar from working briefly.
* Removed a failsafe that seems to be unnecessary that ensured DFA didn't
838 get too big. It was resulting in some failures in code generation that
839 led me on quite a strange debugging trip.
843 * Use channel=HIDDEN not channel=99 to put tokens on hidden channel.
847 * ANTLR now has a customizable message format for errors and warnings,
848 to make it easier to fulfill requirements by IDEs and such.
849 The format to be used can be specified via the '-message-format name'
850 command line switch. The default for name is 'antlr', also available
851 at the moment is 'gnu'. This is done via StringTemplate, for details
852 on the requirements look in org/antlr/tool/templates/messages/formats/
854 * line numbers for lexers in combined grammars are now reported correctly.
858 * ANTLRReaderStream improperly checked for end of input.
862 * For ANTLRStringStream, LA(-1) was off by one...gave you LA(-2).
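For reference, the intended meaning of negative lookahead can be sketched
like this: LA(1) is the next character, LA(-1) is the one just consumed,
and LA(-2) the one before that. The class LookaheadSketch is a simplified
stand-in written for illustration, not the ANTLRStringStream source.

```java
class LookaheadSketch {
    public static final int EOF = -1;

    private final char[] data;
    private int p = 0; // index of the next character to consume

    public LookaheadSketch(String s) {
        this.data = s.toCharArray();
    }

    public void consume() {
        if (p < data.length) {
            p++;
        }
    }

    public int LA(int i) {
        if (i == 0) {
            return 0; // LA(0) is undefined
        }
        if (i < 0) {
            i++; // there is no offset 0, so LA(-1) must land on data[p-1]
        }
        int idx = p + i - 1;
        if (idx < 0 || idx >= data.length) {
            return EOF;
        }
        return data[idx];
    }

    public static void main(String[] args) {
        LookaheadSketch in = new LookaheadSketch("abc");
        in.consume();
        in.consume();
        System.out.println((char) in.LA(-1)); // the character just consumed
    }
}
```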
864 3.0b4 - August 24, 2006
866 * error when no rules in grammar. doesn't crash now.
868 * Token is now an interface.
870 * remove dependence on non runtime classes in runtime package.
872 * filename and grammar name must be same Foo in Foo.g. Generates FooParser,
873 FooLexer, ... Combined grammar Foo generates Foo$Lexer.g which generates
874 FooLexer.java. tree grammars generate FooTreeParser.java
878 * added C# target to lib, codegen, templates
882 * added tree arg to navigation methods in treeadaptor
886 * fixed bug related to (a|)+ on end of lexer rules. crashed instead
889 * added warning that interpreter doesn't do synpreds yet
891 * allow different source of classloader:
892 ClassLoader cl = Thread.currentThread().getContextClassLoader();
894 cl = this.getClass().getClassLoader();
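One way to read the two lines above is "prefer the thread's context class
loader, otherwise use the loader of the current class". The sketch below
spells that out; the null check is my assumption about the intent, and
ClassLoaderSketch with its pick method is hypothetical, not ANTLR code.

```java
class ClassLoaderSketch {
    /** Use the context loader when present, else the loader of the given class. */
    public static ClassLoader pick(ClassLoader context, Class<?> fallback) {
        if (context != null) {
            return context;
        }
        return fallback.getClassLoader();
    }

    public static ClassLoader current() {
        return pick(Thread.currentThread().getContextClassLoader(),
                ClassLoaderSketch.class);
    }

    public static void main(String[] args) {
        System.out.println(current());
    }
}
```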
900 * compressed DFA edge tables significantly. All edge tables are
901 unique. The transition table can reuse arrays. Look like this now:
903 public static readonly DFA30_transition0 =
904 new short[] { 46, 46, -1, 46, 46, -1, -1, -1, -1, -1, -1, -1,...};
905 public static readonly DFA30_transition1 =
907 public static readonly short[][] DFA30_transition = {
914 * If you defined both a label like EQ and '=', sometimes the '=' was
915 used instead of the EQ label.
917 * made headerFile template have same arg list as outputFile for consistency
919 * outputFile, lexer, genericParser, parser, treeParser templates
920 reference cyclicDFAs attribute which was no longer used after I
921 started the new table-based DFA. I made cyclicDFADescriptors
922 argument to outputFile and headerFile (only). I think this is
923 correct as only OO languages will want the DFA in the recognizer.
924 At the top level, C and friends can use it. Changed name to use
925 cyclicDFAs again as it's a better name probably. Removed parameter
926 from the lexer, ... For example, my parser template says this now:
928 <cyclicDFAs:cyclicDFA()> <! dump tables for all DFA !>
930 * made all token ref token types go thru code gen's
931 getTokenTypeAsTargetLabel()
933 * no more computing DFA transition tables for acyclic DFA.
937 * fixed a place where I was adding syn predicates into rewrite stuff.
939 * turned off invalid token index warning in AW support; had a problem.
941 * bad location event generated with -debug for synpreds in autobacktrack mode.
945 * changed runtime.DFA so that it treats all chars and token types as
946 char (unsigned 16 bit int). -1 becomes '\uFFFF' then or 65535.
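The -1 to 65535 mapping is just what a Java narrowing cast to char does.
A tiny sketch (CharRangeSketch and toDfaChar are illustrative names only):

```java
class CharRangeSketch {
    /** Narrow a char or token type to an unsigned 16-bit value. */
    public static char toDfaChar(int tokenTypeOrChar) {
        return (char) tokenTypeOrChar;
    }

    public static void main(String[] args) {
        System.out.println((int) toDfaChar(-1));  // EOF lands at the top of the range
        System.out.println((int) toDfaChar('a')); // ordinary values are unchanged
    }
}
```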
948 * changed MAX_STATE_TRANSITIONS_FOR_TABLE to be 65534 by default
949 now. This means that all states can use a table to do transitions.
951 * was not making synpreds on (C)* type loops with backtrack=true
953 * was copying tree stuff and actions into synpreds with backtrack=true
955 * was making synpreds on even single alt rules / blocks with backtrack=true
957 3.0b3 - July 21, 2006
959 * ANTLR fails to analyze complex decisions much less frequently. It
960 turns out that the set of decisions for which ANTLR fails (times
out) is the same set (so far) of non-LL(*) decisions. Moreover, I'm
962 able to detect this situation quickly and report rather than timing
963 out. Errors look like:
965 java.g:468:23: [fatal] rule concreteDimensions has non-LL(*)
966 decision due to recursive rule invocations in alts 1,2. Resolve
967 by left-factoring or using syntactic predicates with fixed k
968 lookahead or use backtrack=true option.
970 This message only appears when k=*.
972 * Shortened no viable alt messages to not include decision
975 [compilationUnit, declaration]: line 8:8 decision=<<67:1: declaration
976 : ( ( fieldDeclaration )=> fieldDeclaration | ( methodDeclaration )=>
977 methodDeclaration | ( constructorDeclaration )=>
978 constructorDeclaration | ( classDeclaration )=> classDeclaration | (
979 interfaceDeclaration )=> interfaceDeclaration | ( blockDeclaration )=>
980 blockDeclaration | emptyDeclaration );>> state 3 (decision=14) no
981 viable alt; token=[@1,184:187='java',<122>,8:8]
983 too long and hard to read.
987 * Code gen bug: states with no emanating edges were ignored by ST.
988 Now an empty list is used.
990 * Added grammar parameter to recognizer templates so they can access
991 properties like getName(), ...
995 * Fixed the gated pred merged state bug. Added unit test.
997 * added new method to Target: getTokenTypeAsTargetLabel()
1001 * I was doing an AND instead of OR in the gated predicate stuff.
1002 Thanks to Stephen Kou!
1004 * Reduce op for combining predicates was insanely slow sometimes and
1005 didn't actually work well. Now it's fast and works.
1007 * There is a bug in merging of DFA stop states related to gated
1008 preds...turned it off for now.
1010 3.0b2 - July 5, 2006
1014 * token emission not properly protected in lexer filter mode.
1016 * EOT, EOT DFA state transition tables should be init'd to -1 (only
1017 was doing this for compressed tables). Fixed.
1019 * in trace mode, exit method not shown for memoized rules
1021 * added -Xmaxdfaedges to allow you to increase number of edges allowed
1022 for a single DFA state before it becomes "special" and can't fit in
1025 * Bug in tables. Short are signed so min/max tables for DFA are now
1030 * Added a method to reset the tool error state for current thread.
1031 See ErrorManager.java
1033 * [Got this working properly today] backtrack mode that lets you type
1034 in any old crap and ANTLR will backtrack if it can't figure out what
1035 you meant. No errors are reported by antlr during analysis. It
1036 implicitly adds a syn pred in front of every production, using them
1037 only if static grammar LL(*) analysis fails. Syn pred code is not
1038 generated if the pred is not used in a decision.
1040 This is essentially a rapid prototyping mode.
1042 * Added backtracking report to the -report option
1044 * Added NFA->DFA conversion early termination report to the -report option
1046 * Added grammar level k and backtrack options to -report
1048 * Added a dozen unit tests to test autobacktrack NFA construction.
1050 * If you are using filter mode, you must manually use option
1055 * Added k=* option so you can set k=2, for example, on whole grammar,
1056 but an individual decision can be LL(*).
1058 * memoize option for grammars, rules, blocks. Remove -nomemo cmd-line option
1060 * bug in DOT generator for DFA; fixed.
1062 * runtime.DFA reported errors even when backtracking
1066 * Added -X option list to help
1068 * Syn preds were being hoisted into other rules, causing lots of extra
1073 * unnecessary files removed during build.
1075 * Matt Benson updated build.xml
1077 * Detecting use of synpreds in analysis now instead of codegen. In
1078 this way, I can avoid analyzing decisions in synpreds for synpreds
1079 not used in a DFA for a real rule. This is used to optimize things
1080 for backtrack option.
1082 * Code gen must add _fragment or whatever to end of pred name in
1083 template synpredRule to avoid having ANTLR know anything about
1086 * Added -IdbgST option to emit ST delimiters at start/stop of all
1091 * Tweaked message when ANTLR cannot handle analysis.
1093 3.0b1 - June 27, 2006
1097 * syn preds no longer generate little static classes; they also don't
1098 generate a whole bunch of extra crap in the rules built to test syn
1099 preds. Removed GrammarFragmentPointer class from runtime.
1103 * added output option to -report output.
1105 * added profiling info:
1106 Number of rule invocations in "guessing" mode
1107 number of rule memoization cache hits
1108 number of rule memoization cache misses
1110 * made DFA DOT diagrams go left to right not top to bottom
1112 * I try to handle recursive overflow states now by resolving these states
1113 with semantic/syntactic predicates if they exist. The DFA is then
1114 deterministic rather than simply resolving by choosing first
1115 nondeterministic alt. I used to generate errors:
1117 ~/tmp $ java org.antlr.Tool -dfa t.g
1118 ANTLR Parser Generator Early Access Version 3.0b2 (July 5, 2006) 1989-2006
1119 t.g:2:5: Alternative 1: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b
1120 t.g:2:5: Alternative 2: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b
1122 Now, I use predicates if available and emit no warnings.
1124 * made sem preds share accept states. Previously, multiple preds in a
1125 decision forked new accepts each time for each nondet state.
1129 * Need parens around the prediction expressions in templates.
1131 * Referencing $ID.text in an action forced bad code gen in lexer rule ID.
1133 * Fixed a bug in how predicates are collected. The definition of
1134 "last predicated alternative" was incorrect in the analysis. Further,
1135 gated predicates incorrectly missed a case where an edge should become
1138 * Removed an unnecessary input.consume() reference in the runtime/DFA class.
1142 * -> ($rulelabel)? didn't generate proper code for ASTs.
1144 * bug in code gen (did not compile)
1148 Problem is repeated ref to ID from left side. Juergen pointed this out.
1150 * use of tokenVocab with missing file yielded exception
1152 * (A|B)=> foo yielded an exception as (A|B) is a set not a block. Fixed.
1154 * Didn't set ID1= and INT1= for this alt:
1155 | ^(ID INT+ {System.out.print(\"^(\"+$ID+\" \"+$INT+\")\");})
1157 * Fixed so repeated dangling state errors only occur once like:
1158 t.g:4:17: the decision cannot distinguish between alternative(s) 2,1 for at least one input sequence
1160 * tracking of rule elements was on (making list defs at start of
1161 method) with templates instead of just with ASTs. Turned off.
1163 * Doesn't crash when you give it a missing file now.
1165 * -report: add output info: how many LL(1) decisions.
1169 * ^(ROOT ID?) Didn't work; nor did any other nullable child list such as
1170 ^(ROOT ID* INT?). Now, I check to see if child list is nullable using
1171 Grammar.LOOK() and, if so, I generate an "IF lookahead is DOWN" gate
1172 around the child list so the whole thing is optional.
1174 * Fixed a bug in LOOK that made it not look through nullable rules.
1176 * Using AST suffixes or -> rewrite syntax now gives an error w/o a grammar
1177 output option. Used to crash ;)
1179 * References to EOF ended up with improper -1 refs instead of EOF in output.
1181 * didn't warn of ambig ref to $expr in rewrite; fixed.
1183 : '[' expr 'for' type ID 'in' expr ']'
1184 -> comprehension(expr={$expr.st},type={},list={},i={})
1189 * EOF works in the parser as a token name.
1191 * Rule b:(A B?)*; didn't display properly in AW due to the way ANTLR
1194 * "scope x;" in a rule for unknown x gives no error. Fixed. Added unit test.
1196 * Label type for refs to start/stop in tree parser and other parsers were
1197 not used. Lots of casting. Ick. Fixed.
1199 * couldn't refer to $tokenlabel in isolation; but need so we can test if
1200 something was matched. Fixed.
1202 * Lots of little bugs fixed in $x.y, %... translation due to new
1205 * Improperly tracking block nesting level; result was that you couldn't
1206 see $ID in action of rule "a : A+ | ID {Token t = $ID;} | C ;"
1208 * a : ID ID {$ID.text;} ; did not get a warning about ambiguous $ID ref.
1210 * No error was found on $COMMENT.text:
1213 : '/*' (options {greedy=false;} : . )* '*/'
1214 {System.out.println("found method "+$COMMENT.text);}
1217 $enclosinglexerrule scope does not exist. Use text or setText() here.
1221 * Single return values are initialized now to default or to your spec.
1223 * cleaned up input stream stuff. Added ANTLRReaderStream, ANTLRInputStream
1224 and refactored. You can specify encodings now on ANTLRFileStream (and
1225 ANTLRInputStream) now.
1227 * You can set text local var now in a lexer rule and token gets that text.
1228 start/stop indexes are still set for the token.
1230 * Changed lexer slightly. Calling a nonfragment rule from a
1231 nonfragment rule does not set the overall token.
1235 * Fixed bug where unnecessary escapes yield char==0 like '\{'.
1237 * Fixed analysis bug. This grammar didn't report a recursion warning:
1244 The DFAState.equals() method was messed up.
1246 * Added @synpredgate {...} action so you can tell ANTLR how to gate actions
1247 in/out during syntactic predicate evaluation.
1249 * Fuzzy parsing should be more efficient. It should backtrack over a rule
1250 and then rewind and do it again "with feeling" to exec actions. It was
1251 actually doing it 3x not 2x.
1255 * Gutted and rebuilt the action translator for $x.y, $x::y, ...
1256 Uses ANTLR v3 now for the first time inside v3 source. :)
1257 ActionTranslator.java
1259 * Fixed a bug where referencing a return value on a rule didn't work
1260 because later a ref to that rule's predefined properties didn't
1261 properly force a return value struct to be built. Added unit test.
1265 * New DFA mechanisms. Cyclic DFA are implemented as state tables,
1266 encoded via strings as java cannot handle large static arrays :(
1267 States with edges emanating that have predicates are specially
1268 treated. A method is generated to do these states. The DFA
1269 simulation routine uses the "special" array to figure out if the
1270 state is special. See March 25, 2006 entry for description:
1271 http://www.antlr.org/blog/antlr3/codegen.tml. analysis.DFA now has
1272 all the state tables generated for code gen. CyclicCodeGenerator.java
1273 disappeared as it's unneeded code. :)
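To illustrate the idea of carrying a state table inside a string, here is a minimal sketch of a run-length decoder. The encoding (pairs of count char, value char) is an assumption chosen for the example and may not match what the generated code really emits; PackedTableDemo and unpack are hypothetical names.

```java
final class PackedTableDemo {
    /**
     * Decode a string of (count, value) char pairs into a short[] table.
     * A value char of 0xFFFF narrows to -1, handy for "no transition".
     */
    static short[] unpack(String encoded) {
        if (encoded.length() % 2 != 0) {
            throw new IllegalArgumentException("expected (count, value) pairs");
        }
        // First pass: add up the counts to size the table.
        int size = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            size += encoded.charAt(i);
        }
        // Second pass: expand each pair into `count` copies of `value`.
        short[] table = new short[size];
        int next = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            char count = encoded.charAt(i);
            char value = encoded.charAt(i + 1);
            for (int j = 0; j < count; j++) {
                table[next++] = (short) value;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Three copies of -1, then one 2, then two 5s.
        short[] table = unpack("\3\uffff\1\2\2\5");
        System.out.println(java.util.Arrays.toString(table)); // prints [-1, -1, -1, 2, 5, 5]
    }
}
```

A string constant lives in the class file's constant pool, while a large array initializer is compiled into bytecode in the static initializer, which is where the size limit bites.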
1275 * Internal general clean up of the DFA.states vs uniqueStates thing.
1276 Fixed lookahead decisions no longer fill uniqueStates. Waste of
1277 time. Also noted that when adding sem pred edges, I didn't check
1278 for state reuse. Fixed.
1282 * When resolving ambig DFA states predicates, I did not add the new states
1283 to the list of unique DFA states. No observable effect on output except
1284 that DFA state numbers were not always contiguous for predicated decisions.
1285 I needed this fix for new DFA tables.
1287 3.0ea10 - June 2, 2006
1291 * Improved grammar stats and added syntactic pred tracking.
1295 * Due to a type mismatch, the DebugParser.recoverFromMismatchedToken()
1296 method was not called. Debug events for mismatched token error
1297 notification were not sent to ANTLRWorks probably
1299 * Added getBacktrackingLevel() for any recognizer; needed for profiler.
1301 * Only writes profiling data for antlr grammar analysis with -profile set
1303 * Major update and bug fix to (runtime) Profiler.
1307 * Added Lexer.skip() to force lexer to ignore current token and look for
1308 another; no token is created for current rule and is not passed on to
1309 parser (or other consumer of the lexer).
1311 * Parsers are much faster now. I removed use of java.util.Stack for pushing
1312 follow sets and use a hardcoded array stack instead. Dropped from
1313 5900ms to 3900ms for parse+lex time parsing entire java 1.4.2 source. Lex
1314 time alone was about 1500ms. Just looking at parse time, we get about 2x
1315 speed improvement. :)
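For readers wondering what a hardcoded array stack looks like, here is a minimal sketch. It is not the runtime's actual code; FollowStackDemo and its initial capacity of 100 are assumptions for illustration. java.util.Stack extends Vector, whose methods are synchronized, so a plain array presumably avoids that overhead on every rule invocation.

```java
final class FollowStackDemo {
    private Object[] items = new Object[100]; // initial capacity is arbitrary
    private int top = -1;                     // index of the top element, -1 when empty

    /** Push a follow set, doubling the array when it is full. */
    void push(Object followSet) {
        if (top + 1 >= items.length) {
            items = java.util.Arrays.copyOf(items, items.length * 2);
        }
        items[++top] = followSet;
    }

    /** Pop the most recently pushed follow set. */
    Object pop() {
        if (top < 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object value = items[top];
        items[top--] = null; // drop the reference so it can be collected
        return value;
    }

    int size() {
        return top + 1;
    }
}
```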
1319 * Fixed NFA construction so it generates NFA for (A*)* such that ANTLRWorks
1320 can display it properly.
1324 * added abort method to Grammar so AW can terminate the conversion if it's
1329 * added method to get left recursive rules from grammar without doing full
1332 * analysis, code gen not attempted if serious error (like
1333 left-recursion or missing rule definition) occurred while reading
1334 the grammar in and defining symbols.
1336 * added amazing optimization; reduces analysis time by 90% for java
1337 grammar; simple IF statement addition!
1339 3.0ea9 - May 20, 2006
1341 * added global k value for grammar to limit lookahead for all decisions unless
1342 overridden in a particular decision.
1344 * added failsafe so that any decision taking longer than 2 seconds to create
1345 the DFA will fall back on k=1. Use -ImaxtimeforDFA n (in ms) to set the time.
1347 * added an option (turned off for now) to use multiple threads to
1348 perform grammar analysis. Not much help on a 2-CPU computer as
1349 garbage collection seems to peg the 2nd CPU already. :( Gotta wait for
1352 * switched from #src to // $ANTLR src directive.
1354 * CommonTokenStream.getTokens() looked past end of buffer sometimes. fixed.
1356 * unicode literals didn't really work in DOT output and generated code. fixed.
1358 * fixed the unit test rig so it compiles nicely with Java 1.5
1360 * Added ant build.xml file (reads build.properties file)
1362 * predicates sometimes failed to compile/eval properly due to missing (...)
1363 in IF expressions. Forced (..)
1365 * (...)? with only one alt were not optimized. Was:
1369 int LA1_0 = input.LA(1);
1373 else if ( LA1_0==-1 ) {
1377 NoViableAltException nvae =
1378 new NoViableAltException("4:7: ( B )?", 1, 0, input);
1386 int LA1_0 = input.LA(1);
1391 Smaller, faster and more readable.
1393 * Allow manual init of return values now:
1394 functionHeader returns [int x=3*4, char (*f)()=null] : ... ;
1396 * Added optimization for DFAs that fixed a codegen bug with rules in lexer:
1398 ASSIGNOP : '=' | '+=' ;
1399 EQ is a subset of other rule. It did not give an error which is
1400 correct, but generated bad code.
1402 * ANTLR was sending column not char position to ANTLRWorks.
1404 * Bug fix: location 0, 0 emitted for synpreds and empty alts.
1406 * debugging event handshake now sends grammar file name. Added getGrammarFileName() to recognizers. Java.stg generates it:
1408 public String getGrammarFileName() { return "<fileName>"; }
1410 * tree parsers can do arbitrary lookahead now including backtracking. I
1411 updated CommonTreeNodeStream.
1413 * added events for debugging tree parsers:
1415 /** Input for a tree parser is an AST, but we know nothing for sure
1416 * about a node except its type and text (obtained from the adaptor).
1417 * This is the analog of the consumeToken method. Again, the ID is
1418 * the hashCode usually of the node so it only works if hashCode is
1421 public void consumeNode(int ID, String text, int type);
1423 /** The tree parser looked ahead */
1424 public void LT(int i, int ID, String text, int type);
1426 /** The tree parser has popped back up from the child list to the
1431 /** The tree parser has descended to the first child of the current
1434 public void goDown();
1436 * Added DebugTreeNodeStream and DebugTreeParser classes
1438 * Added ctor because the debug tree node stream will need to ask questions about nodes and since nodes are just Object, it needs an adaptor to decode the nodes and get text/type info for the debugger.
1440 public CommonTreeNodeStream(TreeAdaptor adaptor, Tree tree);
1442 * added getter to TreeNodeStream:
1443 public TreeAdaptor getTreeAdaptor();
1445 * Implemented getText/getType in CommonTreeAdaptor.
1447 * Added TraceDebugEventListener that can dump all events to stdout.
1449 * I broke down and made Tree implement getText
1451 * tree rewrites now gen location debug events.
1453 * added AST debug events to listener; added blank listener for convenience
1455 * updated debug events to send begin/end backtrack events for debugging
1457 * with a : (b->b) ('+' b -> ^(PLUS $a b))* ; you get b[0] each time as
1458 there is no loop in rewrite rule itself. Need to know context that
1459 the -> is inside the rule and hence b means last value of b not all
1462 * Bug in TokenRewriteStream; ops at indexes < start index blocked proper op.
1464 * Actions in ST rewrites "-> ({$op})()" were not translated
1466 * Added new action name:
1469 catch (RecognitionException re) {
1473 catch (Throwable t) {
1474 System.err.println(t);
1477 Overrides rule catch stuff.
1479 * Isolated $ refs caused exception
1481 3.0ea8 - March 11, 2006
1483 * added @finally {...} action like @init for rules. Executes in
1484 finally block (java target) after all other stuff like rule memoization.
1485 No code changes needed; ST just refs a new action:
1486 <ruleDescriptor.actions.finally>
1488 * hideous bug fixed: PLUS='+' didn't result in '+' rule in lexer
1490 * TokenRewriteStream didn't do toString() right when no rewrites had been done.
1492 * lexer errors in interpreter were not printed properly
1494 * bitsets are dumped in hex not decimal now for FOLLOW sets
1496 * /* epsilon */ is not printed now when printing out grammars with empty alts
1498 * Fixed another bug in tree rewrite stuff where it was checking that elements
1499 had at least one element. Strange...commented out for now to see if I can remember what's up.
1501 * Tree rewrites had problems when you didn't have x+=FOO variables. Rules
1504 a : (x=ID)? y=ID -> ($x $y)?;
1506 * filter=true for lexers turns on k=1 and backtracking for every token
1507 alternative. Put the rules in priority order.
1509 * added getLine() etc... to Tree to support better error reporting for
1510 trees. Added MismatchedTreeNodeException.
1512 * $templates::foo() is gone. added % as special template symbol.
1513 %foo(a={},b={},...) ctor (even shorter than $templates::foo(...))
1514 %({name-expr})(a={},...) indirect template ctor reference
1516 The above are parsed by antlr.g and translated by codegen.g
1517 The following are parsed manually here:
1519 %{string-expr} anonymous template from string expr
1520 %{expr}.y = z; template attribute y of StringTemplate-typed expr to z
1521 %x.y = z; set template attribute y of x (always set never get attr)
1522 to z [languages like python without ';' must still use the
1523 ';' which the code generator is free to remove during code gen]
1525 * -> ({expr})(a={},...) notation for indirect template rewrite.
1526 expr is the name of the template.
1528 * $x[i]::y and $x[-i]::y notation for accessing absolute scope stack
1529 indexes and relative negative scopes. $x[-1]::y is the y attribute
1530 of the previous scope (stack top - 1).
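The two indexing styles above can be pictured as lookups on a list used as a stack. This is only a sketch of the idea in plain Java, not what ANTLR generates; ScopeStackDemo is a hypothetical name, and treating absolute index 0 as the bottom of the stack is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class ScopeStackDemo {
    /** Like $x[i]::y: absolute index, assuming 0 is the bottom of the stack. */
    static <T> T absolute(List<T> stack, int i) {
        return stack.get(i);
    }

    /** Like $x[-i]::y: i scopes below the top, so 1 means the previous scope. */
    static <T> T relative(List<T> stack, int i) {
        return stack.get(stack.size() - 1 - i);
    }

    public static void main(String[] args) {
        List<String> scopes = Arrays.asList("outer", "middle", "inner");
        System.out.println(absolute(scopes, 0)); // prints outer
        System.out.println(relative(scopes, 1)); // prints middle
    }
}
```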
1532 * filter=true mode for lexers; can do this now...upon mismatch, just
1533 consumes a char and tries again:
1534 lexer grammar FuzzyJava;
1535 options {filter=true;}
1538 : TYPE WS? name=ID WS? (';'|'=')
1539 {System.out.println("found var "+$name.text);}
1542 * refactored char streams so ANTLRFileStream is now a subclass of
1545 * char streams for lexer now allowed nested backtracking in lexer.
1547 * added TokenLabelType for lexer/parser for all token labels
1549 * line numbers for error messages were not updated properly in antlr.g
1550 for strings, char literals and <<...>>
1552 * init action in lexer rules was before the type,start,line,... decls.
1554 * Tree grammars can now specify output; I've only tested output=template
1557 * You can reference EOF now in the parser and lexer. It's just token type
1560 * Bug fix: $ID refs in the *lexer* were all messed up. Cleaned up the
1561 set of properties available...
1563 * Bug fix: .st not found in rule ref when rule has scope:
1566 StringTemplate funcDef;
1569 {$field::funcDef = $field.st;}
1571 it gets field_stack.st instead
1573 * return in backtracking must return retval or null if return value.
1575 * $property within a rule now works like $text, $st, ...
1577 * AST/Template Rewrites were not gated by backtracking==0 so they
1578 executed even when guessing. Auto AST construction is now gated also.
1580 * CommonTokenStream was somehow returning tokens not text in toString()
1582 * added useful methods to runtime.BitSet and also to CommonToken so you can
1583 update the text. Added nice Token stream method:
1585 /** Given a start and stop index, return a List of all tokens in
1586 * the token type BitSet. Return null if no tokens were found. This
1587 * method looks at both on and off channel tokens.
1589 public List getTokens(int start, int stop, BitSet types);
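Here is a rough sketch of how a filter with that contract might behave. It uses java.util.BitSet and a tiny stand-in token class rather than the runtime's own BitSet and Token types, and it assumes the stop index is inclusive; TokenFilterDemo and Tok are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class TokenFilterDemo {
    /** Stand-in for a token; only the type matters for filtering. */
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Return tokens in [start, stop] whose type is in the set, or null if none match. */
    static List<Tok> getTokens(List<Tok> buffer, int start, int stop, BitSet types) {
        List<Tok> hits = new ArrayList<Tok>();
        int last = Math.min(stop, buffer.size() - 1); // clamp to the end of the buffer
        for (int i = Math.max(start, 0); i <= last; i++) {
            Tok t = buffer.get(i);
            if (t.type >= 0 && types.get(t.type)) {
                hits.add(t);
            }
        }
        return hits.isEmpty() ? null : hits;
    }
}
```

Returning null for "nothing found" mirrors the comment above; callers have to check for it before iterating.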
1591 * literals are now passed in the .tokens files so you can ref them in
1592 tree parses, for example.
1594 * added basic exception handling; no labels, just general catches:
1598 catch[RecognitionException re] {
1599 System.out.println("recog error");
1601 catch[Exception e] {
1602 System.out.println("error");
1605 * Added method to TokenStream:
1606 public String toString(Token start, Token stop);
1608 * antlr generates #src lines in lexer grammars generated from combined grammars
1609 so error messages refer to original file.
1611 * lexers generated from combined grammars now use original formatting.
1613 * predicates have $x.y stuff translated now. Warning: predicates might be
1614 hoisted out of context.
1616 * return values in return val structs are now public.
1618 * output=template with return values on rules was broken. I assume return values with ASTs were broken too. Fixed.
1620 3.0ea7 - December 14, 2005
1622 * Added -print option to print out grammar w/o actions
1624 * Renamed BaseParser to be BaseRecognizer and even made Lexer derive from
1625 this; nice as it now shares backtracking support code.
1627 * Added syntactic predicates (...)=>. See December 4, 2005 entry:
1629 http://www.antlr.org/blog/antlr3/lookahead.tml
1631 Note that we have a new option for turning off rule memoization during
1634 -nomemo when backtracking don't generate memoization code
1636 * Predicates are now tested in order that you specify the alts. If you
1637 leave the last alt "naked" (w/o pred), it will assume a true pred rather
1638 than union of other preds.
1640 * Added gated predicates "{p}?=>" that literally turn off a production whereas
1641 disambiguating predicates are only hoisted into the predictor when syntax alone
1642 is not sufficient to uniquely predict alternatives.
1645 B : {!p}? => ("a"|"b")+ ;
1647 * bug fixed related to predicates in predictor
1650 B : {!p}? ("a"|"b")+ ;
1651 DFA is correct. A state splits for input "a" on the pred.
1652 Generated code though was hosed. No pred tests in prediction code!
1653 I added testLexerPreds() and others in TestSemanticPredicateEvaluation.java
1655 * added execAction template in case we want to do something in front of
1656 each action execution or something.
1658 * left-recursive cycles from rules w/o decisions were not detected.
1660 * undefined lexer rules were not announced! fixed.
1662 * unreachable messages for Tokens rule now indicate rule name not alt. E.g.,
1664 Ruby.lexer.g:24:1: The following token definitions are unreachable: IVAR
1666 * nondeterminism warnings improved for Tokens rule:
1668 Ruby.lexer.g:10:1: Multiple token rules can match input such as ""0".."9"": INT, FLOAT
1669 As a result, tokens(s) FLOAT were disabled for that input
1672 * DOT diagrams didn't show escaped char properly.
1674 * Char/string literals are now all 'abc' not "abc".
1676 * action syntax changed "@scope::actionname {action}" where scope defaults
1677 to "parser" if parser grammar or combined grammar, "lexer" if lexer grammar,
1678 and "treeparser" if tree grammar. The code generation targets decide
1679 what scopes are available. Each "scope" yields a hashtable for use in
1680 the output templates. The scopes full of actions are sent to all output
1681 file templates (currently headerFile and outputFile) as attribute actions.
1682 Then you can reference <actions.scope> to get the map of actions associated
1683 with scope and <actions.parser.header> to get the parser's header action
1684 for example. This should be very flexible. The target should only have
1685 to define which scopes are valid, but the action names should be variable
1686 so we don't have to recompile ANTLR to add actions to code gen templates.
1689 options {language=Java;}
1690 @header { package foo; }
1691 @parser::stuff { int i; } // names within scope not checked; target dependent
1693 @lexer::header {head}
1694 @lexer::members { int j; }
1695 @headerfile::blort {...} // error: this target doesn't have headerfile
1696 @treeparser::members {...} // error: this is not a tree parser
1703 For now, the Java target uses members and header as a valid name. Within a
1704 rule, the init action name is valid.
1706 * changed $dynamicscope.value to $dynamicscope::value even if value is defined
1707 in same rule such as $function::name where rule function defines name.
1709 * $dynamicscope gets you the stack
1711 * rule scopes go like this now:
1715 scope slist,Symbols;
1719 * Created RuleReturnScope as a generic rule return value. Makes it easier
1721 RuleReturnScope r = parser.program();
1722 System.out.println(r.getTemplate().toString());
1724 * $template, $tree, $start, etc...
1726 * $r.x in current rule. $r is ignored as fully-qualified name. $r.start works too
1728 * added warning about $r referring to both return value of rule and dynamic scope of rule
1730 * integrated StringTemplate in a very simple manner
1733 -> template(arglist) "..."
1734 -> template(arglist) <<...>>
1735 -> namedTemplate(arglist)
1736 -> {free expression}
1740 a : A B -> {p1}? foo(a={$A.text})
1741 -> {p2}? foo(a={$B.text})
1742 -> // return nothing
1744 An arg list is just a list of template attribute assignments to actions in curlies.
1746 There is a setTemplateLib() method for you to use with named template rewrites.
1751 options {output=template;}
1754 This all should work for tree grammars too, but I'm still testing.
1756 * fixed bugs where strings were improperly escaped in exceptions, comments, etc.. For example, newlines came out as newlines not the escaped version
1758 3.0ea6 - November 13, 2005
1760 * turned off -debug/-profile, which was on by default
1762 * completely refactored the output templates; added some missing templates.
1764 * dramatically improved infinite recursion error messages (actually
1765 left-recursion never even was printed out before).
1767 * wasn't printing dangling state messages when it reanalyzes with k=1.
1769 * fixed a nasty bug in the analysis engine dealing with infinite recursion.
1770 Spent all day thinking about it and cleaned up the code dramatically.
1771 Bug fixed and software is more powerful and I understand it better! :)
1773 * improved verbose DFA nodes; organized by alt
1775 * got much better random phrase generation. For example:
1777 $ java org.antlr.tool.RandomPhrase simple.g program
1778 int Ktcdn ';' method wh '(' ')' '{' return 5 ';' '}'
1780 * empty rules like "a : ;" generated code that didn't compile due to
1781 try/catch for RecognitionException. Generated code couldn't possibly
1782 throw that exception.
1784 * when printing out a grammar, such as in comments in generated code,
1785 ANTLR didn't print ast suffix stuff back out for literals.
1787 * This never exited loop:
1788 DATA : (options {greedy=false;}: .* '\n' )* '\n' '.' ;
1789 and now it works due to new default nongreedy .* Also this works:
1790 DATA : (options {greedy=false;}: .* '\n' )* '.' ;
1792 * Dot star ".*" syntax didn't work; in lexer it is nongreedy by
1793 default. In parser it is on greedy but also k=1 by default. Added
1794 unit tests. Added blog entry to describe.
1796 * ~T where T is the only token yielded an empty set but no error
1798 * Used to generate unreachable message here:
1805 z.g:3:11: The following alternatives are unreachable: 2
1807 In fact it should really be an error; now it generates:
1809 no start rule in grammar t (no rule can obviously be followed by EOF)
1811 Per next change item, ANTLR cannot know that EOF follows rule 'a'.
1813 * added error message indicating that ANTLR can't figure out what your
1814 start rule is. Required to properly generate code in some cases.
1816 * validating semantic predicates now work (if they are false, they
1817 throw a new FailedPredicateException)
1819 * two hideous bug fixes in the IntervalSet, which made analysis go wrong
1820 in a few cases. Thanks to Oliver Zeigermann for finding lots of bugs
1821 and making suggested fixes (including the next two items)!
1823 * cyclic DFAs are now nonstatic and hence can access instance variables
1825 * labels are now allowed on lexical elements (in the lexer)
1827 * added some internal debugging options
1829 * ~'a'* and ~('a')* were not working properly; refactored antlr.g grammar
1831 3.0ea5 - July 5, 2005
1833 * Using '\n' in a parser grammar resulted in a nonescaped version of '\n' in the token names table making compilation fail. I fixed this by reorganizing/cleaning up portion of ANTLR that deals with literals. See comment org.antlr.codegen.Target.
1835 * Target.getMaxCharValue() did not use the appropriate max value constant.
1837 * ALLCHAR was a constant when it should use the Target max value def. set complement for wildcard also didn't use the Target def. Generally cleaned up the max char value stuff.
1839 * Code gen didn't deal with ASTLabelType properly...I think even the 3.0ea7 example tree parser was broken! :(
1841 * Added a few more unit tests dealing with escaped literals
1843 3.0ea4 - June 29, 2005
1845 * tree parsers work; added CommonTreeNodeStream. See simplecTreeParser
1846 example in examples-v3 tarball.
1848 * added superClass and ASTLabelType options
1850 * refactored Parser to have a BaseParser and added TreeParser
1852 * bug fix: actions being dumped in description strings; compile errors
1855 3.0ea3 - June 23, 2005
1859 * Automatic tree construction operators are in: ! ^ ^^
1861 * Tree construction rewrite rules are in
1862 -> {pred1}? rewrite1
1863 -> {pred2}? rewrite2
1867 The rewrite rules may be elements like ID, expr, $label, {node expr}
1868 and trees ^( <root> <children> ). You can have (...)?, (...)*, (...)+
1871 You may have rewrites in subrules not just at outer level of rule, but
1872 any -> rewrite forces auto AST construction off for that alternative
1875 To avoid cycles, copy semantics are used:
1877 r : INT -> INT INT ;
1879 means make two new nodes from the same INT token.
1881 Repeated references to a rule element implies a copy for at least one
1884 a : atom -> ^(atom atom) ; // NOT CYCLE! (dup atom tree)
1886 * $ruleLabel.tree refers to tree created by matching the labeled element.
1888 * A description of the blocks/alts is generated as a comment in output code
1890 * A timestamp / signature is put at top of each generated code file
1892 3.0ea2 - June 12, 2005
1896 * Some error messages were missing the stackTrace parameter
1898 * Removed the file locking mechanism as it's not cross platform
1900 * Some absolute vs relative path name problems with writing output
1901 files. Rules are now more concrete. -o option takes precedence
1902 // -o /tmp /var/lib/t.g => /tmp/T.java
1903 // -o subdir/output /usr/lib/t.g => subdir/output/T.java
1904 // -o . /usr/lib/t.g => ./T.java
1905 // -o /tmp subdir/t.g => /tmp/subdir/T.java
1906 // If they didn't specify a -o dir, just write to the location
1907 // where the grammar is, absolute or relative
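Read as rules for choosing the output directory, the examples above could be sketched like this. It is one interpretation, not the Tool's real code: OutputDirDemo and outputDir are hypothetical names, java.nio.file is used only for convenience, and Unix-style paths are assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputDirDemo {
    /**
     * Pick the directory for generated files.
     * outDirOption is the -o argument, or null when -o was not given.
     */
    static Path outputDir(String outDirOption, String grammarFile) {
        Path grammar = Paths.get(grammarFile);
        Path grammarDir = grammar.getParent(); // null when the path has no directory part
        if (outDirOption == null) {
            // No -o: write next to the grammar, absolute or relative.
            return grammarDir != null ? grammarDir : Paths.get(".");
        }
        Path out = Paths.get(outDirOption);
        if (grammar.isAbsolute() || grammarDir == null) {
            // -o wins outright; the grammar's own directory is ignored.
            return out;
        }
        // Relative grammar path: keep its directory part under the -o dir.
        return out.resolve(grammarDir);
    }

    public static void main(String[] args) {
        System.out.println(outputDir("/tmp", "/var/lib/t.g")); // prints /tmp
        System.out.println(outputDir("/tmp", "subdir/t.g"));   // prints /tmp/subdir
    }
}
```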
1909 * does error checking on unknown option names now
1911 * Using just language code not locale name for error message file. I.e.,
1912 the default (and for any English speaking locale) is en.stg not en_US.stg
1915 * The error manager now asks the Tool to panic rather than simply doing
1918 * Lots of refactoring concerning grammar, rule, subrule options. Now
1919 detects invalid options.
1921 3.0ea1 - June 1, 2005
1923 Initial early access release