YARD 1.0 - Yet Another Recursive Descent Parsing Framework for C++
YARD is a C++ framework for constructing recursive-descent parsers from parsing expression grammars (PEG) at compile-time. This technique allows YARD parsers to be extremely fast and require minimal executable overhead. A parser constructed with YARD can automatically construct an abstract syntax tree (by placing “Store” actions in the grammar) and eliminates the need for a scanning phase (also known as a lexing phase).
I’ve been developing, testing, and using YARD for over three years and as far as I am concerned it is now stable. I’ve finally posted an official version 1.0 of the YARD framework to SourceForge. Documentation is not yet complete for the current version, but the source code is only 28k and in 7 files. A great example of YARD in action can be found in the Cat to C++ translator. You can view the YARD grammar for Cat online at: http://cat-language.googlecode.com/svn/trunk/cpp/cat_to_cpp/cat_grammar.hpp. All of the source code for the current version of YARD can also be browsed online in the Cat subversion repository at: http://cat-language.googlecode.com/svn/trunk/cpp/yard/. I should point out that YARD will not work on some older compilers that are not compliant to the C++ specification, so if you are one of those lost souls still using or supporting Visual C++ 6.0 you may need to do some rewriting.
I haven’t compared YARD to other parsing frameworks in a while, but it was significantly faster than Boost.Spirit when I last tried a couple of years ago. I expect YARD to still be much faster, because of the fact that YARD parsers can be very heavily optimized at compile-time by the C++ compiler. YARD also has a much smaller footprint on your executable size and compilation times than Boost.Spirit.
Other people have had positive things to say about YARD, and personally I can’t live without it. YARD is public domain, so hack away to your heart’s content.
Hi,
since I already said how much Yard appeals to me, e.g. for its simplicity, I’ll stick to one suggestion here: To separate the two tasks of class yard::Parser, i.e. have one class supply the input to the actual recursive descent parsing classes, and let these classes take a second, templated, argument that the user [of Yard] can use for whatever purpose; the generation of an AST as currently hard-wired into class Parser would be one example of this use. In my eyes this would make Yard much more flexible.
Ciao, Colin
Comment by Colin — July 2, 2007 @ 6:57 am
Hi Colin, thanks for the compliment.
I appreciate the suggestion but unfortunately I don’t see how adding a new class makes things more flexible. If you need additional functionality, I would suggest writing wrapping the current one and adding new functions. Then you can add new action rules that leverage the new functions. It seems the most straightforward route to go. If you don’t want tree generation functionality, well you can simply write a new parser without it and don’t use Store rules. This is trivial, since the parser concept is very easy to model, and the default implementation is not very big. Another possibility for those hard pressed for time is to simply rewrite: ‘AddNode’, ‘CompleteNode’, and ‘AbandonNode’ to not create a tree but instead perform some other action. I had experimented with more uncoupled designs previously, but they inevitably increase the overall code size and complexity. I opted to keep it a simple as I could. Cheers,Christopher
Comment by cdiggins — July 2, 2007 @ 5:09 pm
Hi Christopher,
suppose I have to try out my idea to see whether it helps in any way; the suggestion was primarily based on the observation that your class yard::Parser currently has two tasks, (a) to provide the input, and (b) to store the output. My instinctive reaction is to separate the responsability so that e.g. writing a simple calculator application that doesn’t even bother to create an AST becomes a trivial exercise. Will report back if and when I have something (interesting) to show…
Ciao, Colin
Comment by Colin — July 9, 2007 @ 5:54 pm
My view was that Parser object really only does one thing (or N things where N is user specific): it manages each of the state requirements imposed by the grammar rules. This would usually include managing an input iterator (assuming you use the core YARD rules) and sometimes it includes AST generation.
Beyond this the Parser object must maintain other states depending on the rules used. For example, my implementation of a Parser object provides automatic tree management, but this is only one of a myriad of different possibilities. My rationale was that I could never satisfy everyone’s needs with a single parser design so I opted to write a simple default one, and let user’s supply their own as needed.
Your example would require a different design of a parsing object, probably with an “EvaluateExpression” rule, and a stack of values (for dealing with precedence evaluation), but rather than YARD providing the uber-parser for any circumstance, I chose to let users write their own parsers as they need. It is only a handful of lines of code.
An alternative design would have been to use a policy based parser object, but in the end this becomes silly because the policy becomes as complex as the parser object itself, with no real benefit.
All of that said a calculuator example would be much appreciated, and it would still be good to show different design approaches to YARD. I hope you make it open-source, and share it back with the community.
Comment by cdiggins — July 9, 2007 @ 6:24 pm