BYACC/Java
Java extension v 0.91
1998, Bob Jamison
What's New in Version 0.91?
Version 0.91 is a bugfix release: it replaces the erroneous generic object constructor of '0' with the correct void.
Many thanks to Hendrik Schreiber for reporting this bug.
Much requested feature!
You now can also set the semantic type, to use your own type directly, with
yacc -j -s<semantic_type> <yaccfilename>
This feature is new and not fully tested; any user feedback would be appreciated.
What is it?
BYACC/Java is an extension of the Berkeley v 1.8 YACC-compatible parser
generator. Standard YACC takes a YACC source file and generates one or
more C files from it, which, if compiled properly, produce an LALR
parser. This is useful for expression parsing, interactive command parsing,
and file reading. Many megabytes of YACC code have been written over the
years.
This is the standard YACC tool that is in use every day to produce
C/C++ parsers. I have added a "-j" flag which will cause BYACC to generate
Java source code, instead. So there finally is a YACC for Java now!
How does BYACC/Java compare to other parser generators?
Of course, the original YACC design is about twenty years old now, and
newer and better technologies are currently available. I think Jacc
is great, and so is Java Cup. Both of these provide more thorough
parsing of LALR and LL grammars than the venerable YACC. Yet the idea of
a YACC for Java is, in my opinion, extremely valuable.
Several benefits are derived from a Java parser-generator of this sort:
- BYACC/Java can be executed from existing Makefiles and IDEs.
- BYACC/Java is coded in C, so the generation of Java code is extremely
  fast.
- The resulting byte code is small -- starting at about 11 kbytes.
- Only one or two class files are generated. If you need only a single type
  or an Object class, then one class file is generated. If you need a simple
  generic type, a simple data class is generated for you, making another
  small file.
- No additional runtime libraries are required. The generated source code is
  the entire parser.
- It can parse existing YACC grammars, enabling the 'Javanizing' ;-) of a
  large installed base of YACC source code (of course, your 'actions' need
  to be in Java).
- Many developers are already very familiar with the workings of YACC.
- It is absolutely free; no license, no royalties, free!
How do I use it?
First of all, read a YACC book. Since this is actual Berkeley YACC, all
of the usual procedures using YACC apply here. Some good descriptions and
tutorials of YACC grammar and procedures are available in book form and
on the Net. Here is the standard YACC manual
page. Since Java's syntax is different from C, certain format differences
must be followed. A typical YACC source file consists of the following:
- The first part of the file is the DECLARATIONS area, where you define
  tokens, precedences, etc.
- The second part is the ACTIONS area, where the grammar and the user's C
  actions are parsed.
- The third part is the CODE area, where user functions are added.
They are separated by '%%' at the start of a line. Visually:
DECLARATIONS
%%
ACTIONS
%%
CODE
Portions of the file can be set off and ignored by YACC by surrounding
them with %{ and %} . This ability is typically used in YACC C files to
insert definitions and #include statements. BYACC/Java uses this area for
the Java package and import statements. Everything after
this will be wrapped in a Java class called parser. Thus all functions
you write will become methods belonging to parser.
All of the user actions you write will be Java code inserted into a
method called yyparse(). This means that an action may contain only
statements enclosed in curly braces '{' and '}'. No classes or methods can
be defined in the ACTIONS area. Of course, your code can instantiate
classes and call methods.
Here is our example Java implementation of the classic YACC calculator
demo:
%{
import java.lang.Math;
import java.io.*;
import java.util.StringTokenizer;
%}
/* YACC Declarations */
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG /* negation--unary minus */
%right '^' /* exponentiation */
/* Grammar follows */
%%
input:    /* empty string */
        | input line
        ;

line:     '\n'
        | exp '\n'            { System.out.println(" " + $1.dval + " "); }
        ;

exp:      NUM                 { $$ = $1; }
        | exp '+' exp         { $$.dval = $1.dval + $3.dval; }
        | exp '-' exp         { $$.dval = $1.dval - $3.dval; }
        | exp '*' exp         { $$.dval = $1.dval * $3.dval; }
        | exp '/' exp         { $$.dval = $1.dval / $3.dval; }
        | '-' exp %prec NEG   { $$.dval = -$2.dval; }
        | exp '^' exp         { $$.dval = Math.pow($1.dval, $3.dval); }
        | '(' exp ')'         { $$.dval = $2.dval; }
        ;
%%
String ins;           //the current input line
StringTokenizer st;   //splits the current line on whitespace

void yyerror(String s)
{
    System.out.println("par:"+s);
}

boolean newline;      //true once the end-of-line token has been returned

int yylex()
{
    String s;
    int tok;
    Double d;
    //System.out.print("yylex ");
    if (!st.hasMoreTokens())
    {
        if (!newline)
        {
            newline=true;
            return '\n'; //So we look like classic YACC example
        }
        else
            return 0;    //end of input
    }
    s = st.nextToken();
    //System.out.println("tok:"+s);
    try
    {
        d = Double.valueOf(s);/*this may fail*/
        yylval = new parserval(d.doubleValue()); //SEE BELOW
        tok = NUM;
    }
    catch (Exception e)
    {
        tok = s.charAt(0);/*if not float, return char*/
    }
    return tok;
}

void dotest()
{
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    System.out.println("BYACC/Java Calculator Demo");
    System.out.println("Note: Since this example uses the StringTokenizer");
    System.out.println("for simplicity, you will need to separate the items");
    System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'");
    while (true)
    {
        System.out.print("expression:");
        try
        {
            ins = in.readLine();
        }
        catch (Exception e)
        {
            ins = null; //treat a read error like end of input
        }
        if (ins == null) //end of input: stop instead of tokenizing null
            break;
        st = new StringTokenizer(ins);
        newline=false;
        yyparse();
    }
}

public static void main(String args[])
{
    parser par = new parser(false);
    par.dotest();
}
If this code were in a file called calc.y, you would yacc-process
it with the command:
yacc -j calc.y
This will generate the file parser.java, which can then be compiled
by:
javac parser.java
to create the file parser.class which can be run with:
java parser
The same file, but using the semantic_type option, as in
yacc -j -s double tf.y
is available here.
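The calculator's yylex() splits each line with StringTokenizer, which is why the demo asks for spaces between items. The following stand-alone sketch shows what the scanner actually sees for spaced and unspaced input. The class TokenSplitDemo and its split() method are hypothetical names used only for this illustration; they are not part of BYACC/Java or of the generated parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of BYACC/Java: it only reproduces the
// whitespace splitting that the calculator demo's yylex() relies on.
class TokenSplitDemo {

    // Split a line on whitespace, exactly as new StringTokenizer(line) does.
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Items separated by spaces: every number and operator is its own token.
        System.out.println(split("( 3 + 5 ) * 2"));
        // No spaces: the whole expression arrives as one token.
        System.out.println(split("(3+5)*2"));
    }
}
```

With the spaced input the scanner receives seven separate tokens. Without spaces it receives the single token "(3+5)*2"; Double.valueOf rejects it, so the demo's yylex() returns only its first character and the rest of the expression is lost.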
Command Line Options
In addition to the normal yacc command line switches, I have supplied these:
- -j Switches from C/C++ to Java output.
- -f<classname> Changes the name of the Java class (and .java file) to
  classname.
- -x<extendname> Changes the class the parser extends from the default
  Thread to extendname.
- -s<semantic_type> Changes the semantic (value of the rules' variables)
  type to semantic_type. No extra class will be generated.
User-Supplied Methods
In order for javac to compile the code properly, the user must supply
two methods in the YACC source:
- void yyerror(String msg) -- This method is expected by BYACC/Java,
  and is used to provide error messages to be directed to the channels the
  user desires.
- int yylex() -- This method is the one where BYACC/Java expects to
  obtain its input tokens. Wrap any file/string scanning code you have in
  this function. This method should return <0 if there is an error, and
  0 when it encounters the end of input. See the examples to clarify what
  we mean.
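As a rough illustration of the yylex() contract described above (a token code for each item, 0 at the end of input, a negative value on error), here is a minimal character-level scanner that does not need spaces between items. It is a sketch under stated assumptions, not the code BYACC/Java generates: the class SketchLexer, the fields lastValue and lastError, and the token code 257 for NUM are all made up for this example. In a real grammar file the two methods would sit in the CODE area, NUM would come from the %token declaration, and the value would be stored in yylval.

```java
// Hypothetical stand-alone scanner illustrating the yylex() contract.
// Nothing here is generated by BYACC/Java; names and token codes are assumed.
class SketchLexer {
    static final int NUM = 257;  // assumed token code, for illustration only

    private final String text;   // the input being scanned
    private int pos = 0;         // index of the next unread character
    double lastValue;            // stands in for yylval.dval
    String lastError;            // last message passed to yyerror()

    SketchLexer(String text) {
        this.text = text;
    }

    // Error hook: record the message and send it to the chosen channel.
    void yyerror(String msg) {
        lastError = msg;
        System.err.println("parse error: " + msg);
    }

    // Returns a token code, 0 at end of input, or a negative value on error.
    int yylex() {
        // Skip blanks and tabs between items.
        while (pos < text.length()
                && (text.charAt(pos) == ' ' || text.charAt(pos) == '\t')) {
            pos++;
        }
        if (pos >= text.length()) {
            return 0;                         // end of input
        }
        char c = text.charAt(pos);
        if (Character.isDigit(c) || c == '.') {
            int start = pos;
            while (pos < text.length()
                    && (Character.isDigit(text.charAt(pos)) || text.charAt(pos) == '.')) {
                pos++;
            }
            String digits = text.substring(start, pos);
            try {
                lastValue = Double.parseDouble(digits);
            } catch (NumberFormatException e) {
                yyerror("bad number: " + digits);
                return -1;                    // error
            }
            return NUM;
        }
        if ("+-*/^()\n".indexOf(c) >= 0) {
            pos++;
            return c;                         // single-character token
        }
        pos++;
        yyerror("unexpected character '" + c + "'");
        return -1;                            // error
    }

    public static void main(String[] args) {
        SketchLexer lexer = new SketchLexer("(3+5)*2");
        int tok;
        while ((tok = lexer.yylex()) > 0) {
            System.out.println(tok == NUM ? "NUM " + lexer.lastValue
                                          : "'" + (char) tok + "'");
        }
    }
}
```

Returning the character itself for operators mirrors the calculator example, where the grammar refers to them as '+', '-', and so on.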
About the Generated Java File
I heartily suggest that the user peruse the file parser.java to
see how YACC's parsing algorithm works. I have done an immense amount of
analysis and reverse engineering of the original BYACC sources. The Java
code that is generated is heavily commented, and is amenable to debugging,
and can provide a nice education in the workings of a YACC parser.
Normally, the class generated is made an extension of Thread,
as a convenience, so that parsing may be performed as a background thread,
allowing the current execution to continue unimpeded. A run()
method and a constructor are also inserted into the code.
However, the programmer may need to extend a different
class. In this case, the -x<extendname> option is provided,
which will create an alternate extension. Since it is impossible
to predict the needs of the other class, the run() method and the
constructor will be omitted.
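To make the default Thread arrangement concrete, the sketch below shows how a caller might start a parse in the background and collect the result later. It uses a hand-written stand-in rather than real generated code: the class BackgroundParserSketch, its result field, the helper parseInBackground(), the placeholder yyparse() body, and the assumption that run() simply calls yyparse() are all illustrative guesses, so check your own parser.java for the actual members.

```java
// Hypothetical stand-in for a generated parser class that extends Thread.
// The real class comes from "yacc -j"; everything below is assumed.
class BackgroundParserSketch extends Thread {
    private final String input;
    volatile int result = -1;    // set when the parse finishes

    BackgroundParserSketch(String input) {
        this.input = input;
    }

    // Placeholder for the generated yyparse(): "succeeds" (0) on non-empty input.
    int yyparse() {
        return input.trim().isEmpty() ? 1 : 0;
    }

    // Assumed shape of the inserted run() method: perform the parse.
    @Override
    public void run() {
        result = yyparse();
    }

    // Start a parse on its own thread, wait for it, and return its result.
    static int parseInBackground(String input) {
        BackgroundParserSketch p = new BackgroundParserSketch(input);
        p.start();               // the parse now runs on its own thread
        // ... the caller is free to do other work here ...
        try {
            p.join();            // wait for the parse to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return p.result;
    }

    public static void main(String[] args) {
        System.out.println("yyparse returned " + parseInBackground("( 3 + 5 ) * 2"));
    }
}
```

A parser built with the -x option would not have run(), so a caller would invoke yyparse() directly or wrap it in its own Runnable.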
About the parserval (previously 'semantic') class
Previously, BYACC/Java gave the programmer a choice of either double or
int semantic (the value of a number or string) values.
This worked for very simple parsing, but was extremely limiting.
It would have been very difficult to mix value types within a file, thus
making things like interpreters and compilers impossible.
Starting with this version, the
semantic value is stored in a public class called parserval, which
is defined as follows:
public class parserval
{
public int ival;
public double dval;
public String sval;
public Object obj;
public parserval(int val)
{
ival=val;
}
public parserval(double val)
{
dval=val;
}
public parserval(String val)
{
sval=val;
}
public parserval(Object val)
{
obj=val;
}
}//end class
So now a semantic value can be an int, a double,
a String, or an Object. In your scanner (or something that yylex()
calls), you may use this like:
yylval = new parserval(doubleval);
yylval = new parserval(integerval);
...or even something like...
yylval = new parserval(new myTypeOfObject());
And on the Left Hand Side (the YACC side) you can
use the values of the $ and the $$ just as easily:
$$.ival = $1.ival + $2.ival;
$$.dval = $1.dval - $2.dval;
A side effect of using this class is that the default parser no longer
fits into one .class file. However, the resulting parserval.class
is extremely small.
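Because parserval has one constructor per field, the field that gets filled in depends on the static type of the argument. The sketch below copies the class as listed above (made package-private so that everything fits in one file) and exercises each constructor. The class ParservalDemo and its describe() helper are hypothetical additions for illustration only.

```java
// Copy of the parserval class listed above, package-private for this sketch.
class parserval {
    public int ival;
    public double dval;
    public String sval;
    public Object obj;

    public parserval(int val)    { ival = val; }
    public parserval(double val) { dval = val; }
    public parserval(String val) { sval = val; }
    public parserval(Object val) { obj = val; }
}

// Hypothetical demo class: reports which field each constructor call set.
class ParservalDemo {

    static String describe(parserval v) {
        if (v.sval != null) return "sval=" + v.sval;
        if (v.obj != null)  return "obj=" + v.obj;
        if (v.dval != 0.0)  return "dval=" + v.dval;
        return "ival=" + v.ival;
    }

    public static void main(String[] args) {
        System.out.println(describe(new parserval(42)));      // int argument
        System.out.println(describe(new parserval(2.5)));     // double argument
        System.out.println(describe(new parserval("name")));  // String argument
        // A boxed Double is an Object, so the Object constructor is chosen
        // and dval stays 0.0.
        System.out.println(describe(new parserval(Double.valueOf(2.5))));
        // A char widens to int, so this sets ival to the character code.
        System.out.println(describe(new parserval('a')));
    }
}
```

The boxed-Double case is the reason the calculator's yylex() calls d.doubleValue() before constructing the value: passing the Double object itself would fill obj rather than dval, and the grammar actions that read $1.dval would see 0.0.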
Why the name change from "semantic" to "parserval"? Do you hate the users or what?
No, of course not! The reason is this:
Because of popular demand, and because it makes sense, we made the change from an inner
class to a public one. This allows external classes to easily access the semantic value,
and saves BYACC users work. This requires a public name, which is visible from all
classes in the current project.
One of the needs of users is to have different parsers in a given project for different
purposes. BYACC/Java allows you to change the name of the generated class with a
command-line flag. The default parser class is parser. Now, to be unique,
the semantic class's name must be tied to the parent class, also. The logical
path would be to name the class parsersemantic or parser2semantic
or myparsersemantic. Well, I thought that was just getting to be ridiculous,
so I just shortened the extension to val, which is what the class is anyway.
My apologies about the change, but if you consider the problem, you will realize
that it had to be done. I suggest merely using a search-and-replace in your text editor
to make the change.
Examples
As time goes on, we will provide some examples and templates to speed you
on your way.
- Here is an example of what a 3D object file might look like. A
  corresponding bare-bones YACC parser is implemented in Java! This is also
  a good example of how to read a file from a URL.
Why?
Because someone said YACC couldn't be done in Java. Silly person!
Credits
Of course, thanks go to Tom Corbett for BYACC, a fine implementation
of YACC. And thanks to his altruistic nature for putting it in the Public
Domain. I just added the Java switch. Check the ACKNOWLEDGEMENTS
file for more contributors.
Availability
The modified/cleaned up/updated Berkeley Yacc source files, GNU makefile,
and Borland C++ 5 project file can be obtained here.
Also, a couple of binaries for BYACC/Java can be obtained here. These
are native console applications, so they do not require any class libraries
to work. Of course, you will need a Java development environment to process
the generated source files.
And remember, this version of YACC also parses "standard" C/C++
YACC source files!
| Binary for SunOS/Solaris | Approx 43k |
| Binary for SGI/IRIX | Approx 70k |
| Binary for Win95/NT (New! No runtime DLLs required!) | Approx 59k |
| Source files, GNU Makefile, Borland project in a GZIP TAR file | Approx 40k |
Check here often, as updates/upgrades/bug fixes are continuously being
made.
Questions
YACC has already been described many times, and in great detail, so I would
appreciate that BYACC/Java users' questions about YACC and LALR parsers
be directed to the many good sources available on the Net and in print.
In other words, I will not do your homework for you! ;-) However, I would
be happy to help with the Java file generation, as that is the portion
that I have implemented.
Links
- Owwwl - Agent-based Web search tool. Try some cutting-edge technology!
- Intelligent Computer-Aided Training - The Training Technology!
- LinCom Avionics Systems Group - Our home page. Please visit!
- Gamelan - The Mecca of Java. An excellent place to look for Java resources.
For more information, please write Bob
Jamison at LinCom-ASG.Com!
Last updated: 28 Nov 97