BYACC/Java

Java extension v 0.91
1998, Bob Jamison

What's New in Version 0.91?

0.91 is a bugfix release, replacing the erroneous generic object constructor of '0' with the correct void.
Many thanks to Hendrik Schreiber for reporting this bug.
Much requested feature!
You can now also set the semantic type, to use your own type directly, with
      yacc -j -s<semantic_type> <yaccfilename>
This is a new and not totally tested feature; any user feedback would be appreciated.

What is it?

BYACC/Java is an extension of the Berkeley v 1.8 YACC-compatible parser generator. Standard YACC takes a YACC source file and generates one or more C files from it which, when compiled properly, produce an LALR parser. This is useful for expression parsing, interactive command parsing, and file reading. Many megabytes of YACC code have been written over the years.

 This is the standard YACC tool that is in use every day to produce C/C++ parsers. I have added a "-j" flag which will cause BYACC to generate Java source code, instead. So there finally is a YACC for Java now!


How does BYACC/Java compare to other parser generators?

Of course, the original YACC design is about twenty years old now, and newer and better technologies are currently available. I think Jacc is great, and so is Java Cup. Both of these provide more thorough parsing of LALR and LL grammars than the venerable YACC. Yet the idea of a YACC for Java is, in my opinion, extremely valuable.
Several benefits are derived from a Java parser-generator of this sort:

How do I use it?

First of all, read a YACC book. Since this is actual Berkeley YACC, all of the usual procedures using YACC apply here. Some good descriptions and tutorials of YACC grammar and procedures are available in book form and on the Net. Here is the standard YACC manual page. Since Java's syntax is different from C, certain format differences must be followed. A typical YACC source file consists of the following:
  1. The first part of the file is the DECLARATIONS area, where you define tokens, precedences, etc.
  2. The second part is the ACTIONS area, where the grammar and the user's C actions are parsed.
  3. The third part is the CODE area, where user functions are added.
They are separated by '%%' at the start of a line. Visually:
DECLARATIONS
%%
ACTIONS
%%
CODE

Portions of the file can be set off and ignored by YACC by surrounding them with %{ and %} . This ability is typically used in YACC C files to insert definitions and #include statements. BYACC/Java uses this area for the Java package and import statements. Everything after this will be wrapped in a Java class called parser. Thus all functions you write will become methods belonging to parser.
All of the user actions you write will be Java code inserted into a method called yyparse(). This means that an action may contain only statements and brace-delimited blocks '{ }', as in any method body. No classes or methods can be defined in the ACTIONS area. Of course, your code can instantiate classes and call methods.

Here is our example Java implementation of the classic YACC calculator demo:
%{
import java.lang.Math;
import java.io.*;
import java.util.StringTokenizer;
%}
     
/* YACC Declarations */
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG     /* negation--unary minus */
%right '^'    /* exponentiation        */
     
/* Grammar follows */
%%
input:    /* empty string */
             | input line
     ;
     
line:     '\n'
          | exp '\n'  { System.out.println(" " + $1.dval + " "); }
     ;
     
exp:      NUM                { $$ = $1;         }
             | exp '+' exp        { $$.dval = $1.dval + $3.dval;    }
             | exp '-' exp        { $$.dval = $1.dval - $3.dval;    }
             | exp '*' exp        { $$.dval = $1.dval * $3.dval;    }
             | exp '/' exp        { $$.dval = $1.dval / $3.dval;    }
             | '-' exp  %prec NEG { $$.dval = -$2.dval;        }
             | exp '^' exp        { $$.dval = Math.pow($1.dval, $3.dval); }
             | '(' exp ')'        { $$.dval = $2.dval;         }
     ;
%%

String ins;
StringTokenizer st;

void yyerror(String s)
{
  System.out.println("par:"+s);
}

boolean newline;
int yylex()
{
String s;
int tok;
Double d;
  //System.out.print("yylex ");
  if (!st.hasMoreTokens())
    if (!newline)
      {
      newline=true;
      return '\n';  //So we look like classic YACC example
      }
    else
      return 0;
  s = st.nextToken();
  //System.out.println("tok:"+s);
  try
    {
    d = Double.valueOf(s);/*this may fail*/
    yylval = new parserval(d.doubleValue()); //SEE BELOW
    tok = NUM;
    }
  catch (Exception e)
    {
    tok = s.charAt(0);/*if not float, return char*/
    }
  return tok;
}

void dotest()
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.println("BYACC/Java Calculator Demo");
  System.out.println("Note: Since this example uses the StringTokenizer");
  System.out.println("for simplicity, you will need to separate the items");
  System.out.println("with spaces, i.e.:  '( 3 + 5 ) * 2'");
  while (true)
    {
    System.out.print("expression:");
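    try
      {
      ins = in.readLine();
      }
    catch (Exception e)
      {
      ins = null; //treat a read error like end of input
      }
    if (ins == null) //end of input: stop instead of passing null to StringTokenizer
      break;
    st = new StringTokenizer(ins);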
    newline=false;
    yyparse();
    }
}

public static void main(String args[])
{
  parser par = new parser(false);
  par.dotest();
}
If this code were in a file called calc.y, you would yacc-process it with the command:
yacc -j calc.y
This will generate the file parser.java, which can then be compiled by:
javac parser.java
to create the file parser.class which can be run with:
java parser
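
The demo asks for spaces between items because it uses a plain StringTokenizer. As a rough sketch of one way around that (not part of BYACC/Java), the three-argument StringTokenizer constructor can return the operator characters themselves as tokens. The class name CalcTokens and the method split are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not generated by BYACC/Java: splits calculator input
// into tokens without requiring spaces around operators and parentheses.
class CalcTokens {
    // Characters that end a number. Because returnDelims is true below, each
    // of them also comes back as its own one-character token.
    private static final String DELIMS = " \t+-*/^()";

    static List<String> split(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, DELIMS, true);
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.trim().isEmpty()) {
                continue; // skip the whitespace delimiters
            }
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("(3+5)*2")); // prints [(, 3, +, 5, ), *, 2]
    }
}
```

A yylex() built on this idea would hand out the list entries one at a time. One limitation of the sketch: since '-' and '+' are delimiters, a number in exponent notation such as 1e-5 would be split into several tokens.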

The same file, but using the semantic_type option, as in
yacc -j -s double tf.y
is available here.


Command Line Options

In addition to the normal yacc command line switches, I have supplied these:
 

User-Supplied Methods

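In order for javac to compile the code properly, the user must supply two methods in the YACC source, as shown in the calculator example above: int yylex(), which returns the next token (0 at end of input) and sets yylval, and void yyerror(String s), which reports an error message.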

About the Generated Java File

I suggest heartily that the user peruse the file parser.java to see how YACC's parsing algorithm works. I have done an immense amount of analysis and reverse engineering of the original BYACC sources. The Java code that is generated is heavily commented, and is amenable to debugging, and can provide a nice education in the workings of a YACC parser.

Normally, the class generated is made an extension of Thread, as a convenience, so that parsing may be performed as a background thread, allowing the current execution to continue unimpeded.  A run() method and a constructor are also inserted into the code.
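
The background-parsing idea can be sketched with a stand-in class. BackgroundParser below is hypothetical and is not the generated parser; in particular, the assumption that run() simply calls yyparse() is mine, so check the generated parser.java for what its run() method really does.

```java
// Stand-in for a generated parser class; the real one is produced by "yacc -j".
class BackgroundParser extends Thread {
    private volatile int result = -1;

    // Placeholder for the generated yyparse(); here it only pretends to succeed.
    int yyparse() {
        return 0;
    }

    // Assumption: the thread body runs one parse and stores its return value.
    @Override
    public void run() {
        result = yyparse();
    }

    int result() {
        return result;
    }

    // Starts a parse on its own thread, waits for it, and returns its result.
    static int parseInBackground() {
        BackgroundParser p = new BackgroundParser();
        p.start(); // the caller could do other work here while parsing proceeds
        try {
            p.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return p.result();
    }

    public static void main(String[] args) {
        System.out.println("parse result: " + parseInBackground());
    }
}
```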

However, it may occur that the programmer needs to extend a different class.  In this case, the -x<classname> option is provided, which will create an alternate extension.  Since it is impossible to predict the needs of the other class, the run() and constructor will be omitted.


About the parserval (previously 'semantic') class

Previously, BYACC/Java gave the programmer a choice of either double or int semantic (the value of a number or string) values.  This worked for very simple parsing, but was extremely limiting.  It would have been very difficult to mix value types within a file, thus making things like interpreters and compilers impossible.

Starting with this version, the semantic value is stored in a public class called parserval, which is defined as follows:
 
public class parserval 
{ 
  public int ival; 
  public double dval; 
  public String sval; 
  public Object obj; 
  public parserval(int val) 
  { 
    ival=val; 
  } 
  public parserval(double val) 
  { 
    dval=val; 
  } 
  public parserval(String val) 
  { 
    sval=val; 
  } 
  public parserval(Object val) 
  { 
    obj=val; 
  } 
}//end class
So now a semantic value can be an int, a double, a String, or an Object. In your scanner (or something that yylex() calls), you may use this like:
 
yylval = new parserval(doubleval);
yylval = new parserval(integerval);
...or even something like...
yylval = new parserval(new myTypeOfObject());
And on the Left Hand Side (the YACC side) you can use the values of the $ and the $$ just as easily:
 
$$.ival = $1.ival + $2.ival; 
$$.dval = $1.dval - $2.dval;
 

A side effect of using this class is that the default parser no longer fits into one .class file; however, the resulting parserval.class is extremely small.
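
To make the behaviour of the four constructors concrete, here is a small self-contained sketch. DemoVal is a hypothetical stand-in that mirrors the parserval listing above (renamed so that it does not clash with the generated class), and subtract imitates an action such as { $$.dval = $1.dval - $3.dval; }.

```java
// Hypothetical stand-in with the same fields and constructors as parserval.
class DemoVal {
    int ival;
    double dval;
    String sval;
    Object obj;

    DemoVal(int val)    { ival = val; }
    DemoVal(double val) { dval = val; }
    DemoVal(String val) { sval = val; }
    DemoVal(Object val) { obj = val; }
}

class DemoValUsage {
    // Imitates a grammar action that combines two double-valued operands.
    static DemoVal subtract(DemoVal left, DemoVal right) {
        return new DemoVal(left.dval - right.dval);
    }

    public static void main(String[] args) {
        DemoVal diff = subtract(new DemoVal(7.5), new DemoVal(2.5));
        System.out.println(diff.dval); // prints 5.0

        // Each constructor fills exactly one field; the others keep their
        // defaults, so an int-constructed value reads as 0.0 through dval.
        DemoVal two = new DemoVal(2);
        System.out.println(two.ival + " " + two.dval); // prints 2 0.0

        // A String argument selects the String constructor unless it is cast.
        DemoVal asString = new DemoVal("text");
        DemoVal asObject = new DemoVal((Object) "text");
        System.out.println(asString.sval + " " + asObject.sval); // prints text null
    }
}
```

The point to watch is the second case: a scanner that builds values with the int constructor and actions that read dval will silently see 0.0, so the scanner and the actions need to agree on which field they use.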


Why the name change from "semantic" to "parserval"? Do you hate the users or what?

No, of course not! The reason is this: Because of popular demand, and because it makes sense, we made the change from an inner class to a public one. This allows external classes to easily access the semantic value, and saves BYACC users work. This requires a public name, which is visible from all classes in the current project.

One of the needs of users is to have different parsers in a given project for different purposes. BYACC/Java allows you to change the name of the generated class with a command-line flag. The default parser class is parser. Now, to be unique, the semantic class's name must be tied to the parent class, also. The logical path would be to name the class parsersemantic or parser2semantic or myparsersemantic. Well, I thought that was just getting to be ridiculous, so I just shortened the extension to val, which is what the class is anyway.

My apologies about the change, but if you consider the problem, you will realize that it had to be done. I suggest merely using a search-and-replace in your text editor to make the change.

Examples

As time goes on, we will provide some examples and templates to speed you on your way.

Why?

Because someone said YACC couldn't be done in Java. Silly person!

Credits

Of course, thanks go to Tom Corbett for BYACC, a fine implementation of YACC. And thanks to his altruistic nature for putting it in the Public Domain. I just added the Java switch. Check the ACKNOWLEDGEMENTS file for more contributors. 

Availability

The modified/cleaned up/updated Berkeley Yacc source files, GNU makefile, and Borland C++ 5 project file can be obtained here.


Also, a couple of binaries for BYACC/Java can be obtained here. These are native console applications, so they do not require any class libraries to work. Of course, you will need a Java development environment to process the generated source files.
And remember, this version of YACC also parses "standard" C/C++ YACC source files!

 

Binary for SunOS/Solaris Approx 43k
Binary for SGI/IRIX Approx 70k
Binary for Win95/NT -(New! No runtime DLLs required!) Approx 59k
Source files, GNU Makefile, Borland project in a GZIP TAR file Approx 40k
Check here often, as updates/upgrades/bug fixes are continuously being made. 

Questions

YACC has already been described many times, and in great detail, so I would appreciate that BYACC/Java users' questions about YACC and LALR parsers be directed to the many good sources available on the Net and in print. In other words, I will not do your homework for you! ;-) However, I would be happy to help with the Java file generation, as that is the portion that I have implemented. 

Links

 
Try some cutting-edge technology! 
Owwwl - Agent-based Web search tool 
Intelligent Computer-Aided Training -
The Training Technology!
LinCom Avionics Systems Group 
Our home page. Please visit! 
 
Gamelan - The Mecca of Java 

An excellent place to look for Java resources: 


For more information, please write Bob Jamison at LinCom-ASG.Com!
Last updated: 28 Nov 97