| YACC(1P) POSIX Programmer's Manual YACC(1P) |
| |
| |
| |
| PROLOG |
| This manual page is part of the POSIX Programmer's Man- |
| ual. The Linux implementation of this interface may |
| differ (consult the corresponding Linux manual page for |
| details of Linux behavior), or the interface may not be |
| implemented on Linux. |
| |
| NAME |
| yacc - yet another compiler compiler (DEVELOPMENT) |
| |
| SYNOPSIS |
| yacc [-dltv][-b file_prefix][-p sym_prefix] grammar |
| |
| DESCRIPTION |
| The yacc utility shall read a description of a context- |
| free grammar in grammar and write C source code, con- |
| forming to the ISO C standard, to a code file, and |
| optionally header information into a header file, in the |
| current directory. The C code shall define a function |
| and related routines and macros for an automaton that |
| executes a parsing algorithm meeting the requirements in |
| Algorithms . |
| |
| The form and meaning of the grammar are described in the |
| EXTENDED DESCRIPTION section. |
| |
| The C source code and header file shall be produced in a |
| form suitable as input for the C compiler (see c99 ). |
| |
| OPTIONS |
| The yacc utility shall conform to the Base Definitions |
| volume of IEEE Std 1003.1-2001, Section 12.2, Utility |
| Syntax Guidelines. |
| |
| The following options shall be supported: |
| |
| -b file_prefix |
| Use file_prefix instead of y as the prefix for |
| all output filenames. The code file y.tab.c, the |
| header file y.tab.h (created when -d is speci- |
| fied), and the description file y.output (created |
| when -v is specified), shall be changed to |
| file_prefix .tab.c, file_prefix .tab.h, and |
| file_prefix .output, respectively. |
| |
| -d Write the header file; by default only the code |
| file is written. The #define statements associate |
| the token codes assigned by yacc with the user- |
| declared token names. This allows source files |
| other than y.tab.c to access the token codes. |
| |
| -l Produce a code file that does not contain any |
| #line constructs. If this option is not present, |
| it is unspecified whether the code file or header |
| file contains #line directives. This should only |
| be used after the grammar and the associated |
| actions are fully debugged. |
| |
| -p sym_prefix |
| |
| Use sym_prefix instead of yy as the prefix for |
| all external names produced by yacc. The names |
| affected shall include the functions yyparse(), |
| yylex(), and yyerror(), and the variables yylval, |
| yychar, and yydebug. (In the remainder of this |
| section, the six symbols cited are referenced |
| using their default names only as a notational |
| convenience.) Local names may also be affected by |
| the -p option; however, the -p option shall not |
| affect #define symbols generated by yacc. |
| |
| -t Modify conditional compilation directives to per- |
| mit compilation of debugging code in the code |
| file. Runtime debugging statements shall always |
| be contained in the code file, but by default |
| conditional compilation directives prevent their |
| compilation. |
| |
| -v Write a file containing a description of the |
| parser and a report of conflicts generated by |
| ambiguities in the grammar. |
| |
| |
| OPERANDS |
| The following operand is required: |
| |
| grammar |
| A pathname of a file containing instructions, |
| hereafter called grammar, for which a parser is |
| to be created. The format for the grammar is |
| described in the EXTENDED DESCRIPTION section. |
| |
| |
| STDIN |
| Not used. |
| |
| INPUT FILES |
| The file grammar shall be a text file formatted as spec- |
| ified in the EXTENDED DESCRIPTION section. |
| |
| ENVIRONMENT VARIABLES |
| The following environment variables shall affect the |
| execution of yacc: |
| |
| LANG Provide a default value for the internationaliza- |
| tion variables that are unset or null. (See the |
| Base Definitions volume of IEEE Std 1003.1-2001, |
| Section 8.2, Internationalization Variables for |
| the precedence of internationalization variables |
| used to determine the values of locale cate- |
| gories.) |
| |
| LC_ALL If set to a non-empty string value, override the |
| values of all the other internationalization |
| variables. |
| |
| LC_CTYPE |
| Determine the locale for the interpretation of |
| sequences of bytes of text data as characters |
| (for example, single-byte as opposed to multi- |
| byte characters in arguments and input files). |
| |
| LC_MESSAGES |
| Determine the locale that should be used to |
| affect the format and contents of diagnostic mes- |
| sages written to standard error. |
| |
| NLSPATH |
| Determine the location of message catalogs for |
| the processing of LC_MESSAGES . |
| |
| |
| The LANG and LC_* variables affect the execution of the |
| yacc utility as stated. The main() function defined in |
| Yacc Library shall call: |
| |
| |
| setlocale(LC_ALL, "") |
| |
| and thus the program generated by yacc shall also be |
| affected by the contents of these variables at runtime. |
| |
| ASYNCHRONOUS EVENTS |
| Default. |
| |
| STDOUT |
| Not used. |
| |
| STDERR |
| If shift/reduce or reduce/reduce conflicts are detected |
| in grammar, yacc shall write a report of those conflicts |
| to the standard error in an unspecified format. |
| |
| Standard error shall also be used for diagnostic mes- |
| sages. |
| |
| OUTPUT FILES |
| The code file, the header file, and the description file |
| shall be text files. All are described in the following |
| sections. |
| |
| Code File |
| This file shall contain the C source code for the |
| yyparse() function. It shall contain code for the vari- |
| ous semantic actions with macro substitution performed |
| on them as described in the EXTENDED DESCRIPTION sec- |
| tion. It also shall contain a copy of the #define state- |
| ments in the header file. If a %union declaration is |
| used, the declaration for YYSTYPE shall also be included |
| in this file. |
| |
| Header File |
| The header file shall contain #define statements that |
| associate the token numbers with the token names. This |
| allows source files other than the code file to access |
| the token codes. If a %union declaration is used, the |
| declaration for YYSTYPE and an extern YYSTYPE yylval |
| declaration shall also be included in this file. |
| |
| Description File |
| The description file shall be a text file containing a |
| description of the state machine corresponding to the |
| parser, using an unspecified format. Limits for internal |
| tables (see Limits ) shall also be reported, in an |
| implementation-defined manner. (Some implementations may |
| use dynamic allocation techniques and have no specific |
| limit values to report.) |
| |
| EXTENDED DESCRIPTION |
| The yacc command accepts a language that is used to |
| define a grammar for a target language to be parsed by |
| the tables and code generated by yacc. The language |
| accepted by yacc as a grammar for the target language is |
| described below using the yacc input language itself. |
| |
| The input grammar includes rules describing the input |
| structure of the target language and code to be invoked |
| when these rules are recognized to provide the associ- |
| ated semantic action. The code to be executed shall |
| appear as bodies of text that are intended to be C-lan- |
| guage code. The C-language inclusions are presumed to |
| form a correct function when processed by yacc into its |
| output files. The code included in this way shall be |
| executed during the recognition of the target language. |
| |
| Given a grammar, the yacc utility generates the files |
| described in the OUTPUT FILES section. The code file can |
| be compiled and linked using c99. If the declaration and |
| programs sections of the grammar file did not include |
| definitions of main(), yylex(), and yyerror(), the com- |
| piled output requires linking with externally supplied |
| versions of those functions. Default versions of main() |
| and yyerror() are supplied in the yacc library and can |
| be linked in by using the -l y operand to c99. The yacc |
| library interfaces need not support interfaces with |
| other than the default yy symbol prefix. The application |
| provides the lexical analyzer function, yylex(); the lex |
| utility is specifically designed to generate such a rou- |
| tine. |
| |
| Input Language |
| The application shall ensure that every specification |
| file consists of three sections in order: declarations, |
| grammar rules, and programs, separated by double percent |
| signs ( "%%" ). The declarations and programs sections |
| can be empty. If the latter is empty, the preceding "%%" |
| mark separating it from the rules section can be omit- |
| ted. |
| |
| The input is free form text following the structure of |
| the grammar defined below. |
| |
| Lexical Structure of the Grammar |
| The <blank>s, <newline>s, and <form-feed>s shall be |
| ignored, except that the application shall ensure that |
| they do not appear in names or multi-character reserved |
| symbols. Comments shall be enclosed in "/* ... */", and |
| can appear wherever a name is valid. |
| |
| Names are of arbitrary length, made up of letters, peri- |
| ods ( '.' ), underscores ( '_' ), and non-initial dig- |
| its. Uppercase and lowercase letters are distinct. Con- |
| forming applications shall not use names beginning in yy |
| or YY since the yacc parser uses such names. Many of the |
| names appear in the final output of yacc, and thus they |
| should be chosen to conform with any additional rules |
| created by the C compiler to be used. In particular they |
| appear in #define statements. |
| |
| A literal shall consist of a single character enclosed |
| in single-quotes ( '" ). All of the escape sequences |
| supported for character constants by the ISO C standard |
| shall be supported by yacc. |
| |
| The relationship with the lexical analyzer is discussed |
| in detail below. |
| |
| The application shall ensure that the NUL character is |
| not used in grammar rules or literals. |
| |
| Declarations Section |
| The declarations section is used to define the symbols |
| used to define the target language and their relation- |
| ship with each other. In particular, much of the addi- |
| tional information required to resolve ambiguities in |
| the context-free grammar for the target language is pro- |
| vided here. |
| |
| Usually yacc assigns the relationship between the sym- |
| bolic names it generates and their underlying numeric |
| value. The declarations section makes it possible to |
| control the assignment of these values. |
| |
| It is also possible to keep semantic information associ- |
| ated with the tokens currently on the parse stack in a |
| user-defined C-language union, if the members of the |
| union are associated with the various names in the gram- |
| mar. The declarations section provides for this as well. |
| |
| The first group of declarators below all take a list of |
| names as arguments. That list can optionally be pre- |
| ceded by the name of a C union member (called a tag |
| below) appearing within '<' and '>' . (As an exception |
| to the typographical conventions of the rest of this |
| volume of IEEE Std 1003.1-2001, in this case <tag> does |
| not represent a metavariable, but the literal angle |
| bracket characters surrounding a symbol.) The use of tag |
| specifies that the tokens named on this line shall be of |
| the same C type as the union member referenced by tag. |
| This is discussed in more detail below. |
| |
| For lists used to define tokens, the first appearance of |
| a given token can be followed by a positive integer (as |
| a string of decimal digits). If this is done, the under- |
| lying value assigned to it for lexical purposes shall be |
| taken to be that number. |
| |
| The following declares name to be a token: |
| |
| |
| %token [<tag>] name [number][name [number]]... |
| |
| If tag is present, the C type for all tokens on this |
| line shall be declared to be the type referenced by tag. |
| If a positive integer, number, follows a name, that |
| value shall be assigned to the token. |
| |
| The following declares name to be a token, and assigns |
| precedence to it: |
| |
| |
| %left [<tag>] name [number][name [number]]... |
| %right [<tag>] name [number][name [number]]... |
| |
| One or more lines, each beginning with one of these sym- |
| bols, can appear in this section. All tokens on the same |
| line have the same precedence level and associativity; |
| the lines are in order of increasing precedence or bind- |
| ing strength. %left denotes that the operators on that |
| line are left associative, and %right similarly denotes |
| right associative operators. If tag is present, it shall |
| declare a C type for names as described for %token. |
| |
| The following declares name to be a token, and indicates |
| that this cannot be used associatively: |
| |
| |
| %nonassoc [<tag>] name [number][name [number]]... |
| |
| If the parser encounters associative use of this token |
| it reports an error. If tag is present, it shall declare |
| a C type for names as described for %token. |
| |
| The following declares that union member names are non- |
| terminals, and thus it is required to have a tag field |
| at its beginning: |
| |
| |
| %type <tag> name... |
| |
| Because it deals with non-terminals only, assigning a |
| token number or using a literal is also prohibited. If |
| this construct is present, yacc shall perform type |
| checking; if this construct is not present, the parse |
| stack shall hold only the int type. |
| |
| Every name used in grammar not defined by a %token, |
| %left, %right, or %nonassoc declaration is assumed to |
| represent a non-terminal symbol. The yacc utility shall |
| report an error for any non-terminal symbol that does |
| not appear on the left side of at least one grammar |
| rule. |
| |
| Once the type, precedence, or token number of a name is |
| specified, it shall not be changed. If the first decla- |
| ration of a token does not assign a token number, yacc |
| shall assign a token number. Once this assignment is |
| made, the token number shall not be changed by explicit |
| assignment. |
| |
| The following declarators do not follow the previous |
| pattern. |
| |
| The following declares the non-terminal name to be the |
| start symbol, which represents the largest, most general |
| structure described by the grammar rules: |
| |
| |
| %start name |
| |
| By default, it is the left-hand side of the first gram- |
| mar rule; this default can be overridden with this dec- |
| laration. |
| |
| The following declares the yacc value stack to be a |
| union of the various types of values desired: |
| |
| |
| %union { body of union (in C) } |
| |
| By default, the values returned by actions (see below) |
| and the lexical analyzer shall be of type int. The yacc |
| utility keeps track of types, and it shall insert corre- |
| sponding union member names in order to perform strict |
| type checking of the resulting parser. |
| |
| Alternatively, given that at least one <tag> construct |
| is used, the union can be declared in a header file |
| (which shall be included in the declarations section by |
| using a #include construct within %{ and %}), and a |
| typedef used to define the symbol YYSTYPE to represent |
| this union. The effect of %union is to provide the dec- |
| laration of YYSTYPE directly from the yacc input. |
| |
| C-language declarations and definitions can appear in |
| the declarations section, enclosed by the following |
| marks: |
| |
| |
| %{ ... %} |
| |
| These statements shall be copied into the code file, and |
| have global scope within it so that they can be used in |
| the rules and program sections. |
| |
| The application shall ensure that the declarations sec- |
| tion is terminated by the token %%. |
| |
| Grammar Rules in yacc |
| The rules section defines the context-free grammar to be |
| accepted by the function yacc generates, and associates |
| with those rules C-language actions and additional |
| precedence information. The grammar is described below, |
| and a formal definition follows. |
| |
| The rules section is comprised of one or more grammar |
| rules. A grammar rule has the form: |
| |
| |
| A : BODY ; |
| |
| The symbol A represents a non-terminal name, and BODY |
| represents a sequence of zero or more names, literals, |
| and semantic actions that can then be followed by |
| optional precedence rules. Only the names and literals |
| participate in the formation of the grammar; the seman- |
| tic actions and precedence rules are used in other ways. |
| The colon and the semicolon are yacc punctuation. If |
| there are several successive grammar rules with the same |
| left-hand side, the vertical bar '|' can be used to |
| avoid rewriting the left-hand side; in this case the |
| semicolon appears only after the last rule. The BODY |
| part can be empty (or empty of names and literals) to |
| indicate that the non-terminal symbol matches the empty |
| string. |
| |
| The yacc utility assigns a unique number to each rule. |
| Rules using the vertical bar notation are distinct |
| rules. The number assigned to the rule appears in the |
| description file. |
| |
| The elements comprising a BODY are: |
| |
| name, literal |
| These form the rules of the grammar: name is |
| either a token or a non-terminal; literal stands |
| for itself (less the lexically required quotation |
| marks). |
| |
| semantic action |
| |
| With each grammar rule, the user can associate |
| actions to be performed each time the rule is |
| recognized in the input process. (Note that the |
| word "action" can also refer to the actions of |
| the parser-shift, reduce, and so on.) |
| |
| These actions can return values and can obtain the val- |
| ues returned by previous actions. These values are kept |
| in objects of type YYSTYPE (see %union). The result |
| value of the action shall be kept on the parse stack |
| with the left-hand side of the rule, to be accessed by |
| other reductions as part of their right-hand side. By |
| using the <tag> information provided in the declarations |
| section, the code generated by yacc can be strictly type |
| checked and contain arbitrary information. In addition, |
| the lexical analyzer can provide the same kinds of val- |
| ues for tokens, if desired. |
| |
| An action is an arbitrary C statement and as such can do |
| input or output, call subprograms, and alter external |
| variables. An action is one or more C statements |
| enclosed in curly braces '{' and '}' . |
| |
| Certain pseudo-variables can be used in the action. |
| These are macros for access to data structures known |
| internally to yacc. |
| |
| $$ |
| The value of the action can be set by assigning |
| it to $$. If type checking is enabled and the |
| type of the value to be assigned cannot be deter- |
| mined, a diagnostic message may be generated. |
| |
| $number |
| This refers to the value returned by the compo- |
| nent specified by the token number in the right |
| side of a rule, reading from left to right; num- |
| ber can be zero or negative. If number is zero or |
| negative, it refers to the data associated with |
| the name on the parser's stack preceding the |
| leftmost symbol of the current rule. (That is, |
| "$0" refers to the name immediately preceding the |
| leftmost name in the current rule to be found on |
| the parser's stack and "$-1" refers to the symbol |
| to its left.) If number refers to an element past |
| the current point in the rule, or beyond the bot- |
| tom of the stack, the result is undefined. If |
| type checking is enabled and the type of the |
| value to be assigned cannot be determined, a |
| diagnostic message may be generated. |
| |
| $<tag>number |
| |
| These correspond exactly to the corresponding |
| symbols without the tag inclusion, but allow for |
| strict type checking (and preclude unwanted type |
| conversions). The effect is that the macro is |
| expanded to use tag to select an element from the |
| YYSTYPE union (using dataname.tag). This is par- |
| ticularly useful if number is not positive. |
| |
| $<tag>$ |
| This imposes on the reference the type of the |
| union member referenced by tag. This construction |
| is applicable when a reference to a left context |
| value occurs in the grammar, and provides yacc |
| with a means for selecting a type. |
| |
| |
| Actions can occur anywhere in a rule (not just at the |
| end); an action can access values returned by actions to |
| its left, and in turn the value it returns can be |
| accessed by actions to its right. An action appearing |
| in the middle of a rule shall be equivalent to replacing |
| the action with a new non-terminal symbol and adding an |
| empty rule with that non-terminal symbol on the left- |
| hand side. The semantic action associated with the new |
| rule shall be equivalent to the original action. The use |
| of actions within rules might introduce conflicts that |
| would not otherwise exist. |
| |
| By default, the value of a rule shall be the value of |
| the first element in it. If the first element does not |
| have a type (particularly in the case of a literal) and |
| type checking is turned on by %type, an error message |
| shall result. |
| |
| precedence |
| The keyword %prec can be used to change the |
| precedence level associated with a particular |
| grammar rule. Examples of this are in cases where |
| a unary and binary operator have the same sym- |
| bolic representation, but need to be given dif- |
| ferent precedences, or where the handling of an |
| ambiguous if-else construction is necessary. The |
| reserved symbol %prec can appear immediately |
| after the body of the grammar rule and can be |
| followed by a token name or a literal. It shall |
| cause the precedence of the grammar rule to |
| become that of the following token name or lit- |
| eral. The action for the rule as a whole can fol- |
| low %prec. |
| |
| |
| If a program section follows, the application shall |
| ensure that the grammar rules are terminated by %%. |
| |
| Programs Section |
| The programs section can include the definition of the |
| lexical analyzer yylex(), and any other functions; for |
| example, those used in the actions specified in the |
| grammar rules. It is unspecified whether the programs |
| section precedes or follows the semantic actions in the |
| output file; therefore, if the application contains any |
| macro definitions and declarations intended to apply to |
| the code in the semantic actions, it shall place them |
| within "%{ ... %}" in the declarations section. |
| |
| Input Grammar |
| The following input to yacc yields a parser for the |
| input to yacc. This formal syntax takes precedence over |
| the preceding text syntax description. |
| |
| The lexical structure is defined less precisely; Lexical |
| Structure of the Grammar defines most terms. The corre- |
| spondence between the previous terms and the tokens |
| below is as follows. |
| |
| IDENTIFIER |
| This corresponds to the concept of name, given |
| previously. It also includes literals as defined |
| previously. |
| |
| C_IDENTIFIER |
| This is a name, and additionally it is known to |
| be followed by a colon. A literal cannot yield |
| this token. |
| |
| NUMBER A string of digits (a non-negative decimal inte- |
| ger). |
| |
| TYPE, LEFT, MARK, LCURL, RCURL |
| |
| These correspond directly to %type, %left, %%, |
| %{, and %}. |
| |
| { ... } |
| This indicates C-language source code, with the |
| possible inclusion of '$' macros as discussed |
| previously. |
| |
| |
| |
| /* Grammar for the input to yacc. */ |
| /* Basic entries. */ |
| /* The following are recognized by the lexical analyzer. */ |
| |
| |
| %token IDENTIFIER /* Includes identifiers and literals */ |
| %token C_IDENTIFIER /* identifier (but not literal) |
| followed by a :. */ |
| %token NUMBER /* [0-9][0-9]* */ |
| |
| |
| /* Reserved words : %type=>TYPE %left=>LEFT, and so on */ |
| |
| |
| %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION |
| |
| |
| %token MARK /* The %% mark. */ |
| %token LCURL /* The %{ mark. */ |
| %token RCURL /* The %} mark. */ |
| |
| |
| /* 8-bit character literals stand for themselves; */ |
| /* tokens have to be defined for multi-byte characters. */ |
| |
| |
| %start spec |
| |
| |
| %% |
| |
| |
| spec : defs MARK rules tail |
| ; |
| tail : MARK |
| { |
| /* In this action, set up the rest of the file. */ |
| } |
| | /* Empty; the second MARK is optional. */ |
| ; |
| defs : /* Empty. */ |
| | defs def |
| ; |
| def : START IDENTIFIER |
| | UNION |
| { |
| /* Copy union definition to output. */ |
| } |
| | LCURL |
| { |
| /* Copy C code to output file. */ |
| } |
| RCURL |
| | rword tag nlist |
| ; |
| rword : TOKEN |
| | LEFT |
| | RIGHT |
| | NONASSOC |
| | TYPE |
| ; |
| tag : /* Empty: union tag ID optional. */ |
| | '<' IDENTIFIER '>' |
| ; |
| nlist : nmno |
| | nlist nmno |
| ; |
| nmno : IDENTIFIER /* Note: literal invalid with % type. */ |
| | IDENTIFIER NUMBER /* Note: invalid with % type. */ |
| ; |
| |
| |
| /* Rule section */ |
| |
| |
| rules : C_IDENTIFIER rbody prec |
| | rules rule |
| ; |
| rule : C_IDENTIFIER rbody prec |
| | '|' rbody prec |
| ; |
| rbody : /* empty */ |
| | rbody IDENTIFIER |
| | rbody act |
| ; |
| act : '{' |
| { |
| /* Copy action, translate $$, and so on. */ |
| } |
| '}' |
| ; |
| prec : /* Empty */ |
| | PREC IDENTIFIER |
| | PREC IDENTIFIER act |
| | prec ';' |
| ; |
| |
| Conflicts |
| The parser produced for an input grammar may contain |
| states in which conflicts occur. The conflicts occur |
| because the grammar is not LALR(1). An ambiguous grammar |
| always contains at least one LALR(1) conflict. The yacc |
| utility shall resolve all conflicts, using either |
| default rules or user-specified precedence rules. |
| |
| Conflicts are either shift/reduce conflicts or |
| reduce/reduce conflicts. A shift/reduce conflict is |
| where, for a given state and lookahead symbol, both a |
| shift action and a reduce action are possible. A |
| reduce/reduce conflict is where, for a given state and |
| lookahead symbol, reductions by two different rules are |
| possible. |
| |
| The rules below describe how to specify what actions to |
| take when a conflict occurs. Not all shift/reduce con- |
| flicts can be successfully resolved this way because the |
| conflict may be due to something other than ambiguity, |
| so incautious use of these facilities can cause the lan- |
| guage accepted by the parser to be much different from |
| that which was intended. The description file shall con- |
| tain sufficient information to understand the cause of |
| the conflict. Where ambiguity is the reason either the |
| default or explicit rules should be adequate to produce |
| a working parser. |
| |
| The declared precedences and associativities (see Decla- |
| rations Section ) are used to resolve parsing conflicts |
| as follows: |
| |
| 1. A precedence and associativity is associated with |
| each grammar rule; it is the precedence and associa- |
| tivity of the last token or literal in the body of |
| the rule. If the %prec keyword is used, it overrides |
| this default. Some grammar rules might not have both |
| precedence and associativity. |
| |
| |
| 2. If there is a shift/reduce conflict, and both the |
| grammar rule and the input symbol have precedence |
| and associativity associated with them, then the |
| conflict is resolved in favor of the action (shift |
| or reduce) associated with the higher precedence. If |
| the precedences are the same, then the associativity |
| is used; left associative implies reduce, right |
| associative implies shift, and non-associative |
| implies an error in the string being parsed. |
| |
| |
| 3. When there is a shift/reduce conflict that cannot be |
| resolved by rule 2, the shift is done. Conflicts |
| resolved this way are counted in the diagnostic out- |
| put described in Error Handling . |
| |
| |
| 4. When there is a reduce/reduce conflict, a reduction |
| is done by the grammar rule that occurs earlier in |
| the input sequence. Conflicts resolved this way are |
| counted in the diagnostic output described in Error |
| Handling . |
| |
| |
| Conflicts resolved by precedence or associativity shall |
| not be counted in the shift/reduce and reduce/reduce |
| conflicts reported by yacc on either standard error or |
| in the description file. |
| |
| Error Handling |
| The token error shall be reserved for error handling. |
| The name error can be used in grammar rules. It indi- |
| cates places where the parser can recover from a syntax |
| error. The default value of error shall be 256. Its |
| value can be changed using a %token declaration. The |
| lexical analyzer should not return the value of error. |
| |
| The parser shall detect a syntax error when it is in a |
| state where the action associated with the lookahead |
| symbol is error. A semantic action can cause the parser |
| to initiate error handling by executing the macro YYER- |
| ROR. When YYERROR is executed, the semantic action |
| passes control back to the parser. YYERROR cannot be |
| used outside of semantic actions. |
| |
| When the parser detects a syntax error, it normally |
| calls yyerror() with the character string "syntax error" |
| as its argument. The call shall not be made if the |
| parser is still recovering from a previous error when |
| the error is detected. The parser is considered to be |
| recovering from a previous error until the parser has |
| shifted over at least three normal input symbols since |
| the last error was detected or a semantic action has |
| executed the macro yyerrok. The parser shall not call |
| yyerror() when YYERROR is executed. |
| |
| The macro function YYRECOVERING shall return 1 if a syn- |
| tax error has been detected and the parser has not yet |
| fully recovered from it. Otherwise, zero shall be |
| returned. |
| |
| When a syntax error is detected by the parser, the |
| parser shall check if a previous syntax error has been |
| detected. If a previous error was detected, and if no |
| normal input symbols have been shifted since the preced- |
| ing error was detected, the parser checks if the looka- |
| head symbol is an endmarker (see Interface to the Lexi- |
| cal Analyzer ). If it is, the parser shall return with a |
| non-zero value. Otherwise, the lookahead symbol shall be |
| discarded and normal parsing shall resume. |
| |
| When YYERROR is executed or when the parser detects a |
| syntax error and no previous error has been detected, or |
| at least one normal input symbol has been shifted since |
| the previous error was detected, the parser shall pop |
| back one state at a time until the parse stack is empty |
| or the current state allows a shift over error. If the |
| parser empties the parse stack, it shall return with a |
| non-zero value. Otherwise, it shall shift over error and |
| then resume normal parsing. If the parser reads a looka- |
| head symbol before the error was detected, that symbol |
| shall still be the lookahead symbol when parsing is |
| resumed. |
| |
| The macro yyerrok in a semantic action shall cause the |
| parser to act as if it has fully recovered from any pre- |
| vious errors. The macro yyclearin shall cause the parser |
| to discard the current lookahead token. If the current |
| lookahead token has not yet been read, yyclearin shall |
| have no effect. |
| |
| The macro YYACCEPT shall cause the parser to return with |
| the value zero. The macro YYABORT shall cause the parser |
| to return with a non-zero value. |
| |
| Interface to the Lexical Analyzer |
| The yylex() function is an integer-valued function that |
| returns a token number representing the kind of token |
| read. If there is a value associated with the token |
| returned by yylex() (see the discussion of tag above), |
| it shall be assigned to the external variable yylval. |
| |
| If the parser and yylex() do not agree on these token |
| numbers, reliable communication between them cannot |
| occur. For (single-byte character) literals, the token |
| is simply the numeric value of the character in the cur- |
| rent character set. The numbers for other tokens can |
| either be chosen by yacc, or chosen by the user. In |
| either case, the #define construct of C is used to allow |
| yylex() to return these numbers symbolically. The |
| #define statements are put into the code file, and the |
| header file if that file is requested. The set of char- |
| acters permitted by yacc in an identifier is larger than |
| that permitted by C. Token names found to contain such |
| characters shall not be included in the #define declara- |
| tions. |
| |
| If the token numbers are chosen by yacc, the tokens |
| other than literals shall be assigned numbers greater |
| than 256, although no order is implied. A token can be |
| explicitly assigned a number by following its first |
| appearance in the declarations section with a number. |
| Names and literals not defined this way retain their |
| default definition. All token numbers assigned by yacc |
| shall be unique and distinct from the token numbers used |
| for literals and user-assigned tokens. If duplicate |
| token numbers cause conflicts in parser generation, yacc |
| shall report an error; otherwise, it is unspecified |
| whether the token assignment is accepted or an error is |
| reported. |
| |
| The end of the input is marked by a special token called |
| the endmarker, which has a token number that is zero or |
| negative. (These values are invalid for any other |
| token.) All lexical analyzers shall return zero or nega- |
| tive as a token number upon reaching the end of their |
| input. If the tokens up to, but excluding, the endmarker |
| form a structure that matches the start symbol, the |
| parser shall accept the input. If the endmarker is seen |
| in any other context, it shall be considered an error. |
| |
| Completing the Program |
| In addition to yyparse() and yylex(), the functions |
| yyerror() and main() are required to make a complete |
| program. The application can supply main() and yyer- |
| ror(), or those routines can be obtained from the yacc |
| library. |
| |
| Yacc Library |
| The following functions shall appear only in the yacc |
| library accessible through the -l y operand to c99; they |
| can therefore be redefined by a conforming application: |
| |
| int main(void) |
| |
| This function shall call yyparse() and exit with |
| an unspecified value. Other actions within this |
| function are unspecified. |
| |
| int yyerror(const char *s) |
| |
| This function shall write the NUL-terminated |
| argument to standard error, followed by a <new- |
| line>. |
| |
| |
| The order of the -l y and -l l operands given to c99 is |
| significant; the application shall either provide its |
| own main() function or ensure that -l y precedes -l l. |
| |
| Debugging the Parser |
| The parser generated by yacc shall have diagnostic |
| facilities in it that can be optionally enabled at |
| either compile time or at runtime (if enabled at compile |
| time). The compilation of the runtime debugging code is |
| under the control of YYDEBUG, a preprocessor symbol. If |
| YYDEBUG has a non-zero value, the debugging code shall |
| be included. If its value is zero, the code shall not be |
| included. |
| |
| In parsers where the debugging code has been included, |
| the external int yydebug can be used to turn debugging |
| on (with a non-zero value) and off (zero value) at run- |
| time. The initial value of yydebug shall be zero. |
| |
| When -t is specified, the code file shall be built such |
| that, if YYDEBUG is not already defined at compilation |
| time (using the c99 -D YYDEBUG option, for example), |
| YYDEBUG shall be set explicitly to 1. When -t is not |
| specified, the code file shall be built such that, if |
| YYDEBUG is not already defined, it shall be set explic- |
| itly to zero. |
| |
| The format of the debugging output is unspecified but |
| includes at least enough information to determine the |
| shift and reduce actions, and the input symbols. It also |
| provides information about error recovery. |
| |
| Algorithms |
| The parser constructed by yacc implements an LALR(1) |
| parsing algorithm as documented in the literature. It is |
| unspecified whether the parser is table-driven or |
| direct-coded. |
| |
| A parser generated by yacc shall never request an input |
| symbol from yylex() while in a state where the only |
| actions other than the error action are reductions by a |
| single rule. |
| |
| The literature of parsing theory defines these concepts. |
| |
| Limits |
| The yacc utility may have several internal tables. The |
| minimum maximums for these tables are shown in the fol- |
| lowing table. The exact meaning of these values is |
| implementation-defined. The implementation shall define |
| the relationship between these values and between them |
| and any error messages that the implementation may gen- |
| erate should it run out of space for any internal struc- |
| ture. An implementation may combine groups of these |
| resources into a single pool as long as the total avail- |
| able to the user does not fall below the sum of the |
| sizes specified by this section. |
| |
| Table: Internal Limits in yacc |
| |
| Minimum |
| Limit Maximum Description |
| {NTERMS} 126 Number of tokens. |
| {NNONTERM} 200 Number of non-terminals. |
| {NPROD} 300 Number of rules. |
| {NSTATES} 600 Number of states. |
| {MEMSIZE} 5200 Length of rules. The total length, in |
| names (tokens and non-terminals), of all |
| the rules of the grammar. The left-hand |
| side is counted for each rule, even if |
| it is not explicitly repeated, as speci- |
| fied in Grammar Rules in yacc . |
| {ACTSIZE} 4000 Number of actions. "Actions" here (and |
| in the description file) refer to parser |
| actions (shift, reduce, and so on) not |
| to semantic actions defined in Grammar |
| Rules in yacc . |
| |
| EXIT STATUS |
| The following exit values shall be returned: |
| |
| 0 Successful completion. |
| |
| >0 An error occurred. |
| |
| |
| CONSEQUENCES OF ERRORS |
| If any errors are encountered, the run is aborted and |
| yacc exits with a non-zero status. Partial code files |
| and header files may be produced. The summary informa- |
| tion in the description file shall always be produced if |
| the -v flag is present. |
| |
| The following sections are informative. |
| |
| APPLICATION USAGE |
| Historical implementations experience name conflicts on |
| the names yacc.tmp, yacc.acts, yacc.debug, y.tab.c, |
| y.tab.h, and y.output if more than one copy of yacc is |
| running in a single directory at one time. The -b option |
| was added to overcome this problem. The related problem |
| of allowing multiple yacc parsers to be placed in the |
| same file was addressed by adding a -p option to over- |
| ride the previously hard-coded yy variable prefix. |
| |
| The description of the -p option specifies the minimal |
| set of function and variable names that cause conflict |
| when multiple parsers are linked together. YYSTYPE does |
| not need to be changed. Instead, the programmer can use |
| -b to give the header files for different parsers dif- |
| ferent names, and then the file with the yylex() for a |
| given parser can include the header for that parser. |
| Names such as yyclearerr do not need to be changed |
| because they are used only in the actions; they do not |
| have linkage. It is possible that an implementation has |
| other names, either internal ones for implementing |
| things such as yyclearerr, or providing non-standard |
| features that it wants to change with -p. |
| |
| Unary operators that are the same token as a binary |
| operator in general need their precedence adjusted. This |
| is handled by the %prec advisory symbol associated with |
| the particular grammar rule defining that unary opera- |
| tor. (See Grammar Rules in yacc .) Applications are not |
| required to use this operator for unary operators, but |
| the grammars that do not require it are rare. |
| |
| EXAMPLES |
| Access to the yacc library is obtained with library |
| search operands to c99. To use the yacc library main(): |
| |
| |
| c99 y.tab.c -l y |
| |
| Both the lex library and the yacc library contain |
| main(). To access the yacc main(): |
| |
| |
| c99 y.tab.c lex.yy.c -l y -l l |
| |
| This ensures that the yacc library is searched first, so |
| that its main() is used. |
| |
| The historical yacc libraries have contained two simple |
| functions that are normally coded by the application |
| programmer. These functions are similar to the follow- |
| ing code: |
| |
| |
| #include <locale.h> |
| int main(void) |
| { |
| extern int yyparse(); |
| |
| |
| setlocale(LC_ALL, ""); |
| |
| |
| /* If the following parser is one created by lex, the |
| application must be careful to ensure that LC_CTYPE |
| and LC_COLLATE are set to the POSIX locale. */ |
| (void) yyparse(); |
| return (0); |
| } |
| |
| |
| #include <stdio.h> |
| |
| |
| int yyerror(const char *msg) |
| { |
| (void) fprintf(stderr, "%s\n", msg); |
| return (0); |
| } |
| |
| RATIONALE |
| The references in may be helpful in constructing the |
| parser generator. The referenced DeRemer and Pennello |
| article (along with the works it references) describes a |
| technique to generate parsers that conform to this vol- |
| ume of IEEE Std 1003.1-2001. Work in this area contin- |
| ues to be done, so implementors should consult current |
| literature before doing any new implementations. The |
| original Knuth article is the theoretical basis for this |
| kind of parser, but the tables it generates are imprac- |
| tically large for reasonable grammars and should not be |
| used. The "equivalent to" wording is intentional to |
| assure that the best tables that are LALR(1) can be gen- |
| erated. |
| |
| There has been confusion between the class of grammars, |
| the algorithms needed to generate parsers, and the algo- |
| rithms needed to parse the languages. They are all rea- |
| sonably orthogonal. In particular, a parser generator |
| that accepts the full range of LR(1) grammars need not |
| generate a table any more complex than one that accepts |
| SLR(1) (a relatively weak class of LR grammars) for a |
| grammar that happens to be SLR(1). Such an implementa- |
| tion need not recognize the case, either; table compres- |
| sion can yield the SLR(1) table (or one even smaller |
| than that) without recognizing that the grammar is |
| SLR(1). The speed of an LR(1) parser for any class is |
| dependent more upon the table representation and com- |
| pression (or the code generation if a direct parser is |
| generated) than upon the class of grammar that the table |
| generator handles. |
| |
| The speed of the parser generator is somewhat dependent |
| upon the class of grammar it handles. However, the orig- |
| inal Knuth article algorithms for constructing LR |
| parsers were judged by its author to be impractically |
| slow at that time. Although full LR is more complex than |
| LALR(1), as computer speeds and algorithms improve, the |
| difference (in terms of acceptable wall-clock execution |
| time) is becoming less significant. |
| |
| Potential authors are cautioned that the referenced |
| DeRemer and Pennello article previously cited identifies |
| a bug (an over-simplification of the computation of |
| LALR(1) lookahead sets) in some of the LALR(1) algorithm |
| statements that preceded it to publication. They should |
| take the time to seek out that paper, as well as current |
| relevant work, particularly Aho's. |
| |
| The -b option was added to provide a portable method for |
| permitting yacc to work on multiple separate parsers in |
| the same directory. If a directory contains more than |
| one yacc grammar, and both grammars are constructed at |
| the same time (by, for example, a parallel make pro- |
| gram), conflict results. While the solution is not his- |
| torical practice, it corrects a known deficiency in his- |
| torical implementations. Corresponding changes were made |
| to all sections that referenced the filenames y.tab.c |
| (now "the code file"), y.tab.h (now "the header file"), |
| and y.output (now "the description file"). |
| |
| The grammar for yacc input is based on System V documen- |
| tation. The textual description shows there that the |
| ';' is required at the end of the rule. The grammar and |
| the implementation do not require this. (The use of |
| C_IDENTIFIER causes a reduce to occur in the right |
| place.) |
| |
| Also, in that implementation, the constructs such as |
| %token can be terminated by a semicolon, but this is not |
| permitted by the grammar. The keywords such as %token |
| can also appear in uppercase, which is again not dis- |
| cussed. In most places where '%' is used, '\' can be |
| substituted, and there are alternate spellings for some |
| of the symbols (for example, %LEFT can be "%<" or even |
| "\<" ). |
| |
| Historically, <tag> can contain any characters except |
| '>', including white space, in the implementation. How- |
| ever, since the tag must reference an ISO C standard |
| union member, in practice conforming implementations |
| need to support only the set of characters for ISO C |
| standard identifiers in this context. |
| |
| Some historical implementations are known to accept |
| actions that are terminated by a period. Historical |
| implementations often allow '$' in names. A conforming |
| implementation does not need to support either of these |
| behaviors. |
| |
| Deciding when to use %prec illustrates the difficulty in |
| specifying the behavior of yacc. There may be situations |
| in which the grammar is not, strictly speaking, in |
| error, and yet yacc cannot interpret it unambiguously. |
| The resolution of ambiguities in the grammar can in many |
| instances be resolved by providing additional informa- |
| tion, such as using %type or %union declarations. It is |
| often easier and it usually yields a smaller parser to |
| take this alternative when it is appropriate. |
| |
| The size and execution time of a program produced with- |
| out the runtime debugging code is usually smaller and |
| slightly faster in historical implementations. |
| |
| Statistics messages from several historical implementa- |
| tions include the following types of information: |
| |
| |
| n/512 terminals, n/300 non-terminals |
| n/600 grammar rules, n/1500 states |
| n shift/reduce, n reduce/reduce conflicts reported |
| n/350 working sets used |
| Memory: states, etc. n/15000, parser n/15000 |
| n/600 distinct lookahead sets |
| n extra closures |
| n shift entries, n exceptions |
| n goto entries |
| n entries saved by goto default |
| Optimizer space used: input n/15000, output n/15000 |
| n table entries, n zero |
| Maximum spread: n, Maximum offset: n |
| |
| The report of internal tables in the description file is |
| left implementation-defined because all aspects of these |
| limits are also implementation-defined. Some implementa- |
| tions may use dynamic allocation techniques and have no |
| specific limit values to report. |
| |
| The format of the y.output file is not given because |
| specification of the format was not seen to enhance |
| applications portability. The listing is primarily |
| intended to help human users understand and debug the |
| parser; use of y.output by a conforming application |
| script would be unusual. Furthermore, implementations |
| have not produced consistent output and no popular for- |
| mat was apparent. The format selected by the implementa- |
| tion should be human-readable, in addition to the |
| requirement that it be a text file. |
| |
| Standard error reports are not specifically described |
| because they are seldom of use to conforming applica- |
| tions and there was no reason to restrict implementa- |
| tions. |
| |
| Some implementations recognize "={" as equivalent to '{' |
| because it appears in historical documentation. This |
| construction was recognized and documented as obsolete |
| as long ago as 1978, in the referenced Yacc: Yet Another |
| Compiler-Compiler. This volume of IEEE Std 1003.1-2001 |
| chose to leave it as obsolete and omit it. |
| |
| Multi-byte characters should be recognized by the lexi- |
| cal analyzer and returned as tokens. They should not be |
| returned as multi-byte character literals. The token |
| error that is used for error recovery is normally |
| assigned the value 256 in the historical implementation. |
| Thus, the token value 256, which is used in many multi- |
| byte character sets, is not available for use as the |
| value of a user-defined token. |
| |
| FUTURE DIRECTIONS |
| None. |
| |
| SEE ALSO |
| c99, lex |
| |
| COPYRIGHT |
| Portions of this text are reprinted and reproduced in |
| electronic form from IEEE Std 1003.1, 2003 Edition, |
| Standard for Information Technology -- Portable Operat- |
| ing System Interface (POSIX), The Open Group Base Speci- |
| fications Issue 6, Copyright (C) 2001-2003 by the Insti- |
| tute of Electrical and Electronics Engineers, Inc and |
| The Open Group. In the event of any discrepancy between |
| this version and the original IEEE and The Open Group |
| Standard, the original IEEE and The Open Group Standard |
| is the referee document. The original Standard can be |
| obtained online at http://www.open- |
| group.org/unix/online.html . |
| |
| |
| |
| IEEE/The Open Group 2003 YACC(1P) |