
 YACC(1P)           POSIX Programmer's Manual           YACC(1P)


 PROLOG
        This  manual page is part of the POSIX Programmer's Man-
        ual.  The Linux implementation  of  this  interface  may
        differ  (consult the corresponding Linux manual page for
        details of Linux behavior), or the interface may not  be
        implemented on Linux.

 NAME
        yacc - yet another compiler compiler (DEVELOPMENT)

 SYNOPSIS
        yacc [-dltv][-b file_prefix][-p sym_prefix] grammar

 DESCRIPTION
        The  yacc utility shall read a description of a context-
        free grammar in grammar and write C  source  code,  con-
        forming  to  the  ISO C  standard,  to  a code file, and
        optionally header information into a header file, in the
        current  directory.  The  C code shall define a function
        and related routines and macros for  an  automaton  that
        executes a parsing algorithm meeting the requirements in
        Algorithms.

        The form and meaning of the grammar are described in the
        EXTENDED DESCRIPTION section.

        The C source code and header file shall be produced in a
        form suitable as input for the C compiler (see c99).

 OPTIONS
        The yacc utility shall conform to the  Base  Definitions
        volume  of  IEEE Std 1003.1-2001,  Section 12.2, Utility
        Syntax Guidelines.

        The following options shall be supported:

        -b  file_prefix
               Use file_prefix instead of y as  the  prefix  for
               all  output filenames. The code file y.tab.c, the
               header file y.tab.h (created when  -d  is  speci-
               fied), and the description file y.output (created
               when  -v  is  specified),  shall  be  changed  to
               file_prefix.tab.c, file_prefix.tab.h, and
               file_prefix.output, respectively.

        -d     Write the header file; by default only  the  code
               file is written. The #define statements associate
               the token codes assigned by yacc with  the  user-
               declared  token  names.  This allows source files
               other than y.tab.c to access the token codes.

        -l     Produce a code file that  does  not  contain  any
               #line constructs.  If this option is not present,
               it is unspecified whether the code file or header
               file  contains #line directives. This should only
               be used after  the  grammar  and  the  associated
               actions are fully debugged.

        -p  sym_prefix
               Use  sym_prefix  instead  of yy as the prefix for
               all external names produced by  yacc.  The  names
               affected  shall  include the functions yyparse(),
               yylex(), and yyerror(), and the variables yylval,
               yychar,  and  yydebug.  (In the remainder of this
               section, the six  symbols  cited  are  referenced
               using  their  default  names only as a notational
               convenience.) Local names may also be affected by
               the  -p  option; however, the -p option shall not
               affect #define symbols generated by yacc.

        -t     Modify conditional compilation directives to per-
               mit  compilation  of  debugging  code in the code
               file. Runtime debugging statements  shall  always
               be  contained  in  the  code file, but by default
               conditional compilation directives prevent  their
               compilation.

        -v     Write  a  file  containing  a  description of the
               parser and a report  of  conflicts  generated  by
               ambiguities in the grammar.
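
        As an illustration of the naming rule given for  -b,  the
        following  C  sketch  builds  an  output file name from a
        prefix and one of the suffixes .tab.c, .tab.h, or .output.
        The helper yacc_output_name() is hypothetical and is  not
        part  of  yacc; it only restates the rule under the stated
        assumptions.

```c
#include <stdio.h>

/*
 * Build an output file name as described for the -b option:
 * prefix followed by suffix, where suffix is ".tab.c", ".tab.h",
 * or ".output". A NULL prefix stands for the default prefix "y".
 * Returns 0 on success, or -1 if the name does not fit in buf.
 * (Hypothetical helper for illustration; not part of yacc.)
 */
int yacc_output_name(const char *prefix, const char *suffix,
                     char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s%s", prefix ? prefix : "y", suffix);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

        With  a prefix of calc and the suffix .tab.h, the sketch
        yields calc.tab.h; with no prefix and the  suffix  .tab.c
        it yields y.tab.c.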


 OPERANDS
        The following operand is required:

        grammar
               A  pathname  of  a  file containing instructions,
               hereafter called grammar, for which a  parser  is
               to  be  created.  The  format  for the grammar is
               described in the EXTENDED DESCRIPTION section.


 STDIN
        Not used.

 INPUT FILES
        The file grammar shall be a text file formatted as spec-
        ified in the EXTENDED DESCRIPTION section.

 ENVIRONMENT VARIABLES
        The  following  environment  variables  shall affect the
        execution of yacc:

        LANG   Provide a default value for the internationaliza-
               tion  variables  that are unset or null. (See the
               Base Definitions volume of  IEEE Std 1003.1-2001,
               Section  8.2,  Internationalization Variables for
               the precedence of internationalization  variables
               used  to  determine  the  values  of locale cate-
               gories.)

        LC_ALL If set to a non-empty string value, override  the
               values  of  all  the  other  internationalization
               variables.

        LC_CTYPE
               Determine the locale for  the  interpretation  of
               sequences  of  bytes  of  text data as characters
               (for example, single-byte as  opposed  to  multi-
               byte characters in arguments and input files).

        LC_MESSAGES
               Determine  the  locale  that  should  be  used to
               affect the format and contents of diagnostic mes-
               sages written to standard error.

        NLSPATH
               Determine  the  location  of message catalogs for
               the processing of LC_MESSAGES.


        The LANG and LC_* variables affect the execution of  the
        yacc  utility  as stated. The main() function defined in
        Yacc Library shall call:


               setlocale(LC_ALL, "")

        and thus the program generated by  yacc  shall  also  be
        affected  by the contents of these variables at runtime.

 ASYNCHRONOUS EVENTS
        Default.

 STDOUT
        Not used.

 STDERR
        If shift/reduce or reduce/reduce conflicts are  detected
        in grammar, yacc shall write a report of those conflicts
        to the standard error in an unspecified format.

        Standard error shall also be used  for  diagnostic  mes-
        sages.

 OUTPUT FILES
        The code file, the header file, and the description file
        shall be text files. All are described in the  following
        sections.

    Code File
        This  file  shall  contain  the  C  source  code for the
        yyparse() function. It shall contain code for the  vari-
        ous  semantic  actions with macro substitution performed
        on them as described in the  EXTENDED  DESCRIPTION  sec-
        tion. It also shall contain a copy of the #define state-
        ments in the header file. If  a  %union  declaration  is
        used, the declaration for YYSTYPE shall also be included
        in this file.

    Header File
        The header file shall contain  #define  statements  that
        associate  the  token numbers with the token names. This
        allows source files other than the code file  to  access
        the  token  codes.  If a %union declaration is used, the
        declaration for YYSTYPE and  an  extern  YYSTYPE  yylval
        declaration shall also be included in this file.

    Description File
        The  description  file shall be a text file containing a
        description of the state machine  corresponding  to  the
        parser, using an unspecified format. Limits for internal
        tables (see Limits)  shall  also  be  reported,  in   an
        implementation-defined manner. (Some implementations may
        use dynamic allocation techniques and have  no  specific
        limit values to report.)

 EXTENDED DESCRIPTION
        The  yacc  command  accepts  a  language that is used to
        define a grammar for a target language to be  parsed  by
        the  tables  and  code  generated  by yacc. The language
        accepted by yacc as a grammar for the target language is
        described below using the yacc input language itself.

        The  input  grammar  includes rules describing the input
        structure of the target language and code to be  invoked
        when  these  rules are recognized to provide the associ-
        ated semantic action. The  code  to  be  executed  shall
        appear  as bodies of text that are intended to be C-lan-
        guage code. The C-language inclusions  are  presumed  to
        form  a correct function when processed by yacc into its
        output files. The code included in  this  way  shall  be
        executed  during the recognition of the target language.

        Given a grammar, the yacc utility  generates  the  files
        described in the OUTPUT FILES section. The code file can
        be compiled and linked using c99. If the declaration and
        programs  sections  of  the grammar file did not include
        definitions of main(), yylex(), and yyerror(), the  com-
        piled  output  requires linking with externally supplied
        versions of those functions. Default versions of  main()
        and  yyerror()  are supplied in the yacc library and can
        be linked in by using the -l y operand to c99.  The yacc
        library  interfaces  need  not  support  interfaces with
        other than the default yy symbol prefix. The application
        provides the lexical analyzer function, yylex(); the lex
        utility is specifically designed to generate such a rou-
        tine.
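
        The following C sketch suggests  how  an  application-
        supplied  yylex()  might  cooperate  with the definitions
        found in a header file written by yacc -d.  The  %union,
        the token name NUMBER, its value 257, and the helper
        lex_set_input() are assumptions made  for  illustration;
        yacc assigns the actual token codes, and in a real parser
        yylval is defined by the code file.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * What a header written by "yacc -d" might contain for the
 * hypothetical declarations
 *     %union { long ival; char op; }
 *     %token <ival> NUMBER
 * The value 257 is an assumption; yacc chooses the actual codes.
 */
#define NUMBER 257
typedef union { long ival; char op; } YYSTYPE;
YYSTYPE yylval;            /* defined by the code file in a real parser */

/* Hypothetical input buffer scanned by this sketch. */
static const char *lex_input = "";

void lex_set_input(const char *text) { lex_input = text; }

/*
 * Hand-written lexical analyzer: returns NUMBER with yylval.ival set,
 * the character code for any other non-blank character, or 0 at the
 * end of the input.
 */
int yylex(void)
{
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;
    if (*lex_input == '\0')
        return 0;
    if (isdigit((unsigned char)*lex_input)) {
        char *end;
        yylval.ival = strtol(lex_input, &end, 10);
        lex_input = end;
        return NUMBER;
    }
    return (unsigned char)*lex_input++;
}
```

        Scanning the text "12 + 345" with this  sketch  returns
        NUMBER  (with  yylval.ival  set  to  12),  then '+', then
        NUMBER (with yylval.ival set to 345), and finally 0.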

    Input Language
        The  application  shall  ensure that every specification
        file consists of three sections in order:  declarations,
        grammar rules, and programs, separated by double percent
        signs ( "%%" ). The declarations and  programs  sections
        can be empty. If the latter is empty, the preceding "%%"
        mark separating it from the rules section can  be  omit-
        ted.

        The  input  is free form text following the structure of
        the grammar defined below.

    Lexical Structure of the Grammar
        The <blank>s,  <newline>s,  and  <form-feed>s  shall  be
        ignored,  except  that the application shall ensure that
        they do not appear in names or multi-character  reserved
        symbols.  Comments shall be enclosed in "/* ... */", and
        can appear wherever a name is valid.

        Names are of arbitrary length, made up of letters, peri-
        ods  ( '.'  ), underscores ( '_' ), and non-initial dig-
        its. Uppercase and lowercase letters are distinct.  Con-
        forming applications shall not use names beginning in yy
        or YY since the yacc parser uses such names. Many of the
        names  appear in the final output of yacc, and thus they
        should be chosen to conform with  any  additional  rules
        created by the C compiler to be used. In particular they
        appear in #define statements.
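
        The naming rules above can be restated as a small C check.
        This is only a sketch: the function names are hypotheti-
        cal, the classification of letters follows isalpha() in
        the  current  locale, and a real yacc may accept or reject
        names differently.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if s is a syntactically valid yacc name: letters, periods,
 * underscores, and digits, where the first character is not a digit.
 * (Hypothetical helper for illustration.)
 */
int is_yacc_name(const char *s)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!isalpha(c) && !isdigit(c) && c != '.' && c != '_')
            return 0;
    }
    return 1;
}

/*
 * Return 1 if the name begins with yy or YY, which conforming
 * applications shall not use.
 */
int is_reserved_yacc_name(const char *s)
{
    return strncmp(s, "yy", 2) == 0 || strncmp(s, "YY", 2) == 0;
}
```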

        A literal shall consist of a single  character  enclosed
        in  single-quotes  ( ' ).  All  of  the escape sequences
        supported for character constants by the ISO C  standard
        shall be supported by yacc.

        The  relationship with the lexical analyzer is discussed
        in detail below.

        The application shall ensure that the NUL  character  is
        not used in grammar rules or literals.

    Declarations Section
        The  declarations  section is used to define the symbols
        used to define the target language and  their  relation-
        ship  with  each other. In particular, much of the addi-
        tional information required to  resolve  ambiguities  in
        the context-free grammar for the target language is pro-
        vided here.

        Usually yacc assigns the relationship between  the  sym-
        bolic  names  it  generates and their underlying numeric
        value. The declarations section  makes  it  possible  to
        control the assignment of these values.

        It is also possible to keep semantic information associ-
        ated with the tokens currently on the parse stack  in  a
        user-defined  C-language  union,  if  the members of the
        union are associated with the various names in the gram-
        mar. The declarations section provides for this as well.

        The first group of declarators below all take a list  of
        names  as  arguments.   That list can optionally be pre-
        ceded by the name of a C  union  member  (called  a  tag
        below)  appearing  within '<' and '>' . (As an exception
        to the typographical conventions of  the  rest  of  this
        volume  of IEEE Std 1003.1-2001, in this case <tag> does
        not represent a  metavariable,  but  the  literal  angle
        bracket characters surrounding a symbol.) The use of tag
        specifies that the tokens named on this line shall be of
        the  same  C type as the union member referenced by tag.
        This is discussed in more detail below.

        For lists used to define tokens, the first appearance of
        a  given token can be followed by a positive integer (as
        a string of decimal digits). If this is done, the under-
        lying value assigned to it for lexical purposes shall be
        taken to be that number.

        The following declares name to be a token:


               %token [<tag>] name [number][name [number]]...

        If tag is present, the C type for  all  tokens  on  this
        line shall be declared to be the type referenced by tag.
        If a positive integer,  number,  follows  a  name,  that
        value shall be assigned to the token.

        The  following  declares name to be a token, and assigns
        precedence to it:


               %left [<tag>] name [number][name [number]]...
               %right [<tag>] name [number][name [number]]...

        One or more lines, each beginning with one of these sym-
        bols, can appear in this section. All tokens on the same
        line have the same precedence level  and  associativity;
        the lines are in order of increasing precedence or bind-
        ing strength. %left denotes that the operators  on  that
        line  are left associative, and %right similarly denotes
        right associative operators. If tag is present, it shall
        declare a C type for names as described for %token.

        The following declares name to be a token, and indicates
        that this cannot be used associatively:


               %nonassoc [<tag>] name [number][name [number]]...

        If the parser encounters associative use of  this  token
        it reports an error. If tag is present, it shall declare
        a C type for names as described for %token.

        The following declares each name to  be  a  non-terminal
        whose  value  has  the type of the union member tag; the
        tag field is therefore required at its beginning:


               %type <tag> name...

        Because it deals with non-terminals  only,  assigning  a
        token  number  or using a literal is also prohibited. If
        this construct  is  present,  yacc  shall  perform  type
        checking;  if  this  construct is not present, the parse
        stack shall hold only the int type.

        Every name used in grammar  not  defined  by  a  %token,
        %left,  %right,  or  %nonassoc declaration is assumed to
        represent a non-terminal symbol. The yacc utility  shall
        report  an  error  for any non-terminal symbol that does
        not appear on the left side  of  at  least  one  grammar
        rule.

        Once  the type, precedence, or token number of a name is
        specified, it shall not be changed. If the first  decla-
        ration  of  a token does not assign a token number, yacc
        shall assign a token number.  Once  this  assignment  is
        made,  the token number shall not be changed by explicit
        assignment.

        The following declarators do  not  follow  the  previous
        pattern.

        The  following  declares the non-terminal name to be the
        start symbol, which represents the largest, most general
        structure described by the grammar rules:


               %start name

        By  default, it is the left-hand side of the first gram-
        mar rule; this default can be overridden with this  dec-
        laration.

        The  following  declares  the  yacc  value stack to be a
        union of the various types of values desired:


               %union { body of union (in C) }

        By default, the values returned by actions  (see  below)
        and  the lexical analyzer shall be of type int. The yacc
        utility keeps track of types, and it shall insert corre-
        sponding  union  member names in order to perform strict
        type checking of the resulting parser.

        Alternatively, given that at least one  <tag>  construct
        is  used,  the  union  can  be declared in a header file
        (which shall be included in the declarations section  by
        using  a  #include  construct  within  %{ and %}), and a
        typedef used to define the symbol YYSTYPE  to  represent
        this  union. The effect of %union is to provide the dec-
        laration of YYSTYPE directly from the yacc input.

        C-language declarations and definitions  can  appear  in
        the  declarations  section,  enclosed  by  the following
        marks:


               %{ ... %}

        These statements shall be copied into the code file, and
        have  global scope within it so that they can be used in
        the rules and program sections.

        The application shall ensure that the declarations  sec-
        tion is terminated by the token %%.

    Grammar Rules in yacc
        The rules section defines the context-free grammar to be
        accepted by the function yacc generates, and  associates
        with  those  rules  C-language  actions  and  additional
        precedence information.  The grammar is described below,
        and a formal definition follows.

        The  rules  section  is comprised of one or more grammar
        rules. A grammar rule has the form:


               A : BODY ;

        The symbol A represents a non-terminal  name,  and  BODY
        represents  a  sequence of zero or more names, literals,
        and semantic  actions  that  can  then  be  followed  by
        optional  precedence  rules. Only the names and literals
        participate in the formation of the grammar; the  seman-
        tic actions and precedence rules are used in other ways.
        The colon and the semicolon  are  yacc  punctuation.  If
        there are several successive grammar rules with the same
        left-hand side, the vertical bar  '|'  can  be  used  to
        avoid  rewriting  the  left-hand  side; in this case the
        semicolon appears only after the  last  rule.  The  BODY
        part  can  be  empty (or empty of names and literals) to
        indicate that the non-terminal symbol matches the  empty
        string.

        The  yacc  utility assigns a unique number to each rule.
        Rules using  the  vertical  bar  notation  are  distinct
        rules.  The  number  assigned to the rule appears in the
        description file.

        The elements comprising a BODY are:

        name, literal
               These form the rules  of  the  grammar:  name  is
               either  a token or a non-terminal; literal stands
               for itself (less the lexically required quotation
               marks).

        semantic action

               With  each  grammar  rule, the user can associate
               actions to be performed each  time  the  rule  is
               recognized  in the input process. (Note that the
               word "action" can also refer to  the  actions  of
               the parser: shift, reduce, and so on.)

        These  actions can return values and can obtain the val-
        ues returned by previous actions. These values are  kept
        in  objects  of  type  YYSTYPE  (see %union). The result
        value of the action shall be kept  on  the  parse  stack
        with  the  left-hand side of the rule, to be accessed by
        other reductions as part of their right-hand  side.   By
        using the <tag> information provided in the declarations
        section, the code generated by yacc can be strictly type
        checked  and contain arbitrary information. In addition,
        the lexical analyzer can provide the same kinds of  val-
        ues for tokens, if desired.

        An action is one or more arbitrary C statements enclosed
        in curly braces '{' and '}'. As such, it can do input or
        output, call subprograms, and alter external variables.

        Certain pseudo-variables can  be  used  in  the  action.
        These  are  macros  for  access to data structures known
        internally to yacc.

        $$
               The value of the action can be set  by  assigning
               it  to  $$.  If  type checking is enabled and the
               type of the value to be assigned cannot be deter-
               mined, a diagnostic message may be generated.

        $number
               This  refers  to the value returned by the compo-
               nent specified by the token number in  the  right
               side  of a rule, reading from left to right; num-
               ber can be zero or negative. If number is zero or
               negative,  it  refers to the data associated with
               the name on  the  parser's  stack  preceding  the
               leftmost  symbol  of  the current rule. (That is,
               "$0" refers to the name immediately preceding the
               leftmost  name in the current rule to be found on
               the parser's stack and "$-1" refers to the symbol
               to its left.) If number refers to an element past
               the current point in the rule, or beyond the bot-
               tom  of  the  stack,  the result is undefined. If
               type checking is enabled  and  the  type  of  the
               value  to  be  assigned  cannot  be determined, a
               diagnostic message may be generated.

        $<tag>number
               These correspond  exactly  to  the  corresponding
               symbols  without the tag inclusion, but allow for
               strict type checking (and preclude unwanted  type
               conversions).  The  effect  is  that the macro is
               expanded to use tag to select an element from the
               YYSTYPE  union (using dataname.tag). This is par-
               ticularly useful if number is not positive.

        $<tag>$
               This imposes on the reference  the  type  of  the
               union member referenced by tag. This construction
               is applicable when a reference to a left  context
               value  occurs  in  the grammar, and provides yacc
               with a means for selecting a type.
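
        The pseudo-variables can be pictured with  a  simplified
        model  of  the  value  stack, sketched below in C. It is
        not the code that any  particular  yacc  generates:  the
        names  value_stack,  push_value(), dollar(), and
        reduce_add() are hypothetical, YYSTYPE is assumed to  be
        int,  and  no  bounds checking is done. The model shows
        why $1 through $n address the right side  of  the  rule,
        while $0 and negative numbers reach into left context.

```c
/*
 * Simplified model of the parser's value stack. YYSTYPE is int here,
 * matching the default when no %union is used. All names below are
 * hypothetical and no bounds checking is performed.
 */
typedef int YYSTYPE;

#define STACK_MAX 64
static YYSTYPE value_stack[STACK_MAX];
static int top = -1;                /* index of the topmost value */

void push_value(YYSTYPE v) { value_stack[++top] = v; }

/*
 * $n for a rule whose right side has len symbols, all already on the
 * stack: $1 is the leftmost, $len the topmost; $0 and negative n
 * refer to values below the rule (left context).
 */
YYSTYPE dollar(int n, int len) { return value_stack[top - len + n]; }

/*
 * Reduce by   expr : expr '+' expr { $$ = $1 + $3; }
 * The three right-side values are replaced by the single value $$.
 */
void reduce_add(void)
{
    const int len = 3;
    YYSTYPE result = dollar(1, len) + dollar(3, len);   /* $$ = $1 + $3 */
    top -= len;
    push_value(result);
}
```

        If  the  stack  holds 10 followed by the values 2, '+',
        and 3 for the rule above, then $1 is 2, $3 is 3, and  $0
        is  10;  after  the reduction the three values have been
        replaced by the single value 5.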


        Actions can occur anywhere in a rule (not  just  at  the
        end); an action can access values returned by actions to
        its left, and in  turn  the  value  it  returns  can  be
        accessed  by  actions to its right.  An action appearing
        in the middle of a rule shall be equivalent to replacing
        the  action with a new non-terminal symbol and adding an
        empty rule with that non-terminal symbol  on  the  left-
        hand  side.  The semantic action associated with the new
        rule shall be equivalent to the original action. The use
        of  actions  within rules might introduce conflicts that
        would not otherwise exist.

        By default, the value of a rule shall be  the  value  of
        the  first  element in it. If the first element does not
        have a type (particularly in the case of a literal)  and
        type  checking  is  turned on by %type, an error message
        shall result.

        precedence
               The keyword %prec  can  be  used  to  change  the
               precedence  level  associated  with  a particular
               grammar rule. Examples of this are in cases where
               a  unary  and  binary operator have the same sym-
               bolic representation, but need to be  given  dif-
               ferent  precedences,  or where the handling of an
               ambiguous if-else construction is necessary.  The
               reserved  symbol  %prec  can  appear  immediately
               after the body of the grammar  rule  and  can  be
               followed  by  a token name or a literal. It shall
               cause the  precedence  of  the  grammar  rule  to
               become  that  of the following token name or lit-
               eral. The action for the rule as a whole can fol-
               low %prec.


        If  a  program  section  follows,  the application shall
        ensure that the grammar rules are terminated by %%.

    Programs Section
        The programs section can include the definition  of  the
        lexical  analyzer  yylex(), and any other functions; for
        example, those used in  the  actions  specified  in  the
        grammar  rules.   It is unspecified whether the programs
        section precedes or follows the semantic actions in  the
        output  file; therefore, if the application contains any
        macro definitions and declarations intended to apply  to
        the  code  in  the semantic actions, it shall place them
        within "%{ ... %}" in the declarations section.

    Input Grammar
        The following input to yacc  yields  a  parser  for  the
        input  to yacc. This formal syntax takes precedence over
        the preceding text syntax description.

        The lexical structure is defined less precisely; Lexical
        Structure  of the Grammar defines most terms. The corre-
        spondence between the  previous  terms  and  the  tokens
        below is as follows.

        IDENTIFIER
               This  corresponds  to  the concept of name, given
               previously. It also includes literals as  defined
               previously.

        C_IDENTIFIER
               This  is  a name, and additionally it is known to
               be followed by a colon.  A literal  cannot  yield
               this token.

        NUMBER A  string of digits (a non-negative decimal inte-
               ger).

        TYPE, LEFT, MARK, LCURL, RCURL

               These correspond directly to  %type,  %left,  %%,
               %{, and %}.

        { ... }
               This  indicates  C-language source code, with the
               possible inclusion of  '$'  macros  as  discussed
               previously.


               /* Grammar for the input to yacc. */
               /* Basic entries. */
               /* The following are recognized by the lexical analyzer. */


               %token    IDENTIFIER      /* Includes identifiers and literals */
               %token    C_IDENTIFIER    /* identifier (but not literal)
                                            followed by a :. */
               %token    NUMBER          /* [0-9][0-9]* */


               /* Reserved words : %type=>TYPE %left=>LEFT, and so on */


               %token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION


               %token    MARK            /* The %% mark. */
               %token    LCURL           /* The %{ mark. */
               %token    RCURL           /* The %} mark. */


               /* 8-bit character literals stand for themselves; */
               /* tokens have to be defined for multi-byte characters. */


               %start    spec


               %%


               spec  : defs MARK rules tail
                     ;
               tail  : MARK
                     {
                       /* In this action, set up the rest of the file. */
                     }
                     | /* Empty; the second MARK is optional. */
                     ;
               defs  : /* Empty. */
                     |    defs def
                     ;
               def   : START IDENTIFIER
                     |    UNION
                     {
                       /* Copy union definition to output. */
                     }
                     |    LCURL
                     {
                       /* Copy C code to output file. */
                     }
                       RCURL
                     |    rword tag nlist
                     ;
               rword : TOKEN
                     | LEFT
                     | RIGHT
                     | NONASSOC
                     | TYPE
                     ;
               tag   : /* Empty: union tag ID optional. */
                     | '<' IDENTIFIER '>'
                     ;
               nlist : nmno
                     | nlist nmno
                     ;
               nmno  : IDENTIFIER         /* Note: literal invalid with %type. */
                     | IDENTIFIER NUMBER  /* Note: invalid with %type. */
                     ;


               /* Rule section */


               rules : C_IDENTIFIER rbody prec
                     | rules  rule
                     ;
               rule  : C_IDENTIFIER rbody prec
                     | '|' rbody prec
                     ;
               rbody : /* empty */
                     | rbody IDENTIFIER
                     | rbody act
                     ;
               act   : '{'
                       {
                         /* Copy action, translate $$, and so on. */
                       }
                       '}'
                     ;
               prec  : /* Empty */
                     | PREC IDENTIFIER
                     | PREC IDENTIFIER act
                     | prec ';'
                     ;

    Conflicts
        The  parser  produced  for  an input grammar may contain
        states in which conflicts  occur.  The  conflicts  occur
        because the grammar is not LALR(1). An ambiguous grammar
        always contains at least one LALR(1) conflict. The  yacc
        utility   shall  resolve  all  conflicts,  using  either
        default rules or user-specified precedence rules.

        Conflicts   are   either   shift/reduce   conflicts   or
        reduce/reduce  conflicts.   A  shift/reduce  conflict is
        where, for a given state and lookahead  symbol,  both  a
        shift  action  and  a  reduce  action  are  possible.  A
        reduce/reduce conflict is where, for a given  state  and
        lookahead  symbol, reductions by two different rules are
        possible.

        The rules below describe how to specify what actions  to
        take  when  a conflict occurs. Not all shift/reduce con-
        flicts can be successfully resolved this way because the
        conflict  may  be due to something other than ambiguity,
        so incautious use of these facilities can cause the lan-
        guage  accepted  by the parser to be much different from
        that which was intended. The description file shall con-
        tain  sufficient  information to understand the cause of
        the conflict. Where ambiguity is the reason, either  the
        default  or explicit rules should be adequate to produce
        a working parser.

        The declared precedences and associativities (see Decla-
        rations  Section)  are used to resolve parsing conflicts
        as follows:

         1. A precedence and associativity  is  associated  with
            each grammar rule; it is the precedence and associa-
            tivity of the last token or literal in the  body  of
            the rule. If the %prec keyword is used, it overrides
            this default. Some grammar rules might not have both
            precedence and associativity.


         2. If  there  is  a shift/reduce conflict, and both the
            grammar rule and the input  symbol  have  precedence
            and  associativity  associated  with  them, then the
            conflict is resolved in favor of the  action  (shift
            or reduce) associated with the higher precedence. If
            the precedences are the same, then the associativity
            is  used;  left  associative  implies  reduce, right
            associative  implies  shift,   and   non-associative
            implies an error in the string being parsed.


         3. When there is a shift/reduce conflict that cannot be
            resolved by rule 2, the  shift  is  done.  Conflicts
            resolved this way are counted in the diagnostic out-
            put described in Error Handling .


         4. When there is a reduce/reduce conflict, a  reduction
            is  done  by the grammar rule that occurs earlier in
            the input sequence.  Conflicts resolved this way are
            counted  in the diagnostic output described in Error
            Handling .


        Conflicts resolved by precedence or associativity  shall
        not  be  counted  in  the shift/reduce and reduce/reduce
        conflicts reported by yacc on either standard  error  or
        in the description file.
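
        As an illustration only, rules 2 and 3 above can be
        modeled by the following C sketch. The type and function
        names are hypothetical and are not part of any yacc
        interface. The sketch assumes that a precedence level of
        0 means "none declared" and that larger levels bind more
        tightly.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum assoc  { ASSOC_UNSET, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

struct prec {
    int        level;   /* 0 means no precedence was declared */
    enum assoc assoc;   /* ASSOC_NONE models %nonassoc */
};

/* Resolve a shift/reduce conflict between a grammar rule and the
   lookahead token.  *counted is set to true when the conflict is
   resolved by the default (rule 3) and would therefore be counted
   in the reported totals, and to false when precedence or
   associativity decides it (rule 2). */
enum action resolve_shift_reduce(struct prec rule, struct prec token,
                                 bool *counted)
{
    if (rule.level != 0 && token.level != 0) {
        *counted = false;
        if (rule.level > token.level)
            return ACT_REDUCE;
        if (rule.level < token.level)
            return ACT_SHIFT;
        switch (token.assoc) {      /* equal precedence */
        case ASSOC_LEFT:  return ACT_REDUCE;
        case ASSOC_RIGHT: return ACT_SHIFT;
        case ASSOC_NONE:  return ACT_ERROR;
        default:          break;    /* nothing declared: use rule 3 */
        }
    }
    *counted = true;                /* rule 3: the shift is done */
    return ACT_SHIFT;
}
```

        For a rule whose last token is '+' and a lookahead of
        '*' declared at a higher level, the sketch selects the
        shift and does not count the conflict.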

    Error Handling
        The  token  error  shall be reserved for error handling.
        The name error can be used in grammar  rules.  It  indi-
        cates  places where the parser can recover from a syntax
        error. The default value of  error  shall  be  256.  Its
        value  can  be  changed  using a %token declaration. The
        lexical analyzer should not return the value of error.

        The parser shall detect a syntax error when it is  in  a
        state  where  the  action  associated with the lookahead
        symbol is error. A semantic action can cause the  parser
        to  initiate  error  handling  by  executing  the  macro
        YYERROR. When YYERROR is executed, the  semantic  action
        passes  control  back  to  the parser. YYERROR cannot be
        used outside of semantic actions.

        When the parser detects  a  syntax  error,  it  normally
        calls yyerror() with the character string "syntax error"
        as its argument. The call  shall  not  be  made  if  the
        parser  is  still  recovering from a previous error when
        the error is detected. The parser is  considered  to  be
        recovering  from  a  previous error until the parser has
        shifted over at least three normal input  symbols  since
        the  last  error  was  detected or a semantic action has
        executed the macro yyerrok. The parser  shall  not  call
        yyerror() when YYERROR is executed.

        The macro function YYRECOVERING shall return 1 if a syn-
        tax error has been detected and the parser has  not  yet
        fully  recovered  from  it.  Otherwise,  zero  shall  be
        returned.
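
        The "recovering" state described above can be pictured
        as a small counter. The following C sketch is a model
        under that assumption, not the code any particular yacc
        generates; all names in it are hypothetical. It covers
        only errors detected by the parser, since yyerror() is
        never called for YYERROR.

```c
#include <stdbool.h>

/* Hypothetical model of the parser's recovery bookkeeping. */
struct recovery {
    int shifts_needed;  /* 0 means fully recovered */
};

/* Called when the parser detects a syntax error.  Returns true if
   yyerror() would be called, that is, if the parser was not still
   recovering from a previous error. */
bool on_syntax_error(struct recovery *r)
{
    bool report = (r->shifts_needed == 0);
    r->shifts_needed = 3;   /* three normal symbols must be shifted */
    return report;
}

/* Called each time a normal input symbol is shifted. */
void on_shift(struct recovery *r)
{
    if (r->shifts_needed > 0)
        r->shifts_needed--;
}

/* Models the effect of the yyerrok macro. */
void on_yyerrok(struct recovery *r)
{
    r->shifts_needed = 0;
}

/* Models the value of YYRECOVERING: 1 while recovering, else 0. */
int is_recovering(const struct recovery *r)
{
    return r->shifts_needed != 0;
}
```

        In this model a second error detected after only two
        shifts is not reported, and the count of required shifts
        starts again at three.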

        When a syntax error  is  detected  by  the  parser,  the
        parser  shall  check if a previous syntax error has been
        detected. If a previous error was detected,  and  if  no
        normal input symbols have been shifted since the preced-
        ing error was detected, the parser checks if the  looka-
        head  symbol is an endmarker (see Interface to the Lexi-
        cal Analyzer ). If it is, the parser shall return with a
        non-zero value. Otherwise, the lookahead symbol shall be
        discarded and normal parsing shall resume.

        When YYERROR is executed or when the  parser  detects  a
        syntax error and no previous error has been detected, or
        at least one normal input symbol has been shifted  since
        the  previous  error  was detected, the parser shall pop
        back one state at a time until the parse stack is  empty
        or  the current state allows a shift over error.  If the
        parser empties the parse stack, it shall return  with  a
        non-zero value. Otherwise, it shall shift over error and
        then resume normal parsing. If the parser reads a looka-
        head  symbol  before the error was detected, that symbol
        shall still be the  lookahead  symbol  when  parsing  is
        resumed.
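
        The stack unwinding just described can be sketched in C
        as follows. The function name and the table layout are
        assumptions made for the example: states are taken to be
        small integers, and can_shift_error[] is a hypothetical
        table that is non-zero for states with a shift over
        error.

```c
/* Pop states until the state on top of the stack allows a shift
   over the error token.  stack[0..depth-1] holds the parse stack
   with the current state last.  Returns the new depth, or 0 if the
   stack was emptied, in which case the parser would return a
   non-zero value. */
int pop_to_error_state(const int *stack, int depth,
                       const int *can_shift_error)
{
    while (depth > 0 && !can_shift_error[stack[depth - 1]])
        depth--;
    return depth;
}
```

        A result equal to the original depth means the current
        state itself could shift over error, so nothing was
        popped.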

        The  macro  yyerrok in a semantic action shall cause the
        parser to act as if it has fully recovered from any pre-
        vious errors. The macro yyclearin shall cause the parser
        to discard the current lookahead token. If  the  current
        lookahead  token  has not yet been read, yyclearin shall
        have no effect.

        The macro YYACCEPT shall cause the parser to return with
        the value zero. The macro YYABORT shall cause the parser
        to return with a non-zero value.

    Interface to the Lexical Analyzer
        The yylex() function is an integer-valued function  that
        returns  a  token  number representing the kind of token
        read. If there is a  value  associated  with  the  token
        returned  by  yylex() (see the discussion of tag above),
        it shall be assigned to the external variable yylval.

        If the parser and yylex() do not agree  on  these  token
        numbers,  reliable  communication  between  them  cannot
        occur. For (single-byte character) literals,  the  token
        is simply the numeric value of the character in the cur-
        rent character set. The numbers  for  other  tokens  can
        either  be  chosen  by  yacc,  or chosen by the user. In
        either case, the #define construct of C is used to allow
        yylex()  to  return  these  numbers  symbolically.   The
        #define statements are put into the code file,  and  the
        header  file if that file is requested. The set of char-
        acters permitted by yacc in an identifier is larger than
        that  permitted  by C. Token names found to contain such
        characters shall not be included in the #define declara-
        tions.

        If  the  token  numbers  are  chosen by yacc, the tokens
        other than literals shall be  assigned  numbers  greater
        than  256,  although no order is implied. A token can be
        explicitly assigned a  number  by  following  its  first
        appearance  in  the  declarations section with a number.
        Names and literals not defined  this  way  retain  their
        default  definition.  All token numbers assigned by yacc
        shall be unique and distinct from the token numbers used
        for  literals  and  user-assigned  tokens.  If duplicate
        token numbers cause conflicts in parser generation, yacc
        shall  report  an  error;  otherwise,  it is unspecified
        whether the token assignment is accepted or an error  is
        reported.

        The end of the input is marked by a special token called
        the endmarker, which has a token number that is zero  or
        negative.  (These  values  are  invalid  for  any  other
        token.) All lexical analyzers shall return zero or nega-
        tive  as  a  token number upon reaching the end of their
        input. If the tokens up to, but excluding, the endmarker
        form  a  structure  that  matches  the start symbol, the
        parser shall accept the input. If the endmarker is  seen
        in any other context, it shall be considered an error.
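
        A minimal hand-written yylex() consistent with this
        interface might look like the following sketch. It
        assumes that YYSTYPE is int, that the grammar declares a
        token named NUMBER (whose value would normally come from
        the header file rather than being defined by hand), and
        that the input is held in a string reached through the
        hypothetical pointer lex_input.

```c
#include <ctype.h>

/* Assumed for this sketch; normally supplied by the header file. */
#define NUMBER 257

int yylval;                     /* value of the token just returned */
const char *lex_input = "";     /* hypothetical input source */

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;

    /* End of input: return the endmarker. */
    if (*lex_input == '\0')
        return 0;

    /* A run of digits is a NUMBER whose value goes in yylval. */
    if (isdigit((unsigned char)*lex_input)) {
        yylval = 0;
        while (isdigit((unsigned char)*lex_input))
            yylval = yylval * 10 + (*lex_input++ - '0');
        return NUMBER;
    }

    /* Any other single-byte character is a literal token whose
       number is its character value. */
    return (unsigned char)*lex_input++;
}
```

        Once the end of the string is reached, every further
        call returns the endmarker again.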

    Completing the Program
        In  addition  to  yyparse()  and  yylex(), the functions
        yyerror() and main() are required  to  make  a  complete
        program.  The  application  can  supply   main()   and
        yyerror(), or those routines can be  obtained  from  the
        yacc library.

    Yacc Library
        The  following  functions  shall appear only in the yacc
        library accessible through the -l y operand to c99; they
        can therefore be redefined by a conforming application:

        int  main(void)

               This  function shall call yyparse() and exit with
               an unspecified value. Other actions  within  this
               function are unspecified.

        int  yyerror(const char *s)

               This  function  shall  write  the  NUL-terminated
               argument to standard error, followed by  a  <new-
               line>.


        The  order of the -l y and -l l operands given to c99 is
        significant; the application shall  either  provide  its
        own main() function or ensure that -l y precedes -l l.

    Debugging the Parser
        The  parser  generated  by  yacc  shall  have diagnostic
        facilities in it  that  can  be  optionally  enabled  at
        either compile time or at runtime (if enabled at compile
        time). The compilation of the runtime debugging code  is
        under  the control of YYDEBUG, a preprocessor symbol. If
        YYDEBUG has a non-zero value, the debugging  code  shall
        be included. If its value is zero, the code shall not be
        included.

        In parsers where the debugging code has  been  included,
        the  external  int yydebug can be used to turn debugging
        on (with a non-zero value) and off (zero value) at  run-
        time. The initial value of yydebug shall be zero.

        When  -t is specified, the code file shall be built such
        that, if YYDEBUG is not already defined  at  compilation
        time  (using  the  c99  -D YYDEBUG option, for example),
        YYDEBUG shall be set explicitly to 1.  When  -t  is  not
        specified,  the  code  file shall be built such that, if
        YYDEBUG is not already defined, it shall be set  explic-
        itly to zero.
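
        An application source file can guard its own use of
        yydebug in the same way. The following C sketch assumes
        that YYDEBUG, if set at all, is given the same value for
        every file in the build (for example with the c99 -D
        option); enable_parser_trace() is a hypothetical helper
        and not part of the yacc interface.

```c
/* Mirror the default the code file uses when built without -t. */
#ifndef YYDEBUG
#define YYDEBUG 0
#endif

#if YYDEBUG
extern int yydebug;             /* defined in the generated parser */
#endif

/* Turn runtime tracing on if the debugging code was compiled in.
   Returns 1 if tracing is now enabled, 0 if it is unavailable. */
int enable_parser_trace(void)
{
#if YYDEBUG
    yydebug = 1;
    return 1;
#else
    return 0;
#endif
}
```

        Guarding the reference keeps the file linkable against a
        parser that was built without the debugging code.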

        The  format  of  the debugging output is unspecified but
        includes at least enough information  to  determine  the
        shift and reduce actions, and the input symbols. It also
        provides information about error recovery.

    Algorithms
        The parser constructed by  yacc  implements  an  LALR(1)
        parsing algorithm as documented in the literature. It is
        unspecified  whether  the  parser  is  table-driven   or
        direct-coded.

        A  parser generated by yacc shall never request an input
        symbol from yylex() while in  a  state  where  the  only
        actions  other than the error action are reductions by a
        single rule.

        The literature of parsing theory defines these concepts.

    Limits
        The  yacc  utility may have several internal tables. The
        minimum maximums for these tables are shown in the  fol-
        lowing  table.  The  exact  meaning  of  these values is
        implementation-defined.  The implementation shall define
        the  relationship  between these values and between them
        and any error messages that the implementation may  gen-
        erate should it run out of space for any internal struc-
        ture. An implementation  may  combine  groups  of  these
        resources into a single pool as long as the total avail-
        able to the user does not fall  below  the  sum  of  the
        sizes specified by this section.

                     Table: Internal Limits in yacc

                  Minimum
     Limit        Maximum   Description
     {NTERMS}     126       Number of tokens.
     {NNONTERM}   200       Number of non-terminals.
     {NPROD}      300       Number of rules.
     {NSTATES}    600       Number of states.
     {MEMSIZE}    5200      Length of rules. The total length, in
                            names (tokens and non-terminals), of all
                            the rules of the grammar. The left-hand
                            side is counted for each rule, even if
                            it is not explicitly repeated, as speci-
                            fied in Grammar Rules in yacc .
     {ACTSIZE}    4000      Number of actions. "Actions" here (and
                            in the description file) refer to parser
                            actions (shift, reduce, and so on) not
                            to semantic actions defined in Grammar
                            Rules in yacc .

 EXIT STATUS
        The following exit values shall be returned:

         0     Successful completion.

        >0     An error occurred.


 CONSEQUENCES OF ERRORS
        If any errors are encountered, the run  is  aborted  and
        yacc  exits  with  a non-zero status. Partial code files
        and header files may be produced. The  summary  informa-
        tion in the description file shall always be produced if
        the -v flag is present.

        The following sections are informative.

 APPLICATION USAGE
        Historical implementations experience name conflicts  on
        the  names  yacc.tmp,  yacc.acts,  yacc.debug,  y.tab.c,
        y.tab.h, and y.output if more than one copy of  yacc  is
        running in a single directory at one time. The -b option
        was added to overcome this problem. The related  problem
        of  allowing  multiple  yacc parsers to be placed in the
        same file was addressed by adding a -p option  to  over-
        ride the previously hard-coded yy variable prefix.

        The  description  of the -p option specifies the minimal
        set of function and variable names that  cause  conflict
        when  multiple parsers are linked together. YYSTYPE does
        not need to be changed. Instead, the programmer can  use
        -b  to  give the header files for different parsers dif-
        ferent names, and then the file with the yylex()  for  a
        given  parser  can  include  the header for that parser.
        Names such as yyclearerr  do  not  need  to  be  changed
        because  they  are used only in the actions; they do not
        have linkage. It is possible that an implementation  has
        other  names,  either  internal  ones  for  implementing
        things such as  yyclearerr,  or  providing  non-standard
        features that it wants to change with -p.

        Unary  operators  that  are  the  same token as a binary
        operator in general need their precedence adjusted. This
        is  handled by the %prec advisory symbol associated with
        the particular grammar rule defining that  unary  opera-
        tor.  (See Grammar Rules in yacc .) Applications are not
        required to use this operator for unary  operators,  but
        the grammars that do not require it are rare.

 EXAMPLES
        Access  to  the  yacc  library  is obtained with library
        search operands to c99. To use the yacc library main():


               c99 y.tab.c -l y

        Both the  lex  library  and  the  yacc  library  contain
        main().  To access the yacc main():


               c99 y.tab.c lex.yy.c -l y -l l

        This ensures that the yacc library is searched first, so
        that its main() is used.

        The historical yacc libraries have contained two  simple
        functions  that  are  normally  coded by the application
        programmer.  These functions are similar to the  follow-
        ing code:


               #include <locale.h>
               int main(void)
               {
                   extern int yyparse(void);

                   setlocale(LC_ALL, "");

                   /* If the following parser is one created by lex, the
                      application must be careful to ensure that LC_CTYPE
                      and LC_COLLATE are set to the POSIX locale. */
                   (void) yyparse();
                   return (0);
               }


               #include <stdio.h>


               int yyerror(const char *msg)
               {
                   (void) fprintf(stderr, "%s\n", msg);
                   return (0);
               }

 RATIONALE
        The referenced documents may be helpful in  constructing
        the parser generator. The referenced DeRemer and Pennello
        article (along with the works it references) describes a
        technique to generate parsers that conform to this  vol-
        ume  of IEEE Std 1003.1-2001.  Work in this area contin-
        ues to be done, so implementors should  consult  current
        literature  before  doing  any  new implementations. The
        original Knuth article is the theoretical basis for this
        kind  of parser, but the tables it generates are imprac-
        tically large for reasonable grammars and should not  be
        used.  The  "equivalent  to"  wording  is intentional to
        assure that the best tables that are LALR(1) can be gen-
        erated.

        There  has been confusion between the class of grammars,
        the algorithms needed to generate parsers, and the algo-
        rithms  needed to parse the languages. They are all rea-
        sonably orthogonal. In particular,  a  parser  generator
        that  accepts  the full range of LR(1) grammars need not
        generate a table any more complex than one that  accepts
        SLR(1)  (a  relatively  weak class of LR grammars) for a
        grammar that happens to be SLR(1). Such  an  implementa-
        tion need not recognize the case, either; table compres-
        sion can yield the SLR(1) table  (or  one  even  smaller
        than  that)  without  recognizing  that  the  grammar is
        SLR(1). The speed of an LR(1) parser for  any  class  is
        dependent  more  upon  the table representation and com-
        pression (or the code generation if a direct  parser  is
        generated) than upon the class of grammar that the table
        generator handles.

        The speed of the parser generator is somewhat  dependent
        upon the class of grammar it handles. However,  the  LR
        parser construction algorithms in  the  original  Knuth
        article were judged by their author to be impractically
        slow at that time. Although full LR is more complex than
        LALR(1),  as computer speeds and algorithms improve, the
        difference (in terms of acceptable wall-clock  execution
        time) is becoming less significant.

        Potential  authors  are  cautioned  that  the referenced
        DeRemer and Pennello article previously cited identifies
        a  bug  (an  over-simplification  of  the computation of
        LALR(1) lookahead sets) in some of the LALR(1) algorithm
        statements  that  preceded  its  publication. They should
        take the time to seek out that paper, as well as current
        relevant work, particularly Aho's.

        The -b option was added to provide a portable method for
        permitting yacc to work on multiple separate parsers  in
        the  same  directory.  If a directory contains more than
        one yacc grammar, and both grammars are  constructed  at
        the  same  time  (by,  for example, a parallel make pro-
        gram), conflict results.  While the solution is not his-
        torical practice, it corrects a known deficiency in his-
        torical implementations. Corresponding changes were made
        to  all  sections  that referenced the filenames y.tab.c
        (now "the code file"), y.tab.h (now "the header  file"),
        and y.output (now "the description file").

        The grammar  for  yacc  input  is  based  on  System  V
        documentation.  The  textual  description  there  shows
        that the ';' is required at the end  of  the  rule.  The
        grammar and the implementation do not require this. (The
        use of C_IDENTIFIER causes a reduce  to  occur  in  the
        right place.)

        Also, in that implementation,  the  constructs  such  as
        %token can be terminated by a semicolon, but this is not
        permitted by the grammar. The keywords  such  as  %token
        can  also  appear  in uppercase, which is again not dis-
        cussed. In most places where '%' is  used,  '\'  can  be
        substituted,  and there are alternate spellings for some
        of the symbols (for example, %LEFT can be "%<"  or  even
        "\<" ).

        Historically,  <tag>  can  contain any characters except
        '>', including white space, in the implementation.  How-
        ever,  since  the  tag  must reference an ISO C standard
        union member,  in  practice  conforming  implementations
        need  to  support  only  the set of characters for ISO C
        standard identifiers in this context.

        Some historical  implementations  are  known  to  accept
        actions  that  are  terminated  by  a period. Historical
        implementations often allow '$' in names.  A  conforming
        implementation  does not need to support either of these
        behaviors.

        Deciding when to use %prec illustrates the difficulty in
        specifying the behavior of yacc. There may be situations
        in which the  grammar  is  not,  strictly  speaking,  in
        error,  and  yet yacc cannot interpret it unambiguously.
        Ambiguities in the grammar can  in  many  instances  be
        resolved by providing additional information,  such  as
        using %type or %union declarations. It is
        often  easier  and it usually yields a smaller parser to
        take this alternative when it is appropriate.

        A program produced without the  runtime  debugging  code
        is usually smaller and slightly  faster  in  historical
        implementations.

        Statistics messages from several historical  implementa-
        tions include the following types of information:


               n/512 terminals, n/300 non-terminals
               n/600 grammar rules, n/1500 states
               n shift/reduce, n reduce/reduce conflicts reported
               n/350 working sets used
               Memory: states, etc. n/15000, parser n/15000
               n/600 distinct lookahead sets
               n extra closures
               n shift entries, n exceptions
               n goto entries
               n entries saved by goto default
               Optimizer space used: input n/15000, output n/15000
               n table entries, n zero
               Maximum spread: n, Maximum offset: n

        The report of internal tables in the description file is
        left implementation-defined because all aspects of these
        limits are also implementation-defined. Some implementa-
        tions may use dynamic allocation techniques and have  no
        specific limit values to report.

        The  format  of  the  y.output file is not given because
        specification of the format  was  not  seen  to  enhance
        applications   portability.  The  listing  is  primarily
        intended to help human users understand  and  debug  the
        parser;  use  of  y.output  by  a conforming application
        script would be  unusual.  Furthermore,  implementations
        have  not produced consistent output and no popular for-
        mat was apparent. The format selected by the implementa-
        tion  should  be  human-readable,  in  addition  to  the
        requirement that it be a text file.

        Standard error reports are  not  specifically  described
        because  they  are  seldom of use to conforming applica-
        tions and there was no reason  to  restrict  implementa-
        tions.

        Some implementations recognize "={" as equivalent to '{'
        because it appears  in  historical  documentation.  This
        construction  was  recognized and documented as obsolete
        as long ago as 1978, in the referenced Yacc: Yet Another
        Compiler-Compiler.  This  volume of IEEE Std 1003.1-2001
        chose to leave it as obsolete and omit it.

        Multi-byte characters should be recognized by the  lexi-
        cal  analyzer and returned as tokens. They should not be
        returned as multi-byte  character  literals.  The  token
        error  that  is  used  for  error  recovery  is normally
        assigned the value 256 in the historical implementation.
        Thus,  the token value 256, which is used in many multi-
        byte character sets, is not available  for  use  as  the
        value of a user-defined token.

 FUTURE DIRECTIONS
        None.

 SEE ALSO
        c99, lex

 COPYRIGHT
        Portions  of  this  text are reprinted and reproduced in
        electronic form from  IEEE  Std  1003.1,  2003  Edition,
        Standard  for Information Technology -- Portable Operat-
        ing System Interface (POSIX), The Open Group Base Speci-
        fications Issue 6, Copyright (C) 2001-2003 by the Insti-
        tute of Electrical and Electronics  Engineers,  Inc  and
        The  Open Group. In the event of any discrepancy between
        this version and the original IEEE and  The  Open  Group
        Standard,  the original IEEE and The Open Group Standard
        is the referee document. The original  Standard  can  be
        obtained online at
        http://www.opengroup.org/unix/online.html .


 IEEE/The Open Group           2003                     YACC(1P)