PLaC-3.1
3. Context-free Grammars and Syntactic Analysis

Input:   token sequence

Tasks:
Parsing: construct a derivation according to the concrete syntax,
Tree construction: build a structure tree according to the abstract syntax,
Error handling: detection of an error, message, recovery

Result:  abstract program tree

Compiler module parser:
deterministic stack automaton, augmented by actions for tree construction
top-down parsers: leftmost derivation; tree construction top-down or bottom-up
bottom-up parsers: rightmost derivation backwards; tree construction bottom-up

Abstract program tree (condensed derivation tree):
represented by a
* data structure in memory for the translation phase to operate on,
* linear sequence of nodes on a file (costly in runtime),

* sequence of calls of functions of the translation phase.

(c) 2003 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 301

Objectives:
Relation between parsing and tree construction

In the lecture:
* Explain the tasks, use example on PLaC-1.3.
* Sources of prerequisites:
* context-free grammars: "Grundlagen der Programmiersprachen (2nd Semester), or "Berechenbarkeit und formale
Sprachen" (3rd Semester),
* Tree representation in prefix form, postfix form: "Modellierung" (1st Semester).

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

--------------------------------------------------------------------------------


PLaC-3.1a
Generating the structuring phase from specifications (Eli)

compiler designer          generators          compiler
specifications
Eli

non-lit. tokens          lex. ana
(.gla)          scanner          ident.
generator         Scanner
literals
(GLA)

concrete syntax
(.con)          token sequence
parser

mapping          generator         synt. ana
(.map)          (PGS)
parser

abstract syntax          tree construction
(.lido)          attribute
Map      evaluator          abstr. progr. tree
generator
(Liga)          sem. ana.

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 301a

Objectives:
Understand how generators build the structuring phase

In the lecture:
Explain

* the flow of information from the specifications to the generators,
* the generated products in the compiler.

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

--------------------------------------------------------------------------------


PLaC-3.2
3.1 Concrete and abstract syntax

concrete syntax          abstract syntax

- context-free grammar          - context-free grammar
- defines the structure of source programs       - defines abstract program trees
- is unambiguous          - is usually ambiguous
- specifies derivation and parser          - translation phase is based on it
- parser actions specify the tree construction   --->- tree construction

- some chain productions have only syntactic purpose

Expr ::= Fact     have no action         no node created
- symbols are mapped {Expr,Fact} ->         to one abstract symbol Exp

- same action at structural equivalent productions: - creates tree nodes
Expr ::= Expr AddOpr Fact &BinEx
Fact ::= Fact MulOpr Opd  &BinEx

- semantically relevant chain productions, e.g.    - are kept (tree node is created)
ParameterDecl ::= Declaration

- terminal symbols          - only semantically relevant ones are kept
identifiers, literals,          identifiers, literals
keywords, special symbols

- concrete syntax and symbol mapping specify    - abstract syntax (can be generated)

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 302

Objectives:
Distinguish roles and properties of concrete and abstract syntax

In the lecture:
* Use the expression grammar of PLaC-3.3, PLaC-3.4 for comparison.
* Construct abstract syntax systematically.
* Context-free grammars specify trees - not only strings! Is also used in software engineering to specify interfaces.

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

Assignments:
* Generate abstract syntaxes from concrete syntaxes and symbol classes.
* Use Eli for that task. Exercise 10

Questions:
* Why is no information lost, when an expression is represented by an abstract program tree?
* Give examples for semantically irrelevant chain productions outside of expressions.
* Explain: XML-based languages are defined by context-free grammars. Their sentences are textual representations of
trees.

--------------------------------------------------------------------------------


PLaC-3.3
Example: concrete expression grammar

name production          action

p1: Expr  ::= Expr AddOpr Fact BinEx
p2: Expr  ::= Fact
p3: Fact  ::= Fact MulOpr Opd BinEx

derivation tree for a * (b + c) p4: Fact  ::= Opd
p5: Opd   ::= quotesingle (quotesingle Expr quotesingle )quotesingle
Expr          p6: Opd   ::= Ident          IdEx
p2          p7: AddOpr ::= quotesingle +quotesingle          PlusOpr

Fact          p8: AddOpr ::= quotesingle -quotesingle          MinusOpr
p3          p9: MulOpr ::= quotesingle *quotesingle          TimesOpr
Fact MulOpr    Opd    p10: MulOpr ::= quotesingle /quotesingle          DivOpr

p4     p9          p5

Opd     *     ( Expr )
p6          p1          +, - lower precedence
a        Expr  AddOpr Fact         *, / higher precedence
p2     p7       p4
Fact    +    Opd

p4          p6
Opd          c

p6
b

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 303

Objectives:
Illustrate comparison of concrete and abstract syntax

In the lecture:
* Repeat concepts of "GdP" (slide GdP-2.5): Grammar expresses operator precedences and associativity.
* The derivation tree is constructed by the parser - not necessarily stored as a data structure.
* Chain productions have only one non-terminal symbol on their right-hand side.

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

Suggested reading:
slide GdP-2.5

Questions:
* How does a grammar express operator precedences and associativity?
* What is the purpose of the chain productions in this example.
* What other purposes can chain productions serve?

--------------------------------------------------------------------------------


PLaC-3.3a
Patterns for expression grammars

Expression grammars are systematically constructed,
such that structural properties of expressions are defined:

one level of precedence, binary  one level of precedence, binary
operator,left-associative:          operator,right-associative:

A ::= A Opr B          A ::= B Opr A
A ::= B          A ::= B

one level of precedence,        one level of precedence,
unary Operator, prefix:          unary Operator, postfix:

A ::= Opr A          A ::= A Opr
A ::= B          A ::= B

Elementary operands: only derived     Expressions in parentheses: only
from the nonterminal of the highest       derived from the nonterminal of the
precedence level (be H here):          highest precedence level (assumed to be
H here); contain the nonterminal of the
H ::= Ident          lowest precedence level (be A here):

H ::= quotesingle (quotesingle A quotesingle )quotesingle

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 303a

Objectives:
Be able to apply the patterns

In the lecture:
Explain the patterns

Assignments:
Apply the patterns to understand given and construct new expression grammars.

--------------------------------------------------------------------------------


PLaC-3.4
Example: abstract expression grammar

name       production

BinEx:    Exp    ::= Exp BinOpr Exp
IdEx:     Exp    ::= Ident
PlusOpr:  BinOpr ::= quotesingle +quotesingle
MinusOpr: BinOpr ::= quotesingle -quotesingle
TimesOpr: BinOpr ::= quotesingle *quotesingle        abstract program tree for a * (b + c)
DivOpr:   BinOpr ::= quotesingle /quotesingle

Exp
BinEx

Exp  BinOpr          Exp
IdEx     TimesOpr          BinEx
a          Exp   BinOpr Exp
*   IdEx  PlusOpr      IdEx
b      +     c

symbol classes:   Exp = { Expr, Fact, Opd }
BinOpr = { AddOpr, MulOpr }

Actions of the concrete syntax: productions of the abstract syntax to create tree nodes for
no action at a concrete chain production: no tree node is created

(c) 2003 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304

Objectives:
Illustrate comparison of concrete and abstract syntax

In the lecture:
* Repeat concepts of "GdP" (slide GdP-2.9):
* Compare grammars and trees.
* Actions create nodes of the abstract program tree.
* Symbol classes shrink node pairs that represent chain productions into one node

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

Suggested reading:
slide GdP-2.9

Questions:
* Is this abstract grammar unambiguous?
* Why is that irrelevant?

--------------------------------------------------------------------------------


PLaC-3.4a
3.2 Design of concrete grammars

Objectives

The concrete grammar for parsing

* is parsable: fulfills the grammar condition of the chosen
parser generator;
* specifies the intended language - or a small super set of it;
* is provably related to the documented grammar;
* can be mapped to a suitable abstract grammar.

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304a

Objectives:
Guiding objectives

In the lecture:
The objectives are explained.

--------------------------------------------------------------------------------


PLaC-3.4aa

A strategy for grammar development

1. Examples: Write at least one example for every intended language construct. Keep the
examples for checking the grammar and the parser.

2. Sub-grammars: Decompose a non-trivial task into topics covered by sub-gammars, e.g.
statements, declarations, expressions, over-all structure.

3. Top-down: Begin with the start symbol of the (sub-)grammar, and refine each nonterminal
according to steps 4 - 7 until all nonterminals of the (sub-)grammar are refined.

4. Alternatives: Check whether the language construct represented by the current
nonterminal, say Statement, shall occur in structurally different alternatives, e.g. while-
statement, if-statement, assignment. Either introduce chain productions, like
Statement ::= WhileStatement | IfStatement | Assignment.
or apply steps 5 - 7 for each alternative separately.

5. Consists of: For each (alternative of a) nonterminal representing a language construct
explain its immediate structure in words, e.g. "A Block is a declaration sequence followed
by a statement sequence, both enclosed in curly braces." Refine only one structural level.
Translate the description into a production. If a sub-structure is not yet specified introduce
a new nonterminal with a speaking name for it, e.g.
Block ::= '{' DeclarationSeq StatementSeq '}'.

6. Natural structure: Make sure that step 5 yields a "natural" structure, which supports
notions used for static or dynamic semantics, e.g. a range for valid bindings.

7. Useful patterns: In step 5 apply patterns for description of sequences, expressions, etc.

(c) 2008 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304aa

Objectives:
Develop CFGs systematically

In the lecture:
* Apply the strategy for a little task.
* Apply the strategy in context of the running project.
* Apply the patterns of slides GPS-2.10, GPS-2.10, 12, 14, 15.
* The strategy is applicable for the concrete and the abstract syntax.

Suggested reading:
Kastens / Übersetzerbau, Section 4.1

Suggested reading:
slide GdP-2.10ff

--------------------------------------------------------------------------------


PLaC-3.4b
Grammar design for an existing language

* Take the grammar of the language specification literally.

* Only conservative modifications for parsability or for mapping to abstract syntax.

* Describe all modifications.
(see ANSI C Specification in the Eli system description
http://www.uni-paderborn.de/fachbereich/AG/agkastens/eli/examples/eli_cE.html)

* Java language specification (1996):
Specification grammar is not LALR(1).
5 problems are described and how to solve them.

* Ada language specification (1983):
Specification grammar is LALR(1)
- requirement of the language competition

* ANSI C, C++:
several ambiguities and LALR(1) conflicts, e.g.
"dangling else",
"typedef problem":

A (*B);
is a declaration of variable B, if A is a type name,
otherwise it is a call of function A

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304b

Objectives:
Avoid document modifications

In the lecture:
* Explain the conservative strategy.
* Java gives a solution for the dangling else problem.
* For typedef problem see PLaC-2.3.

--------------------------------------------------------------------------------


PLaC-3.4c
Grammar design together with language design

Read grammars before writing a new grammar.
Apply grammar patterns systematically (cf. GPS-2.5, GPS-2.8)
* repetitions
* optional constructs
* precedence, associativity of operators

Syntactic structure should reflect semantic structure:

E. g. a range in the sense of scope rules should be represented by a single
subtree of the derivation tree (of the abstract tree).

Violated in Pascal:

functionDeclaration ::= functionHeading block
functionHeading ::= `function` identifier formalParameters `:` resultType `;`

formalParameters together with block form a range,
but identifier does not belong to it

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304c

Objectives:
Grammar design rules

In the lecture:
* Refer to GdP slides.
* Explain semantic structure.
* Show violation of the example.

--------------------------------------------------------------------------------


PLaC-3.4d
Syntactic restrictions versus semantic conditions

Express a restriction syntactically only if
it can be completely covered with reasonable complexity:

* Restriction can not be decided syntactically:
e.g. type check in expressions:
BoolExpression ::= IntExpression `<` IntExpression

* Restriction can not always be decided syntactically:
e. g. disallow array type to be used as function result
Type ::= ArrayType | NonArrayType | Identifier
ResultType ::= NonArrayType
If a type identifier may specify an array type,
a semantic condition is needed, anyhow

* Syntactic restriction is unreasonably complex:
e. g. distinction of compile-time expressions from ordinary
expressions requires duplication of the expression syntax.

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304d

Objectives:
How to express restrictions

In the lecture:
* Examples are explained.
* Semantic conditions are formulated with attribute grammar concepts, see next chapter.

Assignments:
Discuss further examples for restrictions.

--------------------------------------------------------------------------------


PLaC-3.4e
Eliminate ambiguities
unite syntactic constructs - distinguish them semantically

Examples:

* Java:  ClassOrInterfaceType  ::= ClassType | InterfaceType
InterfaceType          ::= TypeName
ClassType          ::= TypeName

replace first production by
ClassOrInterfaceType  ::= TypeName
semantic analysis distinguishes between class type and interface type

* Pascal: factor          ::= variable | ... | functionDesignator
variable          ::= entireVariable | ...
entireVariable       ::= variableIdentifier
variableIdentifier    ::= identifier          (**)
functionDesignator ::= functionIdentifier          (*)

|     functionIdentifer '(' actualParameters ')'
functionIdentifier    ::= identifier

eliminate marked (*) alternative
semantic analysis checks whether (**) is a function identifier

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304e

Objectives:
Typical ambiguities

In the lecture:
* Same notation with different meanings;
* ambiguous, if they occur in the same context.
* Conflicting notations may be separated by several levels of productions (Pascal example)

Questions:

--------------------------------------------------------------------------------


PLaC-3.4f
Unbounded lookahead

The decision for a reduction is determined by a distinguishing token that may
be arbitrarily far to the right:

Example, forward declarations as could have been defined in Pascal:

functionDeclaration ::=
`function` forwardIdent formalParameters `:` resultType `;` `forward`

| `function` functionIdent formalParameters `:` resultType `;` block

The distinction between forwardIdent and functionIdent would require to see
the forward or the begin token.

Replace forwardIdent and functionIdent by the same nonterminal;
distinguish semantically.

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 304f

Objectives:
Typical situation

In the lecture:
Explain the problem and the solution using the example

Questions:

--------------------------------------------------------------------------------


PLaC-3.5
3.3 Recursive descent parser
top-down (construction of the derivation tree), predictive method

Systematic transformation of a context-free grammar into a set of functions:

non-terminal symbol X          function X
alternative productions for X          branches in the function body
decision set of production pi          decision for branch pi
non-terminal occurrence X ::= ... Y ...      function call Y()
terminal occurrence X ::= ... t ...          accept a token t and read the next token

void Stmt ()

Productions for Stmt:          { switch (CurrSymbol)
{
case decision set for p1:
p1: Stmt ::=          Variable();
Variable quotesingle :=quotesingle Expr          accept(assignSym);
Expr();

break;

case decision set for p2:
accept(whileSym);

Expr();
p2: Stmt ::=          accept(doSym);
quotesingle whilequotesingle Expr quotesingle doquotesingle Stmt          Stmt();
break;
default: Fehlerbehandlung();

} }

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 305

Objectives:
Understand the construction schema

In the lecture:
Explanation of the method:

* Demonstrate the construction of a left-derivation and the top-down construction of a derivation tree by this animation.
* Relate grammar constructs to function constructs.
* Each function plays the role of an acceptor for a symbol.
* accept function for reading and checking of the next token (scanner).
* Computation of decision sets on PLaC-3.6.
* Decision sets must be pairwise disjoint!

Suggested reading:
Kastens / Übersetzerbau, Section 4.2

Questions:
* A parser algorithm is based on a stack automaton. Where is the stack of a recursive descent parser? What corresponds
to the states of the stack automaton?
* Where can actions be inserted into the functions to output production sequences in postfix or in prefix form?

--------------------------------------------------------------------------------


PLaC-3.6
Grammar conditions for recursive descent

Definition: A context-free grammar is strong LL(1), if for any pair of productions that have the
same symbol on their left-hand sides, A ::= u and A ::= v, the decision sets are disjoint:

DecisionSet (A ::= u) intersection DecisionSet (A ::= v) = emptyset
with
DecisionSet (A ::= u) := if nullable (u) then First (u) union Follow (A) else First (u)

nullable (u) holds iff a derivation u arrowdblright * epsilon exists
First (u)   := { t element T | v element V* exists and a derivation u arrowdblright * t v }
Follow (A):= { t element T | u,v element V* exist, A element N and a derivation S arrowdblright * u A t v }

Example:
production          DecisionSet

p1:  Prog  ::= Block #          begin         non-terminal
p2:  Block ::= begin Decls Stmts end begin          X     First (X)    Follow (X)
p3:  Decls ::= Decl ; Decls          new
p4:  Decls ::=          Ident begin   Prog   begin
p5:  Decl   ::= new Ident          new          Block   begin       # ; end
p6:  Stmts ::= Stmts ; Stmt          begin Ident    Decls   new         Ident begin
p7:  Stmts ::= Stmt          begin Ident    Decl   new          ;
p8:  Stmt  ::= Block          begin         Stmts  begin Ident  ; end
p9:  Stmt  ::= Ident := Ident          Ident          Stmt    begin Ident  ; end

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 306

Objectives:
Strong LL(1) can easily be checked

In the lecture:
* Explain the definitions using the example.
* First set: set of terminal symbols, which may begin some token sequence that is derivable from u.
* Follow set: set of terminal symbols, which may follow an A in some derivation.
* Disjoint decision sets imply that decisions can be made deterministically using the next input token.
* For k=1: Strong LL(k) is equivalent to LL(k).

Suggested reading:
Kastens / Übersetzerbau, Section 4.2, LL(k) conditions, computation of First sets and Follow sets

Questions:
The example grammar is not strong LL(1).

* Show where the condition is violated.
* Explain the reason for the violation.
* What would happen if we constructed a recursive descent parser although the condition is violated?

--------------------------------------------------------------------------------


PLaC-3.6a
Computation rules for nullable, First, and Follow

Definitions:

nullable(u) holds iff a derivation u arrowdblright * epsilon exists
First(u):= { t element T | v element V* exists and a derivation u arrowdblright * t v }
Follow(A):= { t element T | u,v element V* exist, A element N and a derivation S arrowdblright * u A v such that t element First(v) }

with G = (T, N, P, S); V = T union N; t element T; A element N; u,v element V*

Computation rules:

nullable(epsilon ) = true; nullable(t) = false; nullable(uv) = nullable(u) logicaland nullable(v);
nullable(A) = true iff es gibt  A::=u element P logicaland nullable(u)

First(epsilon ) = emptyset ; First(t) = {t};
First(uv) = if nullable(u) then First(u) union First(v) else First(u)
First(A) = First(u1) union ...union First(un) for all A::=ui element P

Follow(A):
if A=S then # element Follow(A)
if Y::=uAv element P then First(v) reflexsubset Follow(A) and if nullable(v) then Follow(Y) reflexsubset Follow(A)

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 306a

Objectives:
Compute First- and Follow-sets

In the lecture:
* Explain and apply computation rules

Suggested reading:
Kastens / Übersetzerbau, Section 4.2, LL(k) conditions, computation of First sets and Follow sets

--------------------------------------------------------------------------------


PLaC-3.7
Grammar transformations for LL(1)

Consequences of strong LL(1) condition:       Simple grammar transformations that
A strong LL(1) grammar can not have        keep the defined language invariant:

* alternative productions that begin          left-factorization:
with the same symbols:          non-LL(1) productions     transformed

A ::= v u          A ::= v X
A ::= v w          X ::= u
X ::= w

* productions that are directly or          elimination of direct recursion:
indirectly left-recursive:          A ::= A u          A ::= v X

A ::= v          X ::= u X
X ::=

special case empty v:
u, v, w element V*
X element N does not occur in the          A ::= A u          A ::= u A
original grammar          A ::=          A ::=

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 307

Objectives:
Understand transformations and their need

In the lecture:
* Argue why strong LL(1) grammars can not have such productions.
* Show why the transformations remove those problems.
* Replacing left-recursion by right recursion would usually distort the structure.
* There are more general rules for indirect recursion.

Questions:
* Apply recursion elimination for expression grammars.

--------------------------------------------------------------------------------


PLaC-3.7a
LL(1) extension for EBNF constructs
EBNF constructs can avoid violation of strong LL(1) condition:

EBNF construct:   Option [ u ]          Repetition ( u )*
Production:         A ::= v [ u ] w          A ::= v ( u )* w

additional
LL(1)-condition:          if nullable(w)
then First(u) intersection  (First(w) union  Follow(A)) = emptyset
else  First(u) intersection  First(w) = emptyset

in recursive        v          v
descent parser:     if (CurrToken in First(u)) { u }    while (CurrToken in First(u)) { u }
w          w

Repetition ( u )+ left as exercise

(c) 2011 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 307a

Objectives:
Understand transformations and their need

In the lecture:
* Show EBNF productions in recursive descent parsers.

Questions:
* Write a strong LL(1) expression grammar using EBNF.

--------------------------------------------------------------------------------


PLaC-3.8
Comparison: top-down vs. bottom-up

Information a stack automaton has when it decides to apply production  A ::= x :
top-down, predictive          bottom-up
leftmost derivation          rightmost derivation backwards

contents of          A
the stack          A          direction of     u

u         v          tree construction          x
x

input          input
accepted     k          accepted          k
lookahead          lookahead

A bottom-up parser has seen more of the input when it decides to apply a production.

Consequence: bottom-up parsers and their grammar classes are more powerful.

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 308

Objectives:
Understand the decision basis of the automata

In the lecture:
Explain the meaning of the graphics:

* role of the stack: contains states of the automaton,
* accepted input: will not be considered again,
* lookahead: the next k symbols, not yet accepted
* leftmost derivation: leftmost non-terminal is derived next; rightmost correspondingly,
* consequences for the direction of tree construction,
Abbreviations

* LL: (L)eft-to-right, (L)eftmost derivation,
* LR: (L)eft-to-right, (R)ightmost derivation,
* LALR: (L)ook(A)head LR

Suggested reading:
Kastens / Übersetzerbau, Section Text zu Abb. 4.2-1, 4.3-1

Questions:
Use the graphics to explain why a bottom-up parser without lookahead (k=0) is reasonable, but a top-down parser is not.

--------------------------------------------------------------------------------

PLaC-3.9
Leftmost and rightmost derivations

leftmost          rightmost

S          S
=>*          =>*
u   A   v          u  A   ee

=>*          =>
tt      A    v          u   x  ee
=>          =>*
tt      x     v          u   ss   ee
=>          =>*
tt      ss     v          tt   ss   ee

u, v, x element V*
forward  produce          tt, ss, ee element T*

A element N

(c) 2007 bei Prof. Dr. Uwe Kastens

backward   reduce

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 309

Objectives:
Understand rightmost derivation backward

In the lecture:
* Explain the two derivation patterns.

--------------------------------------------------------------------------------


PLaC-3.9a

Derivation tree: top-down vs. bottom-up construction

p0:  P ::= D
P1:  D ::= FF
P2:  D ::= FB
P3:  FF ::= 'fun' FI '(' Ps ')' 'fwd'
P4:  FB ::= 'fun' FI '(' Ps ')' B
P5:  Ps ::= Ps PI
P6:  Ps ::=
p7:  B ::= '{' '}'
p8:  FI ::= Id
p9:  PI ::= Id

P          P

p0          p0
D          D

p1          p1
FF          FF

p3          p3
fun FI ( Ps      ) fwd          Ps       ) fwd

p8          p5
Id          PI

p5          p9
Ps   PI          Ps    Id

p5          p5
Ps PI          PI

p6          p9

Ps  Id
p9          p6
Id          FI (
p9          p8
Id          fun id

fun Id (    Id Id ) fwd          fun Id (    Id Id ) fwd

(c) 2008 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 309a

Objectives:
Understand derivation tree construction

In the lecture:
Use this animation to explain

* On the left: construction of a left-derivation.
* The magenta production names indicate that the decision can not be made on the base of the derivation so far and the
next input tokens.
* On the right: construction of a derivation backward (bottom-up).
* No decision problem occurs.
* It is a right-derivation constructed backward.

--------------------------------------------------------------------------------


PLaC-3.9b
LR(0) -Automaton

1          D   8 red.p0         reduction stack          input

fun          FF          1         fun Id(Id Id)fwd

2          9          1 2
FB     red.p1          Id(Id Id)fwd
1 2 11          (Id Id)fwd

FI          p8
10          1 2 3          (Id Id)fwd
3          red.p2          1 2 3 4          Id Id)fwd
p6
Id          1 2 3 4 5        Id Id)fwd

(          1 2 3 4 5 12        Id)fwd
11          p9

red.p8          1 2 3 4 5 13        Id)fwd
4          p5  1 2 3 4 5          Id)fwd

red.p6         12          1 2 3 4 5 12          )fwd

Ps          p9
Id      red.p9          1 2 3 4 5 13          )fwd
p5
5          1 2 3 4 5          )fwd

13          1 2 3 4 5 6          fwd

)       PI      red.p5          1 2 3 4 5 6 7          #
p3

6          1 9          #
{          p1
14       } 15          1 8          #
red.p7  p0
fwd

7        B  16
red.p3       red.p4

(c) 2008 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 309b

Objectives:
Understand understand how LR automata work

In the lecture:
* See PLaC-3.12 for explanations of the operations shift and reduce.
* Execute the automaton.

--------------------------------------------------------------------------------


PLaC-3.10
3.4 LR parsing

LR(k) grammars introduced 1965 by Donald Knuth; non-practical until subclasses were defined.

LR parsers construct the derivation tree bottom-up, a right-derivation backwards.

LR(k) grammar condition can not be checked directly, but
a context-free grammar is LR(k), iff the (canonical) LR(k) automaton is deterministic.
We consider only 1 token lookahead: LR(1).

Comparison of LL and LR states:

The stacks of LR(k) and LL(k) automata contain states.

The construction of LR and LL states is based on the notion of items (see next slide).

Each state of an automaton represents   LL: one item       LR: a set of items
An LL item corresponds to a position in a case branch of a recursive function.

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 310

Objectives:
Introduction

In the lecture:
* Explain the comparison.

--------------------------------------------------------------------------------


PLaC-3.11
LR(1) items

An item represents the progress of analysis with respect to one production:

[  A  ::=  u .  v     R  ]          e. g.   [ B ::= ( . D ; S )  {#}]

. marks the position of analysis:accepted and reduced . to be accepted

R expected right context:

a set of terminals which may follow in the input
when the complete production is accepted.
(general k>1: R contains sequences of terminals not longer than k)
Items can distinguish different right contexts: [ A ::= u . v  R ] and [ A ::= u . v  R' ]
Reduce item:

[  A  ::=  u v .   R  ]          e. g.   [ B ::= (  D ; S ) .   {#}]

characterizes a reduction using this production if the next input token is in R.
The automaton uses R only for the decision on reductions!
A state of an LR automaton represents a set of items

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 311

Objectives:
Fundamental notions of LR automata

In the lecture:
Explain

* items are also called situations,
* meaning of an item,
* lookahead in the input and right context in the automaton.
* There is no right context set in case of an LR(0) automaton.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* What contains the right context set in case of a LR(3) automaton?

--------------------------------------------------------------------------------


PLaC-3.12
LR(1) states and operations

A state of an LR automaton represents a set of items
Each item represents a way in which analysis may
proceed from that state.          2   B ::= ( . D ; S )   {#}

D ::= . D ; a      {;}

A shift transition is made under          D ::= . a          {;}
a token read from input or
a non-terminal symbol          D
obtained from a preceding reduction.          4 B ::= ( D . ; S ) {#}
The state is pushed.          a     D ::= D . ; a     {;}

A reduction is made according to a reduce item.
n states are popped for a production of length n.          3 D ::= a .          {;}    red. p3

Operations:     shift     read and push the next state on the stack
reduce   reduce with a certain production, pop n states from the stack
error     error recognized, report it, recover
stop     input accepted

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 312

Objectives:
Understand LR(1) states and operations

In the lecture:
Explain

* Sets of items,
* shift transitions,
* reductions.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* Explain: A state is encoded by a number. A state represents complex information which is important for construction
of the automaton.

--------------------------------------------------------------------------------


PLaC-3.13
Example for a LR(1) automaton

Grammar:          1

p1  B ::= ( D ; S )          B ::= . ( D ; S ) {#}

p2  D ::= D ; a          (          3
2          a
B ::= ( . D ; S ) {#}          D ::= a .          {;}     red. p3
p3  D ::= a          D ::= . D ; a     {;}
p4  S ::= b ; S          D ::= . a          {;}
p5  S ::= b          D          6 D ::= D ; a .     {;}    red. p2

4 B ::= ( D . ; S ) {#}

D ::= D . ; a     {;}        a  7

;          S ::= b . ; S     {)}

In state 7 a decision is     5          S ::= b .          {)}    red. p5
B ::= ( D ; . S ) {#}
required on next input:        D ::= D ; . a     {;}          ;

* if ; then shift          S ::= . b ; S     {)}     b   8
S ::= b ; . S      {)}
S ::= . b          {)}
* if ) then reduce p5          S ::= . b ; S      {)}
S          b    S ::= . b          {)}
10
In states 3, 6, 9, 11 a          B ::= ( D ; S . ) {#}          S
decision is not          9
)
11          S ::= b ; S .      {)}    red. p4
required:          B ::= ( D ; S ) . {#}

* reduce on any input

red. p1, stop

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 313

Objectives:
Example for states, transitions, and automaton construction

In the lecture:
Use the example to explain

* the start state,
* the creation of new states,
* transitions into successor states,
* transitive closure of item set,
* push and pop of states,
* consequences of left-recursive and right-recursive productions,
* use of right context to decide upon a reduction,
erläutern.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* Describe the subgraphs for left-recursive and right-recursive productions. How do they differ?
* How does a LR(0) automaton decide upon reductions?

--------------------------------------------------------------------------------


PLaC-3.14

Construction of LR(1) automata
Algorithm:1. Create the start state.
2. For each created state compute the transitive closure of its items.
3. Create transitions and successor states as long as new ones can be created.

Transitive closure is to be applied to each state q:
Consider all items in q with the analysis position        before:2 B ::= ( . D ; S ) {#}
before a non-terminal B:
[ A1 ::= u1 . B v1  R1 ] ... [ An ::= un . B vn  Rn ],       after: 2 B ::= ( . D ; S ) {#}

then for each production B ::= w          D ::= . D ; a     {;}union {;}
[ B ::=  .  w  First (v1 R1)union ...union First (vn Rn)]          D ::= . a          {;}union {;}
has to be added to state q.

Start state:          1 B ::= . ( D ; S ) {#}
Closure of [ S ::=  . u   {#} ]
S ::= u   is the unique start production,
# is an (artificial) end symbol (eof)

2
Successor states:          B ::= ( . D ; S ) {#}
For each symbol x (terminal or non-terminal),          D ::= . D ; a     {;}
which occurs in some items after the analysis position,          D ::= . a          {;}
a transition is created to a successor state.  4          D          a
That contains corresponding items       B ::= ( D . ; S ) {#}     3
with the analysis position          D ::= D . ; a     {;}        D ::= a .          {;}
advanced behind the x occurrence.

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 314

Objectives:
Understand the method

In the lecture:
Explain using the example on PLaC-3.13:

* transitive closure,
* computation of the right context sets,
* relation between the items of a state and those of one of its successor

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* Explain the role of the right context.
* Explain its computation.

--------------------------------------------------------------------------------


PLaC-3.15
Operations of LR(1) automata

Example:

shift x (terminal or non-terminal):          stack          input reduction
from current state q
under x into the successor state q` ,        1          ( a ; a ; b ; b ) #
push q`          1 2          a ; a ; b ; b ) #
1 2 3          ; a ; b ; b ) #        p3
reduce p:          1 2          ; a ; b ; b ) #
apply production p  B ::= u ,          1 2 4          ; a ; b ; b ) #
pop as many  states,          1 2 4 5          a ; b ; b ) #
as there are symbols in u, from the         1 2 4 5 6          ; b ; b ) #       p2
new current state make a shift with B       1 2          ; b ; b ) #

error:          1 2 4          ; b ; b ) #
the current state has no transition          1 2 4 5          b ; b ) #
under the next input token,          1 2 4 5 7          ; b ) #
issue a message and recover          1 2 4 5 7 8          b ) #
1 2 4 5 7 8 7          ) #      p5
stop:          1 2 4 5 7 8          ) #
reduce start production,          1 2 4 5 7 8 9          ) #     p4
see # in the input          1 2 4 5          ) #
1 2 4 5 10          ) #
1 2 3 5 10 11          #      p1
1          #

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 315

Objectives:
Understand how the automaton works

In the lecture:
Explain operations

Questions:
* Why does the automaton behave differently on a-sequences than on b-sequences?
* Which behaviour is better?

--------------------------------------------------------------------------------


PLaC-3.16
Left recursion versus right recursion

left recursive productions:          right recursive productions:

p2: D ::= D ; a          p4:  S ::= b ; S
p3: D ::= a          p5:  S ::= b

a          b
2       3     red. p3          5       7    red. p5

D          ;   if next is )
b
4          8

;          S
6    red. p2          9   red. p4
5   a

reduction immediately after          the states for all  ; b  are
each  ; a is accepted          pushed before the first reduction

(c) 2011 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 316

Objectives:
Understand the difference

In the lecture:
Explain

* why right recursion fills the stack deeply,
* why left recursion is advantagous.

--------------------------------------------------------------------------------


PLaC-3.17
LR conflicts

An LR(1) automaton that has conflicts is not deterministic.
Its grammar is not LR(1);
correspondingly defined for any other LR class.

2 kinds of conflicts:
...
reduce-reduce conflict:          A ::= u .   R1  R1, R2
A state contains two reduce items, the          B ::= v .   R2   not
right context sets of which are not disjoint:          ...          disjoint

shift-reduce conflict:
 A state contains          ...
A ::= u .t v   R1
a shift item with the analysis position in front of a  t and   B ::= w .      R2     t element R2
a reduce item with t in its right context set.          ...

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 317

Objectives:
Understand LR conflicts

In the lecture:
Explain: In certain situations the given input token t can not determine
* Reduce-reduce: which reduction is to be taken;
* Shift-reduce: whether the next token is to be shifted, a reduction is to be made.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* Why can a shift-shift conflict not exist?
* In LR(0) automata items do not have a right-context set. Explain why a state with a reduce item may not contain any
other item.

--------------------------------------------------------------------------------


PLaC-3.18
Shift-reduce conflict for "dangling else" ambiguity

1 S    ::= . Stmt          {#}         Stmt

Stmt ::= . if ... then Stmt          {#}
Stmt ::= . if ... then Stmt else Stmt   {#}
Stmt ::= . a          {#}         a
if   ...       then

3 Stmt ::= if ... then . Stmt          {#}

Stmt ::= if ... then . Stmt else Stmt    {#}          Stmt
Stmt ::= . if ... then Stmt          {# else}
Stmt ::= . if ... then Stmt else Stmt    {# else}       a
Stmt ::= . a          {# else}
if    ...         then

5 Stmt ::= if ... then . Stmt          {# else}

Stmt ::= if ... then . Stmt else Stmt    {# else}         if
Stmt ::= . if ... then Stmt          {# else}
Stmt ::= . if ... then Stmt else Stmt    {# else}
Stmt ::= . a          {# else}     a

Stmt
6  Stmt ::= if ... then Stmt .          {# else}     else

Stmt ::= if ... then Stmt . else Stmt    {# else}      shift-reduce conflict

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 318

Objectives:
See a conflict in an automaton

In the lecture:
Explain

* the construction
* a solution of the conflict: The automaton can be modified such that in state 6, if an else is the next input token, it is
shifted rather than a reduction is made. In that case the ambiguity is solved such that the else part is bound to the inner
if. That is the structure required in Pascal and C. Some parser generators can be instructed to resolve conflicts in this
way.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

--------------------------------------------------------------------------------


PLaC-3.19
Decision of ambiguity
dangling else ambiguity:

Stmt          Stmt

if Cond then Stmt          if Cond then Stmt else Stmt

if Cond then Stmt
if Cond then Stmt else Stmt
desired solution for Pascal, C, C++, Java

Stmt
6  Stmt ::= if ... then Stmt .          {# else}     else
Stmt ::= if ... then Stmt . else Stmt    {# else}      shift-reduce conflict

State 6 of the automaton can be modified such that

an input token else is shifted (instead of causing a reduction);
yields the desired behaviour.

Some parser generators allow such modifications.

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 319

Objectives:
Understand modification of automaton

In the lecture:
Explain why the desired effect is achieved.

--------------------------------------------------------------------------------


PLaC-3.20

Simplified LR grammar classes

LR(1):
too many states for practical use, because right-contexts distinguish many states.
Strategy: simplify right-contexts sets; fewer states; grammar classes less powerful

LALR(1):          q        r
construct LR(1) automaton,          A ::= u . v R1     A ::= u . v R1`
identify LR(1) states if their items          B ::= x . y R2     B ::= x . y R2`
differ only in their right-context sets,          C ::= z .   R3      C ::= z .   R3`
unite the sets for those items;

yields the states of the LR(0) automaton      q, r identified:
augmented by the "exact" LR(1) right-context.          A ::= u . v R1 union R1`
qr B ::= x . y  R2 union R2`
State-of-the-art parser generators          C ::= z .  R3 union R3`
accept LALR(1)

SLR(1):
LR(0) states; in reduce items          A ::= u . v
use larger right-context sets for decision:          B ::= x . y
C ::= z .     Follow(C)
[ A ::= u .  Follow (A) ]

LR(0):
all items without right-context          C ::= z .
Consequence: reduce items only in singleton sets

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 320

Objectives:
Understand relations between LR classes

In the lecture:
Explain:

* LALR(1), SLR(1), LR(0) automata have the same number of states,
* compare their states,
* discuss the grammar classes for the example on slide PLaC-3.13.

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:

--------------------------------------------------------------------------------


PLaC-3.21
Hierarchy of grammar classes

context-free
unambiguous

LR(k)
LL(k)          LR(1)

LALR(1)        increasing

strong LL(1) = LL(1)         SLR(1)     same

LR(0)

increasing

number of
strict inclusions          precision of right
states        context sets

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 321

Objectives:
Understand the hierarchy

In the lecture:
Explain:
* the reasons for the strict inclusions,

Suggested reading:
Kastens / Übersetzerbau, Section 4.3

Questions:
* Assume that the LALR(1) construction for a given grammar yields conflicts. Classify the potential reasons using the LR
hierarchy.

--------------------------------------------------------------------------------


PLaC-3.21a
Reasons for LALR(1) conflicts

Grammar condition does not hold:
context-free

unambiguous     ambiguous          most cases

LR(k)        unbounded lookahead needed
LR(1)        fixed length lookahead > 1 needed
merge of LR(1) states          rare cases
LALR(1)        introduces conflicts

LALR(1) parser generator can not distinguish these cases.

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 321a

Objectives:
Distinguish cases

In the lecture:
The cases are explained.

--------------------------------------------------------------------------------


PLaC-3.21b
LR(1) but not LALR(1)

Identification of LR(1) states causes non-disjoint right-context sets.

Artificial example:

Grammar:        LR(1) states
Z ::= S
S ::= A a
S ::= B c          Z ::= . S       {#}
S ::= b A c          S ::= . A a     {#}
S ::= b B a          S ::= . B c     {#}
A ::= d.          S ::= . b A c   {#}
B ::= d.          S ::= . b B a  {#}
A ::= . d       {a}   d  A ::= d .       {a}
B ::= . d       {c}        B ::= d .       {c}          LALR(1) state

b          identified         A ::= d .       {a, c}
S ::= b . A c  {#}          states          B ::= d .       {a, c}
S ::= b . B a  {#}
A ::= . d       {c}    d  A ::= d .       {c}
B ::= . d       {a}        B ::= d .       {a}

Avoid the distinction between A and B - at least in one of the contexts.

(c) 2010 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 321b

Objectives:
Understand source of conflicts

In the lecture:
Explain the pattern, and why identification of states causes a conflict.

--------------------------------------------------------------------------------


PLaC-3.22
Table driven implementation of LR automata

LR parser tables

terminals  t   nonterminals
sq: shift into state q
sq          sq

~  rp          rp: reduce production p

e          ~     e: error
s          ~
r          e          ~: not reachable
don't care

nonterminal table

* has no reduce entries and no error entries          B ::= u . A v   R        q
A ::= . w        First(vR)
(only shift and don't-care entries)
reason:          A
a reduction to A reaches a state from where          B ::= u A . v   R
s
a shift under A exists (by construction)

unreachable entries in terminal table:
if t is erroneus input in state r, then          r
A ::= w .        First(vR)
state s will not be reached with input t

t     error

(c) 2013 bei Prof. Dr. Uwe Kastens
states

notelement

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 322

Objectives:
Understand properties of LR tables

In the lecture:
Explanation of

* pair of tables and their entries,
* unreachable entries,

Questions:
* Why are there no error entries in the nonterminal part?
* Why are there unreachable entries?

--------------------------------------------------------------------------------


PLaC-3.23
Implementation of LR automata

terminals     nonterminals          LR(0) reduce state:

sq          sq          ...          t
rp          C ::= u . t R        C ::= u t . R
...
e          ~
~

Compress tables:
* merge rows or columns that differ only in irrelevant entries; method: graph coloring

* extract a separate error matrix (bit matrix); increases the chances for merging

* normalize the values of rows or columns; yields smaller domain; supports merging

* eliminate LR(0) reduce states; new operation in predecessor state: shift-reduce
eliminates about 30\% of the states in practical cases
About 10-20\% of the original table sizes can be achieved!

Directly programmed LR-automata are possible - but usually too large.

(c) 2010 bei Prof. Dr. Uwe Kastens
states

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 323

Objectives:
Implementation of LR tables

In the lecture:
Explanation of

* compression techniques, derived from general table compression,
* Singleton reduction states yield an effective optimization.

Questions:
* Why are there no error entries in the nonterminal part?
* Why are there unreachable entries?
* Why does a parser need a shift-reduce operation if the optimization of LR(0)-reduction states is applied?

--------------------------------------------------------------------------------


PLaC-3.24
Parser generators

PGS     Univ. Karlsruhe; in Eli   LALR(1), table-driven
Cola     Univ. Paderborn; in Eli  LALR(1), optional: table-driven or directly programmed
Lalr      Univ. / GMD Karlsruhe  LALR(1), table-driven
Yacc     Unix tool          LALR(1), table-driven
Bison   Gnu          LALR(1), table-driven
Llgen    Amsterdam Compiler Kit LL(1), recursive descent
Deer     Univ. Colorado, Bouder LL(1), recursive descent

Form of grammar specification:
EBNF: Cola, PGS, Lalr;   BNF: Yacc, Bison

Error recovery:
simulated continuation, automatically generated: Cola, PGS, Lalr
error productions, hand-specified:          Yacc, Bison

Actions:
statements in the implementation language
at the end of productions:          Yacc, Bison
anywhere in productions:          Cola, PGS, Lalr

Conflict resolution:
modification of states (reduce if ...)          Cola, PGS, Lalr
order of productions:          Yacc, Bison
rules for precedence and associativity:          Yacc, Bison

Implementation languages:
C: Cola, Yacc, Bison          C, Pascal, Modula-2, Ada:  PGS, Lalr

(c) 2003 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 324

Objectives:
Overview over parser generators

In the lecture:
* Explain the significance of properties

Suggested reading:
Kastens / Übersetzerbau, Section 4.5

--------------------------------------------------------------------------------


PLaC-3.25
3.5 Syntax Error Handling
General criteria

* recognize error as early as possible
LL and LR can do that:
no transitions after error position

* report the symptom in terms of the source text
rather than in terms of the state of the parser

* continue parsing short after the error position
analyze as much as possible

* avoid avalanche errors

* build a tree that has a correct structure
later phases must not break

* do not backtrack, do not undo actions,
not possible for semantic actions

* no runtime penalty for correct programs

(c) 2007 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 325

Objectives:
Accept strong requirements

In the lecture:
* The reasons for and the consequences of the requirements are discussed.
* Some of the requirements hold for error handling in general - not only that of the syntactic analysis.

--------------------------------------------------------------------------------


PLaC-3.26
Error position

Error recovery: Means that are taken by the parser after recognition of a syntactic error
in order to continue parsing

Correct prefix: The token sequence w element T* is a correct prefix in the language L(G),
if there is an u element T* such that w u element L(G);   i. e. w can be extended to a sentence in L(G).

Error position: t is the (first) error position in the input w t x , where t element T and w, x element T*,
if w is a correct prefix in L(G) and w t is not a correct prefix.

Example:        int compute (int i) { a = i * / c; return i;}

w          t

LL and LR parsers recognize an error at the error position;
they can not accept t in the current state.

(c) 2003 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 326

Objectives:
Error position from the view of the parser

In the lecture:
Explain the notions with respect to parser actions using the examples.

Questions:
Assume the programmer omitted an opening parenthesis.

* Where is the error position?
* What is the symptom the parser recognizes?

--------------------------------------------------------------------------------


PLaC-3.27
Error recovery

Continuation point:
A token d at or behind the error position t such that
parsing of the input continues at d.

Error repair
with respect to a consistent derivation          error position
- regardless the intension of the programmer!

w t x   =
Let the input be w t x with the          w y d z
error position at t and let w t x = w y d z,          w v d z
then the recovery (conceptually) deletes y and inserts v,
such that w v d is a correct prefix in L(G),          continuation
with d element T and w, y, v, z element T*.

Examples:

w    y d  z          w   yd   z          w       y d z

a = i * / c;...       a = i * / c;...        a = i * / c;...
a = i *  c;...       a = i *e/ c;...        a = i * e ;...

delete /          insert error identifier e         delete / c
and insert error id. e

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 327

Objectives:
Understand error recovery

In the lecture:
Explain the notions with respect to parser actions using the examples.

Questions:
Assume the programmer omitted an opening parenthesis.
* What could be a suitable repair?

--------------------------------------------------------------------------------


PLaC-3.28
Recovery method: simulated continuation

Problem: Determine a continuation point close to the error position and reach it.
Idea: Use parse stack to determine a set D of tokens as potential continuation points.

Steps of the method:

1. Save the contents of the parse stack when an error is recognized.

2. Compute a set D reflexsubset T of tokens that may be used as continuation point (anchor set)
Let a modified parser run to completion:
Instead of reading a token from input it is inserted into D; (modification given below)

3. Find a continuation point d: Skip input tokens until a token of D is found.

4. Reach the continuation point d:          error          contin.
Restore the saved parser stack as the current stack.        pos.          point
Perform dedicated transitions until d is acceptable.          (1)
Instead of reading tokens (conceptually) insert tokens.          (5)
(2)
Thus a correct prefix is constructed.

5. Continue normal parsing.          (3)

Augment parser construction for steps 2 and 4:          (4)
For each parser state select a transition and its token,
such that the parser empties its stack and terminates as fast as possible.
This selection can be generated automatically.
The quality of the recovery can be improved by deletion/insertion of elements in D.

(c) 2013 bei Prof. Dr. Uwe Kastens

--------------------------------------------------------------------------------


Lecture Programming Languages and Compilers WS 2013/14 / Slide 328

Objectives:
Error recovery can be generated

In the lecture:
* Explain the idea and the steps of the method.
* The method yields a correct parse for any input!
* Other, less powerful methods determine sets D statically at parser construction time, e. g. semicolon and curly bracket
for errors in statements.

Questions:
* How does this method fit to the general requirements for error handling?

--------------------------------------------------------------------------------