GSS-2.1 2. Constructing Trees - Overview Check the notation and the structure of the input and represent it as a tree. Input processing Scanning Tasks: Symbol coding Parsing Conversion Tree construction Phases: Lexical Syntactic analysis analysis Interfaces: Source text Symbol sequence Structure tree Input Customer representation: (addr: Address; Fields account: int; ) Field Field FieldName FieldName [1, 1] Ident: 12 TypeName TypeName [2, 3] open [2, 4] Ident: 13 [2, 8] colon [2,10] Ident: 14 (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 201 Objectives: Understand the structuring phase In the lecture: * Remember the tasks of GSS-1.15. * Explain the tasks and representations. -------------------------------------------------------------------------------- GSS-2.2 Eli: Specification of the Tree Construction Symbol specification Scanner G. GLA Scanner Concrete syntax Parser G. PGS, Cola Parser Mapping concr - abstr Synt. Map tool Tree construction Abstract syntax Attr.eval.-G. Liga Attrib.evaluator Specification Generator Target text (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 202 Objectives: Understand how the structuring phase is generated In the lecture: Explain * Roles of the specifications, * tasks of the generators, * cooperation between the generators. -------------------------------------------------------------------------------- GSS-2.3 Specifications for the Structure Generator Symbol Ident: PASCAL_IDENTIFIER specifications FileName: C_STRING_LIT Notations of non-literal tokens C_COMMENT .gla Concrete Descriptions:(Import / Structure)*. syntax Structure: StructureName quotesingle (quotesingle Fields quotesingle )quotesingle . Structure of input, Fields: Field*. literal tokens Field: FieldName quotesingle :quotesingle TypeName. .con ... Mapping is empty if concret and abstract syntax coincide concr - abstr Synt .map RULE: Descriptions LISTOF Import|Structure COMPUTE ... Abstract syntax SYMBOL FieldName COMPUTE ... SYMBOL TypeName COMPUTE ... Structure of trees Only those symbols and productions, which need .lido computations (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 203 Objectives: A simple example In the lecture: Get an idea of the specifications -------------------------------------------------------------------------------- GSS-2.4 Calendar Example: Structuring Task A new example for the specification of the structuring task up to tree construction: Input language: Sequence of calendar entries: 1.11. 20:00 "Theater" Thu 14:15 "GSS lecture" Weekday 12:05 "Dinner in Palmengarten" Mon, Thu 8:00 "Deanquotesingle s office" 31.12. 23:59 "Jahresende" 12/31 23:59 "End of year" (c) 2010 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 204 Objectives: Introduce a new example In the lecture: Explain the task using the examples -------------------------------------------------------------------------------- GSS-2.4a Design of a Concrete Syntax 1. Develop a set of examples, such that all aspects of the intended language are covered. 2. Develop a context-free grammar using a top-down strategy (see PLaC-3.4aa), and update the set of examples correspondingly. 3. Apply the design rules of PLaC-3.4c - 3.4f: - Syntactic structure should reflect semantic structure - Syntactic restrictions versus semantic conditions - Eliminate ambiguities - Avoid unbounded lookahead 4. Design notations of non-literal tokens. (c) 2012 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 204a Objectives: Issues of grammar design In the lecture: * The strategy is explained. * Repeat the methods learned in PLaC Sect. 3.2 -------------------------------------------------------------------------------- GSS-2.5 Concrete Syntax specifies the structure of the input by a context-free grammar: Calendar: Entry+ . Entry: Date Event. Notation: Date: DayNum quotesingle .quotesingle MonNum quotesingle .quotesingle / * Sequence of productions MonNum quotesingle /quotesingle DayNum / DayNames / GeneralPattern. * literal terminals between quotesingle DayNum: Integer. * EBNF constructs: MonNum: Integer. / alternative DayNames: DayName / ( ) parentheses DayNames quotesingle ,quotesingle DayName. [ ] option DayName: Day. +, * repetition GeneralPattern: SimplePattern / // repetition with SimplePattern Modifier. SimplePattern: quotesingle Weekdayquotesingle / quotesingle Weekendquotesingle . separator Modifier: quotesingle +quotesingle DayNames / quotesingle -quotesingle DayNames. Event: When Description / Description. (for meaning see GPS) When: Time / Time quotesingle -quotesingle Time. Example: 1.11. 20:00 "Theater" Thu 14:15 "GSS lecture" Weekday 12:05 "Dinner in Palmengarten" Mon, Thu 8:00 "Deanquotesingle s office" 31.12. 23:59 "Jahresende" 12/31 23:59 "End of year" (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 205 Objectives: Learn the CFG notation In the lecture: * Design of productions, * notation of productions, * relate to example input. -------------------------------------------------------------------------------- GSS-2.6 Literal and Non-Literal Terminals Calendar: Entry+ . Definition of notations of Entry: Date Event. * literal terminals (unnamed): Date: DayNum quotesingle .quotesingle MonNum quotesingle .quotesingle / MonNum quotesingle /quotesingle DayNum / in the concrete syntax DayNames / GeneralPattern. * non-literal terminals DayNum: Integer. (named): MonNum: Integer. in an additional DayNames: DayName / specification for the DayNames quotesingle ,quotesingle DayName. DayName: Day. scanner generator GeneralPattern: SimplePattern / SimplePattern Modifier. SimplePattern: quotesingle Weekdayquotesingle / quotesingle Weekendquotesingle . Modifier: quotesingle +quotesingle DayNames / quotesingle -quotesingle DayNames. Event: When Description / Description. When: Time / Time quotesingle -quotesingle Time. (c) 2012 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 206 Objectives: Classification of terminals In the lecture: Notation of terminals specified in different ways -------------------------------------------------------------------------------- GSS-2.7 Specification of Non-Literal Terminals The generator GLA generates a scanner from * notations of literal terminals, extracted from the concrete syntax by Eli * specifications of non-literal terminals in files of type.gla Form of specifications: Name: $ regular expression [Coding function] Day: $ Mon|Tue|Wed|Thu|Fri|Sat|Son [mkDay] Time: $(([0-9]|1[0-9]|2[0-3]):[0-5][0-9]) [mkTime] Canned specifications: Description: C_STRING_LIT Integer: PASCAL_INTEGER (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 207 Objectives: Understand scanner specifications In the lecture: Explain * Notation of regular expressions, * Task and interface of coding function, * canned specifications. -------------------------------------------------------------------------------- GSS-2.8 Scanner Specification: Regular Expressions Notation accepted character sequences c the character c; except characters that have special meaning, see \c \c space, tab, newline, \".[]^()|?+*{}/$< "s" the character sequence s . any single character except newline [xyz] exactly one character of the set {x, y, z} [^xyz] exactly one character that is not in the set {x, y, z} [c-d] exactly one character, the ASCII code of which lies between c and d (incl.) (e) character sequence as specified by e ef character sequences as specified by e followed by f e | f character sequence as specified by e or by f e? character sequence as specified by e or empty sequence e+ one or more character sequences as specified by e e* character sequence as specified by e+ or empty e {m,n} at least m, and at most n character sequences as specified by e e and f are regular expressions as defined here. Each regular expression accepts the longest character sequence, that obeys its definition. Solving ambiguities: 1. the longer accepted sequence 2. equal length: the earlier stated rule (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 208 Objectives: Notation of regular expressions In the lecture: Explain how to apply the definintions -------------------------------------------------------------------------------- GSS-2.9 Scanner Specification: Programmed Scanner There are situations where the to be accepted character sequences are very difficult to define by a regular expression. A function may be implemented to accept such sequences. The begin of the squence is specified by a regular expression, followed by the name of the function, that will accept the remainder. For example, line comments of Ada: $-- (auxEOL) Parameters of the function: a pointer to the first character of the so far accepted sequence, and its length. Function result: a pointer to the charater immediately following the complete sequence: char *Name(char *start, int length) Some of the available programmed scanners: auxEOL all characters up to and including the next newline auxCString a C string literal after the opening " auxM3Comment a Modula 3 comment after the opening (*, up to and including the closing *); may contain nested comments paranthesized by (* and *) Ctext C compound statements after the opening {, up to the closing }; may contain nested statements parenthesized by { and } (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 209 Objectives: Recognize useful applications In the lecture: * Explain the principle and examples, * refer to the list of available functions in the documentation. -------------------------------------------------------------------------------- GSS-2.10 Scanner Specification: Coding Functions The accepted character sequence (start, length) is passed to a coding function. It computes the code of the accepted token (intrinsic) i.e. an integral number, representing the identity of the token. For that purpose the function may store and/or convert the character sequence, if necessary. All coding functions have the same signature: void Name (char *start, int length, int *class, int *intrinsic) The token class (terminal code, parameter class) may be changed by the function call, if necessary, e.g. to distinguish keywords from identifiers. Available coding functions: mkidn enter character sequence into a hash table and encode it bijectively mkstr store character sequence, return a new code c_mkstr C string literal, converted into its value, stored, and given a new code mkint convert a sequences of digits into an integral value and return it value c_mkint convert a literal for an integral number in C and return its value (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 210 Objectives: Recognize the principle and useful applications In the lecture: * Explain the interface and examples * refer to the list of available functions in the documentation -------------------------------------------------------------------------------- GSS-2.11 Scanner Specification: Canned Specifications Complete canned specifications (regular expression, a programmed scanner, and a coding function) can be instantiated by their names: Identifier: C_IDENTIFIER For many tokens of several programming languages canned specifications are available (complete list of descriptions in the documentation): C_IDENTIFIER, C_INTEGER, C_INT_DENOTATION, C_FLOAT, C_STRING_LIT, C_CHAR_CONSTANT, C_COMMENT PASCAL_IDENTIFIER, PASCAL_INTEGER, PASCAL_REAL, PASCAL_STRING, PASCAL_COMMENT MODULA2_INTEGER, MODULA2_CHARINT, MODULA2_LITERALDQ, MODULA2_LITERALSQ, MODULA2_COMMENT MODULA3_COMMENT, ADA_IDENTIFIER, ADA_COMMENT, AWK_COMMENT SPACES, TAB, NEW_LINE are only used, if some token begins with one of these characters, but, if these characters still separate tokens. The used coding functions may be overridden. (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 211 Objectives: Recognize the potential for reuse In the lecture: * Explain some of the specifications, * refer to the documentation -------------------------------------------------------------------------------- GSS-2.12 Abstract Syntax specifies the structure trees using a context-free grammar: RULE pCalendar: Calendar LISTOF Entry END; RULE pEntry: Entry ::= Date Event END; RULE pDateNum: Date ::= DayNum MonNum END; RULE pDatePattern: Date ::= Pattern END; RULE pDateDays: Date ::= DayNames END; RULE pDayNum: DayNum ::= Integer END; RULE pMonth: MonNum ::= Integer END; RULE pDayNames: DayNames LISTOF DayName END; RULE pDay: DayName ::= Day END; RULE pWeekday: Pattern ::= quotesingle Weekdayquotesingle END; RULE pWeekend: Pattern ::= quotesingle Weekendquotesingle END; RULE pModifier: Pattern ::= Pattern Modifier END; RULE pPlus: Modifier ::= quotesingle +quotesingle DayNames END; RULE pMinus: Modifier ::= quotesingle -quotesingle DayNames END; RULE pTimedEvent: Event ::= When Description END; RULE pUntimedEvent: Event ::= Description END; RULE pTime: When ::= Time END; RULE pTimeRange: When ::= Time quotesingle -quotesingle Time END; Notation: * Language Lido for computations in structure trees * optionally named productions, * no EBNF, except LISTOF (possibly empty sequence) (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 212 Objectives: Learn the notation for abstract syntax In the lecture: * Design of productions, * notation of productions -------------------------------------------------------------------------------- GSS-2.13 Example for a Structure Tree * Production names are node types Tree output produced by Eli's * Values of terminals at leaves unparser generator pEntry( pDateNum(pDayNum(1),pMonth(11)), pTimedEvent(pTime(1200),"Theater")), pEntry( pDateDays(pDay(4)),pTimedEvent(pTime(855),"GSS lecture")), pEntry( pDatePattern(pWeekday()), pTimedEvent(pTime(725),"Dinner in Palmengarten")), pEntry( pDateDays(pDay(1),pDay(4)),pUntimedEvent("Deanquotesingle s office")), pEntry( pDateNum(pDayNum(31),pMonth(12)), pTimedEvent(pTime(1439),"Jahresende")), pEntry( pDateNum(pDayNum(31),pMonth(12)), pTimedEvent(pTime(1439),"End of year")) (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 213 Objectives: Read tree in notation of named parenthesis In the lecture: * Relate to example input, * relate to abstract syntax. -------------------------------------------------------------------------------- GSS-2.14 Graphic Structure Tree * Names of productions as node types Output produced by * Values of terminals at leaves Eli`s unparser generator, Tree structure given by parentheses pCalendar ( pEntry ) ( pDateNum , pTimedEvent ) ( pDayNum, pMonth ) ( pTime Description ) ( ) "Theater" Integer (Integer ) 1 11 ( Time ) 1200 (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 214 Objectives: Understand the tree representation In the lecture: Understand the relation between the abstract syntax (tree grammar) and the textual representation -------------------------------------------------------------------------------- GSS-2.15 Symbol Mapping: Concrete - Abstract Syntax concrete syntax: SimplePattern: quotesingle Weekdayquotesingle / quotesingle Weekendquotesingle . simplify to create GeneralPattern: SimplePattern / abstract syntax: SimplePattern Modifier. Set of nonterminals of the mapping: concrete syntax mapped to MAPSYM Pattern ::= GeneralPattern one nonterminal of the SimplePattern. abstract syntax abstract syntax: RULE pWeekday: Pattern ::= quotesingle Weekdayquotesingle END; RULE pWeekend: Pattern ::= quotesingle Weekendquotesingle END; RULE pModifier: Pattern ::= Pattern Modifier END; (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 215 Objectives: Simplification of the structure tree In the lecture: * Explain symbol mapping, * cf. symbol mapping for expression grammars in (GPS-2-9) -------------------------------------------------------------------------------- GSS-2.16 Rule Mapping Concrete Syntax: Date: DayNum quotesingle .quotesingle MonNum quotesingle .quotesingle / MonNum quotesingle /quotesingle DayNum . Different productions of the concrete syntax Mapping: are unified in the MAPRULE abstract syntax Date: DayNum quotesingle .quotesingle MonNum quotesingle .quotesingle < $1 $2 >. Date: MonNum quotesingle /quotesingle DayNum < $2 $1 >. Abstract syntax: RULE pDateNum: Date ::= DayNum MonNum END; (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 216 Objectives: Tree simplification In the lecture: * Explain rule mapping, * cf. simplification of expression grammars (GPS-2-9), * abstract sytax can be genrated from concrete syntax and mapping specification, * concrete syntax can be generated from abstract syntax and mapping specification, * Abstract and concrete syntax can be matched, yielding the mapping specification. * The grammars can be matched piecewise. -------------------------------------------------------------------------------- GSS-2.17 Generate Tree Output Produce structure trees with node types and values at terminal leaves: pEntry( pDateNum(pDayNum(1),pMonth(11)), pTimedEvent(pTime(1200),"Theater")), Pattern constructor functions are called in tree contexts to produce output. Specifications are created automatically by Eli's unparser generator: Unparser is generated from the specification: Output of non-literal terminals: Calendar.fw Idem_Day: $ int Calendar.fw:tree Idem_Time: $ int Idem_Integer: $ int Output at grammar root: SYMBOL ROOTCLASS COMPUTE Use predefined PTG patterns: BP_Out(THIS.IdemPtg); END; $/Output/PtgCommon.fw (c) 2013 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Generating Software from Specifications WS 2013/14 / Slide 217 Objectives: Learn to use the unparser generator In the lecture: Explain the roles of the specification * Unparser generator generates Eli specifications (ptg and lido)! * Individual specifications needed for the root and the leaves only. * Another variant of the unparser generator can reproduce the input text: instead of ":tree" derive ":idem". It may be used for language extensions. --------------------------------------------------------------------------------