
Notes for Lecture 10 - Tuesday October 12th, 2004


Review of what we’ve done

                  +
                 / \
                4   x
                   / \
                  3   2


Every expression language (e.g., English, Java, Lisp) needs a grammar. When you define a programming language, you must create a grammar for it.

What’s in a grammar?

Or, actually, a context-free grammar, which is what computer science people usually deal with.

Here's an example of a grammar that corresponds to arithmetic expressions using integers (like the one in the tree above):

Sigma = {1,2,3,4,5,6,7,8,9,0,+,x,-,/}
Gamma = {E}
  E -> E + E
  E -> E x E
  E -> E - E
  E -> E / E
  E -> (0 ... 9)(0 ... 9)*     (recall that "*" means "0 or more")

Generating Strings and Parsing

Now, using this grammar, we can generate any arithmetic expression involving integers and the basic operations. We could generate the one we used in the example above like this:

                    E
                   / \
                  E + E
                 /   / \
                4 + E x E
               /   /   /
              4 + 3 x 2

Or, more commonly, we might be given a string that we need to evaluate. Using our grammar, we can figure out how to treat each part of the string. When we try this with our example, though, we discover a problem with our grammar:

           4 + 3 x 2                       4 + 3 x 2
            \ /   /                         \   \ /
             E x E                           E + E
              \ /                             \ /  
               E                               E 

Both of these trees are valid according to our grammar and the rules of evaluating trees, but they give different answers. For the tree on the left, 4 + 3 x 2 = 7 x 2 = 14. For the tree on the right, 4 + 3 x 2 = 4 + 6 = 10. This is not good. We call a grammar that allows situations like this one to occur ambiguous. In this case, we could make our grammar unambiguous by adding rules about operator precedence or by demanding that parentheses be used.
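We can see the ambiguity concretely with a small sketch (not from the lecture): represent each parse tree as a nested tuple (op, left, right) and evaluate both bottom-up.

```python
# The two parse trees for 4 + 3 x 2, and a tree evaluator.
def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "x": a * b, "-": a - b, "/": a / b}[op]

left_tree  = ("x", ("+", 4, 3), 2)   # (4 + 3) x 2 -- the tree on the left
right_tree = ("+", 4, ("x", 3, 2))   # 4 + (3 x 2) -- the tree on the right

print(evaluate(left_tree))    # 14
print(evaluate(right_tree))   # 10
```

Same string, two valid trees, two different values: that is exactly what makes the grammar ambiguous.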

Another Example

Here's a simple grammar for English:

  S -> NP VP
  NP -> properNoun | det adj* noun
  VP -> verb_i | verb_t NP

where our nonterminals are
  S = start symbol (we must begin with a start symbol to get a complete phrase) 
  NP = noun phrase
  VP = verb phrase

and our terminals are represented by the words that begin with a lowercase letter. (Each could be replaced by a specific word of its type.)
  verb_i = intransitive verb
  verb_t = transitive verb
  det  = determiner (the, a)
  adj = adjective

We can use this grammar to generate a string, by rolling dice to choose which production we use for NP (which has two possibilities) and to choose what specific word to substitute for each terminal. One possibility is below:

                              S
                             / \
                           NP   VP
                          /     / \
                 properNoun  verb_t  NP
                     |         |   /  |   \
                  Jesus     saves det adj* noun
                                   |  / \     \
                                   a adj adj   screenshot
                                     |    \
                                  wicked   cool

So, we have generated the phrase, "Jesus saves a wicked cool screenshot."
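The "rolling dice" idea can be sketched in Python (not part of the lecture): random.choice picks a production for NP and VP, and a specific word for each terminal. The word lists here are made-up samples standing in for a real lexicon.

```python
import random

# Sample lexicon: each terminal maps to a few words of its type (an assumption).
words = {
    "properNoun": ["Jesus", "Alice"],
    "det": ["the", "a"],
    "adj": ["wicked", "cool", "green"],
    "noun": ["screenshot", "idea"],
    "verb_i": ["sleeps"],
    "verb_t": ["saves"],
}

def NP():
    if random.random() < 0.5:                        # NP -> properNoun
        return [random.choice(words["properNoun"])]
    adjs = [random.choice(words["adj"])              # NP -> det adj* noun
            for _ in range(random.randint(0, 2))]
    return [random.choice(words["det"])] + adjs + [random.choice(words["noun"])]

def VP():
    if random.random() < 0.5:                        # VP -> verb_i
        return [random.choice(words["verb_i"])]
    return [random.choice(words["verb_t"])] + NP()   # VP -> verb_t NP

def S():
    return " ".join(NP() + VP())                     # S -> NP VP

print(S())   # e.g. "Jesus saves a wicked cool screenshot"
```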

Note that in this grammar we have a start symbol, but we did not have one in the previous grammar. This is because our English grammar has multiple nonterminals (S, NP, VP), so we have to say where a derivation begins; our integer arithmetic grammar had only one (E), so we could start anywhere. The start symbol could potentially have multiple productions, too.

Big Point

You use parse trees to interpret a language


In Lisp:

When we see an expression like this:

((lambda (x)
    (+ x x)) 4)

it means:

Evaluate (+ x x) with x = 4

The basic idea here is that we start with a table that binds operations to symbols (so "+" means add, "/" means divide, etc.). When we bind a variable, as with x = 4 here, we temporarily add to the table (so "x" means 4). (Lisp reality is slightly more complicated than this - the tables are actually nested - but that doesn't really matter for us, for now.)
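A minimal sketch of the table idea in Python (not the lecture's code, and the helper name apply_lambda is made up): a dictionary binds symbols to meanings, and applying a lambda temporarily extends that dictionary with the argument bindings.

```python
import operator

# The starting table: "+" means add, "/" means divide, etc.
global_table = {"+": operator.add, "/": operator.truediv}

def apply_lambda(params, body, args, table):
    # Extend the table with the new bindings (e.g. "x" means 4) for this call only.
    local = dict(table)
    local.update(zip(params, args))
    return body(local)

# ((lambda (x) (+ x x)) 4): the body looks up both "+" and "x" in the table.
result = apply_lambda(["x"], lambda t: t["+"](t["x"], t["x"]), [4], global_table)
print(result)   # 8
```

The global table is untouched afterward; the binding of x existed only for the duration of the call, which is the "temporarily add to the table" part.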

Lisp's Grammar

Here's what we need:

program -> S-expression
S-expression -> number
S-expression -> symbol
S-expression -> ()
S-expression -> application
S-expression -> conditional
S-expression -> quotation
S-expression -> define
S-expression -> lambda

And we might want to have S-expression -> set if we want mutation. This grammar will give all parsable Scheme programs. These will not necessarily run without error, though, just like you can say something grammatically correct that doesn't actually mean anything. (For example, Chomsky says that the sentence "Colorless green ideas sleep furiously." is correct but meaningless.)

Now, how do we deal with each part of this?

S-expression -> number    
number -> -?(0...9)(0...9)*(.(0...9)(0...9)*)?

And look! Number is equivalent to a regular expression! ("?" could be constructed from the basic regular expression operations (concatenate, or, 0 or more) if we wanted to make this look uglier.) Having regular expressions define terminals is called lexical analysis. To do lexical analysis, we use a lexical analyzer like Lex (such great names). Parsing the rest of the grammar is handled by a parser generator like Yacc ("Yet Another Compiler Compiler").
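A sketch of lexical analysis in Python (not from the lecture): the number terminal written as a real regular expression, used to carve an input string into tokens. Anything that isn't a number or a parenthesis is treated as a symbol, which is an assumption of this sketch.

```python
import re

NUMBER = r"-?[0-9]+(\.[0-9]+)?"
TOKEN = re.compile(r"\s*(?:(?P<num>%s)|(?P<punct>[()])|(?P<sym>[^\s()]+))" % NUMBER)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            break                  # trailing whitespace or unrecognized input
        tokens.append(m.group("num") or m.group("punct") or m.group("sym"))
        pos = m.end()
    return tokens

print(tokenize("(+ 12 -3.5)"))   # ['(', '+', '12', '-3.5', ')']
```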

S-expression -> application
application -> (S-expression  S-expression*)

S-expression -> conditional
conditional -> (if S-expression_test  S-expression_result  S-expression_else)

S-expression -> quotation
quotation -> (quote S-expression)
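The productions above can be handled by one recursive function over a token list, sketched here in Python (not the lecture's code): numbers and symbols are the base cases, and "(" starts a list of zero or more S-expressions, which covers (), application, conditional, and quotation alike.

```python
def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        children = []
        while tokens[0] != ")":
            children.append(parse(tokens))   # S-expression* inside the parens
        tokens.pop(0)                        # drop the ")"
        return children                      # () / application / conditional / ...
    try:
        return int(tok)                      # S-expression -> number
    except ValueError:
        return tok                           # S-expression -> symbol

print(parse(["(", "if", "(", ")", "1", "2", ")"]))   # ['if', [], 1, 2]
```

Whether a parsed list is an application, a conditional, or a quotation is decided afterward by looking at its first element (if, quote, etc.) - the parse tree shape is the same for all of them.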

Notes on Lisp grammar
