Notes for Lecture 10 - Tuesday October 12th, 2004
- Scribe = Kathy King
- Editor =
Administrivia
- Next exam will be November 12 (probably posted before this date and due later)
- Next problem set is to be posted on Friday
Review of what we've done
- Talked about Data Structures and Abstract Data Types
- Recall that an abstract data type is removed from any language or implementation. It is simply the contract.
- A data structure is the actual implementation of an ADT in some language.
- Looked at a particular language (namely Lisp)
- Studied automata
- Finite State Machines (which are models of computation) and
- Regular expressions (which are models of language). Even though computer scientists talk about these as if they're identical, it's important to recall that they're distinct.
- Trees
- Expressions can be represented as trees
- Here's an example, for the expression 4 + 3 x 2 (where x means "multiply"):
      +
     / \
    4   x
       / \
      3   2
- To interpret this tree, we interpret the left child, then the right child, and finally combine these two with the tree root's operation. The base case is "if we have a number, return it."
- A tree itself is an abstract data type (we can use them in any language, without knowing the implementation).
All languages are evaluated using trees! That's why we just reviewed all of this.
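The evaluation rule above (interpret the left child, interpret the right child, combine with the root's operation) can be sketched in Python. The tuple representation of trees here is just an illustration, not anything specified in lecture:

```python
# A tree is either a number (the base case) or a tuple
# (operator, left_subtree, right_subtree).

def evaluate(tree):
    if isinstance(tree, (int, float)):   # base case: "if we have a number, return it"
        return tree
    op, left, right = tree
    a = evaluate(left)                   # interpret the left child...
    b = evaluate(right)                  # ...then the right child...
    return {"+": a + b, "x": a * b}[op]  # ...then combine at the root

# The tree for 4 + 3 x 2 from the example above:
print(evaluate(("+", 4, ("x", 3, 2))))  # 10
```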
Grammars
Every expression language (e.g., English, Java, Lisp) needs a grammar. When you define a programming language, you must create a grammar for it.
What's in a grammar?
Or rather, a context-free grammar, which is what computer scientists usually deal with.
- An alphabet, Sigma
- The "letters" of this alphabet are the things that appear in the string we're trying to parse.
These symbols are called terminals.
- A set of symbols, Gamma
- These symbols are internal to the grammar.
These are called nonterminals.
- A set of rules (also called productions) that explain how the nonterminals and terminals may be combined
Here's an example of a grammar that corresponds to arithmetic expressions using integers (like the one in the tree above):
Sigma = {1,2,3,4,5,6,7,8,9,0,+,x,-,/}
Gamma = {E}
Rules:
E -> E + E
E -> E x E
E -> E - E
E -> E / E
E -> (0 ... 9)(0 ... 9)*    (recall that "*" means "0 or more")
Generating Strings and Parsing
Now, using this grammar, we can generate any arithmetic expression involving integers and the basic operations. We could generate the one we used in the example above like this:
         E
       / | \
      E  +  E
      |   / | \
      4  E  x  E
         |     |
         3     2
Or, more commonly, we might be given a string that we need to evaluate. Using our grammar, we can figure out how to treat each part of the string. When we try this with our example, though, we discover a problem with our grammar:
        x                  +
       / \                / \
      +   2              4   x
     / \                    / \
    4   3                  3   2

  (4 + 3) x 2          4 + (3 x 2)
Both of these trees are valid according to our grammar and the rules of evaluating trees, but they give different answers. For the tree on the left, 4 + 3 x 2 = 7 x 2 = 14. For the tree on the right, 4 + 3 x 2 = 4 + 6 = 10. This is not good. We call a grammar that allows situations like this one ambiguous. In this case, we could make our grammar unambiguous by adding rules about operator precedence, or by demanding that parentheses be used.
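One standard precedence fix (the layered-grammar trick from compiler textbooks, not something worked out in lecture) splits E into levels so that x binds tighter than +. A small Python sketch of a recursive-descent evaluator for that unambiguous grammar:

```python
# Evaluator for the layered, unambiguous grammar
#   E -> T (+ T)*      (addition binds loosest)
#   T -> F (x F)*      (multiplication binds tightest)
#   F -> number

def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                 # F -> number
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def term():                   # T -> F (x F)*
        nonlocal pos
        value = factor()
        while peek() == "x":
            pos += 1
            value *= factor()
        return value

    def expr():                   # E -> T (+ T)*
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    return expr()

print(evaluate(["4", "+", "3", "x", "2"]))  # 10: only one parse is possible now
```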
Another Example
Here's a simple grammar for English:
S  -> NP VP
NP -> det noun
VP -> verb_i | verb_t NP
NP -> properNoun | det adj* noun

where our nonterminals are
S  = start symbol (we must begin with a start symbol to get a complete phrase)
NP = noun phrase
VP = verb phrase

and our terminals are represented by the words that begin with lower case (each could be replaced by a specific word of its type):
verb_i = intransitive verb
verb_t = transitive verb
det    = determiner (the, a)
adj    = adjective
We can use this grammar to generate a string, by rolling dice to choose which production we use for NP (which has two possibilities) and to choose what specific word to substitute for each terminal. One possibility is below:
               S
              / \
            NP   VP
            |    / \
    properNoun v_t  NP
            |   |   /  |   \
        Jesus saves det adj* noun
                     |  /  \    \
                     a adj adj   screenshot
                       |    |
                    wicked cool
So, we have generated the phrase, "Jesus saves a wicked cool screenshot."
Note that in this grammar we have a start symbol, but we did not have one in the previous grammar. Our English grammar has multiple nonterminals (S, NP, VP), so we must say which one a derivation begins with; our integer arithmetic grammar had only one (E), so there was no question of where to start. Note that the start symbol could have multiple productions, too.
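The dice-rolling generation described above can be sketched in Python. The word lists are made-up examples, and "adj*" is expanded by a coin flip rather than a true Kleene star, to keep the sketch short:

```python
import random

# Grammar rules mirror the notes; each nonterminal maps to its productions.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["properNoun"], ["det", "adj*", "noun"]],
    "VP": [["verb_i"], ["verb_t", "NP"]],
}
# Terminals: each is replaced by a specific word of its type.
words = {
    "properNoun": ["Jesus", "Kathy"],
    "det":    ["the", "a"],
    "adj*":   ["", "wicked cool"],   # "" stands for zero adjectives
    "noun":   ["screenshot", "grammar"],
    "verb_i": ["sleeps"],
    "verb_t": ["saves"],
}

def generate(symbol):
    if symbol in grammar:                  # nonterminal: roll dice to pick a production
        production = random.choice(grammar[symbol])
        return " ".join(filter(None, (generate(s) for s in production)))
    return random.choice(words[symbol])    # terminal: roll dice to pick a word

print(generate("S"))  # e.g. "Jesus saves a wicked cool screenshot"
```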
Big Point
You use parse trees to interpret a language
Lisp
In Lisp:
When we see an expression like this:
((lambda (x) (+ x x)) 4)
it means: evaluate (+ x x) with x = 4.
The basic idea here is that we start with a table that binds operations to symbols (so "+" means add, "/" means divide, etc.). When we're doing an assignment like x=4, we temporarily add to the table (so "x" means 4). (Lisp reality is slightly more complicated than this - there are actually nested lists, but that doesn't really matter for us, for now.)
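A minimal Python sketch of the table idea, with dictionaries standing in for the real environment structure (the nested-list reality mentioned above is ignored here):

```python
import operator

# The starting table binds operations to symbols.
global_env = {"+": operator.add, "/": operator.truediv}

def extend(env, name, value):
    # "Temporarily add to the table": a new table that shadows the old one,
    # leaving the original untouched.
    new_env = dict(env)
    new_env[name] = value
    return new_env

# Binding x = 4, then evaluating (+ x x) by hand against the table:
inner = extend(global_env, "x", 4)
print(inner["+"](inner["x"], inner["x"]))  # 8
```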
Lisp's Grammar
Here's what we need:
program -> S-expression
S-expression -> number
S-expression -> symbol
S-expression -> ()
S-expression -> application
S-expression -> conditional
S-expression -> quotation
S-expression -> define
S-expression -> lambda
And we might want to have S-expression -> set! if we want mutation. This grammar will generate all parsable Scheme programs. These will not necessarily run without error, though, just as you can say something grammatically correct that doesn't actually mean anything. (For example, Chomsky's sentence "Colorless green ideas sleep furiously." is grammatical but meaningless.)
Now, how do we deal with each part of this?
- Number
- Want to return the number itself
S-expression -> number
number -> -?(0...9)(0...9)*(.(0...9)(0...9)*)?
And look! Number is equivalent to a regular expression! ("?" could be constructed from the basic regular expression operations (concatenation, or, and "0 or more") if we wanted to make this look uglier.) Having regular expressions define the terminals is called lexical analysis. To do lexical analysis, we use a lexical analyzer like Lex (such great names). The parsing itself is handled by a parser generator like Yacc ("Yet Another Compiler Compiler").
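As a quick illustration (using Python's re module rather than Lex), the number rule really is an ordinary regular expression. The fractional part is written as optional here so plain integers match too:

```python
import re

# The "number" terminal from the grammar, as a real regular expression:
# optional minus sign, one or more digits, optional ".digits" part.
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")

for text in ["42", "-3.14", "x"]:
    kind = "number" if NUMBER.fullmatch(text) else "not a number"
    print(text, "->", kind)
```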
- Symbol
- Look up the symbol in the table of symbols (discussed above) for the environment. Then return whatever the symbol is bound to.
- We could write a regular expression for symbol.
- Empty Set
- Return the empty list
- Application
- First, evaluate all of the sub-S-expressions. Then apply the value of the first one to all of the rest.
S-expression -> application
application -> (S-expression S-expression*)
- Conditional
- First, evaluate S-expression_test. Then, depending on the result, evaluate S-expression_result or S-expression_else.
- Note that in Lisp, you must define either if or cond; you can build the other out of whichever you choose.
S-expression -> conditional
conditional -> (if S-expression_test S-expression_result S-expression_else)
- Quotation
- Return the S-expression without evaluating
S-expression -> quotation
quotation -> (quote S-expression)
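The case analysis above (number, symbol, empty list, quotation, conditional, application) can be sketched as a tiny eval in Python, with nested lists standing in for real S-expressions. Note that the conditional check comes before the application check, which is exactly the ordering the ambiguity remark in the next section calls for:

```python
import operator

# Table binding operations to symbols, as described earlier.
env = {"+": operator.add, "x": operator.mul}

def s_eval(expr, env):
    if isinstance(expr, (int, float)):   # number: return the number itself
        return expr
    if isinstance(expr, str):            # symbol: look it up in the table
        return env[expr]
    if expr == []:                       # (): return the empty list
        return []
    if expr[0] == "quote":               # quotation: return without evaluating
        return expr[1]
    if expr[0] == "if":                  # conditional: test first, then one branch
        _, test, result, alt = expr
        return s_eval(result, env) if s_eval(test, env) else s_eval(alt, env)
    # application: evaluate all sub-S-expressions,
    # then apply the value of the first one to all of the rest
    proc, *args = [s_eval(sub, env) for sub in expr]
    return proc(*args)

print(s_eval(["+", 4, ["x", 3, 2]], env))  # 10
```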
Notes on Lisp grammar
- We need to be careful, because right now the grammar we've written is ambiguous; conditional and application could look the same. To fix this, we need a rule to say "if it's a conditional, don't check whether it's an application."
Each grammar production is converted directly into an evaluation rule; by associating productions with rules, you can build an interpreter.
- We still need to create rules for define, lambda, and possibly set!. (NB: define is important only for recursive procedures and requires set! or some other additional machinery.)
This interpreter is commonly called eval. We also need to define apply, which does important work.
- Once we have all of these things, we have an interpreter that can handle all of Lisp, because everything else can be built out of these.
- So now we better understand why Lisp is cool:
- You can use Lisp to write a grammar for Lisp.
- The grammars of Java, Python, and most other languages run to several pages of rules. Lisp is so simple and elegant!