Programming Languages: Application and Interpretation by Shriram Krishnamurthi - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

fun-defs)]

[id (v) (error ’interp ”free identifier”)]

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

(interp (subst (fundef-body the-fun-def )

(fundef-arg-name the-fun-def )

(num (interp arg-expr fun-defs)))

fun-defs))]))

Figure 4.2: Implementation of Functions: Interpreter

Chapter 5

Deferring Substitution

Let’s examine the process of interpreting the following small program. Consider the following sequence of

evaluation steps:

{with {x 3}

{with {y 4}

{with {z 5}

{+ x {+ y z}}}}}

= {with {y 4}

{with {z 5}

{+ 3 {+ y z}}}}

= {with {z 5}

{+ 3 {+ 4 z}}}

= {+ 3 {+ 4 5}}

at which point it reduces to an arithmetic problem. To reduce it, though, the interpreter had to apply substi-

tution three times: once for each with. This is slow! How slow? Well, if the program has size n (measured

in abstract syntax tree nodes), then each substitution sweeps the rest of the program once, making the com-

plexity of this interpreter at least O(n2). That seems rather wasteful; surely we can do better.

How do we avoid this computational redundancy? We should create and use a repository of deferred

substitutions. Concretely, here’s the idea. Initially, we have no substitutions to perform, so the repository

is empty. Every time we encounter a substitution (in the form of a with or application), we augment the

repository with one more entry, recording the identifier’s name and the value (if eager) or expression (if

lazy) it should eventually be substituted with. We continue to evaluate without actually performing the

substitution.

This strategy breaks a key invariant we had established earlier, which is that any identifier the interpreter

encounters is of necessity free, for had it been bound, it would have been replaced by substitution. Because

we’re no longer using substitution, we will encounter bound identifiers during interpretation. How do we

handle them? We must substitute them with by consulting the repository.

Exercise 5.0.2 Can the complexity of substitution be worse than O(n2)?

33

34

CHAPTER 5. DEFERRING SUBSTITUTION

5.1

The Substitution Repository

Let’s provide a data definition for the repository:

(define-type DefrdSub

[mtSub]

[aSub (name symbol?) (value number?) (ds DefrdSub?)])

where DefrdSub stands for “deferred substitutions”. A DefrdSub has two forms: it’s either empty (mtSub1)

or non-empty (represented by an aSub structure). The latter contains a reference to the rest of the repository

in its third field.

The interpreter obviously needs to consume a repository in addition to the expression to interpret. There-

fore, its contract becomes

;; interp : F1WAE listof(fundef) DefrdSub → number

It will need a helper function that looks up the value of identifiers in the repository. Its code is:

;; lookup : symbol DefrdSub → F1WAE

(define (lookup name ds)

(type-case DefrdSub ds

[mtSub () (error ’lookup ”no binding for identifier”)]

[aSub (bound-name bound-value rest-ds)

(if (symbol=? bound-name name)

bound-value

(lookup name rest-ds))]))

With that introduction, we can now present the interpreter:

(define (interp expr fun-defs ds)

(type-case F1WAE expr

[num (n) n]

[add (l r) (+ (interp l fun-defs ds) (interp r fun-defs ds))]

[with (bound-id named-expr bound-body)

(interp bound-body

fun-defs

(aSub bound-id

(interp named-expr

fun-defs

ds)

ds))]

[id (v) (lookup v ds)]

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

1“Empty sub”—get it?

5.2. DEFERRING SUBSTITUTION CORRECTLY

35

(interp (fundef-body the-fun-def )

fun-defs

(aSub (fundef-arg-name the-fun-def )

(interp arg-expr fun-defs ds)

ds)))]))

Three clauses have changed: those for with, identifiers and applications. Applications must look up the

value of an identifier in the repository. The rule for with evaluates the body in a repository that extends

the current one (ds) with a binding for the with-bound identifier to its interpreted value. The rule for

an application similarly evaluates the body of the function with the repository extended with the formal

argument bound to the value of the actual argument.

To make sure this is correct, we recommend that you first study its behavior on programs that have no

identifiers—i.e., verify that the arithmetic rules do the right thing—and only then proceed to the rules that

involve identifiers.

5.2

Deferring Substitution Correctly

Consider the evaluation of the expression

{with {n 5}

{f 10}}

in the following list of function definitions:

(list (fundef ’f ’p (id ’n)))

That is, f consumes an argument p and returns the value bound to n. This corresponds to the Scheme

definition

(define (f p) n)

followed by the application

(local ([define n 5])

(f 10))

What result does Scheme produce?

Our interpreter produces the value 5. Is this the correct answer? Well, it’s certainly possible that this is

correct—after all, it’s what the interpreter returns, and this could well be the interpreter for some language.

But we do have a better way of answering this question.

Recall that the interpreter was using the repository to conduct substitution more efficiently. We hope that

that’s all it does—that is, it must not also change the meaning of a program! Our “reference implementation”

is the one that performs explicit substitution. If we want to know what the value of the program really “is”,

we need to return to that implementation.

What does the substitution-based interpreter return for this program? It says the program has an unbound

identifier (specifically, n). So we have to regard our interpreter with deferred substitutions as buggy.

36

CHAPTER 5. DEFERRING SUBSTITUTION

While this new interpreter is clearly buggy relative to substitution, which it was supposed to represent,

let’s think for a moment about what we, as the human programmer, would want this program to evaluate to.

It produces the value 5 because the identifier n gets the value it was bound to by the with expression, that

is, from the scope in which the function f is used. Is this a reasonable way for the language to behave? A

priori, is one interpretation better than the other? Before we tackle that, let’s introduce some terminology to

make it easier to refer to these two behaviors.

Definition 10 (Static Scope) In a language with static scope, the scope of an identifier’s binding is a syn-

tactically delimited region.

A typical region would be the body of a function or other binding construct. In contrast:

Definition 11 (Dynamic Scope) In a language with dynamic scope, the scope of an identifier’s binding is

the entire remainder of the execution during which that binding is in effect.

That is, in a language with dynamic scope, if a function g binds identifier n and then invokes f, then f can

refer to n—and so can every other function invoked by g until it completes its execution—even though f

has no locally visible binding for n.

Armed with this terminology, we claim that dynamic scope is entirely unreasonable. The problem is

that we simply cannot determine what the value of a program will be without knowing everything about its

execution history. If the function f were invoked by some other sequence of functions that did not bind a

value for n, then that particular application of f would result in an error, even though a previous application

of f in the very same program’s execution completed successfully! In other words, simply by looking at the

source text of f, it would be impossible to determine one of the most rudimentary properties of a program:

whether or not a given identifier was bound. You can only imagine the mayhem this would cause in a large

software system, especially with multiple developers and complex flows of control. We will therefore regard

dynamic scope as an error and reject its use in the remainder of this text.

5.3

Fixing the Interpreter

Let’s return to our interpreter. Our choice of static over dynamic scope has the benefit of confirming that

the substituting interpreter did the right thing, so all we need do is make the new interpeter be a correct

reimplementation of it. We only need to focus our attention on one rule, that for function application. This

currently reads:

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

(interp (fundef-body the-fun-def )

fun-defs

(aSub (fundef-arg-name the-fun-def )

(interp arg-expr fun-defs ds)

ds)))]

5.3. FIXING THE INTERPRETER

37

When the interpreter evaluates the body of the function definition, what deferred substitutions does it rec-

ognize? It recognizes all those already are in ds, with one more for the function’s formal parameter to

be replaced with the value of the actual parameter. But how many substitutions should be in effect in the

function’s body? In our substitution-based interpreter, we implemented application as follows:

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

(interp (subst (fundef-body the-fun-def )

(fundef-arg-name the-fun-def )

(num (interp arg-expr fun-defs)))

fun-defs))]

This performs only one substitution on the function’s body: the formal parameter for its value. The code

demonstrates that none of the substitutions applied to the calling function are in effect in the body of the

called function. Therefore, at the point of invoking a function, our new interpeter must “forget” about the

current substitutions. Put differently, at the beginning of every function’s body, there is only one bound

identifier—the function’s formal parameter—independent of the invoking context.

How do we fix our implementation? We clearly need to create a substitution for the formal parameter

(which we obtain using the expression (fundef-arg-name the-fun-def )). But the remaining substitutions must

be empty, so as to not pick up the bindings of the calling context. Thus,

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

(interp (fundef-body the-fun-def )

fun-defs

(aSub (fundef-arg-name the-fun-def )

(interp arg-expr fun-defs ds)

(mtSub) )))]

That is, we use the empty repository to initiate evaluation of a function’s body, extending it immediately with

the formal parameter but no more. The difference between using ds and (mtSub) in the position of the box

succintly captures the implementation distinction between dynamic and static scope, respectively—though

the consequences of that distinction are far more profound that this small code change might suggest.

Exercise 5.3.1 How come we never seem to “undo” additions to the repository? Doesn’t this run the risk

that one substitution might override another in a way that destroys static scoping?

Exercise 5.3.2 Why is the last ds in the interpretation of with also not replaced with (mtSub)? What

would happen if we were to effect this replacement? Write a program that illustrates the difference, and

argue whether the replacement is sound or not.

Exercise 5.3.3 Our implementation of lookup can take time linear in the size of the program to find some

identifiers. Therefore, it’s not clear we have really solved the time-complexity problem that motivated the

use of a substitution repository. We could address this by using a better data structure and algorithm for

lookup: a hash table, say. What changes do we need to make if we use a hash table?

Hint: This is tied closely to Exercise 5.3.1!

38

CHAPTER 5. DEFERRING SUBSTITUTION

(define-type F1WAE

[num (n number?)]

[add (lhs F1WAE?) (rhs F1WAE?)]

[with (name symbol?) (named-expr F1WAE?) (body F1WAE?)]

[id (name symbol?)]

[app (fun-name symbol?) (arg F1WAE?)])

(define-type FunDef

[fundef (fun-name symbol?)

(arg-name symbol?)

(body F1WAE?)])

(define-type DefrdSub

[mtSub]

[aSub (name symbol?) (value number?) (ds DefrdSub?)])

;; lookup-fundef : symbol listof(FunDef) −→ FunDef

(define (lookup-fundef fun-name fundefs)

(cond

[(empty? fundefs) (error fun-name ”function not found”)]

[else (if (symbol=? fun-name (fundef-fun-name (first fundefs)))

(first fundefs)

(lookup-fundef fun-name (rest fundefs)))]))

;; lookup : symbol DefrdSub → F1WAE

(define (lookup name ds)

(type-case DefrdSub ds

[mtSub () (error ’lookup ”no binding for identifier”)]

[aSub (bound-name bound-value rest-ds)

(if (symbol=? bound-name name)

bound-value

(lookup name rest-ds))]))

Figure 5.1: Functions with Deferred Substitutions: Support Code

5.3. FIXING THE INTERPRETER

39

;; interp : F1WAE listof(fundef) DefrdSub → number

(define (interp expr fun-defs ds)

(type-case F1WAE expr

[num (n) n]

[add (l r) (+ (interp l fun-defs ds) (interp r fun-defs ds))]

[with (bound-id named-expr bound-body)

(interp bound-body

fun-defs

(aSub bound-id

(interp named-expr

fun-defs

ds)

ds))]

[id (v) (lookup v ds)]

[app (fun-name arg-expr)

(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])

(interp (fundef-body the-fun-def )

fun-defs

(aSub (fundef-arg-name the-fun-def )

(interp arg-expr fun-defs ds)

(mtSub))))]))

Figure 5.2: Functions with Deferred Substitutions: Interpreter

40

CHAPTER 5. DEFERRING SUBSTITUTION

Chapter 6

First-Class Functions

We began Section 4 by observing the similarity between a with expression and a function definition applied

immediately to a value. Specifically, we observed that

{with {x 5} {+ x 3}}

is essentially the same as

f (x) = x + 3; f (5)

Actually, that’s not quite right: in the math equation above, we give the function a name, f , whereas there is

no identifier named f anywhere in the WAE program. We can, however, rewrite the mathematical formula-

tion as

f = λ (x).x + 3; f (5)

which we can then rewrite as

(λ (x)x + 3)(5)

to get rid of the unnecessary name ( f ).

Notice, however, that our langugae F1WAE does not permit anonymous functions of the style we’ve

used above. Because such functions are useful in their own right, we now extend our study of functions.

6.1

A Taxonomy of Functions

The translation of with into mathematical notation exploits two features of functions: the ability to create

anonymous functions, and the ability to define functions anywhere in the program (in this case, in the func-

tion position of an application). Not every programming language offers one or both of these capabilities.

There is, therefore, a taxonomy that governs these different features, which we can use when discussing

what kind of functions a language provides:

first-order Functions are not values in the language. They can only be defined in a designated portion of

the program, where they must be given names for use in the remainder of the program. The functions

in F1WAE are of this nature, which explains the 1 in the name of the language.

41

42

CHAPTER 6. FIRST-CLASS FUNCTIONS

higher-order Functions can return other functions as values.

first-class Functions are values with all the rights of other values. In particular, they can be supplied as the

value of arguments to functions, returned by functions as answers, and stored in data structures.

We would like to extend F1WAE to have the full power of functions, to reflect the capability of Scheme. In

fact, it will be easier to return to WAE and extend it with first-class functions.

6.2

Enriching the Language with Functions

To add functions to WAE, we must define their concrete and abstract syntax. First let’s examine some

concrete programs:

{{fun {x} {+ x 4}}

5}

This program defines a function that adds 4 to its argument and immediately applies this function to 5,

resulting in the value 9. This one

{with {double {fun {x} {+ x x}}}

{+ {double 10}

{double 5}}}

evaluates to 30. The program defines a function, binds it to double, then uses that name twice in slightly

different contexts (i.e., it instantiates the formal parameter with different actual parameters).

From these examples, it should be clear that we must introduce two new kinds of expressions: function

applications (as before), as well as (anonymous) function definitions. Here’s the revised BNF corresponding

to these examples:

<FWAE> ::= <num>

| {+ <FWAE> <FWAE>}

| {with {<id> <FWAE>} <FWAE>}

| <id>

| {fun {<id>} <FWAE>}

| {<FWAE> <FWAE>}

Note that F1WAE did not have function definitions as part of the expression language, since the definitions

were assumed to reside outside the expression being evaluated. In this language, functions can be defined

anywhere, including in the function position of an application (the last BNF production): instead of just

the name of a function, programmers can write an arbitrary expression that must be evaluated to obtain the

function to apply. The corresponding abstract syntax is:

(define-type FWAE

[num (n number?)]

[add (lhs FWAE?) (rhs FWAE?)]

[with (name symbol?) (named-expr FWAE?) (body FWAE?)]

6.2. ENRICHING THE LANGUAGE WITH FUNCTIONS

43

[id (name symbol?)]

[fun (param symbol?) (body FWAE?)]

[app (fun-expr FWAE?) (arg-expr FWAE?)])

To define our interpreter, we must think a little about what kinds of values it consumes and produces.

Naturally, the interpreter consumes values of type FWAE. What does it produce? Clearly, a program that

meets the WAE description must yield numbers. As we have seen above, some program that use functions

and applications also evaluate to numbers. How about a program that consists solely of a function? That is,

what is the value of the program

{fun {x} x}

? It clearly doesn’t represent a number. It may be a function that, when applied to a numeric argument,

produces a number, but it’s not itself a number (if you think differently, you need to indicate which number

it will be: 0? 1? 1729?). We instead realize from this that functions are also values that may be the result of

a computation.

We could design an elaborate representation for function values, but for now, we’ll remain modest. We’ll

let the function evaluate to its abstract syntax representation (i.e., a fun structure). (We will soon get more

sophisticated than this.) For consistency, we’ll also let numbers evaluate to num structures. Thus, the result

of evaluating the program above would be the value

(fun ’x (id ’x))

Now we’re ready to write the interpreter. We must pick a type for the value that interp returns. Since

we’ve decided to represent function and number answers using the abstract syntax, it makes sense to use

FWAE, with the caveat that only two kinds of FWAE terms can appear in the output: numbers and functions.

Our first interpreter will use explicit substitution, to offer a direct comparison with the corresponding WAE

and F1WAE interpreters.

;; interp : FWAE → FWAE

;; evaluates FWAE expressions by reducing them to their corresponding values

;; return values are either num or fun

(define (interp expr)

(type-case FWAE expr

[num (n) expr]

[add (l r) (add-numbers (interp l) (interp r))]

[with (bound-id named-expr bound-body)

(interp (subst bound-body

bound-id

(interp named-expr)))]

[id (v) (error ’interp ”free identifier”)]

44

CHAPTER 6. FIRST-CLASS FUNCTIONS

[fun (bound-id bound-body)

expr]

[app (fun-expr arg-expr)

(local ([define fun-val (interp fun-expr)])

(interp (subst (fun-body fun-val)

(fun-param fun-val)

(interp arg-expr))))]

))

(We’ve made a small change to the rule for add: it uses a helper named add-numbers because interp now

returns an FWAE. As an exercise, define this helper function for yourself.)

The rule for a function says, simply, to return the function itself. (Notice the similarity to the rule for

numbers, the other kind of constant in our language!) That leaves only the rule for applications to study.

This rule first evaluates the function position of an application. This is because that position may itself

contain a complex expression that needs to be reduced to an actual function. For instance, in the expression

{{{fun {x} x}

{fun {x} {+ x 5}}}

3}

the outer function position consists of the application of the identity function to a function that adds five to

its argument.

When evaluated, the function position had better reduce to a function value, not a number (or anything

else). For now, we implicitly assume that the programs fed to the interpreter have no errors. (Chapter X will study how we can detect erroneous programs.) Given a function, we need to evaluate its body after having

substituted the formal argument with its actual value. That’s what the rest of the program does: evaluate

the expression that will become the bound value, bind it using substitution, and then interpet the resulting

expression. The last few lines are very similar to the code for with.

To understand this interpreter better, consider what it produces in response to evaluating the following

term:

{with {x 3}

{fun {y}

{+ x y}}}

DrScheme prints the following:

(fun ’y (add (num 3) (id ’y)))

Notice that the x inside the function body has been replaced by 3 as a result of substitution, so the function

has no references to x left in it.

Exercise 6.2.1 What induced the small change in the rule for add? Explain, with an example, what would

go wrong if we did not make this change.

Exercise 6.2.2 Did you notice the small change in the interpretation of with?

Exercise 6.2.3 What goes wrong if the interpreter fails to evaluate the function position (by invoking the

interpreter on it)? Write a program and present the expected and actual results.

6.3. MAKING WITH REDUNDANT

45

6.3

Making with Redundant

Now that we have functions and function invocation as two distinct primitives, we can combine them to

recover the behavior of with as a special case. Every time we encounter an expression of the form

{with {var named} body}

we can replace it with

{{fun {var} body}

named}

and obtain the same effect. The result of this translation does not use with, so it can be evaluated by a more

primitive interpreter: one for AE enriched with functions.

Exercise 6.3.1 Implement a pre-processor that performs this translation.

We will assume the existence of such a pre-processor, and use the language FAE as our basis for subse-

quent exploration. Section 36 discusses such pre-processors in great detail.

6.4

Implementing Functions using Deferred Substitutions

As Section 5 described, our implementation will be more sprightly if we deferred substitutions instead of

performing them greedily. Let’s study how to adapt our interpreter to use this more efficient strategy.

First, we must provide a definition of the substitution repository. The repository associates identifiers