Chapter 1: Building Abstractions with Functions by Harold Abelson, Gerald Jay Sussman, Julie Sussman - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

ate value. We will return to the topic of dispatch functions several times throughout this chapter.

The point of exhibiting the functional representation of a pair is not that Python actually works this way (tuples

are implemented more directly, for efficiency reasons) but that it could work this way. The functional representa-

tion, although obscure, is a perfectly adequate way to represent pairs, since it fulfills the only conditions that pairs

need to fulfill. This example also demonstrates that the ability to manipulate functions as values automatically

provides us the ability to represent compound data.

2.3 Sequences

A sequence is an ordered collection of data values. Unlike a pair, which has exactly two elements, a sequence can

have an arbitrary (but finite) number of ordered elements.

The sequence is a powerful, fundamental abstraction in computer science. For example, if we have sequences,

we can list every student at Berkeley, or every university in the world, or every student in every university. We can

8

index-44_1.png

index-44_2.png

list every class ever taken, every assignment ever completed, every grade ever received. The sequence abstraction

enables the thousands of data-driven programs that impact our lives every day.

A sequence is not a particular abstract data type, but instead a collection of behaviors that different types share.

That is, there are many kinds of sequences, but they all share certain properties. In particular,

Length. A sequence has a finite length.

Element selection. A sequence has an element corresponding to any non-negative integer index less than its

length, starting at 0 for the first element.

Unlike an abstract data type, we have not stated how to construct a sequence. The sequence abstraction is a

collection of behaviors that does not fully specify a type (i.e., with constructors and selectors), but may be shared

among several types. Sequences provide a layer of abstraction that may hide the details of exactly which sequence

type is being manipulated by a particular program.

In this section, we develop a particular abstract data type that can implement the sequence abstraction. We

then introduce built-in Python types that also implement the same abstraction.

2.3.1 Nested Pairs

For rational numbers, we paired together two integer objects using a two-element tuple, then showed that we could

implement pairs just as well using functions. In that case, the elements of each pair we constructed were integers.

However, like expressions, tuples can nest. Either element of a pair can itself be a pair, a property that holds true

for either method of implementing a pair that we have seen: as a tuple or as a dispatch function.

A standard way to visualize a pair --- in this case, the pair (1,2) --- is called box-and-pointer notation.

Each value, compound or primitive, is depicted as a pointer to a box. The box for a primitive value contains a

representation of that value. For example, the box for a number contains a numeral. The box for a pair is actually

a double box: the left part contains (an arrow to) the first element of the pair and the right part contains the second.

This Python expression for a nested tuple,

>>> ((1, 2), (3, 4))

((1, 2), (3, 4))

would have the following structure.

Our ability to use tuples as the elements of other tuples provides a new means of combination in our program-

ming language. We call the ability for tuples to nest in this way a closure property of the tuple data type. In

general, a method for combining data values satisfies the closure property if the result of combination can itself

be combined using the same method. Closure is the key to power in any means of combination because it permits

us to create hierarchical structures --- structures made up of parts, which themselves are made up of parts, and so

on. We will explore a range of hierarchical structures in Chapter 3. For now, we consider a particularly important

structure.

9

index-45_1.png

2.3.2 Recursive Lists

We can use nested pairs to form lists of elements of arbitrary length, which will allow us to implement the sequence

abstraction. The figure below illustrates the structure of the recursive representation of a four-element list: 1, 2,

3, 4.

The list is represented by a chain of pairs. The first element of each pair is an element in the list, while the

second is a pair that represents the rest of the list. The second element of the final pair is None, which indicates

that the list has ended. We can construct this structure using a nested tuple literal:

>>> (1, (2, (3, (4, None))))

(1, (2, (3, (4, None))))

This nested structure corresponds to a very useful way of thinking about sequences in general, which we have

seen before in the execution rules of the Python interpreter. A non-empty sequence can be decomposed into:

• its first element, and

• the rest of the sequence.

The rest of a sequence is itself a (possibly empty) sequence. We call this view of sequences recursive, because

sequences contain other sequences as their second component.

Since our list representation is recursive, we will call it an rlist in our implementation, so as not to confuse

it with the built-in list type in Python that we will introduce later in this chapter. A recursive list can be

constructed from a first element and the rest of the list. The value None represents an empty recursive list.

>>> empty_rlist = None

>>> def make_rlist(first, rest):

"""Make a recursive list from its first element and the rest."""

return (first, rest)

>>> def first(s):

"""Return the first element of a recursive list s."""

return s[0]

>>> def rest(s):

"""Return the rest of the elements of a recursive list s."""

return s[1]

These two selectors, one constructor, and one constant together implement the recursive list abstract data

type. The single behavior condition for a recursive list is that, like a pair, its constructor and selectors are inverse

functions.

• If a recursive list s was constructed from element f and list r, then first(s) returns f, and rest(s)

returns r.

We can use the constructor and selectors to manipulate recursive lists.

>>> counts = make_rlist(1, make_rlist(2, make_rlist(3, make_rlist(4, empty_rlist))))

>>> first(counts)

1

>>> rest(counts)

(2, (3, (4, None)))

10

index-46_1.png

Recall that we were able to represent pairs using functions, and therefore we can represent recursive lists using

functions as well.

The recursive list can store a sequence of values in order, but it does not yet implement the sequence ab-

straction. Using the abstract data type we have defined, we can implement the two behaviors that characterize a

sequence: length and element selection.

>>> def len_rlist(s):

"""Return the length of recursive list s."""

length = 0

while s != empty_rlist:

s, length = rest(s), length + 1

return length

>>> def getitem_rlist(s, i):

"""Return the element at index i of recursive list s."""

while i > 0:

s, i = rest(s), i - 1

return first(s)

Now, we can manipulate a recursive list as a sequence:

>>> len_rlist(counts)

4

>>> getitem_rlist(counts, 1)

# The second item has index 1

2

Both of these implementations are iterative. They peel away each layer of nested pair until the end of the list

(in len_rlist) or the desired element (in getitem_rlist) is reached.

The series of environment diagrams below illustrate the iterative process by which getitem_rlist finds

the element 2 at index 1 in the recursive list. First, the function getitem_rlist is called, creating a local

frame.

The expression in the while header evaluates to true, which causes the assignment statement in the while

suite to be executed.

11

index-47_1.png

index-47_2.png

In this case, the local name s now refers to the sub-list that begins with the second element of the original list.

Evaluating the while header expression now yields a false value, and so Python evaluates the expression in the

return statement on the final line of getitem_rlist.

This final environment diagram shows the local frame for the call to first, which contains the name s

bound to that same sub-list. The first function selects the value 2 and returns it, completing the call to

getitem_rlist.

This example demonstrates a common pattern of computation with recursive lists, where each step in an

iteration operates on an increasingly shorter suffix of the original list. This incremental processing to find the

length and elements of a recursive list does take some time to compute. (In Chapter 3, we will learn to characterize

the computation time of iterative functions like these.) Python’s built-in sequence types are implemented in a

different way that does not have a large computational cost for computing the length of a sequence or retrieving

its elements.

The way in which we construct recursive lists is rather verbose. Fortunately, Python provides a variety of built-

in sequence types that provide both the versatility of the sequence abstraction, as well as convenient notation.

2.3.3 Tuples II

In fact, the tuple type that we introduced to form primitive pairs is itself a full sequence type. Tuples provide

substantially more functionality than the pair abstract data type that we implemented functionally.

12

Tuples can have arbitrary length, and they exhibit the two principal behaviors of the sequence abstraction:

length and element selection. Below, digits is a tuple with four elements.

>>> digits = (1, 8, 2, 8)

>>> len(digits)

4

>>> digits[3]

8

Additionally, tuples can be added together and multiplied by integers. For tuples, addition and multiplication

do not add or multiply elements, but instead combine and replicate the tuples themselves. That is, the add function

in the operator module (and the + operator) returns a new tuple that is the conjunction of the added arguments.

The mul function in operator (and the * operator) can take an integer k and a tuple and return a new tuple that

consists of k copies of the tuple argument.

>>> (2, 7) + digits * 2

(2, 7, 1, 8, 2, 8, 1, 8, 2, 8)

Mapping. A powerful method of transforming one tuple into another is by applying a function to each element

and collecting the results. This general form of computation is called mapping a function over a sequence, and

corresponds to the built-in function map. The result of map is an object that is not itself a sequence, but can be

converted into a sequence by calling tuple, the constructor function for tuples.

>>> alternates = (-1, 2, -3, 4, -5)

>>> tuple(map(abs, alternates))

(1, 2, 3, 4, 5)

The map function is important because it relies on the sequence abstraction: we do not need to be concerned

about the structure of the underlying tuple; only that we can access each one of its elements individually in order

to pass it as an argument to the mapped function (abs, in this case).

2.3.4 Sequence Iteration

Mapping is itself an instance of a general pattern of computation: iterating over all elements in a sequence. To

map a function over a sequence, we do not just select a particular element, but each element in turn. This pattern

is so common that Python has an additional control statement to process sequential data: the for statement.

Consider the problem of counting how many times a value appears in a sequence. We can implement a function

to compute this count using a while loop.

>>> def count(s, value):

"""Count the number of occurrences of value in sequence s."""

total, index = 0, 0

while index < len(s):

if s[index] == value:

total = total + 1

index = index + 1

return total

>>> count(digits, 8)

2

The Python for statement can simplify this function body by iterating over the element values directly, with-

out introducing the name index at all. For example (pun intended), we can write:

>>> def count(s, value):

"""Count the number of occurrences of value in sequence s."""

total = 0

for elem in s:

if elem == value:

total = total + 1

return total

13

>>> count(digits, 8)

2

A for statement consists of a single clause with the form:

for <name> in <expression>:

<suite>

A for statement is executed by the following procedure:

1. Evaluate the header <expression>, which must yield an iterable value.

2. For each element value in that sequence, in order:

A. Bind <name> to that value in the local environment.

B. Execute the <suite>.

Step 1 refers to an iterable value. Sequences are iterable, and their elements are considered in their sequential

order. Python does include other iterable types, but we will focus on sequences for now; the general definition of

the term “iterable” appears in the section on iterators in Chapter 4.

An important consequence of this evaluation procedure is that <name> will be bound to the last element of

the sequence after the for statement is executed. The for loop introduces yet another way in which the local

environment can be updated by a statement.

Sequence unpacking. A common pattern in programs is to have a sequence of elements that are themselves

sequences, but all of a fixed length. For statements may include multiple names in their header to “unpack” each

element sequence into its respective elements. For example, we may have a sequence of pairs (that is, two-element

tuples),

>>> pairs = ((1, 2), (2, 2), (2, 3), (4, 4))

and wish to find the number of pairs that have the same first and second element.

>>> same_count = 0

The following for statement with two names in its header will bind each name x and y to the first and second

elements in each pair, respectively.

>>> for x, y in pairs:

if x == y:

same_count = same_count + 1

>>> same_count

2

This pattern of binding multiple names to multiple values in a fixed-length sequence is called sequence un-

packing; it is the same pattern that we see in assignment statements that bind multiple names to multiple values.

Ranges. A range is another built-in type of sequence in Python, which represents a range of integers. Ranges

are created with the range function, which takes two integer arguments: the first number and one beyond the last

number in the desired range.

>>> range(1, 10)

# Includes 1, but not 10

range(1, 10)

Calling the tuple constructor on a range will create a tuple with the same elements as the range, so that the

elements can be easily inspected.

>>> tuple(range(5, 8))

(5, 6, 7)

If only one argument is given, it is interpreted as one beyond the last value for a range that starts at 0.

>>> tuple(range(4))

(0, 1, 2, 3)

14

Ranges commonly appear as the expression in a for header to specify the number of times that the suite

should be executed:

>>> total = 0

>>> for k in range(5, 8):

total = total + k

>>> total

18

A common convention is to use a single underscore character for the name in the for header if the name is

unused in the suite:

>>> for _ in range(3):

print(’Go Bears!’)

Go Bears!

Go Bears!

Go Bears!

Note that an underscore is just another name in the environment as far as the interpreter is concerned, but has

a conventional meaning among programmers that indicates the name will not appear in any expressions.

2.3.5 Sequence Abstraction

We have now introduced two types of native data types that implement the sequence abstraction: tuples and ranges.

Both satisfy the conditions with which we began this section: length and element selection. Python includes two

more behaviors of sequence types that extend the sequence abstraction.

Membership. A value can be tested for membership in a sequence. Python has two operators in and not

in that evaluate to True or False depending on whether an element appears in a sequence.

>>> digits

(1, 8, 2, 8)

>>> 2 in digits

True

>>> 1828 not in digits

True

All sequences also have methods called index and count, which return the index of (or count of) a value in

a sequence.

Slicing. Sequences contain smaller sequences within them. We observed this property when developing our

nested pairs implementation, which decomposed a sequence into its first element and the rest. A slice of a sequence

is any span of the original sequence, designated by a pair of integers. As with the range constructor, the first

integer indicates the starting index of the slice and the second indicates one beyond the ending index.

In Python, sequence slicing is expressed similarly to element selection, using square brackets. A colon sep-

arates the starting and ending indices. Any bound that is omitted is assumed to be an extreme value: 0 for the

starting index, and the length of the sequence for the ending index.

>>> digits[0:2]

(1, 8)

>>> digits[1:]

(8, 2, 8)

Enumerating these additional behaviors of the Python sequence abstraction gives us an opportunity to reflect

upon what constitutes a useful data abstraction in general. The richness of an abstraction (that is, how many

behaviors it includes) has consequences. For users of an abstraction, additional behaviors can be helpful. On the

other hand, satisfying the requirements of a rich abstraction with a new data type can be challenging. To ensure

that our implementation of recursive lists supported these additional behaviors would require some work. Another

negative consequence of rich abstractions is that they take longer for users to learn.

15

Sequences have a rich abstraction because they are so ubiquitous in computing that learning a few complex

behaviors is justified. In general, most user-defined abstractions should be kept as simple as possible.

Further reading. Slice notation admits a variety of special cases, such as negative starting values, ending

values, and step sizes. A complete description appears in the subsection called slicing a list in Dive Into Python 3.

In this chapter, we will only use the basic features described above.

2.3.6 Strings

Text values are perhaps more fundamental to computer science than even numbers. As a case in point, Python

programs are written and stored as text. The native data type for text in Python is called a string, and corresponds

to the constructor str.

There are many details of how strings are represented, expressed, and manipulated in Python. Strings are

another example of a rich abstraction, one which requires a substantial commitment on the part of the programmer

to master. This section serves as a condensed introduction to essential string behaviors.

String literals can express arbitrary text, surrounded by either single or double quotation marks.

>>> ’I am string!’

’I am string!’

>>> "I’ve got an apostrophe"

"I’ve got an apostrophe"

>>> ’??’

’??’

We have seen strings already in our code, as docstrings, in calls to print, and as error messages in assert

statements.

Strings satisfy the two basic conditions of a sequence that we introduced at the beginning of this section: they

have a length and they support element selection.

>>> city = ’Berkeley’

>>> len(city)

8

>>> city[3]

’k’

The elements of a string are themselves strings that have only a single character. A character is any single letter

of the alphabet, punctuation mark, or other symbol. Unlike many other programming languages, Python does not

have a separate character type; any text is a string, and strings that represent single characters have a length of 1.

Like tuples, strings can also be combined via addition and multiplication.

>>> ’Berkeley’ + ’, CA’

’Berkeley, CA’

>>> ’Shabu ’ * 2

’Shabu Shabu ’

Membership. The behavior of strings diverges from other sequence types in Python. The string abstraction

does not conform to the full sequence abstraction that we described for tuples and ranges. In particular, the mem-

bership operator in applies to strings, but has an entirely different behavior than when it is applied to sequences.

It matches substrings rather than elements.

>>> ’here’ in "Where’s Waldo?"

True

Likewise, the count and index methods on strings take substrings as arguments, rather than single-character

elements. The behavior of count is particularly nuanced; it counts the number of non-overlapping occurrences

of a substring in a string.

>>> ’Mississippi’.count(’i’)

4

>>> ’Mississippi’.count(’issi’)

1

16

Multiline Literals. Strings aren’t limited to a single line. Triple quotes delimit string literals that span multiple

lines. We have used this triple quoting extensively already for docstrings.

>>> """The Zen of Python

claims, Readability counts.

Read more: import this."""

’The Zen of Python\nclaims, "Readability counts."\nRead more: import this.’

In the printed result above, the \n (pronounced “backslash en”) is a single element that represents a new line.

Although it appears as two characters (backslash and “n”), it is considered a single character for the purposes of

length and element selection.

String Coercion. A string can be created from any object in Python by calling the str constructor function

with an object value as its argument. This feature of strings is useful for constructing descriptive strings from

objects of various types.

>>> str(2) + ’ is an element of ’ + str(digits)

’2 is an element of (1, 8, 2, 8)’

The mechanism by which a single str function can apply to any type of argument and return an appropriate

value is the subject of the later section on generic functions.

Methods. The behavior of strings in Python is extremely productive because of a rich set of methods for

returning string variants and searching for contents. A few of these methods are introduced below by example.

>>> ’1234’.isnumeric()

True

>>> ’rOBERT dE nIRO’.swapcase()

’Robert De Niro’

>>> ’snakeyes’.upper().endswith(’YES’)

True

Further reading. Encoding text in computers is a complex topic. In this chapter, we will abstract away the

details of how strings are represented. However, for many applications, the particular details of how strings are

encoded by computers is essential knowledge. Sections 4.1-4.3 of Dive Into Python 3 provides a description of character encodings and Unicode.

2.3.7 Conventional Interfaces

In working with compound data, we’ve stressed how data abstraction permits us to design programs without

becoming enmeshed in the details of data representations, and how abstraction preserves for us the flexibility to

experiment with alternative representations. In this section, we introduce another powerful design principle for

working with data structures --- the use of conventional interfaces.

A conventional interface is a data format that is shared across many modular components, which can be mixed

and matched to perform data processing. For example, if we have several functions that all take a sequence as

an argument and return a sequence as a value, then we can apply each to the output of the next in any order we

choose. In t