CS 3110: Data Structures and Functional Programming

Week 1. Overview

What is a functional language?

A functional language

defines computations as mathematical functions
avoids mutable state
specifies what to compute
- Variables never change value
- Functions never have side effects
is a powerful way to build correct programs

Mutability

A major reason why we use functional languages is the fantasy of mutability. This is the idea that a machine performs operations on a state sequentially.

The reality is that machines are good at complicated manipulation of a state while humans are not good at understanding it.
Mutability breaks referential transparency: the ability to replace an expression with its value without affecting a computation. For example, both function $f(x) = x^2$ and $g(x) = x^2$ are the same function and can be swapped. However, to Java these are two different functions.

Imperative programming

Imperative programming is what we learned in Java. Commands specify how to compute by destructively changing a state. Some examples are

x = x+1; 
//This doesnt make sense in math speak, 
//so why do programmers use it??
a[i] = 42;
p.next = p.next.next;

In imperative programming, functions and methods not only return a value, but also have side effects.

Why study functional programming

Functional languages teach you that programming trasncends programming in a language
Functional languages predict the future
1. Some features such as garbage collection, type inference and generics (Object) were implemented in functional languages way before they were in Java.
Functional languages are sometimes used in industry
Functional languages are elegant

OCaml

Ocaml stands for Objective Categorical Abstract Machine Language (not a helpful acronym). There are object oriented features in OCaml, but we will not use them.

13. Learning a Programming Language

There are 5 aspects of a programming language that we will keep in mind when learning a new language.

Syntax: How do you write language constructs?
Semantics: What do programs mean?
1. Semantics is a meta tool. It will help you learn languages
Idioms: What are typical patterns for using language features to express your computation?
1. Idioms will make you a better programmer in a language
Libraries: What facilities does the language provide as “standard”?
Tools: What do language implementations provide to make your job easier? (E.g., top-level, debugger, GUI editor, …)

Syntax is always boring! Focus on semantics and idioms.

14. Expressions

Expressions have two pieces: syntax and semantics.

Semantics involves type-checking rules (static semantics). These produce a type or fail with an error message. It is static because although the compiler is run, the program is not run yet.

Semantics also involves evaluation rules (dynamic semantics). These produce a value, or exception or infinite loop.

Definition (value). A value is an expression that does not need any further evaluation.

Example (int). In utop, let us type a couple statements.

# 3110;;
- : int = 3110
# false;;
- : bool = false
# 3110 > 2110;;
- : bool = true
# "big";;
- : string = "big"
# "big" ^ "red";;
- : string = "bigred"
# 9 + 10;;
- : int = 19

This means that 3110;; evaluates to the int 3110.

Proposition (Ocaml does not have overloading). Note that no operators in OCaml are overloaded. Floats need specific operators different from the int operators.

# 2.0 * 3.14;;
Line 1, characters 0-3:
Error: This expression has type float but an expression was expected of type
         int
# 2.0 *. 3.14;;
- : float = 6.28

Type inference and annotation

Note that the OCaml compiler infers types. It checks types at compile time.

The compiler will fail with a type error if it can’t infer a type.
The hard part of language design is guaranteeing the compiler can infer types when a program is written correctly.

In OCaml, you can manually annotate types anywhere.

Replace $\tt e$ with $\tt (e:t)$.
This is useful for diagnosing type errors.

# (3110 : int);;
- : int = 3110
# (3110 : bool);;
(* This gives an error*)

15. Let Definitions

Let definitions are like variable definitions in other languages, except the values cannot change.

# let x = 42;;
val x : int = 42
# x;;
- : int = 42

Definitions

A definition is a way of giving a name to a value.

Note that definitions are not expressions, or vice-versa! By definitions syntactically contain expressions.

/assets/cs3110/Untitled.png

$\tt let$

The syntax for the let statement is as follows:

\[\texttt{let x = e}\]

where $\tt x$ is an identifier.

The evaluation of the let statement occurs in this order:

Evaluate $\tt e$ to a value $\tt v$.
Bind $\tt v$ to $\tt x$; henceforth, $\tt x$ will evaluate to $\tt v$.
1. Under the hood, $\tt x$ is a pointer to the memory location that contains $\tt v$.

Note that $\tt x$ itself is not an expression!

(let x = 1) + 2;;
Error: Syntax error: operator expected.

16. If Expressions

If expressions follow the syntax

\[\texttt{if e1 then e2 else e3}.\]

If $\tt e1$ evaluates to $\tt true$, and if $\tt e2$ evaluates to $\tt v$, then $\texttt{if e1 then e2 else e3}.$ evaluates to $\tt v$.
If $\tt e1$ evaluates to $\tt false$, and if $\tt e3$ evaluates to $\tt v$, then $\texttt{if e1 then e2 else e3}.$ evaluates to $\tt v$

Type checking:

If $\tt e1$ has type $\tt bool$ and $\tt e2$ has type $\tt t$, and $\tt e3$ has type $\tt t$, then $\texttt{if e1 then e2 else e3}.$ has type $\tt t$.

For example,

if "batman" > "superman" then "yay" else "boo";;

This is very similar to the ternary operator.

Note that OCaml will be strict about types. The if statement must have type bool. The two branches of the if statement must be of the same type.

The else branch is also required.

17. Let Expressions

Let expressions are like let definitions, but they can be nested together. They have the form

\[\texttt{let x = e1 in e2}.\]

Evaluation rule (dynamic semantics):

Evaluate $\tt e1$ to a value $\tt v1$.
Substitute $\tt v1$ for $\tt x$ in $\tt e2$, yielding a new expression $\tt e2’$.
Evaluate $\tt e2’$ to a value $\tt v2$.
The result of $\texttt{let x = e1 in e2}$ is the value $\tt v2$.

Type checking rule (static semantics):

If $\tt e1:t1$ and consequently $\tt x: t1$, and $\tt e2:t2$, then $\texttt{(let x = e1 in e2): t2}$.

Note that this does not bind a value to any variable! This is because of the variable’s scope “in” an expression.

# let a = 0 in a;;
- : int 0 
# let b = 1 in 2 * b;;
- : int 2
# a;;
Error: Unbound value a
# let c = 5 in (let d = 3 in c + d)
- : int 8

In addition, the following statement is tricky to understand, but valid:

let e = 5 in (let e = 6 in e);;
Line 1, characters 4-5:
Warning 26: unused variable e.
- : int = 6

This code compiles because of the let expression’s type-checking rules and evaluation rules.

18. Variable Expressions and Scope

$\tt let$ in toplevel

In utop, any let definition is implicitly $\tt in$ the rest that you type.

Scope

The scope of a variable is where its name is meaningful.

let x = 42 in
	(* y is not meaningful here *)
	x + (let y = "3110" in
		(* y is meaningful here *)
		int_of_string y)

Name

Principle of Name Irrelevance: the name of a variable shouldn’t intrinsically matter.

For example, in math, the following are the same functions:

f(x) = x+1
f(y) = y+1.

To ensure name irrelevance in programs, stop substituting when you reach a binding of the same name.

let x = 5 in x + (let x = 6 in x + x))
- : int = 17

19. Scope and the Toplevel

In utop, the following is legal, giving the illusion of mutability.

let x = 1;;
let x = 2;;
x;;

In reality, it is just nested scopes.

let x = 1 in
	let x = 2 in
		x

2.3. Functions

Recall from week one that the following code

let x = 42

has an expression in it, but is not an expression itself. Rather, it is a definition since it binds values to names. In OCaml, functions must begin with lowercase letters.

A function definition is a particular type of definition defined like this:

let f x = ...

Recursive functions are defined with the rec keyword:

let rec f x = ...

Example (factorial).

(* Precondition: Requires n >= 0*)
(* Returns: n!*)
let rec fact n =
  if n = 0  then 1 else n * fact(n-1)
             
(* Requires y >b 0*)
(* Returns x^y*)
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y-1)
             
(* Power with type annotations*)
let rec pow (x:int) (y:int) : int =
  if y = 0 then 1
  else x * pow x (y-1)

Mutual recursion can also be implemented with the and keyword.

(* [even n] is whether [n] is even.
* requires: [n >= 0] *)
let rec even n =
	n = 0 || odd (n-1)

(* [odd n] is whether [n] is odd.
* requires: [n >= 0]*)
and odd n =
	 n <> 0 || even (n-1)

Dynamic semantics of functions

Functions do not have dynamic semantics since nothing is evaluated. Function $\tt f$ has arguments $\tt x1…xn$ and body $\tt e$.

Static semantics of functions.

For non-recursive functions, assuming that $\tt (x1:t1) \dots (xn:tn)$ and $\tt e: u$, $\tt f$ is the map

\[\tt f : t1\to t2\to\dots\to tn\to u.\]

For recursive functions, assuming that $\tt (x1:t1) \dots (xn:tn)$ and $\tt e: u$ and $\tt f : t1\to t2\to\dots\to tn\to u,$ we can conclude that $\tt f : t1\to t2\to\dots\to tn\to u.$
Note that the recursive static semantics assumes that $\tt f$ has a particular type.

2.3.1 Anonymous Functions

An anonymous function is one that does not have a name. They are denoted with the fun keyword. Here are two increment functions. One is anonymous and one is not.

let inc x = x + 1
let inc = fun x -> x+1

Anonymous functions are also called lambda expressions, a term from lambda calculus. This was a model of computation in the same sense that Turing machines are a model of computation.

Since functions are values, they can be used anywhere we use values. Functions can take arguments as functions and return functions as results.

Dynamic semantics.

Since anonymous functions are already a value, no computation is performed.

Static semantics.

If by assuming $\tt (x1:t1) \dots (xn:tn)$ we can conclude that $\tt e :u$, then $\tt fun \space x1 \dots xn \to e : t1 \to \dots \to tn\to u$.

2.3.2. Function Application

Function application is written by just putting functions next to each other.

Given the syntax

e0 e1 e2 ... en

the first expression $\tt e0$ is the function, and $\tt e1 \dots en$ are arguments.

Static semantics.

If $\tt e0: t1 \to \dots \to tn \to u$ and $\tt (e1:t1) \dots (en:tn)$, then $\tt e0 \space e1 \dots en : u$.

Dynamic semantics.

Evaluate $\tt e0$ to a function. Also evaluate the arguments $\tt e1 \dots en$ to $\tt v1 \dots vn$.
For $\tt e0$, the result might be an anonymous function $\tt fun \space x1 \dots x-n \to e$ or a name $\tt f$.
Substitute each value $\tt vi$ for the corresponding argument name $\tt xi$ in the body $\tt e$ of the function. The substitution results in a new expression $\tt e’$.
Evaluate $\tt e’$ to a value $\tt v$, which is the result of evaluating $\tt e0 \dots en$.

Anywhere $\tt let \space x = e1 \space in \space e2$ occurs, we could replace it with $\tt (fun \space x \to e2)e1$.

Pipeline

There is an infix operator in OCaml called the pipeline operator $\tt >$, which is equivalent to the function application. Values are sent through the pipe from left to right. For example, with functions inc and square, here are two equivalent functions.

let (|>) x f = f x
square (inc 5)
5 |> inc |> square
(*both yield 36 *)

This improves the readability of code since less parenthesis are used. Here is another example:

5 |> inc |> square |> inc |> inc |> square

The forward application operator is

let (@@) f x = f x
succ (2 * 10)
succ @@ 2 * 10

2.3.3. Polymorphic Functions

The identity function is the function that simply returns its input.

let id x = x;;
id : 'a -> 'a = <fun>

Here, $\tt ‘a$ is a type variable, which stands for an unknown type. Type variables begin with a single quote.

Because $\tt id$ can be applied to multiple types of values, it is a polymorphic function.

2.3.4. Labeled and Optional Arguments

It can be useful to label functions with many arguments. For example, String.sub is labeled, indicating that it probably operates on strings.

String.sub;;
-: string -> int -> int -> string

In OCaml, you can label arguments with the following syntax:

let f ~name1:arg1 ~name2:arg2 = arg1 + arg2;;
val f : name1:int -> name2:int -> int = <fun>

The function can then be called by passing arguments in either order.

f ~name2:3 ~name1:4
7

To make the variable name same as the label, OCaml supports the following syntax, which are equivalent

let f ~name1:name1 ~name2:name2 = name1+name2
let f ~name1 ~name2 = name1 + name2

Explicit type annotations can also be added.

let f ~name1:(arg1:int) ~name2:(arg2:int) = arg1 + arg2

Optional arguments can also be added with a default value.

let f ?name:(arg1=8) arg2 = arg1 + arg2
f ~name:2 7
f 7

2.3.5. Partial Application

Partial application is when a function is partially applied to one argument when it can take multiple.

Example. Suppose we have the following add function:

let addx x = fun y -> x + y

addx takes an int and returns a function of type int -> int. Something interesting we can do with addx is the folowing:

let add5 = addx 5
add5 2

In fact, this is how add is implemented in OCaml. The statement let add5 = add 5;; is completely legal.

To recap, here are several semantically equivalent ways of writing add

let x y = x+y
let add x = fun y -> x + y
let add = fun x -> fun y -> x + y

Recall that functions are values, so

functions can take functions as arguments.
functions can return functions as resuts.

Function associativity

The reality of OCaml is that every function takes exactly one argument.

A function $\tt f$ takes in an argument and evaluates types in a right associative manner. To illustrate,

t1 -> t2 -> t3 -> t4

is equivalent to

t1 -> (t2 -> (t3 -> t4))

Notice that this differs from function application, which is left associative.

2.3.6. Operators as Functions

Operators can be written in prefix notation by putting parenthesis before them ( + ). We can even define our own infix operators.

let ( ^^ ) x y = max x y
2 ^ ^ 3
-: 3

3.1.1. Lists

In OCaml, a list is a sequence of values with the same type. They are singly linked and immutable.

They are good for sequential access of shot to medium length lists (say up to 10k elements).

Building Lists

Syntax. In OCaml, there are 3 ways to build a list:

[]
e1::e2
[e1; e2; ...; en]

The first is an empty list. The second prepends e1 to e2. The operator is called “cons”’, short for constructs. $[]$ is pronounced “nil”, from Lisp.

The third is just syntactic sugar for ::.

[1; 2; 3] = 1::2::3::[]

Evaluation.

$\tt []$ is a value.
To evaluate $\tt e1 :: e2$,
- Evaluate $\tt e1$ to a value $\tt v1$.
- Evaluate $\tt e2$ to a value $\tt v2$
- Evaluate $\tt e2$ to a list value $\tt v2$ and return
- $\tt v1 :: v2$.

This is the same for the bracket notation $\tt [e1; e2]$.

List Types

For any type $\tt t$, the type $\tt t\space list$ describes lists where all elements have type $\tt t$.

[1; 2; 3] : int list
[true] : bool list
[[1 + 1; 2 - 3]; [3 * 7]] : int list list

List Semantics

For nil, an empty list, the type is $\tt [] : ‘a \space list$.

For cons, if $\tt e1: t$ and $\tt e2: t \space list$, then $\tt e1 :: e2 : t \space list$.

Records and Tuples

Records

A record is similar to a struct in C.

type student = {
    name : string;
    year : int; (* grad year *)
  }

let rbg : student = {
    name = "Ruth Bader";
    year = 1954;
  }

Syntax. To understand records, we will introduce a new kind of definition, a type definition.

$\tt {f1 = e2; f2 = e2}$ is a record with fields named $\tt f1$ and $\tt f2$. The order of fields is irrelevant. The dot operator accesses a field from expression $\tt e$.

Field names are identifiers not expressions.

Dynamic Semantics. If $\tt ei \to vi$ then $\tt {f1 = e1; \dots; fn = en} \to {f1 = v1; \dots; fn = vn}$.

If $\tt e \to {…; f = v; …}$ then $\tt e.f \to v$.

Static Semantics. Record types must be defined before used so OCaml knows the field names.

Record Copy

$\texttt{e with f1 = e2; f2 = e2; \dots }$ creates a copy of $\tt e$ with a new field value for $\tt f1$. Remember that you cannot add new fields.

Tuples

A tuple is an unnamed record.

type time = int * int * string
let t : time = (10, 10, "am") 

type point = float * float
let p = (5., 3.5)

There is a function $\tt fst$ that takes in a tuple pair and returns the first value.

Syntax. The order of tuples are important. $\tt (e1, e2, e3)$.

Dynamic Semantics. If $\tt ei \to vi$ then $(e1, \dots, en) \to (v1, \dots, vn)$.

Type Checking. If $\tt ei : ti$ then $\tt (e1, \dots, en) : t1 * \dots * tn$.

Comparison

When deciding what data type to use, here are some considerations.

/assets/cs3110/Untitled%201.png

Pattern Matching

A list can be

nil
the cons of an element onto another list

Pattern matching on a list works only in these 2 ways.

let empty lst =
	match lst with
	| [] -> true
	| h :: t -> false

Syntax. The syntax takes the general form

match e with
| p1 -> e1
| p2 -> e2

$\tt p$ is a new syntactic class: pattern expressions.

Static Semantics. If $\tt e$ and $\tt pi$ have type $\tt ta$ and $\tt ei$ has type $\tt tb$, then the expression as type $\tt tb$.

Dynamic Semantics. Evaluate $e$ to a value $v$. Then proceed top to bottom. If e matches with $\tt pi$, then $\tt ei \to vi$. Then, the expression evaluates to $\tt vi$.

If no patterns match, raise an exception $\tt Match_failure$.

In pattern matching,

a constant matches itself
an identifier, pattern variable matches to anything and binds it to the scope of the branch
The wildcard matches to anything but does not bind to it.

Function keyword

Immediately matching against an implicit final argument is so useful that there is syntactic sugar for it.

let f x y z =
	match z with
	| p1 -> e1
	| p2 -> e2

let f x y = function
	| p1 -> e1
	| p2 -> e2

Append

Append combines two lists. Keep in mind it is linear time with the first list, like what we saw in our implementation.

let rec append lst1 lst2 =
  match lst1 with
  | [] -> lst2
  | h :: t -> h :: (append t lst2)

[432] @ [432]
'a list -> 'a list -> 'a list