Pattern Matching in Standard ML

Homework 3 of the programming language course taught by Professor Dan Grossman from University of Washington.

Standard ML extensively revised earlier dialects of the functional programming language ML, including the module facility that supports large-scale program development. The use of pattern matching to access components of aggregate data structures is one of the most powerful, and distinctive, features of ML family1.

Table of Contents

String Functions

only_capitals()

Write a function that takes a string list and returns a string list that has only the strings in the argument that start with an uppercase letter:

val only_capitals =
  List.filter (fn str => Char.isUpper(String.sub(str, 0)))

longest_string1()

Write a function that takes a string list and returns the longest string in the list. If the list is empty, return empty string "". In the case of a tie, return the string closest to the beginning of the list:

val longest_string1 =
  foldl
  ( fn (str, max) => 
      case (String.size str > String.size max) of 
        true => str | false => max )
  ""

longest_string2()

Write a function that is exactly like longest_string1() except in the case of ties it returns the string closest to the end of the list:

val longest_string2 =
  foldl
  ( fn (str, max) =>
      (* >= rather than > *) 
      case (String.size str >= String.size max) of 
        true => str | false => max )
  ""

longest_string_helper(), longest_string3(), and longest_string4()

var longest_string_helper f =
  foldl
  ( fn (x, y) =>
      (* A more general function*)
      case f (String.size x, String.size y) of 
        true => x | false => y )
  ""

val longest_string3 =
  longest_string_helper(fn (x, y) => x > y)

val longest_string4 =
  longest_string_helper(fn (x, y) => x >= y)

longest_capitalized()

Write a function that takes a string list and returns the longest string in the list that begins with an uppercase letter, or "" if there are no such strings. Assume all strings have at least one character. Resolve ties like in longest_string1().

val longest_capitalized = longest_string1 o only_capitals

rev_string()

Write a function that takes a string and returns the string that is the same characters in reverse order.

val rev_string = implode o rev o explode

Pattern Matching Functions

Utilities for Pattern Matching

Write a function first_answer() of type:

('a -> 'b option) -> 'a list -> 'b

The first argument should be applied to elements of the second argument in order until the first time it returns SOME v for some v and then v is the result of the call to first_answer(). If the first argument returns NONE for all list elements, then first_answer() should raise the exception NoAnswer.

fun first_answer f lst =
  case lst of 
    [] => raise NoAnswer
  | x::xs => 
      case f x of
        SOME v => v
      | NONE => first_answer f xs

Next, write another function all_answers() of type:

('a -> 'b list option) -> 'a list -> 'b list option

The first argument should be applied to elements of the second argument. If it returns NONE for any element, then the result for all_answers() is NONE. Else the calls to the first argument will have produced SOME lst1, SOME lst2, … SOME lstn and the result of all_answers() is SOME lst where lst is lst1, lst2, … lstn appended together (order doesn’t matter).

fun all_answers f lst =
  let fun aux(f, lst, acc) =
    case lst of
      [] => SOME acc
    | x::xs =>
        case f x of
          SOME v => aux(f, xs, v@acc)
        | NONE => NONE
  in
    aux(f, lst, [])
  end

Type Declarations

Given v of valu and p of pattern, either p matches v or not. If it does, the match produces a list of string * valu pairs; order in the list does not matter.

(* Inspired by the type definitions an ML implementation
 * would use to implement pattern matching: *)
datatype pattern = Wildcard
                 | Variable of string
                 | UnitP
                 | ConstP of int
                 | TupleP of pattern list
                 | ConstructorP of string * pattern

datatype valu = Const of int
              | Unit
              | Tuple of valu list
              | Constructor of string * valu

fun g f1 f2 p =
  let 
    val r = g f1 f2 
  in
    case p of
      Wildcard          => f1 ()
    | Variable x        => f2 x
    | TupleP ps         => List.foldl (fn (p,i) => (r p) + i) 0 ps
    | ConstructorP(_,p) => r p
    | _                 => 0
  end

The rules for matching are:

count_wildcards()

Use the provided g function to define a function that takes a pattern and returns how many Wildcard patterns it contains:

val count_wildcards = g (fn _ => 1) (fn _ => 0)

count_wild_and_variable_lengths()

Use the provided g function to define a function that takes a pattern and returns the number of Wildcard patterns it contains plus the sum of the string lengths of all the variables in the variable patterns it contains:

val count_wild_and_variable_lengths = g (fn _ => 1) String.size

count_some_var()

Use the provided g function to define a function that takes a string-pattern pair and returns the number of times the string appears as a variable in the pattern:

val count_some_var = fn (str, pat) =>
  g (fn _ => 0) (fn x => case x = str of true => 1 | false => 0) pat

check_pat()

Write a function that takes a pattern and returns true if and only if all the variables appearing in the pattern are distinct from each other (i.e., use different strings):

val check_pat =
  let
    (* Takes a pattern and returns a list of all strings the pattern
     * uses for variables. *)
    fun get_strlst pat =
      case pat of
        Variable x => [x]
      | TupleP ps => foldl (fn (p, acc) => (get_strlst p)@acc) [] ps
      | ConstructorP(_, p) => get_strlst p
      | _ => []

    (* Takes a list of strings and returns true iff no repeated strings. *)
    fun diff_str lst =
      case lst of
        [] => true 
      | x::xs =>
          case List.exists (fn y => x = y) xs of
            false => diff_str xs
          | true => false
  in
    diff_str o get_strlst
  end

match()

Write a function that takes a valu * pattern pair and returns a (string * valu) list option, namely NONE if the pattern does not match and SOME lst where lst is the list of bindings if it does. Note that if the value matches but the pattern has no patterns of the form Variable s, then the result is SOME [].

fun match x = (* To recursively call match, avoid the anonymous function binding *)
  case x of
    (* Wildcard matches everything *)
    (_, Wildcard) => SOME []
  | (v, Variable s) => SOME [(s, v)]
    (* UnitP matches only Unit *)
  | (Unit, UnitP) => SOME []
    (* int value matches *)
  | (Const y, ConstP z) => 
      case y = z of 
      true => SOME [] 
    | false => NONE
  | (Tuple vs, TupleP ps) =>
      case List.length vs = List.length ps of
        true => all_answers match (ListPair.zip (vs, ps))
      | false => NONE
  | (Constructor(s, v), ConstructorP(t, p)) =>
      case s = t of 
        true => match(v, p)
      | false => NONE )
  | _ => NONE

first_match()

Write a function that takes a valu and a list of patterns and returns a (string * valu) list option, namely NONE if no pattern in the list matches or SOME lst where lst is the list of bindings for the first pattern in the list that matches.

fun first_match v lst =
  SOME (first_answer (fn p => match (v, p)) lst)
  handle NoAnswer => NONE

Challenge Problem: typecheck_patterns()

Write a function typecheck_patterns() of type

((string * string * typ) list) * (pattern list) -> typ option

that “type-checks” a pattern list. Types for our made-up pattern language are values of type typ, which is defined as:

datatype typ = Anything           (* any type of value is okay *)
             | UnitT              (* type of Unit *)
             | IntT               (* type for integers *)
             | TupleT of typ list (* tuple types *)
             | Datatype of string (* some named datatype *)

The first argument of type (string * string * typ) list contains elements that take the form of:

(* constructor name, datatype name, typ *)
("foo", "bar", IntT)

which means constructor foo makes a value of type datatype bar given a value of type IntT. Assume list elements all have different first fields (the constructor name), but there are probably elements with the same second field (the datatype name). typecheck_patterns() “type-check” the pattern list to see if there exists some typ t that all the patterns in the list can have. If so, return SOME t, else return NONE.

Consider a case expression with different patterns:

case x of p1 | p2 | ... | pn

The objective of this challenge exercise is to create an algorithm that, like the SML compiler2, is capable of inferring the type t of x based on the patterns p1, p2, …, pn.

My implementation here includes four helper functions. The first one, pattern_to_typ(), converts a pattern to a typ:

fun pattern_to_typ (pat, cons) =
  case pat of
    Wildcard => Anything
  | Variable _ => Anything
  | UnitP => UnitT
  | ConstP _ => IntT
  | TupleP ps =>
      (* Converts every pattern to typ in ps,
       * where foldl returns a typ list *)
      TupleT (foldl (fn (p, acc) => acc@[pattern_to_typ (p, cons)]) [] ps)
  | ConstructorP (s, p) => 
    (* Matches a ConstructorP in a list of string * string * typ pairs
     * (the constructor list) and converts it to a named Datatype *)
      let
        fun cons_to_nd cs =
          case cs of
            [] => Datatype ""
          | (x, y, z)::rest =>
              if (x = s) andalso
                 (pattern_to_typ (p, cons) = z orelse
                  pattern_to_typ (p, cons) = Anything)
              then Datatype y
              else cons_to_nd rest
      in
        cons_to_nd cons
      end

The second helper function, patlst_to_typlst(), takes a list of patterns and returns a typ list making use of the above function:

fun patlst_to_typlst (cons, pats, acc) =
  case (pats, acc) of
    ([], []) => [Datatype ""]
  | ([], _) => acc
  | (p::ps, _) => patlst_to_typlst (cons, ps, acc@[pattern_to_typ (p, cons)])

The third helper function, typlst_to_typ(), takes a typ list and returns the “most lenient” typ:

val typlst_to_typ =
  foldl
  (* Given two typs, returns an appropriate typ *)
  ( fn (t1, t2) => case (t1, t2) of
      (Anything, t2) => t2
    | (t1, Anything) => t1
    | (TupleT(x), TupleT(y)) =>
        let
          fun aux(ps, acc) =
            case ps of
              [] => TupleT(acc)
            | (p1, p2)::rest => aux(rest, acc@[typlst_to_typ([p1, p2])])
        in
          (* raises UnequalLengths with zipEq *)
          aux(ListPair.zipEq(x, y), [])
          handle UnequalLengths => TupleT([Datatype ""])
        end
    | (d1, d2) => case d1 = d2 of true => d1 | false => Datatype ""
  Anything

The last helper function, typop(), converts a typ to a typ option:

fun typop t =
  case t of
    Datatype "" => NONE
  | TupleT(typlst) =>
      let 
        fun find([]) = SOME t
          | find(head::rest) =
              case typop head of
                NONE => NONE
              | _ => find(rest)
      in
        find(typlst)
      end
  | _ => SOME t

Finally, our typecheck_patterns() function is just a wrapper:

fun typecheck_patterns (conslst, patlst) =
  (typop o typlst_to_typ o patlst_to_typlst) (conslst, patlst, [])

Notes

  1. See Robert Harper, Programming in Standard ML, 2011. 

  2. For the SML/NJ compiler’s type-checking code, please go to the GitHub directory: https://github.com/smlnj/smlnj/compiler/