måndag 1 februari 2010

Clojure and monads

Edit: In order to make this post at least slightly comprehensible I reedited great parts of it since my first try.
Disclaimer: Since this post is based on Jim Duey's great collection of articles on monads in Clojure and is heavily inspired by this great lecture by Dr. Erik Meijer I write this post in English.

Introduction

I have for some time been fascinated by monads but haven't managed to get my head around them. After experimenting some on a recursive descent parser using monads my head is at least getting in some kind of reasonably stable orbit around monads. In this post I am working with one monad, or really a combination of two monads: The state monad and the maybe monad. There are lots of other monads, but I will only cover this combined monad.
This monad can be used to take care of much infrastructure of the parsing. Here a Parser is defined as a function that takes a string as single argument and returns either a pair containing the parsed result and the rest of the string or nil if it cannot parse the content. Say we want a number-parser it has to have the following results:
(number-parser "123abc456") => (123 "abc456")
(number-parser "abc123456") => nil
The infrastructure referred to above is that we can construct parsers being combinations of other parsers. That type of combination will be the task of the domonad. Say we have a number parser and a plus sign parser which tries to find a single "+". Then we can make a simple addition parser:
(def addition-parser
(domonad
[first-term number-parser
plus-sign plus-parser
rest-terms addition-parser]
(+ first-term rest-terms)))
The left hand side stuff in the first argument to domonad are the results from the parsers on the right hand side. The second argument is simply an expression with the results. Note that the first-term is the result of the number-parser, not the parser itself!
(addition-parser "1+2+3abc456") => (6 "abc456")
(number-parser "1+2+abc123456") => nil
So, the monads do quite a lot for us: They store the parsed values (that's the state-monad) and it gracefully exits and return nil if any of its components are nil (that's the maybe-monad). The nice thing now is that the domonad can combine both parsers we have written ourself and parsers created with the domonad function.
A second way to combine parsers is to have a fallback. Say we want to parse a number or a variable if it is not a number. Then we can create the parser
(def number-or-variable-parser (m-plus number-parser variable-parser))
The m-plus is provided by the monad framework.

What I did

To understand how to build a parser from Duey's recipies I decided to make a calculator parser. Firstly I followed Dr. Meijers methods as closely as I could (they are in Haskel) but I ran into problems when I added the minus operator and the division operator. Dr. Meijer calculates sum as usual in recursion, by adding the first element with the sum of the rest: a+b+c+d => a+(b+(c+d)) Adding also minus to this we run into problems: a-b+c+d => a-(b+(c+d))=a-b-c-d (and the same for multiplication and division) which is not what we want. I spent one night trying to figure out how to expand recursively to a+b+c+d => ((a+b)+c)+d in order for minus to work. Unfortunately this provided me neither with sleep nor result. If you know, please tell me how to do that elegantly!

I cheated a little and wrote it non-recursivelly so perhaps it is only a semi-recursive descent parser. What I did was that for a+b+c+d I took the first a and applied the functions (+b, +c, +d) to it.

It is probably easiest to show the code and explain it to them who bothers to scroll down past it:
(ns com.yourcompany.monad-test
(:use clojure.contrib.monads)
(:use clojure.contrib.str-utils))

(with-monad (state-t maybe-m)

(defn optional [parser]
(m-plus parser (m-result nil)))

(def one-or-more)

(defn none-or-more [parser]
(optional (one-or-more parser)))

(defn one-or-more [parser]
(domonad
[a parser
as (none-or-more parser)]
(cons a as)))

(defn is-match [pattern]
"Creates a parser that takes what matches the regex pattern"
(let [re (re-pattern (str "^\\s*(" pattern ")"))]
(fn [strn]
(let [[a b] (re-find re strn)]
(if a
(list b (. strn (substring (count a))))
nil)))))

(defn is-one-of [& strs]
"Creates a parser that takes one of the strings given"
(is-match (str-join "|" (map #(str "\\Q" %1 "\\E") strs))))

(defn signed [parser]
"Flips the sign of parsed value if it is prefixed with a minus sign"
(domonad
[sign (optional (is-one-of "-"))
number parser]
(if sign (- number) number)))

(def parse-number
(domonad
[n (is-match "\\d+(\\.\\d+)?")]
(Double/parseDouble n)))

(def parse-expr)

(defn end [b]
"Returns a closing bracket or nil."
((apply hash-map (map str (seq "(){}[]"))) b))

(def parse-atom
(m-plus
parse-number
(domonad
[lb (is-one-of "(" "{" "[")
r parse-expr
rb (is-one-of (end lb))]
r)))

(defn extract [opss bottom-parser]
(let [[ops & inner-ops] opss
inner-parser (if inner-ops
(extract inner-ops bottom-parser)
bottom-parser)]
(domonad
[f (signed inner-parser)
t (none-or-more
(domonad
[op (apply is-one-of (keys ops))
e inner-parser]
#((ops op) % e)))]
(if (empty? t) f ((apply comp (reverse t)) f)))))

(def parse-expr
(extract [{"+" + "-" -}
{"*" * "/" / "%" mod "\\" quot}
{"^" #(Math/pow %1 %2)}]
parse-atom))

(defn calculate [strn]
(let [r (parse-expr strn)]
(if r (first r) "Error"))))
Which works as desired:
(calculate "-(- 21.66)+[2^(2*3-{{{ { 4 } }}})]/3") => 22.993333333333332
The functions not commented or explained in Duey's tutorials are:
parse-atom: parses the smalest part of the expressions. I.e. a number or a new expression within brackets. And yes, atom is a bad name for it.
extract: a poorly named function that costructs a parser parsing smaller and smaller parts depending of the operators order: expression to terms (+, -), to factors (*, /), to parts in a power expression (^). When there are no sub-parts available, parse the atoms.
parse-expr: utilizes the extract function to render a parser for any expression.
calculate: parses the expression and returns result or error.

Note that the parser deals with a few special cases which makes the code a little less tractable:
  • leading negative terms: Ex: -2+3
  • any kind of matching brackets (), {}, []
  • easy to add functions and orders of execution
  • spaces