Functional Parsing
We've seen several example instances of Show
(converting
a value to String
). Now we'll start implementing functions
that implement Read
values from String
s.
With calculator labs, used lots of string- (or list-) manipulating functions. In general, want to build parsers for complicated syntactic grammars in more principled ways. A lot of times, need to do much more backtracking.
As you'll see in most compilers courses, the the traditional way to write a parser is to break it up into lexing (which breaks a string into a stream of tokens) and parsing (which consumes tokens according to some language grammar). Domain-specific languages for lexers and parsers are often used to generate code that integrates with the rest of the compiler implementation (e.g. type checking, evaluation and/or code generation).
In contrast, we'll see an approach to parsing that builds and composes the kinds of functional programming building blocks we have been learning.
Parser
First cut at Parser
type, like State
:
type Parser a = String -> (a, String)
However, we'll often need to return multiple possible parses:
type Parser a = String -> [(a, String)]
And, as we've done many times before, create a new type...
data Parser a = Parser (String -> [(a, String)])
... without the run-time overhead.
newtype Parser a = Parser (String -> [(a, String)])
And we'll use records to give a name for unwrapping the function inside:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
So, we've got a "combination" of State
and List
. We'll
talk more about this another day.
Parsing Characters
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
[] -> []
(a:as) -> if c == a
then [(a,as)]
else []
> runParser (char 'C') "Cool"
> runParser (char 'C') "Not Cool"
Clean this up with a cute pattern called Fokker's trick:
char c = Parser $ \s -> case s of
[] -> []
(a:as) -> [(a,as) | c == a]
Exercise: Expand out the list comprehension in terms of list-manipulating functions.
We can make this work more generally:
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
[] -> []
(a:as) -> [(a,as) | p a]
char c = satisfy (c==)
alpha = satisfy isAlpha
digit = satisfy isDigit
space = satisfy isSpace
> runParser alpha "Hello!"
> runParser alpha "!Hello!"
Parsing Strings
Okay, now let's match against sequences of Char
s (i.e. String
s).
string :: String -> Parser String
string str = Parser $ \s ->
let (t,u) = splitAt (length str) s in
if str == t then [(t,u)] else []
Again, use Fokker's trick:
[(t,u) | let (t,u) = splitAt (length str) s, str == t]
> runParser (parse "Hello") "Hello!"
> runParser (parse "Hello") " Hello!"
Parsing Tokens
A Token is a string in the source code used that has
some meaning. For example, in Haskell, the token True
refers to one of the data constructors of the Bool
datatype, and the token let
marks the beginning of
a let-expression.
The token
parser defines how to assign meaning
to a String
token:
token :: String -> a -> Parser a
token str a = Parser $ \s ->
[(a,u) | (t,u) <- runParser (string str) s]
parseTrue = token "True" True
parseFalse = token "False" False
> runParser parseTrue "True"
> runParser parseTrue "False"
Parser Combinators
We can now easily combine smaller parsers into bigger
ones using the Functor
, Applicative
, and Monad
interfaces.
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap g (Parser f) = Parser $ \s -> [(g x, t) | (x,t) <- f s]
fmap g f = Parser $ \s -> [(g x, t) | (x,t) <- runParser f s]
We can now rewrite token
more succinctly:
token str tok = fmap (\_ -> tok) (string str)
Or, using const :: a -> b -> a
:
token str tok = fmap (const tok) (string str)
Or:
token str tok = const tok <$> string str
So far, we can parse True
or parse False
.
Want to be able to combine parsers into one for Bool
eans.
Enter monoids (in particular, Alternative
).
First need to define Applicative
.
instance Applicative Parser where
-- pure :: a -> Parser a
pure a = Parser $ \s -> [(a,s)]
-- (<*>) :: Parser (a -> b) -> Parser a -> Parser b
pab <*> pa = Parser $ \s ->
[ (f x, u) | (f,t) <- runParser pab s
, (x,u) <- runParser pa t ]
instance Alternative Parser where
-- empty :: Parser a
empty = Parser $ \s -> []
-- (<|>) :: Parser a -> Parser a -> Parser a
(Parser f) <|> (Parser g) = Parser $ \s -> f s ++ g s
Now, behold:
parseBool :: Parser Bool
parseBool = parseTrue <|> parseFalse
> runParser parseBool "TrueFalse"
> runParser parseBool "FalseTrue"
Cool!
Useful: one-or-more matches, or zero-or-more, or zero-or-one.
Control.Applicative
defines the following
(some/many are a bit clever, using pure
and (<|>)
):
some :: Alternative f => f a -> f [a]
many :: Alternative f => f a -> f [a]
optional :: Alternative f => f a -> f (Maybe a)
> runParser (some parseBool) "TrueTrue!"
> runParser (many parseBool) "TrueTrue!"
> runParser (optional parseBool) "TrueTrue!"
The Alternative
class does buy us something!
And then:
instance Monad Parser where
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
pa >>= f = Parser $ \s ->
[ (y,u) | (x,t) <- runParser pa s
, (y,u) <- runParser (f x) t ]
Unambiguous Parse
maybeRead :: Parser a -> String -> Either String a
maybeRead p s =
case runParser p s of
[(a,"")] -> Right a
[(a,suffix)] -> Left $ "suffix not consumed: [" ++ suffix ++ "]"
[] -> Left $ "no successful parses"
_ -> Left $ "multiple successful parses"