Functional Parsing (continued)
Last time, we built up a nice simple functional Parser
library.
Though elegant, it turns out that our implementation --- lists of
potential results and backtracking --- is not the most efficient.
ReadP library
The Text.ParserCombinators.ReadP
library is a much faster
and more full-featured parser implementation.
ReadP a ~= Parser a
> :info ReadP
The type involves forall b...
. We're not going to worry about
this internal representation. ReadP
works like Parser
, but
there are no (public) data constructors.
> :info ReadS -- type alias for String -> [(a,String)]
Small differences in types/interface compared to our Parser
:
Parser | ReadP
--------------------------------------------------------------
Parser a | ReadP a
runParser | readP_to_S
char | char
satisfy | satisfy
string | string
token |
(<$>) | (<$>)
(<*>) | (<*>)
(>>=) | (>>=)
(<|>) (Alternative) | (<|>), (+++)
many (Alternative) | many
some (Alternative) | many1
optional (Alternative) | optional (works differently)
| option
| (<++) (left-biased sum)
| munch (greedy; similar to string)
| sepBy
| between
| look (lookahead efficiently)
Consider the difference between (many . satisfy)
and munch
:
> readP_to_S (R.many $ R.satisfy isAlpha) "asdf"
[("","asdf"),("a","sdf"),("as","df"),("asd","f"),("asdf","")]
> readP_to_S (R.many1 $ R.satisfy isAlpha) "asdf"
[("a","sdf"),("as","df"),("asd","f"),("asdf","")]
> readP_to_S (R.munch isAlpha) "asdf"
[("asdf","")]
Could implement munch for Parser
:
munch :: (Char -> Bool) -> Parser String
munch p = parser $ \s -> [span p s]
Example: Parsing Lists
Let's use ReadP
to write a simple parser for a simple type
of lists.
data List
= Nil
| ConsInt Int List
instance Show List where
show Nil = "[]"
show (ConsInt n ns) = show n ++ ":" ++ show ns
instance Read List where
readsPrec _ = readP_to_S parseList
parseList :: ReadP List
parseList = parseNil +++ parseCons
Start with parsing empty list:
parseNil :: ReadP List
parseNil = do
string "[]"
return Nil
Now parse cons:
parseInt :: ReadP Int
parseInt = do
n <- many1 $ satisfy isDigit
return $ read n
parseCons :: ReadP List
parseCons = do
n <- parseInt
char ':'
ns <- parseList
return $ ConsInt n ns
Okay, now let's add sugared versions:
parseSweetList :: ReadP List
parseSweetList = do
char '['
ns <- parseInt `sepBy` (char ',')
char ']'
return $ foldr ConsInt Nil ns
Clean up parseSweetList
with between
:
parseSweetList = do
ns <- between (char '[') (char ']') (sepBy parseInt (char ','))
return $ foldr ConsInt Nil ns
This works on its own, but...
parseList = parseNil +++ parseCons +++ parseSweetList
>> read "[]" :: List
*** Exception: Prelude.read: ambiguous parse
Problem is parseSweetList
might match []
, but so does parseNil
.
So, use sepBy1
instead:
ns <- isDigit `sepBy1` (char ',')
But, in any case, these different parsers accept disjoint sets of strings, so no need to run all.
parseList = parseNil <++ parseCons <++ parseSweetList
Monadic vs. Applicative parsing
In our implementations of parseNil
, parseInt
, parseCons
,
and parseSweetList
above, we used (>>=)
(via do
-notation)
to stitch together multiple parsers. But none of those
implementations actually require the full power of the Monad
interface (i.e. the ability of one parser to depend on the
result of the previous).
parseNil
, parseInt
, and parseSweetList
can be written via
the Functor
interface:
parseNil = const Nil <$> string "[]"
parseInt = read <$> (many1 $ satisfy isDigit)
parseSweetList =
foldr ConsInt Nil <$>
between (char '[') (char ']') (sepBy parseInt (char ','))
parseCons
cannot, because the pure function we want to
apply, a wrapper around ConsList
, takes one more than one
argument. Enter Applicative
:
parseCons =
(\n _ ns -> ConsInt n ns)
<$> parseInt
<*> char ':'
<*> parseList
Parser Pipelines
Having to create a lambda just to ignore some of the results
is clunky. Here is where the following Applicative
operator
comes in handy.
(<*) :: Applicative f => f a -> f b -> f a
fa <* fb = (\a _ -> a) <$> fa <*> fb
(<*) = liftA2 const
For example, rather than writing...
(\a1 _ a3 _ a5 -> e) <$> x1 <*> x2 <*> x3 <*> x4 <*> x5
... can write
(\a1 a3 a5 -> e) <$> x1 <* x2 <*> x3 <* x4 <*> x5
Notice that sprinkling in (<*>)
"keeps" the result of the
second parser whereas sprinkling in (<*)
"ignores" the result.
May also choose to forgo (<$>)
before the first argument so
that there are only two operators to look at:
pure (\a1 a3 a5 -> e) <*> x1 <* x2 <*> x3 <* x4 <*> x5
Formatting this expression with newlines results in a nice "parser pipeline", a pattern borrowed from an Elm library, albeit with different operators:
pure (\a1 a3 a5 -> e)
<*> x1
<* x2
<*> x3
<* x4
<*> x5
Now, back to parseCons
:
parseCons :: ReadP List
parseCons =
pure ConsInt
<*> parseInt
<* char ':'
<*> parseList
And, even though we don't need Applicative
,
we could choose to write parser pipelines for all
the rest, too, to create a similar feel:
parseInt :: ReadP Int
parseInt =
pure read
<*> (many1 $ satisfy isDigit)
parseCons :: ReadP List
parseCons =
pure ConsInt
<*> parseInt
<* char ':'
<*> parseList
parseSweetList :: ReadP List
parseSweetList =
pure (foldr ConsInt Nil)
<*> between (char '[') (char ']') (sepBy parseInt (char ','))