User-Defined Types
We've seen how lists are built with []
and (:)
, and
deconstructed with patterns using []
and (:)
. Now
we'll see a general way for users to define new types
that have one or more kinds of values.
Algebraic Datatypes (ADTs)
As an example, consider a currency converter. For simplicity, we support only two kinds of currency.
data Currency
  = USD Float -- US Dollars
  | JPY Float -- Japanese Yen
  deriving (Eq, Show)
The data
keyword defines a new algebraic data type (ADT),
or simply datatype, in this case called Currency
. Values
of type Currency
are either the data constructor called
USD
and a single Float
or the data constructor JPY
and
a single Float
. The smallest denomination of US currency, one
cent, is one-hundredth of a dollar; the smallest denomination
of Japanese currency is one yen. Note that we choose to
track JPY
with Float
s to improve precision; they will be
rounded to Integer
values later.
The deriving (Eq, Show)
clause instructs Haskell to automatically
implement a function called show :: Currency -> String
which is
described by the Show
type class, and an equality function
(==) :: Currency -> Currency -> Bool
which is described by the
Eq
type class. We'll talk more about these later.
Try the interactions below with and without the deriving
clause.
> USD 11.99
> JPY 1353
> USD 11.99 == USD 11.99
> USD 11.99 == JPY 1353
To extract data wrapped by data constructors, we use patterns involving the data constructors and variables. For example, the following function allows adding two values in either currency.
yenPerDollar = 112.86 -- as of 9/27/17
dollarsPerYen = 1 / yenPerDollar
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (d1 + d2)
add (JPY y1) (JPY y2) = JPY (y1 + y2)
add (USD d) (JPY y) = USD (d + y * dollarsPerYen)
add (JPY y) (USD d) = JPY (y + d * yenPerDollar)
Note that any amount has two valid representations; the last two equations choose to retain the base currency of the first argument.
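As a quick check of how the four equations behave, the sketch below repeats the definitions above and evaluates two sums. The names total1 and total2 are hypothetical, introduced only for illustration, and the same-currency amounts are chosen so that the Float arithmetic is exact.

```haskell
data Currency
  = USD Float -- US Dollars
  | JPY Float -- Japanese Yen
  deriving (Eq, Show)

yenPerDollar, dollarsPerYen :: Float
yenPerDollar  = 112.86 -- rate used in the text
dollarsPerYen = 1 / yenPerDollar

add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (d1 + d2)
add (JPY y1) (JPY y2) = JPY (y1 + y2)
add (USD d)  (JPY y)  = USD (d + y * dollarsPerYen)
add (JPY y)  (USD d)  = JPY (y + d * yenPerDollar)

-- Same currency on both sides: the first equation applies.
total1 :: Currency
total1 = add (USD 1.5) (USD 2.25) -- USD 3.75

-- Mixed currencies: the result keeps the currency of the first argument.
total2 :: Currency
total2 = add (JPY 100) (USD 1)    -- a JPY value, roughly 212.86 yen
```

Swapping the arguments of the second sum would produce a USD value instead, which is the "two valid representations" point made above.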
Data Constructors
In general, a datatype definition contains one or more data
constructors, beginning with capital letters (A
, B
, C
, D
)
below. The name of the new type (T
) below must also start with a
capital letter.
data T
  = A
  | B Int
  | C Bool String
  | D (Bool, String)
  deriving (Eq, Show)
Each data constructor wraps zero or more values; A
does not
carry any additional data, B
carries a single Int
, C
carries
one Bool
and one String
, D
carries one pair containing a Bool
and a String
.
Data constructors are functions that, when applied to arguments of
the specified types, produce values of the datatype (in this case, T
).
> :t A
> :t B
> :t C
> :t D
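Because data constructors are ordinary functions, they can be passed to higher-order functions and partially applied. The following is a small sketch of that idea; the names bs, cTrue, cs, and d are hypothetical and exist only for this example.

```haskell
data T
  = A
  | B Int
  | C Bool String
  | D (Bool, String)
  deriving (Eq, Show)

-- B :: Int -> T, so it can be mapped over a list of Ints.
bs :: [T]
bs = map B [1, 2, 3]            -- [B 1,B 2,B 3]

-- C :: Bool -> String -> T, so applying it to one argument
-- leaves a function of type String -> T.
cTrue :: String -> T
cTrue = C True

cs :: [T]
cs = map cTrue ["yes", "no"]    -- [C True "yes",C True "no"]

-- D takes a single argument, which happens to be a pair.
d :: T
d = D (False, "pair")
```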
Pattern Matching
The data constructors of a datatype, along with variables (starting with
lowercase letters) and the wildcard pattern _
, can be used to match
and destruct values of a datatype.
foo :: T -> Int
foo A = 0
foo (B i) = 1
foo (C b s) = 2
foo (D (b, s)) = 3
At run-time, a variable pattern (of some type S
) matches any value
(of type S
). A data constructor pattern D p0 ... pn
(of some type S
)
matches only those values (of type S
) that are constructed with
D
and whose n+1 values match the patterns p0
through pn
, respectively.
The wildcard pattern _
, like a variable, matches any value but does
not introduce a name for it.
Even though a data constructor (e.g. C) may take multiple arguments and thus, like other functions, may be partially applied in expressions, in patterns every data constructor must be "fully applied": it must be given as many variables or sub-patterns as its definition has fields.
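To make the roles of variables, wildcards, and nested patterns concrete, here is a short sketch over the same type T. The functions payloadSize and isA are hypothetical helpers, not part of the running example.

```haskell
data T = A | B Int | C Bool String | D (Bool, String)
  deriving (Eq, Show)

-- Variables bind the carried values so they can be used on the right.
payloadSize :: T -> Int
payloadSize A          = 0
payloadSize (B i)      = i
payloadSize (C _ s)    = length s   -- the Bool is ignored with _
payloadSize (D (_, s)) = length s   -- patterns can be nested

-- A wildcard can also stand for "every remaining case".
isA :: T -> Bool
isA A = True
isA _ = False

-- Rejected by the compiler: C carries two values, so its pattern
-- needs two sub-patterns.
-- bad (C b) = b
```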
Case Expressions
The definition of foo
above defines multiple equations (in this
case, one for each data constructor) to handle all possible argument
values of type T
. Another way to destruct values of type T
is
to use a case
expression.
foo :: T -> Int
foo t =
  case t of
    A -> 0
    B i -> 1
    C b s -> 2
    D (b, s) -> 3
Each branch of a case
expression must return the same type of
value (in this case, Int
). The equations in the original
version of foo
above are syntactic sugar for this version
using case
.
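One practical difference is that case is an expression, so it can sit inside a larger expression rather than only at the top of a definition. A minimal sketch, using a hypothetical function describe:

```haskell
data T = A | B Int | C Bool String | D (Bool, String)
  deriving (Eq, Show)

-- The case expression is the right-hand operand of (++).
describe :: T -> String
describe t =
  "constructor " ++
    case t of
      A     -> "A"
      B _   -> "B"
      C _ _ -> "C"
      D _   -> "D"
```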
Polymorphic ADTs
A data
definition can refer to one or more (lowercase)
type variables (a
, b
, etc. below).
data T a b c ... = ...
The type variables can be referred to in the types of the
data constructors. To see an example, recall
our head
and tail
functions from before.
head :: [a] -> a
head (x:xs) = x
tail :: [a] -> [a]
tail (x:xs) = xs
By omitting equations for the empty list, these functions implicitly fail (they crash with non-exhaustive pattern match errors at run-time).
Versions with more explicit failure:
head :: [a] -> a
head [] = undefined
head (x:xs) = x
tail :: [a] -> [a]
tail [] = undefined
tail (x:xs) = xs
> :t undefined
Whoa! undefined
has every type?
This is only okay because undefined
throws an exception at
run-time, halting program execution at that point.
This isn't much better than before. Using undefined
is,
however, a cool trick in the type system that allows exceptions to
be raised anywhere in the code, while still accurately typechecking
other expressions. Using undefined
is often helpful during
program development, allowing you to put "placeholders" for
expressions that you haven't finished yet while still allowing
the rest of the program to type check and run.
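As a sketch of the placeholder idea, suppose we have not yet written a parsing function. The names parseAmount, double, and doubledAmount below are hypothetical; the point is only that the file loads and the finished parts can be tested.

```haskell
-- Not written yet: the signature is enough for the rest of the
-- file to typecheck.
parseAmount :: String -> Float
parseAmount = undefined

-- Finished code can already be loaded and tested...
double :: Float -> Float
double x = 2 * x

-- ...and can even mention the placeholder, as long as the
-- placeholder is never evaluated at run-time.
doubledAmount :: String -> Float
doubledAmount s = double (parseAmount s)
```

Evaluating double 2 works as usual, while evaluating doubledAmount on any string raises the exception from undefined.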
Now, let's look at a better way to represent failure than crashing.
data Maybe a
  = Nothing
  | Just a
  deriving (Eq, Show)
Notice that Maybe
is polymorphic; it takes one type
variable a
. A value of type Maybe a
is either
Nothing
with no additional data, or Just
with a
single value of type a
.
> :t Just True
> :t Just 1
> :t Just (+)
> :t Nothing
> :t Nothing :: Maybe Bool
> :t Nothing :: Maybe (Maybe (Maybe Bool))
We can use Maybe
values to deal with "errors" much more
effectively.
maybeHead :: [a] -> Maybe a
maybeHead [] = Nothing
maybeHead (x:xs) = Just x
maybeTail :: [a] -> Maybe [a]
maybeTail [] = Nothing
maybeTail (x:xs) = Just xs
This allows (requires) callers of these functions to handle each of the two cases explicitly.
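Here is a sketch of what such callers might look like. The functions headOr and maybeSecond are hypothetical examples, not library functions; maybeHead and maybeTail are repeated so the block stands alone.

```haskell
maybeHead :: [a] -> Maybe a
maybeHead []    = Nothing
maybeHead (x:_) = Just x

maybeTail :: [a] -> Maybe [a]
maybeTail []     = Nothing
maybeTail (_:xs) = Just xs

-- Supply a fallback value for the empty list.
headOr :: a -> [a] -> a
headOr def xs =
  case maybeHead xs of
    Nothing -> def
    Just x  -> x

-- Chain two calls: either step may fail, and each failure is handled.
maybeSecond :: [a] -> Maybe a
maybeSecond xs =
  case maybeTail xs of
    Nothing   -> Nothing
    Just rest -> maybeHead rest
```

Neither caller can forget the empty-list case, because the only way to get at the element is to pattern match on Just.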
The Maybe
type is extremely useful and is defined
in the standard Prelude
. Its cousin Either
is also
quite useful.
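For a first impression of Either, the sketch below returns an error message instead of a bare Nothing. The functions eitherHead and report are hypothetical; using Left for the failure and Right for the result is a common convention rather than something the type enforces.

```haskell
-- Either is defined in the Prelude, roughly as:
--   data Either a b = Left a | Right b

-- Left carries an error description, Right carries a result.
eitherHead :: [a] -> Either String a
eitherHead []    = Left "eitherHead: empty list"
eitherHead (x:_) = Right x

report :: Either String Int -> String
report (Left msg) = "failed: " ++ msg
report (Right n)  = "got " ++ show n
```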
Recursive ADTs
Datatypes can also be recursive.
data List a
  = Nil
  | Cons a (List a)
  deriving (Eq, Show)
What does this type remind you of?
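Functions over a recursive datatype are typically recursive in the same places the type is. A minimal sketch, with hypothetical names oneTwoThree, len, and total:

```haskell
data List a
  = Nil
  | Cons a (List a)
  deriving (Eq, Show)

oneTwoThree :: List Int
oneTwoThree = Cons 1 (Cons 2 (Cons 3 Nil))

-- One equation per constructor; the Cons case recurses on the rest.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs

total :: List Int -> Int
total Nil         = 0
total (Cons x xs) = x + total xs
```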
Built-In ADTs
Lists
The built-in lists are just like the List
type above, but
with special syntax:
- [] and (:), rather than names starting with capital letters, are data constructors,
- the expression [1,2,3,4] is syntactic sugar for nested calls to (:),
- the type is called [], rather than a name starting with a capital letter, and
- the type [a] is syntactic sugar for [] a.
For example:
> :t []
> :t [] :: [a]
> :t [] :: [] a
> :t "abc"
> :t "abc" :: [Char]
> :t "abc" :: [] Char
> let (x:xs) = [1, 2, 3]
> let ((:) x xs) = [1, 2, 3]
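The desugaring can be checked directly: the three definitions below are different spellings of the same list. The names are hypothetical and only serve this comparison.

```haskell
sugared, infixed, prefixed :: [Int]
sugared  = [1, 2, 3]
infixed  = 1 : 2 : 3 : []              -- (:) associates to the right
prefixed = (:) 1 ((:) 2 ((:) 3 []))    -- constructors in prefix form

allEqual :: Bool
allEqual = sugared == infixed && infixed == prefixed   -- True

-- Strings are lists of Char, so the same holds for them.
abc :: String
abc = 'a' : 'b' : 'c' : []             -- "abc"
```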
Tuples
data Pair a b = Pair a b deriving (Eq, Show)
data Triple a b c = Triple a b c deriving (Eq, Show)
data Tuple4 a b c d = Tuple4 a b c d deriving (Eq, Show)
Built-in pairs are just like these datatypes, again with special syntactic support.
> :t ('a', 2 :: Int)
> :t (,) 'a' 2 :: (,) Char Int
> :t (,)
> :t (,,)
> :t (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
> let (a, b, c) = (1, 2, 3)
> let ((,,) a b c) = (1, 2, 3)
Unit
Sometimes it is useful to have a dummy value.
data Unit = Unit deriving (Eq, Show)
The built-in unit value and type are just like this datatype, again with special syntactic support.
> :t ()
> :t () :: ()
Booleans
Even booleans can be (and are) defined as a datatype.
data Bool
  = False
  | True
  deriving (Eq, Show)
If-expressions, multi-way if-expressions, and guards are syntactic
sugar for pattern matching on Bool
s.
absoluteValue n
  | n >= 0 = n
  | otherwise = -n

-- source file {-# LANGUAGE MultiWayIf #-}
-- ghc command-line option -XMultiWayIf
-- ghci :set -XMultiWayIf
absoluteValue n =
  if | n >= 0 -> n
     | otherwise -> -n

absoluteValue n =
  if n >= 0 then n
  else if otherwise then -n
  else undefined

absoluteValue n =
  case n >= 0 of
    True -> n
    False ->
      case otherwise of
        True -> -n
        False -> undefined
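Since Bool is an ordinary datatype, a conditional can also be written as a plain function defined by pattern matching. The function myIf below is a hypothetical stand-in for if-then-else, shown only to illustrate the point; note also that otherwise is simply defined as True in the Prelude.

```haskell
-- A conditional written with patterns on the Bool constructors.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

absoluteValue :: Int -> Int
absoluteValue n = myIf (n >= 0) n (-n)
```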
Type Aliases vs. Wrapper Types
Recall the currency example above. We were careful to use d
s
for dollar Float
s and y
s for yen Float
s, but it's easy
to accidentally, for example, add these two kinds of Float
s,
leading to a nonsensical, and probably hard-to-detect-and-debug,
result.
One approach might be to define type aliases.
type Dollars = Float
type Yen = Float
But remember, type aliases do not create new types, just synonyms.
So, Dollars
and Yen
, which are synonymous with Float
and
with each other, can still be misused.
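The sketch below shows the kind of mistake that aliases let through. The values price, fare, and oops are hypothetical; the definition of oops typechecks even though it adds dollars to yen.

```haskell
type Dollars = Float
type Yen     = Float

price :: Dollars
price = 11.99

fare :: Yen
fare = 1353

-- Accepted by the typechecker: both aliases are just Float.
oops :: Dollars
oops = price + fare
```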
Another option is to define wrapper (or "box") types with
single data constructors, for the purpose of forcing the
programmer to construct and destruct the different Float
types, with the usual support from the typechecker to prevent
errors.
data Currency
  = USD Dollars -- US Dollars
  | JPY Yen -- Japanese Yen
  deriving (Eq, Show)
data Dollars = DollarsFloat Float deriving (Eq, Show)
data Yen = YenFloat Float deriving (Eq, Show)
Note, it is common to see definitions like the following:
data Dollars = Dollars Float deriving (Eq, Show)
data Yen = Yen Float deriving (Eq, Show)
Notice how the type and data constructor names are the same; this is not a problem, because types and expressions "live" in different places in the language syntax, so there's never any ambiguity.
Now, we redefine add
as follows, in terms of several helper
functions that significantly reduce the chances that we
mistreat the actual Float
values.
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (addDollars d1 d2)
add (JPY y1) (JPY y2) = JPY (addYen y1 y2)
add (USD d) (JPY y) = USD (addDollars d (convertYenToDollars y))
add (JPY y) (USD d) = JPY (addYen y (convertDollarsToYen d))
addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)
addYen :: Yen -> Yen -> Yen
addYen (Yen y1) (Yen y2) = Yen (y1 + y2)
convertYenToDollars :: Yen -> Dollars
convertYenToDollars (Yen y) = Dollars (y * dollarsPerYen)
convertDollarsToYen :: Dollars -> Yen
convertDollarsToYen (Dollars d) = Yen (d * yenPerDollar)
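To see what the wrappers buy us, here is a reduced sketch with just the two wrapper types and addDollars. The values price, fare, and ok are hypothetical; the commented-out definition is the mistake that aliases allowed and that the typechecker now rejects.

```haskell
data Dollars = Dollars Float deriving (Eq, Show)
data Yen     = Yen Float     deriving (Eq, Show)

addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)

price :: Dollars
price = Dollars 1.5

fare :: Yen
fare = Yen 200

ok :: Dollars
ok = addDollars price (Dollars 2.25)   -- Dollars 3.75

-- Rejected by the typechecker: fare is a Yen, not a Dollars.
-- bad = addDollars price fare
```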
newtype
Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor. In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels.
Haskell allows a datatype that has exactly one constructor carrying exactly one value (such as Dollars and Yen) to be defined with the
keyword newtype
in place of data
, such as
newtype Box a = Box a
For the purposes of programming, newtype
is almost exactly the same as
data
. But it tells the compiler to optimize the generated code by not
including explicit Box
labels at run-time. We will get into the habit of using
newtype
whenever we define a datatype with one constructor that carries one value, without delving into
the subtle differences between using newtype
and data
.
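Applied to the currency wrappers, the change is a one-word edit; construction and pattern matching are written exactly as before. This is a sketch of that substitution, with the commented-out lines showing shapes that newtype does not accept.

```haskell
newtype Dollars = Dollars Float deriving (Eq, Show)
newtype Yen     = Yen Float     deriving (Eq, Show)

-- Construction and pattern matching look the same as with data.
addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)

-- Rejected: a newtype needs exactly one constructor with exactly one field.
-- newtype Pair     = Pair Float Float
-- newtype Currency = USD Float | JPY Float
```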
Record Types
When data constructors carry multiple values, it can be hard to remember which components are meant to represent what (especially when multiple components have the same types). Furthermore, it can be tedious to write accessor functions that retrieve particular components carried by a data constructor.
Haskell offers an embellishment of data constructors that addresses
these two issues. Consider the following definition where the
data constructors A
and B
use record syntax to give names
to their data values.
data ABCD
  = A { foo :: String, bar :: Int }
  | B { foo :: String, baz :: () }
  | C Int
  | D
  deriving (Eq, Show)
All data constructors can be used to construct values as before.
> :t A
> :t B
> :t C
> :t D
But now, record construction can also be used to create A
and
B
values.
> A { foo = "hello", bar = 17 }
> B { foo = "hello", baz = () }
> B { baz = (), foo = "hello" }
In addition, accessor functions have been automatically generated.
> :t foo
> :t bar
> :t baz
Furthermore, there is a way to create a new record by copying all of its fields except for some.
> let a = A { foo = "hello", bar = 17 }
> a { foo = "goodbye" }
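The pieces of record syntax can be put together in one small sketch. The values a1, a2, greetings, and samePositional are hypothetical, and the deriving clause is an addition so that the values can be compared and printed.

```haskell
data ABCD
  = A { foo :: String, bar :: Int }
  | B { foo :: String, baz :: () }
  | C Int
  | D
  deriving (Eq, Show)

a1, a2 :: ABCD
a1 = A { foo = "hello", bar = 17 }
a2 = a1 { foo = "goodbye" }      -- bar is copied from a1

-- Accessors are ordinary functions.
greetings :: [String]
greetings = map foo [a1, a2, B { foo = "hi", baz = () }]
-- ["hello","goodbye","hi"]

-- Positional construction still works and builds the same value.
samePositional :: Bool
samePositional = a1 == A "hello" 17   -- True

-- Caution: bar is only defined for A; bar (B "x" ()) fails at run-time.
```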
Should the following be an acceptable data
definition?
data Data = One { value :: Int } | Two { value :: Bool }
Wrapper Types (Redux)
As we meet more of the type classes that are central to Haskell's design, we
will often create wrapper types (i.e. with one data constructor). Hence, we will
use newtype
. Furthermore, we will often write expressions of the form
Box . doSomething . unbox
to unwrap (unbox
), transform (doSomething
), and rewrap (Box
) values. Hence, we
will use records so that unwrapping functions are generated automatically.
newtype Box a = Box { unbox :: a }
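With the generated accessor in hand, the unwrap-transform-rewrap pattern becomes a one-line composition. In this sketch, increment and mapBox are hypothetical helpers, and the deriving clause is added only so results can be compared.

```haskell
newtype Box a = Box { unbox :: a }
  deriving (Eq, Show)

-- Unwrap, transform, rewrap, written as a composition.
increment :: Box Int -> Box Int
increment = Box . (+ 1) . unbox

-- The same pattern, abstracted over the transformation.
mapBox :: (a -> b) -> Box a -> Box b
mapBox f = Box . f . unbox
```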