4 Tagged Unions
In addition to struct, enum, and union, Cyclone
has tunion (for ``tagged union'') and xtunion (for
``extensible tagged union'') as ways to construct new aggregate types.
Like a union type, each tunion and xtunion has a
number of variants (or members). Unlike with union,
an object of a tunion or xtunion type is exactly one
variant, we can detect (or discriminate) that variant at run-time, and
the language prevents using an object as though it had a different
variant.
The difference between tunion and xtunion is that
tunion is closed---a definition lists all possible variants.
It is like the algebraic datatypes in ML. With xtunion,
separately compiled files can add variants, so no code can be sure
that it knows all the variants. There is a rough analogy with not
knowing all the subclasses of a class in an object-oriented language.
For sake of specificity, we first explain how to
create and use tunion types. We then
explain xtunion by way of contrast with
tunion. Because the only way to read parts of tunion
and xtunion types is pattern-matching, it is hard to understand
tunion without pattern-matching, but for sake of motivation and
completeness, some of the examples in the explanation of
pattern-matching use tunion! To resolve this circular
dependency, we will informally explain pattern-matching as we use it
here and we stick to its simplest uses.
4.1 tunion
Basic Type Declarations and Subtyping
[Warning: For expository purposes, this section contains a
white lie that is exposed in the later section called ``regions for
tunion''.]
A tunion type declaration lists all of its variants. At its
simplest, it looks just like an enum declaration. For example,
we could say:
tunion Color { Red, Green, Blue };
As with enum, the declaration creates a type (called
tunion Color) and three constants Red, Green, and
Blue. Unlike enum, these constants do not have type
tunion Color. Instead, each variant has its own type,
namely tunion Color.Red, tunion Color.Green, and
tunion Color.Blue. Fortunately these are all subtypes of
tunion Color and no explicit cast is necessary. So you can
write, as expected:
tunion Color c = Red;
In this simple example, we are splitting hairs, but we will soon find
all these distinctions useful. Unlike enum, tunion
variants may carry any fixed number of values, as in this example:
tunion Shape {
Point,
Circle(float),
Ellipse(float,float),
Polygon(int,float),
};
A Point has no accompanying information, a Circle has a
radius, an Ellipse has two axis lengths, and a (regular)
Polygon has a number of sides and a radius. (The value fields
do not have names, so it is often better style to have a variant carry
one value of a struct type, which of course has named members.) This
example creates five types: tunion Shape,
tunion Shape.Point, tunion Shape.Circle,
tunion Shape.Ellipse, and tunion Shape.Polygon. Like in
our previous example, tunion Shape.Point is a subtype of
tunion Shape and Point is a constant of
type tunion Shape.Point.
Variants that carry one or more values are treated differently.
Circle becomes a constructor; given a float it produces an
object of type tunion Shape.Circle, for example Circle(3.0).
Similarly, Ellipse(0,0) has type tunion Shape.Ellipse
(thanks to implicit casts from int to float for 0) and
Polygon(7,4.0) has type tunion Shape.Polygon. The
arguments to a constructor can be arbitrary expressions of the correct
type, for example, Ellipse(rand(), sqrt(rand())).
The second difference is that value-carrying variant types (e.g.,
tunion Shape.Circle) are not subtypes of the tunion type
(e.g., tunion Shape). Rather non-null pointers to the
value-carrying variant types are (e.g., tunion Shape.Circle @`H
is a subtype of tunion Shape). So the following are correct
initializations that use implicit subtyping:
tunion Shape s1 = Point;
tunion Shape s2 = new Circle(3.0);
tunion types are particularly useful for building recursive
structures. For example, a small language of arithmetic expressions
might look like this:
enum Unops { Negate, Invert};
enum Binops { Add, Subtract, Multiply, Divide };
tunion Exp {
Int(int),
Float(float),
Unop(enum Unops, tunion Exp),
Binop(enum Binops, tunion Exp, tunion Exp)
};
A function returning an expression representing the multiplication of
its parameter by two could like this:
tunion Exp double_exp(tunion Exp e) {
return new Binop(Multiply, e, new Int(2));
}
Accessing tunion Variants
Given a value of a tunion
type, such as tunion Shape, we do not know which variant it is.
For non-value variants, we can use a standard comparison. Continuing
the example from above, ``s1 == Point'' would be true whereas
``s2 == Point'' would be false.
Analogous comparisons would not work for value-carrying variants
because these variants are pointers. Rather than provide predicates
(perhaps of the form isCircle(s1)), Cyclone requires
pattern-matching. For example, here is how you could define
isCircle:
bool isCircle(tunion Shape s) {
switch(s) {
case &Circle(r): return true;
default: return false;
}
}
When a switch statement's argument has a tunion type,
the cases describe variants. One variant of tunion Shape is a
pointer to a Circle, which carries one value. The
corresponding pattern has & for the pointer, Circle for
the constructor name, and one identifier for each value carried by
Circle. The identifiers are binding occurrences (declarations,
if you will), and the initial values are the values of the fields of
the Circle at which s points. The scope is the extent
of the case clause. Pattern-matching works for non-value variants
too, but there is no & because they are not pointers.
Here is another example:
[The reader is asked to indulge compiler-writers who have
forgotten basic geometry.]
extern area_of_ellipse(float,float);
extern area_of_poly(int,float);
float area(tunion Shape s) {
float ans;
switch(s) {
case Point:
ans = 0;
break;
case &Circle(r):
ans = 3.14*r*r;
break;
case &Ellipse(r1,r2):
ans = area_of_ellipse(r1,r2);
break;
case &Polygon(sides,r):
ans = area_of_poly(sides,r);
break;
}
return ans;
}
The cases are compared in order against s. The following are
compile-time errors:
-
It is possible that a member of the tunion type matches
none of the cases. Note that default matches everything.
- A case is useless because it could only match if one of the
earlier cases match. For example, a default case at the end of the
switch in area would be an error.
We emphasize that Cyclone has much richer pattern-matching support
than we have used here.
Implementation
Non-value variants are translated to distinct small integers. Because
they are small, they cannot be confused with pointers to
value-carrying variants. Value-carrying variants have a distinct
integer tag field followed by fields for the values carried. Hence
all values of a tunion type occupy one word, either with a small
number or with a pointer.
Regions for tunion
We have seen that non-null pointers to
value-carrying variants are subtypes of the tunion type. For
example, tunion Shape.Circle @`H is a subtype of
tunion Shape. Because tunion Shape.Circle @`H is a
pointer into the heap, it would seem that all values of type
tunion Shape are either non-value variants or pointers into
the heap. In fact, this is true, but only because tunion
Shape is itself shorthand for
tunion `H Shape.
In other words, tunion types are region-polymorphic over the region
into which the value-carrying variants point. An explicit region
annotation goes after tunion, just like an explicit region annotation
goes after * or @. Here is an example using a stack
region:
tunion Shape.Circle c = Circle(3.0);
tunion _ Shape s = &c;
The _ is necessary because we did not give an explicit name to
the stack region.
We can now correct the white lie from the ``basic type declarations and
subtyping'' section. A declaration tunion Foo {...}
creates a
type constructor which given a region creates a type. For any region
`r, tunion `r Foo is a subtype of
tunion Foo.Bar @`r if tunion Foo.Bar carries values. If
tunion Foo.Bar does not carry values, then it is a subtype of
tunion `r Foo for all `r.
In the future, we may make the implied region for tunion Foo
depend on context, as we do with pointer types. For now,
tunion Foo is always shorthand tunion `H Foo.
Polymorphism and tunion
A tunion declaration may be
polymorphic over types and regions just like a struct
definition (see the section on
polymorphism). For example, here is a
declaration for binary trees where the leaves can hold some
BoxKind `a:
tunion <`a> Tree {
Leaf(`a);
Node(tunion Tree<`a>, tunion Tree<`a>);
};
In the above example, the root may be in any region, but all children
will be in the heap. This version allows the children to be in any
region, but they must all be in the same region. (The root can still
be in a different region.)
tunion <`a,`r::R> Tree {
Leaf(`a);
Node(tunion `r Tree<`a,`r>, tunion `r Tree<`a,`r>);
};
Existential Types
[This feature is independent of
the rest of tunion's features and can be safely ignored when first
learning Cyclone.]
In addition to polymorphic tunion types, it is also possible to
parameterize individual variants by additional type variables. (From
a type-theoretic point of view, these are existentially-quantified
variables.) Here is a useless example:
tunion T { Foo<`a>(`a, `a, int), Bar<`a,`b>(`a, `b), Baz(int) };
The constructors for variants with existential types are used the same
way, for example Foo("hi","mom",3), Foo(8,9,3), and
Bar("hello",17) are all well-typed. The compiler checks that
the type variables are used consistently---in our example, the first
two arguments to Foo must have the same type. There is no need
(and currently no way) to explicitly specify the types being used.
Once a value of an existential variant is created, there is no way to
determine the types at which it was used. For example,
Foo("hi","mom",3) and Foo(8,9,3) both have type, ``there
exists some `a such that the type is Foo<`a>''. When
pattern-matching an existential variant, you must give an explicit
name to the type variables; the name can be different from the name in
the type definition. Continuing our useless example, we can write:
void f(tunion T t) {
switch(t) {
case Foo<`a>(x,y,z): return;
case Bar<`b,`c>(x,y): return;
case Baz(x): return;
}
}
The scope of the type variables is the body of the case clause. So in
the first clause we could create a local variable of type `a
and assign x or y to it. Our example is fairly
``useless'' because there is no way for code to use the values of
existentially quantified types. In other words, given
Foo("hi","mom",3), no code will ever be able to use the strings
"hi" or "mom". Useful examples invariably use function
pointers. For a realistic library, see fn.cyc in the distribution.
Here is a smaller (and sillier) example; see the section on region and
effects for an explanation of why the `e stuff is necessary.
int f1(int x, int y) { return x+y; }
int f2(string x, int y) {printf("%s",x); return y; }
tunion T<`e::E> { Foo<`a>(`a, int f(`a, int; `e)); };
void g(bool b) {
tunion T<{}> t;
if(b)
t = Foo(37,f1);
else
t = Foo("hi",f2);
switch(t) {
case Foo<`a>(arg,fun):
`a x = arg;
int (*f)(`a,int;{}) = fun;
f(arg,19);
break;
}
}
The case clause could have just been fun(arg)---the compiler
would figure out all the types for us. Similarly, all of the explicit
types above are for sake of explanation; in practice, we tend to rely
heavily on type inference when using these advanced typing constructs.
Future
-
Currently, given a value of a variant type (e.g.,
tunion Shape.Circle), the only way to access the fields is
with pattern-matching even though the variant is known. We may
provide a tuple-like syntax in the future.
- If a tunion has only one value-carrying variant, it does
not need a tag field in its implementation. We have not yet
implemented this straightforward optimization.
4.2 xtunion
We now explain how an xtunion type differs from a tunion
type. The main difference is that later declarations may continue to
add variants. Extensible datatypes are useful for allowing clients to
extend data structures in unforeseen ways. For example:
xtunion Food;
xtunion Food { Banana; Grape; Pizza(list_t<xtunion Food>) };
xtunion Food { Candy; Broccoli };
After these declarations, Pizza(new List(Broccoli, null)) is a
well-typed expression.
If multiple declarations include the same variants, the variants must
have the same declaration (the number of values, types for the values,
and the same existential type variables).
Because different files may add different variants and Cyclone
compiles files separately, no code can know (for sure) all the
variants of an xtunion. Hence all pattern-matches against a
value of an xtunion type must end with a case that matches
everything, typically default.
There is one built-in xtunion type: xtunion exn is the
type of exceptions. Therefore, you declare new xtunion exn
types like this:
xtunion exn {BadFilename(string)};
The implementation of xtunion types is very similar to that of
tunion types, but non-value variants cannot be represented as
small integers because of separate compilation. Instead, these
variants are represented as pointers to unique locations in static
data. Creating a non-value variant still does not cause allocation.