More Enum Types
2022-02-15
Yesterday I was talking with folks on Zulip about my last blog post, and specifically about the potential ergonomic improvements “anonymous enums” could provide. In that post we only really looked at anonymous enums as a potential ergonomics improvement for Stream::merge. But in this post I want to dig ever so slightly deeper and show how the Rust language lacks symmetry between structs and enums.
So I’m writing this post to dig into that some more. We’ll keep it brief, focusing mostly on one thing which stood out to me. The intent is less to make recommendations about the direction the language should take, and more to walk you through my observations. There are times when I spend months performing rigorous analysis of things, and there are times like these when I just want to share some thoughts. This post also serves as practice for me to edit less and publish faster.
So with all that out of the way, let’s dig in by looking at structs first.
Structs
Structs are Rust’s most common data container. They come in three flavors: empty (no fields), record (key-value), or tuple (indexed fields). All three work roughly the same, and for the sake of simplicity we’ll only be looking at tuple structs in this post. So let’s start by defining one!
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);
That’s right, that’s a tuple struct. It has two fields, both of which must be populated. And it has a bunch of derives, which allow us to perform common, useful operations on it. Now let’s create one and return it from a function.
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);
// named type
fn foo() -> MyStruct {
    MyStruct(100, 12)
}
If you’ve ever used a tuple struct, that should look pretty familiar. We created an instance of the type and returned it from the function. But you might notice that this is actually quite a bit of boilerplate. IDEs can help us with the derives, but they can’t save us from defining the type, laying out its fields, and naming it. What if we didn’t actually care about the name? Say, if this was just a helper function in our file? For cases where naming the returned type isn’t useful, we can use tuples:
// anonymous type
fn foo() -> (u32, u16) {
    (100, 12)
}
This does almost the exact same thing as the code we had before, but it’s now much shorter! Gone is the struct declaration. And gone are the derives. This works because tuples have generic impls on them which say: “if all of my members implement a trait, then so will I.” You can almost think of it as automatically deriving traits, while named structs need to declare which traits they want to derive.
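To make the “if all of my members implement a trait, then so will I” rule concrete, here is a small sketch. The helper requires_common_traits and the named/anonymous functions are hypothetical, made up for illustration: the helper demands the same traits that MyStruct has to derive by hand, and the plain tuple satisfies it without declaring anything.

```rust
use std::fmt::Debug;

// The named struct only gets these traits because we derive them explicitly.
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

// Hypothetical helper: only accepts types implementing all of these traits.
fn requires_common_traits<T: Debug + Copy + Eq + Ord>(value: T) -> T {
    value
}

// Accepted because of the `#[derive(...)]` on `MyStruct`.
fn named() -> MyStruct {
    requires_common_traits(MyStruct(100, 12))
}

// Accepted without any declaration: `u32` and `u16` implement these traits,
// and the standard library's generic tuple impls extend them to `(u32, u16)`.
fn anonymous() -> (u32, u16) {
    requires_common_traits((100, 12))
}
```

Formatting anonymous() with {:?} prints (100, 12), just as if the tuple had derived Debug, and both values compare field by field, left to right.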
Now finally, sometimes we don’t want to expose which type we return, often because we don’t want to provide any stability guarantees about it. We want to be able to say: “the type we return conforms to this interface, but that’s all I’m telling you.” For example, -> impl Future was pretty common to author before we had async/.await (in fact async {} generates an impl Future type, also known as an “anonymous future”). And we can write the same thing in our own code like so:
use std::fmt::Debug;

// type erased type
fn foo() -> impl Debug {
    MyStruct(100, 12)
}
All this tells us is: “the type we’re returning here implements Debug, and that’s it”.
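A short sketch of what a caller can and cannot do with the impl Debug return value. The describe function is a hypothetical caller added for illustration.

```rust
use std::fmt::Debug;

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

// The concrete type is hidden: callers only know it implements `Debug`.
fn foo() -> impl Debug {
    MyStruct(100, 12)
}

// Hypothetical caller: formatting is the only operation the signature allows.
fn describe() -> String {
    let value = foo();
    // `value.0` or `value == MyStruct(100, 12)` would not compile here,
    // because neither field access nor `PartialEq` is part of `impl Debug`.
    format!("{:?}", value)
}
```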
Enums
Now let’s take a look at enums. Much like structs, enums can carry values as well (empty, record, tuple), but instead of being AND (contain multiple values), they give us XOR (contain an exclusive, single value)[1]. Not sure how much sense this explainer makes, but please pretend it does. In an enum, only one variant is active at any given time.

[1] Yes, “record enums” give us XOR and then AND. But those are almost like a different kind of shorthand for having named record fields nested in enums. If we have to go into those we’ll never finish this post. So let’s not think about them too much for now.

So let’s define an enum where we either have a u32 or a u16:
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}
The way we return an enum from a function is by constructing a variant. This is often done based on a condition. So let’s pretend we have a function which is given a condition, and computes which variant to return:
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}
// named type
fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}
This will return either the number 100u32 (as MyEnum::Left) or the number 12u16 (as MyEnum::Right). That makes sense, right?
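On the consuming side, a caller matches on the variants to get the value back out. A minimal sketch, where the widen helper is hypothetical and simply converts whichever number is active into a u64:

```rust
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}

fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}

// Hypothetical consumer: exactly one variant is active, so the match must
// handle both, and each arm sees that variant's payload type.
fn widen(value: MyEnum) -> u64 {
    match value {
        MyEnum::Left(n) => u64::from(n),  // n: u32
        MyEnum::Right(n) => u64::from(n), // n: u16
    }
}
```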
Anonymous enums
Now what if we wanted to simplify this, say because we’re writing a quick helper? The boilerplate of defining an enum and its derives might be distracting, and much like with a tuple we just want to send some data along. We might imagine wanting to write this:
// anonymous type (currently unsupported)
fn foo(cond: bool) -> u32 | u16 { // made up syntax for u32 OR u16
    if cond { 100 } else { 12 }
}
However, Rust doesn’t support this. Unlike “anonymous structs” (tuples), there is no such thing as an “anonymous enum”. There’s an open issue about it, but I don’t know anyone who’s working on it. And I don’t know of any crates which provide implementations of this either.
Something which makes it difficult to prototype “anonymous enums” is that the syntax comes in pairs: not only do we need to figure out how to define them in function return positions, we need to think about how to use them as well. The version I’ve seen which I like the most used type ascription when matching, but that’s hard to implement, and has a bunch of other lang implications as well:
match foo(cond) {
    n: u32 => {},
    n: u16 => {},
}
Though if someone has ideas on how to prototype this, it would be fun to see what people can come up with.
Probably the closest we can get to “anonymous enums” is by using the either crate from the ecosystem. I believe once upon a time this crate lived in the stdlib (pre 1.0). It provides us with a Result-like type which holds two generic params: one for the left-hand variant, and one for the right-hand variant. Much like “anonymous enums”, it lets us skip defining our own enum by providing a pre-made one.
use either::Either;
// named type
fn foo(cond: bool) -> Either<u32, u16> {
    if cond { Either::Left(100) } else { Either::Right(12) }
}
In the wild, types such as futures::future::Select yield Either variants when they need to return one of two types. This works alright when selecting between two different futures, but when selecting between more than two, you’ll end up with nested Either<Either<First, Second>, Third>, etc. constructions which are not fun to work with.
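To see why the nesting gets unpleasant, here is a sketch of a value that can be one of three types, using two levels of Either. To stay free of dependencies it defines a local enum with the same Left/Right variants as either::Either; that stand-in, and the pick and describe functions, are assumptions for illustration only.

```rust
// Local stand-in with the same two variants as `either::Either`,
// so this sketch needs no external crate.
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

// Three possible types means nesting: Either<Either<u8, &str>, u16>.
fn pick(which: u8) -> Either<Either<u8, &'static str>, u16> {
    match which {
        0 => Either::Left(Either::Left(1)),
        1 => Either::Left(Either::Right("hello")),
        _ => Either::Right(3),
    }
}

// Consumers have to unwrap the nesting layer by layer.
fn describe(value: Either<Either<u8, &'static str>, u16>) -> String {
    match value {
        Either::Left(Either::Left(n)) => format!("u8: {}", n),
        Either::Left(Either::Right(s)) => format!("str: {}", s),
        Either::Right(n) => format!("u16: {}", n),
    }
}
```

Every additional type adds another layer of Left/Right wrapping, both when constructing values and when matching on them.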
In my opinion it would be better if Future::select / Future::race returned a single type T, allowed race to operate over N futures concurrently, and made enum construction an explicit step. That way, if Rust ever gets anonymous enums, the compiler could infer that we’re passing different types, which can be unified using an anonymous enum.
let a = future::ready(1u8);
let b = future::ready("hello");
let c = future::ready(3u16);
// The compiler could infer the Future's return type T
// should be an anonymous enum, which we can subsequently
// match over
match (a, b, c).race().await {
    n: u8 => {},
    n: u16 => {},
    n: &str => {},
}
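Without anonymous enums, the explicit step today is to wrap each output in a shared enum yourself. The sketch below assumes a hypothetical Value enum and a deliberately simplified race_once function that polls each future a single time with a no-op waker. It is not the async-std or futures-concurrency implementation (a real race keeps polling and registers wakers); it only shows how mapping every output into one type T lets a race over N futures return that single type.

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical hand-written enum: the explicit step that unifies the outputs.
#[derive(Debug, PartialEq)]
enum Value {
    U8(u8),
    Str(&'static str),
    U16(u16),
}

// A waker that does nothing; enough for futures that are ready immediately.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Simplified race: polls each future once, in order, and returns the first
// ready output. All futures share one output type `T`.
fn race_once<T>(futures: Vec<Pin<Box<dyn Future<Output = T>>>>) -> Option<T> {
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    for mut fut in futures {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return Some(value);
        }
    }
    None
}

// Each future's output is wrapped in `Value` before racing.
fn three_futures() -> Vec<Pin<Box<dyn Future<Output = Value>>>> {
    let mut futures: Vec<Pin<Box<dyn Future<Output = Value>>>> = Vec::new();
    futures.push(Box::pin(async { Value::U8(ready(1u8).await) }));
    futures.push(Box::pin(async { Value::Str(ready("hello").await) }));
    futures.push(Box::pin(async { Value::U16(ready(3u16).await) }));
    futures
}
```

Because the enum is built before racing, the race itself never needs to know about Either; the caller matches on Value afterwards.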
In async-std the Future::race method already returns a single type T instead of an Either type. And for futures-concurrency we’re likely going to do the same[2], but instead of providing a race method on Future, provide a Race trait directly on container types (though we’re likely going to rename the traits before we merge them; the current contender is a first/First method/trait).
The plan is to merge futures-concurrency back into async-std once we figure out all the parts of it. But that’s been taking a minute, and will likely take a bit longer. But it’s important we get this right, sooooo. Hence the last 3 years of work.
Type-erased enums
Now, taking things one level of abstraction further: what if we have N values, but we only care about the shape of the values returned? We might imagine writing something like this:
// type erased type (currently unsupported)
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}
Both of these implement Debug, and the syntax is valid this time around. So this should work, right? Unfortunately not. We get the following error if we try this:
error[E0308]: `if` and `else` have incompatible types
 --> src/lib.rs:4:31
  |
4 |     if cond { 100u32 } else { 12u16 }
  |               ------          ^^^^^ expected `u32`, found `u16`
  |               |
  |               expected because of this
  |
help: you could change the return type to be a boxed trait object
  |
3 | fn foo(cond: bool) -> Box<dyn Debug> {
  |                       ~~~~~~~      +
help: if you change the return type to expect trait objects, box the returned expressions
  |
4 |     if cond { Box::new(100u32) } else { Box::new(12u16) }
  |               +++++++++      +          +++++++++     +
Translating what the compiler is trying to tell us here: “You cannot use impl Debug here, please use Box<dyn Debug> instead”. But taking the compiler up on its suggestion forces us to allocate the value on the heap instead of the stack, and to go through a vtable to find the right method to dispatch to. But as we saw earlier, we could also just define an enum to get the same effect. The suggestion made by the compiler here is sensible, but not optimal.
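For comparison, here is the compiler’s suggested fix applied to the example. It compiles, but each call heap-allocates the number, and formatting it goes through dynamic dispatch. The name foo_boxed is only for illustration.

```rust
use std::fmt::Debug;

// The compiler's suggestion: both branches become `Box<dyn Debug>`,
// so their concrete types may differ. Each call allocates on the heap.
fn foo_boxed(cond: bool) -> Box<dyn Debug> {
    if cond { Box::new(100u32) } else { Box::new(12u16) }
}
```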
Though the compiler is not at fault, there’s definitely something missing here.
And I suspect if type-erased enums were supported by the language, the compiler
would never have to suggest how to work around this in the first place, thus
making the issue go away entirely.
Fortunately “type-erased enums” do have a crate in the ecosystem which shows us what using them would feel like: auto_enums. By adding a single annotation, our code suddenly works:
use auto_enums::auto_enum;
use std::fmt::Debug;

#[auto_enum(Debug)]
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}
auto_enums uses a proc macro to generate an enum with one variant per branch, and implements the listed traits on it. We have to tell it which traits the output type should implement, but other than that it reads fairly naturally.
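To get a feel for what such a generated enum involves, here is a hand-written sketch of roughly the same idea: one variant per returned type, plus a Debug impl that forwards to whichever variant is active. This is an approximation for illustration, not the exact code auto_enums emits, and the U32OrU16 name is hypothetical.

```rust
use std::fmt;

// Hypothetical enum standing in for the generated one: one variant per type.
enum U32OrU16 {
    A(u32),
    B(u16),
}

// Forward `Debug` to the active variant, so the wrapper is invisible when printed.
impl fmt::Debug for U32OrU16 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            U32OrU16::A(n) => fmt::Debug::fmt(n, f),
            U32OrU16::B(n) => fmt::Debug::fmt(n, f),
        }
    }
}

// Same signature as before; the branches are wrapped in variants, no allocation.
fn foo(cond: bool) -> impl fmt::Debug {
    if cond { U32OrU16::A(100) } else { U32OrU16::B(12) }
}
```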
Looking at the code generated by the auto_enums crate, I suspect that this should be fairly straightforward to implement in the compiler if we can get lang approval for it, which at first glance doesn’t seem like a big language feature either. But I don’t know enough about this side of the language to appropriately judge what gotchas might apply (const, FFI, etc. all need to be considered), so I don’t know how difficult this might be to see through to completion in practice. I’d be interested to hear from any lang/compiler folks whether my intuition here is right.
Comparing enums and structs
Now that we’ve taken a look at both structs and enums, we can capture their capabilities in a table:
| | Structs | Enums | Enums Fallback |
|---|---|---|---|
| Named | struct Foo(.., ..) | enum Foo { .., .. } | - |
| Anonymous | (.., ..) | ❌ | either crate |
| Type-Erased | impl Trait | ❌ | auto_enums crate |
As you can see, it’s much easier to quickly create a struct than it is to create an enum. Fallbacks for quick enum creation exist in the ecosystem, but even though structs and enums can both hold rich data, the bar to using structs in Rust is lower than for enums. I would argue significantly so.
Wrapping up
We’ve looked at the differences between struct and enum definitions, and at how the lack of anonymous and type-erased enums is worked around today using crates from the ecosystem.
My current thinking is that fixing this asymmetry would eventually be beneficial for Rust. It’s a convenient piece of polish which could make daily driving Rust nicer. But it’s nothing fundamental that needs to be addressed straight away, so I don’t think it needs to be prioritized.
Anyway, I got thinking about this while chatting with folks earlier, and figured I’d put my thoughts into words, in part to share the idea, and in part to learn how to publish things more quickly. If you liked this post, and would like to see what I make for dinner, follow me on Twitter.
Appendix A: Examples
Structs
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);
// named type
fn foo() -> MyStruct {
    MyStruct(100, 12)
}
// anonymous type
fn foo() -> (u32, u16) {
    (100, 12)
}
// type erased type
fn foo() -> impl Debug {
    MyStruct(100, 12)
}
Enums
enum MyEnum {
    Left(u32),
    Right(u16),
}
// named type
fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}
// anonymous type (currently unsupported)
fn foo(cond: bool) -> u32 | u16 {
    if cond { 100u32 } else { 12u16 }
}
// type erased type (currently unsupported)
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}
Appendix B: Tables
| | Structs | Enums | Enums Fallback |
|---|---|---|---|
| Named | struct Foo(.., ..) | enum Foo { .., .. } | - |
| Anonymous | (.., ..) | ❌ | either crate |
| Type-Erased | impl Trait | ❌ | auto_enums crate |