Yosh is writing

More Enum Types
—2022-02-15

Yesterday I was talking with folks on Zulip about my last blog post, and specifically about the potential ergonomic improvements “anonymous enums” could provide. In that post we only really looked at anonymous enums as a potential ergonomics improvement for Stream::merge. In this post I want to dig ever so slightly deeper, and show how the Rust language lacks symmetry between structs and enums.

So I’m writing this post to dig into that some more. We’ll be keeping it brief, focusing mostly on showing something which stood out to me. The intent is less to make recommendations about the direction the language should take, and more to walk you through my observations. There are times when I spend months performing rigorous analysis of things, and there are times like these when I just want to share some thoughts. This post also serves as practice for me to edit less and publish faster.

So with all that out of the way, let’s dig in by looking at structs first.

Structs

Structs are Rust’s most common data container. They come in three flavors: empty (no fields), record (key-value), or tuple (indexed fields). All three work roughly the same, and for the sake of simplicity we’ll only be looking at tuple structs in this post. So let’s start by defining one!

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

That’s right, that’s a tuple struct. It has two fields, both of which must be populated. And it has a bunch of derives, which allow us to perform common, useful operations on it. Now let’s create one and return it from a function.

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

// named type
fn foo() -> MyStruct {
    MyStruct(100, 12)
}

If you’ve ever used a tuple struct, that should look pretty familiar, right? We created an instance of the type and returned it from the function. But you might notice that this is actually quite a bit of boilerplate. IDEs can help us with the derives, but they can’t save us from defining the type, specifying its layout, and naming it. What if we didn’t actually care about the name? Say, if this was just a helper function in our file? For cases where naming the returned type isn’t useful, we can use tuples:

// anonymous type
fn foo() -> (u32, u16) {
    (100, 12)
}

This does almost exactly the same thing as the code we had before, but it’s now much shorter! Gone is the struct declaration. And gone are the derives. This works because the standard library has generic impls for tuples which say: “if all of my members implement a trait, then so do I.” You can almost think of it as tuples automatically deriving traits, while named structs need to declare which traits they want to derive.
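To make the “tuples derive for free” point concrete, here is a small sketch that puts the two side by side. The helper names `named`, `anonymous`, and `compare_both` are made up for illustration; the point is that both return values support the same comparisons and formatting, but only the struct had to ask for them.

```rust
use std::cmp::Ordering;

// A named tuple struct has to opt into each trait explicitly via derives.
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

fn named() -> MyStruct {
    MyStruct(100, 12)
}

// A plain tuple needs no declaration: std implements Debug, Clone, Copy,
// PartialEq, Eq, PartialOrd, Ord, Hash, and Default for tuples (of up to 12
// elements) whenever every element implements the trait.
fn anonymous() -> (u32, u16) {
    (100, 12)
}

// Both compare field by field, left to right: the derived impls on MyStruct
// behave the same way as the built-in impls on the tuple.
fn compare_both() -> (Ordering, Ordering) {
    let named_order = named().cmp(&MyStruct(100, 13));
    let anon_order = anonymous().cmp(&(100, 13));
    (named_order, anon_order)
}
```

Both comparisons return `Ordering::Less`, since the first fields are equal and `12 < 13`.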

Finally, sometimes we don’t want to expose which type we return, often because we don’t want to provide any stability guarantees about it. We want to be able to say: “the type we return conforms to this interface, but that’s all I’m telling you.” For example, returning -> impl Future was pretty common before we had async/.await (in fact, async {} generates an impl Future type, also known as an “anonymous future”). We can write the same kind of signature in our own code like so:

use std::fmt::Debug;

// type erased type
fn foo() -> impl Debug {
    MyStruct(100, 12)
}

All this tells us is: “the type we’re returning here implements Debug, and that’s it”.
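Here is a minimal sketch of what that means for callers. The `describe` helper is hypothetical; it shows that formatting works because `Debug` is part of the signature, while everything else about `MyStruct` stays hidden.

```rust
use std::fmt::Debug;

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct MyStruct(u32, u16);

// The signature only promises "some type that implements Debug".
fn foo() -> impl Debug {
    MyStruct(100, 12)
}

fn describe() -> String {
    // Debug is part of the contract, so formatting the value works. Something
    // like `foo().0` or `foo() == MyStruct(100, 12)` would not compile, because
    // the concrete type MyStruct is hidden behind `impl Debug`.
    format!("{:?}", foo())
}
```

Even though `MyStruct` derives `PartialEq`, callers of `foo` can't use it: the opaque return type only exposes the traits named in the signature.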

Enums

Now let’s take a look at enums. Much like structs, enums can carry values as well (empty, record, tuple), but instead of being AND (contain multiple values), they give us XOR (contain an exclusive, single value)[1]. Not sure how much sense this explainer makes, but please pretend it does. In an enum, only one variant is active at any given time. So let’s define an enum which holds either a u32 or a u16:

[1] Yes, “record enums” give us XOR and then AND. But those are almost like a different kind of shorthand for having named record fields nested in enums. If we have to go into those we’ll never finish this post. So let’s not think about them too much for now.

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}

The way we return an enum from a function is by constructing one of its variants. This is often done based on a condition. So let’s pretend we have a function which is given a condition, and computes which variant to return:

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}

// named type
fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}

This will return either the number 100u32 or the number 12u16. That makes sense, right?
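Returning the enum is only half of the story; callers consume it with `match`, which forces them to handle every variant and tells them exactly which type they hold in each arm. A small sketch, where the `describe` helper is made up for illustration:

```rust
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum MyEnum {
    Left(u32),
    Right(u16),
}

fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}

// Each arm binds `n` with the payload type of that variant.
fn describe(value: MyEnum) -> String {
    match value {
        MyEnum::Left(n) => format!("u32: {}", n),
        MyEnum::Right(n) => format!("u16: {}", n),
    }
}
```

Because the derived `PartialOrd` compares variants in declaration order first, `foo(true)` (a `Left`) also sorts before `foo(false)` (a `Right`).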

Anonymous enums

Now, what if we wanted to simplify this, say because we were writing a quick helper? The boilerplate of defining an enum and its derives might be distracting; much like with a tuple, we just want to send some data along. We might imagine wanting to write this:

// anonymous type (currently unsupported)
fn foo(cond: bool) -> u32 | u16 { // made up syntax for u32 OR u16
    if cond { 100 } else { 12 }
}

However, Rust doesn’t support this. Unlike “anonymous structs” (tuples), there is no such thing as an “anonymous enum”. There’s an open issue about it, but I don’t know anyone who’s working on it. And I don’t know of any crates which provide implementations of this either.

Something which makes it difficult to prototype “anonymous enums” is that the syntax comes in pairs: not only do we need to figure out how to define them in function return position, we need to think about how to use them as well. The version I’ve seen that I like the most uses type ascription when matching, but that’s hard to implement, and has a bunch of other lang implications as well:

match foo(cond) {
    n: u32 => {},
    n: u16 => {}
}

Though if someone has ideas on how to prototype this, it would be fun to see what people come up with.

Probably the closest we can get to “anonymous enums” today is the either crate from the ecosystem. I believe that once upon a time this type lived in the stdlib (pre 1.0). It provides a Result-like type with two generic params: one for the left-hand variant, and one for the right-hand variant. Much like “anonymous enums”, it allows us to skip defining our own enum by providing a pre-made one.

use either::Either;

// named type
fn foo(cond: bool) -> Either<u32, u16> {
    if cond { Either::Left(100) } else { Either::Right(12) }
}
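Part of why a pre-made enum works so well is that derives on a generic enum apply whenever both type parameters implement the trait, much like the blanket impls on tuples. The sketch below hand-rolls a minimal local stand-in to show this; it is our own `Either`, not the crate’s, which offers far more (methods such as `map_left` and `is_left`, iterator impls, and so on). The `widen` helper is hypothetical.

```rust
// A minimal local stand-in for `either::Either`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

// `Either<u32, u16>` gets Debug, PartialEq, etc. because u32 and u16 have
// them; no per-use declaration is needed, just like with a tuple.
fn foo(cond: bool) -> Either<u32, u16> {
    if cond { Either::Left(100) } else { Either::Right(12) }
}

// Consuming it still needs a match, with one arm per side.
fn widen(value: Either<u32, u16>) -> u32 {
    match value {
        Either::Left(n) => n,
        Either::Right(n) => u32::from(n),
    }
}
```

The trade-off is naming: `Left` and `Right` carry no meaning, and nesting them to cover more than two types gets unwieldy quickly.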

In the wild, types such as futures::future::Select yield Either variants when they need to return one of two types. This works alright when selecting over two different futures, but when selecting over more than two, you’ll end up with nested Either<Either<First, Second>, Third>, etc. constructions which are not fun to work with.

In my opinion, it would be better if Future::select / Future::race returned a single type T, allowed racing N futures concurrently, and made enum construction an explicit step. That way, if Rust ever gets anonymous enums, the compiler could infer that we’re passing different types, which can be unified using an anonymous enum.

let a = future::ready(1u8);
let b = future::ready("hello");
let c = future::ready(3u16);

// The compiler could infer the Future's return type T
// should be an anonymous enum, which we can subsequently
// match over
match (a, b, c).race().await {
    n: u8 => {},
    n: u16 => {},
    n: &str => {},
}
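Without anonymous enums, “explicit enum construction” means the caller defines the enum and maps each future’s output into it before racing. The sketch below shows that shape using only the standard library. The `race` function, `BoxFuture` alias, and `Value` enum are hypothetical: this is not the async-std or futures-concurrency API, and the busy-polling loop with a no-op waker is a simplification that only works because every future here is ready immediately.

```rust
use std::future::{self, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A boxed, type-erased future whose output is `T`.
type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

// Hypothetical enum the caller writes by hand to unify the three output types.
// This is the step anonymous enums could one day let the compiler infer.
#[derive(Debug, PartialEq)]
enum Value {
    Byte(u8),
    Text(&'static str),
    Short(u16),
}

// A waker that does nothing. That's enough for this demo because every future
// here is ready immediately; real executors need wakers that reschedule work.
fn noop_waker() -> Waker {
    unsafe fn clone_waker(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_waker, noop, noop, noop);
    // Safety: none of the vtable functions touch the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Hypothetical `race`: every future has the same output type `T`, and the
// first one found ready wins. It busy-polls, so it is only a sketch.
fn race<T>(mut futures: Vec<BoxFuture<T>>) -> T {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        for fut in futures.iter_mut() {
            if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
                return value;
            }
        }
    }
}

fn race_demo() -> Value {
    // Explicit enum construction: each future maps its output into `Value`,
    // so `race` only ever sees a single output type.
    let mut futures: Vec<BoxFuture<Value>> = Vec::new();
    futures.push(Box::pin(async { Value::Byte(future::ready(1u8).await) }));
    futures.push(Box::pin(async { Value::Text(future::ready("hello").await) }));
    futures.push(Box::pin(async { Value::Short(future::ready(3u16).await) }));
    race(futures)
}
```

Since all three futures are ready on their first poll, the first one in the list wins and `race_demo()` returns `Value::Byte(1)`. With anonymous enums, the `Value` declaration and the three wrapping calls are exactly the parts that could disappear.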

In async-std, the Future::race method already returns a single type T instead of an Either type. And for futures-concurrency we’re likely going to do the same[2], but instead of providing a race method on Future, we’ll provide a Race trait implemented directly on container types (though we’re likely going to rename the traits before we merge them; the current contender is a first/First method/trait).

[2] The plan is to merge futures-concurrency back into async-std once we figure out all the parts of it. But that’s been taking a minute, and will likely take a bit longer. It’s important we get this right, sooooo. Hence the last 3 years of work.

Type-erased enums

Now, taking things one level of abstraction further: what if we have N values, but only care about the shape (the traits they implement) of the values returned? We might imagine writing something like this:

// type erased type (currently unsupported)
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}

Both u32 and u16 implement Debug, and this time the syntax is valid. So this should work, right? Unfortunately not. We get the following error if we try this:

error[E0308]: `if` and `else` have incompatible types
 --> src/lib.rs:4:31
  |
4 |     if cond { 100u32 } else { 12u16 }
  |               ------          ^^^^^ expected `u32`, found `u16`
  |               |
  |               expected because of this
  |
help: you could change the return type to be a boxed trait object
  |
3 | fn foo(cond: bool) -> Box<dyn Debug> {
  |                       ~~~~~~~      +
help: if you change the return type to expect trait objects, box the returned expressions
  |
4 |     if cond { Box::new(100u32) } else { Box::new(12u16) }
  |               +++++++++      +          +++++++++     +

Translating what the compiler is trying to tell us here: “You cannot use impl Debug here, please use Box<dyn Debug> instead.” That’s because a return-position impl Trait must resolve to a single concrete type, and our two branches produce two different types. But taking the compiler up on its suggestion forces us to allocate the value on the heap instead of the stack, and to go through a vtable to find the right method to dispatch to. As we saw earlier, we could instead just define an enum to get the same effect. So the compiler’s suggestion is sensible, but not optimal. The compiler isn’t at fault here, but there’s definitely something missing. And I suspect that if the language supported type-erased enums, the compiler would never need to suggest a workaround in the first place, making the issue go away entirely.
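For comparison, here is a sketch of both workarounds side by side: the boxed version the compiler suggests, and a hand-written enum hidden behind `impl Debug`. The enum name `U32OrU16` and the function names are made up for illustration; the manual `Debug` impl forwards to the inner value so the wrapper doesn’t show up in the output. This is the kind of boilerplate that type-erased enums would make unnecessary.

```rust
use std::fmt::{self, Debug};

// Option 1: the compiler's suggestion. Each branch is boxed on the heap, and
// formatting goes through the trait object's vtable.
fn boxed(cond: bool) -> Box<dyn Debug> {
    if cond { Box::new(100u32) } else { Box::new(12u16) }
}

// Option 2: an enum with one variant per branch. No heap allocation, and
// `impl Debug` hides the enum, so callers see the same interface.
enum U32OrU16 {
    A(u32),
    B(u16),
}

// Forward to the inner value (a derived Debug would print `A(100)` instead).
impl Debug for U32OrU16 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            U32OrU16::A(n) => Debug::fmt(n, f),
            U32OrU16::B(n) => Debug::fmt(n, f),
        }
    }
}

fn unboxed(cond: bool) -> impl Debug {
    if cond { U32OrU16::A(100) } else { U32OrU16::B(12) }
}
```

Both `boxed(true)` and `unboxed(true)` format as `100`; the difference is only in where the value lives and how the call is dispatched.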

Fortunately, “type-erased enums” do have a crate in the ecosystem which shows us what using them would feel like: auto_enums. By adding a single annotation, our code suddenly works:

use auto_enums::auto_enum;
use std::fmt::Debug;

#[auto_enum(Debug)]
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}

auto_enums uses a proc macro to generate an enum with one variant per branch, and wraps each branch’s value in its own variant. We have to tell it which traits the output type should implement, but other than that it reads fairly naturally.

Looking at the code generated by the auto_enums crate, I suspect this should be fairly straightforward to implement in the compiler if we can get lang approval for it. At first glance it doesn’t seem like a big language feature either. But I don’t know enough about this side of the language to judge which gotchas might apply (const, FFI, etc. all need to be considered), so I don’t know how difficult it would be in practice to see through to completion. I’d be interested to hear from any lang/compiler folks whether my intuition here is right.

Comparing enums and structs

Now that we’ve taken a look at both structs and enums, we can capture their capabilities in a table:

Kind | Structs | Enums | Enums Fallback
Named | struct Foo(.., ..) | enum Foo { .., .. } | -
Anonymous | (.., ..) | (unsupported) | either crate
Type-Erased | impl Trait | (unsupported) | auto_enums crate

As you can see, it’s much easier to quickly create a struct than it is to create an enum. Fallbacks for quick enum creation exist in the ecosystem, but even though structs and enums can both hold rich data, the bar to using structs in Rust is lower than for enums. I would argue significantly so.

Wrapping up

We’ve looked at the differences between struct and enum definitions, and at how the lack of anonymous and type-erased enums is worked around today using crates from the ecosystem.

My current thinking is that fixing this asymmetry would eventually be beneficial for Rust. It’s a convenient piece of polish which could make daily-driving Rust nicer. But it’s nothing fundamental that needs to be addressed straight away, so I don’t think it needs to be prioritized.

Anyway, I got to thinking about this while chatting with folks earlier, and figured I’d put my thoughts into words: in part to share the idea, and in part to learn how to publish things more quickly. If you liked this post, and would like to see what I make for dinner, follow me on Twitter.


Appendix A: Examples

Structs

struct MyStruct(u32, u16);

// named type
fn foo() -> MyStruct {
    MyStruct(100, 12)
}

// anonymous type
fn foo() -> (u32, u16) {
    (100, 12)
}

// type erased type
fn foo() -> impl Debug {
    MyStruct(100, 12)
}

Enums

enum MyEnum {
    Left(u32),
    Right(u16),
}

// named type
fn foo(cond: bool) -> MyEnum {
    if cond { MyEnum::Left(100) } else { MyEnum::Right(12) }
}

// anonymous type (currently unsupported)
fn foo(cond: bool) -> u32 | u16 {
    if cond { 100u32 } else { 12u16 }
}

// type erased type (currently unsupported)
fn foo(cond: bool) -> impl Debug {
    if cond { 100u32 } else { 12u16 }
}

Appendix B: Tables

Kind | Structs | Enums | Enums Fallback
Named | struct Foo(.., ..) | enum Foo { .., .. } | -
Anonymous | (.., ..) | (unsupported) | either crate
Type-Erased | impl Trait | (unsupported) | auto_enums crate