We should add ref bindings to Carbon, paralleling reference expressions #5261

Open
chandlerc opened this issue Apr 8, 2025 · 20 comments
Labels
leads question A question for the leads team

Comments

@chandlerc
Contributor

chandlerc commented Apr 8, 2025

Summary

Carbon already has reference expressions. We should add the ability to declare and match a binding to a reference expression.

The core idea would be:

fn F(ptr: i32*) {
  // A reference binding `x`.
  let ref x: i32 = *ptr;

  // Use of `x` is a reference expression that refers to the same object as `*ptr`.
  x += 1;
}

Key suggestion highlights:

  • The keyword ref is used for this kind of binding.
  • Remove addr, and use this for the object parameter.
  • Allow this for any parameter (or nested within let, etc.).
  • Reference binding names, when used later on, form reference expressions.
  • No ability to take the address of the reference itself: &x == ptr in the example above.
  • No ability to use this as a class or struct field.

Background

Reference bindings have come up multiple times.

They also closely match the expression category.

How addresses interact with ref

The suggested model is that ref bindings mirror reference expressions in that they refer back to some underlying object. As a consequence, it should be possible to take the address of a ref binding and get the address of that object.

However, we expect reference expressions and, as a consequence, ref bindings to work more like Swift inout than like a pointer: there may be implicit copies or moves that occur prior to forming the reference expression, or binding it to a name. The goal is that it should be possible for some types to implement ref parameters through move-in / move-out semantics.

When we have a ref binding specifically, we expect its address to be stable for the lifetime of the binding. And there is no valid move-in/move-out semantic model for overlapping bindings -- those must all reference the same underlying object, and the address of those must all match in addition to being stable. But for non-overlapping bindings such as parameters, a move-in/move-out model should be equally valid from the perspective of the ref binding, and the address within the function might be different from the address in the caller.

At least in cases where a type permits move-in/move-out, the address of a ref parameter should be implicitly nocapture, for example in LLVM's semantic model. Whether we go further and restrict ref to be LLVM-nocapture more broadly is an open question that can likely also be an area for future work.
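To make the two implementation strategies concrete, here is a rough analogue in C++ (used only because it is runnable today; it is not Carbon and not part of the suggestion). The function names `IncrementByAlias` and `IncrementByMoveInOut` are hypothetical. The sketch assumes a trivially movable type and shows that under move-in/move-out the callee observes a different address than the caller, while the caller sees the same final value either way.

```cpp
#include <utility>

// Strategy 1: aliasing. The callee operates directly on the caller's object,
// so the address seen inside the callee matches the caller's address.
inline bool IncrementByAlias(int& x, const int* caller_address) {
  x += 1;
  return &x == caller_address;  // true: same object
}

// Strategy 2: move-in / move-out. The caller's object is moved into
// callee-local storage, mutated there, and moved back before returning.
// The address seen inside the callee is that of the local object.
inline bool IncrementByMoveInOut(int& caller_object,
                                 const int* caller_address) {
  int x = std::move(caller_object);  // move in
  x += 1;
  const bool same_address = (&x == caller_address);  // false: distinct object
  caller_object = std::move(x);  // move out
  return same_address;
}
```

Calling either function on `int v = 1;` leaves `v == 2`; only the address observed inside the callee differs, which is why capturing that address would be a problem under the second strategy.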

More general replacement for addr

We should consider ref as being available as a general part of our patterns -- for any parameter and in local declarations.

However, this will provide a significantly more general and likely more ergonomic replacement for addr and we should remove addr in favor of ref self: Self object parameters for methods that mutate (or take the address of) self. Potentially abbreviating the syntax further should be left as future work.

Improved C++ interop and migration

We expect this to improve interop and migration by allowing significantly more interface similarity between Carbon and C++. Previously, many things in C++ that used references on interface boundaries would be forced to switch to pointers. This adds ergonomic friction at a basic level, because of the forced change, and at a deeper level, because it makes it significantly harder to see the parallel usage across the boundary between C++ and Carbon. With reference bindings, the vast majority of this dissonance will be removed.

Open question: call site annotation

One important open question that should be answered here, at least initially, has to do with call-site annotation. When using pointers, there was often a "built-in" call site annotation of & when passing mutable state into a function. We need to decide what to do in that case.

There are three possible answers in this leads issue:

  1. We need a call-site annotation to move forward with reference bindings.
  2. We don't need a call-site annotation decision to move forward directionally with reference bindings, and should consider that separately.
  3. We don't need a call-site annotation to reject reference bindings.

If the result is (1), then we will need to work through what that annotation looks like (likely candidate: F(ref <expr>)).

Details of impact on the type system

These will ultimately be part of the type system, but the goal is for them to only be part of the type system through patterns used in the type system: function parameters, etc.

Specifically, we don't expect them to be part of the object types in Carbon, but only part of the expression categories and bindings within patterns. In this regard, they are very similar to value bindings -- we retain a great deal of implementation flexibility around layout, etc.

This specifically means we will need to incorporate ref bindings into the Call interface and we will be adding complexity there that will need to be handled by overloading. The overloading impact specifically is likely future work, but will at least carry additional complexity to handle ref.

Details of lifetimes

We should ensure that reference expressions formed via reference bindings do not dangle.

So for any reference expression that has a known lifetime already in the language, such as those associated with temporaries or var declarations, we should either lifetime-extend (in the case of temporaries) or error (in the case of declarations) when trying to form a binding that would outlive the referenced object.
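For comparison, C++ already lifetime-extends a temporary that is bound directly to a reference. The sketch below is a C++ analogue only; whether Carbon would adopt the same rule for ref bindings is a detail for the proposal. `Tracer` and `TemporaryOutlivesFullExpression` are hypothetical names introduced for illustration.

```cpp
// Counts destructions through a caller-provided counter.
struct Tracer {
  explicit Tracer(int* destroyed) : destroyed_(destroyed) {}
  ~Tracer() { ++*destroyed_; }
  int* destroyed_;
};

// Returns true if the temporary stays alive for as long as the reference
// bound to it is in scope, and is destroyed exactly once afterwards.
inline bool TemporaryOutlivesFullExpression() {
  int destroyed = 0;
  bool alive_while_bound = false;
  {
    // The temporary's lifetime is extended to match the lifetime of `t`.
    const Tracer& t = Tracer(&destroyed);
    (void)t;
    alive_while_bound = (destroyed == 0);
  }  // `t` goes out of scope here, and the temporary is destroyed.
  return alive_while_bound && destroyed == 1;
}
```

The "error" half of the rule has no direct C++ counterpart: C++ generally accepts a reference that outlives a named object and leaves the result dangling, which is the behavior the text above wants to rule out.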

For reference expressions that currently have no known lifetime, such as dereferenced pointers, we should allow bindings today despite the unsafety. However, we should fully expect lifetime safety in Carbon to eventually introduce a way of reasoning about these lifetimes, and with that a requirement that the lifetime of the binding be satisfied. That should be explicitly expected as future work and part of getting an overall safety story for Carbon.

This does fundamentally mean that we now have another kind of "pointer", potentially adding complexity to any memory-safety story. However, I think this ship already sailed to some extent with value bindings. Fundamentally, bindings are allowed to have pointer-like semantics from a lifetime perspective, and so will need to be considered as a pointer-like thing as we build out lifetime safety.

Details that should be addressed in a proposal

When this goes to a proposal, there is a collection of important details that will need to be worked out. However, this issue suggests that we do not need these details to decide this issue directionally. This issue is about "should we have reference bindings in the language in some form", and any details that are necessary to resolve that should be pulled up and covered in this issue. We don't want to expend the (considerable) effort of building a proposal for this without reasonable alignment that we'll actually move forward.

Specific details that we'll defer to the proposal:

  • Exact structure of how ref is attached within patterns w.r.t. destructuring, etc.
  • Should we have a top-level ref introducer, or is let ref good enough?
  • Exact specification of how this surfaces in Call or other interfaces.
  • Adding ref lambda captures and all of the details that need to be resolved there; this may even be deferred into a second proposal.
@chandlerc chandlerc added the leads question A question for the leads team label Apr 8, 2025
@danakj
Contributor

danakj commented Apr 8, 2025

This issue does not yet touch on assignment, so I would like to suggest that C++ got assignment quite wrong with references, where it passes through. Rust has a much more clear model, where assigning to a reference changes the value of the reference variable.

The C++ model has introduced a lot of ambiguity for using references in types, such as the whole optional<T&> issue. Whereas having to dereference a reference to assign to the value inside is clear and allows for both behaviour choices:

let ref a: auto = b;
a = c;  // a points to c
*a = d;  // *a is assigned a copy of d

However this does suggest using * to access the thing inside the reference. The example code above does not use *. So maybe this is incompatible with the above proposal. Maybe this points out an issue with eliding the *.

There is also a question in here about what happens with a ref of a ref. Is it possible to form one?

I'd say a pain point in Rust is having to worry about the number of levels of references at times. Not for the dot operator which looks through them, but especially with lambdas where you need to **a at times. This can be important, if you want to mutate an inner reference, but the vast majority of cases have an immutable reference.

@geoffromer
Contributor

As I understand it, the direction proposed here is that there will continue to be no such thing as an object or value with a reference type, and references will only be in the type system to the extent that we need to reflect patterns in the type system (e.g. for overload resolution). "Referenceness" will continue to be primarily a property of expressions, which is surfaced via bindings in much the same way that the value-vs-object distinction (or the template/symbolic/runtime phase distinction) is surfaced via bindings. So in particular:

  • If a is declared as a reference to b, a = c; can't be understood as assigning a new referent to a, because assignment is an operation on objects, and a itself is not an object. In principle we could introduce a separate operation for re-binding an existing binding, but that would be equally applicable to all kinds of bindings (and I doubt the cost/benefit would be favorable).
  • One of the key open questions (and IMO one of the key risks) is how to surface references in the type system without implying that there can be objects of reference type (or the moral equivalent). It may be useful to evaluate the design through the lens of how successfully it avoids creating problems like optional<T&>.
  • There will be no way to form a ref of a ref, because references can only refer to objects, and references are not objects.

Chandler, let me know if I got any of that wrong.

@chandlerc
Contributor Author

This issue does not yet touch on assignment, so I would like to suggest that C++ got assignment quite wrong with references, where it passes through. Rust has a much more clear model, where assigning to a reference changes the value of the reference variable.

I don't really agree...

But I think the real issue is that Rust's references are much more like automatically dereferenced pointers than like references in C++, let alone the reference bindings I'm suggesting here. They can be stored in objects, etc.

Confining references to expressions and bindings, I think, largely avoids the problem here -- or makes the problem immediate and obvious, pushing the code to switch to a pointer, which then has exactly the distinction you're describing for assignment.
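The distinction under discussion can be seen in today's C++, where a reference assigns through to its referent and a pointer separates rebinding from writing through. This is a C++ illustration only, not a statement about how Carbon ref bindings would behave; `AssignResult`, `AssignThroughReference`, and `RebindPointer` are hypothetical names.

```cpp
// Final state of the two objects, plus whether `a` ended up designating `c`.
struct AssignResult {
  int b;
  int c;
  bool refers_to_c;
};

// C++ reference: `a = c` copies c's value into b; `a` still names b.
inline AssignResult AssignThroughReference() {
  int b = 1;
  int c = 2;
  int& a = b;
  a = c;  // assigns through: b becomes 2
  return {b, c, &a == &c};
}

// Pointer: `a = &c` rebinds, and `*a = 5` writes through explicitly.
inline AssignResult RebindPointer() {
  int b = 1;
  int c = 2;
  int* a = &b;
  a = &c;  // rebinds: b is untouched
  *a = 5;  // writes to c
  return {b, c, a == &c};
}
```

The reference version ends with `b == 2` and `a` still naming `b`; the pointer version ends with `b == 1`, `c == 5`, and `a` pointing at `c`.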

There is also a question in here about what happens with a ref of a ref. Is it possible to form one?

This is also, I think, precluded by the model -- because you bind a name to a reference expression, and that name when used is a reference expression, there is no way to add layers here.

@chandlerc
Contributor Author

As I understand it, the direction proposed here is that there will continue to be no such thing as an object or value with a reference type, and references will only be in the type system to the extent that we need to reflect patterns in the type system (e.g. for overload resolution).

That was my intent.

  • One of the key open questions (and IMO one of the key risks) is how to surface references in the type system without implying that there can be objects of reference type (or the moral equivalent). It may be useful to evaluate the design through the lens of how successfully it avoids creating problems like optional<T&>.

Because ref isn't a type qualifier, you can't even write ref T to get a type parameter to optional. You would have to use pointers to build this kind of tool.

@geoffromer
Contributor

  • One of the key open questions (and IMO one of the key risks) is how to surface references in the type system without implying that there can be objects of reference type (or the moral equivalent). It may be useful to evaluate the design through the lens of how successfully it avoids creating problems like optional<T&>.

Because ref isn't a type qualifier, you can't even write ref T to get a type parameter to optional. You would have to use pointers to build this kind of tool.

ref isn't a type qualifier, but as you acknowledge, there is going to be some way of expressing references in the type system (so that we can model things like function signatures), and it's not obvious to me how we'll do that without creating a temptation to use it for things like optional<T&>.

@josh11b
Contributor

josh11b commented Apr 8, 2025

  • No ability to use this in a class member or struct.

How do I call ref self methods on class members then? Seems strange since we've been talking about using ref bindings on subobjects when destructuring a var binding.

One other use case for the background: ability to do forwarding of arguments preserving expression category.

@josh11b
Contributor

josh11b commented Apr 8, 2025

  • No ability to use this in a class member or struct.

How do I call ref self methods on class members then? Seems strange since we've been talking about using ref bindings on subobjects when destructuring a var binding.

Oh, you probably mean you can't use a ref as a class member or struct member, not that you can't make a reference to a class member or struct member.

@chandlerc
Contributor Author

  • No ability to use this in a class member or struct.

How do I call ref self methods on class members then? Seems strange since we've been talking about using ref bindings on subobjects when destructuring a var binding.

Oh, you probably mean you can't use a ref as a class member or struct member, not that you can't make a reference to a class member or struct member.

Correct, sorry. I'll edit to be more clear this is about fields.

@chandlerc
Contributor Author

  • One of the key open questions (and IMO one of the key risks) is how to surface references in the type system without implying that there can be objects of reference type (or the moral equivalent). It may be useful to evaluate the design through the lens of how successfully it avoids creating problems like optional<T&>.

Because ref isn't a type qualifier, you can't even write ref T to get a type parameter to optional. You would have to use pointers to build this kind of tool.

ref isn't a type qualifier, but as you acknowledge, there is going to be some way of expressing references in the type system (so that we can model things like function signatures), and it's not obvious to me how we'll do that without creating a temptation to use it for things like optional<T&>.

I don't see how we get from ref in parameter patterns to a temptation to have a ref qualified type parameter....

We'll need to do something to model ref vs. var vs. let (default) for parameter patterns in the Call interface, but that doesn't need to actually use this capability in any object, just in describing the signature of the call modeled?

@geoffromer
Contributor

We'll need to do something to model ref vs. var vs. let (default) for parameter patterns in the Call interface, but that doesn't need to actually use this capability in any object, just in describing the signature of the call modeled?

It's hard to articulate this without a concrete design. I just think there's a slippery slope from "types that describe reference parameters" to "types that describe references" to "reference types" to "object members with reference types", and durably stopping part-way down that slope will require intentional effort.

@chandlerc
Contributor Author

We'll need to do something to model ref vs. var vs. let (default) for parameter patterns in the Call interface, but that doesn't need to actually use this capability in any object, just in describing the signature of the call modeled?

It's hard to articulate this without a concrete design. I just think there's a slippery slope from "types that describe reference parameters" to "types that describe references" to "reference types" to "object members with reference types", and durably stopping part-way down that slope will require intentional effort.

I agree it will require intentional effort.

My goal would be to leverage the parameter distinction, and any time we are modeling parameters, make sure we're either using parameters or have the opportunity to indirect through a different representation (pointers).

So concretely, binding some subset of parameters to produce a callable with a different signature might have this modeled in the initial signature, in the end result signature, and in the signature accepting the value to bind, but would then implement that last piece using a pointer and not anything else.

I also think that this is a slope we are already on because of the distinction between let and var, where we similarly want to avoid sliding beyond parameter patterns.

@josh11b
Contributor

josh11b commented Apr 9, 2025

Just note that we do want the elements of tuples to have their own category, and in general tuples should ideally be able to represent arbitrary argument lists so that we can use variadic tuples interchangeably with variadic parameter lists.

@chandlerc
Contributor Author

Just note that we do want the elements of tuples to have their own category, and in general tuples should ideally be able to represent arbitrary argument lists so that we can use variadic tuples interchangeably with variadic parameter lists.

I'm not sure we're all aligned around elements of tuple objects.

For tuple literals, I agree, and the path we've been using there is to look through the literal to see the element expression and its category. I think that can generalize well to ref.

I think even for forwarding, we don't need this for tuple objects, but we do need a way to abstract over / deduce let vs. var vs. ref. I've somewhat omitted that here, but I do think figuring that out is a necessary step. It might be something we can defer to future work though, as I think we already need to figure out how to do that between let and var, it doesn't seem like ref makes that much worse.

@chandlerc
Contributor Author

To start trying to flesh out a sketch for the important property of being able to deduce and programmatically control this aspect of parameter patterns (and potentially patterns more generally):

I might suggest a choice (or similar) type in the prelude along the lines of:

choice PatternCategory {
  Value,
  Reference,
  Variable,
}

And then an advanced syntax that uses these compile-time constants. Here, I'm supposing balanced delimiters of [[ and ]], but we could either use any "fancy" balanced delimiter that composes other punctuation with (s, [s, or {s, or use any introducer combined with delimiters. I think [[ may not be the best because it creates ([[ as a punctuation sequence, and raises the question of how / whether this could appear in a deduced list, as [[[ would be... at a minimum very difficult to read visually, and possibly ambiguous.

fn DeducingArgCat[Cat:! Core.PatternCategory, T:! type]( [[Cat]] x: T) {
  ...
}

And then the default, ref, and var might be syntactic sugar for:

fn G([[Core.PatternCategory.Value]]     let_binding: i32,
     [[Core.PatternCategory.Reference]] ref_binding: i32,
     [[Core.PatternCategory.Variable]]  var_binding: i32) {
  ...
}

Forwarding might then look like, imagining destructuring inside an expansion which I'm unsure if we allow, but could likely rewrite this to avoid:

fn Forward[... (each Cat:! Core.PatternCategory, each T:! type)]
          (... [[each Cat]] arg: each T) {
  Target(... each arg);
}

And similarly, the Call interface might look like:

interface Call(... (each Cat:! Core.PatternCategory, each T:! type)) {
  let ResultT:! type;
  fn Op(... [[each Cat]] arg: each T) -> ResultT;
}

I feel like there might be better ways to do various parts of this, especially the expanded pair-wise destructuring, which I'm not sure was really fleshed out in variadics yet and might not even work. But the core of this is to avoid actually having tuple objects that represent references, and instead use some other way of encoding the space at compile time and then reifying it into the pattern space where needed.
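As a loose illustration of "encode the category at compile time, then reify it where a parameter is declared", the same shape can be approximated with C++ templates. Everything below is a hypothetical C++ analogue: `PatternCategory` mirrors the sketched choice type, while `Param` and `BumpAndRead` are invented for this example, and the mapping of categories onto `const T&`, `T&`, and `T` is an assumption rather than anything the thread specifies.

```cpp
#include <type_traits>

// Hypothetical mirror of the sketched Core.PatternCategory choice type.
enum class PatternCategory { Value, Reference, Variable };

// Reifies a (category, type) pair into a C++ parameter type:
//   Value     -> const T&  (read-only view of the argument)
//   Reference -> T&        (mutable alias of the caller's object)
//   Variable  -> T         (callee-owned copy)
template <PatternCategory Cat, typename T>
using Param = typename std::conditional<
    Cat == PatternCategory::Value, const T&,
    typename std::conditional<Cat == PatternCategory::Reference, T&,
                              T>::type>::type;

// A single function template whose parameter category is a compile-time
// argument. Only the mutable categories are accepted.
template <PatternCategory Cat>
int BumpAndRead(Param<Cat, int> x) {
  static_assert(Cat != PatternCategory::Value,
                "value bindings are read-only");
  x += 1;
  return x;
}
```

With `int v = 10;`, `BumpAndRead<PatternCategory::Reference>(v)` mutates `v`, while `BumpAndRead<PatternCategory::Variable>(v)` mutates only a copy. In C++, `Param<PatternCategory::Reference, T>` is an ordinary reference type that could be used anywhere a type can appear, which is precisely the leakage this thread wants Carbon to avoid; the sketch only illustrates the compile-time dispatch.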

Not super excited about [[ ... ]] syntax, but I feel like we have a bunch of options to explore there. My initial ideas seem to have problems of needing keywords that are either very strange or collide with both identifiers and other concepts:

  • kind(...) -> popular identifier, and the term kind is already overloaded...
  • category(...) -> popular identifier, long
  • expr(...) -> popular identifier, lots of other possible interpretations
  • binding(...) -> somewhat popular keyword, although less so, but likely to be confusing in this position
  • expr_cat(...) -> a very opaque keyword.

Maybe the opaqueness is OK -- this is going to be an advanced feature that is only used rarely and in fairly deep type-management library code I suspect. But maybe that also points to a punctuation syntax that isn't in too high of demand.

We could always go with cat(...), and I'd love to have more cats in the language ;] but not sure that's really more readable than expr_cat.

I'll stop live-thinking into the issue comment here...

@geoffromer
Contributor

This kind of approach sounds like it would enable an optional type to support references, but it would have decidedly unpleasant ergonomics: optional<Foo&> would become something like Optional(Core.PatternCategory.Reference, Foo). That might be a sufficient disincentive for doing that, assuming we don't provide a shorthand for Core.PatternCategory.Reference. Note that we might be tempted to provide such a shorthand, because as it stands the type names of function pointers and type-erased functions will be extremely verbose, but we'll need to look for other solutions.

Forwarding might then look like, imagining destructuring inside an expansion which I'm unsure if we allow, but could likely rewrite this to avoid:

fn Forward[... (each Cat:! Core.PatternCategory, each T:! type)]
          (... [[each Cat]] arg: each T) {
  Target(... each arg);
}

Destructuring inside a pattern expansion should be fine; the variadics design is certainly intended to support that kind of thing. The part I'm not so sure about is having a tuple pattern in the implicit parameter list, which I'd been thinking of as a flat list of symbolic binding patterns. Allowing it to have that sort of richer structure isn't trivial, since it's populated by deduction rather than by pattern matching, but it seems solvable.

@josh11b
Contributor

josh11b commented Apr 10, 2025

I'm concerned with cases where we need to use a tuple parameter for the situation where there are multiple variadic arguments. Examples without categories:

fn CallTwice[... each T:! type, FNT:! Call(... each T) where result = i32]
    (f: FNT, args1: (... each T), args2: (... each T)) -> i32 {
  return f(... each args1) + f(... each args2);
}

or

fn CallBoth[... each T1:! type, FNT1:! Call(... each T1) where result = i32,
            ... each T2:! type, FNT2:! Call(... each T2) where result = i32]
    (f: FNT1, args1: (... each T1), g: FNT2, args2: (... each T2)) -> i32 {
  return f(... each args1) + g(... each args2);
}

@geoffromer
Contributor

geoffromer commented Apr 10, 2025

I'm concerned with cases where we need to use a tuple parameter for the situation where there are multiple variadic arguments. Examples without categories:

fn CallTwice[... each T:! type, FNT:! Call(... each T) where result = i32]
    (f: FNT, args1: (... each T), args2: (... each T)) -> i32 {
  return f(... each args1) + f(... each args2);
}

Maybe the expression inside [[ ]] can be a tuple literal, to represent a mixed category? That would let us express that example like so:

fn CallTwice[... (each Cat:! Core.PatternCategory, each T:! type),
             FNT:! Call(... (each Cat, each T)) where result = i32]
    (f: FNT,
     [[(... each Cat)]] args1: (... each T),
     [[(... each Cat)]] args2: (... each T)) -> i32 {
  return f(...expand args1) + f(...expand args2);
}

@josh11b
Contributor

josh11b commented Apr 30, 2025

we do need a way to abstract over / deduce let vs. var vs. ref. I've somewhat omitted that here, but I do think figuring that out is a necessary step. It might be something we can defer to future work though, as I think we already need to figure out how to do that between let and var, it doesn't seem like ref makes that much worse.

I agree this is a distinct issue that can be addressed separately from this issue.

@josh11b
Contributor

josh11b commented Apr 30, 2025

One question we should resolve is whether ref should be nocapture (and noalias) in order to support "move-in-move-out" semantics. Is this an important part of ref? The write-up doesn't provide motivation for this choice.

@zygoloid
Contributor

Do we allow shared mutable refs to the same object? How does that interact with the ability to support move-in-move-out? For example:

// Suppose i32 has move-in-move-out semantics for ref parameters.
fn F(ref a: i32, ref b: i32) -> bool {
  // Can this affect the value of b?
  a = 1;
  // Can this return true?
  return &a == &b;
}
fn G() -> bool {
  var v: i32;
  return F(v, v);
}

It seems like the restriction we would want on ref parameters to support move-in-move-out in cases like the above, and still have ref behave like a reference, would be something similar to noalias / a "no shared references" rule. It could well be that that's exactly what we want, but I think it's important that we're explicit about what choice we're making here.

If we can't make the difference between pass-by-pointer (alias) and move-in-move-out (borrow) unobservable for ref, for example by adding restrictions that prohibit observing the difference, I think we should consider having both ref and inout so that the programmer can state their intent.
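The observable difference can be demonstrated with a C++ analogue of `F(v, v)`. This is a sketch under stated assumptions, not Carbon: `FByAlias` models pass-by-pointer, `FByMoveInOut` models move-in-move-out with write-back in parameter order, and both names plus the `Observation` struct are hypothetical.

```cpp
// What the callee observed: the value of `b` after writing `a`, and whether
// the two parameters had the same address.
struct Observation {
  int b_after_write;
  bool same_address;
};

// Pass-by-pointer (alias): both parameters can name the same object.
inline Observation FByAlias(int& a, int& b) {
  a = 1;  // also changes b when a and b alias
  return {b, &a == &b};
}

// Move-in-move-out (borrow): each parameter gets its own local object, and
// the locals are written back to the caller in parameter order on exit.
inline Observation FByMoveInOut(int& a_src, int& b_src) {
  int a = a_src;  // move in
  int b = b_src;  // move in
  a = 1;          // cannot affect b
  const Observation result = {b, &a == &b};
  a_src = a;  // move out
  b_src = b;  // if a_src and b_src alias, this overwrites a's update
  return result;
}
```

Starting from `v == 0`, `FByAlias(v, v)` sees `b == 1`, equal addresses, and leaves `v == 1`. `FByMoveInOut(v, v)` sees `b == 0`, distinct addresses, and leaves `v == 0` because the second write-back discards the update to `a`. A "no shared references" rule would reject the call and make the two strategies indistinguishable again.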

@josh11b josh11b changed the title We should add ref bindings to to Carbon, paralleling reference expressions We should add ref bindings to Carbon, paralleling reference expressions May 16, 2025
5 participants