Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type definition for rectype #392

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

abrown
Copy link
Contributor

@abrown abrown commented Sep 5, 2024

This changes the component model specification to reference a rectype in its type defintions. This makes sense since the GC proposal has graduated to become standard WebAssembly and the previous definitions refer to arraytype and structtype which are subsumed by rectype. Adding this also benefits the shared-everything-threads proposal, which uses shared bits on composite types.

In talking with @alexcrichton about this, the functype alternative is retained for now to allow backward compatibility for existing components (e.g., components using the 0x60 prefix to define a core functype). In the future, the functype alternative should be removed completely (since it is subsumed under rectype). Potentially the 0x00 prefix could be tweaked as well. In the meantime, this change allows more than one way to encode a functype.

This changes the component model specification to reference a `rectype`
in its type defintions. This makes sense since the [GC] proposal has
graduated to become standard WebAssembly and the previous definitions
refer to `arraytype` and `structtype` which are subsumed by `rectype`.
Adding this also benefits the [shared-everything-threads] proposal,
which uses `shared` bits on composite types.

In talking with @alexcrichton about this, the `functype` alternative is
retained for now to allow backward compatibility for existing components
(e.g., components using the `0x60` prefix to define a core `functype`).
In the future, the `functype` alternative should be removed completely
(since it is subsumed under `rectype`). Potentially the `0x00` prefix
could be tweaked as well. In the meantime, this change allows more than
one way to encode a `functype`.

[GC]: https://github.com/WebAssembly/gc
[shared-everything-threads]: https://github.com/WebAssembly/shared-everything-threads

Co-authored-by: Alex Crichton <[email protected]>
@alexcrichton
Copy link
Collaborator

To expand a bit more on this too, we were hoping that we could share prefix bytes here with core wasm but we already have a conflict. Currently two prefix bytes of core types are implemented: 0x60 for functions and 0x50 for core modules. GC types are spec'd but haven't actually been implemented yet. It ended up that subtype uses the 0x50 prefix additionally in core wasm to indicate a non-final type. This means that the 0x50 prefix for core modules is now overlapping, meaning it's no longer possible to match prefix bytes with core wasm.

In lieu of that I'd personally be in favor of scrapping the prefix-byte-matches-core-wasm entirely and instead define this as there's either (a) a core wasm type encoded exactly as-is in a core module or (b) a core module type which is something only in the component model. For that I think we'd ideally use prefix bytes 0x00 and 0x01, but given the prevalence of components and tooling such a change can't just be made.

As a compromise I'd propose using 0x00 as a prefix byte for core wasm types. When the component model 1.0 is released as a final binary encoding we are reserving the right to make tweaks and I'd consider this to be one of those tweaks. (although I'm not sure of the best place to write this down so we remember when 1.0 comes around)

@lukewagner
Copy link
Member

Arg, thanks for pointing that out! Yeah, I guess it was a mistake to attempt to pick an unused core type code for moduletype.

I think perhaps a conservative 1.0 goal state might look like:

  • have a "standard core type" section containing only core:type definitions that are phase 4 standards and are thus identical to what a core module contains
  • have a "component-model-added core type" section that contains moduletype

Importantly, while there are two section ids here, both append definitions to the core type index space and so they can be used in tandem.

I (and iirc Andreas) have hope that one far future day module-linking will be prioritized and added to the core wasm spec, at which point in time moduletype could be added to the "standard core type" section and the "component-model-added core type" section would be vestigial (which is a permanent wart, but a small one; maybe it's deprecated and removed eventually).

As for the transitional step: one backwards-compatible option we have is to retcon (and perhaps rename) our current "core:type" section into the latter "component-model-added core type" section and then add a new separate "standard core type" section that has all the correct type codes. While core:functype would initially be in this "component-model-added core type section", it would also be in the "standard core type section" and thus producer toolchains could gradually switch to emitting core:functype in the standard core type section at which point we would effectively be at the 1.0 goal state, reducing our "rebase" burden later (and avoiding some transitional complexity, I think).

WDYT? If you like the idea, I can write up a PR.

@alexcrichton
Copy link
Collaborator

To me the decision of two-sections or prefix-byte-in-one-section I think would come down to the integration elsewhere. Having fewer sections "feels" a bit nicer but I'm not really wed to any particular solution.

For the two-sections idea, how would that work out in the text format? Is the thinking that everything would look like (core type (....)) and then it'd get sorted into sections based on the kind that was written down?

Tooling-integration wise I do think that it'll be easier to have a single section though since if you're building things up it's easier to manage just one section vs sections interleaving one another. For example right now @abrown and I prototyped something where the API for creating a component automatically switches functions to use the old 0x60 encoding instead of the new 0x00 0x60 encoding to preserve compatibility with toolchains/runtimes today. That was easy to implement on the "create a type section" API whereas if we had two sections the functionality there would have to be lifted up to the "create a component" API which would be more difficult.

If module types are added to core wasm one day, then the vestigal parts could presumably be the prefix 0x00 byte and whatever temporary prefix was chosen for core module types? Overall I've found that for the relevant bits it's not too hard to share bits and pieces with core wasm and reusing strictly-exactly-the-same prefixes/sections with core wasm isn't too important so long as the bytes parsed by core wasm are exactly the same.

@lukewagner
Copy link
Member

I was thinking that section boundaries/grouping would be a binary encoding detail (just like it is now including whether two (core type ...) definitions are in separate sections or the same) and so the text format wouldn't change at all. Perhaps a "create a type section" API similarly spit out multiple sections, as needed?

I suppose all of these options are workable; I mostly just liked that the multiple-sections approach avoided us adding new things now that we intend to break later (trying to keep the "pending breaking change" list to a minimum). Also, hypothetically, core wasm may get core:alias definitions in the future and if for whatever reason the binary format isn't the same, that might be second situation where we want to two section ids. But if it's a lot more work in the short term, that's a valid argument against the multiple section approach in the short term.

@alexcrichton
Copy link
Collaborator

For tooling today I think it'll definitely be easiest to just tweak the current encodings rather than adding a new section. In terms of "we'll for sure break this later" perhaps both the 0x50 and 0x60 bytes could be deprecated? Rename 0x60 to 0x00 and allow any core type after it, and then rename 0x50 to 0x01 perhaps? That way we could annotate directly in Binary.md that the old opcodes are intended to be deleted?

@lukewagner
Copy link
Member

Thinking about it some more, the prefix trick doesn't solve the co-existence problem in general since since it doesn't fix the case where the core binary format decodes a single s33 as either a non-negative type index or negative type opcode (where there's no place for a prefix). This isn't a problem for moduletype afaics, but still it's long-term hazard for the general prefix approach (and the multi-section approach as well). Fortunately, I don't think we'll ever need or want to do this trick again beyond moduletype, so it's really just a question of what to do about moduletype in particular.

In the spirit of the original PR, what if we:

  • Pre 1.0, use the 0x00 prefix, but only for sub (so 0x00 0x50 means sub but, e.g., 0x4E means rec).
  • For 1.0, talk to the CG and reserve a permanent placeholder type code for moduletype in the core wasm spec and give that to moduletype and give subtype back its rightful 0x50.

That way, we just have this one localized wart we can cut off later and the end state has no prefixes.

One assumption that I think we've all been making that I wanted to call out to make sure it's fine (I think it is...) is that 0x50 only means moduletype when directly inside a core:type. When decoding, e.g., the vec(subtype) inside rectype, 0x50 means sub.

abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 5, 2024
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[bytecodealliance#392]: WebAssembly/component-model#392
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 5, 2024
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[bytecodealliance#392]: WebAssembly/component-model#392
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 5, 2024
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[bytecodealliance#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 5, 2024
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[bytecodealliance#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>
@alexcrichton
Copy link
Collaborator

That sounds like a reasonable transition path to me. I'll also admit though that the idea that you can decode things as s33 has perplexed me because you can't actually do that. It's not spec'd as doing that so you're not allowed to make an overlong encoding of a negative number for example. That means that, as far as I know, everyone's doing byte-by-byte parsing anyway and the coincidence that things line up in the leb space is purely coincidental and no one can actually take advantage of it.

One assumption that I think we've all been making ...

Agreed with this, this was only an issue for the top-level discriminator byte.


Another sort of orthogonal issue, I'm not sure if there's actually a great place to "reserve" opcodes in the CG/spec right now? For example I'm not sure how proposals coordinate with each other. I'm not sure there's even a centralized listing of "here's all the type opcodes" or at least I couldn't find one when I was writing up this comment

@lukewagner
Copy link
Member

Yeah, until GC, I think the weird opcode/typeindex overlay trick only shows up in blocktype. Scanning through the GC proposal, though, I see a bit more uses of it.

I'm not sure if there's actually a great place to "reserve" opcodes in the CG/spec right now?

I suppose we could propose to the CG that we add a row to this table to claim some opcode with a "(reserved for module linking)" category.

@alexcrichton
Copy link
Collaborator

alexcrichton commented Sep 6, 2024

Ah yeah this is slightly off-topic at this point but my point about blocktype is that you can't actually decode s33 and work on the result, if the decode is a negative number you have to go back byte-by-byte since an overlong encoding of a negative number isn't accepted for the empty block type for example.

In any case though that table looks great! So it seems like they way forward is perhaps:

  • Update the grammar to say that rt:<core:rectype> and mt:<core:moduletype> are the arms of core:deftype
  • Add a note saying that the 0x50 prefix is favored to be core:moduletype instead of core:rectype
  • Add a note indicating that in the future a new prefix byte will be reserved for core:moduletype

In the future when we rev the binary format the the binary header that'll indicate whether 0x50 is a core:rectype or a core:moduletype which will enable transcoding from the previous encoding to the new encoding. In the new encoding 0x00 0x50 won't be supported any more.

@rossberg
Copy link
Member

rossberg commented Sep 6, 2024

@alexcrichton:

I'll also admit though that the idea that you can decode things as s33 has perplexed me because you can't actually do that. It's not spec'd as doing that so you're not allowed to make an overlong encoding of a negative number for example.

Ah, but that was not the goal. The purpose of this encoding only is to save an extra distinctive byte that we'd otherwise need to put in front of each type index in such contexts. At some point we thought this could be a significant space saving.

@lukewagner
Copy link
Member

@alexcrichton Yeah, but I think one extra bullet is needed:

  • Add a third arm to core:type decoding 0x00 0x50 to sub, copying the 0x50 line of this rule.
  • Add a note indicating that in the future a new prefix byte will be reserved for core:moduletype

Nit: I would say that a new prefix opcode will be reserved for core:moduletype

Happy to let @abrown evolve this PR in-place or I'm happy to write up this in a different PR; whatever's convenient.

abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 6, 2024
This follows along with the most recent discussion in the component
model PR ([bytecodealliance#392]).

[bytecodealliance#392]: WebAssembly/component-model#392
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 6, 2024
This follows along with the most recent discussion in the component
model PR ([bytecodealliance#392]).

[bytecodealliance#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>
Copy link
Member

@lukewagner lukewagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

design/mvp/Binary.md Outdated Show resolved Hide resolved
Co-authored-by: Luke Wagner <[email protected]>
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 10, 2024
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[bytecodealliance#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>
abrown added a commit to abrown/wasm-tools that referenced this pull request Sep 10, 2024
This follows along with the most recent discussion in the component
model PR ([bytecodealliance#392]).

[bytecodealliance#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>
github-merge-queue bot pushed a commit to bytecodealliance/wasm-tools that referenced this pull request Sep 10, 2024
* Allow parsing `rectypes` in components

This change supports [#392], which adds a way to use GC's `rectypes` as
type definitions in components. Previously, only function types were
supported and there was no way express array and struct types. This
keeps the previous function decoding support based on peeking the
function type `0x60` prefix but adds support for encoding `rectypes`
with a new `0x00` prefix.

[#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>

* Apply `0x00` prefix to non-final `sub`; add tests

This follows along with the most recent discussion in the component
model PR ([#392]).

[#392]: WebAssembly/component-model#392

Co-authored-by: Alex Crichton <[email protected]>

* review: keep variant as `ComponentCoreTypeId::Sub`

* review: remove leftover comment

* review: remove resolved TODOs

* review: move `From` implementations to `core/binary.rs`

* review: remove `parse_component_sub_type`

---------

Co-authored-by: Alex Crichton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants