Add type definition for rectype
#392
This changes the component model specification to reference a `rectype` in its type definitions. This makes sense since the [GC] proposal has graduated to become standard WebAssembly and the previous definitions refer to `arraytype` and `structtype` which are subsumed by `rectype`. Adding this also benefits the [shared-everything-threads] proposal, which uses `shared` bits on composite types.

In talking with @alexcrichton about this, the `functype` alternative is retained for now to allow backward compatibility for existing components (e.g., components using the `0x60` prefix to define a core `functype`). In the future, the `functype` alternative should be removed completely (since it is subsumed under `rectype`). Potentially the `0x00` prefix could be tweaked as well. In the meantime, this change allows more than one way to encode a `functype`.

[GC]: https://github.com/WebAssembly/gc
[shared-everything-threads]: https://github.com/WebAssembly/shared-everything-threads

Co-authored-by: Alex Crichton <[email protected]>
To expand a bit more on this too, we were hoping that we could share prefix bytes here with core wasm but we already have a conflict. Currently two prefix bytes of core types are implemented: 0x60 for functions and 0x50 for core modules. GC types are spec'd but haven't actually been implemented yet. It ended up that In lieu of that I'd personally be in favor of scrapping the prefix-byte-matches-core-wasm entirely and instead define this as there's either (a) a core wasm type encoded exactly as-is in a core module or (b) a core module type which is something only in the component model. For that I think we'd ideally use prefix bytes 0x00 and 0x01, but given the prevalence of components and tooling such a change can't just be made.

As a compromise I'd propose using 0x00 as a prefix byte for core wasm types. When the component model 1.0 is released as a final binary encoding we are reserving the right to make tweaks and I'd consider this to be one of those tweaks. (although I'm not sure of the best place to write this down so we remember when 1.0 comes around)
Arg, thanks for pointing that out! Yeah, I guess it was a mistake to attempt to pick an unused core type code for I think perhaps a conservative 1.0 goal state might look like:
Importantly, while there are two section ids here, both append definitions to the core type index space and so they can be used in tandem. I (and iirc Andreas) have hope that one far future day module-linking will be prioritized and added to the core wasm spec, at which point in time As for the transitional step: one backwards-compatible option we have is to retcon (and perhaps rename) our current " WDYT? If you like the idea, I can write up a PR.
To me the decision of two-sections or prefix-byte-in-one-section I think would come down to the integration elsewhere. Having fewer sections "feels" a bit nicer but I'm not really wed to any particular solution. For the two-sections idea, how would that work out in the text format? Is the thinking that everything would look like Tooling-integration wise I do think that it'll be easier to have a single section though since if you're building things up it's easier to manage just one section vs sections interleaving one another. For example right now @abrown and I prototyped something where the API for creating a component automatically switches functions to use the old 0x60 encoding instead of the new 0x00 0x60 encoding to preserve compatibility with toolchains/runtimes today. That was easy to implement on the "create a type section" API whereas if we had two sections the functionality there would have to be lifted up to the "create a component" API which would be more difficult. If module types are added to core wasm one day, then the vestigial parts could presumably be the prefix 0x00 byte and whatever temporary prefix was chosen for core module types? Overall I've found that for the relevant bits it's not too hard to share bits and pieces with core wasm and reusing strictly-exactly-the-same prefixes/sections with core wasm isn't too important so long as the bytes parsed by core wasm are exactly the same.
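The compatibility fallback described in this comment could look roughly like the sketch below. It is only an illustration: `CoreType` and `encode_core_type` are hypothetical names, not the API of any existing encoder, and the types are assumed to arrive already serialized in their core wasm form.

```rust
/// Hypothetical, simplified stand-in for a core type placed in a component.
enum CoreType {
    /// A plain function type; `body` holds its core encoding *after* the 0x60 byte.
    Func { body: Vec<u8> },
    /// Any other core type (struct, array, rec group, ...), already encoded
    /// exactly as it would appear in a core module.
    Other { core_bytes: Vec<u8> },
}

/// Appends the component-level encoding of `ty` to `out`.
fn encode_core_type(ty: &CoreType, out: &mut Vec<u8>) {
    match ty {
        // Plain function types keep the old encoding (0x60 ...) so that
        // toolchains and runtimes that only know that form still accept them.
        CoreType::Func { body } => {
            out.push(0x60);
            out.extend_from_slice(body);
        }
        // Everything else uses the new 0x00 prefix followed by the core bytes.
        CoreType::Other { core_bytes } => {
            out.push(0x00);
            out.extend_from_slice(core_bytes);
        }
    }
}
```

Because the choice is made per type definition, it stays local to the code that writes a type section, which matches the point above about a single section being easier for tooling.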
I was thinking that section boundaries/grouping would be a binary encoding detail (just like it is now including whether two I suppose all of these options are workable; I mostly just liked that the multiple-sections approach avoided us adding new things now that we intend to break later (trying to keep the "pending breaking change" list to a minimum). Also, hypothetically, core wasm may get
For tooling today I think it'll definitely be easiest to just tweak the current encodings rather than adding a new section. In terms of "we'll for sure break this later" perhaps both the 0x50 and 0x60 bytes could be deprecated? Rename 0x60 to 0x00 and allow any core type after it, and then rename 0x50 to 0x01 perhaps? That way we could annotate directly in
Thinking about it some more, the prefix trick doesn't solve the co-existence problem in general since it doesn't fix the case where the core binary format decodes a single In the spirit of the original PR, what if we:
That way, we just have this one localized wart we can cut off later and the end state has no prefixes. One assumption that I think we've all been making that I wanted to call out to make sure it's fine (I think it is...) is that
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as type definitions in components. Previously, only function types were supported and there was no way to express array and struct types. This keeps the previous function decoding support based on peeking the function type `0x60` prefix but adds support for encoding `rectypes` with a new `0x00` prefix. [bytecodealliance#392]: WebAssembly/component-model#392
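As a rough illustration of the peeking described in this commit message (not the actual parser code), the sketch below dispatches on the first byte of a core type definition. The names `CoreTypeKind` and `classify_core_type` are hypothetical, and the `0x50` arm is an assumption taken from the earlier comment that `0x50` is the implemented prefix for core module types.

```rust
/// Which alternative a core type definition inside a component uses.
/// All names here are hypothetical.
#[derive(Debug, PartialEq, Eq)]
enum CoreTypeKind {
    /// Old encoding: the bytes start directly with the core `functype` prefix 0x60.
    LegacyFunc,
    /// New encoding: a 0x00 prefix followed by a core `rectype`.
    PrefixedRecType,
    /// A core module type (prefix 0x50), which only exists in the component model.
    Module,
}

/// Peeks at the first byte and returns the kind together with the bytes
/// that the next, more specific parser should consume.
fn classify_core_type(bytes: &[u8]) -> Result<(CoreTypeKind, &[u8]), String> {
    match bytes.first().copied() {
        // 0x60 is only peeked, not consumed, so an unmodified core `functype`
        // parser can read it as usual.
        Some(0x60) => Ok((CoreTypeKind::LegacyFunc, bytes)),
        // The 0x00 prefix is consumed; what follows is parsed as a core `rectype`.
        Some(0x00) => Ok((CoreTypeKind::PrefixedRecType, &bytes[1..])),
        Some(0x50) => Ok((CoreTypeKind::Module, &bytes[1..])),
        Some(b) => Err(format!("unexpected core type prefix 0x{b:02x}")),
        None => Err("unexpected end of input".to_string()),
    }
}
```

With this shape, a function type can legitimately appear as either `0x60 ...` or `0x00 0x60 ...`, which is the "more than one way to encode a `functype`" noted in the PR description.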
This change supports [bytecodealliance#392], which adds a way to use GC's `rectypes` as type definitions in components. Previously, only function types were supported and there was no way to express array and struct types. This keeps the previous function decoding support based on peeking the function type `0x60` prefix but adds support for encoding `rectypes` with a new `0x00` prefix. [bytecodealliance#392]: WebAssembly/component-model#392 Co-authored-by: Alex Crichton <[email protected]>
That sounds like a reasonable transition path to me. I'll also admit though that the idea that you can decode things as s33 has perplexed me because you can't actually do that. It's not spec'd as doing that so you're not allowed to make an overlong encoding of a negative number for example. That means that, as far as I know, everyone's doing byte-by-byte parsing anyway and the fact that things line up in the LEB space is purely coincidental and no one can actually take advantage of it.
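To make the overlong-encoding point concrete, here is a small sketch of a generic signed LEB128 reader. `read_sleb128` is a hypothetical helper written for this illustration; it deliberately does not enforce the 33-bit range or any canonical-length rule, so it is more permissive than a real decoder would be.

```rust
/// Reads one signed LEB128 integer from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or too long for an `i64`.
fn read_sleb128(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if shift >= 63 {
            return None; // would not fit in an i64
        }
        // Each byte contributes its low 7 bits.
        result |= i64::from(b & 0x7f) << shift;
        shift += 7;
        // A clear high bit marks the last byte.
        if b & 0x80 == 0 {
            // Sign-extend when the sign bit (0x40) of the last byte is set.
            if b & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None // ran out of bytes before the terminating byte
}
```

Under this reader the single byte `0x60` and the two-byte sequence `0xE0 0x7F` both decode to -32 (that is, -0x20), whereas a parser that matches the literal byte `0x60` accepts only the first form. That difference is why byte-by-byte matching and "decode as s33" are not interchangeable.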
Agreed with this, this was only an issue for the top-level discriminator byte. Another sort of orthogonal issue, I'm not sure if there's actually a great place to "reserve" opcodes in the CG/spec right now? For example I'm not sure how proposals coordinate with each other. I'm not sure there's even a centralized listing of "here's all the type opcodes" or at least I couldn't find one when I was writing up this comment.
Yeah, until GC, I think the weird opcode/typeindex overlay trick only shows up in
I suppose we could propose to the CG that we add a row to this table to claim some opcode with a "(reserved for module linking)" category.
Ah yeah this is slightly off-topic at this point but my point about In any case though that table looks great! So it seems like the way forward is perhaps:
In the future, when we rev the binary format, the binary header will indicate whether 0x50 is a
Ah, but that was not the goal. The only purpose of this encoding is to save an extra distinctive byte that we'd otherwise need to put in front of each type index in such contexts. At some point we thought this could be a significant space saving.
@alexcrichton Yeah, but I think one extra bullet is needed:
Nit: I would say that a new Happy to let @abrown evolve this PR in-place or I'm happy to write up this in a different PR; whatever's convenient.
This follows along with the most recent discussion in the component model PR ([bytecodealliance#392]). [bytecodealliance#392]: WebAssembly/component-model#392
This follows along with the most recent discussion in the component model PR ([bytecodealliance#392]). [bytecodealliance#392]: WebAssembly/component-model#392 Co-authored-by: Alex Crichton <[email protected]>
LGTM, thanks!
Co-authored-by: Luke Wagner <[email protected]>
* Allow parsing `rectypes` in components

  This change supports [#392], which adds a way to use GC's `rectypes` as type definitions in components. Previously, only function types were supported and there was no way to express array and struct types. This keeps the previous function decoding support based on peeking the function type `0x60` prefix but adds support for encoding `rectypes` with a new `0x00` prefix.

  [#392]: WebAssembly/component-model#392

  Co-authored-by: Alex Crichton <[email protected]>

* Apply `0x00` prefix to non-final `sub`; add tests

  This follows along with the most recent discussion in the component model PR ([#392]).

  [#392]: WebAssembly/component-model#392

  Co-authored-by: Alex Crichton <[email protected]>

* review: keep variant as `ComponentCoreTypeId::Sub`
* review: remove leftover comment
* review: remove resolved TODOs
* review: move `From` implementations to `core/binary.rs`
* review: remove `parse_component_sub_type`

---------

Co-authored-by: Alex Crichton <[email protected]>