
Add "decompress" response utility #3423

Open · wants to merge 11 commits into main

Conversation

@kettanaito (Contributor) commented Jul 26, 2024

This relates to...

Rationale

Exposing the response decompression logic (i.e. the handling of the Content-Encoding header) will allow other Node.js libraries to reuse it, resulting in a consistent experience for everyone.

See more detailed reasoning in the referenced issue.

Changes

Features

  • Add a new decompress utility, abstracting the Content-Encoding handling from the existing lib/fetch/index.js.
  • Use the decompress utility in the onHeaders callback.
  • Add unit tests for decompress.
  • Add unit tests for decompressStream.
  • Export the decompress utility publicly from undici.
  • Export the decompressStream utility publicly from undici.
  • Add types to the types module.
  • Add type tests.
  • Export the functions from under the Util namespace (see the usage sketch after this list).
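
For illustration, a rough sketch of how the proposed exports could be consumed (the call shapes are illustrative, not final):

// Illustrative only: assumes the exports listed above; request, response,
// and compressedBodyStream come from the surrounding application code.
const { decompress, decompressStream } = require('undici')

// decompress() mirrors fetch's Content-Encoding handling for a
// request/response pair, returning the (possibly decompressed) body stream.
const body = decompress(request, response)

// decompressStream() operates on any readable stream plus the codings.
const readable = decompressStream(compressedBodyStream, ['gzip'])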

Bug Fixes

Breaking Changes and Deprecations

Status

@KhafraDev (Member)

it'd be nice if this worked with the rest of undici too, not just fetch

@kettanaito (Contributor, Author) commented Jul 26, 2024

@KhafraDev, can you point me to other usages where this should work?

I admit, we mostly need it for fetch, since http does not handle response encoding by design (it returns the body as-is). I presume Undici has other use cases; I'll see what I can do there. That being said, keeping a (request, response) call signature would be nice publicly.

@KhafraDev (Member)

can you point me to other usages where this should work?

undici.request, stream, etc. #1155

@kettanaito (Contributor, Author)

Added a basic set of unit tests for the decompress function.

✔ ignores responses without the "Content-Encoding" header (32.383292ms)
✔ ignores responses with empty "Content-Encoding" header (0.780042ms)
✔ ignores redirect responses (0.439708ms)
✔ ignores HEAD requests (0.456958ms)
﹣ ignores CONNECT requests (0.0775ms) # SKIP
✔ ignores responses with unsupported encoding (0.219292ms)
✔ decompresses responses with "gzip" encoding (15.465084ms)

Still need to cover multiple Content-Encoding values, like gzip, br, etc.

@kettanaito (Contributor, Author)

@KhafraDev, thinking of tailoring this for generic body decompression, I can imagine it being used this way:

function createDecompressionStream(args): TransformStream 

// Then, usage:
anyCompressedBodyStream.pipe(createDecompressionStream(args))

The most challenging part is figuring out the args here.

@KhafraDev (Member)

I think for arguments a stream and a header value would be fine

@kettanaito (Contributor, Author) commented Jul 26, 2024

Got it. Added a decompressStream() utility and reused it in decompress().

Now, decompressStream can be used with any ReadableStream as input, while decompress operates on Fetch API request and response arguments (and also handles additional logic, like redirects).
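
For illustration, a minimal sketch of that shape (not the actual implementation; Undici's real decoder setup differs in details such as zlib flush options, and this assumes a Node readable as input):

const zlib = require('node:zlib')

function decompressStream (input, codings) {
  const supported = new Set(['gzip', 'x-gzip', 'deflate', 'br'])
  // Unknown coding anywhere: return the input as-is (see discussion below).
  if (codings.length === 0 || codings.some((coding) => !supported.has(coding))) {
    return input
  }
  let stream = input
  // Content-Encoding lists codings in the order they were applied,
  // so they are decoded in reverse.
  for (const coding of [...codings].reverse()) {
    if (coding === 'br') {
      stream = stream.pipe(zlib.createBrotliDecompress())
    } else if (coding === 'deflate') {
      stream = stream.pipe(zlib.createInflate())
    } else {
      stream = stream.pipe(zlib.createGunzip()) // gzip, x-gzip
    }
  }
  return stream
}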

@metcoder95 (Member) left a comment

So far, LGTM.

I believe decompressStream can be used within an interceptor if we want to aim for that in the other PR as well.

lib/web/fetch/decompress.js — review thread (outdated, resolved)
Comment on lines 80 to 81
decoders.length = 0
break
@ronag (Member)

Suggested change:
- decoders.length = 0
- break
+ return null

Maybe return null; otherwise there is no way to know whether it "failed".

(Member)

Also decompressStream should probably not live under /web/

@kettanaito (Contributor, Author)

@ronag, can it fail though? The intention was that, if there's no known compression provided, the stream is returned as-is.

You can still handle failures as you normally would:

decompressStream(myStream).on('error', callback)

(Member)

if there's no known compression provided

The logic seems to be that if an unknown coding is sent, the body isn't decompressed at all. I'm not sure if that sounds right either.

Content-Encoding: gzip, zstd, br -> ignored (zstd is not supported)
Content-Encoding: gzip, br -> decompressed

@kettanaito (Contributor, Author)

As a consumer, I expect the body stream to be returned as-is if:

  1. It has no Content-Encoding set (no codings provided as an argument);
  2. It has an unknown Content-Encoding set, or unknown coding(s) are provided as an argument.

I wouldn't expect it to throw, because that's disruptive behavior implying I need to add a guard before calling decompress/decompressStream. At least, not at a level this low.

If the API is potentially disruptive, I need an extra check in place to prevent scenarios where decompression is impossible. This puts extra work on me as a consumer, but it also results in a more fragile API, since the list of supported encodings is baked into Undici and may change across releases.

I strongly suggest returning the body stream as-is in the two cases described above. decompressStream() must never error, save for errors on the stream itself (and those are irrelevant in the context of this function).
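
To illustrate the extra work, the guard every consumer would otherwise need (SUPPORTED here mirrors whatever Undici happens to support at a given release, which is exactly the fragile part):

// Hypothetical consumer-side guard, needed only if the utility threw
// on unknown codings. SUPPORTED is illustrative, not an undici export.
const SUPPORTED = new Set(['gzip', 'x-gzip', 'deflate', 'br'])

const codings = (response.headers.get('content-encoding') ?? '')
  .toLowerCase()
  .split(',')
  .map((coding) => coding.trim())
  .filter(Boolean)

const body = codings.every((coding) => SUPPORTED.has(coding))
  ? decompressStream(response.body, codings)
  : response.body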

@kettanaito (Contributor, Author)

@metcoder95 @KhafraDev do you agree with my reasoning in the post above?

@kettanaito (Contributor, Author)

Hah, so there's an existing expectation that decompression is skipped when an unknown encoding is encountered:

 ✖ should skip decompression if unsupported codings

I suppose that answers the question.

(Member)

throwing an error will make fetch slower

@kettanaito (Contributor, Author) commented Jul 31, 2024

Regarding the decoder-customization discussion, I don't think it justifies the cost of making decompressStream more complex. The only thing you need is the list of codings, which you can get with a one-liner:

 const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())

The purpose of decompressStream is to expose the exact behavior Undici has under the hood. If you need different behavior, you should (1) parse the Content-Encoding header yourself, then (2) map the codings to decompression streams with custom options.

I can see how you'd want to extend or override parts of the default decompression, and for that, perhaps, we can consider exporting those options separately:

module.exports = {
  gzipDecompressionOptions: {
    flush: zlib.constants.Z_SYNC_FLUSH,
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  }
}

Even then, those are all the options Undici uses right now. To me this looks like a nice-to-have without substantial use-case justification; I lean toward iterating on this once we have more of the latter.
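
For illustration, how a consumer could reuse such an export (gzipDecompressionOptions is the hypothetical name from the snippet above; it is not part of undici today):

const zlib = require('node:zlib')
// Hypothetical export from the snippet above, not an existing undici API.
const { gzipDecompressionOptions } = require('undici')

// Build a custom gunzip that stays consistent with fetch's defaults,
// while remaining free to add further options on top.
const gunzip = zlib.createGunzip({ ...gzipDecompressionOptions })
compressedStream.pipe(gunzip).pipe(destination)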

(Member)

Regarding erroring or throwing: as you suggest, @kettanaito, I'd like to fail fast and inform the implementer that the encoding is not supported, without even disturbing the input stream, so fallbacks can be applied if desired.

For fetch, as @KhafraDev points out, throwing might be slower, so we can disable it by default and not throw at all, instead ignoring the unsupported encodings.

The issue I'd like to avoid is putting the implementer in an all-or-nothing spot.

Even then, those are all the options Undici uses right now. To me this looks like a nice-to-have without substantial use-case justification; I lean toward iterating on this once we have more of the latter.

Agree, let's do that 👍

@kettanaito (Contributor, Author) commented Jul 31, 2024

I've added unit tests for the "x-gzip" and "gzip, br" encodings. Both seem to pass 🎉

Also added test cases for "deflate" and "deflate, gzip".

@kettanaito (Contributor, Author)

The CI is failing with some issues downloading the Node.js binary, it seems. It doesn't appear to be related to the changes.

@kettanaito (Contributor, Author)

@KhafraDev, what do you think about decompressStream becoming createDecompressStream, dropping the input stream argument and allowing any stream to be piped through it?

myStream.pipe(createDecompressStream(codings))

I'd love to learn from you about the difference/implications here. Thanks.
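
For illustration, one way such a factory could look, using Node's stream.compose and keeping the pass-through semantics for unknown codings discussed above (a sketch, not a proposal for the final implementation):

const { compose, PassThrough } = require('node:stream')
const zlib = require('node:zlib')

// Sketch of the factory shape: returns a single duplex stream that
// any source can be piped through.
function createDecompressStream (codings) {
  const factories = {
    gzip: () => zlib.createGunzip(),
    'x-gzip': () => zlib.createGunzip(),
    deflate: () => zlib.createInflate(),
    br: () => zlib.createBrotliDecompress()
  }
  // Unknown coding anywhere: skip decompression entirely,
  // matching the pass-through semantics discussed above.
  if (codings.length === 0 || codings.some((c) => !(c in factories))) {
    return new PassThrough()
  }
  // Decode in reverse order of how the codings were applied.
  return compose(...[...codings].reverse().map((c) => factories[c]()))
}

// Usage: myStream.pipe(createDecompressStream(['gzip']))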

@kettanaito kettanaito marked this pull request as ready for review August 1, 2024 13:04
@kettanaito (Contributor, Author) commented Aug 1, 2024

Update

I've added the remaining unit tests for the decompressStream utility; all are passing locally.

Based on the conclusion of the discussion around unknown encodings, and taking into consideration that there's an explicit existing test suite asserting that unknown encodings must be ignored when decompressing the fetch() response body, the way forward looks like leaving the implementation as-is. Please let me know if I misunderstood the conclusion.

I suggest adding the throwing logic to decompressStream once we discover a use case for it (decompression in the interceptor may be that use case). At that point, we can open a new pull request to extend the decompressStream utility with options or anything else we find proper.

With this, I believe this pull request is ready for review cc @KhafraDev @mcollina

* @param {Response} response
* @returns {ReadableStream<Uint8Array> | null}
*/
function decompress (request, response) {
@tsctx (Member) commented Aug 1, 2024

I think decompressStream is sufficient. By this point, the redirect should already have been followed.

@kettanaito (Contributor, Author)

I'd like to have a designated utility that performs the decompression including the handling of redirect responses, etc. decompressStream implies you handle that yourself.

@tsctx (Member)

It looks to me like you are targeting the Fetch API response. In that case, hasn't the redirect already been followed?

@kettanaito (Contributor, Author)

@tsctx, I'm moving the existing content-encoding logic to this utility function. In the existing logic, Undici accepts the response that is the redirect response, not the redirected response. This is correct: the utility must also decide that decompression is unnecessary when the response is a redirect response.

body: decompress(request, {
status,
statusText,
headers: headersList,
(Member)

headersList is not an instance of Headers

@kettanaito (Contributor, Author)

Technically, no, but its APIs are compatible, from what I can see. At least, the entire test suite hasn't proven me wrong.

I can construct a Headers instance out of headersList, but that may have performance implications. Would you advise me to do that?

index.js — review thread (outdated, resolved)
@metcoder95 (Member) left a comment

LGTM, just keeping in mind @tsctx's considerations about the util namespace and the following thread: https://github.com/nodejs/undici/pull/3423/files#r1694215822

@mcollina (Member) commented Aug 2, 2024

Can you please add the types for this and the related type tests?

@tsctx tsctx changed the base branch from main to 3.x August 3, 2024 21:36
@tsctx tsctx changed the base branch from 3.x to main August 4, 2024 02:51
@kettanaito (Contributor, Author)

On a related subject, I've recently learned about the DecompressionStream global API, which is now also available in Node.js. Can it be utilized to the same effect as the API being proposed here?

response.body.pipeThrough(new DecompressionStream('gzip'))

One practical downside is that there's no connection between the response's content encoding and the decompression used. You can also provide only one decompression format ('gzip', 'deflate', 'deflate-raw'), so for response streams encoded with multiple codings, you'd have to create a pipeline of decompression streams.
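
For illustration, such a pipeline could look like this, assuming every coding in the header is one of the formats DecompressionStream supports:

// Sketch: chain one DecompressionStream per coding. Content-Encoding
// lists codings in the order they were applied, so decode in reverse.
// Note that 'br', for instance, is not supported by DecompressionStream.
function decompressBody (body, codings) {
  let stream = body
  for (const coding of [...codings].reverse()) {
    stream = stream.pipeThrough(new DecompressionStream(coding))
  }
  return stream
}

// Usage: const readable = decompressBody(response.body, ['deflate', 'gzip'])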

Does anybody know if there's any difference between DecompressionStream and the content-encoding handling Undici has as of now?

I'd still very much like to be consistent with Undici here, but knowing more would be good.

@@ -130,19 +141,31 @@ const { kConstruct } = require('./lib/web/cache/symbols')
// in an older version of Node, it doesn't have any use without fetch.
module.exports.caches = new CacheStorage(kConstruct)

const { deleteCookie, getCookies, getSetCookies, setCookie } = require('./lib/web/cookies')
const {
@kettanaito (Contributor, Author)

This is the result of running npm run lint:fix on this branch. You may have something misconfigured if it reports diffs.

@kettanaito (Contributor, Author)

Or this branch is behind and main has applied this formatting.

(Member)

please undo them

@kettanaito (Contributor, Author)

Moved the tests to the root level of test/, and now seemingly random tests are failing:

 should include encodedBodySize in performance entry (1.82966ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at fetch (/Users/kettanaito/Projects/contrib/undici/index.js:121:13)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async Server.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/fetch/resource-timing.js:68:18)

The tests are in their own modules; no existing tests were edited. The errors themselves don't seem related to the changes.

Does anybody know what's causing this?

@mcollina (Member)

@kettanaito are you still considering finishing this?

@kettanaito (Contributor, Author)

@mcollina, yes. I've fallen off lately, doing other things. This is still on my todo list, and I'd love to see this merged. I still have those test shenanigans, with no idea what's causing seemingly unrelated tests to fail. I'd appreciate your patience here; it may take me some time to get back to this.

Successfully merging this pull request may close these issues:

  Expose "Content-Encoding" handling publicly