
Add "decompress" response utility #3423

Open · wants to merge 11 commits into main

Conversation

@kettanaito (Contributor) commented Jul 26, 2024

This relates to...

Rationale

Exposing the response decompression logic (i.e. the handling of the Content-Encoding header) will allow other Node.js libraries to reuse it, resulting in a consistent experience for everyone.

See more detailed reasoning in the referenced issue.

Changes

Features

  • Add a new decompress utility, abstracting the Content-Encoding handling from the existing lib/fetch/index.js.
  • Use the decompress utility in the onHeaders callback.
  • Add unit tests for decompress.
  • Add unit tests for decompressStream.
  • Export the decompress utility publicly from undici.
  • Export the decompressStream utility publicly from undici.
  • Add types to the types module.
  • Add type tests.
  • Export the functions from under the Util namespace (see the usage sketch after this list).
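
For illustration, a rough sketch of how the proposed exports could be consumed (the call shapes are illustrative, not final):

// Illustrative only: assumes the exports listed above; request, response,
// and compressedBodyStream come from the surrounding application code.
const { decompress, decompressStream } = require('undici')

// decompress() mirrors fetch's Content-Encoding handling for a
// request/response pair, returning the (possibly decompressed) body stream.
const body = decompress(request, response)

// decompressStream() operates on any readable stream plus the codings.
const readable = decompressStream(compressedBodyStream, ['gzip'])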

Bug Fixes

Breaking Changes and Deprecations

Status

@KhafraDev (Member)

it'd be nice if this worked with the rest of undici too, not just fetch

@kettanaito (Contributor, Author) commented Jul 26, 2024

@KhafraDev, can you point me to other usages where this should work?

I admit, we mostly need it for fetch, since http does not handle response encoding by design (it returns the body as-is). I presume Undici has other use cases; I'll see what I can do there. That being said, keeping a (request, response) call signature would be nice publicly.

@KhafraDev (Member)

can you point me to other usages where this should work?

undici.request, stream, etc. #1155

@kettanaito (Contributor, Author)

Added a basic set of unit tests for the decompress function.

✔ ignores responses without the "Content-Encoding" header (32.383292ms)
✔ ignores responses with empty "Content-Encoding" header (0.780042ms)
✔ ignores redirect responses (0.439708ms)
✔ ignores HEAD requests (0.456958ms)
﹣ ignores CONNECT requests (0.0775ms) # SKIP
✔ ignores responses with unsupported encoding (0.219292ms)
✔ decompresses responses with "gzip" encoding (15.465084ms)

Still need to cover multiple Content-Encoding values, like gzip, br, etc.

@kettanaito (Contributor, Author)

@KhafraDev, thinking of tailoring this for generic body decompression, I can imagine it being used this way:

function createDecompressionStream(args): TransformStream 

// Then, usage:
anyCompressedBodyStream.pipe(createDecompressionStream(args))

The most challenging part is figuring out the args here.

@KhafraDev (Member)

I think for arguments a stream and a header value would be fine

@kettanaito (Contributor, Author) commented Jul 26, 2024

Got it. Added a decompressStream() utility and reused it in decompress().

Now, decompressStream can be used with any ReadableStream as input, while decompress operates on Fetch API request and response arguments (and also handles additional logic, like redirects).
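
For illustration, a minimal sketch of that shape (not the actual implementation; Undici's real decoder setup differs in details such as zlib flush options, and this assumes a Node readable as input):

const zlib = require('node:zlib')

function decompressStream (input, codings) {
  const supported = new Set(['gzip', 'x-gzip', 'deflate', 'br'])
  // Unknown coding anywhere: return the input as-is (see discussion below).
  if (codings.length === 0 || codings.some((coding) => !supported.has(coding))) {
    return input
  }
  let stream = input
  // Content-Encoding lists codings in the order they were applied,
  // so they are decoded in reverse.
  for (const coding of [...codings].reverse()) {
    if (coding === 'br') {
      stream = stream.pipe(zlib.createBrotliDecompress())
    } else if (coding === 'deflate') {
      stream = stream.pipe(zlib.createInflate())
    } else {
      stream = stream.pipe(zlib.createGunzip()) // gzip, x-gzip
    }
  }
  return stream
}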

@metcoder95 (Member) left a comment

So far, LGTM.

I believe decompressStream can be used within an interceptor if we want to aim for that in the other PR as well.

lib/web/fetch/decompress.js — review thread (outdated, resolved)
Comment on lines 80 to 81
decoders.length = 0
break
@ronag (Member)

Suggested change:
- decoders.length = 0
- break
+ return null

Maybe return null; otherwise there is no way to know whether it "failed".

(Member)

Also decompressStream should probably not live under /web/

@kettanaito (Contributor, Author)

@ronag, can it fail though? The intention was that, if there's no known compression provided, the stream is returned as-is.

You can still handle failures as you normally would:

decompressStream(myStream).on('error', callback)

(Member)

if there's no known compression provided

The logic seems to be that if an unknown coding is sent, the body isn't decompressed at all. I'm not sure if that sounds right either.

Content-Encoding: gzip, zstd, br -> ignored (zstd is not supported)
Content-Encoding: gzip, br -> decompressed

@kettanaito (Contributor, Author)

As a consumer, I expect the body stream to be returned as-is if:

  1. It has no Content-Encoding set (no codings provided as an argument);
  2. It has an unknown Content-Encoding set, or unknown coding(s) are provided as an argument.

I wouldn't expect it to throw, because that's disruptive behavior implying I need to add a guard before calling decompress/decompressStream. At least, not at a level this low.

If the API is potentially disruptive, I need an extra check in place to prevent scenarios where decompression is impossible. This puts extra work on me as a consumer, but it also results in a more fragile API, since the list of supported encodings is baked into Undici and may change across releases.

I strongly suggest returning the body stream as-is in the two cases described above. decompressStream() must never error, save for errors on the stream itself (and those are irrelevant in the context of this function).
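
To illustrate the extra work, the guard every consumer would otherwise need (SUPPORTED here mirrors whatever Undici happens to support at a given release, which is exactly the fragile part):

// Hypothetical consumer-side guard, needed only if the utility threw
// on unknown codings. SUPPORTED is illustrative, not an undici export.
const SUPPORTED = new Set(['gzip', 'x-gzip', 'deflate', 'br'])

const codings = (response.headers.get('content-encoding') ?? '')
  .toLowerCase()
  .split(',')
  .map((coding) => coding.trim())
  .filter(Boolean)

const body = codings.every((coding) => SUPPORTED.has(coding))
  ? decompressStream(response.body, codings)
  : response.body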

@kettanaito (Contributor, Author)

@metcoder95 @KhafraDev do you agree with my reasoning in the post above?

@kettanaito (Contributor, Author)

Hah, so there's an existing expectation that decompression is skipped when an unknown encoding is encountered:

 ✖ should skip decompression if unsupported codings

I suppose that answers the question.

(Member)

throwing an error will make fetch slower

@kettanaito (Contributor, Author) commented Jul 31, 2024

Regarding the decoder-customization discussion, I don't think it justifies the cost of making decompressStream more complex. The only thing you need is the list of codings, which you can get with a one-liner:

 const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())

The purpose of decompressStream is to expose the exact behavior Undici has under the hood. If you need different behavior, you should (1) parse the Content-Encoding header yourself, then (2) map the codings to decompression streams with custom options.

I can see how you'd want to extend or override parts of the default decompression, and for that, perhaps, we can consider exporting those options separately:

module.exports = {
  gzipDecompressionOptions: {
    flush: zlib.constants.Z_SYNC_FLUSH,
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  }
}

Even then, those are all the options Undici uses right now. To me this looks like a nice-to-have without substantial use-case justification; I lean toward iterating on this once we have more of the latter.
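
For illustration, how a consumer could reuse such an export (gzipDecompressionOptions is the hypothetical name from the snippet above; it is not part of undici today):

const zlib = require('node:zlib')
// Hypothetical export from the snippet above, not an existing undici API.
const { gzipDecompressionOptions } = require('undici')

// Build a custom gunzip that stays consistent with fetch's defaults,
// while remaining free to add further options on top.
const gunzip = zlib.createGunzip({ ...gzipDecompressionOptions })
compressedStream.pipe(gunzip).pipe(destination)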

(Member)

Regarding erroring or throwing: as you suggest, @kettanaito, I'd like to fail fast and inform the implementer that the encoding is not supported, without even disturbing the input stream, so fallbacks can be applied if desired.

For fetch, as @KhafraDev points out, throwing might be slower, so we can disable it by default and not throw at all, instead ignoring the unsupported encodings.

The issue I'd like to avoid is putting the implementer in an all-or-nothing spot.

Even then, those are all the options Undici uses right now. To me this looks like a nice-to-have without substantial use-case justification; I lean toward iterating on this once we have more of the latter.

Agree, let's do that 👍

@kettanaito (Contributor, Author) commented Jul 31, 2024

I've added unit tests for the "x-gzip" and "gzip, br" encodings. Both seem to pass 🎉

Also added test cases for "deflate" and "deflate, gzip".

@kettanaito (Contributor, Author)

The CI is failing with some issues downloading the Node.js binary, it seems. It doesn't appear to be related to the changes.

@kettanaito (Contributor, Author)

@KhafraDev, what do you think about decompressStream becoming createDecompressStream, dropping the input stream argument and allowing any stream to be piped through it?

myStream.pipe(createDecompressStream(codings))

I'd love to learn from you about the difference/implications here. Thanks.
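
For illustration, one way such a factory could look, using Node's stream.compose and keeping the pass-through semantics for unknown codings discussed above (a sketch, not a proposal for the final implementation):

const { compose, PassThrough } = require('node:stream')
const zlib = require('node:zlib')

// Sketch of the factory shape: returns a single duplex stream that
// any source can be piped through.
function createDecompressStream (codings) {
  const factories = {
    gzip: () => zlib.createGunzip(),
    'x-gzip': () => zlib.createGunzip(),
    deflate: () => zlib.createInflate(),
    br: () => zlib.createBrotliDecompress()
  }
  // Unknown coding anywhere: skip decompression entirely,
  // matching the pass-through semantics discussed above.
  if (codings.length === 0 || codings.some((c) => !(c in factories))) {
    return new PassThrough()
  }
  // Decode in reverse order of how the codings were applied.
  return compose(...[...codings].reverse().map((c) => factories[c]()))
}

// Usage: myStream.pipe(createDecompressStream(['gzip']))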

@kettanaito kettanaito marked this pull request as ready for review August 1, 2024 13:04
@kettanaito (Contributor, Author) commented Aug 1, 2024

Update

I've added the remaining unit tests for the decompressStream utility; all are passing locally.

Based on the conclusion of the discussion around unknown encodings, and taking into consideration that there's an explicit existing test suite asserting that unknown encodings must be ignored when decompressing the fetch() response body, the way forward looks like leaving the implementation as-is. Please let me know if I misunderstood the conclusion.

I suggest adding the throwing logic to decompressStream once we discover a use case for it (decompression in the interceptor may be that use case). At that point, we can open a new pull request to extend the decompressStream utility with options or anything else we find proper.

With this, I believe this pull request is ready for review cc @KhafraDev @mcollina

* @param {Response} response
* @returns {ReadableStream<Uint8Array> | null}
*/
function decompress (request, response) {
@tsctx (Member) commented Aug 1, 2024

I think decompressStream is sufficient. By this point, the redirect should already have been followed.

@kettanaito (Contributor, Author)

I'd like to have a designated utility that performs the decompression including the handling of redirect responses, etc. decompressStream implies you handle that yourself.

@tsctx (Member)

It looks to me like you are targeting the Fetch API response. In that case, hasn't the redirect already been followed?

@kettanaito (Contributor, Author)

@tsctx, I'm moving the existing content-encoding logic to this utility function. In the existing logic, Undici accepts the response that is the redirect response, not the redirected response. This is correct: the utility must also decide that decompression is unnecessary when the response is a redirect response.

body: decompress(request, {
status,
statusText,
headers: headersList,
(Member)

headersList is not an instance of Headers

@kettanaito (Contributor, Author)

Technically, no, but its APIs are compatible, from what I can see. At least, the entire test suite hasn't proven me wrong.

I can construct a Headers instance out of headersList, but that may have performance implications. Would you advise me to do that?

index.js — review thread (outdated, resolved)
@metcoder95 (Member) left a comment

LGTM, just keeping in mind @tsctx's considerations about the util namespace and the following thread: https://github.com/nodejs/undici/pull/3423/files#r1694215822

@mcollina (Member) commented Aug 2, 2024

Can you please add the types for this and the related type tests?

@tsctx tsctx changed the base branch from main to 3.x August 3, 2024 21:36
@tsctx tsctx changed the base branch from 3.x to main August 4, 2024 02:51
@kettanaito (Contributor, Author)

On a related subject, I've recently learned about the DecompressionStream global API, which is now also available in Node.js. Can it be utilized to the same effect as the API being proposed here?

response.body.pipeThrough(new DecompressionStream('gzip'))

One practical downside is that there's no connection between the response's content encoding and the decompression used. You can also provide only one decompression format ('gzip', 'deflate', 'deflate-raw'), so for response streams encoded with multiple codings, you'd have to create a pipeline of decompression streams.
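
For illustration, such a pipeline could look like this, assuming every coding in the header is one of the formats DecompressionStream supports:

// Sketch: chain one DecompressionStream per coding. Content-Encoding
// lists codings in the order they were applied, so decode in reverse.
// Note that 'br', for instance, is not supported by DecompressionStream.
function decompressBody (body, codings) {
  let stream = body
  for (const coding of [...codings].reverse()) {
    stream = stream.pipeThrough(new DecompressionStream(coding))
  }
  return stream
}

// Usage: const readable = decompressBody(response.body, ['deflate', 'gzip'])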

Does anybody know if there's any difference between DecompressionStream and the content-encoding handling Undici has as of now?

I'd still very much like to be consistent with Undici here, but knowing more would be good.

@@ -130,19 +141,31 @@ const { kConstruct } = require('./lib/web/cache/symbols')
// in an older version of Node, it doesn't have any use without fetch.
module.exports.caches = new CacheStorage(kConstruct)

const { deleteCookie, getCookies, getSetCookies, setCookie } = require('./lib/web/cookies')
const {
@kettanaito (Contributor, Author)

This is the result of running npm run lint:fix on this branch. You may have something misconfigured if it reports diffs.

@kettanaito (Contributor, Author)

Or this branch is behind and main has applied this formatting.

(Member)

please undo them

@kettanaito (Contributor, Author)

Moved the tests to the root level of test/, and now seemingly random tests are failing:

 should include encodedBodySize in performance entry (1.82966ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at fetch (/Users/kettanaito/Projects/contrib/undici/index.js:121:13)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async Server.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/fetch/resource-timing.js:68:18)

The tests are in their own modules; no existing tests were edited. The errors themselves don't seem related to the changes.

Does anybody know what's causing this?

@mcollina (Member)

@kettanaito are you still considering finishing this?

@kettanaito (Contributor, Author)

@mcollina, yes. I've fallen off lately, doing other things. This is still on my todo list, and I'd love to see this merged. I still have those test shenanigans, with no idea what's causing seemingly unrelated tests to fail. I'd appreciate your patience here; it may take me some time to get back to this.

Successfully merging this pull request may close these issues:

  Expose "Content-Encoding" handling publicly