Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(decoder): decode multi-member gzip data #3270

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lizeyan
Copy link

@lizeyan lizeyan commented Aug 16, 2024

Summary

#3269

Allow GZipDecoder to decode multi-member gzip data

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@lizeyan lizeyan force-pushed the master branch 2 times, most recently from 8cc8437 to d7e2e1b Compare August 23, 2024 06:19
@@ -9,6 +9,7 @@
import zstandard as zstd

import httpx
from httpx._decoders import GZipDecoder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using a private import, could we switch this to use the same style as the other test cases here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the style of the test case

Comment on lines 76 to +77
self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
self.state = GzipDecoderState.FIRST_MEMBER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's an implementation using something like...

self._buffer = io.BytesIO()
self._reader = gzip.GzipFile(fileobj=self._buffer, mode='r')

Would using GzipFile be more robust here / catch other cases?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that zlib is slightly faster than gzip

import gzip
import zlib
import time
import random
import string

def generate_random_data(size):
    return ''.join(random.choices(string.ascii_letters + string.digits, k=size)).encode()

def test_gzip_decompress(data, iterations):
    compressed = gzip.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        gzip.decompress(compressed)
    end_time = time.time()
    return end_time - start_time

def test_zlib_decompress(data, iterations):
    compressed = zlib.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        zlib.decompress(compressed)
    end_time = time.time()
    return end_time - start_time

# 测试参数
data_sizes = [1000, 10000, 100000]
iterations = 10000

print("Testing gzip and zlib decompression performance:")
print("------------------------------------------------")

for size in data_sizes:
    print(f"\nTesting with data size: {size} bytes")
    data = generate_random_data(size)
    
    gzip_time = test_gzip_decompress(data, iterations)
    zlib_time = test_zlib_decompress(data, iterations)
    
    print(f"gzip decompression time: {gzip_time:.4f} seconds")
    print(f"zlib decompression time: {zlib_time:.4f} seconds")
    
    if gzip_time < zlib_time:
        print(f"gzip is faster by {(zlib_time / gzip_time - 1) * 100:.2f}%")
    else:
        print(f"zlib is faster by {(gzip_time / zlib_time - 1) * 100:.2f}%")

print("\nTest completed.")
Testing gzip and zlib decompression performance:
------------------------------------------------

Testing with data size: 1000 bytes
gzip decompression time: 0.0503 seconds
zlib decompression time: 0.0386 seconds
zlib is faster by 30.47%

Testing with data size: 10000 bytes
gzip decompression time: 0.2463 seconds
zlib decompression time: 0.2337 seconds
zlib is faster by 5.38%

Testing with data size: 100000 bytes
gzip decompression time: 2.3218 seconds
zlib decompression time: 2.3094 seconds
zlib is faster by 0.54%

Test completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants