Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] decompress workaround #37488

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vincenttran-msft
Copy link
Member

draft

@github-actions github-actions bot added the Storage Storage Service (Queues, Blobs, Files) label Sep 19, 2024
@vincenttran-msft
Copy link
Member Author

vincenttran-msft commented Sep 20, 2024

For example, take test_put_blob_empty
Running it vanilla and repro-ing your failures in your PR gives you this stacktrace:

test_blob_encryption_v2.py:442: in test_put_blob_empty
    data = blob.download_blob().readall()
..\..\..\core\azure-core\azure\core\tracing\decorator.py:94: in wrapper_use_tracer
    return func(*args, **kwargs)
..\azure\storage\blob\_blob_client.py:744: in download_blob
    return StorageStreamDownloader(**options)
..\azure\storage\blob\_download.py:381: in __init__
    self._get_encryption_data_request()
..\azure\storage\blob\_download.py:429: in _get_encryption_data_request
    properties = cast("BlobProperties", self._clients.blob.get_properties(**self._request_options))
..\..\..\core\azure-core\azure\core\tracing\decorator.py:94: in wrapper_use_tracer
    return func(*args, **kwargs)
..\azure\storage\blob\_generated\operations\_blob_operations.py:1893: in get_properties
    pipeline_response: PipelineResponse = self._client._pipeline.run(  # pylint: disable=protected-access
..\..\..\core\azure-core\azure\core\pipeline\_base.py:229: in run
    return first_node.send(pipeline_request)
..\..\..\core\azure-core\azure\core\pipeline\_base.py:86: in send

Then the rest is entering a chain of send from passing around through all our pipeline policies.

Most critically, observe

..\azure\storage\blob\_download.py:381: in __init__
    self._get_encryption_data_request()

This codepath is what differs the encryption downloads from all regular downloads (which seem to be working fine), as we need to do some extra verification and setup for encryption. If we follow the implementation of _get_encryption_data_request()

def _get_encryption_data_request(self) -> None:
        # Save current request cls
        download_cls = self._request_options.pop('cls', None)
        # Adjust cls for get_properties
        self._request_options['cls'] = deserialize_blob_properties

        properties = cast("BlobProperties", self._clients.blob.get_properties(**self._request_options))
        # This will return None if there is no encryption metadata or there are parsing errors.
        # That is acceptable here, the proper error will be caught and surfaced when attempting
        # to decrypt the blob.
        self._encryption_data = parse_encryption_data(properties.metadata)

        # Restore cls for download
        self._request_options['cls'] = download_cls

There is an extra network call to get_properties to retrieve the metadata so that we can pass this as:
self._encryption_data = parse_encryption_data(properties.metadata)

Here we can see why we are getting the error:
E TypeError: Session.request() got an unexpected keyword argument 'decompress'

This function calls into the generated code's get_properties function, which does NOT support decompress (as download does).

And so, this PR is a POC that, we can simply chop off decompression before we make the get_properties call, then we can put it back on before it hands the request options back for the download call. Let's wait on Jacob to come back next week to confirm my hunch 😄
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant