Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: SelectObjectContentRequest #236

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

avinayak
Copy link

@avinayak avinayak commented Nov 5, 2023

This PR Adds support for SelectObjectContentRequest in S3. This let's users query contents of S3 Objects if the yare of type CSV, JSON or Parquet.
Explanation for some design choices:

  • The output type of this request is a chunked EventStream(a binary format) encoded octet-stream.
  • There is no Range header for this request. So, we cannot use multiple http requests to stream this.
  • The only way to stream this is at the socket level (http streaming), which I added in feat: Add ability to perform octet streaming ex_aws#1012
    @bernardd Can you please check this?

@avinayak avinayak mentioned this pull request Nov 5, 2023
SelectObjectContents struct and add XML
serialization helpers
options for input_serialization,
output_serialization, and scan_range
@avinayak avinayak marked this pull request as ready for review November 9, 2023 00:27
@avinayak
Copy link
Author

avinayak commented Nov 9, 2023

ping @bernardd

describe "integration test" do
setup [:start_bypass]

test "stream SelectObjectContent results", %{bypass: bypass} do
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ex-aws/ex_aws#1012 has to be merged for this test to pass

@seanhuggins1
Copy link

seanhuggins1 commented Dec 7, 2023

This is great work, would love to see this merged.

def to_xml(value) when is_atom(value), do: Atom.to_string(value) |> String.upcase()
def to_xml(value) when is_binary(value), do: value
def to_xml(value) when is_integer(value), do: Integer.to_string(value)
def to_xml(value) when is_float(value), do: Float.to_string(value)
Copy link
Contributor

@bernardd bernardd Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably best not to roll our own XML encoder here - it misses some basic stuff like escaping of control characters. Maybe add something like https://github.com/joshnuss/xml_builder as an optional dependency to be used here?

@bernardd
Copy link
Contributor

Thanks very much for your work on this and the corresponding PR over in ex_aws_s3, @avinayak. Please also accept my apologies for not getting to it sooner. It's been quite a busy end of the year over here.
At the moment it looks like there's a bunch of tests that are failing as a result of the changes to the request family of function signatures. I've also made a note about the XML stuff.
If you're able to take a look at that stuff and rebase it against the latest main, I'll do my best to get it merged as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants