Support for S3 Select #23

Open
Ivor opened this issue Jul 4, 2018 · 10 comments

Comments

@Ivor

Ivor commented Jul 4, 2018

Do you guys plan to support S3 Select any time soon?

I've hacked at it a bit and thought it was working until I started validating what I got back. Then I realised the response is chunked and streamed, and to be honest it's over my head at this stage.

It would be fantastic if you could add this functionality.
The API could simply pass through the request XML and expression. The challenging part is parsing the response.

The project is much appreciated, with or without this. Thanks!

@benwilson512
Contributor

I'd welcome a PR for it, but I don't have time to do it myself in the foreseeable future, sorry.

@Ivor
Author

Ivor commented Jul 5, 2018

Understood. I will scratch around a bit and see if there are bits of the existing library that I can reuse. The Download module seems useful, although the S3 Select request is a POST while the download file request is a GET.

If you have any bigger-picture perspective or tips to share, that would be appreciated, but I will see what I can do either way.

Again, much appreciated, the library is very very useful as it is.

@madshargreave

@Ivor Did you proceed with this?

@Ivor
Author

Ivor commented Jan 16, 2019

I played around a bit but ended up not using it. The operation below worked when passed to ExAws.request(operation), as long as there were only a few records. The response can be split on the end-of-line character and each line then parsed as JSON. However, the streaming/chunking aspect escaped me, so this failed on bigger record sets.

%ExAws.Operation.S3{
  body: build_xml(expression),
  bucket: "select-bucket-store",
  headers: %{},
  http_method: :post,
  params: %{},
  parser: &ExAws.Utils.identity/1,
  path: "#{path}?select&select-type=2", # path to the S3 object
  resource: "",
  service: :s3,
  stream_builder: nil
}

I suspect the only useful parts here are that I embedded the query (expression) in correctly formatted XML and that I added select&select-type=2 to the path. Besides that, I think this is just a normal S3 request. I might have needed to build a stream_builder to deal with bigger data sets.

The XML that I built looked like this:

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<SelectRequest>
  <Expression>#{expression}</Expression>
  <ExpressionType>SQL</ExpressionType>
  <InputSerialization>
    <JSON>
      <RecordDelimiter>\n</RecordDelimiter>
      <Type>DOCUMENT</Type>
    </JSON>
  </InputSerialization>
  <OutputSerialization>
    <JSON>
      <RecordDelimiter>\n</RecordDelimiter>
    </JSON>
  </OutputSerialization>
  <RequestProgress>
    <Enabled>FALSE</Enabled>
  </RequestProgress>
</SelectRequest>"

Hope this helps :)
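
One caveat with the XML template above: it interpolates the expression directly, so a query containing <, > or & (for example a WHERE clause with a comparison) would produce malformed XML. A minimal sketch of an escaping variant of build_xml follows. The module name SelectRequestXml and the escape/1 helper are hypothetical and not part of ex_aws_s3; the element layout is copied from the template above.

```elixir
defmodule SelectRequestXml do
  # Hypothetical helper module for illustration; not part of ex_aws_s3.

  # Escape the five XML special characters so the SQL text is safe as
  # element content. "&" must be replaced first, otherwise the entities
  # produced by the later replacements would be escaped a second time.
  def escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
    |> String.replace("'", "&apos;")
  end

  # Same document as the template in the comment above, with the
  # expression escaped before interpolation.
  def build_xml(expression) do
    """
    <?xml version="1.0" encoding="UTF-8"?>
    <SelectRequest>
      <Expression>#{escape(expression)}</Expression>
      <ExpressionType>SQL</ExpressionType>
      <InputSerialization>
        <JSON>
          <RecordDelimiter>\n</RecordDelimiter>
          <Type>DOCUMENT</Type>
        </JSON>
      </InputSerialization>
      <OutputSerialization>
        <JSON>
          <RecordDelimiter>\n</RecordDelimiter>
        </JSON>
      </OutputSerialization>
      <RequestProgress>
        <Enabled>FALSE</Enabled>
      </RequestProgress>
    </SelectRequest>
    """
  end
end
```

The result of SelectRequestXml.build_xml(expression) could then be used as the body of the operation shown earlier.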

@madshargreave

@Ivor thank you, I'll have a stab at the stream_builder :)

@iwarshak

@madshargreave any luck with this?

@joshuataylor

Does any other resource within ex_aws/ex_aws_s3 include a Transfer-Encoding header with chunked as its value in the response?

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTSelectObjectAppendix.html

Trying to do a simple request/response gives me back this as a header:

     {"Transfer-Encoding", "chunked"}

@avinayak

avinayak commented Oct 31, 2023

I am trying to work on this.
EventStream decoding plus async chunk streaming is the hard part. I'm studying how boto does this. I got the request working.

UPDATE on 2/11/2023 -

  • The response from S3 here comes with the headers {"Transfer-Encoding", "chunked"} and {"Content-Type", "application/octet-stream"}.
  • Each chunk is encoded in AWS's EventStream format (a lot of binary decoding).
  • This request does not accept the Range HTTP header, unlike get_object.
  • This means we have to stream chunks as they arrive, with an unknown total size.
  • We'll know when to stop based on the EventStream metadata.
  • I'll have to modify ExAws.Request.Hackney to enable true streaming (it needs to use hackney.stream_body, and hackney.request without the :with_body option).
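
Because HTTP chunk boundaries need not line up with EventStream message boundaries, the incoming chunks have to be buffered and re-cut into whole messages before decoding. The following is a minimal sketch of that step, not the code from any PR in this thread. It assumes only the documented framing rule that every message starts with a 4-byte big-endian total length covering the whole message; the module name EventStreamFramer is hypothetical.

```elixir
defmodule EventStreamFramer do
  # Hypothetical module name; a sketch under the assumptions stated above.

  # Takes any enumerable of binary chunks (of arbitrary sizes) and returns
  # a lazy stream of complete, still undecoded, EventStream messages.
  # Bytes of an incomplete trailing message are silently dropped.
  def frames(chunks) do
    Stream.transform(chunks, <<>>, fn chunk, buffer ->
      split(buffer <> chunk, [])
    end)
  end

  # A valid message is at least 16 bytes: 12-byte prelude + 4-byte trailing CRC.
  defp split(<<total::unsigned-big-32, _::binary>>, _acc) when total < 16 do
    raise ArgumentError, "invalid EventStream message length: #{total}"
  end

  # Enough bytes buffered for one whole message: cut it off and keep going.
  defp split(<<total::unsigned-big-32, _::binary>> = buffer, acc)
       when byte_size(buffer) >= total do
    <<message::binary-size(total), rest::binary>> = buffer
    split(rest, [message | acc])
  end

  # Not enough data yet: emit what we have and carry the remainder forward.
  defp split(buffer, acc), do: {Enum.reverse(acc), buffer}
end
```

The chunks argument could be, for instance, a Stream.resource/3 that repeatedly calls :hackney.stream_body/1 on the request reference until it returns :done; that wiring is left out here because it depends on how ExAws.Request.Hackney ends up being changed.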

UPDATE on 3/11/2023 -

  • I got chunked octet response streaming working. On to decoding EventStream!

    EDIT: cc @bernardd LMK if this sounds good. I'm still working on this. I think I can get a working PR pretty soon.
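
For readers following along, decoding one EventStream message could look roughly like the sketch below. It relies on the layout in the AWS documentation: a prelude of total length, headers length and a CRC32 of those 8 bytes, then the headers, the payload, and a CRC32 over everything before it. It assumes all header values are strings (value type 7), which the S3 Select docs state is the only type used; any other type raises. The module name EventStreamMessage is hypothetical and this is not the code from the PRs.

```elixir
defmodule EventStreamMessage do
  # Hypothetical module name; handles string-typed (type 7) headers only.

  # Decodes exactly one complete message into its headers map and payload.
  def decode(<<total::32, headers_len::32, prelude_crc::32, rest::binary>> = message)
      when byte_size(message) == total and total >= headers_len + 16 do
    payload_len = total - headers_len - 16

    <<headers::binary-size(headers_len), payload::binary-size(payload_len),
      message_crc::32>> = rest

    # The prelude CRC covers the first 8 bytes; the message CRC covers
    # everything except its own trailing 4 bytes.
    with :ok <- check_crc(binary_part(message, 0, 8), prelude_crc),
         :ok <- check_crc(binary_part(message, 0, total - 4), message_crc) do
      {:ok, %{headers: decode_headers(headers, %{}), payload: payload}}
    end
  end

  def decode(_other), do: {:error, :malformed_message}

  defp check_crc(data, expected) do
    if :erlang.crc32(data) == expected, do: :ok, else: {:error, :crc_mismatch}
  end

  defp decode_headers(<<>>, acc), do: acc

  # Header layout: 1-byte name length, name, 1-byte value type (7 = string),
  # 2-byte value length, value.
  defp decode_headers(
         <<name_len::8, name::binary-size(name_len), 7::8, value_len::16,
           value::binary-size(value_len), rest::binary>>,
         acc
       ) do
    decode_headers(rest, Map.put(acc, name, value))
  end
end
```

In S3 Select responses, the ":message-type" and ":event-type" headers indicate what a message carries (for example Records or End), which is how a consumer would know when to stop reading.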

@benwilson512
Contributor

Hi @avinayak I handed off maintenance of this and other ExAws libraries many years ago to @bernardd

@avinayak

avinayak commented Nov 5, 2023

@bernardd I have PRs up for this, one in this repo and one in ex_aws:
#236
ex-aws/ex_aws#1012

avinayak added a commit to hiivemarkets/ex_aws that referenced this issue Nov 13, 2023