Current process crashes when downloading file #89

sescobb27 · 2020-05-12T16:35:55Z

Environment

Elixir & Erlang versions (elixir --version):

Erlang/OTP 22 [erts-10.5] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]

Elixir 1.10.2 (compiled with Erlang/OTP 22)

ExAws version

* ex_aws 2.1.3 (Hex package) (mix)
  locked at 2.1.3 (ex_aws) 0bdbe2ae
* ex_aws_s3 2.0.2 (Hex package) (mix)
  locked at 2.0.2 (ex_aws_s3) 0569f5b2

HTTP client version (e.g. for hackney, run mix deps | grep hackney):

* hackney 1.15.2 (Hex package) (rebar3)
  locked at 1.15.2 (hackney) e0100f8e

Current behavior

Hi, when I try to download multiple files at once I get the error below. It seems to crash the current process instead of returning an error tuple. I think this is because the download operation uses async_stream, which links the tasks to the current process, but I'm not sure that's the reason. See https://github.com/ex-aws/ex_aws_s3/blob/master/lib/ex_aws/s3/download.ex#L71-L93 and, from the docs:

The tasks will be linked to the current process, similarly to async/1.

https://hexdocs.pm/elixir/Task.html#async_stream/5

Besides that, I'm not seeing any other stack trace or error log that would help me diagnose the problem. The current process logs errors, and I also tried rescuing without success, which is why I think the link may be the cause.

May 12 15:35:12 titan-media-parser-01 media_parser[1307]: 15:35:12.779 [warn]  ExAws: HTTP ERROR: :checkout_timeout for URL: "..." ATTEMPT: 10
May 12 15:35:12 titan-media-parser-01 media_parser[1307]: 15:35:12.780 [error] Task #PID<0.7083.0> started from #PID<0.8148.0> terminating
May 12 15:35:12 titan-media-parser-01 media_parser[1307]: ** (ExAws.Error) ExAws Request Error!
May 12 15:35:12 titan-media-parser-01 media_parser[1307]: {:error, :checkout_timeout}
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (ex_aws 2.1.3) lib/ex_aws.ex:66: ExAws.request!/2
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (ex_aws_s3 2.0.2) lib/ex_aws/s3/download.ex:21: ExAws.S3.Download.get_chunk/3
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (ex_aws_s3 2.0.2) lib/ex_aws/s3/download.ex:76: anonymous fn/4 in ExAws.Operation.ExAws.S3.Download.download_to/3
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (elixir 1.10.2) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (elixir 1.10.2) lib/task/supervised.ex:35: Task.Supervised.reply/5
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     (stdlib 3.12) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
May 12 15:35:12 titan-media-parser-01 media_parser[1307]: Function: &:erlang.apply/2
May 12 15:35:12 titan-media-parser-01 media_parser[1307]:     Args: [#Function<0.39970933/1 in ExAws.Operation.ExAws.S3.Download.download_to/3>, [%{end_byte: 16252927999, start_byte: 16200499200}]]
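
For anyone who wants to reproduce the linking behavior without S3, here is a minimal sketch. It assumes nothing from ex_aws: the raising function stands in for a failing chunk request, and the "caller" runs in its own monitored process so the crash can be observed from outside. The names `parent` and `outcome` are only for illustration.

```elixir
parent = self()

# Run the "caller" in a monitored process so its crash does not take down
# the shell or script that runs this example.
{pid, ref} =
  spawn_monitor(fn ->
    try do
      [1, 2, 3]
      # Stand-in for a chunk download that raises.
      |> Task.async_stream(fn _ -> raise "boom" end)
      |> Enum.to_list()
    rescue
      # Not expected to run: the failure reaches the caller as an exit
      # signal over the link, not as an exception raised in this process.
      error -> send(parent, {:rescued, error})
    end
  end)

outcome =
  receive do
    {:rescued, error} -> {:rescued, error}
    {:DOWN, ^ref, :process, ^pid, reason} -> {:caller_died, reason}
  after
    5_000 -> :no_result
  end

IO.inspect(outcome)
```

If the link is the cause, `outcome` should be the `:caller_died` tuple rather than `:rescued`, which would match what the log above shows.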

Expected behavior

The current process should not crash; the call should return an error tuple instead.


sescobb27 · 2020-11-17T15:46:16Z

Hi there, any update on this? Can I help fix it? (I think it would need async_stream_nolink.) Or do you think this is not a problem in the lib? Or should I go the easy way and just trap exits in my processes?
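
As a caller-side stopgap until the library changes, one option is to run the operation in a separate, unlinked process and turn a crash into an error tuple. The sketch below is an assumption-laden illustration: `SafeRun` is a hypothetical helper, not part of ex_aws or ex_aws_s3, and the raising function again stands in for a failing download.

```elixir
defmodule SafeRun do
  # Hypothetical helper: runs `fun` in a monitored (not linked) process and
  # returns {:ok, result} or {:error, reason} instead of crashing the caller.
  def run(fun, timeout \\ 60_000) when is_function(fun, 0) do
    caller = self()
    tag = make_ref()

    {pid, mon} = spawn_monitor(fn -> send(caller, {tag, fun.()}) end)

    receive do
      {^tag, result} ->
        # Drop the :DOWN message that follows the normal exit.
        Process.demonitor(mon, [:flush])
        {:ok, result}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, reason}
    after
      timeout ->
        # A result sent just before the kill may still arrive later;
        # that case is ignored here for brevity.
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])
        {:error, :timeout}
    end
  end
end

ok = SafeRun.run(fn -> 1 + 2 end)

# The linked async_stream tasks now crash the helper process, not the caller.
crashed =
  SafeRun.run(fn ->
    [1] |> Task.async_stream(fn _ -> raise "boom" end) |> Enum.to_list()
  end)

IO.inspect({ok, crashed})
```

In real use, the function passed to `SafeRun.run/2` would wrap the `ExAws.request!/2` call. This isolates the crash, but it does not replace a fix in the library.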

sescobb27 · 2020-11-17T15:49:30Z

The same happens with S3.download_file and with S3.upload.

sescobb27 · 2020-11-17T18:43:01Z

A proposed solution would be something like this. It keeps the current behavior, with the advantage that the error can be rescued.

NOTE: we would need a way to pass the name of the TaskSupervisor, maybe via config.

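    def perform(op, config) do
      with {:ok, op} <- Upload.initialize(op, config) do
        # Number the chunks from 1, since S3 part numbers start at 1.
        stream = Stream.with_index(op.src, 1)

        # TaskSupervisor is a placeholder for the name of a running Task.Supervisor.
        TaskSupervisor
        |> Task.Supervisor.async_stream_nolink(
          stream,
          Upload,
          :upload_chunk!,
          [Map.delete(op, :src), config],
          max_concurrency: Keyword.get(op.opts, :max_concurrency, 4),
          timeout: Keyword.get(op.opts, :timeout, 30_000)
        )
        |> Enum.map(fn
          {:ok, val} -> val
          # A task that raised exits with {exception, stacktrace}; re-raise the
          # exception in the caller, where it can be rescued.
          {:exit, {error, _}} -> raise error
        end)
        |> Upload.complete(op, config)
      end
    end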
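
To show the mechanism in isolation, here is a small runnable sketch of `Task.Supervisor.async_stream_nolink` with one failing element. It makes several assumptions: the supervisor is started ad hoc rather than in a supervision tree, the anonymous function stands in for `upload_chunk!`, and failures are mapped to error tuples instead of being re-raised.

```elixir
# Start an anonymous Task.Supervisor just for this example.
{:ok, sup} = Task.Supervisor.start_link()

results =
  sup
  |> Task.Supervisor.async_stream_nolink([1, 2, 3], fn
    # Stand-in for a chunk whose request fails.
    2 -> raise "boom"
    n -> n * 10
  end)
  |> Enum.map(fn
    {:ok, val} -> {:ok, val}
    # A task that raised exits with {exception, stacktrace}.
    {:exit, {%{__exception__: true} = error, _stack}} -> {:error, error}
    # Any other exit reason, e.g. a killed task.
    {:exit, reason} -> {:error, reason}
  end)

IO.inspect(results)
```

The caller survives the failing task and sees it as an `{:exit, reason}` element in the stream, which is what makes either re-raising or returning an error tuple possible.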

jimsynz · 2021-02-09T19:33:58Z

We're seeing the same thing in our system too:


18:26:50.155 [error] #PID<0.22527.61> running NarrativeService.APIWeb.Endpoint (cowboy_protocol) terminated
--
Server: content1.getnarrativeapp.com:80 (http)
Request: GET /static/***REDACTED***
** (exit) an exception was raised:
** (HTTPoison.Error) :checkout_timeout
(httpoison) lib/httpoison.ex:156: HTTPoison.request!/5
(elixir) lib/stream.ex:1362: anonymous fn/5 in Stream.resource/3
(elixir) lib/enum.ex:2979: Enum.reduce/3
(api) lib/api_web/controllers/image_controller.ex:1: NarrativeService.APIWeb.ImageController.action/2
(api) lib/api_web/controllers/image_controller.ex:1: NarrativeService.APIWeb.ImageController.phoenix_controller_pipeline/2
(api) lib/api_web/endpoint.ex:1: NarrativeService.APIWeb.Endpoint.instrument/4
(phoenix) lib/phoenix/router.ex:278: Phoenix.Router.__call__/1
(api) lib/api_web/endpoint.ex:1: NarrativeService.APIWeb.Endpoint.plug_builder_call/2
18:26:50.782 [warn] ExAws: HTTP ERROR: :checkout_timeout for URL: "https://s3.amazonaws.com/***REDACTED***" ATTEMPT: 5

My first thought was maybe pool exhaustion. Any thoughts on this @edgurgel?

sescobb27 · 2021-02-09T20:10:34Z

@jimsynz I think :checkout_timeout is indeed pool exhaustion. You may need to increase the pool size or not use pooling at all; either can work for you. Keep in mind that even with a larger pool you may hit this error again.
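
A rough sketch of both options, assuming the default hackney client and that ex_aws passes the `:hackney_opts` application config through to hackney. The pool name `:ex_aws_pool` and all numbers are made up for illustration; please check them against the ex_aws and hackney versions in use.

```elixir
# config/config.exs
import Config

# Option A: send ExAws requests through a dedicated, larger hackney pool.
# :ex_aws_pool is a hypothetical name; the pool itself must be started
# separately (see the comment at the bottom).
config :ex_aws, :hackney_opts,
  pool: :ex_aws_pool,
  recv_timeout: 30_000

# Option B: bypass connection pooling entirely.
# config :ex_aws, :hackney_opts, pool: false, recv_timeout: 30_000

# For option A, add the pool to the application's supervision tree, e.g.:
#
#   children = [
#     :hackney_pool.child_spec(:ex_aws_pool, timeout: 15_000, max_connections: 100)
#   ]
```

A dedicated pool also keeps S3 traffic from competing with other hackney users for the default pool's connections.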

jimsynz · 2021-02-09T20:15:46Z

Yeah. Looking at https://github.com/benoitc/hackney/issues/ it looks like there have been a bunch of problems with the default pool of late.
