
Race condition in subscriptions ordering #1339

Open
bartlomiej-korpus opened this issue Sep 9, 2024 · 1 comment

bartlomiej-korpus commented Sep 9, 2024

Environment

  • Elixir version (elixir -v): any
  • Absinthe version (mix deps | grep absinthe): 1.7.8
  • Client Framework and version (Relay, Apollo, etc): latest Apollo Client with React

Expected behavior

When several updates are published one after another using Absinthe.Subscription.publish, they should arrive in the order they were sent.

Actual behavior

They might arrive in a different order.

Relevant Schema/Middleware Code

Given a subscription like this:

subscription {
    someEntityUpdates {
        id
        version
    }
}

and code like this:

    Absinthe.Subscription.publish(MyAppWeb.Endpoint, %{id: 1, version: 1}, some_entity_updates: "entity:1")
    # some work, possibly very fast because of cache
    Absinthe.Subscription.publish(MyAppWeb.Endpoint, %{id: 1, version: 2}, some_entity_updates: "entity:1")
    # some more work, possibly very fast because of cache
    Absinthe.Subscription.publish(MyAppWeb.Endpoint, %{id: 1, version: 3}, some_entity_updates: "entity:1")

If consecutive calls to publish happen very quickly, updates to entity 1 might be delivered in the wrong order.

I observe this in a multi-node Phoenix setup using phoenix_pubsub_redis to relay messages between instances. The Phoenix Redis adapter preserves message ordering.

The bug originates in how Absinthe.Subscription routes the mutation_result:

shard = :erlang.phash2(mutation_result, pool_size)

The actual result payload is used to compute the target shard. Since the payloads differ, they most likely hash to different shards, and from that point on the subscription proxies operate concurrently, so ordering cannot be guaranteed.
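For illustration, here is how this plays out in IEx (the shard numbers are made up; the actual values depend on the terms being hashed and on the configured pool_size). Hashing the topic instead of the payload would keep every publish for "entity:1" on the same shard:

    pool_size = 20
    # distinct payloads usually hash to distinct shards
    :erlang.phash2(%{id: 1, version: 1}, pool_size)  # => e.g. 3
    :erlang.phash2(%{id: 1, version: 2}, pool_size)  # => e.g. 17
    # hashing the topic instead always yields the same shard for "entity:1"
    :erlang.phash2("entity:1", pool_size)            # => same value every time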

There is a second place in the code that adds to this problem:

Task.Supervisor.start_child(state.task_super, Subscription.Local, :publish_mutation, [

here the actual channel publish happens asynchronously via Task.Supervisor, which introduces another race condition: even if the payloads are routed to the same shard, once the tasks are spawned there is no guarantee about how the BEAM schedules them. This one, however, can be resolved by flipping the async flag that was recently merged in.
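As a standalone illustration of that second race (deliver here just prints the payload; it stands in for the real publish call and is not Absinthe code), two tasks started back to back may complete in either order:

    # The task started second may run first; per-topic ordering is lost as soon
    # as delivery is handed off to independently scheduled tasks.
    {:ok, task_super} = Task.Supervisor.start_link()
    deliver = fn payload -> IO.inspect(payload, label: "delivered") end
    Task.Supervisor.start_child(task_super, fn -> deliver.(%{id: 1, version: 1}) end)
    Task.Supervisor.start_child(task_super, fn -> deliver.(%{id: 1, version: 2}) end)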

The only workaround I can think of for now is to use pool_size: 1, async: false.
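If it helps, this is roughly how that workaround might look in the application supervision tree. This is only a sketch: it assumes Absinthe.Subscription.child_spec accepts a keyword list with :pubsub, :pool_size and :async, and the :async option is only available in versions that include the recently merged flag:

    children = [
      MyAppWeb.Endpoint,
      {Absinthe.Subscription, pubsub: MyAppWeb.Endpoint, pool_size: 1, async: false}
    ]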

I'd love to help prepare a fix if this is indeed considered a bug and we can agree on a desired solution.

@benwilson512
Contributor

Hey @bartlomiej-korpus, this is definitely tricky in a multi-node situation. I should note that Absinthe cannot provide guarantees that messages are not dropped. I think sharding based on the topic to make race conditions unlikely is a solid plan. I don't see a straightforward way, however, to preserve guaranteed ordering in the async: true scenario, but that may just be a trade-off we need to document.
