Consensus operation timeout issue #100

Open · zafercavdar opened this issue Nov 30, 2023 · 4 comments

Comments

zafercavdar commented Nov 30, 2023

I'm running the Qdrant vector database engine in Google Kubernetes Engine. It was deployed with the official Helm chart, with minor modifications in values.yaml. It was running 3 replicas and I decided to increase that to 5. I ran helm upgrade, which created 2 more replicas, and all of them are in the Running state. Searches on existing collections return results immediately. However, I can't create a new collection with the client.recreate_collection method.

Python client version: 1.0.5
Qdrant Docker image version: 1.2.0

Here is the Python traceback:

    self.client.recreate_collection(
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/qdrant_client.py", line 1712, in recreate_collection
    self.delete_collection(collection_name)
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/qdrant_client.py", line 1646, in delete_collection
    result: Optional[bool] = self.http.collections_api.delete_collection(
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/http/api/collections_api.py", line 788, in delete_collection
    return self._build_for_delete_collection(
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/http/api/collections_api.py", line 268, in _build_for_delete_collection
    return self.api_client.request(
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/http/api_client.py", line 68, in request
    return self.send(request, type_)
  File "/opt/conda/default/lib/python3.8/site-packages/qdrant_client/http/api_client.py", line 91, in send
    raise UnexpectedResponse.for_response(response)
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 500 (Internal Server Error)
Raw response content:
b'{"status":{"error":"Service internal error: Waiting for consensus operation commit failed. Timeout set at: 10 seconds"},"time":10.014906989}'

Kubernetes logs:

[2023-11-30T22:27:50.775Z WARN  qdrant::actix::helpers] error processing request: Waiting for consensus operation commit failed. Timeout set at: 10 seconds
[2023-11-30T22:27:50.775Z INFO  actix_web::middleware::logger] 10.40.0.105 "DELETE /collections/3d995dc0510a4d59839d18acdaea4930 HTTP/1.1" 500 133 "-" "python-httpx/0.23.3" 10.015499

How can I fix this issue?
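The raw response body in the traceback above is JSON. That means a client can distinguish this consensus timeout from other 500 errors and retry the call once the cluster has settled. Below is a minimal sketch, assuming the {"status": {"error": ...}, "time": ...} shape shown in the log. The helper names is_consensus_timeout and retry_on_consensus_timeout are hypothetical and not part of qdrant_client. It is also an assumption that the client exception exposes the raw body as a .content attribute, which is why the extractor can be swapped out.

```python
import json
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Substrings taken from the two error messages quoted in this thread.
_CONSENSUS_MARKERS = (
    "Waiting for consensus operation commit failed",
    "leader is not established",
)


def is_consensus_timeout(raw_body: bytes) -> bool:
    """Return True if a Qdrant error body reports a consensus commit/leader timeout.

    Assumes the body shape seen in the log: {"status": {"error": "..."}, "time": ...}.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        return False
    status = payload.get("status") if isinstance(payload, dict) else None
    if not isinstance(status, dict):
        return False
    error = str(status.get("error", ""))
    return any(marker in error for marker in _CONSENSUS_MARKERS)


def _default_body(exc: Exception) -> Optional[bytes]:
    # Assumption: the client exception carries the raw response bytes as `.content`.
    body = getattr(exc, "content", None)
    return body if isinstance(body, bytes) else None


def retry_on_consensus_timeout(
    call: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 5.0,
    get_body: Callable[[Exception], Optional[bytes]] = _default_body,
) -> T:
    """Run `call`, retrying only when it fails with a consensus timeout."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            body = get_body(exc)
            retryable = body is not None and is_consensus_timeout(body)
            if not retryable or attempt == attempts:
                raise  # non-consensus errors, or the final attempt, propagate unchanged
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

To use it, wrap the failing call, for example retry_on_consensus_timeout(lambda: client.recreate_collection(...)). A retry can only smooth over a transient leader change. If the cluster has no Raft leader at all, every attempt will time out the same way.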

zafercavdar changed the title from "Distributed deployment upgrade issue" to "Consensus operation timeout issue" on Nov 30, 2023

zafercavdar (author) commented Dec 1, 2023

Status update: I tested the following versions:

  • Qdrant image: 1.2.0, 1.2.2, 1.3.2, 1.4.1, 1.5.1, 1.6.1
  • Python client: 1.0.5, 1.2.0, 1.3.0, 1.5.0

The issue persists with all of them.

zafercavdar (author) commented:

Adding a few more logs in case they're helpful:

2023-12-01T15:04:48.340752Z  WARN qdrant::actix::helpers: error processing request: Failed to propose operation: leader is not established within 10 secs    
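"Leader is not established" points at the Raft layer rather than at the collection itself. One way to narrow the problem down is to ask every pod which peer it believes is the leader. Below is a minimal sketch that queries a node's cluster-status endpoint. It assumes the endpoint is GET /cluster on the REST port and that its response nests raft_info.leader, raft_info.role and peers under "result". Both the endpoint layout and the helper names are assumptions to check against your Qdrant version.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def fetch_cluster_status(base_url: str, timeout_s: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/cluster from one Qdrant node and return the decoded JSON."""
    url = base_url.rstrip("/") + "/cluster"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read())


def summarize_raft(cluster_status: Dict[str, Any]) -> Dict[str, Any]:
    """Extract Raft leadership details from a /cluster response.

    Assumed layout: {"result": {"peer_id": ..., "peers": {...},
                     "raft_info": {"leader": ..., "role": ..., "term": ...}}}
    """
    result = cluster_status.get("result") or {}
    raft = result.get("raft_info") or {}
    leader: Optional[int] = raft.get("leader")
    return {
        "peer_id": result.get("peer_id"),
        "known_peers": len(result.get("peers") or {}),
        "role": raft.get("role"),
        "term": raft.get("term"),
        "leader": leader,
        # Peer ids are integers, so compare against None rather than truthiness.
        "leader_established": leader is not None,
    }
```

Run summarize_raft(fetch_cluster_status(url)) once per pod. The pod URLs depend on your Helm release and are not shown in this thread. If the pods disagree on the leader, report no leader, or list a different number of peers than the 5 running replicas, the new replicas probably did not join the existing consensus cleanly.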

nrakover commented:

I'm running into what I believe is the same issue, except that we're not actively scaling our Qdrant deployment; it happens sporadically during normal usage. I'll investigate whether some concurrent autoscaling was going on.

Ghulam-pan commented:

We faced the same issue and fixed it by increasing the minimum replica count (minReplicas) in the HorizontalPodAutoscaler (HPA) to 3.
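Raft elects a leader and commits an operation only when a strict majority of its voting members agree. That offers a plausible, though unconfirmed in this thread, reading of why a minimum of 3 replicas helps. With 1 or 2 voters, losing a single pod already loses the majority. The quorum is also counted over the registered voters, not just the pods that are currently running. So if an autoscaler removes pods that are still registered as peers, the survivors may be too few to elect a leader. The sketch below only does the majority arithmetic. The function names are illustrative.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` Raft members."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority is still reachable."""
    return voters - quorum_size(voters)


def has_quorum(registered_voters: int, reachable_voters: int) -> bool:
    """True if the reachable voters still form a majority of the registered ones."""
    return reachable_voters >= quorum_size(registered_voters)


for n in (1, 2, 3, 5):
    print(f"{n} voters: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

The arithmetic shows that 2 voters tolerate no more failures than 1 does, and 3 is the smallest cluster that survives losing a pod. It also shows that a 5-peer cluster scaled down to 2 running pods has no quorum (has_quorum(5, 2) is False).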
