Race condition on concurrent incremental tables: other procedure is run or "Procedure is not found" error #1691

Open
yan-hic opened this issue Mar 19, 2024 · 2 comments

Comments

@yan-hic

yan-hic commented Mar 19, 2024

Not sure if this is BigQuery-specific, but currently DF generates a hash value to name the temporary procedure that constructs the MERGE or INSERT DML for an incremental table. I have learned that this hash is based (exclusively?) on the execution time, probably rounded to seconds at best.
This creates a problem if the same step from another workflow execution runs at the (exact) same time, because the same name is used.
The result is either that the wrong procedure is run, or a "Procedure is not found" error if the "first" execution dropped the procedure before the "second" calls it.
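
To make the two failure modes concrete, here is a minimal sketch of the suspected interleaving. It is only an illustration under stated assumptions: `temp_proc_name`, `FakeWarehouse`, and the `df_tmp_` prefix are hypothetical names, the warehouse is an in-memory stand-in for BigQuery, and the time-based hash is the behavior guessed at above, not a confirmed Dataform implementation.

```python
import hashlib


def temp_proc_name(table: str, epoch_seconds: int) -> str:
    """Hypothetical naming scheme: hash of the table name and a
    second-resolution timestamp (assumed, not confirmed from Dataform)."""
    digest = hashlib.sha256(f"{table}:{epoch_seconds}".encode()).hexdigest()
    return f"df_tmp_{digest[:16]}"


class FakeWarehouse:
    """In-memory stand-in for a procedure catalog (not a BigQuery client)."""

    def __init__(self) -> None:
        self.procedures: dict[str, str] = {}

    def create_or_replace(self, name: str, body: str) -> None:
        self.procedures[name] = body

    def call(self, name: str) -> str:
        if name not in self.procedures:
            raise LookupError(f"Procedure is not found: {name}")
        return self.procedures[name]

    def drop(self, name: str) -> None:
        self.procedures.pop(name, None)


wh = FakeWarehouse()
now = 1710806400  # both executions start within the same second

# Two workflow executions of the same step derive the same name.
name_a = temp_proc_name("orders", now)
name_b = temp_proc_name("orders", now)

# Execution A creates its procedure, then B overwrites it.
wh.create_or_replace(name_a, "MERGE from execution A")
wh.create_or_replace(name_b, "MERGE from execution B")

# Failure mode 1: A calls "its" procedure but runs B's body.
ran_for_a = wh.call(name_a)

# A cleans up, dropping the shared procedure.
wh.drop(name_a)

# Failure mode 2: B now finds nothing to call.
try:
    wh.call(name_b)
    error_for_b = None
except LookupError as exc:
    error_for_b = str(exc)
```

The interleaving is written out sequentially so that it is deterministic; in practice it would only show up when two executions happen to overlap.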

Unless I am oversimplifying, this can be easily fixed by generating a UUID instead of a deterministic hash for the temporary procedure name created by DF.
I raised this through other channels to no avail, so I am hoping for more traction here.
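
A rough sketch of the UUID-based naming suggested here could look as follows. The function name `unique_temp_proc_name` and the `df_tmp` prefix are made up for illustration; this is not Dataform's code, and where the name is actually assigned is still an open question.

```python
import uuid


def unique_temp_proc_name(prefix: str = "df_tmp") -> str:
    """Build a procedure name from a random UUID (version 4).

    The hex form contains only [0-9a-f], so the result has no hyphens
    and stays a plain identifier without extra quoting.
    """
    return f"{prefix}_{uuid.uuid4().hex}"


# Generating many names, as concurrent executions of the same step would,
# does not rely on the clock, so same-second starts no longer share a name.
names = {unique_temp_proc_name() for _ in range(10_000)}
```

Because the name no longer depends on the execution time, each execution would create, call, and drop only its own procedure.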

@Ekrekr
Contributor

Ekrekr commented Aug 16, 2024

IIRC this isn't an issue in the GCP API, so I'm guessing this is a bug in the CLI?

Could you give us a minimal reproducible example? It's not immediately obvious where in the code this happens.

Thanks!

@yan-hic
Author

yan-hic commented Aug 16, 2024

This is in the GCP frontend; I am not using the DF CLI. So maybe this is in Google's hands, but their issue tracker gets little to no traction, hence I gave it a shot here.

It is difficult to give an example, as the problem is only reproducible in high-concurrency scenarios. The Google Dataform forum has several posts reporting the same "name collision". As to where in the code, I assume it is where the temp procedure gets its name. As said, that name should be non-deterministic, i.e. truly unique. I don't know whether the temp name is assigned by BigQuery or by Dataform.
