Execution can't recover after crash #1440

morph-dev · 2024-09-12T08:22:43Z

While running trin execution, era1 deserialization failed (that failure is unrelated to this issue).

When I tried to resume, it failed shortly afterwards with this error:
Error: database error: not found database error block_hash

After looking a bit more into it, I found the problem.

The BlockExecutor::manage_block_hash_serve_window modifies the db directly after every processed block. If execution crashes (as happened to me) and we try to resume it, the stored block hashes are no longer the correct ones: the db holds the 256 block hashes leading up to the moment of the crash, not the 256 leading up to the saved checkpoint.
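To make the mismatch concrete, here is a minimal simulation. It is a sketch, not the trin code: `HashDb`, `fake_hash`, `update_window`, `missing_hashes` and `missing_after_crash` are hypothetical names, the db is modelled as an in-memory map, and the pruning rule (drop the hash that is 256 blocks old after each block) is an assumption about how the serve window is maintained.

```rust
use std::collections::BTreeMap;

/// Number of recent block hashes that execution may need to look up.
pub const WINDOW: u64 = 256;

/// In-memory stand-in for the block_number -> block_hash table (hypothetical).
pub type HashDb = BTreeMap<u64, [u8; 32]>;

/// Placeholder hash derived from the block number, for illustration only.
pub fn fake_hash(number: u64) -> [u8; 32] {
    let mut hash = [0u8; 32];
    hash[..8].copy_from_slice(&number.to_be_bytes());
    hash
}

/// Assumed per-block behaviour: store the new hash and drop the one that
/// just fell out of the window, writing straight to the db.
pub fn update_window(db: &mut HashDb, number: u64) {
    db.insert(number, fake_hash(number));
    if number >= WINDOW {
        db.remove(&(number - WINDOW));
    }
}

/// Block numbers whose hashes are needed to execute `next_block` but are
/// absent from the db.
pub fn missing_hashes(db: &HashDb, next_block: u64) -> Vec<u64> {
    let start = next_block.saturating_sub(WINDOW);
    (start..next_block).filter(|n| !db.contains_key(n)).collect()
}

/// Executes blocks up to `crash_at`, then reports which hashes are missing
/// when resuming from the block after `checkpoint`.
pub fn missing_after_crash(checkpoint: u64, crash_at: u64) -> Vec<u64> {
    let mut db = HashDb::new();
    for number in 0..=crash_at {
        update_window(&mut db, number);
    }
    missing_hashes(&db, checkpoint + 1)
}
```

With a checkpoint at block 1,000 and a crash at block 1,100, `missing_after_crash(1_000, 1_100)` returns the 100 block numbers 745 through 844: they were pruned while executing past the checkpoint, but resuming at block 1,001 still needs them.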

Possible solutions:

(preferred) Keep track of block hashes in memory and flush them to the db when the rest of the state is flushed.
Before execution starts, make sure we have all required block hashes in the db (and seed them if that's not the case).
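A rough sketch of the preferred option, under stated assumptions: `Db` and `PendingBlockHashes` are hypothetical types, the db is an in-memory map, and in a real implementation the flush would have to be written atomically together with the rest of the state.

```rust
use std::collections::BTreeMap;

pub const WINDOW: u64 = 256;
pub type Hash = [u8; 32];

/// Hypothetical stand-in for the persistent database.
#[derive(Default)]
pub struct Db {
    pub block_hashes: BTreeMap<u64, Hash>,
    pub checkpoint: Option<u64>,
}

/// Block hashes produced since the last flush, kept only in memory.
#[derive(Default)]
pub struct PendingBlockHashes {
    pending: BTreeMap<u64, Hash>,
}

impl PendingBlockHashes {
    /// Called after every processed block; does not touch the db.
    pub fn record(&mut self, number: u64, hash: Hash) {
        self.pending.insert(number, hash);
    }

    /// Looks in the unflushed hashes first, then falls back to the db.
    pub fn get(&self, db: &Db, number: u64) -> Option<Hash> {
        self.pending
            .get(&number)
            .or_else(|| db.block_hashes.get(&number))
            .copied()
    }

    /// Called when the rest of the state is flushed at block `checkpoint`.
    pub fn flush(&mut self, db: &mut Db, checkpoint: u64) {
        // Move every pending hash into the db, leaving `pending` empty.
        db.block_hashes.append(&mut self.pending);
        // Keep only the hashes needed to execute block `checkpoint + 1`.
        let keep_from = (checkpoint + 1).saturating_sub(WINDOW);
        db.block_hashes = db.block_hashes.split_off(&keep_from);
        db.checkpoint = Some(checkpoint);
    }
}
```

If the process dies between flushes, only `pending` is lost, so the hashes in the db always match the saved checkpoint.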


morph-dev · 2024-09-12T15:12:41Z

Alternatively, we can just never delete block_number->block_hash entries from the db. Clearly not the most optimized solution, but definitely the easiest one.

It's only ~64 bytes per block, so it's not the end of the world (a total of ~1.2 GB for the entire chain at the moment).
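The estimate can be checked with a quick calculation. The block count of 20 million used below is an assumed round figure for the chain height at the time, not a number from this thread, and both function names are made up for the example.

```rust
/// Total storage in bytes if no block hash entry is ever deleted.
pub fn block_hash_storage_bytes(block_count: u64, bytes_per_block: u64) -> u64 {
    block_count * bytes_per_block
}

/// Converts bytes to decimal gigabytes (1 GB = 10^9 bytes).
pub fn bytes_to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}
```

With 20,000,000 blocks at 64 bytes each this gives 1,280,000,000 bytes, or 1.28 GB, which is in line with the rough figure above.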

KolbyML · 2024-09-12T15:30:11Z

I think the right solution is to change from RocksDB to LMDB or MDBX. They are both ACID compliant, so a crash wouldn't leave us with a problem: we could set it up to commit everything only once we are done with the full block execution cycle.

That is better than one-off solutions like the ones listed above, which won't solve the root problem.

KolbyML · 2024-09-14T17:55:35Z

#1451 (comment)
#1451 (comment)

Additional comments I made on this problem, and why switching to an ACID database solves them

morph-dev · 2024-09-15T08:25:10Z

Why can't we use RocksDB? Instead of using rocksdb::DB, we can use rocksdb::TransactionDB or rocksdb::OptimisticTransactionDB.
Difference between transaction and optimistic transaction can be found here: https://github.com/facebook/rocksdb/wiki/Transactions .

I think in our case, we can even use rocksdb::DB::write. Might be the simplest solution.
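The idea behind the batched write, sketched with the standard library only so it runs without RocksDB: `Batch` and `Store` are hypothetical stand-ins, and atomicity is only modelled here (the store changes in `write` and nowhere else), whereas a real database provides it durably. As far as I know, in the Rust rocksdb crate this corresponds to filling a `WriteBatch` and passing it to `DB::write`.

```rust
use std::collections::BTreeMap;

/// One buffered operation.
pub enum Op {
    Put(Vec<u8>, Vec<u8>),
    Delete(Vec<u8>),
}

/// Buffered writes; nothing reaches the store until `Store::write` is called.
#[derive(Default)]
pub struct Batch {
    ops: Vec<Op>,
}

impl Batch {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.ops.push(Op::Put(key.to_vec(), value.to_vec()));
    }

    pub fn delete(&mut self, key: &[u8]) {
        self.ops.push(Op::Delete(key.to_vec()));
    }
}

/// In-memory stand-in for the key-value database.
#[derive(Default)]
pub struct Store {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Store {
    /// Applies the whole batch in one step. A batch that is dropped before
    /// this call (for example because of a crash) changes nothing.
    pub fn write(&mut self, batch: Batch) {
        for op in batch.ops {
            match op {
                Op::Put(key, value) => {
                    self.data.insert(key, value);
                }
                Op::Delete(key) => {
                    self.data.remove(&key);
                }
            }
        }
    }

    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|value| value.as_slice())
    }
}
```

Putting the block hash updates and the checkpoint marker into the same batch means the db can never contain one without the other.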

KolbyML · 2024-09-15T15:17:30Z

Erigon has a write-up here:

https://github.com/erigontech/erigon/wiki/Choice-of-storage-engine

They tried about 5 different database solutions before ending up with MDBX.

They say it isn't ACID,

> Why can't we use RocksDB? Instead of using rocksdb::DB, we can use rocksdb::TransactionDB or rocksdb::OptimisticTransactionDB. Difference between transaction and optimistic transaction can be found here: https://github.com/facebook/rocksdb/wiki/Transactions .
>
> I think in our case, we can even use rocksdb::DB::write. Might be the simplest solution.

This looks like a good start, as it seems more reliable than our current solution, but because various projects have pointed out issues with it, I am inclined to think it is a bad choice long term.

morph-dev added the shelf-stable (Will not be closed by stale-bot) and trin execution labels Sep 12, 2024

morph-dev mentioned this issue Sep 12, 2024: feat: add support for async transaction execution #1439 (Merged)

morph-dev mentioned this issue Sep 14, 2024: feat(trin-execution): allow ctrl+c to immediately stop execution #1451 (Merged)
