Ignore stale readable callbacks after replica sync handoff #3348
sarthakaggarwal97 wants to merge 2 commits into valkey-io:unstable
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #3348 +/- ##
============================================
+ Coverage 74.98% 75.11% +0.12%
============================================
Files 129 129
Lines 71633 71636 +3
============================================
+ Hits 53715 53810 +95
+ Misses 17918 17826 -92
I was chatting with @nitaicaro offline for this PR, and ideally
@alon-arenberg - some more context - looking at the TLS + ae interaction, it looks like |
I noticed this crash in yesterday's daily run: https://github.com/valkey-io/valkey/actions/runs/22880939835/job/66383301850#step:10:7530
During sync, the main connection starts with `syncWithPrimary()` as its read handler. After the PSYNC reply is processed, the code swaps that handler to the RDB transfer path and sets replication to `REPL_STATE_TRANSFER`.

The race could be that a readable event for the old handler is already queued before the handler swap completes. When that stale callback later fires, `syncWithPrimary()` runs one more time even though the replica is already in the transfer phase.
The patch fixes two things -
- Ignore stale `syncWithPrimary()` callbacks once replication is already in `REPL_STATE_TRANSFER` or `REPL_STATE_CONNECTED`

Passing CI Run 200 times - https://github.com/sarthakaggarwal97/valkey/actions/runs/22888409606/job/66406230369
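
For reviewers who want to reason about the race in isolation, here is a minimal toy model of it, not the actual Valkey code. Everything prefixed `toy_` is hypothetical: the struct, the handler names, and the way the "event loop" captures handler pointers before dispatching are assumptions made only to reproduce the shape of the problem. The toy states stand in for the handshake phase, `REPL_STATE_TRANSFER` and `REPL_STATE_CONNECTED`. In the toy, a stale run is only counted; in the real server it is the path that crashed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy replication states (illustrative stand-ins, not the real enum). */
typedef enum { TOY_REPL_HANDSHAKE, TOY_REPL_TRANSFER, TOY_REPL_CONNECTED } toy_repl_state;

typedef struct toy_replica toy_replica;
typedef void (*toy_handler)(toy_replica *r);

struct toy_replica {
    toy_repl_state state;
    toy_handler read_handler;  /* handler currently installed on the connection */
    int guard_enabled;         /* 1 = apply the early-return guard */
    int stale_handshake_runs;  /* handshake logic entered after the handoff */
    int transfer_reads;        /* reads served by the transfer-phase handler */
};

/* Stand-in for the RDB transfer read path. */
static void toy_read_sync_payload(toy_replica *r) {
    r->transfer_reads++;
}

/* Stand-in for syncWithPrimary(). */
static void toy_sync_with_primary(toy_replica *r) {
    /* The guard: a callback arriving after the handoff is stale, so ignore it. */
    if (r->guard_enabled &&
        (r->state == TOY_REPL_TRANSFER || r->state == TOY_REPL_CONNECTED)) {
        return;
    }
    if (r->state != TOY_REPL_HANDSHAKE) {
        /* Without the guard, handshake logic runs in the wrong phase. */
        r->stale_handshake_runs++;
        return;
    }
    /* "PSYNC reply processed": swap the handler and move to the transfer phase. */
    r->read_handler = toy_read_sync_payload;
    r->state = TOY_REPL_TRANSFER;
}

/* Dispatch two readable events that were both queued while the old handler
 * was still installed. Returns how often the stale path was entered. */
int toy_run_handoff(int guard_enabled) {
    toy_replica r = { TOY_REPL_HANDSHAKE, toy_sync_with_primary, guard_enabled, 0, 0 };
    toy_handler pending[2] = { r.read_handler, r.read_handler };
    for (size_t i = 0; i < 2; i++) {
        assert(pending[i] != NULL);
        pending[i](&r);  /* the second call fires after the swap */
    }
    return r.stale_handshake_runs;
}
```

Calling `toy_run_handoff(0)` reports one stale run, while `toy_run_handoff(1)` reports none, which is the behaviour the state check is meant to give.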