Add explicit shutdown to ActorCache when the Worker::Actor is shutdown #134
src/workerd/io/actor-cache.h
Outdated
I think it'll be easier for me to see on the ew-test side of things, but a user wouldn't actually receive this error if they broke their Actor via OOM/CPU-usage/throw-from-blockConcurrencyWhile, or if something like a storage error happened, right? I assume they'd receive the real underlying error tunneled to their eyeball worker? Otherwise I can see this being very confusing. ("Why is my Actor shutting down? What does shutting down even mean?")
Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shut down because it was broken, implying some other exception was thrown.
Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shut down because it was broken, implying some other exception was thrown.
Yeah, in practice this is only a consequence of marking an Actor as broken. That is the current state of the world internally without any code changes.
a user wouldn't actually receive this error if they broke their Actor via ...
Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running. We want to have this case covered since we don't want to allow any future operations to work. If we had the error from the brokenness in hand, we could use that instead. That's doable, we'd just need to modify Worker::Actor::shutdown() to take an additional argument.
Most notably, code update comes to mind
Ah yeah. I get that we need a fallback error, I just wanted to be sure that they'd get the OOM/CPU/etc error if that were the root cause, and it wasn't obvious to me that that was how it worked.
Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running
Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.
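To make the thread above concrete, here is a minimal standalone sketch of the behavior under discussion. It is not the workerd code: `ToyCache`, its `get`/`put`/`shutdown` methods, and the fallback message are hypothetical names chosen for illustration, and the real ActorCache API and error text may differ. The sketch assumes that shutdown takes an optional underlying error and that every later operation, including a read that would otherwise be served from cache, throws either that error or a generic fallback.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a cache; NOT the real ActorCache.
class ToyCache {
public:
  void put(const std::string& key, const std::string& value) {
    throwIfShutDown();
    entries[key] = value;
  }

  // Even a read that could be answered from memory fails after shutdown,
  // which rules out stale reads (e.g. during a code update).
  std::optional<std::string> get(const std::string& key) const {
    throwIfShutDown();
    auto it = entries.find(key);
    if (it == entries.end()) return std::nullopt;
    return it->second;
  }

  // `reason` carries the underlying error (e.g. an OOM message) when one is
  // known; otherwise an assumed generic fallback message is used.
  void shutdown(std::optional<std::string> reason = std::nullopt) {
    if (shutdownError.has_value()) return;  // the first shutdown wins
    shutdownError = reason.value_or("cache is shutting down (assumed fallback text)");
  }

private:
  void throwIfShutDown() const {
    if (shutdownError.has_value()) throw std::runtime_error(*shutdownError);
  }

  std::map<std::string, std::string> entries;
  std::optional<std::string> shutdownError;
};
```

In this model, calling `shutdown(std::string("out of memory"))` makes later `get` and `put` calls throw "out of memory", while a bare `shutdown()` makes them throw the fallback text. That matches the intent expressed in the thread: the root-cause error is preferred, and the generic message is only a fallback.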
0f6d18e to 581697d
bretthoerner left a comment
Makes sense to me, thanks for weaving the exception through!
It'd be nice to see the simple ew-test equivalent before merging just to be sure this catches things like the write-after-CPU-overload issue, but I only say that because I'm not extremely confident in my ability to just "see" that in ActorCache code. Up to you. :)
581697d to a461d4d
Turns out that we also needed to shut down the actor cache inline with the abort promise, because otherwise the actor cache shutdown was sequenced after the flush promise was enqueued. Fun times!
jasnell left a comment
LGTM but let's make sure @bretthoerner and @a-robinson are both good on it before landing.
a461d4d to bdbcbb4
5babb3c to 7057bce
a-robinson left a comment
Still looks good 👍
Nice test, by the way!
src/workerd/io/worker.c++
Outdated
Just wondering -- in the cases where error is null, would it be at all useful to pass down reasonCode so that ActorCache::shutdown at least has some sort of info to include in the exception it generates?
For better or worse, the set of reasonCodes that lack an associated error are pretty uninformative at the moment: either "unknown reason" or "evicted".
Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.
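The distinction between a scheduled flush and an in-flight flush can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the ActorCache implementation: `ToyWriteBuffer` and all of its methods are hypothetical, and the choice to drop unflushed writes at shutdown is an assumption made to keep the model simple.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a write buffer; NOT the real ActorCache.
class ToyWriteBuffer {
public:
  // Buffer a write and mark a flush as scheduled (not yet started).
  void write(std::string entry) {
    if (isShutDown) return;  // a real cache would surface an error here
    pending.push_back(std::move(entry));
    flushScheduled = true;
  }

  // Start the scheduled flush: the batch leaves the buffer and is in flight.
  void startFlush() {
    if (!flushScheduled) return;  // nothing scheduled, or it was cancelled
    for (auto& e : pending) inFlight.push_back(std::move(e));
    pending.clear();
    flushScheduled = false;
  }

  // Complete the in-flight flush: its batch reaches storage.
  void completeFlush() {
    for (auto& e : inFlight) committed.push_back(std::move(e));
    inFlight.clear();
  }

  // Shutdown cancels the scheduled flush and (by assumption) drops the
  // unflushed writes, but leaves a flush that is already in flight alone.
  void shutdown() {
    isShutDown = true;
    flushScheduled = false;
    pending.clear();
  }

  const std::vector<std::string>& storage() const { return committed; }

private:
  std::vector<std::string> pending;    // buffered, flush only scheduled
  std::vector<std::string> inFlight;   // flush already started
  std::vector<std::string> committed;  // what storage ends up holding
  bool flushScheduled = false;
  bool isShutDown = false;
};
```

For example, writing "a", starting a flush, writing "b", and then calling `shutdown()` leaves only "a" in `storage()` once the in-flight flush completes: "b" was merely scheduled, so it is never committed. Ordering matters in this model too: if `startFlush()` ran before `shutdown()`, "b" would already be in flight and would still be committed.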
7057bce to 8d0058d