Add explicit shutdown to ActorCache when the Worker::Actor is shutdown by bcaimano · Pull Request #134 · cloudflare/workerd

bcaimano · 2022-10-27T20:38:58Z

Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.
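To make the behavior change concrete, here is a minimal sketch of a write-buffering cache whose shutdown cancels the scheduled flush while leaving in-flight batches alone. `ToyCache` and all of its members are hypothetical names invented for this illustration; this is not workerd's `ActorCache`, and dropping the buffered writes on shutdown is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a write-buffering cache; not workerd's ActorCache.
class ToyCache {
public:
  using Batch = std::map<std::string, std::string>;

  // Buffers a write and schedules a flush. Fails once the cache is shut down.
  void put(const std::string& key, const std::string& value) {
    if (shutdownReason) throw std::runtime_error(*shutdownReason);
    dirty[key] = value;
    flushScheduled = true;
  }

  // Simulates the scheduler firing. Returns true if a batch went in flight.
  bool runScheduledFlush() {
    if (!flushScheduled) return false;
    flushScheduled = false;
    inFlight.push_back(std::move(dirty));
    dirty.clear();  // leave the moved-from buffer in a known empty state
    return true;
  }

  // Simulates storage acknowledging every batch that is already in flight.
  void completeInFlight() {
    for (const auto& batch : inFlight) {
      for (const auto& [key, value] : batch) committed[key] = value;
    }
    inFlight.clear();
  }

  // Cancels the scheduled flush and drops buffered writes (an assumption of
  // this sketch). Batches that are already in flight are not touched.
  void shutdown(std::string reason) {
    if (shutdownReason) return;
    shutdownReason = std::move(reason);
    flushScheduled = false;
    dirty.clear();
  }

  const Batch& storage() const { return committed; }

private:
  Batch dirty;
  std::vector<Batch> inFlight;
  Batch committed;
  bool flushScheduled = false;
  std::optional<std::string> shutdownReason;
};

// Usage: "a" is in flight at shutdown and still commits; "b" is only buffered.
void runExample() {
  ToyCache cache;
  cache.put("a", "1");
  assert(cache.runScheduledFlush());
  cache.put("b", "2");
  cache.shutdown("actor is broken");
  assert(!cache.runScheduledFlush());  // the scheduled flush was cancelled
  cache.completeInFlight();
  assert(cache.storage().count("a") == 1);
  assert(cache.storage().count("b") == 0);
}
```

In this sketch the write that was only buffered never reaches storage, which mirrors the intent described above, while the batch that had already left the buffer still lands.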

bretthoerner · 2022-10-28T15:49:49Z

src/workerd/io/actor-cache.h

I think it'll be easier for me to see on the ew-test side of things, but a user wouldn't actually receive this error if they broke their Actor via OOM/CPU-usage/throw-from-blockConcurrencyWhile, or if something like a storage error happened, right? I assume they'd receive the real underlying error tunneled to their eyeball worker? Otherwise I can see this being very confusing. ("Why is my Actor shutting down? What does shutting down even mean?")

Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.

> Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.

Yeah, in practice this is only a consequence of marking an Actor as broken. That is the current state of the world internally without any code changes.

> a user wouldn't actually receive this error if they broke their Actor via ...

Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running. We want to have this case covered since we don't want to allow any future operations to work. If we had the error from the brokenness in hand, we could use that instead. That's doable, we'd just need to modify Worker::Actor::shutdown() to take an additional argument.
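The idea of passing the brokenness error into shutdown, with a generic fallback when none is available, could look roughly like the sketch below. `ToyStorage`, its methods, and the fallback message are all hypothetical; the real `Worker::Actor::shutdown()` signature is not shown in this thread and is not reproduced here.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical sketch; these names do not match workerd's real classes.
class ToyStorage {
public:
  // After shutdown, every operation fails with the supplied error, or with a
  // generic fallback when the caller has no underlying error to pass along.
  void shutdown(std::optional<std::string> error = std::nullopt) {
    if (broken) return;  // the first recorded cause wins
    broken = error.value_or("storage is shutting down");
  }

  std::string get(const std::string& key) const {
    if (broken) throw std::runtime_error(*broken);
    return "value-of-" + key;
  }

private:
  std::optional<std::string> broken;
};

// Returns the message a caller would see from get() after shutdown.
std::string errorSeenAfterShutdown(std::optional<std::string> error) {
  ToyStorage storage;
  storage.shutdown(std::move(error));
  try {
    storage.get("key");
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "";
}

void runExample() {
  // A known root cause is surfaced as-is.
  assert(errorSeenAfterShutdown(std::string("exceeded memory limit")) ==
         "exceeded memory limit");
  // A shutdown without an error (for example, a code update) falls back.
  assert(errorSeenAfterShutdown(std::nullopt) == "storage is shutting down");
}
```

With this shape, a user whose actor broke for a specific reason sees that reason, and the generic message only appears when nothing better is known.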

> Most notably, code update comes to mind

Ah yeah. I get that we need a fallback error, I just wanted to be sure that they'd get the OOM/CPU/etc error if that were the root cause, and it wasn't obvious to me that that was how it worked.

bretthoerner

Makes sense to me, thanks for weaving the exception through!

It'd be nice to see the simple ew-test equivalent before merging just to be sure this catches things like the write-after-CPU-overload issue, but I only say that because I'm not extremely confident in my ability to just "see" that in ActorCache code. Up to you. :)

bcaimano · 2022-10-28T20:45:39Z

Turns out that we also needed to shut down the actor cache inline to the abort promise, because otherwise the actor cache shutdown was sequenced after the flush promise enqueuing. Fun times!
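One way to picture this ordering problem is with a toy FIFO event loop: if shutdown is queued as a separate task, a flush that was enqueued earlier runs first and commits; if shutdown runs inline at abort time, the queued flush finds the cache already shut down. This is a simplified reading of the comment above, not the actual KJ promise machinery, and `ToyLoop`, `ToyCache`, and `writeThenAbort` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded FIFO event loop.
struct ToyLoop {
  std::deque<std::function<void()>> tasks;

  void enqueue(std::function<void()> task) { tasks.push_back(std::move(task)); }

  void run() {
    while (!tasks.empty()) {
      auto task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};

// Hypothetical cache that counts buffered and committed writes.
struct ToyCache {
  bool isShutDown = false;
  int pending = 0;
  int committed = 0;

  // Buffers one write and enqueues a flush task on the loop.
  void put(ToyLoop& loop) {
    ++pending;
    loop.enqueue([this] { flush(); });
  }

  // A flush that runs after shutdown does nothing.
  void flush() {
    if (isShutDown) return;
    committed += pending;
    pending = 0;
  }

  void shutdown() {
    isShutDown = true;
    pending = 0;
  }
};

// Performs one write, then aborts. Returns how many writes were committed.
int writeThenAbort(bool shutdownInline) {
  ToyLoop loop;
  ToyCache cache;
  cache.put(loop);  // the flush task is now queued
  if (shutdownInline) {
    cache.shutdown();  // runs before the queued flush
  } else {
    loop.enqueue([&cache] { cache.shutdown(); });  // runs after the flush
  }
  loop.run();
  return cache.committed;
}

void runExample() {
  assert(writeThenAbort(false) == 1);  // deferred shutdown: the write commits
  assert(writeThenAbort(true) == 0);   // inline shutdown: the write is dropped
}
```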

src/workerd/io/actor-cache.c++

jasnell

LGTM but let's make sure @bretthoerner and @a-robinson are both good on it before landing.

a-robinson · 2022-10-31T18:28:22Z

src/workerd/io/actor-cache.h

> Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running

Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.

a-robinson

Still looks good 👍

Nice test, by the way!

src/workerd/io/actor-cache.c++

a-robinson · 2022-11-02T18:29:29Z

src/workerd/io/worker.c++

Just wondering -- in the cases where error is null, would it be at all useful to pass down reasonCode so that ActorCache::shutdown at least has some sort of info to include in the exception it generates?

For better or worse, the set of reasonCodes that lack an associated error are pretty uninformative at the moment: Either "unknown reason" or "evicted".

bcaimano requested review from a-robinson, bretthoerner, byule, harrishancock, jasnell, jclee, kentonv and mikea as code owners October 27, 2022 20:38

bretthoerner reviewed Oct 28, 2022

bcaimano force-pushed the bcaimano/bye-storage-bye branch from 0f6d18e to 581697d Compare October 28, 2022 17:16

bcaimano requested review from bretthoerner and removed request for byule, harrishancock, jclee and mikea October 28, 2022 17:49

bretthoerner approved these changes Oct 28, 2022

bcaimano force-pushed the bcaimano/bye-storage-bye branch from 581697d to a461d4d Compare October 28, 2022 20:44

bretthoerner reviewed Oct 28, 2022

jasnell reviewed Oct 31, 2022

jasnell reviewed Oct 31, 2022

jasnell approved these changes Oct 31, 2022

bcaimano force-pushed the bcaimano/bye-storage-bye branch from a461d4d to bdbcbb4 Compare October 31, 2022 17:25

a-robinson approved these changes Oct 31, 2022

bcaimano force-pushed the bcaimano/bye-storage-bye branch 3 times, most recently from 5babb3c to 7057bce Compare November 1, 2022 22:59

a-robinson approved these changes Nov 2, 2022

a-robinson reviewed Nov 2, 2022

bcaimano force-pushed the bcaimano/bye-storage-bye branch from 7057bce to 8d0058d Compare November 2, 2022 19:11

bcaimano merged commit 8657507 into cloudflare:main Nov 2, 2022
