EW-9372 EW-9455 [o11y] Prepare reporting SpanOpen event earlier#5370
EW-9372 EW-9455 [o11y] Prepare reporting SpanOpen event earlier#5370
Conversation
20b779f to
b6452a3
Compare
Merging this PR will not alter performance
Comparing Footnotes
|
|
Tests fail, but that just might be from being out of sync or ontop of something old? A short PR description would help. |
|
Closing this for now – delivering SpanOpen earlier would make it more difficult to implement renaming spans, which we may support in the future. |
|
We should definitely reopen this and send SpanOpen events when the spans open and not when they close. |
7db22e8 to
c73e84d
Compare
6d91a06 to
538c76c
Compare
|
The generated output of |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5370 +/- ##
==========================================
- Coverage 70.74% 70.69% -0.05%
==========================================
Files 420 420
Lines 112938 113018 +80
Branches 18517 18533 +16
==========================================
+ Hits 79894 79902 +8
- Misses 22007 22075 +68
- Partials 11037 11041 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
538c76c to
c5c817f
Compare
The coverage percentage appears lower than it should be here as some functions are only used upstream/not used at all until the follow-up PR. I'm convinced that our coverage doesn't actually get worse when taking that into account. |
c5c817f to
cc51938
Compare
|
In the downstream changes, I see some refused implementations which suggests the interface is wrong. I think moving to an event driven interface could help. I'm assuming most of the test changes are quirks of the testing harnesses, ordering and attribute association, rather than actually functional changes. |
cc51938 to
c302bb5
Compare
I believe what you're referring to are KJ_UNIMPLEMENTED or failing asserts being used for addSpan() and submitSpan() implementations downstream and in the follow-up PR? These functions are marked as such based on different reasons, let's look at them individually:
Yup, the user-facing change here is that the SpanOpen event is reported earlier, which results in a different order of spans in some cases. |
|
The recent push I think helped. It still feels we're working around an issue rather than fixing it Any reason we couldn't do something like that? I haven't considered all the details, but maybe it helps clean up the span time handling and the duplicate/dead code issues. Worth a shot or no? |
|
Internal tracing needs the full span != we must pass the full span. The observer could assemble the span. onOpen would actually be used by both. newChild is unchanged but included the sketch as we're not deleting it. onUpdateName was included to show how we'd extend the interface to support it. getTime isn't currently used on both sides of the span time measurement. |
5bc70a9 to
30bc3e2
Compare
|
If the interface changes are not feasible, could you explain why? |
I'm sorry, but after implementing this I still don't see the upside of that approach. See the changes at felix/102125-stw-cleanup-mar-api (in workerd and downstream):
I'm happy to add additional code comments to explain why the existing approach appears to be right to me (if we only had to support user tracing and not internal tracing here it would be a different story). If I'm missing something here, feel free to edit the branches to show the advantages of the interface you're proposing. |
|
Note that I still have to consider the suggested adjustSpanTime change – my impression was that it would not reduce complexity and merely move it around but it'll be easier to evaluate once implemented. |
|
Overall, I think on this it's likely good to go with the current approach as long as there are no obvious logical flaws. The approach can be iterated on if necessary in follow up PRs. |
That branch is mostly cosmetic, renaming reportStart -> onOpen and report -> onClose. It's also not, form the span, send part of the span and form again. The main substantive changes are to not send the span, rather assembling it the observer, and utilize getTime properly. Can we try that instead? |
|
We're missing the forest for the trees. The core issue isn't renaming methods. I think the right path is to narrow onClose to end only data, properly call getTime and have the internal observer assemble spans. @jasnell I'd rather we properly explore this than keep building around the current interface. The workarounds are expanding, the proposed change isn't large and follow up PRs rarely happen. |
|
I stand by what I said earlier, happy to discuss on Wednesday if needed. |
|
The follow-up PRs have been updated to include changes from mar/spanapi, including the event-based API. That should unblock reviews for the initial step. |
30bc3e2 to
e326304
Compare
|
Looking at the end state in felix/102125-stw-cleanup-p2: Both systems use the interface but each only uses half of it - they haven't been updated to use the same methods. What actually changed here? Either split the interface into two separate abstractions or update both to use the same one. I'd recommend the latter. |
mar-cf
left a comment
There was a problem hiding this comment.
Apologies, I was looking at a stale commit. I see it was update now.
LGTM!
This PR serves to perform two long-standing cleanup tasks in the STW implementation: 1) Sending the SpanOpen event as soon as a span is opened instead of when it closes 2) Getting rid of the CompleteSpan struct, which represents a full span but is something that won't be needed once SpanOpen is handled separately. To implement this in a backwards-compatible way, we need to land it in two parts so that the old code path and the new code path are both supported until we have phased out the old version which doesn't have the APIs for handling SpanOpen separately.
e326304 to
a0e61e3
Compare
Purpose:
This PR serves to perform two long-standing cleanup tasks in the STW implementation:
To implement this in a backwards-compatible way, we need to land it in two parts so that the old code path and the new code path are both supported until we have phased out the old version which doesn't have the APIs for handling SpanOpen separately.
For code that is workerd-only and thus never involved in RPC or that is solely on the RPC server side, we can already decompose function calls so that we don't need to implement the sape functionality twice. This needs to land alongside a downstream PR. A follow-up PR will actually invoke the code path to send SpanOpen first, get rid of CompleteSpan struct and perform a bunch of cleanup – see #6051.
Note that: