Fix reference count bug in partition batcher #14444
bogdandrutu merged 3 commits into open-telemetry:main
Hi @dmitryax, @bogdandrutu. When you have time, I’d appreciate a review or any feedback you may have. Please let me know if you’d like additional tests or adjustments. Thank you for your time and guidance.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14444      +/-   ##
==========================================
- Coverage   91.81%   91.80%   -0.02%
==========================================
  Files         677      677
  Lines       42677    42677
==========================================
- Hits        39184    39179       -5
- Misses       2433     2436       +3
- Partials     1060     1062       +2
☔ View full report in Codecov by Sentry.
Please add a changelog entry.
Thank you sir @bogdandrutu for pointing this out. I will add an appropriate changelog entry under /.chloggen to document this change.
Signed-off-by: aditya4044656 <adityakuchekar0077@gmail.com>
d0ee32a to 6e48cb4
Update component from exporter/exporterhelper to pkg/exporterhelper
Referenced by a downstream Renovate dependency update (go.opentelemetry.io/collector `v1.45.0` → `v1.50.0`), whose release notes list this change under the v1.50.0 bug fixes: `pkg/exporterhelper`: Fix reference count bug in partition batcher (#14444).
Summary
This PR fixes an off-by-one error in the partition batcher’s reference counting logic that could cause exporter errors to be silently dropped under specific error conditions.
When a batch is split into multiple requests and `MergeSplit()` returns an error, the reference counter was initialized with an incorrect value due to a copy-paste mistake. This could lead to the `done` callback firing too early, before all flush operations completed.
Problem Description
In `Consume`, the number of references (`numRefs`) is intentionally incremented when `mergeSplitErr` is non-nil to account for the additional error callback. However, the reference counter was initialized using `len(reqList)` instead of `numRefs`.
As a result, the reference count could be lower than the actual number of callbacks that would be invoked.
Buggy behavior (simplified)
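A minimal sketch of the mismatch, assuming a hypothetical `refCounter` type and `callbacksSeenAtDone` helper (neither is the collector's actual code); only `numRefs` and `len(reqList)` are taken from the description:

```go
package main

import "fmt"

// refCounter is a hypothetical stand-in for the batcher's reference-counted
// done callback: it fires onDone when the remaining count reaches zero.
type refCounter struct {
	remaining int
	onDone    func()
}

// release records one finished callback (a flush or the extra error callback).
func (rc *refCounter) release() {
	rc.remaining--
	if rc.remaining == 0 {
		rc.onDone()
	}
}

// callbacksSeenAtDone initializes the counter with `initial` references and
// then delivers `total` callbacks. It returns how many callbacks had been
// delivered when onDone fired, or -1 if it never fired.
func callbacksSeenAtDone(initial, total int) int {
	firedAt := -1
	delivered := 0
	rc := &refCounter{remaining: initial, onDone: func() { firedAt = delivered }}
	for i := 0; i < total; i++ {
		delivered++
		rc.release()
	}
	return firedAt
}

func main() {
	reqList := []string{"req-1", "req-2"} // the batch was split into two requests
	mergeSplitFailed := true

	numRefs := len(reqList)
	if mergeSplitFailed {
		numRefs++ // extra reference for the error callback
	}

	// Buggy initialization: len(reqList) is used instead of numRefs.
	fired := callbacksSeenAtDone(len(reqList), numRefs)
	fmt.Printf("done fired after %d of %d callbacks\n", fired, numRefs)
}
```

In this sketch the counter reaches zero after the second callback, so the third one arrives after `done` has already fired.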
This mismatch causes the underlying `done` callback to be triggered prematurely.
Impact
Before this fix
- `done` could be invoked before all export operations completed
- With `waitForResult`, callers could observe success even when exports failed
After this fix
- `done` is invoked only after all operations complete
Steps to Reproduce
This issue occurs when all of the following conditions are met:
- The batch is split into multiple requests (`len(reqList) >= 2`)
- `MergeSplit()` returns a non-nil error
In this case, the extra error callback increases the true number of references, but the counter was initialized with a lower value, causing premature completion.
This is an edge case and does not crash or panic, which makes it difficult to detect without careful inspection or targeted testing.
Fix
The fix ensures the reference counter is initialized with the correct number of references (`numRefs`) so that all callbacks are properly accounted for.
Correct behavior
This aligns the logic with the already-correct implementation used earlier in the same file and restores correct lifecycle handling.
Why This Is Important
This bug is particularly hard to detect because it arises only in an edge case and does not crash or panic.
However, when it does occur, it can lead to silent telemetry data loss and misleading success signals in production systems.