Background
Discovered during review of #159 (Fix metrics).
The `MetricsCollector.Collect()` method calls `Reset()` on gauge vectors before re-populating them. This creates a brief window during a Prometheus scrape in which condition/count metrics temporarily disappear, potentially causing transient alerts to fire for operators with a high scrape frequency or tight alerting rules.
As noted by @xrstf in the review, the correct fix is to guard the collector with a `sync.RWMutex`:
- Hold a write lock during the `Reset()` + repopulate sequence inside `updateObjectCounts` and `updateCacheServerCounts`
- Hold a read lock in `Collect()` while metrics are being scraped
This ensures that Prometheus scrapes see a consistent, atomic snapshot and never observe a partially-reset state.
Proposed Fix
```go
type MetricsCollector struct {
	mu sync.RWMutex
	// ... existing fields
}

func (c *MetricsCollector) updateObjectCounts(...) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Reset + repopulate
}

func (c *MetricsCollector) Collect(ch chan<- prometheus.Metric) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	// collect
}
```
Scope
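To illustrate why the read/write lock split gives scrapes an atomic snapshot, here is a minimal, self-contained sketch of the pattern. The `collector` type, its field names, and the gauge values are all illustrative stand-ins, not the actual `MetricsCollector` code; the point is that readers blocked by the write lock can never observe the empty state between `Reset()` and repopulation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// collector stands in for MetricsCollector: a map of gauge values
// that is periodically reset and repopulated (names are illustrative).
type collector struct {
	mu     sync.RWMutex
	gauges map[string]float64
}

// update holds the write lock across the reset + repopulate sequence,
// so no reader can observe the partially-reset intermediate state.
func (c *collector) update(values map[string]float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.gauges = map[string]float64{} // the Reset() step
	for k, v := range values {
		c.gauges[k] = v // repopulate
	}
}

// collect holds the read lock and returns a consistent snapshot,
// mirroring what Collect() would expose to a Prometheus scrape.
func (c *collector) collect() map[string]float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	snap := make(map[string]float64, len(c.gauges))
	for k, v := range c.gauges {
		snap[k] = v
	}
	return snap
}

func main() {
	c := &collector{gauges: map[string]float64{}}
	c.update(map[string]float64{"pods": 3, "nodes": 2})

	// Hammer collect() concurrently with updates; a torn read would
	// show up as a snapshot with fewer than the full set of gauges.
	var torn atomic.Bool
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if len(c.collect()) != 2 {
				torn.Store(true)
			}
		}()
		c.update(map[string]float64{"pods": 3, "nodes": 2})
	}
	wg.Wait()
	fmt.Println("torn reads observed:", torn.Load())
}
```

Without the locks, running this under `go run -race` would report a data race and scrapes could see an empty map mid-update; with them, every snapshot is complete.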
This is intentionally out of scope for #159 which focuses on the Set(1) → accumulate correctness fix. Tracked separately here as a good first issue.
/kind bug
/good-first-issue