Commit 77294e1

docs: address issue #24139
This change was automatically generated by the documentation agent team in response to issue #24139. 🤖 Generated with cagent
1 parent 04d32f1 commit 77294e1

1 file changed

+80
-0
lines changed

content/manuals/engine/daemon/prometheus.md

Lines changed: 80 additions & 0 deletions
@@ -150,6 +150,86 @@ traffic caused by the container you just ran.

![Prometheus report showing traffic](images/prometheus-graph_load.webp)

## Available metrics

Docker exposes metrics in Prometheus format. This section describes the available metrics and their meaning.

> [!WARNING]
>
> The available metrics and the names of those metrics are in active
> development and may change at any time.

### Metric types

Docker metrics use the following Prometheus metric types:

- **Counter**: A cumulative metric that only increases, apart from resetting to zero on restart. Counters suit values like the total number of events or requests.
- **Gauge**: A metric that can go up or down. Gauges suit values like current memory usage or the number of running containers.
- **Histogram**: A metric that samples observations and counts them in configurable buckets. Histograms expose multiple time series:
  - `<basename>_bucket{le="<upper_bound>"}`: Cumulative counters for observation buckets
  - `<basename>_sum`: Total sum of all observed values
  - `<basename>_count`: Count of events that have been observed

For histogram metrics, you can calculate averages, percentiles, and rates. For example, to calculate the average duration over the last five minutes: `rate(<basename>_sum[5m]) / rate(<basename>_count[5m])`.

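As a rough illustration of what the average-duration query computes, the following Python sketch takes two samples of a histogram's `_sum` and `_count` series and divides the increase in one by the increase in the other. The function name and the sample values are hypothetical. The sketch also assumes no counter reset happened between the two samples, a case that PromQL's `rate()` handles for you.

```python
import math


def average_duration(sum_start: float, sum_end: float,
                     count_start: float, count_end: float) -> float:
    """Mean observed value between two scrapes of a histogram.

    The length of the time window cancels out when dividing the two rates,
    so only the increases in _sum and _count are needed.
    """
    delta_count = count_end - count_start
    if delta_count <= 0:
        # No new observations in the window (or a counter reset occurred).
        return math.nan
    return (sum_end - sum_start) / delta_count


# Hypothetical samples: _sum grew from 10 s to 16 s while _count grew
# from 100 to 120 observations.
avg = average_duration(10.0, 16.0, 100, 120)
print(f"average duration: {avg:.3f} s")
```
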

### Engine metrics

These metrics provide information about the Docker Engine's operation and resource usage.

| Metric                                      | Type      | Description                                                                                                                  |
| ------------------------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `engine_daemon_container_actions_seconds`   | Histogram | Time taken to process container operations (start, stop, create, etc.). Labels indicate the action type.                     |
| `engine_daemon_container_states_containers` | Gauge     | Number of containers currently in each state (running, paused, stopped). Labels indicate the state.                          |
| `engine_daemon_engine_cpus_cpus`            | Gauge     | Number of CPUs available on the host system.                                                                                 |
| `engine_daemon_engine_info`                 | Gauge     | Static information about the Docker Engine. Always set to 1. Labels provide version, architecture, and other engine details. |
| `engine_daemon_engine_memory_bytes`         | Gauge     | Total memory available on the host system in bytes.                                                                          |
| `engine_daemon_events_subscribers_total`    | Gauge     | Number of current subscribers to Docker events.                                                                              |
| `engine_daemon_events_total`                | Counter   | Total number of events processed by the daemon. Labels indicate the event action and type.                                   |
| `engine_daemon_health_checks_failed_total`  | Counter   | Total number of health checks that have failed.                                                                              |
| `engine_daemon_health_checks_total`         | Counter   | Total number of health checks performed.                                                                                     |
| `engine_daemon_host_info_functions_seconds` | Histogram | Time taken to gather host information.                                                                                       |
| `engine_daemon_network_actions_seconds`     | Histogram | Time taken to process network operations (create, connect, disconnect, etc.). Labels indicate the action type.               |

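To show how labeled gauges and counters from the table can be read outside of Prometheus, here is a minimal Python sketch that parses a few lines of metrics text. The sample text, the values in it, and the `state` label name are assumptions made for illustration, not captured daemon output. The parser is deliberately simplified: it skips comment lines and does not handle optional timestamps.

```python
import re
from typing import Dict, List, Tuple

# One sample per line: name, optional {labels}, then a value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Return (name, labels, value) for each sample line in the text."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not handle
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical metrics text; the numbers are invented for this example.
EXAMPLE_TEXT = """
# TYPE engine_daemon_container_states_containers gauge
engine_daemon_container_states_containers{state="paused"} 0
engine_daemon_container_states_containers{state="running"} 3
engine_daemon_container_states_containers{state="stopped"} 5
engine_daemon_health_checks_failed_total 2
engine_daemon_health_checks_total 40
"""

samples = parse_samples(EXAMPLE_TEXT)

# Containers per state, read from the gauge's label.
states = {
    labels["state"]: value
    for name, labels, value in samples
    if name == "engine_daemon_container_states_containers"
}

# Share of health checks that failed, from the two counters.
totals = {name: value for name, labels, value in samples if not labels}
failure_ratio = (
    totals["engine_daemon_health_checks_failed_total"]
    / totals["engine_daemon_health_checks_total"]
)

print(states)
print(f"health check failure ratio: {failure_ratio:.2f}")
```
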

### Swarm metrics

These metrics are only available when the Docker Engine is running in Swarm mode.

| Metric                                           | Type      | Description                                                                                     |
| ------------------------------------------------ | --------- | ----------------------------------------------------------------------------------------------- |
| `swarm_dispatcher_scheduling_delay_seconds`      | Histogram | Time from task creation to scheduling decision. Measures scheduler performance.                 |
| `swarm_manager_configs_total`                    | Gauge     | Total number of configs in the swarm cluster.                                                   |
| `swarm_manager_leader`                           | Gauge     | Indicates if this node is the swarm manager leader (1) or not (0).                              |
| `swarm_manager_networks_total`                   | Gauge     | Total number of networks in the swarm cluster.                                                  |
| `swarm_manager_nodes`                            | Gauge     | Number of nodes in the swarm cluster. Labels indicate node state (ready, down, etc.).           |
| `swarm_manager_secrets_total`                    | Gauge     | Total number of secrets in the swarm cluster.                                                   |
| `swarm_manager_services_total`                   | Gauge     | Total number of services in the swarm cluster.                                                  |
| `swarm_manager_tasks_total`                      | Gauge     | Total number of tasks in the swarm cluster. Labels indicate task state (running, failed, etc.). |
| `swarm_node_manager`                             | Gauge     | Indicates if this node is a swarm manager (1) or worker (0).                                    |
| `swarm_raft_snapshot_latency_seconds`            | Histogram | Time taken to create and restore Raft snapshots.                                                |
| `swarm_raft_transaction_latency_seconds`         | Histogram | Time taken to commit Raft transactions. Measures consensus performance.                         |
| `swarm_store_batch_latency_seconds`              | Histogram | Time taken for batch operations in the swarm store.                                             |
| `swarm_store_lookup_latency_seconds`             | Histogram | Time taken for lookup operations in the swarm store.                                            |
| `swarm_store_memory_store_lock_duration_seconds` | Histogram | Duration of lock acquisitions in the memory store.                                              |
| `swarm_store_read_tx_latency_seconds`            | Histogram | Time taken for read transactions in the swarm store.                                            |
| `swarm_store_write_tx_latency_seconds`           | Histogram | Time taken for write transactions in the swarm store.                                           |

### Using histogram metrics

Each histogram metric (in the tables above, those whose names end in `_seconds`) is exposed as three kinds of time series:

- `<metric_name>_bucket`: Cumulative counters, one per configured bucket
- `<metric_name>_sum`: Total sum of all observed values
- `<metric_name>_count`: Total count of observations

For example, `engine_daemon_container_actions_seconds` produces series such as:

- `engine_daemon_container_actions_seconds_bucket{action="start",le="0.005"}`: Count of start actions taking ≤ 5 ms
- `engine_daemon_container_actions_seconds_bucket{action="start",le="0.01"}`: Count of start actions taking ≤ 10 ms
- `engine_daemon_container_actions_seconds_sum{action="start"}`: Total time spent on start actions, in seconds
- `engine_daemon_container_actions_seconds_count{action="start"}`: Total number of start actions

Use these series to calculate percentiles, averages, and rates in your Prometheus queries.

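To make the percentile calculation concrete, the Python sketch below estimates a quantile from cumulative bucket counts. It is loosely modeled on the linear interpolation that PromQL's `histogram_quantile()` applies within a bucket, and is not a reimplementation of it. The function name and the bucket counts are hypothetical, and the sketch assumes the bucket list includes a `+Inf` bucket holding the total count.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs, as found in the
    `le` label and value of each `_bucket` series.
    """
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the highest finite upper bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for start actions:
# 2 took <= 5 ms, 6 took <= 10 ms, and 10 were observed in total.
start_buckets = [(0.005, 2), (0.01, 6), (math.inf, 10)]

median = estimate_quantile(0.5, start_buckets)
p90 = estimate_quantile(0.9, start_buckets)
print(f"median: {median * 1000:.2f} ms, p90: at least {p90 * 1000:.0f} ms")
```

With buckets this coarse the estimate is only as precise as the bucket boundaries allow, which is why a quantile that lands in the `+Inf` bucket can only be reported as a lower bound.
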
## Next steps

The example provided here shows how to run Prometheus as a container on your
