Prometheus Metrics¶
Prometheus collects and stores metrics from cAdvisor sidecars running alongside services.
Accessing Prometheus¶
# Create an SSH tunnel: forward local port 9090 to Prometheus on mel0210
ssh -i ~/.ssh/id_ed25519_mlux -L 9090:mel0210:9090 -N u103227@login.lxp.lu -p 8822
# Open the Prometheus UI (macOS; use xdg-open on Linux)
open http://localhost:9090
Configuration¶
Prometheus is automatically configured to scrape cAdvisor endpoints:
# Auto-generated: $HOME/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'ollama-cadvisor'
    static_configs:
      - targets: ['mel2073:8080']
        labels:
          service: 'ollama'
          instance: 'mel2073'
  - job_name: 'redis-cadvisor'
    static_configs:
      - targets: ['mel0182:8080']
        labels:
          service: 'redis'
          instance: 'mel0182'
PromQL Query Reference¶
CPU Queries¶
# Per-second CPU usage rate in cores, averaged over 5m (per container)
rate(container_cpu_usage_seconds_total{name=~".+"}[5m])
# Average CPU across all containers
avg(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]))
# Max CPU usage
max(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]))
# CPU for specific container
rate(container_cpu_usage_seconds_total{name="ollama"}[5m])
# CPU percentage (assuming 1 core = 100%)
rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100
Memory Queries¶
# Current memory usage (includes page cache)
container_memory_usage_bytes{name=~".+"}
# Memory working set (more accurate for actual usage)
container_memory_working_set_bytes{name=~".+"}
# Memory cache (page cache)
container_memory_cache{name=~".+"}
# Memory as percentage of limit
container_memory_usage_bytes{name=~".+"} /
container_spec_memory_limit_bytes{name=~".+"} * 100
# Total memory across all containers
sum(container_memory_usage_bytes{name=~".+"})
# Memory growth rate in bytes/sec (deriv, not rate, because this metric is a gauge)
deriv(container_memory_usage_bytes{name=~".+"}[5m])
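The percentage-of-limit query above divides by `container_spec_memory_limit_bytes`. cAdvisor typically reports that metric as 0 for containers with no memory limit, which makes the ratio `+Inf`. A minimal sketch of a guarded variant, assuming this zero-for-unlimited behaviour holds on your cAdvisor version:

```promql
# Memory as percentage of limit, skipping containers without a limit.
# The parentheses matter: the "> 0" filter must apply to the limit series
# before the division, because "/" binds tighter than ">".
container_memory_usage_bytes{name=~".+"} /
(container_spec_memory_limit_bytes{name=~".+"} > 0) * 100
```

Containers without a limit simply drop out of the result instead of showing up as `+Inf`.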
Network Queries¶
# Receive rate (bytes/sec)
rate(container_network_receive_bytes_total{name=~".+"}[5m])
# Transmit rate (bytes/sec)
rate(container_network_transmit_bytes_total{name=~".+"}[5m])
# Total bandwidth (RX + TX)
rate(container_network_receive_bytes_total{name=~".+"}[5m]) +
rate(container_network_transmit_bytes_total{name=~".+"}[5m])
# Receive packet rate (packets/sec)
rate(container_network_receive_packets_total{name=~".+"}[5m])
Filesystem Queries¶
# Filesystem usage
container_fs_usage_bytes{name=~".+"}
# Filesystem usage percentage
container_fs_usage_bytes{name=~".+"} /
container_fs_limit_bytes{name=~".+"} * 100
Target Status¶
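Prometheus records a synthetic `up` series for every scrape target, with value 1 when the last scrape succeeded and 0 when it failed. A minimal sketch of target status checks, assuming the job names from the configuration above:

```promql
# Status of every target (1 = last scrape succeeded, 0 = failed)
up
# Status of a single job
up{job="ollama-cadvisor"}
# Targets that are currently down
up == 0
# Number of healthy targets per job
count by (job) (up == 1)
```

An empty result for `up == 0` means all configured targets were scraped successfully.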
Query via CLI¶
# Simple query
python main.py --query-metrics prometheus_xxx "up"
# Container memory
python main.py --query-metrics prometheus_xxx "container_memory_usage_bytes"
# With label filter
python main.py --query-metrics prometheus_xxx 'container_cpu_usage_seconds_total{name="ollama"}'
# Rate query
python main.py --query-metrics prometheus_xxx 'rate(container_cpu_usage_seconds_total[5m])'
Metric Labels¶
Common Labels¶
| Label | Description | Example |
|---|---|---|
| `name` | Container name | `ollama`, `redis` |
| `instance` | Node hostname | `mel2073` |
| `job` | Prometheus job name | `ollama-cadvisor` |
| `service` | Service identifier | `ollama` |
Filtering by Label¶
# Specific container
container_memory_usage_bytes{name="ollama"}
# Multiple containers
container_memory_usage_bytes{name=~"ollama|redis"}
# By job
container_cpu_usage_seconds_total{job="ollama-cadvisor"}
# Exclude pattern
container_memory_usage_bytes{name!~"cadvisor"}
Aggregation Functions¶
Sum¶
# Total memory
sum(container_memory_usage_bytes{name=~".+"})
# Sum by label
sum by (name) (container_memory_usage_bytes{name=~".+"})
Average¶
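`avg` averages across series at each point in time, while `avg_over_time` averages a single series across a time window. A minimal sketch of both, assuming the same metric and label names used in the other queries on this page:

```promql
# Average CPU usage across all containers
avg(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]))
# Average memory per service
avg by (service) (container_memory_usage_bytes{name=~".+"})
# Average memory of one container over the last hour
avg_over_time(container_memory_usage_bytes{name="ollama"}[1h])
```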
Max/Min¶
# Peak memory
max(container_memory_usage_bytes{name=~".+"})
# Min CPU
min(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]))
Count¶
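`count` returns the number of series that match, which is useful for checking how many containers are reporting. A minimal sketch, assuming the `name` and `service` labels described under Common Labels:

```promql
# Number of containers currently reporting memory usage
count(container_memory_usage_bytes{name=~".+"})
# Number of containers per service
count by (service) (container_memory_usage_bytes{name=~".+"})
```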
Time Functions¶
Rate¶
# Per-second rate over 5 minutes
rate(container_cpu_usage_seconds_total[5m])
# Use $__rate_interval in Grafana for auto-adjustment
rate(container_cpu_usage_seconds_total[$__rate_interval])
Increase¶
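`increase` returns the total growth of a counter over the window, extrapolated to the window's boundaries, so results can be fractional even for integer counters. A minimal sketch, assuming the counter metrics used in the CPU and network queries on this page; the 1h window is an arbitrary choice:

```promql
# CPU seconds consumed by one container over the last hour
increase(container_cpu_usage_seconds_total{name="ollama"}[1h])
# Bytes received per container over the last hour
increase(container_network_receive_bytes_total{name=~".+"}[1h])
```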
Delta¶
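`delta` returns the difference between the first and last value in the window and is meant for gauges, so the result can be negative. `deriv` gives the per-second trend of a gauge using linear regression. A minimal sketch, assuming the memory gauge used elsewhere on this page; the 1h window is an arbitrary choice:

```promql
# Change in memory usage over the last hour (bytes, may be negative)
delta(container_memory_usage_bytes{name="ollama"}[1h])
# Per-second memory trend over the last hour (bytes/sec)
deriv(container_memory_usage_bytes{name="ollama"}[1h])
```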
Data Retention¶
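Configure retention in the Prometheus recipe. Prometheus itself controls this with the `--storage.tsdb.retention.time` flag, which defaults to `15d`.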
See also: Monitoring Overview | Grafana Dashboards