Grafana Service¶
Grafana provides visualization dashboards for monitoring metrics.
Overview¶
| Property | Value |
|---|---|
| Type | Visualization |
| Default Port | 3000 |
| GPU Required | No |
| Container | docker://grafana/grafana:latest |
Quick Start¶
# Start full monitoring stack
./scripts/start_all_services.sh
# Create tunnel
ssh -L 3000:mel0164:3000 -N u103227@login.lxp.lu -p 8822
# Access Grafana
open http://localhost:3000
# Default: admin / admin
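Before opening the browser, it can help to confirm that the tunnel is actually forwarding traffic. The following is a minimal sketch, assuming Grafana's standard `/api/health` endpoint is reachable at the tunnelled address; the helper names `health_url`, `is_healthy`, and `check_grafana` are illustrative and not part of the orchestrator.

```python
import json
import urllib.request


def health_url(base_url: str = "http://localhost:3000") -> str:
    """Return the URL of Grafana's health endpoint for a given base URL."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(body: str) -> bool:
    """Interpret a /api/health response body (Grafana reports "database": "ok")."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("database") == "ok"


def check_grafana(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Query the health endpoint through the tunnel; return False on connection errors."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors raised by urllib.
        return False
```

Calling `check_grafana()` while the SSH tunnel is open should return `True` once Grafana has finished starting; a `False` result usually means the tunnel is down or the service is still booting.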
Recipe Configuration¶
# recipes/services/grafana.yaml
service:
  name: grafana
  description: "Grafana visualization"
  container:
    docker_source: docker://grafana/grafana:latest
    image_path: $HOME/containers/grafana.sif
  resources:
    nodes: 1
    cpus_per_task: 1
    mem: "1G"
    time: "02:00:00"
    partition: cpu
  environment:
    GF_SECURITY_ADMIN_USER: "admin"
    GF_SECURITY_ADMIN_PASSWORD: "admin"
    GF_AUTH_ANONYMOUS_ENABLED: "true"
    GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
  ports:
    - 3000
Pre-Built Dashboards¶
The orchestrator provides three pre-configured dashboards:
1. Overview Dashboard¶
Path: /d/overview/overview
Displays system-wide metrics:
- Active targets count
- Running containers
- Total CPU/Memory usage
- CPU usage by container (time series)
- Memory usage by container (time series)
- Network traffic (RX/TX)
- Scrape target status table
2. Service Monitoring Dashboard¶
Path: /d/service-monitoring/service-monitoring
Detailed per-service metrics with container selector:
- Resource overview (CPU, Memory, Network bars)
- CPU usage timeline
- Memory breakdown (total, working set, cache)
- Network throughput
- Filesystem usage
- Memory limit gauge
3. Benchmark Dashboard¶
Path: /d/benchmarks/benchmarks
Performance-focused view during benchmark runs:
- Summary statistics (Avg/Peak CPU, Memory)
- Live CPU timeline (30s window)
- Live memory timeline
- Network performance
- CPU heatmap
- Resource comparison bars
Accessing Dashboards¶
After creating the SSH tunnel:
- Open http://localhost:3000
- Log in with admin/admin
- Navigate to dashboards via the left menu
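The three dashboard paths listed earlier can also be opened directly instead of navigating through the menu. The sketch below builds the full URLs from those paths, assuming the tunnel exposes Grafana at `http://localhost:3000`; the `dashboard_url` helper and the `DASHBOARD_PATHS` mapping are hypothetical names, and the optional `from`/`to` values use Grafana's standard relative time-range URL parameters.

```python
from urllib.parse import urlencode

# Dashboard paths as documented for the pre-built dashboards.
DASHBOARD_PATHS = {
    "overview": "/d/overview/overview",
    "service-monitoring": "/d/service-monitoring/service-monitoring",
    "benchmarks": "/d/benchmarks/benchmarks",
}


def dashboard_url(name: str, base_url: str = "http://localhost:3000",
                  time_from: str = "", time_to: str = "") -> str:
    """Build a direct link to a pre-built dashboard, optionally with a time range."""
    url = base_url.rstrip("/") + DASHBOARD_PATHS[name]
    params = {}
    if time_from:
        params["from"] = time_from
    if time_to:
        params["to"] = time_to
    return url + ("?" + urlencode(params) if params else "")
```

For example, `dashboard_url("benchmarks", time_from="now-5m", time_to="now")` links to the benchmark dashboard restricted to the last five minutes.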
Prometheus Datasource¶
Grafana is automatically configured to connect to Prometheus:
# Auto-configured datasource
apiVersion: 1
datasources:
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://mel0210:9090
    isDefault: true
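Because the Prometheus node name changes between job allocations, the datasource file has to be regenerated for each run. How the orchestrator does this is not shown here; the following is only a sketch of the idea, assuming the node name is known at startup. The `render_datasource` function and the template constant are hypothetical.

```python
# Template mirroring the auto-configured datasource shown above.
DATASOURCE_TEMPLATE = """\
apiVersion: 1
datasources:
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://{node}:{port}
    isDefault: true
"""


def render_datasource(node: str, port: int = 9090) -> str:
    """Fill in the compute node (and port) on which Prometheus is running."""
    return DATASOURCE_TEMPLATE.format(node=node, port=port)
```

Writing the result of `render_datasource("mel0210")` into Grafana's datasource provisioning directory before the service starts would reproduce the configuration above.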
Custom Dashboards¶
Create custom dashboards via the Grafana UI:
- Click + → Create Dashboard
- Add panels with PromQL queries
- Save dashboard
Useful Queries for Panels¶
# CPU gauge
rate(container_cpu_usage_seconds_total{name=~".+"}[1m]) * 100
# Memory time series
container_memory_usage_bytes{name=~".+"}
# Network bidirectional (two separate queries: receive plotted positive, transmit negated)
rate(container_network_receive_bytes_total[1m])
-rate(container_network_transmit_bytes_total[1m])
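It is often quicker to test a query against Prometheus directly before adding it to a panel. The sketch below uses Prometheus's HTTP API (`/api/v1/query`) for an instant query; the helper names are illustrative, and the Prometheus base URL is an assumption that depends on where the service is running or tunnelled.

```python
import json
import urllib.parse
import urllib.request


def query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def instant_query(prometheus_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run a PromQL instant query and return the list of result series."""
    with urllib.request.urlopen(query_url(prometheus_url, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    return payload["data"]["result"]
```

For instance, `instant_query("http://localhost:9090", 'container_memory_usage_bytes{name=~".+"}')` would return one entry per named container, assuming Prometheus is reachable at that address.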
Environment Variables¶
| Variable | Description | Default |
|---|---|---|
| GF_SECURITY_ADMIN_USER | Admin username | admin |
| GF_SECURITY_ADMIN_PASSWORD | Admin password | admin |
| GF_AUTH_ANONYMOUS_ENABLED | Allow anonymous access | false |
| GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH | Home dashboard | Overview |
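These variable names follow Grafana's general convention for overriding configuration: `GF_<SECTION>_<KEY>`, uppercased, with `.` and `-` replaced by `_`. The short sketch below derives the name for any setting from its section and key; `grafana_env_name` is a hypothetical helper, not something the orchestrator provides.

```python
def grafana_env_name(section: str, key: str) -> str:
    """Map a grafana.ini section/key pair to its GF_* environment variable name."""
    def normalize(part: str) -> str:
        # Uppercase, and replace characters that are not valid in variable names.
        return part.upper().replace(".", "_").replace("-", "_")

    return f"GF_{normalize(section)}_{normalize(key)}"
```

For example, the `enabled` key in the `[auth.anonymous]` section corresponds to `GF_AUTH_ANONYMOUS_ENABLED`, so additional settings can be added to the recipe's environment block in the same way.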
Troubleshooting¶
Can't Connect to Prometheus¶
Check that:
- Prometheus is running (python main.py --status)
- Grafana's datasource URL points to the correct node
- Both services are on the same network
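The second check, a datasource URL that points at a stale node, can be automated. This is a minimal sketch, assuming the node on which Prometheus currently runs is already known (for example from the status output); `datasource_points_to` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def datasource_points_to(datasource_url: str, node: str, port: int = 9090) -> bool:
    """Return True if the datasource URL targets the given node and port."""
    parts = urlsplit(datasource_url)
    # Fall back to the scheme's default port when none is given in the URL.
    default_port = 443 if parts.scheme == "https" else 80
    actual_port = parts.port or default_port
    return parts.hostname == node.lower() and actual_port == port
```

A `False` result for the node reported as running Prometheus suggests the datasource was provisioned for an earlier allocation and needs to be regenerated.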
Dashboards Not Loading¶
- Check Grafana logs: cat slurm-*.out
- Verify that the provisioning directory exists
- Restart the Grafana service
See also: Monitoring Overview | Prometheus