Installation Guide
This page provides detailed installation instructions for the HPC AI Benchmarking Orchestrator.
Prerequisites
Requirements
- Python 3.9+ installed locally
- SSH access to MeluXina supercomputer
- SLURM allocation (account: p200981 or your project account)
- SSH key configured for MeluXina
- Git for cloning the repository
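The local requirements above can be checked with a short script before you continue. This is a minimal sketch, not part of the orchestrator: the function name `check_prerequisites` is hypothetical, and it only covers what can be verified locally (Python version, `git` and `ssh` on the PATH), not your MeluXina access or SLURM allocation.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the requirements list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means all local checks passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # git is needed to clone the repository, ssh to reach MeluXina
    for tool in ("git", "ssh"):
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.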
Installation Steps
1. Clone the Repository
cd $HOME
git clone https://github.com/ChrisKarg/team10_EUMASTER4HPC2526_challenge.git
cd team10_EUMASTER4HPC2526_challenge
2. Create Virtual Environment
# Create virtual environment
python -m venv .venv
# Activate it
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
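If you are unsure whether the virtual environment is active before running `pip install`, the interpreter itself can tell you. The following is a small sketch using the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a venv.

    Inside a venv created with `python -m venv`, sys.prefix points at the
    environment directory while sys.base_prefix points at the base install.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```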
3. Configure HPC Connection
Create or edit config.yaml in the project root:
# SSH Configuration for MeluXina
ssh:
  hostname: "login.lxp.lu"
  port: 8822
  username: "u103227"                     # Your MeluXina username
  key_filename: "~/.ssh/id_ed25519_mlux"  # Path to your SSH key

# SLURM Configuration
slurm:
  account: "p200981"    # Your project account
  partition: "gpu"      # Default partition (gpu, cpu, fpga)
  qos: "default"        # Quality of Service
  time: "02:00:00"      # Default time limit

# Container Configuration
containers:
  base_path: "$HOME/containers"
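A quick way to catch a misspelled or missing key is to check the parsed configuration against the keys shown above. This is a sketch under an assumption: it treats every key in the example as required, which this guide does not state, and `missing_config_keys` is a hypothetical helper rather than an orchestrator function.

```python
# Keys taken from the example config.yaml; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = {
    "ssh": ("hostname", "port", "username", "key_filename"),
    "slurm": ("account", "partition", "qos", "time"),
    "containers": ("base_path",),
}


def missing_config_keys(config):
    """Return dotted names of expected keys absent from a parsed config mapping."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section) or {}  # tolerate a missing or empty section
        for key in keys:
            if key not in values:
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    example = {
        "ssh": {"hostname": "login.lxp.lu", "port": 8822},
        "slurm": {"account": "p200981", "partition": "gpu",
                  "qos": "default", "time": "02:00:00"},
        "containers": {"base_path": "$HOME/containers"},
    }
    print(missing_config_keys(example))
```

If PyYAML is installed (an assumption; check requirements.txt), the mapping can be obtained with `yaml.safe_load` on the opened config.yaml file and passed to the function unchanged.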
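SSH Key Setup
Make sure your SSH key is added to your MeluXina account. To test it, connect with the direct SSH command listed under SSH Connection Issues in the Troubleshooting section.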
4. Verify Installation
Test the installation by checking available commands:
python main.py --help
Expected output:
usage: main.py [-h] [--recipe RECIPE] [--status] [--list-services]
[--list-clients] [--stop-service STOP_SERVICE]
[--stop-all-services] ...
HPC AI Benchmarking Orchestrator
optional arguments:
-h, --help show this help message and exit
--recipe RECIPE Path to YAML recipe file
--status Show status of all running jobs
--list-services List available service types
--list-clients List available client types
...
5. Test SSH Connection
Verify the orchestrator can connect to MeluXina, for example by querying job status:
python main.py --status
Expected output (if no jobs running):
Container Images
The orchestrator uses Apptainer (formerly Singularity) containers. Images are pulled automatically on first use, but you can also pull them in advance:
On MeluXina (via SSH)
# SSH to MeluXina
ssh -p 8822 u103227@login.lxp.lu
# Load Apptainer module
module load Apptainer/1.2.4-GCCcore-12.3.0
# Create containers directory
mkdir -p $HOME/containers
cd $HOME/containers
# Pull required images
apptainer pull ollama_latest.sif docker://ollama/ollama:latest
apptainer pull redis_latest.sif docker://redis:latest
apptainer pull chroma_latest.sif docker://chromadb/chroma:latest
apptainer pull mysql_latest.sif docker://mysql:8.0
apptainer pull prometheus.sif docker://prom/prometheus:latest
apptainer pull grafana.sif docker://grafana/grafana:latest
apptainer pull cadvisor.sif docker://gcr.io/cadvisor/cadvisor:latest
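When some images are already present, it can be convenient to generate only the pull commands that are still needed. The sketch below copies the image names and Docker sources from the commands above; the `pull_commands` helper is hypothetical and only builds command strings, it does not run Apptainer.

```python
# File names and Docker sources copied from the pull commands listed above.
IMAGES = {
    "ollama_latest.sif": "docker://ollama/ollama:latest",
    "redis_latest.sif": "docker://redis:latest",
    "chroma_latest.sif": "docker://chromadb/chroma:latest",
    "mysql_latest.sif": "docker://mysql:8.0",
    "prometheus.sif": "docker://prom/prometheus:latest",
    "grafana.sif": "docker://grafana/grafana:latest",
    "cadvisor.sif": "docker://gcr.io/cadvisor/cadvisor:latest",
}


def pull_commands(existing_files=()):
    """Build 'apptainer pull' commands for images not in existing_files."""
    present = set(existing_files)
    return [
        f"apptainer pull {name} {uri}"
        for name, uri in IMAGES.items()
        if name not in present
    ]


if __name__ == "__main__":
    # Example: pretend the Redis image has already been pulled.
    for command in pull_commands(existing_files=["redis_latest.sif"]):
        print(command)
```

On MeluXina, `existing_files` could be filled from the file names in `$HOME/containers`, for instance with `os.listdir`.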
Directory Structure
After installation, your project should look like this:
team10_EUMASTER4HPC2526_challenge/
├── main.py # CLI entry point
├── config.yaml # Your configuration
├── requirements.txt # Python dependencies
├── src/ # Core modules
│ ├── orchestrator.py
│ ├── servers.py
│ ├── clients.py
│ └── services/ # Service implementations
├── recipes/ # YAML recipes
│ ├── services/ # Service definitions
│ └── clients/ # Client definitions
├── benchmark_scripts/ # Benchmark implementations
├── scripts/ # Automation scripts
└── docs-src/ # Documentation
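The layout above can be compared against your checkout with a few lines of Python. This is an illustrative sketch: the path list is transcribed from the tree, and `missing_paths` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path

# Relative paths transcribed from the directory tree above.
EXPECTED_PATHS = [
    "main.py",
    "config.yaml",
    "requirements.txt",
    "src/orchestrator.py",
    "src/servers.py",
    "src/clients.py",
    "src/services",
    "recipes/services",
    "recipes/clients",
    "benchmark_scripts",
    "scripts",
    "docs-src",
]


def missing_paths(project_root):
    """Return the expected relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root to list anything that is absent.
    for rel in missing_paths("."):
        print("missing:", rel)
```

A missing config.yaml right after cloning is expected, since that file is created in step 3.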
Troubleshooting
SSH Connection Issues
# Test SSH directly
ssh -v -i ~/.ssh/id_ed25519_mlux -p 8822 u103227@login.lxp.lu
# Check key permissions
chmod 600 ~/.ssh/id_ed25519_mlux
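OpenSSH refuses private keys that are readable by group or others, which is why the `chmod 600` step matters. The permission check can also be done programmatically; the sketch below uses only the standard library on a POSIX system, and `key_permissions_ok` is a hypothetical helper name.

```python
import os
import stat


def key_permissions_ok(path):
    """Return True when the file grants no permissions to group or others.

    Mode 600 (owner read/write only) passes; modes such as 644 do not.
    """
    mode = stat.S_IMODE(os.stat(os.path.expanduser(path)).st_mode)
    return (mode & 0o077) == 0


if __name__ == "__main__":
    key = "~/.ssh/id_ed25519_mlux"  # key path used in this guide's examples
    if os.path.exists(os.path.expanduser(key)):
        print("permissions ok:", key_permissions_ok(key))
```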
SLURM Account Issues
# On MeluXina, check your accounts
sacctmgr show associations user=$USER
# Verify account in config.yaml matches
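Comparing the account by eye is error-prone when you belong to several projects. The sketch below assumes you captured the output of `sacctmgr -n -P show associations user=$USER format=Account` (one account name per line) into a string; both helper names are hypothetical, and the sample text is illustrative rather than real cluster output.

```python
def accounts_from_sacctmgr(output):
    """Extract unique account names from one-account-per-line sacctmgr output."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def account_is_available(account, output):
    """Return True when the configured account appears in the sacctmgr output."""
    return account in accounts_from_sacctmgr(output)


if __name__ == "__main__":
    sample_output = "p200981\np200981\n"  # illustrative, not captured from MeluXina
    print(accounts_from_sacctmgr(sample_output))
    print(account_is_available("p200981", sample_output))
```

The account passed in would be the `slurm.account` value from config.yaml.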
Container Not Found
# On MeluXina, verify container exists
ls -la $HOME/containers/
# Check container path in recipe matches
Next Steps
- Quick Start - Run your first benchmark
- CLI Reference - All available commands
- Services - Configure different services
Continue to Quick Start →