GraFlag Architecture
Technical architecture reference for the GraFlag distributed benchmarking platform for Graph Anomaly Detection (GAD).
System Overview
GraFlag is a distributed benchmarking platform that runs Graph Anomaly Detection methods across a Docker Swarm cluster. The platform provides:
- Distributed execution of GAD methods on GPU-equipped worker nodes via Docker Swarm.
- Standardized result collection through the `graflag_runner` library, which wraps method execution with resource monitoring and result serialization.
- Automated evaluation via `graflag_evaluator`, which computes metrics (AUC-ROC, AUC-PR, etc.) and generates plots.
- Multiple client interfaces: a CLI for scripting, a Python API for programmatic use, and a web GUI for interactive management.
- NFS-based shared storage for methods, datasets, experiments, and shared libraries, accessible by all cluster nodes.
Design Principles
- Container isolation: Each method runs in its own Docker container with a defined Dockerfile, ensuring reproducibility and dependency isolation.
- One-shot execution: Services use `--restart-condition none`, meaning each experiment runs exactly once. There is no automatic retry on failure.
- Environment variable injection: Method parameters flow from the CLI through Docker environment variables into the container, using an underscore prefix convention (`_PARAM_NAME=value`).
- Result standardization: All methods produce a `results.json` file conforming to a fixed schema, enabling uniform evaluation across heterogeneous methods.
- Separation of concerns: The orchestration layer (CLI/API) never executes method code directly. It builds images, deploys services, and reads results from shared storage.
- Shared storage as the integration point: NFS provides a single namespace (`/shared` or configurable) visible to all nodes, used for datasets, method source, experiment output, and shared libraries.
Component Diagram
+-------------------------------------------+
| Client Machine |
| |
| +--------+ +----------+ +-----------+ |
| | CLI | | GUI | | Python | |
| | cli.py | | Flask + | | API | |
| | | | Vue.js | | api.py | |
| +---+----+ +----+-----+ +-----+-----+ |
| | | | |
| v v v |
| +--------------------------------------+ |
| | GraFlag Core (core.py) | |
| | returns dataclasses | |
| +--+-------------------------------+---+ |
| | | |
| +--v-----------+ +----------------v--+ |
| | SSHManager | | DockerManager | |
| | (ssh.py) | | (docker_ops.py) | |
| | file ops, | | Docker SDK via | |
| | rsync, build | | SSH tunnel | |
| +------+-------+ +--------+----------+ |
+---------|-------------------|---------------+
| |
SSH / rsync SSH tunnel to
| Docker socket
| |
+---------------------------v-----------------------------+
| Manager Node |
| |
| +----------------+ +-----------------------------+ |
| | Docker Swarm | | Local Registry | |
| | Manager | | (registry:2 on port 5000) | |
| +-------+--------+ +-----------------------------+ |
| | |
| | NFS Export: /shared |
| | +---------------------------------------+ |
| | | methods/ | datasets/ | experiments/ | |
| | | libs/ | | | |
| | +---------------------------------------+ |
+----------|----------------------------------------------+
|
Docker Swarm scheduling
|
+--------------------+---------------------+
| | |
+-----v------+ +------v-----+ +----------v--+
| Worker 1 | | Worker 2 | | Worker N |
| | | | | |
| GPU(s) | | GPU(s) | | GPU(s) |
| NFS mount | | NFS mount | | NFS mount |
| /shared | | /shared | | /shared |
+------------+ +------------+ +-------------+
Component Descriptions
graflag (CLI, GUI, DevCluster, and Orchestration)
The main Python package, located in `graflag/graflag/`. It provides the command-line interface, core orchestration logic, a Python API, a web GUI, and a development cluster tool – all installable as a single package via `pip install -e .` and accessible through the `graflag` command.
cli.py – Command Line Interface
Entry point for all user-facing commands. Uses argparse with the following commands:
| Command | Description |
|---|---|
| … | Initialize Docker Swarm and join worker nodes |
| `run` | Build image and deploy experiment service |
| … | Show cluster status and shared directory contents |
| … | List methods, datasets, experiments, or services |
| … | View experiment logs (supports …) |
| … | Stop a running experiment (optional …) |
| `evaluate` | Run evaluation on a completed experiment |
| … | Transfer files between local and remote (rsync) |
| … | Sync a method or library directory to remote |
| `gui` | Start the web dashboard |
| `devcluster` | Deploy or tear down a local Docker Compose development cluster |
Key flags for `run`:

- `--from-config CONFIG_FILE`: Replay an experiment from a saved `service_config.json`.
- `--params KEY=VALUE`: Override method parameters at runtime.
- `--no-gpu`: Disable GPU resource reservation.
- `--build`: Build (or rebuild) the Docker image before deploying.
core.py – GraFlag Orchestration Class
Central orchestration class (GraFlag) that coordinates all operations. All public methods return structured data (dataclasses from models.py) rather than printing to stdout. The CLI formats the returned data for terminal display.
Key responsibilities:
- Initialization: Loads `GraflagConfig`, creates `SSHManager` and `DockerManager` instances.
- `status()`: Returns a `ClusterInfo` dataclass with nodes, services, and shared directory contents.
- `run()`: Validates method and dataset existence on the remote, creates the experiment directory, optionally builds the Docker image, deploys the Docker Swarm service, follows logs, and returns the experiment name.
- `list_methods()`: Returns `List[MethodInfo]` with method metadata parsed from `.env` files.
- `list_datasets()`: Returns `List[DatasetInfo]` with size and file count.
- `list_experiments()`: Returns `List[ExperimentInfo]` with status, sorted by timestamp.
- `get_experiment_results()`: Returns `ExperimentResults` parsed from `results.json`.
- `get_evaluation_results()`: Returns `EvaluationResults` parsed from `eval/evaluation.json`.
- `evaluate()`: Deploys a `graflag-evaluator` service against a completed experiment.
- `register_metric()`: Saves a custom metric function as a plugin file on the cluster (global or per-experiment) so the evaluator can load it at runtime.
- `copy_files()`: Bidirectional file transfer using rsync over SSH.
- `sync()`: Syncs a local method or library directory to the remote shared storage.
- Status management: Writes `status.json` to experiment directories during build and on failure.
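A minimal sketch of driving the orchestration class from Python (the import path and keyword-argument names are assumptions; the method names are those listed above):

```python
from graflag.core import GraFlag  # import path assumed

core = GraFlag()  # loads GraflagConfig via the standard resolution order

# Inspect the cluster and the available inputs.
cluster = core.status()        # -> ClusterInfo
methods = core.list_methods()  # -> List[MethodInfo]

# Launch an experiment, then evaluate it and fetch the parsed results.
exp_name = core.run(method="taddy", dataset="uci", build=True,
                    params={"MAX_EPOCH": "100"})  # keyword names assumed
core.evaluate(exp_name)  # deploys a graflag-evaluator service
results = core.get_experiment_results(exp_name)  # -> ExperimentResults, once complete
```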
models.py – Data Models
Shared dataclass definitions used by core, api, and GUI:
| Dataclass | Description |
|---|---|
| `ClusterInfo` | Cluster status, nodes, services, shared dir |
| `MethodInfo` | Method name, description, parameters, files |
| `DatasetInfo` | Dataset name, path, size, file count |
| `ExperimentInfo` | Experiment status, results/eval availability |
| `ExperimentResults` | Parsed results.json with execution metrics |
| `EvaluationResults` | Parsed evaluation.json with metrics and plots |
| … | Progress tracking for experiment execution |
All dataclasses implement to_dict() for JSON serialization.
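A sketch of the pattern, using `MethodInfo` as an example (field details beyond "name, description, parameters, files" are assumptions):

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class MethodInfo:
    """Method metadata parsed from the method's .env file."""
    name: str
    description: str = ""
    parameters: Dict[str, str] = field(default_factory=dict)  # _PARAM defaults
    files: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # dataclasses.asdict recurses into nested dataclasses, which keeps
        # the JSON layer trivial for the API and GUI.
        return asdict(self)
```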
config.py – Configuration Management
GraflagConfig loads configuration from a .env file and exposes typed properties.
Config resolution order:
1. Explicit path passed via the `--config` flag.
2. A `.env` file in the current working directory.
3. Standard location: `~/.config/graflag/config.env`.
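A minimal sketch of that resolution order (the function name is an assumption):

```python
from pathlib import Path

def resolve_config_path(explicit: str | None = None) -> Path | None:
    """Return the first existing config file, in precedence order."""
    candidates = []
    if explicit:                            # 1. path from the --config flag
        candidates.append(Path(explicit))
    candidates.append(Path.cwd() / ".env")  # 2. current working directory
    candidates.append(Path.home() / ".config" / "graflag" / "config.env")  # 3. standard
    for path in candidates:
        if path.is_file():
            return path
    return None
```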
`graflag setup` runs an interactive wizard (`init_config()`) that prompts for values and stores them in the standard location.
| Env Variable | Default | Description |
|---|---|---|
| `MANAGER_IP` | (required) | IP address of the Swarm manager node |
| `SSH_PORT` | `22` | SSH port on the manager |
| `SSH_KEY` | `~/.ssh/id_ed25519` | Path to SSH private key |
| `SHARED_DIR` | `/shared` | Path to shared storage on remote |
| … | … | NFS port for mounting |
| `HOSTS_FILE` | `hosts.yml` | Path to hosts.yml with worker IPs |
ssh.py – SSH Operations
SSHManager provides remote command execution and file transfer:
- `execute(command)`: Runs a command on the manager via an `ssh` subprocess. Always connects as `root@MANAGER_IP`.
- `copy_files()`: Bidirectional rsync over SSH. Handles both local-to-remote and remote-to-local transfers.
- `path_exists()`, `read_file()`, `mkdir()`, `list_dir()`: File system operations on the remote via SSH.
All SSH commands use `-o StrictHostKeyChecking=no` and authenticate with the configured SSH key.
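A minimal sketch of `execute()` as described above (constructor shape is an assumption; the ssh flags are the ones named in this section):

```python
import subprocess

class SSHManager:
    def __init__(self, manager_ip: str, ssh_key: str, ssh_port: int = 22):
        self.manager_ip = manager_ip
        self.ssh_key = ssh_key
        self.ssh_port = ssh_port

    def execute(self, command: str) -> str:
        """Run a command on the manager as root and return its stdout."""
        result = subprocess.run(
            ["ssh",
             "-o", "StrictHostKeyChecking=no",
             "-i", self.ssh_key,
             "-p", str(self.ssh_port),
             f"root@{self.manager_ip}",
             command],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
```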
docker_ops.py – Docker Swarm Operations
DockerManager handles Docker Swarm interactions using the Docker SDK for Python (docker-py) connected via an SSH tunnel to the remote Docker daemon.
Connection model: An SSH tunnel forwards a local TCP port to the remote Docker socket (/var/run/docker.sock). The Docker SDK connects to tcp://localhost:<tunnel_port>, providing native Python access to the Docker API. The tunnel is established lazily on first use and reused across operations.
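A sketch of that connection model (the local port and helper name are assumptions; error handling and tunnel readiness checks are omitted):

```python
import subprocess
import time

import docker  # docker-py

TUNNEL_PORT = 23750  # arbitrary local port

def open_docker_client(manager_ip: str, ssh_key: str) -> docker.DockerClient:
    """Forward a local TCP port to the remote Docker socket, then connect."""
    # OpenSSH can forward a local TCP port to a Unix socket on the remote host.
    subprocess.Popen([
        "ssh", "-nNT",
        "-o", "StrictHostKeyChecking=no",
        "-i", ssh_key,
        "-L", f"127.0.0.1:{TUNNEL_PORT}:/var/run/docker.sock",
        f"root@{manager_ip}",
    ])
    time.sleep(1)  # real code would poll until the tunnel accepts connections
    return docker.DockerClient(base_url=f"tcp://localhost:{TUNNEL_PORT}")
```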
Operations using the Docker SDK (native Python API):

- Swarm management: `setup_swarm_manager()` via `client.swarm.init()`, `get_swarm_token()` via `client.swarm.attrs`, `get_nodes()` via `client.nodes.list()`.
- Registry: `setup_local_registry()` via `client.services.create()`.
- Service creation: `create_service()` via `client.services.create()` with `ServiceMode`, `RestartPolicy`, `Resources`, `Mount`, and network configuration.
- Service lifecycle: `list_services()`, `stop_service()`, `get_service_logs()`, `follow_service_logs()` via `service.logs(follow=True, stream=True)`.
- Cluster status: `get_cluster_status()` via `client.info()` and node/service listing.
Operations using SSH (the build context is on the remote host):

- Image building: `build_method_image()` runs `docker build` and `docker push` via SSH, because the build context (the shared directory with methods and libs) resides on the remote host.
- Evaluator image: `build_evaluator_image()` – same rationale.
Worker setup: `setup_workers()` uses SSH to execute `docker swarm join` on each worker node, since the Docker SDK connection only reaches the manager.
Reserved environment variables (defined in the `ReservedEnvVars` enum): `DATA`, `EXP`, `METHOD_NAME`, `COMMAND`, `MONITOR_INTERVAL`. These cannot be overridden by user-supplied `--params`.
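A sketch of the enum and the override guard (the merge helper is an assumption; the enum members are the ones listed above):

```python
from enum import Enum

class ReservedEnvVars(str, Enum):
    DATA = "DATA"
    EXP = "EXP"
    METHOD_NAME = "METHOD_NAME"
    COMMAND = "COMMAND"
    MONITOR_INTERVAL = "MONITOR_INTERVAL"

RESERVED = {member.value for member in ReservedEnvVars}

def merge_user_params(env: dict, params: dict) -> dict:
    """Merge --params into the service env, rejecting reserved names."""
    for key, value in params.items():
        if key in RESERVED:
            raise ValueError(f"{key} is reserved and cannot be overridden")
        env[f"_{key}"] = value  # underscore prefix convention
    return env
```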
api.py – Python API
GraFlagAPI is a thin error-safe wrapper around GraFlag core, designed for GUI integration. Since core now returns structured data directly, the API layer:
- Delegates all operations to `self.core` (no duplicated logic).
- Wraps each call in try/except, returning empty lists or `None` on error instead of raising (prevents GUI crashes).
- Maintains the same public interface used by the GUI (backward-compatible).
- All returned dataclasses implement `to_dict()` for JSON serialization.
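A sketch of the error-safe delegation pattern with two representative methods (the fallback values are those described above):

```python
from typing import List, Optional

class GraFlagAPI:
    def __init__(self, core):
        self.core = core  # a GraFlag instance; all logic lives there

    def list_methods(self) -> List:
        try:
            return self.core.list_methods()
        except Exception:
            return []  # empty list instead of an exception keeps the GUI rendering

    def get_experiment_results(self, name: str) -> Optional[object]:
        try:
            return self.core.get_experiment_results(name)
        except Exception:
            return None  # None signals "no data" to the GUI layer
```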
gui/ – Web Interface (subpackage)
A Flask application with a Vue.js frontend, located in graflag/graflag/gui/. Accessible via graflag gui [--host HOST] [--port PORT] [--debug].
Backend (`server.py`):

- Built on Flask with Flask-SocketIO for real-time updates.
- Uses `GraFlagAPI` as its data layer.
- REST endpoints under `/api/` for methods, datasets, experiments, services, run, evaluation, logs, and plot serving.
- Run, evaluation, stop, and delete operations execute in background threads to avoid blocking HTTP responses.
- Plot images are streamed from the remote via SSH + base64 encoding.
- Server-side caching for methods and datasets (30-second TTL).
Real-time updates:
- WebSocket connection via Socket.IO.
- A `background_updater` thread polls experiments and services every 2 seconds.
- Updates are broadcast to all connected clients only when state changes (change detection via comparison with `last_state`).
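A sketch of that loop (the Socket.IO event name and payload shape are assumptions):

```python
def background_updater(socketio, api, interval: float = 2.0):
    """Poll cluster state and broadcast only when something changed."""
    last_state = None
    while True:
        state = {
            "experiments": [e.to_dict() for e in api.list_experiments()],
            "services": api.list_services(),  # assumed to return plain data
        }
        if state != last_state:  # change detection by comparison
            socketio.emit("state_update", state)  # event name assumed
            last_state = state
        socketio.sleep(interval)  # cooperative sleep under Flask-SocketIO
```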
Frontend:
- Vue.js single-page application served from Flask templates.
- Communicates with the backend via REST API and WebSocket events.
devcluster/ – Development Cluster (subpackage)
A Docker Compose-based virtual cluster for local development and testing, located in graflag/graflag/devcluster/. Accessible via graflag devcluster --hosts hosts.yml [--pubkey KEY]. Use graflag devcluster --down to stop and remove the cluster.
Components:
- `docker-compose.yml`: Defines a manager node and multiple worker nodes (up to 4) on a bridged network (192.168.100.0/24).
- `deploy.sh`: Automated deployment script that generates SSH keys, builds the Docker Compose configuration from `hosts.yml`, and starts the cluster.
- `manager/`: Dockerfile and configuration for the manager container (Docker-in-Docker with SSH, NFS server).
- `worker/`: Dockerfile and configuration for worker containers (Docker-in-Docker with SSH, NFS client).
Key characteristics:
- All containers run in privileged mode (required for Docker-in-Docker and NFS).
- GPU passthrough is configured via `deploy.resources.reservations.devices` (see the fragment after this list).
- Static IP assignment on a custom bridge network.
- SSH key exchange is handled at build time via build arguments.
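For reference, a minimal Compose fragment using those keys (the service name, address, and GPU count are illustrative, not taken from the actual docker-compose.yml):

```yaml
services:
  worker1:
    privileged: true  # required for Docker-in-Docker and NFS
    networks:
      cluster:
        ipv4_address: 192.168.100.11  # static IP on the bridge network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]  # GPU passthrough
```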
Execution Flow
The complete lifecycle of an experiment:
1. Command Parsing
graflag run -m taddy -d uci --build --params MAX_EPOCH=100
The CLI parses arguments, instantiates GraFlag, and calls run().
2. Validation
- SSH to the manager to verify that `methods/taddy/` exists.
- SSH to the manager to verify that `datasets/uci/` exists.
- Create the experiment directory: `experiments/exp__taddy__uci__20260309_143000/`.
3. Image Build (if --build)
status.json <- {"status": "building"}
docker build --network=host \
-f /shared/methods/taddy/Dockerfile \
-t taddy:latest \
-t MANAGER_IP:5000/taddy:latest \
/shared/
docker push MANAGER_IP:5000/taddy:latest
The build context is the entire shared directory, giving the Dockerfile access to libs/ for installing graflag_runner and other shared libraries.
Build output is saved to experiments/exp__*/build.log.
4. Service Deployment
Using the Docker SDK via SSH tunnel, the equivalent of:
docker service create --quiet -d \
--name exp__taddy__uci__20260309_143000 \
--restart-condition none \
--network host \
--generic-resource NVIDIA-GPU=0 \
--env METHOD_NAME=taddy \
--env DATA=/shared/datasets/uci/ \
--env EXP=/shared/experiments/exp__taddy__uci__20260309_143000/ \
--env _MAX_EPOCH=100 \
--mount type=bind,source=/shared,target=/shared \
MANAGER_IP:5000/taddy:latest
is performed via client.services.create() with ServiceMode, RestartPolicy, Resources, Mount, and env parameters. The method’s .env is read and merged into the env list (rather than using --env-file), giving the orchestrator full control over parameter precedence.
Service configuration is saved to service_config.json before deployment. Service details (including assigned worker node) are saved to service_details.json after creation.
5. Method Execution (inside container)
The container’s CMD invokes graflag_runner:
python3 -m graflag_runner --pass-env-args
`MethodRunner.from_env()` reads environment variables and:

- Extracts `_`-prefixed env vars and converts them to CLI arguments (e.g., `_MAX_EPOCH=100` becomes `--max_epoch 100`; see the sketch after this list).
- Validates dataset compatibility if `SUPPORTED_DATASETS` is defined.
- Writes `status.json` with status `running`.
- Starts `ResourceMonitor` in a background thread (tracks CPU, memory, GPU usage via `psutil` and `nvidia-smi`).
- Executes the method command via subprocess with real-time output capture.
- On completion, writes a final `status.json` with status `completed` or `failed`, execution time, and resource summary.
- Saves captured output to `method_output.txt`.
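A sketch of the underscore-to-flag conversion (the helper name is an assumption; the mapping rule is the one stated above):

```python
import os
import shlex

def env_to_cli_args(environ=os.environ) -> list:
    """Turn _MAX_EPOCH=100 style variables into ['--max_epoch', '100']."""
    args = []
    for key, value in environ.items():
        if key.startswith("_"):
            args += [f"--{key[1:].lower()}", value]
    return args

# COMMAND comes from the method's .env; the runner appends the converted args.
command = shlex.split(os.environ["COMMAND"]) + env_to_cli_args()
```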
The method code itself uses ResultWriter to produce results.json:
from graflag_runner import ResultWriter
writer = ResultWriter()
# ... run detection algorithm ...
writer.save_scores(result_type="EDGE_STREAM_ANOMALY_SCORES", scores=scores, ground_truth=gt)
writer.add_metadata(method_name="taddy", dataset="uci")
writer.add_resource_metrics(exec_time_ms=45230.15, peak_memory_mb=2048.5, peak_gpu_mb=4096.0)
writer.finalize() # Writes results.json
6. Evaluation (optional, triggered separately)
graflag evaluate -e exp__taddy__uci__20260309_143000
This deploys a separate graflag-evaluator Docker service that:
- Loads `results.json` from the experiment directory.
- Computes metrics (AUC-ROC, AUC-PR, etc.) via `MetricCalculator`.
- Generates plots (ROC curve, PR curve, score distribution, spot curves) via `PlotGenerator`.
- Saves `evaluation.json` and plot PNGs to the `eval/` subdirectory.
Data Flow
Environment Variable Injection
Parameters flow through three layers:
Method .env file CLI --params Docker Service Env
+------------------+ +------------------+ +------------------+
| METHOD_NAME=taddy| | MAX_EPOCH=100 | -> | METHOD_NAME=taddy|
| COMMAND=python...| | LEARNING_RATE=.01| | COMMAND=python...|
| _BATCH_SIZE=128 | +------------------+ | DATA=/shared/... |
| _MAX_EPOCH=50 | | EXP=/shared/... |
+------------------+ | _BATCH_SIZE=128 |
| _MAX_EPOCH=100 | (overridden)
| _LEARNING_RATE=.01| (added)
+------------------+
1. The method's `.env` file defines metadata (`METHOD_NAME`, `COMMAND`, `DESCRIPTION`) and default parameters (underscore-prefixed: `_BATCH_SIZE=128`).
2. The method's `.env` is read by `DockerManager._build_service_env()` and parsed into a dict.
3. Reserved variables (`DATA`, `EXP`, `METHOD_NAME`, `COMMAND`, `MONITOR_INTERVAL`) are set by the orchestrator and cannot be overridden by `--params`.
4. User `--params` are merged into the dict as `_KEY=VALUE`, overriding any defaults from the `.env` file (see the precedence sketch after this list).
5. The merged dict is passed as the `env` parameter to `client.services.create()`.
6. Inside the container, `graflag_runner` with `--pass-env-args` extracts `_`-prefixed env vars and converts them to CLI arguments: `_MAX_EPOCH=100` becomes `--max_epoch 100`.
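A minimal sketch of that precedence, using the values from the diagram (variable names are illustrative):

```python
# 1-2. Parsed from the method's .env file.
method_env = {"METHOD_NAME": "taddy", "COMMAND": "python3 train_graflag.py",
              "_BATCH_SIZE": "128", "_MAX_EPOCH": "50"}

# 3. Set by the orchestrator; not overridable.
reserved = {"DATA": "/shared/datasets/uci/",
            "EXP": "/shared/experiments/exp__taddy__uci__20260309_143000/"}

# 4. From --params; prefixed and merged last, so they win.
user_params = {"MAX_EPOCH": "100", "LEARNING_RATE": "0.01"}

env = {**method_env, **reserved}
env.update({f"_{k}": v for k, v in user_params.items()})  # overrides _MAX_EPOCH

# 5. Passed to client.services.create() as a list of KEY=VALUE strings.
service_env = [f"{k}={v}" for k, v in env.items()]
```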
Result Standardization
All methods must produce a results.json via ResultWriter with this schema:
{
"result_type": "EDGE_STREAM_ANOMALY_SCORES",
"scores": [0.1, 0.9, 0.3],
"ground_truth": [0, 1, 0],
"metadata": {
"method_name": "taddy",
"dataset": "uci"
}
}
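To show the schema in use, here is a sketch that loads a results.json and computes two of the built-in metrics with scikit-learn (an illustration, not the actual MetricCalculator implementation):

```python
import json

from sklearn.metrics import average_precision_score, roc_auc_score

with open("results.json") as f:
    results = json.load(f)

scores = results["scores"]              # one anomaly score per item
ground_truth = results["ground_truth"]  # binary labels, same length

print("AUC-ROC:", roc_auc_score(ground_truth, scores))
print("AUC-PR: ", average_precision_score(ground_truth, scores))
```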
Valid result types combine a category with a granularity:

| Category | Node | Edge | Graph |
|---|---|---|---|
| Static | … | … | … |
| Temporal | … | … | … |
| Streaming | … | `EDGE_STREAM_ANOMALY_SCORES` | … |

Optional additional fields: `timestamps`, `node_ids`, `edges`, `graph_ids`.
Evaluation Pipeline
plugins/*.py -------+
custom_metrics/*.py-+-> MetricCalculator (load_plugins)
                               |
results.json --> Evaluator -+-> MetricCalculator --> evaluation.json
                            +-> PlotGenerator --> roc_curve.png
                                              --> pr_curve.png
                                              --> score_distribution.png
*.csv (spot files) --> PlotGenerator --> spot_curves.png (per metric group)
On initialization the Evaluator loads custom metric plugins from the global plugins/ directory and the experiment’s custom_metrics/ directory. Each plugin registers additional metric functions that are then included in the evaluation alongside the built-in metrics (AUC-ROC, AUC-PR, etc.). The evaluator flattens scores and ground truth arrays, handles ragged temporal data, and computes all registered metrics.
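A sketch of what such a plugin file might look like (the convention of registering every public callable is an assumption; only the plugin directories are specified above):

```python
# custom_metrics/precision_at_100.py -- hypothetical per-experiment plugin
import numpy as np

def precision_at_100(scores, ground_truth):
    """Precision among the 100 highest-scoring items."""
    scores = np.asarray(scores)
    ground_truth = np.asarray(ground_truth)
    top_k = np.argsort(scores)[::-1][:100]  # indices of the top-100 scores
    return float(ground_truth[top_k].mean())
```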
Experiment Lifecycle
Each experiment transitions through the following states, tracked in status.json:
+----------+
| building | (image build in progress)
+----+-----+
|
v
+---------+
+--------->| running | (service deployed, method executing)
| +----+----+
| |
| +---------+---------+
| | |
| v v
| +-----------+ +--------+
| | completed | | failed |
| +-----------+ +--------+
|
| (external stop)
| +----------+
+---->| stopped |
+----------+
State determination logic (in `core.py`, sketched as code after this list):

- `status.json` is the source of truth when present.
- Terminal states (`completed`, `failed`) from `status.json` are definitive.
- If `status.json` says `running` but no Docker service exists, the status is `stopped`.
- If `status.json` says `building` but no service exists and `build.log` is present, the status is `failed`.
- If no `status.json` exists but the Docker service is present, the status is `running`.
- Legacy experiments without `status.json` that have `results.json` are assumed `completed`.
- All other cases are `unknown`.
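A sketch of those rules as code (the function shape is an assumption; the rules are exactly the ones listed):

```python
def determine_status(status_json, service_exists, has_build_log, has_results):
    if status_json is not None:
        status = status_json.get("status")
        if status in ("completed", "failed"):  # terminal states are definitive
            return status
        if status == "running" and not service_exists:
            return "stopped"  # service is gone: externally stopped
        if status == "building" and not service_exists and has_build_log:
            return "failed"   # build produced a log but never deployed
        return status
    if service_exists:
        return "running"      # no status.json yet, but the service is live
    if has_results:
        return "completed"    # legacy experiment without status tracking
    return "unknown"
```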
`status.json` is written at multiple points:

- By `core.py` when the build starts (`building`) and if the build fails.
- By `graflag_runner` when execution starts (`running`), and when it finishes (`completed` or `failed`), including execution time, resource summary, and exit code.
Docker Swarm Service Model
Each experiment runs as a Docker Swarm service with specific configuration:
Service Creation
Services are created via the Docker SDK (client.services.create()):
from docker.types import Mount, ServiceMode, RestartPolicy, Resources

client.services.create(
    image="REGISTRY:5000/method:tag",
    name="exp__method__dataset__ts",
    env=["METHOD_NAME=...", "DATA=...", "EXP=...", "_PARAM=VALUE", ...],
    mounts=[Mount(target="/shared", source="/shared", type="bind")],
    mode=ServiceMode("replicated", replicas=1),
    restart_policy=RestartPolicy(condition="none"),
    resources=Resources(generic_resources=[...]),  # GPU if enabled
    networks=["host"],
)
Key Design Decisions
- `--restart-condition none`: Experiments are one-shot tasks. If the method crashes, the service transitions to `Complete` (with a non-zero exit code) rather than restarting. The status is tracked in `status.json`.
- `--network host`: Containers share the host's network stack. This provides DNS resolution and internet access (for methods that download models or data).
- `--generic-resource NVIDIA-GPU=0`: Reserves a GPU via Docker's generic resource model. The `0` is the GPU index. This requires the NVIDIA runtime and GPU resource advertisement on worker nodes.
- Bind mount: The shared directory is bind-mounted at the same path inside the container, so all paths (`DATA`, `EXP`) are consistent between host and container.
Local Registry
A registry:2 service runs on the manager node (port 5000), constrained to node.role==manager. All method images are pushed here after building, allowing any worker node to pull them during service deployment.
Evaluation Services
Evaluation runs as a separate service named eval__exp__method__dataset__ts. It uses the graflag-evaluator image, mounts shared storage at /shared, and receives the experiment path as a command argument. The evaluator service is removed after completion.
Configuration
Client Configuration
Run graflag setup to interactively configure and store settings in ~/.config/graflag/config.env:
# Required
MANAGER_IP=192.168.100.10
# Optional (with defaults)
SSH_PORT=22
SSH_KEY=~/.ssh/id_ed25519
SHARED_DIR=/shared
HOSTS_FILE=hosts.yml
Config resolution order: --config flag > .env in working directory > ~/.config/graflag/config.env.
Cluster Configuration (hosts.yml)
Used by DockerManager for swarm setup and by graflag devcluster for virtual cluster deployment. The path to this file is stored in the client configuration (HOSTS_FILE).
subnet: 192.168.100.0/24
manager: 192.168.100.10
workers:
- 192.168.100.11
- 192.168.100.12
- 192.168.100.13
- 192.168.100.14
Method Configuration (.env)
Each method has a .env file in its directory:
# Metadata (read by orchestrator)
METHOD_NAME=taddy
DESCRIPTION=Temporal anomaly detection on dynamic graphs
SOURCE_CODE=https://github.com/example/taddy
COMMAND=python3 train_graflag.py
SUPPORTED_DATASETS=uci,btc_alpha,btc_otc
# Default parameters (underscore prefix, overridable via --params)
_MAX_EPOCH=200
_BATCH_SIZE=128
_LEARNING_RATE=0.001
_HIDDEN_DIM=64
Experiment Output Structure
experiments/exp__taddy__uci__20260309_143000/
+-- status.json # Execution status (building/running/completed/failed)
+-- service_config.json # Full service configuration (reproducible)
+-- service_details.json # Docker service metadata (node, task ID, etc.)
+-- build.log # Docker image build output (if --build was used)
+-- method_output.txt # Captured stdout/stderr from method execution
+-- results.json # Standardized method output (scores, ground_truth)
+-- training.csv # Training metrics from writer.spot("training", ...)
+-- validation.csv # Validation metrics from writer.spot("validation", ...)
+-- resources.csv # Resource usage from ResourceMonitor.spot("resources", ...)
+-- custom_metrics/ # Per-experiment custom metric plugins (optional)
+-- eval/                       # Evaluation output (created by graflag evaluate)
    +-- evaluation.json         # Computed metrics and plot references
    +-- roc_curve.png           # ROC curve plot
    +-- pr_curve.png            # Precision-Recall curve plot
    +-- score_distribution.png  # Score histogram by class
    +-- spot_curves.png         # Spot metric curves (if spot files exist)
The service_config.json file contains all information needed to reproduce an experiment:
{
"experiment_name": "exp__taddy__uci__20260309_143000",
"method_name": "taddy",
"dataset": "uci",
"tag": "latest",
"gpu_required": true,
"registry_image": "192.168.100.10:5000/taddy:latest",
"manager_ip": "192.168.100.10",
"timestamp": "2026-03-09T14:30:00.000000",
"data_path": "/shared/datasets/uci/",
"exp_path": "/shared/experiments/exp__taddy__uci__20260309_143000/",
"env_contents": {
"METHOD_NAME": "taddy",
"COMMAND": "python3 train_graflag.py",
"_MAX_EPOCH": "100",
"_BATCH_SIZE": "128"
}
}
To replay an experiment:
graflag run --from-config ./experiments/exp__taddy__uci__20260309_143000/service_config.json