GraFlag Architecture
Technical architecture reference for the GraFlag distributed benchmarking platform for Graph Anomaly Detection (GAD).
System Overview
GraFlag is a distributed benchmarking platform that runs Graph Anomaly Detection methods across a Docker Swarm cluster. The platform provides:
- Distributed execution of GAD methods on GPU-equipped worker nodes via Docker Swarm.
- Standardized result collection through the `graflag_runner` library, which wraps method execution with resource monitoring and result serialization.
- Automated evaluation via `graflag_evaluator`, which computes metrics (AUC-ROC, AUC-PR, etc.) and generates plots.
- Multiple client interfaces: a CLI for scripting, a Python API for programmatic use, and a web GUI for interactive management.
- NFS-based shared storage for methods, datasets, experiments, and shared libraries, accessible by all cluster nodes.
Design Principles
- Container isolation: Each method runs in its own Docker container with a defined Dockerfile, ensuring reproducibility and dependency isolation.
- One-shot execution: Services use `--restart-condition none`, meaning each experiment runs exactly once. There is no automatic retry on failure.
- Environment variable injection: Method parameters flow from the CLI through Docker environment variables into the container, using an underscore prefix convention (`_PARAM_NAME=value`).
- Result standardization: All methods produce a `results.json` file conforming to a fixed schema, enabling uniform evaluation across heterogeneous methods.
- Separation of concerns: The orchestration layer (CLI/API) never executes method code directly. It builds images, deploys services, and reads results from shared storage.
- Shared storage as the integration point: NFS provides a single namespace (`/shared` or configurable) visible to all nodes, used for datasets, method source, experiment output, and shared libraries.
Component Diagram
+-------------------------------------------+
| Client Machine |
| |
| +--------+ +----------+ +-----------+ |
| | CLI | | GUI | | Python | |
| | cli.py | | Flask + | | API | |
| | | | Vue.js | | api.py | |
| +---+----+ +----+-----+ +-----+-----+ |
| | | | |
| v v v |
| +--------------------------------------+ |
| | GraFlag Core (core.py) | |
| | returns dataclasses | |
| +--+-------------------------------+---+ |
| | | |
| +--v-----------+ +----------------v--+ |
| | SSHManager | | DockerManager | |
| | (ssh.py) | | (docker_ops.py) | |
| | file ops, | | Docker SDK via | |
| | rsync, build | | SSH tunnel | |
| +------+-------+ +--------+----------+ |
+---------|-------------------|---------------+
| |
SSH / rsync SSH tunnel to
| Docker socket
| |
+---------------------------v-----------------------------+
| Manager Node |
| |
| +----------------+ +-----------------------------+ |
| | Docker Swarm | | Local Registry | |
| | Manager | | (registry:2 on port 5000) | |
| +-------+--------+ +-----------------------------+ |
| | |
| | NFS Export: /shared |
| | +---------------------------------------+ |
| | | methods/ | datasets/ | experiments/ | |
| | | libs/ | | | |
| | +---------------------------------------+ |
+----------|----------------------------------------------+
|
Docker Swarm scheduling
|
+--------------------+---------------------+
| | |
+-----v------+ +------v-----+ +----------v--+
| Worker 1 | | Worker 2 | | Worker N |
| | | | | |
| GPU(s) | | GPU(s) | | GPU(s) |
| NFS mount | | NFS mount | | NFS mount |
| /shared | | /shared | | /shared |
+------------+ +------------+ +-------------+
Component Descriptions
graflag (CLI, GUI, DevCluster, and Orchestration)
The main Python package, located in `graflag/graflag/`. It provides the command-line interface, core orchestration logic, a Python API, a web GUI, and a development cluster tool – all installable as a single package via `pip install -e .` and accessible through the `graflag` command.
cli.py – Command Line Interface
Entry point for all user-facing commands. Uses argparse with the following commands:
| Command | Description |
|---|---|
| … | Initialize Docker Swarm and join worker nodes |
| `run` | Build image and deploy experiment service |
| … | Show cluster status and shared directory contents |
| … | List methods, datasets, experiments, or services |
| … | View experiment logs (supports …) |
| … | Stop a running experiment (optional …) |
| `evaluate` | Run evaluation on a completed experiment |
| … | Transfer files between local and remote (rsync) |
| … | Sync a method or library directory to remote |
| `gui` | Start the web dashboard |
| `devcluster` | Deploy or tear down a local Docker Compose development cluster |
Key flags for `run`:

- `--from-config CONFIG_FILE`: Replay an experiment from a saved `service_config.json`.
- `--params KEY=VALUE`: Override method parameters at runtime.
- `--no-gpu`: Disable GPU resource reservation.
- `--build`: Build (or rebuild) the Docker image before deploying.
core.py – GraFlag Orchestration Class
Central orchestration class (GraFlag) that coordinates all operations. All public methods return structured data (dataclasses from models.py) rather than printing to stdout. The CLI formats the returned data for terminal display.
Key responsibilities:
- Initialization: Loads `GraflagConfig`, creates `SSHManager` and `DockerManager` instances.
- `status()`: Returns a `ClusterInfo` dataclass with nodes, services, and shared directory contents.
- `run()`: Validates method and dataset existence on the remote, creates the experiment directory, optionally builds the Docker image, deploys the Docker Swarm service, follows logs, and returns the experiment name.
- `list_methods()`: Returns `List[MethodInfo]` with method metadata parsed from `.env` files.
- `list_datasets()`: Returns `List[DatasetInfo]` with size and file count.
- `list_experiments()`: Returns `List[ExperimentInfo]` with status, sorted by timestamp.
- `get_experiment_results()`: Returns `ExperimentResults` parsed from `results.json`.
- `get_evaluation_results()`: Returns `EvaluationResults` parsed from `eval/evaluation.json`.
- `evaluate()`: Deploys a `graflag-evaluator` service against a completed experiment.
- `register_metric()`: Saves a custom metric function as a plugin file on the cluster (global or per-experiment) so the evaluator can load it at runtime.
- `copy_files()`: Bidirectional file transfer using rsync over SSH.
- `sync()`: Syncs a local method or library directory to the remote shared storage.
- Status management: Writes `status.json` to experiment directories during build and on failure.
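A minimal sketch of driving the orchestration class from Python (the import path and keyword-argument names are assumptions; the method names are those listed above):

```python
from graflag.core import GraFlag  # import path assumed

core = GraFlag()  # loads GraflagConfig via the standard resolution order

# Inspect the cluster and the available inputs.
cluster = core.status()        # -> ClusterInfo
methods = core.list_methods()  # -> List[MethodInfo]

# Launch an experiment, then evaluate it and fetch the parsed results.
exp_name = core.run(method="taddy", dataset="uci", build=True,
                    params={"MAX_EPOCH": "100"})  # keyword names assumed
core.evaluate(exp_name)  # deploys a graflag-evaluator service
results = core.get_experiment_results(exp_name)  # -> ExperimentResults, once complete
```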
models.py – Data Models
Shared dataclass definitions used by core, api, and GUI:
| Dataclass | Description |
|---|---|
| `ClusterInfo` | Cluster status, nodes, services, shared dir |
| `MethodInfo` | Method name, description, parameters, files |
| `DatasetInfo` | Dataset name, path, size, file count |
| `ExperimentInfo` | Experiment status, results/eval availability |
| `ExperimentResults` | Parsed results.json with execution metrics |
| `EvaluationResults` | Parsed evaluation.json with metrics and plots |
| … | Progress tracking for experiment execution |
All dataclasses implement to_dict() for JSON serialization.
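A sketch of the pattern, using `MethodInfo` as an example (field details beyond "name, description, parameters, files" are assumptions):

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class MethodInfo:
    """Method metadata parsed from the method's .env file."""
    name: str
    description: str = ""
    parameters: Dict[str, str] = field(default_factory=dict)  # _PARAM defaults
    files: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # dataclasses.asdict recurses into nested dataclasses, which keeps
        # the JSON layer trivial for the API and GUI.
        return asdict(self)
```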
config.py – Configuration Management
GraflagConfig loads configuration from a .env file and exposes typed properties.
Config resolution order:
1. Explicit path passed via the `--config` flag.
2. A `.env` file in the current working directory.
3. Standard location: `~/.config/graflag/config.env`.
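A minimal sketch of that resolution order (the function name is an assumption):

```python
from pathlib import Path

def resolve_config_path(explicit: str | None = None) -> Path | None:
    """Return the first existing config file, in precedence order."""
    candidates = []
    if explicit:                            # 1. path from the --config flag
        candidates.append(Path(explicit))
    candidates.append(Path.cwd() / ".env")  # 2. current working directory
    candidates.append(Path.home() / ".config" / "graflag" / "config.env")  # 3. standard
    for path in candidates:
        if path.is_file():
            return path
    return None
```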
`graflag setup` runs an interactive wizard (`init_config()`) that prompts for values and stores them in the standard location.
| Env Variable | Default | Description |
|---|---|---|
| `MANAGER_IP` | (required) | IP address of the Swarm manager node |
| `SSH_PORT` | `22` | SSH port on the manager |
| `SSH_KEY` | `~/.ssh/id_ed25519` | Path to SSH private key |
| `SHARED_DIR` | `/shared` | Path to shared storage on remote |
| … | … | NFS port for mounting |
| `HOSTS_FILE` | `hosts.yml` | Path to hosts.yml with worker IPs |
ssh.py – SSH Operations
SSHManager provides remote command execution and file transfer:
- `execute(command)`: Runs a command on the manager via an `ssh` subprocess. Always connects as `root@MANAGER_IP`.
- `copy_files()`: Bidirectional rsync over SSH. Handles both local-to-remote and remote-to-local transfers.
- `path_exists()`, `read_file()`, `mkdir()`, `list_dir()`: File system operations on the remote via SSH.
All SSH commands use `-o StrictHostKeyChecking=no` and authenticate with the configured SSH key.
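A minimal sketch of `execute()` as described above (constructor shape is an assumption; the ssh flags are the ones named in this section):

```python
import subprocess

class SSHManager:
    def __init__(self, manager_ip: str, ssh_key: str, ssh_port: int = 22):
        self.manager_ip = manager_ip
        self.ssh_key = ssh_key
        self.ssh_port = ssh_port

    def execute(self, command: str) -> str:
        """Run a command on the manager as root and return its stdout."""
        result = subprocess.run(
            ["ssh",
             "-o", "StrictHostKeyChecking=no",
             "-i", self.ssh_key,
             "-p", str(self.ssh_port),
             f"root@{self.manager_ip}",
             command],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
```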
docker_ops.py – Docker Swarm Operations
DockerManager handles Docker Swarm interactions using the Docker SDK for Python (docker-py) connected via an SSH tunnel to the remote Docker daemon.
Connection model: An SSH tunnel forwards a local TCP port to the remote Docker socket (/var/run/docker.sock). The Docker SDK connects to tcp://localhost:<tunnel_port>, providing native Python access to the Docker API. The tunnel is established lazily on first use and reused across operations.
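A sketch of that connection model (the local port and helper name are assumptions; error handling and tunnel readiness checks are omitted):

```python
import subprocess
import time

import docker  # docker-py

TUNNEL_PORT = 23750  # arbitrary local port

def open_docker_client(manager_ip: str, ssh_key: str) -> docker.DockerClient:
    """Forward a local TCP port to the remote Docker socket, then connect."""
    # OpenSSH can forward a local TCP port to a Unix socket on the remote host.
    subprocess.Popen([
        "ssh", "-nNT",
        "-o", "StrictHostKeyChecking=no",
        "-i", ssh_key,
        "-L", f"127.0.0.1:{TUNNEL_PORT}:/var/run/docker.sock",
        f"root@{manager_ip}",
    ])
    time.sleep(1)  # real code would poll until the tunnel accepts connections
    return docker.DockerClient(base_url=f"tcp://localhost:{TUNNEL_PORT}")
```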
Operations using the Docker SDK (native Python API):

- Swarm management: `setup_swarm_manager()` via `client.swarm.init()`, `get_swarm_token()` via `client.swarm.attrs`, `get_nodes()` via `client.nodes.list()`.
- Registry: `setup_local_registry()` via `client.services.create()`.
- Service creation: `create_service()` via `client.services.create()` with `ServiceMode`, `RestartPolicy`, `Resources`, `Mount`, and network configuration.
- Service lifecycle: `list_services()`, `stop_service()`, `get_service_logs()`, `follow_service_logs()` via `service.logs(follow=True, stream=True)`.
- Cluster status: `get_cluster_status()` via `client.info()` and node/service listing.
Operations using SSH (the build context is on the remote host):

- Image building: `build_method_image()` runs `docker build` and `docker push` via SSH, because the build context (the shared directory with methods and libs) resides on the remote host.
- Evaluator image: `build_evaluator_image()` – same rationale.
Worker setup: `setup_workers()` uses SSH to execute `docker swarm join` on each worker node, since the Docker SDK connection only reaches the manager.
Reserved environment variables (defined in the `ReservedEnvVars` enum): `DATA`, `EXP`, `METHOD_NAME`, `COMMAND`, `MONITOR_INTERVAL`. These cannot be overridden by user-supplied `--params`.
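A sketch of the enum and the override guard (the merge helper is an assumption; the enum members are the ones listed above):

```python
from enum import Enum

class ReservedEnvVars(str, Enum):
    DATA = "DATA"
    EXP = "EXP"
    METHOD_NAME = "METHOD_NAME"
    COMMAND = "COMMAND"
    MONITOR_INTERVAL = "MONITOR_INTERVAL"

RESERVED = {member.value for member in ReservedEnvVars}

def merge_user_params(env: dict, params: dict) -> dict:
    """Merge --params into the service env, rejecting reserved names."""
    for key, value in params.items():
        if key in RESERVED:
            raise ValueError(f"{key} is reserved and cannot be overridden")
        env[f"_{key}"] = value  # underscore prefix convention
    return env
```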
api.py – Python API
GraFlagAPI is a thin error-safe wrapper around GraFlag core, designed for GUI integration. Since core now returns structured data directly, the API layer:
- Delegates all operations to `self.core` (no duplicated logic).
- Wraps each call in try/except, returning empty lists or `None` on error instead of raising (prevents GUI crashes).
- Maintains the same public interface used by the GUI (backward-compatible).
- All returned dataclasses implement `to_dict()` for JSON serialization.
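A sketch of the error-safe delegation pattern with two representative methods (the fallback values are those described above):

```python
from typing import List, Optional

class GraFlagAPI:
    def __init__(self, core):
        self.core = core  # a GraFlag instance; all logic lives there

    def list_methods(self) -> List:
        try:
            return self.core.list_methods()
        except Exception:
            return []  # empty list instead of an exception keeps the GUI rendering

    def get_experiment_results(self, name: str) -> Optional[object]:
        try:
            return self.core.get_experiment_results(name)
        except Exception:
            return None  # None signals "no data" to the GUI layer
```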
gui/ – Web Interface (subpackage)
A Flask application with a Vue.js frontend, located in graflag/graflag/gui/. Accessible via graflag gui [--host HOST] [--port PORT] [--debug].
Backend (`server.py`):

- Built on Flask with Flask-SocketIO for real-time updates.
- Uses `GraFlagAPI` as its data layer.
- REST endpoints under `/api/` for methods, datasets, experiments, services, run, evaluation, logs, and plot serving.
- Run, evaluation, stop, and delete operations execute in background threads to avoid blocking HTTP responses.
- Plot images are streamed from the remote via SSH + base64 encoding.
- Server-side caching for methods and datasets (30-second TTL).
Real-time updates:
- WebSocket connection via Socket.IO.
- A `background_updater` thread polls experiments and services every 2 seconds.
- Updates are broadcast to all connected clients only when state changes (change detection via comparison with `last_state`).
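A sketch of that loop (the Socket.IO event name and payload shape are assumptions):

```python
def background_updater(socketio, api, interval: float = 2.0):
    """Poll cluster state and broadcast only when something changed."""
    last_state = None
    while True:
        state = {
            "experiments": [e.to_dict() for e in api.list_experiments()],
            "services": api.list_services(),  # assumed to return plain data
        }
        if state != last_state:  # change detection by comparison
            socketio.emit("state_update", state)  # event name assumed
            last_state = state
        socketio.sleep(interval)  # cooperative sleep under Flask-SocketIO
```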
Frontend:
- Vue.js single-page application served from Flask templates.
- Communicates with the backend via REST API and WebSocket events.
devcluster/ – Development Cluster (subpackage)
A Docker Compose-based virtual cluster for local development and testing, located in graflag/graflag/devcluster/. Accessible via graflag devcluster --hosts hosts.yml [--pubkey KEY]. Use graflag devcluster --down to stop and remove the cluster.
Components:
- `docker-compose.yml`: Defines a manager node and multiple worker nodes (up to 4) on a bridged network (192.168.100.0/24).
- `deploy.sh`: Automated deployment script that generates SSH keys, builds the Docker Compose configuration from `hosts.yml`, and starts the cluster.
- `manager/`: Dockerfile and configuration for the manager container (Docker-in-Docker with SSH, NFS server).
- `worker/`: Dockerfile and configuration for worker containers (Docker-in-Docker with SSH, NFS client).
Key characteristics:
- All containers run in privileged mode (required for Docker-in-Docker and NFS).
- GPU passthrough is configured via `deploy.resources.reservations.devices` (see the fragment after this list).
- Static IP assignment on a custom bridge network.
- SSH key exchange is handled at build time via build arguments.
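For reference, a minimal Compose fragment using those keys (the service name, address, and GPU count are illustrative, not taken from the actual docker-compose.yml):

```yaml
services:
  worker1:
    privileged: true  # required for Docker-in-Docker and NFS
    networks:
      cluster:
        ipv4_address: 192.168.100.11  # static IP on the bridge network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]  # GPU passthrough
```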
Execution Flow
The complete lifecycle of an experiment:
1. Command Parsing
graflag run -m taddy -d uci --build --params MAX_EPOCH=100
The CLI parses arguments, instantiates GraFlag, and calls run().
2. Validation
- SSH to the manager to verify that `methods/taddy/` exists.
- SSH to the manager to verify that `datasets/uci/` exists.
- Create the experiment directory: `experiments/exp__taddy__uci__20260309_143000/`.
3. Image Build (if --build)
status.json <- {"status": "building"}
docker build --network=host \
-f /shared/methods/taddy/Dockerfile \
-t taddy:latest \
-t MANAGER_IP:5000/taddy:latest \
/shared/
docker push MANAGER_IP:5000/taddy:latest
The build context is the entire shared directory, giving the Dockerfile access to libs/ for installing graflag_runner and other shared libraries.
Build output is saved to experiments/exp__*/build.log.
4. Service Deployment
Using the Docker SDK via SSH tunnel, the equivalent of:
docker service create --quiet -d \
--name exp__taddy__uci__20260309_143000 \
--restart-condition none \
--network host \
--generic-resource NVIDIA-GPU=0 \
--env METHOD_NAME=taddy \
--env DATA=/shared/datasets/uci/ \
--env EXP=/shared/experiments/exp__taddy__uci__20260309_143000/ \
--env _MAX_EPOCH=100 \
--mount type=bind,source=/shared,target=/shared \
MANAGER_IP:5000/taddy:latest
is performed via client.services.create() with ServiceMode, RestartPolicy, Resources, Mount, and env parameters. The method’s .env is read and merged into the env list (rather than using --env-file), giving the orchestrator full control over parameter precedence.
Service configuration is saved to service_config.json before deployment. Service details (including assigned worker node) are saved to service_details.json after creation.
5. Method Execution (inside container)
The container’s CMD invokes graflag_runner:
python3 -m graflag_runner --pass-env-args
`MethodRunner.from_env()` reads environment variables and:

- Extracts `_`-prefixed env vars and converts them to CLI arguments (e.g., `_MAX_EPOCH=100` becomes `--max_epoch 100`; see the sketch after this list).
- Validates dataset compatibility if `SUPPORTED_DATASETS` is defined.
- Writes `status.json` with status `running`.
- Starts `ResourceMonitor` in a background thread (tracks CPU, memory, GPU usage via `psutil` and `nvidia-smi`).
- Executes the method command via subprocess with real-time output capture.
- On completion, writes a final `status.json` with status `completed` or `failed`, execution time, and resource summary.
- Saves captured output to `method_output.txt`.
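A sketch of the underscore-to-flag conversion (the helper name is an assumption; the mapping rule is the one stated above):

```python
import os
import shlex

def env_to_cli_args(environ=os.environ) -> list:
    """Turn _MAX_EPOCH=100 style variables into ['--max_epoch', '100']."""
    args = []
    for key, value in environ.items():
        if key.startswith("_"):
            args += [f"--{key[1:].lower()}", value]
    return args

# COMMAND comes from the method's .env; the runner appends the converted args.
command = shlex.split(os.environ["COMMAND"]) + env_to_cli_args()
```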
The method code itself uses ResultWriter to produce results.json:
from graflag_runner import ResultWriter
writer = ResultWriter()
# ... run detection algorithm ...
writer.save_scores(result_type="EDGE_STREAM_ANOMALY_SCORES", scores=scores, ground_truth=gt)
writer.add_metadata(method_name="taddy", dataset="uci")
writer.add_resource_metrics(exec_time_ms=45230.15, peak_memory_mb=2048.5, peak_gpu_mb=4096.0)
writer.finalize() # Writes results.json
6. Evaluation (optional, triggered separately)
graflag evaluate -e exp__taddy__uci__20260309_143000
This deploys a separate graflag-evaluator Docker service that:
- Loads `results.json` from the experiment directory.
- Computes metrics (AUC-ROC, AUC-PR, etc.) via `MetricCalculator`.
- Generates plots (ROC curve, PR curve, score distribution, spot curves) via `PlotGenerator`.
- Saves `evaluation.json` and plot PNGs to the `eval/` subdirectory.
Data Flow
Environment Variable Injection
Parameters flow through three layers:
Method .env file CLI --params Docker Service Env
+------------------+ +------------------+ +------------------+
| METHOD_NAME=taddy| | MAX_EPOCH=100 | -> | METHOD_NAME=taddy|
| COMMAND=python...| | LEARNING_RATE=.01| | COMMAND=python...|
| _BATCH_SIZE=128 | +------------------+ | DATA=/shared/... |
| _MAX_EPOCH=50 | | EXP=/shared/... |
+------------------+ | _BATCH_SIZE=128 |
| _MAX_EPOCH=100 | (overridden)
| _LEARNING_RATE=.01| (added)
+------------------+
1. The method's `.env` file defines metadata (`METHOD_NAME`, `COMMAND`, `DESCRIPTION`) and default parameters (underscore-prefixed: `_BATCH_SIZE=128`).
2. The method's `.env` is read by `DockerManager._build_service_env()` and parsed into a dict.
3. Reserved variables (`DATA`, `EXP`, `METHOD_NAME`, `COMMAND`, `MONITOR_INTERVAL`) are set by the orchestrator and cannot be overridden by `--params`.
4. User `--params` are merged into the dict as `_KEY=VALUE`, overriding any defaults from the `.env` file (see the precedence sketch after this list).
5. The merged dict is passed as the `env` parameter to `client.services.create()`.
6. Inside the container, `graflag_runner` with `--pass-env-args` extracts `_`-prefixed env vars and converts them to CLI arguments: `_MAX_EPOCH=100` becomes `--max_epoch 100`.
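A minimal sketch of that precedence, using the values from the diagram (variable names are illustrative):

```python
# 1-2. Parsed from the method's .env file.
method_env = {"METHOD_NAME": "taddy", "COMMAND": "python3 train_graflag.py",
              "_BATCH_SIZE": "128", "_MAX_EPOCH": "50"}

# 3. Set by the orchestrator; not overridable.
reserved = {"DATA": "/shared/datasets/uci/",
            "EXP": "/shared/experiments/exp__taddy__uci__20260309_143000/"}

# 4. From --params; prefixed and merged last, so they win.
user_params = {"MAX_EPOCH": "100", "LEARNING_RATE": "0.01"}

env = {**method_env, **reserved}
env.update({f"_{k}": v for k, v in user_params.items()})  # overrides _MAX_EPOCH

# 5. Passed to client.services.create() as a list of KEY=VALUE strings.
service_env = [f"{k}={v}" for k, v in env.items()]
```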
Result Standardization
All methods must produce a results.json via ResultWriter with this schema:
{
"result_type": "EDGE_STREAM_ANOMALY_SCORES",
"scores": [0.1, 0.9, 0.3],
"ground_truth": [0, 1, 0],
"metadata": {
"method_name": "taddy",
"dataset": "uci"
}
}
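To show the schema in use, here is a sketch that loads a results.json and computes two of the built-in metrics with scikit-learn (an illustration, not the actual MetricCalculator implementation):

```python
import json

from sklearn.metrics import average_precision_score, roc_auc_score

with open("results.json") as f:
    results = json.load(f)

scores = results["scores"]              # one anomaly score per item
ground_truth = results["ground_truth"]  # binary labels, same length

print("AUC-ROC:", roc_auc_score(ground_truth, scores))
print("AUC-PR: ", average_precision_score(ground_truth, scores))
```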
Valid result types combine a category with a granularity:

| Category | Node | Edge | Graph |
|---|---|---|---|
| Static | … | … | … |
| Temporal | … | … | … |
| Streaming | … | `EDGE_STREAM_ANOMALY_SCORES` | … |

Optional additional fields: `timestamps`, `node_ids`, `edges`, `graph_ids`.
Evaluation Pipeline
plugins/*.py -------+
custom_metrics/*.py-+-> MetricCalculator (load_plugins)
                               |
results.json --> Evaluator -+-> MetricCalculator --> evaluation.json
                            +-> PlotGenerator --> roc_curve.png
                                              --> pr_curve.png
                                              --> score_distribution.png
*.csv (spot files) --> PlotGenerator --> spot_curves.png (per metric group)
On initialization the Evaluator loads custom metric plugins from the global plugins/ directory and the experiment’s custom_metrics/ directory. Each plugin registers additional metric functions that are then included in the evaluation alongside the built-in metrics (AUC-ROC, AUC-PR, etc.). The evaluator flattens scores and ground truth arrays, handles ragged temporal data, and computes all registered metrics.
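A sketch of what such a plugin file might look like (the convention of registering every public callable is an assumption; only the plugin directories are specified above):

```python
# custom_metrics/precision_at_100.py -- hypothetical per-experiment plugin
import numpy as np

def precision_at_100(scores, ground_truth):
    """Precision among the 100 highest-scoring items."""
    scores = np.asarray(scores)
    ground_truth = np.asarray(ground_truth)
    top_k = np.argsort(scores)[::-1][:100]  # indices of the top-100 scores
    return float(ground_truth[top_k].mean())
```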
Experiment Lifecycle
Each experiment transitions through the following states, tracked in status.json:
+----------+
| building | (image build in progress)
+----+-----+
|
v
+---------+
+--------->| running | (service deployed, method executing)
| +----+----+
| |
| +---------+---------+
| | |
| v v
| +-----------+ +--------+
| | completed | | failed |
| +-----------+ +--------+
|
| (external stop)
| +----------+
+---->| stopped |
+----------+
State determination logic (in `core.py`, sketched as code after this list):

- `status.json` is the source of truth when present.
- Terminal states (`completed`, `failed`) from `status.json` are definitive.
- If `status.json` says `running` but no Docker service exists, the status is `stopped`.
- If `status.json` says `building` but no service exists and `build.log` is present, the status is `failed`.
- If no `status.json` exists but the Docker service is present, the status is `running`.
- Legacy experiments without `status.json` that have `results.json` are assumed `completed`.
- All other cases are `unknown`.
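A sketch of those rules as code (the function shape is an assumption; the rules are exactly the ones listed):

```python
def determine_status(status_json, service_exists, has_build_log, has_results):
    if status_json is not None:
        status = status_json.get("status")
        if status in ("completed", "failed"):  # terminal states are definitive
            return status
        if status == "running" and not service_exists:
            return "stopped"  # service is gone: externally stopped
        if status == "building" and not service_exists and has_build_log:
            return "failed"   # build produced a log but never deployed
        return status
    if service_exists:
        return "running"      # no status.json yet, but the service is live
    if has_results:
        return "completed"    # legacy experiment without status tracking
    return "unknown"
```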
`status.json` is written at multiple points:

- By `core.py` when the build starts (`building`) and if the build fails.
- By `graflag_runner` when execution starts (`running`), and when it finishes (`completed` or `failed`), including execution time, resource summary, and exit code.
Docker Swarm Service Model
Each experiment runs as a Docker Swarm service with specific configuration:
Service Creation
Services are created via the Docker SDK (client.services.create()):
from docker.types import Mount, ServiceMode, RestartPolicy, Resources

client.services.create(
    image="REGISTRY:5000/method:tag",
    name="exp__method__dataset__ts",
    env=["METHOD_NAME=...", "DATA=...", "EXP=...", "_PARAM=VALUE", ...],
    mounts=[Mount(target="/shared", source="/shared", type="bind")],
    mode=ServiceMode("replicated", replicas=1),
    restart_policy=RestartPolicy(condition="none"),
    resources=Resources(generic_resources=[...]),  # GPU if enabled
    networks=["host"],
)
Key Design Decisions
- `--restart-condition none`: Experiments are one-shot tasks. If the method crashes, the service transitions to `Complete` (with a non-zero exit code) rather than restarting. The status is tracked in `status.json`.
- `--network host`: Containers share the host's network stack. This provides DNS resolution and internet access (for methods that download models or data).
- `--generic-resource NVIDIA-GPU=0`: Reserves a GPU via Docker's generic resource model. The `0` is the GPU index. This requires the NVIDIA runtime and GPU resource advertisement on worker nodes.
- Bind mount: The shared directory is bind-mounted at the same path inside the container, so all paths (`DATA`, `EXP`) are consistent between host and container.
Local Registry
A registry:2 service runs on the manager node (port 5000), constrained to node.role==manager. All method images are pushed here after building, allowing any worker node to pull them during service deployment.
Evaluation Services
Evaluation runs as a separate service named eval__exp__method__dataset__ts. It uses the graflag-evaluator image, mounts shared storage at /shared, and receives the experiment path as a command argument. The evaluator service is removed after completion.
Configuration
Client Configuration
Run graflag setup to interactively configure and store settings in ~/.config/graflag/config.env:
# Required
MANAGER_IP=192.168.100.10
# Optional (with defaults)
SSH_PORT=22
SSH_KEY=~/.ssh/id_ed25519
SHARED_DIR=/shared
HOSTS_FILE=hosts.yml
Config resolution order: --config flag > .env in working directory > ~/.config/graflag/config.env.
Cluster Configuration (hosts.yml)
Used by DockerManager for swarm setup and by graflag devcluster for virtual cluster deployment. The path to this file is stored in the client configuration (HOSTS_FILE).
subnet: 192.168.100.0/24
manager: 192.168.100.10
workers:
- 192.168.100.11
- 192.168.100.12
- 192.168.100.13
- 192.168.100.14
Method Configuration (.env)
Each method has a .env file in its directory:
# Metadata (read by orchestrator)
METHOD_NAME=taddy
DESCRIPTION=Temporal anomaly detection on dynamic graphs
SOURCE_CODE=https://github.com/example/taddy
COMMAND=python3 train_graflag.py
SUPPORTED_DATASETS=uci,btc_alpha,btc_otc
# Default parameters (underscore prefix, overridable via --params)
_MAX_EPOCH=200
_BATCH_SIZE=128
_LEARNING_RATE=0.001
_HIDDEN_DIM=64
Experiment Output Structure
experiments/exp__taddy__uci__20260309_143000/
+-- status.json # Execution status (building/running/completed/failed)
+-- service_config.json # Full service configuration (reproducible)
+-- service_details.json # Docker service metadata (node, task ID, etc.)
+-- build.log # Docker image build output (if --build was used)
+-- method_output.txt # Captured stdout/stderr from method execution
+-- results.json # Standardized method output (scores, ground_truth)
+-- training.csv # Training metrics from writer.spot("training", ...)
+-- validation.csv # Validation metrics from writer.spot("validation", ...)
+-- resources.csv # Resource usage from ResourceMonitor.spot("resources", ...)
+-- custom_metrics/ # Per-experiment custom metric plugins (optional)
+-- eval/                       # Evaluation output (created by graflag evaluate)
    +-- evaluation.json         # Computed metrics and plot references
    +-- roc_curve.png           # ROC curve plot
    +-- pr_curve.png            # Precision-Recall curve plot
    +-- score_distribution.png  # Score histogram by class
    +-- spot_curves.png         # Spot metric curves (if spot files exist)
The service_config.json file contains all information needed to reproduce an experiment:
{
"experiment_name": "exp__taddy__uci__20260309_143000",
"method_name": "taddy",
"dataset": "uci",
"tag": "latest",
"gpu_required": true,
"registry_image": "192.168.100.10:5000/taddy:latest",
"manager_ip": "192.168.100.10",
"timestamp": "2026-03-09T14:30:00.000000",
"data_path": "/shared/datasets/uci/",
"exp_path": "/shared/experiments/exp__taddy__uci__20260309_143000/",
"env_contents": {
"METHOD_NAME": "taddy",
"COMMAND": "python3 train_graflag.py",
"_MAX_EPOCH": "100",
"_BATCH_SIZE": "128"
}
}
To replay an experiment:
graflag run --from-config ./experiments/exp__taddy__uci__20260309_143000/service_config.json