Graph Anomaly Detection Result Types

This document defines the standardized result types used in GraFlag for graph anomaly detection methods.

Static Graph Results

NODE_ANOMALY_SCORES

Description: Anomaly scores for each individual node in a static graph.

Format: Array of numerical scores, one per node

{
  "result_type": "NODE_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 2, 3, 4]
}
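
Because scores and node_ids are parallel arrays, ranking nodes by score takes only a few lines. The sketch below is illustrative: top_k_nodes is a hypothetical helper, not part of GraFlag, and it assumes that a higher score means a more anomalous node.

```python
def top_k_nodes(result, k):
    """Return the k highest-scoring (node_id, score) pairs, best first."""
    # Fall back to positional indices when the optional node_ids field is absent.
    node_ids = result.get("node_ids", range(len(result["scores"])))
    ranked = sorted(zip(node_ids, result["scores"]),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


result = {
    "result_type": "NODE_ANOMALY_SCORES",
    "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
    "ground_truth": [0, 1, 0, 1, 0],
    "node_ids": [0, 1, 2, 3, 4],
}
print(top_k_nodes(result, 2))  # [(1, 0.89), (3, 0.76)]
```

The same pattern should carry over to EDGE_ANOMALY_SCORES and GRAPH_ANOMALY_SCORES by zipping scores with edges or graph_ids instead.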

EDGE_ANOMALY_SCORES

Description: Anomaly scores for each individual edge in a static graph.

Format: Array of numerical scores, one per edge

{
  "result_type": "EDGE_ANOMALY_SCORES",
  "scores": [0.05, 0.92, 0.17, 0.68],
  "ground_truth": [0, 1, 0, 1],
  "edges": [[0,1], [1,2], [2,3], [0,3]]
}

GRAPH_ANOMALY_SCORES

Description: Anomaly scores for entire graphs in a graph classification setting.

Format: Array of numerical scores, one per graph

{
  "result_type": "GRAPH_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94],
  "ground_truth": [0, 1, 0, 1],
  "graph_ids": ["graph_001", "graph_002", "graph_003", "graph_004"]
}

Temporal/Dynamic Graph Results

TEMPORAL_NODE_ANOMALY_SCORES

Description: Time-series of anomaly scores for nodes as the graph evolves over time.

Format: 2D array where each row represents a time step, each column a node

{
  "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
  "scores": [
    [0.12, 0.25, 0.08, 0.19],
    [0.15, 0.89, 0.12, 0.22],
    [0.18, 0.93, 0.15, 0.25]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0]
  ],
  "timestamps": [0, 1, 2],
  "node_ids": [0, 1, 2, 3]
}
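
In this layout, a node's score history is one column of the scores matrix. A minimal sketch of extracting that column follows; node_series is a hypothetical helper introduced for illustration, not part of GraFlag.

```python
def node_series(result, node_id):
    """Return [(timestamp, score), ...] for one node of a [T x N] result."""
    col = result["node_ids"].index(node_id)  # column that holds this node
    return [(t, row[col])
            for t, row in zip(result["timestamps"], result["scores"])]


result = {
    "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
    "scores": [
        [0.12, 0.25, 0.08, 0.19],
        [0.15, 0.89, 0.12, 0.22],
        [0.18, 0.93, 0.15, 0.25],
    ],
    "timestamps": [0, 1, 2],
    "node_ids": [0, 1, 2, 3],
}
print(node_series(result, 1))  # [(0, 0.25), (1, 0.89), (2, 0.93)]
```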

TEMPORAL_EDGE_ANOMALY_SCORES

Description: Time-series of anomaly scores for edges as connections appear and disappear over time.

Format: 2D array where each row represents a time step, each column an edge

{
  "result_type": "TEMPORAL_EDGE_ANOMALY_SCORES",
  "scores": [
    [0.05, 0.32, 0.17],
    [0.08, 0.91, 0.20],
    [0.12, 0.95, 0.23]
  ],
  "ground_truth": [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0]
  ],
  "timestamps": [0, 1, 2],
  "edges": [[0,1], [1,2], [2,3]]
}

TEMPORAL_GRAPH_ANOMALY_SCORES

Description: Time-series of anomaly scores for entire graphs in a dynamic setting.

Format: 2D array where each row represents an iteration (time step), each column a graph

{
  "result_type": "TEMPORAL_GRAPH_ANOMALY_SCORES",
  "scores": [
    [0.086, 0.056, 0.062, 0.044],
    [0.089, 0.061, 0.065, 0.047],
    [0.092, 0.358, 0.068, 0.050]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0]
  ],
  "iterations": [1, 2, 3],
  "graph_ids": [478, 338, 337, 318]
}

Streaming Graph Results

NODE_STREAM_ANOMALY_SCORES

Description: Anomaly scores for nodes in a streaming setting where each node activity appears at a specific timestamp. Used for node stream anomaly detection where node events arrive sequentially over time.

Format: 1D array of scores with corresponding node IDs and timestamps (one score per node occurrence)

{
  "result_type": "NODE_STREAM_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.15, 0.76, 0.23],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 0, 2, 1],
  "timestamps": [0, 0, 1, 1, 2]
}

Notes:

  • Each index corresponds to one node activity occurrence in the stream

  • scores[i] is the anomaly score for node node_ids[i] at time timestamps[i]

  • Same node can appear multiple times at different timestamps

  • More memory-efficient than 2D format when node activities are sparse over time

Comparison with TEMPORAL_NODE_ANOMALY_SCORES:

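| Feature | TEMPORAL_NODE_ANOMALY_SCORES | NODE_STREAM_ANOMALY_SCORES |
| --- | --- | --- |
| Format | 2D array [T x N] | 1D array [M] |
| Use Case | Fixed node set, scores over time | Streaming node events |
| Memory | O(T x N) - can be large | O(M) - efficient |
| Inactive Nodes | Use -2 for missing nodes | Not represented |

Here T is the number of time steps, N the number of nodes, and M the number of node occurrences in the stream.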
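
The relationship between the two formats can be sketched as a conversion from the stream layout to the dense [T x N] layout, with -2 filling the cells where a node has no activity. The helper stream_to_dense is hypothetical and not part of GraFlag; it converts scores only, and ground_truth would need the same treatment.

```python
INACTIVE = -2  # sentinel for a node with no activity at a time step


def stream_to_dense(result):
    """Convert a node stream result into (timestamps, node_ids, [T x N] scores)."""
    timestamps = sorted(set(result["timestamps"]))
    node_ids = sorted(set(result["node_ids"]))
    row_of = {t: i for i, t in enumerate(timestamps)}
    col_of = {n: j for j, n in enumerate(node_ids)}
    # Start with every cell marked inactive, then fill in observed occurrences.
    dense = [[INACTIVE] * len(node_ids) for _ in timestamps]
    for score, node, t in zip(result["scores"], result["node_ids"],
                              result["timestamps"]):
        dense[row_of[t]][col_of[node]] = score  # a repeated (node, t) keeps the last score
    return timestamps, node_ids, dense


stream = {
    "result_type": "NODE_STREAM_ANOMALY_SCORES",
    "scores": [0.12, 0.89, 0.15, 0.76, 0.23],
    "node_ids": [0, 1, 0, 2, 1],
    "timestamps": [0, 0, 1, 1, 2],
}
timestamps, node_ids, dense = stream_to_dense(stream)
print(dense)  # [[0.12, 0.89, -2], [0.15, -2, 0.76], [-2, 0.23, -2]]
```

In this small example the dense form stores 9 cells for 5 observations; the gap widens as activity becomes sparser.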


EDGE_STREAM_ANOMALY_SCORES

Description: Anomaly scores for edges in a streaming setting where each edge appears at a specific timestamp. Used for edge stream anomaly detection where edges arrive sequentially over time.

Format: 1D array of scores with corresponding edge pairs and timestamps (one score per edge occurrence)

{
  "result_type": "EDGE_STREAM_ANOMALY_SCORES",
  "scores": [0.05, 0.91, 0.23, 0.68, 0.12],
  "ground_truth": [0, 1, 0, 1, 0],
  "edges": [[0,1], [1,2], [2,3], [0,3], [1,3]],
  "timestamps": [0, 0, 1, 1, 2]
}

Notes:

  • Each index corresponds to one edge occurrence in the stream

  • scores[i] is the anomaly score for edge edges[i] at time timestamps[i]

  • Same edge can appear multiple times at different timestamps

  • More memory-efficient than 2D format when edges are sparse over time

Comparison with TEMPORAL_EDGE_ANOMALY_SCORES:

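| Feature | TEMPORAL_EDGE_ANOMALY_SCORES | EDGE_STREAM_ANOMALY_SCORES |
| --- | --- | --- |
| Format | 2D array [T x E] | 1D array [M] |
| Use Case | Fixed edge set, scores over time | Streaming edges |
| Memory | O(T x E) - can be large | O(M) - efficient |
| Inactive Edges | Use -2 for missing edges | Not represented |

Here T is the number of time steps, E the number of edges, and M the number of edge occurrences in the stream.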
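
Going the other way, a dense temporal result can be flattened into the stream layout by dropping the cells marked -2. This is a sketch under stated assumptions: dense_to_stream is a hypothetical helper, not part of GraFlag, and the input values are made up for illustration.

```python
INACTIVE = -2  # sentinel for an edge that is absent at a time step


def dense_to_stream(result):
    """Flatten a [T x E] edge result into stream form, skipping inactive cells."""
    stream = {"result_type": "EDGE_STREAM_ANOMALY_SCORES",
              "scores": [], "ground_truth": [], "edges": [], "timestamps": []}
    rows = zip(result["timestamps"], result["scores"], result["ground_truth"])
    for t, score_row, label_row in rows:
        for edge, score, label in zip(result["edges"], score_row, label_row):
            if score == INACTIVE:
                continue  # inactive edges are simply not represented in a stream
            stream["scores"].append(score)
            stream["ground_truth"].append(label)
            stream["edges"].append(edge)
            stream["timestamps"].append(t)
    return stream


dense = {  # illustrative data only
    "result_type": "TEMPORAL_EDGE_ANOMALY_SCORES",
    "scores": [[0.05, -2, 0.17], [0.08, 0.91, -2]],
    "ground_truth": [[0, 0, 0], [0, 1, 0]],
    "timestamps": [0, 1],
    "edges": [[0, 1], [1, 2], [2, 3]],
}
stream = dense_to_stream(dense)
print(stream["scores"])      # [0.05, 0.17, 0.08, 0.91]
print(stream["timestamps"])  # [0, 0, 1, 1]
```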


GRAPH_STREAM_ANOMALY_SCORES

Description: Anomaly scores for graphs in a streaming setting where each graph snapshot appears at a specific timestamp.

Format: 1D array of scores with corresponding graph IDs and timestamps (one score per graph occurrence)

{
  "result_type": "GRAPH_STREAM_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94, 0.35],
  "ground_truth": [0, 1, 0, 1, 0],
  "graph_ids": ["graph_001", "graph_002", "graph_001", "graph_003", "graph_002"],
  "timestamps": [0, 0, 1, 1, 2]
}

Notes:

  • Each index corresponds to one graph occurrence in the stream

  • scores[i] is the anomaly score for graph graph_ids[i] at time timestamps[i]

  • Same graph can appear multiple times at different timestamps

  • More memory-efficient than 2D format when graph events are sparse over time

Comparison with TEMPORAL_GRAPH_ANOMALY_SCORES:

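| Feature | TEMPORAL_GRAPH_ANOMALY_SCORES | GRAPH_STREAM_ANOMALY_SCORES |
| --- | --- | --- |
| Format | 2D array [T x G] | 1D array [M] |
| Use Case | Fixed graph set, scores over time | Streaming graph snapshots |
| Memory | O(T x G) - can be large | O(M) - efficient |
| Inactive Graphs | Use -2 for missing graphs | Not represented |

Here T is the number of time steps, G the number of graphs, and M the number of graph occurrences in the stream.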
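
Since the same graph can occur several times in a stream, a consumer may want to regroup the flat arrays into one time series per graph. A minimal sketch, using a hypothetical helper group_by_graph that is not part of GraFlag:

```python
from collections import defaultdict


def group_by_graph(result):
    """Map each graph ID to its [(timestamp, score), ...] occurrences, in stream order."""
    series = defaultdict(list)
    for gid, t, score in zip(result["graph_ids"], result["timestamps"],
                             result["scores"]):
        series[gid].append((t, score))
    return dict(series)


stream = {
    "result_type": "GRAPH_STREAM_ANOMALY_SCORES",
    "scores": [0.23, 0.87, 0.11, 0.94, 0.35],
    "graph_ids": ["graph_001", "graph_002", "graph_001", "graph_003", "graph_002"],
    "timestamps": [0, 0, 1, 1, 2],
}
grouped = group_by_graph(stream)
print(grouped["graph_002"])  # [(0, 0.87), (2, 0.35)]
```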


Special Score Values

  • -1: Unknown/unassigned

  • -2: Inactive/unseen at this time step (temporal/streaming types only)
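
Code that consumes scores will usually want to skip these sentinel entries before computing anything. The sketch below shows one way to do that for flat or nested score lists; valid_pairs and flatten are hypothetical helpers, and whether the GraFlag evaluator filters in exactly this way is not stated here.

```python
SENTINELS = (-1, -2)  # unknown/unassigned and inactive/unseen


def flatten(values):
    """Yield the leaves of a flat or nested list in row-major order."""
    for v in values:
        if isinstance(v, list):
            yield from flatten(v)
        else:
            yield v


def valid_pairs(scores, ground_truth):
    """Return (score, label) pairs whose score is not a sentinel value."""
    return [(s, y)
            for s, y in zip(flatten(scores), flatten(ground_truth))
            if s not in SENTINELS]


scores = [[0.12, -2, 0.08], [0.15, 0.89, -1]]  # illustrative data only
labels = [[0, 0, 0], [0, 1, 0]]
print(valid_pairs(scores, labels))
# [(0.12, 0), (0.08, 0), (0.15, 0), (0.89, 1)]
```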

Writing Results with ResultWriter

Methods should use ResultWriter from graflag_runner to produce results.json:

from graflag_runner import ResultWriter

writer = ResultWriter()

# Save scores with ground truth
writer.save_scores(
    result_type="NODE_ANOMALY_SCORES",
    scores=anomaly_scores.tolist(),
    ground_truth=labels.tolist(),
    node_ids=list(range(len(anomaly_scores)))  # optional
)

# Add metadata
writer.add_metadata(
    method_name="your_method",
    dataset="cora"
)

# Add resource metrics (optional, also set automatically by graflag_runner)
writer.add_resource_metrics(
    exec_time_ms=12345.67,
    peak_memory_mb=512.3,
    peak_gpu_mb=2048.0      # optional
)

# Finalize (writes results.json to EXP directory)
writer.finalize()

The results.json file is saved to the experiment directory (given by the EXP environment variable) and is read by graflag evaluate to compute metrics (such as AUC-ROC and AUC-PR) and generate plots.
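
To make the evaluation step concrete, here is a rough sketch of reading a results.json file and computing AUC-ROC from it with the standard library only. It assumes the file stores scores and ground_truth as top-level keys, mirroring the JSON examples in this document; load_results and auc_roc are hypothetical helpers and do not reproduce the evaluator's actual implementation.

```python
import json
import os
import tempfile


def load_results(exp_dir):
    """Read results.json from an experiment directory."""
    with open(os.path.join(exp_dir, "results.json")) as f:
        return json.load(f)


def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one anomaly and one normal sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Demo: a temporary directory stands in for the EXP directory.
with tempfile.TemporaryDirectory() as exp_dir:
    payload = {
        "result_type": "NODE_ANOMALY_SCORES",
        "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
        "ground_truth": [0, 1, 0, 1, 0],
    }
    with open(os.path.join(exp_dir, "results.json"), "w") as f:
        json.dump(payload, f)
    loaded = load_results(exp_dir)

auc = auc_roc(loaded["scores"], loaded["ground_truth"])
print(auc)  # 1.0, since both anomalies outrank every normal node
```

The pairwise loop is quadratic in the number of samples, which is fine for a small check but not for large result files.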

Required Fields

| Field | Description |
| --- | --- |
| result_type | One of the valid result types listed above |
| scores | Anomaly scores (list or nested list) |
| ground_truth | Binary labels matching scores shape (0=normal, 1=anomaly) |
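
The requirements above can be checked mechanically before a result is written. A minimal sketch follows; check_required and shape are hypothetical helpers, not part of GraFlag, and shape assumes nested lists are rectangular (it only inspects the first row).

```python
REQUIRED = ("result_type", "scores", "ground_truth")


def shape(values):
    """Return the shape of a flat or rectangular nested list, e.g. (3, 4)."""
    if not isinstance(values, list):
        return ()
    if not values:
        return (0,)
    return (len(values),) + shape(values[0])


def check_required(result):
    """Return a list of problems; an empty list means the required fields look consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED if name not in result]
    if problems:
        return problems
    if shape(result["scores"]) != shape(result["ground_truth"]):
        problems.append("ground_truth shape does not match scores shape")
    return problems


good = {
    "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
    "scores": [[0.12, 0.25], [0.15, 0.89]],
    "ground_truth": [[0, 0], [0, 1]],
}
bad = {"result_type": "NODE_ANOMALY_SCORES", "scores": [0.1, 0.9], "ground_truth": [0]}
print(check_required(good))  # []
print(check_required(bad))   # ['ground_truth shape does not match scores shape']
```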

Optional Fields

| Field | Description |
| --- | --- |
| metadata | Dict with method_name, dataset, hyperparameters, etc. |
| timestamps | Time indices for temporal/streaming types |
| node_ids | Node identifiers |
| edges | Edge pairs as [[src, dst], ...] |
| graph_ids | Graph identifiers |

Custom Metrics

The evaluator computes built-in metrics (AUC-ROC, AUC-PR, Precision@K, Recall@K, F1@K, Best F1) for all result types. You can extend this with custom metrics via plugins.

Plugin Files

Place a .py file in one of two locations:

  • Global (all evaluations): libs/graflag_evaluator/plugins/

  • Per-experiment: experiments/<exp_name>/custom_metrics/

Each plugin imports MetricCalculator and registers a metric function at module level:

# hits_at_100.py
from graflag_evaluator import MetricCalculator

def hits_at_100(scores, ground_truth, **kw):
    # Assumes NumPy arrays; argsort is ascending, so the last 100 indices are the top scores.
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

MetricCalculator.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)

The function must accept (scores, ground_truth, **kwargs) and return a Dict[str, float].
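
As a further illustration of this contract, the sketch below defines a metric that needs no third-party imports and can be tested on its own. The metric name score_gap is made up for this example, and the function assumes 1D inputs (as in the stream result types); registering it would follow the MetricCalculator.register_metric call shown in the plugin above.

```python
def score_gap(scores, ground_truth, **kwargs):
    """Mean score of anomalies minus mean score of normal samples."""
    anomalous = [float(s) for s, y in zip(scores, ground_truth) if y == 1]
    normal = [float(s) for s, y in zip(scores, ground_truth) if y == 0]
    if not anomalous or not normal:
        # The gap is undefined when one of the classes is empty.
        return {"score_gap": float("nan")}
    gap = sum(anomalous) / len(anomalous) - sum(normal) / len(normal)
    return {"score_gap": gap}


metrics = score_gap([0.05, 0.91, 0.23, 0.68, 0.12], [0, 1, 0, 1, 0])
print(metrics)  # a single-entry dict keyed by "score_gap"
```

Iterating with zip works for plain lists and for 1D arrays alike, so the function does not depend on how the evaluator passes its inputs.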

Python API

The GraFlag.register_metric() method extracts a function’s source code and writes it as a plugin file on the cluster:

from graflag import GraFlag

def hits_at_100(scores, ground_truth, **kw):
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

gf = GraFlag()
gf.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
gf.evaluate("exp__taddy__uci__20260309_120000")  # includes hits@100

Pass experiment="exp_name" to scope the metric to a single experiment instead of registering it globally.