Graph Anomaly Detection Result Types
This document defines the standardized result types used in GraFlag for graph anomaly detection methods.
Static Graph Results
NODE_ANOMALY_SCORES
Description: Anomaly scores for each individual node in a static graph.
Format: Array of numerical scores, one per node
{
"result_type": "NODE_ANOMALY_SCORES",
"scores": [0.12, 0.89, 0.03, 0.76, 0.15],
"ground_truth": [0, 1, 0, 1, 0],
"node_ids": [0, 1, 2, 3, 4]
}
EDGE_ANOMALY_SCORES
Description: Anomaly scores for each individual edge in a static graph.
Format: Array of numerical scores, one per edge
{
"result_type": "EDGE_ANOMALY_SCORES",
"scores": [0.05, 0.92, 0.17, 0.68],
"ground_truth": [0, 1, 0, 1],
"edges": [[0,1], [1,2], [2,3], [0,3]]
}
GRAPH_ANOMALY_SCORES
Description: Anomaly scores for entire graphs in a graph classification setting.
Format: Array of numerical scores, one per graph
{
"result_type": "GRAPH_ANOMALY_SCORES",
"scores": [0.23, 0.87, 0.11, 0.94],
"ground_truth": [0, 1, 0, 1],
"graph_ids": ["graph_001", "graph_002", "graph_003", "graph_004"]
}
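The three static result types above share one layout: a flat `scores` list aligned index by index with an identifier list. A minimal sketch of reading such a result is shown here; `top_k` and `ID_KEYS` are hypothetical names used for illustration and are not part of GraFlag.

```python
# Hypothetical helper (not part of GraFlag): rank the entities of a static result.
ID_KEYS = ("node_ids", "edges", "graph_ids")  # identifier fields used in the examples


def top_k(result, k):
    """Return the k highest-scoring (identifier, score) pairs."""
    scores = result["scores"]
    # Fall back to positional indices when no identifier field is present.
    ids = next((result[key] for key in ID_KEYS if key in result),
               list(range(len(scores))))
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


node_result = {
    "result_type": "NODE_ANOMALY_SCORES",
    "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
    "ground_truth": [0, 1, 0, 1, 0],
    "node_ids": [0, 1, 2, 3, 4],
}
print(top_k(node_result, 2))  # [(1, 0.89), (3, 0.76)]
```

Because only the identifier key differs between the three types, the same function works for edge and graph results.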
Temporal/Dynamic Graph Results
TEMPORAL_NODE_ANOMALY_SCORES
Description: Time-series of anomaly scores for nodes as the graph evolves over time.
Format: 2D array where each row represents a time step, each column a node
{
"result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
"scores": [
[0.12, 0.25, 0.08, 0.19],
[0.15, 0.89, 0.12, 0.22],
[0.18, 0.93, 0.15, 0.25]
],
"ground_truth": [
[0, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]
],
"timestamps": [0, 1, 2],
"node_ids": [0, 1, 2, 3]
}
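In this 2D layout, `scores[t][n]` is the score of `node_ids[n]` at `timestamps[t]`. The sketch below illustrates that indexing by finding the first time step at which each node reaches a threshold. The function name `first_alert_times` and the threshold value are assumptions made for illustration only.

```python
def first_alert_times(result, threshold):
    """Map node id -> first timestamp whose score reaches the threshold."""
    alerts = {}
    # Each row of the 2D array belongs to one time step.
    for t, row in zip(result["timestamps"], result["scores"]):
        # Each column of a row belongs to one node.
        for node_id, score in zip(result["node_ids"], row):
            if score >= threshold and node_id not in alerts:
                alerts[node_id] = t
    return alerts


temporal_result = {
    "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
    "scores": [
        [0.12, 0.25, 0.08, 0.19],
        [0.15, 0.89, 0.12, 0.22],
        [0.18, 0.93, 0.15, 0.25],
    ],
    "timestamps": [0, 1, 2],
    "node_ids": [0, 1, 2, 3],
}
print(first_alert_times(temporal_result, 0.5))  # {1: 1}
```

With a positive threshold, negative special score values are never reported as alerts.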
TEMPORAL_EDGE_ANOMALY_SCORES
Description: Time-series of anomaly scores for edges as new connections appear/disappear.
Format: 2D array where each row represents a time step, each column an edge
{
"result_type": "TEMPORAL_EDGE_ANOMALY_SCORES",
"scores": [
[0.05, 0.32, 0.17],
[0.08, 0.91, 0.20],
[0.12, 0.95, 0.23]
],
"ground_truth": [
[0, 0, 0],
[0, 1, 0],
[0, 1, 0]
],
"timestamps": [0, 1, 2],
"edges": [[0,1], [1,2], [2,3]]
}
TEMPORAL_GRAPH_ANOMALY_SCORES
Description: Time-series of anomaly scores for entire graphs in a dynamic setting.
Format: 2D array where each row represents a time iteration, each column a graph
{
"result_type": "TEMPORAL_GRAPH_ANOMALY_SCORES",
"scores": [
[0.086, 0.056, 0.062, 0.044],
[0.089, 0.061, 0.065, 0.047],
[0.092, 0.358, 0.068, 0.050]
],
"ground_truth": [
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 1, 0, 0]
],
"iterations": [1, 2, 3],
"graph_ids": [478, 338, 337, 318]
}
Streaming Graph Results
NODE_STREAM_ANOMALY_SCORES
Description: Anomaly scores for nodes in a streaming setting where each node activity appears at a specific timestamp. Used for node stream anomaly detection where node events arrive sequentially over time.
Format: 1D array of scores with corresponding node IDs and timestamps (one score per node occurrence)
{
"result_type": "NODE_STREAM_ANOMALY_SCORES",
"scores": [0.12, 0.89, 0.15, 0.76, 0.23],
"ground_truth": [0, 1, 0, 1, 0],
"node_ids": [0, 1, 0, 2, 1],
"timestamps": [0, 0, 1, 1, 2]
}
Notes:
Each index corresponds to one node activity occurrence in the stream
scores[i] is the anomaly score for node node_ids[i] at time timestamps[i]
The same node can appear multiple times at different timestamps
More memory-efficient than 2D format when node activities are sparse over time
Comparison with TEMPORAL_NODE_ANOMALY_SCORES:
| Feature | TEMPORAL_NODE_ANOMALY_SCORES | NODE_STREAM_ANOMALY_SCORES |
|---|---|---|
| Format | 2D array | 1D array |
| Use Case | Fixed node set, scores over time | Streaming node events |
| Memory | One score per (time step, node) pair | One score per node occurrence |
| Inactive Nodes | Use -2 | Not represented |
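The relationship between the two layouts can be sketched as a conversion from the 2D form to the 1D stream form. This is an illustrative sketch only: `temporal_to_stream` is a hypothetical helper, not a GraFlag function, and it assumes inactive entries are marked with the score -2 described under Special Score Values.

```python
INACTIVE = -2  # assumed marker for inactive entries in the 2D layout


def temporal_to_stream(result):
    """Flatten a 2D temporal node result into the 1D stream layout."""
    stream = {
        "result_type": "NODE_STREAM_ANOMALY_SCORES",
        "scores": [], "ground_truth": [], "node_ids": [], "timestamps": [],
    }
    rows = zip(result["timestamps"], result["scores"], result["ground_truth"])
    for t, score_row, label_row in rows:
        for node_id, score, label in zip(result["node_ids"], score_row, label_row):
            if score == INACTIVE:
                continue  # inactive nodes are simply not represented
            stream["scores"].append(score)
            stream["ground_truth"].append(label)
            stream["node_ids"].append(node_id)
            stream["timestamps"].append(t)
    return stream


dense = {
    "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
    "scores": [[0.12, -2], [0.15, 0.89]],
    "ground_truth": [[0, 0], [0, 1]],
    "timestamps": [0, 1],
    "node_ids": [0, 1],
}
stream = temporal_to_stream(dense)
print(stream["scores"])      # [0.12, 0.15, 0.89]
print(stream["node_ids"])    # [0, 0, 1]
print(stream["timestamps"])  # [0, 1, 1]
```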
EDGE_STREAM_ANOMALY_SCORES
Description: Anomaly scores for edges in a streaming setting where each edge appears at a specific timestamp. Used for edge stream anomaly detection where edges arrive sequentially over time.
Format: 1D array of scores with corresponding edge pairs and timestamps (one score per edge occurrence)
{
"result_type": "EDGE_STREAM_ANOMALY_SCORES",
"scores": [0.05, 0.91, 0.23, 0.68, 0.12],
"ground_truth": [0, 1, 0, 1, 0],
"edges": [[0,1], [1,2], [2,3], [0,3], [1,3]],
"timestamps": [0, 0, 1, 1, 2]
}
Notes:
Each index corresponds to one edge occurrence in the stream
scores[i] is the anomaly score for edge edges[i] at time timestamps[i]
The same edge can appear multiple times at different timestamps
More memory-efficient than 2D format when edges are sparse over time
Comparison with TEMPORAL_EDGE_ANOMALY_SCORES:
| Feature | TEMPORAL_EDGE_ANOMALY_SCORES | EDGE_STREAM_ANOMALY_SCORES |
|---|---|---|
| Format | 2D array | 1D array |
| Use Case | Fixed edge set, scores over time | Streaming edges |
| Memory | One score per (time step, edge) pair | One score per edge occurrence |
| Inactive Edges | Use -2 | Not represented |
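Since the same edge may occur several times in a stream, a consumer sometimes needs one value per distinct edge. The sketch below keeps the maximum score seen for each edge. The helper `max_score_per_edge` is hypothetical, and it treats `[0, 1]` and `[1, 0]` as different edges, which is an assumption about edge direction that the format itself does not state.

```python
def max_score_per_edge(result):
    """Return {edge tuple: highest score observed for that edge}."""
    best = {}
    for edge, score in zip(result["edges"], result["scores"]):
        key = tuple(edge)  # lists are not hashable, tuples are
        if key not in best or score > best[key]:
            best[key] = score
    return best


edge_stream = {
    "result_type": "EDGE_STREAM_ANOMALY_SCORES",
    "scores": [0.05, 0.91, 0.23, 0.68],
    "ground_truth": [0, 1, 0, 1],
    "edges": [[0, 1], [1, 2], [0, 1], [1, 2]],
    "timestamps": [0, 0, 1, 1],
}
print(max_score_per_edge(edge_stream))  # {(0, 1): 0.23, (1, 2): 0.91}
```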
GRAPH_STREAM_ANOMALY_SCORES
Description: Anomaly scores for graphs in a streaming setting where each graph snapshot appears at a specific timestamp.
Format: 1D array of scores with corresponding graph IDs and timestamps (one score per graph occurrence)
{
"result_type": "GRAPH_STREAM_ANOMALY_SCORES",
"scores": [0.23, 0.87, 0.11, 0.94, 0.35],
"ground_truth": [0, 1, 0, 1, 0],
"graph_ids": ["graph_001", "graph_002", "graph_001", "graph_003", "graph_002"],
"timestamps": [0, 0, 1, 1, 2]
}
Notes:
Each index corresponds to one graph occurrence in the stream
scores[i] is the anomaly score for graph graph_ids[i] at time timestamps[i]
The same graph can appear multiple times at different timestamps
More memory-efficient than 2D format when graph events are sparse over time
Comparison with TEMPORAL_GRAPH_ANOMALY_SCORES:
| Feature | TEMPORAL_GRAPH_ANOMALY_SCORES | GRAPH_STREAM_ANOMALY_SCORES |
|---|---|---|
| Format | 2D array | 1D array |
| Use Case | Fixed graph set, scores over time | Streaming graph snapshots |
| Memory | One score per (time step, graph) pair | One score per graph occurrence |
| Inactive Graphs | Use -2 | Not represented |
Special Score Values
-1: Unknown/unassigned
-2: Inactive/unseen at this time step (temporal/streaming types only)
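Because -1 and -2 are markers rather than real scores, code that aggregates scores will usually want to exclude them first. The following is a minimal sketch of such filtering. `drop_special` and `flatten` are hypothetical helpers, and whether the GraFlag evaluator filters in this way is not stated in this document.

```python
SPECIAL_VALUES = {-1, -2}  # unknown/unassigned and inactive/unseen


def flatten(values):
    """Yield the leaves of a flat or nested list in row-major order."""
    for value in values:
        if isinstance(value, list):
            yield from flatten(value)
        else:
            yield value


def drop_special(scores, ground_truth):
    """Return flat (scores, labels) with special-valued entries removed."""
    pairs = [(s, y) for s, y in zip(flatten(scores), flatten(ground_truth))
             if s not in SPECIAL_VALUES]
    return [s for s, _ in pairs], [y for _, y in pairs]


kept_scores, kept_labels = drop_special([[0.1, -2], [-1, 0.9]], [[0, 0], [0, 1]])
print(kept_scores)  # [0.1, 0.9]
print(kept_labels)  # [0, 1]
```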
Writing Results with ResultWriter
Methods should use ResultWriter from graflag_runner to produce results.json:
from graflag_runner import ResultWriter

writer = ResultWriter()

# anomaly_scores and labels are array-like outputs of your method
# that provide .tolist() (e.g. NumPy arrays)

# Save scores with ground truth
writer.save_scores(
    result_type="NODE_ANOMALY_SCORES",
    scores=anomaly_scores.tolist(),
    ground_truth=labels.tolist(),
    node_ids=list(range(len(anomaly_scores)))  # optional
)

# Add metadata
writer.add_metadata(
    method_name="your_method",
    dataset="cora"
)

# Add resource metrics (optional, also set automatically by graflag_runner)
writer.add_resource_metrics(
    exec_time_ms=12345.67,
    peak_memory_mb=512.3,
    peak_gpu_mb=2048.0  # optional
)

# Finalize (writes results.json to EXP directory)
writer.finalize()
The results.json file is saved to the experiment directory (set by the EXP environment variable). graflag evaluate reads it to compute metrics such as AUC-ROC and AUC-PR and to generate plots.
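To make the role of `scores` and `ground_truth` in evaluation concrete, here is a small pure-Python sketch that computes AUC-ROC by pairwise comparison. It is not the implementation used by graflag evaluate, which is not shown in this document. The JSON string stands in for a results.json file, and it assumes the fields appear at the top level as in the examples above.

```python
import json


def auc_roc(scores, labels):
    """Probability that a random anomaly outscores a random normal item."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one anomaly and one normal item")
    # A win counts 1, a tie counts 0.5, a loss counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


raw = """{
  "result_type": "NODE_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
  "ground_truth": [0, 1, 0, 1, 0]
}"""
result = json.loads(raw)
auc = auc_roc(result["scores"], result["ground_truth"])
print(auc)  # 1.0, every anomaly outscores every normal node
```

The pairwise form is quadratic in the number of items, so it is suited to illustration rather than large result files.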
Required Fields
| Field | Description |
|---|---|
| result_type | One of the valid result types listed above |
| scores | Anomaly scores (list or nested list) |
| ground_truth | Binary labels matching scores shape (0=normal, 1=anomaly) |
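The requirements in this table can be checked mechanically before a result is written. The sketch below is one possible check; `validate`, `shape` and `flatten` are hypothetical helpers rather than GraFlag functions. It compares only the outer dimensions of nested lists, so ragged rows would go unnoticed.

```python
REQUIRED_FIELDS = ("result_type", "scores", "ground_truth")


def shape(value):
    """Nested-list shape, e.g. [3, 4] for 3 rows of 4 scores (first row only)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return dims


def flatten(value):
    """Yield the leaves of a flat or nested list."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def validate(result):
    """Return a list of problems; an empty list means the checks passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in result]
    if errors:
        return errors
    if shape(result["scores"]) != shape(result["ground_truth"]):
        errors.append("ground_truth shape does not match scores shape")
    if any(label not in (0, 1) for label in flatten(result["ground_truth"])):
        errors.append("ground_truth must contain only 0 and 1")
    return errors


good = {"result_type": "NODE_ANOMALY_SCORES",
        "scores": [0.12, 0.89], "ground_truth": [0, 1]}
bad = {"result_type": "NODE_ANOMALY_SCORES",
       "scores": [0.12, 0.89], "ground_truth": [0]}
print(validate(good))  # []
print(validate(bad))   # ['ground_truth shape does not match scores shape']
```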
Optional Fields
| Field | Description |
|---|---|
| metadata | Dict with method_name, dataset, hyperparameters, etc. |
| timestamps | Time indices for temporal/streaming types |
| node_ids | Node identifiers |
| edges | Edge pairs as two-element lists, e.g. [[0,1], [1,2]] |
| graph_ids | Graph identifiers |
Custom Metrics
The evaluator computes built-in metrics (AUC-ROC, AUC-PR, Precision@K, Recall@K, F1@K, Best F1) for all result types. You can extend this with custom metrics via plugins.
Plugin Files
Place a .py file in one of two locations:
Global (all evaluations): libs/graflag_evaluator/plugins/
Per-experiment: experiments/<exp_name>/custom_metrics/
Each plugin imports MetricCalculator and registers a metric function at module level:
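# hits_at_100.py
from graflag_evaluator import MetricCalculator


def hits_at_100(scores, ground_truth, **kw):
    # Fraction of true anomalies among the 100 highest-scoring items
    # (scores and ground_truth are used as NumPy arrays here)
    top = scores.argsort()[-100:]
    return {"hits@100": float(ground_truth[top].mean())}


MetricCalculator.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)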
The function must accept (scores, ground_truth, **kwargs) and return a Dict[str, float].
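As a further illustration of this contract, the sketch below defines a second metric that works on plain lists as well as arrays. The metric `score_gap` is a made-up example, not a built-in. The registration call is left as a comment so that the snippet runs without GraFlag installed.

```python
def score_gap(scores, ground_truth, **kwargs):
    """Mean anomaly score minus mean normal score, as a Dict[str, float]."""
    anomalous = [float(s) for s, y in zip(scores, ground_truth) if y == 1]
    normal = [float(s) for s, y in zip(scores, ground_truth) if y == 0]
    if not anomalous or not normal:
        # Undefined when one class is missing; report NaN instead of failing.
        return {"score_gap": float("nan")}
    gap = sum(anomalous) / len(anomalous) - sum(normal) / len(normal)
    return {"score_gap": gap}


# In a plugin file, registration would follow the pattern shown above:
# MetricCalculator.register_metric("NODE_ANOMALY_SCORES", score_gap)

metrics = score_gap([0.12, 0.89, 0.03, 0.76, 0.15], [0, 1, 0, 1, 0])
print(metrics)  # score_gap is approximately 0.725
```

Accepting `**kwargs` keeps the function compatible with the required signature even though this metric uses none of the extra arguments.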
Python API
The GraFlag.register_metric() method extracts a function’s source code and writes it as a plugin file on the cluster:
from graflag import GraFlag


def hits_at_100(scores, ground_truth, **kw):
    # Fraction of true anomalies among the 100 highest-scoring items
    top = scores.argsort()[-100:]
    return {"hits@100": float(ground_truth[top].mean())}


gf = GraFlag()
gf.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
gf.evaluate("exp__taddy__uci__20260309_120000")  # includes hits@100
Pass experiment="exp_name" to scope the metric to a single experiment instead of globally.