# Graph Anomaly Detection Result Types

This document defines the standardized result types used in GraFlag for graph anomaly detection methods.

## Static Graph Results

### NODE_ANOMALY_SCORES

**Description:** Anomaly scores for each individual node in a static graph.

**Format:** Array of numerical scores, one per node

```json
{
  "result_type": "NODE_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 2, 3, 4]
}
```

---

### EDGE_ANOMALY_SCORES

**Description:** Anomaly scores for each individual edge in a static graph.

**Format:** Array of numerical scores, one per edge

```json
{
  "result_type": "EDGE_ANOMALY_SCORES",
  "scores": [0.05, 0.92, 0.17, 0.68],
  "ground_truth": [0, 1, 0, 1],
  "edges": [[0,1], [1,2], [2,3], [0,3]]
}
```

---

### GRAPH_ANOMALY_SCORES

**Description:** Anomaly scores for entire graphs in a graph classification setting.

**Format:** Array of numerical scores, one per graph

```json
{
  "result_type": "GRAPH_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94],
  "ground_truth": [0, 1, 0, 1],
  "graph_ids": ["graph_001", "graph_002", "graph_003", "graph_004"]
}
```

---

## Temporal/Dynamic Graph Results

### TEMPORAL_NODE_ANOMALY_SCORES

**Description:** Time series of anomaly scores for nodes as the graph evolves over time.

**Format:** 2D array where each row represents a time step and each column a node

```json
{
  "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
  "scores": [
    [0.12, 0.25, 0.08, 0.19],
    [0.15, 0.89, 0.12, 0.22],
    [0.18, 0.93, 0.15, 0.25]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0]
  ],
  "timestamps": [0, 1, 2],
  "node_ids": [0, 1, 2, 3]
}
```

---

### TEMPORAL_EDGE_ANOMALY_SCORES

**Description:** Time series of anomaly scores for edges as new connections appear and disappear.
**Format:** 2D array where each row represents a time step and each column an edge

```json
{
  "result_type": "TEMPORAL_EDGE_ANOMALY_SCORES",
  "scores": [
    [0.05, 0.32, 0.17],
    [0.08, 0.91, 0.20],
    [0.12, 0.95, 0.23]
  ],
  "ground_truth": [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0]
  ],
  "timestamps": [0, 1, 2],
  "edges": [[0,1], [1,2], [2,3]]
}
```

---

### TEMPORAL_GRAPH_ANOMALY_SCORES

**Description:** Time series of anomaly scores for entire graphs in a dynamic setting.

**Format:** 2D array where each row represents a time iteration and each column a graph

```json
{
  "result_type": "TEMPORAL_GRAPH_ANOMALY_SCORES",
  "scores": [
    [0.086, 0.056, 0.062, 0.044],
    [0.089, 0.061, 0.065, 0.047],
    [0.092, 0.358, 0.068, 0.050]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0]
  ],
  "iterations": [1, 2, 3],
  "graph_ids": [478, 338, 337, 318]
}
```

---

## Streaming Graph Results

### NODE_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for nodes in a streaming setting where each node activity appears at a specific timestamp. Used for node stream anomaly detection where node events arrive sequentially over time.
**Format:** 1D array of scores with corresponding node IDs and timestamps (one score per node occurrence)

```json
{
  "result_type": "NODE_STREAM_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.15, 0.76, 0.23],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 0, 2, 1],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one node activity occurrence in the stream
- `scores[i]` is the anomaly score for node `node_ids[i]` at time `timestamps[i]`
- Same node can appear multiple times at different timestamps
- More memory-efficient than 2D format when node activities are sparse over time

**Comparison with TEMPORAL_NODE_ANOMALY_SCORES:**

| Feature | TEMPORAL_NODE_ANOMALY_SCORES | NODE_STREAM_ANOMALY_SCORES |
|---------|------------------------------|----------------------------|
| Format | 2D array `[T x N]` | 1D array `[M]` |
| Use Case | Fixed node set, scores over time | Streaming node events |
| Memory | `O(T x N)` - can be large | `O(M)` - efficient |
| Inactive Nodes | Use `-2` for missing nodes | Not represented |

---

### EDGE_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for edges in a streaming setting where each edge appears at a specific timestamp. Used for edge stream anomaly detection where edges arrive sequentially over time.
**Format:** 1D array of scores with corresponding edge pairs and timestamps (one score per edge occurrence)

```json
{
  "result_type": "EDGE_STREAM_ANOMALY_SCORES",
  "scores": [0.05, 0.91, 0.23, 0.68, 0.12],
  "ground_truth": [0, 1, 0, 1, 0],
  "edges": [[0,1], [1,2], [2,3], [0,3], [1,3]],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one edge occurrence in the stream
- `scores[i]` is the anomaly score for edge `edges[i]` at time `timestamps[i]`
- Same edge can appear multiple times at different timestamps
- More memory-efficient than 2D format when edges are sparse over time

**Comparison with TEMPORAL_EDGE_ANOMALY_SCORES:**

| Feature | TEMPORAL_EDGE_ANOMALY_SCORES | EDGE_STREAM_ANOMALY_SCORES |
|---------|------------------------------|----------------------------|
| Format | 2D array `[T x E]` | 1D array `[M]` |
| Use Case | Fixed edge set, scores over time | Streaming edges |
| Memory | `O(T x E)` - can be large | `O(M)` - efficient |
| Inactive Edges | Use `-2` for missing edges | Not represented |

---

### GRAPH_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for graphs in a streaming setting where each graph snapshot appears at a specific timestamp.
**Format:** 1D array of scores with corresponding graph IDs and timestamps (one score per graph occurrence)

```json
{
  "result_type": "GRAPH_STREAM_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94, 0.35],
  "ground_truth": [0, 1, 0, 1, 0],
  "graph_ids": ["graph_001", "graph_002", "graph_001", "graph_003", "graph_002"],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one graph occurrence in the stream
- `scores[i]` is the anomaly score for graph `graph_ids[i]` at time `timestamps[i]`
- Same graph can appear multiple times at different timestamps
- More memory-efficient than 2D format when graph events are sparse over time

**Comparison with TEMPORAL_GRAPH_ANOMALY_SCORES:**

| Feature | TEMPORAL_GRAPH_ANOMALY_SCORES | GRAPH_STREAM_ANOMALY_SCORES |
|---------|-------------------------------|-----------------------------|
| Format | 2D array `[T x G]` | 1D array `[M]` |
| Use Case | Fixed graph set, scores over time | Streaming graph snapshots |
| Memory | `O(T x G)` - can be large | `O(M)` - efficient |
| Inactive Graphs | Use `-2` for missing graphs | Not represented |

---

## Special Score Values

- `-1`: Unknown/unassigned
- `-2`: Inactive/unseen at this time step (temporal/streaming types only)

## Writing Results with ResultWriter

Methods should use `ResultWriter` from `graflag_runner` to produce `results.json`:

```python
from graflag_runner import ResultWriter

writer = ResultWriter()

# Save scores with ground truth
writer.save_scores(
    result_type="NODE_ANOMALY_SCORES",
    scores=anomaly_scores.tolist(),
    ground_truth=labels.tolist(),
    node_ids=list(range(len(anomaly_scores)))  # optional
)

# Add metadata
writer.add_metadata(
    method_name="your_method",
    dataset="cora"
)

# Add resource metrics (optional, also set automatically by graflag_runner)
writer.add_resource_metrics(
    exec_time_ms=12345.67,
    peak_memory_mb=512.3,
    peak_gpu_mb=2048.0  # optional
)

# Finalize (writes results.json to EXP directory)
writer.finalize()
```

The `results.json` file is saved
to the experiment directory (`EXP` environment variable) and is used by `graflag evaluate` to compute metrics (AUC-ROC, AUC-PR) and generate plots.

### Required Fields

| Field | Description |
|-------|-------------|
| `result_type` | One of the valid result types listed above |
| `scores` | Anomaly scores (list or nested list) |
| `ground_truth` | Binary labels matching scores shape (0=normal, 1=anomaly) |

### Optional Fields

| Field | Description |
|-------|-------------|
| `metadata` | Dict with method_name, dataset, hyperparameters, etc. |
| `timestamps` | Time indices for temporal/streaming types |
| `node_ids` | Node identifiers |
| `edges` | Edge pairs as `[[src, dst], ...]` |
| `graph_ids` | Graph identifiers |

## Custom Metrics

The evaluator computes built-in metrics (AUC-ROC, AUC-PR, Precision@K, Recall@K, F1@K, Best F1) for all result types. You can extend this with custom metrics via plugins.

### Plugin Files

Place a `.py` file in one of two locations:

- **Global** (all evaluations): `libs/graflag_evaluator/plugins/`
- **Per-experiment**: `experiments//custom_metrics/`

Each plugin imports `MetricCalculator` and registers a metric function at module level:

```python
# hits_at_100.py
from graflag_evaluator import MetricCalculator

def hits_at_100(scores, ground_truth, **kw):
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

MetricCalculator.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
```

The function must accept `(scores, ground_truth, **kwargs)` and return a `Dict[str, float]`.
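Before wiring a metric into `MetricCalculator`, it can be exercised standalone on NumPy arrays. The sketch below generalizes the `hits_at_100` plugin example to a hypothetical `hits_at_k` helper (the `k` parameter and the toy edge-stream data are illustrative, not part of GraFlag):

```python
import numpy as np

def hits_at_k(scores, ground_truth, k=100, **kwargs):
    """Fraction of the k highest-scoring items that are labeled anomalous."""
    top = np.asarray(scores).argsort()[-k:]  # indices of the top-k scores
    return {f"hits@{k}": float(np.asarray(ground_truth)[top].mean())}

# Toy edge stream: true anomalies at indices 1 and 3
scores = np.array([0.05, 0.91, 0.23, 0.68, 0.12])
labels = np.array([0, 1, 0, 1, 0])

print(hits_at_k(scores, labels, k=2))  # {'hits@2': 1.0}
```

The same function body, with `k` fixed at 100, satisfies the `(scores, ground_truth, **kwargs) -> Dict[str, float]` contract described above.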
### Python API

The `GraFlag.register_metric()` method extracts a function's source code and writes it as a plugin file on the cluster:

```python
from graflag import GraFlag

def hits_at_100(scores, ground_truth, **kw):
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

gf = GraFlag()
gf.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
gf.evaluate("exp__taddy__uci__20260309_120000")  # includes hits@100
```

Pass `experiment="exp_name"` to scope the metric to a single experiment instead of globally.
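To catch malformed output before `graflag evaluate` runs, it can help to check a result dict against the required fields listed above. This is a minimal sketch; `validate_results` and `shape` are hypothetical helpers, not part of the GraFlag API:

```python
VALID_RESULT_TYPES = {
    "NODE_ANOMALY_SCORES", "EDGE_ANOMALY_SCORES", "GRAPH_ANOMALY_SCORES",
    "TEMPORAL_NODE_ANOMALY_SCORES", "TEMPORAL_EDGE_ANOMALY_SCORES",
    "TEMPORAL_GRAPH_ANOMALY_SCORES", "NODE_STREAM_ANOMALY_SCORES",
    "EDGE_STREAM_ANOMALY_SCORES", "GRAPH_STREAM_ANOMALY_SCORES",
}

def shape(x):
    """Nested-list shape: [] for a scalar, [len, inner...] for a list."""
    return [len(x)] + shape(x[0]) if isinstance(x, list) and x else []

def validate_results(doc):
    """Check the required fields from the tables above; raise ValueError on problems."""
    for field in ("result_type", "scores", "ground_truth"):
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
    if doc["result_type"] not in VALID_RESULT_TYPES:
        raise ValueError(f"unknown result_type: {doc['result_type']}")
    if shape(doc["scores"]) != shape(doc["ground_truth"]):
        raise ValueError("scores and ground_truth shapes differ")

# Passes silently for a well-formed static result
validate_results({
    "result_type": "NODE_ANOMALY_SCORES",
    "scores": [0.12, 0.89, 0.03],
    "ground_truth": [0, 1, 0],
})
```

The shape comparison also covers the 2D temporal types, since `shape` recurses into nested lists; ragged arrays are not handled by this sketch.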