# Graph Anomaly Detection Result Types

This document defines the standardized result types used in GraFlag for graph anomaly detection methods.

## Static Graph Results

### NODE_ANOMALY_SCORES

**Description:** Anomaly scores for each individual node in a static graph.

**Format:** Array of numerical scores, one per node

```json
{
  "result_type": "NODE_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.03, 0.76, 0.15],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 2, 3, 4]
}
```

---

### EDGE_ANOMALY_SCORES

**Description:** Anomaly scores for each individual edge in a static graph.

**Format:** Array of numerical scores, one per edge

```json
{
  "result_type": "EDGE_ANOMALY_SCORES",
  "scores": [0.05, 0.92, 0.17, 0.68],
  "ground_truth": [0, 1, 0, 1],
  "edges": [[0,1], [1,2], [2,3], [0,3]]
}
```

---

### GRAPH_ANOMALY_SCORES

**Description:** Anomaly scores for entire graphs in a graph classification setting.

**Format:** Array of numerical scores, one per graph

```json
{
  "result_type": "GRAPH_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94],
  "ground_truth": [0, 1, 0, 1],
  "graph_ids": ["graph_001", "graph_002", "graph_003", "graph_004"]
}
```

---

## Temporal/Dynamic Graph Results

### TEMPORAL_NODE_ANOMALY_SCORES

**Description:** Time series of anomaly scores for nodes as the graph evolves over time.

**Format:** 2D array where each row represents a time step and each column a node

```json
{
  "result_type": "TEMPORAL_NODE_ANOMALY_SCORES",
  "scores": [
    [0.12, 0.25, 0.08, 0.19],
    [0.15, 0.89, 0.12, 0.22],
    [0.18, 0.93, 0.15, 0.25]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0]
  ],
  "timestamps": [0, 1, 2],
  "node_ids": [0, 1, 2, 3]
}
```

---

### TEMPORAL_EDGE_ANOMALY_SCORES

**Description:** Time series of anomaly scores for edges as new connections appear and disappear.
**Format:** 2D array where each row represents a time step and each column an edge

```json
{
  "result_type": "TEMPORAL_EDGE_ANOMALY_SCORES",
  "scores": [
    [0.05, 0.32, 0.17],
    [0.08, 0.91, 0.20],
    [0.12, 0.95, 0.23]
  ],
  "ground_truth": [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0]
  ],
  "timestamps": [0, 1, 2],
  "edges": [[0,1], [1,2], [2,3]]
}
```

---

### TEMPORAL_GRAPH_ANOMALY_SCORES

**Description:** Time series of anomaly scores for entire graphs in a dynamic setting.

**Format:** 2D array where each row represents a time iteration and each column a graph

```json
{
  "result_type": "TEMPORAL_GRAPH_ANOMALY_SCORES",
  "scores": [
    [0.086, 0.056, 0.062, 0.044],
    [0.089, 0.061, 0.065, 0.047],
    [0.092, 0.358, 0.068, 0.050]
  ],
  "ground_truth": [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0]
  ],
  "iterations": [1, 2, 3],
  "graph_ids": [478, 338, 337, 318]
}
```

---

## Streaming Graph Results

### NODE_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for nodes in a streaming setting where each node activity appears at a specific timestamp. Used for node stream anomaly detection where node events arrive sequentially over time.
**Format:** 1D array of scores with corresponding node IDs and timestamps (one score per node occurrence)

```json
{
  "result_type": "NODE_STREAM_ANOMALY_SCORES",
  "scores": [0.12, 0.89, 0.15, 0.76, 0.23],
  "ground_truth": [0, 1, 0, 1, 0],
  "node_ids": [0, 1, 0, 2, 1],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one node activity occurrence in the stream
- `scores[i]` is the anomaly score for node `node_ids[i]` at time `timestamps[i]`
- Same node can appear multiple times at different timestamps
- More memory-efficient than 2D format when node activities are sparse over time

**Comparison with TEMPORAL_NODE_ANOMALY_SCORES:**

| Feature | TEMPORAL_NODE_ANOMALY_SCORES | NODE_STREAM_ANOMALY_SCORES |
|---------|------------------------------|----------------------------|
| Format | 2D array `[T x N]` | 1D array `[M]` |
| Use Case | Fixed node set, scores over time | Streaming node events |
| Memory | `O(T x N)` - can be large | `O(M)` - efficient |
| Inactive Nodes | Use `-2` for missing nodes | Not represented |

---

### EDGE_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for edges in a streaming setting where each edge appears at a specific timestamp. Used for edge stream anomaly detection where edges arrive sequentially over time.
**Format:** 1D array of scores with corresponding edge pairs and timestamps (one score per edge occurrence)

```json
{
  "result_type": "EDGE_STREAM_ANOMALY_SCORES",
  "scores": [0.05, 0.91, 0.23, 0.68, 0.12],
  "ground_truth": [0, 1, 0, 1, 0],
  "edges": [[0,1], [1,2], [2,3], [0,3], [1,3]],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one edge occurrence in the stream
- `scores[i]` is the anomaly score for edge `edges[i]` at time `timestamps[i]`
- Same edge can appear multiple times at different timestamps
- More memory-efficient than 2D format when edges are sparse over time

**Comparison with TEMPORAL_EDGE_ANOMALY_SCORES:**

| Feature | TEMPORAL_EDGE_ANOMALY_SCORES | EDGE_STREAM_ANOMALY_SCORES |
|---------|------------------------------|----------------------------|
| Format | 2D array `[T x E]` | 1D array `[M]` |
| Use Case | Fixed edge set, scores over time | Streaming edges |
| Memory | `O(T x E)` - can be large | `O(M)` - efficient |
| Inactive Edges | Use `-2` for missing edges | Not represented |

---

### GRAPH_STREAM_ANOMALY_SCORES

**Description:** Anomaly scores for graphs in a streaming setting where each graph snapshot appears at a specific timestamp.
**Format:** 1D array of scores with corresponding graph IDs and timestamps (one score per graph occurrence)

```json
{
  "result_type": "GRAPH_STREAM_ANOMALY_SCORES",
  "scores": [0.23, 0.87, 0.11, 0.94, 0.35],
  "ground_truth": [0, 1, 0, 1, 0],
  "graph_ids": ["graph_001", "graph_002", "graph_001", "graph_003", "graph_002"],
  "timestamps": [0, 0, 1, 1, 2]
}
```

**Notes:**

- Each index corresponds to one graph occurrence in the stream
- `scores[i]` is the anomaly score for graph `graph_ids[i]` at time `timestamps[i]`
- Same graph can appear multiple times at different timestamps
- More memory-efficient than 2D format when graph events are sparse over time

**Comparison with TEMPORAL_GRAPH_ANOMALY_SCORES:**

| Feature | TEMPORAL_GRAPH_ANOMALY_SCORES | GRAPH_STREAM_ANOMALY_SCORES |
|---------|-------------------------------|-----------------------------|
| Format | 2D array `[T x G]` | 1D array `[M]` |
| Use Case | Fixed graph set, scores over time | Streaming graph snapshots |
| Memory | `O(T x G)` - can be large | `O(M)` - efficient |
| Inactive Graphs | Use `-2` for missing graphs | Not represented |

---

## Special Score Values

- `-1`: Unknown/unassigned
- `-2`: Inactive/unseen at this time step (temporal/streaming types only)

## Writing Results with ResultWriter

Methods should use `ResultWriter` from `graflag_runner` to produce `results.json`:

```python
from graflag_runner import ResultWriter

writer = ResultWriter()

# Save scores with ground truth
writer.save_scores(
    result_type="NODE_ANOMALY_SCORES",
    scores=anomaly_scores.tolist(),
    ground_truth=labels.tolist(),
    node_ids=list(range(len(anomaly_scores)))  # optional
)

# Add metadata
writer.add_metadata(
    method_name="your_method",
    dataset="cora"
)

# Add resource metrics (optional, also set automatically by graflag_runner)
writer.add_resource_metrics(
    exec_time_ms=12345.67,
    peak_memory_mb=512.3,
    peak_gpu_mb=2048.0  # optional
)

# Finalize (writes results.json to EXP directory)
writer.finalize()
```

The `results.json` file is saved
to the experiment directory (`EXP` environment variable) and is used by `graflag evaluate` to compute metrics (AUC-ROC, AUC-PR) and generate plots.

### Required Fields

| Field | Description |
|-------|-------------|
| `result_type` | One of the valid result types listed above |
| `scores` | Anomaly scores (list or nested list) |
| `ground_truth` | Binary labels matching scores shape (0=normal, 1=anomaly) |

### Optional Fields

| Field | Description |
|-------|-------------|
| `metadata` | Dict with method_name, dataset, hyperparameters, etc. |
| `timestamps` | Time indices for temporal/streaming types |
| `node_ids` | Node identifiers |
| `edges` | Edge pairs as `[[src, dst], ...]` |
| `graph_ids` | Graph identifiers |

## Custom Metrics

The evaluator computes built-in metrics (AUC-ROC, AUC-PR, Precision@K, Recall@K, F1@K, Best F1) for all result types. You can extend this with custom metrics via plugins.

### Plugin Files

Place a `.py` file in one of two locations:

- **Global** (all evaluations): `libs/graflag_evaluator/plugins/`
- **Per-experiment**: `experiments//custom_metrics/`

Each plugin imports `MetricCalculator` and registers a metric function at module level:

```python
# hits_at_100.py
from graflag_evaluator import MetricCalculator

def hits_at_100(scores, ground_truth, **kw):
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

MetricCalculator.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
```

The function must accept `(scores, ground_truth, **kwargs)` and return a `Dict[str, float]`.
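Before wiring a metric into `MetricCalculator`, it can be exercised standalone on NumPy arrays. The sketch below generalizes the `hits_at_100` plugin example to a hypothetical `hits_at_k` helper (the `k` parameter and the toy edge-stream data are illustrative, not part of GraFlag):

```python
import numpy as np

def hits_at_k(scores, ground_truth, k=100, **kwargs):
    """Fraction of the k highest-scoring items that are labeled anomalous."""
    top = np.asarray(scores).argsort()[-k:]  # indices of the top-k scores
    return {f"hits@{k}": float(np.asarray(ground_truth)[top].mean())}

# Toy edge stream: true anomalies at indices 1 and 3
scores = np.array([0.05, 0.91, 0.23, 0.68, 0.12])
labels = np.array([0, 1, 0, 1, 0])

print(hits_at_k(scores, labels, k=2))  # {'hits@2': 1.0}
```

The same function body, with `k` fixed at 100, satisfies the `(scores, ground_truth, **kwargs) -> Dict[str, float]` contract described above.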
### Python API

The `GraFlag.register_metric()` method extracts a function's source code and writes it as a plugin file on the cluster:

```python
from graflag import GraFlag

def hits_at_100(scores, ground_truth, **kw):
    top = scores.argsort()[-100:]
    return {"hits@100": ground_truth[top].mean()}

gf = GraFlag()
gf.register_metric("EDGE_STREAM_ANOMALY_SCORES", hits_at_100)
gf.evaluate("exp__taddy__uci__20260309_120000")  # includes hits@100
```

Pass `experiment="exp_name"` to scope the metric to a single experiment instead of globally.
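To catch malformed output before `graflag evaluate` runs, it can help to check a result dict against the required fields listed above. This is a minimal sketch; `validate_results` and `shape` are hypothetical helpers, not part of the GraFlag API:

```python
VALID_RESULT_TYPES = {
    "NODE_ANOMALY_SCORES", "EDGE_ANOMALY_SCORES", "GRAPH_ANOMALY_SCORES",
    "TEMPORAL_NODE_ANOMALY_SCORES", "TEMPORAL_EDGE_ANOMALY_SCORES",
    "TEMPORAL_GRAPH_ANOMALY_SCORES", "NODE_STREAM_ANOMALY_SCORES",
    "EDGE_STREAM_ANOMALY_SCORES", "GRAPH_STREAM_ANOMALY_SCORES",
}

def shape(x):
    """Nested-list shape: [] for a scalar, [len, inner...] for a list."""
    return [len(x)] + shape(x[0]) if isinstance(x, list) and x else []

def validate_results(doc):
    """Check the required fields from the tables above; raise ValueError on problems."""
    for field in ("result_type", "scores", "ground_truth"):
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
    if doc["result_type"] not in VALID_RESULT_TYPES:
        raise ValueError(f"unknown result_type: {doc['result_type']}")
    if shape(doc["scores"]) != shape(doc["ground_truth"]):
        raise ValueError("scores and ground_truth shapes differ")

# Passes silently for a well-formed static result
validate_results({
    "result_type": "NODE_ANOMALY_SCORES",
    "scores": [0.12, 0.89, 0.03],
    "ground_truth": [0, 1, 0],
})
```

The shape comparison also covers the 2D temporal types, since `shape` recurses into nested lists; ragged arrays are not handled by this sketch.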