Data Flow & Telemetry

This page documents how signals flow from a client request through Synapse detection to the Horizon hub and out to dashboards.

End-to-End Signal Flow

Ingest Pipeline

1. Signal Creation (Synapse)

When Synapse detects a threat or noteworthy event, it creates a signal containing:

  • Signal type (e.g., SQLI_DETECTED, RATE_LIMIT_EXCEEDED, DLP_MATCH)
  • Source IP and fingerprint
  • Risk score and matched rules
  • Request metadata (path, method, headers)

Signals are batched locally and sent to the Horizon hub via the WebSocket tunnel.
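
A minimal sketch of what such a signal might carry; the field names and wire format here are illustrative assumptions, not Synapse's actual schema:

```python
# Illustrative Synapse signal payload; field names are assumptions.
signal = {
    "type": "SQLI_DETECTED",            # or RATE_LIMIT_EXCEEDED, DLP_MATCH, ...
    "sourceIp": "203.0.113.7",
    "fingerprint": "fp_9c2e",           # client fingerprint
    "riskScore": 87,
    "matchedRules": ["sqli-union-select"],
    "request": {                        # request metadata
        "path": "/login",
        "method": "POST",
        "headers": {"user-agent": "curl/8.4"},
    },
}
```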

2. WebSocket Ingestion (Gateway)

Sensors authenticate to /ws/sensors using API keys with the signal:write scope. The gateway supports the following message types:

  • signal — single signal submission
  • signal-batch — batch submission (preferred)
  • pong — heartbeat response
  • blocklist-sync — fleet blocklist synchronization
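
A signal-batch submission might look like this on the wire; the envelope fields (`type`, `signals`) are assumptions rather than the documented frame schema:

```python
import json

# Hypothetical signal-batch frame sent over the /ws/sensors tunnel.
frame = {
    "type": "signal-batch",   # batch submission is the preferred path
    "signals": [
        {"type": "RATE_LIMIT_EXCEEDED", "sourceIp": "198.51.100.4", "riskScore": 40},
        {"type": "DLP_MATCH", "sourceIp": "198.51.100.9", "riskScore": 75},
    ],
}
payload = json.dumps(frame)   # serialized before sending over the WebSocket
```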

3. Aggregation

The Aggregator processes incoming signals with several strategies:

| Strategy | Description |
| --- | --- |
| Batching | Flushes at SIGNAL_BATCH_SIZE (default: 100) or SIGNAL_BATCH_TIMEOUT_MS (default: 5000 ms) |
| Deduplication | Merges signals by signalType + (sourceIp OR fingerprint) within the batch window |
| Enrichment | Adds tenant and sensor context to each signal |
| Backpressure | Max queue size prevents memory exhaustion under burst load |
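
The batching and deduplication strategies can be sketched as follows; the merge logic and `count` field are illustrative, with only the flush defaults taken from the strategy table:

```python
SIGNAL_BATCH_SIZE = 100          # flush threshold (documented default)
SIGNAL_BATCH_TIMEOUT_MS = 5000   # flush timeout (documented default)

def dedupe(batch):
    """Merge signals sharing signalType + (sourceIp or fingerprint) within
    one batch window, tracking how many raw signals were merged."""
    merged = {}
    for s in batch:
        key = (s["signalType"], s.get("sourceIp") or s.get("fingerprint"))
        if key in merged:
            merged[key]["count"] += 1
        else:
            merged[key] = {**s, "count": 1}
    return list(merged.values())

batch = [
    {"signalType": "SQLI_DETECTED", "sourceIp": "203.0.113.7"},
    {"signalType": "SQLI_DETECTED", "sourceIp": "203.0.113.7"},
    {"signalType": "DLP_MATCH", "fingerprint": "fp_9c2e"},
]
print(len(dedupe(batch)))  # 2: the two SQLI signals collapse into one
```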

4. Dual-Write Storage

  • PostgreSQL is authoritative — all signals, state, and relationships
  • ClickHouse receives async copies — failures do not block ingestion
  • ClickHouse-dependent endpoints return HTTP 503 when ClickHouse is disabled
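
The dual-write rule can be sketched like this, with in-memory stores standing in for PostgreSQL and ClickHouse (in the real hub the ClickHouse copy is asynchronous):

```python
class MemoryStore:
    """Stand-in for a database client, illustrative only."""
    def __init__(self, fail=False):
        self.rows, self.fail = [], fail
    def insert(self, row):
        if self.fail:
            raise ConnectionError("analytics store unavailable")
        self.rows.append(row)

def store_signal(signal, pg, clickhouse):
    pg.insert(signal)              # authoritative write; errors here propagate
    try:
        clickhouse.insert(signal)  # best-effort copy (async in the real hub)
    except Exception:
        pass                       # ClickHouse failure never blocks ingestion

pg, ch = MemoryStore(), MemoryStore(fail=True)
store_signal({"type": "SQLI_DETECTED"}, pg, ch)
print(len(pg.rows))  # 1: signal stored despite the ClickHouse failure
```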

5. Correlation

The Correlator analyzes signal batches for cross-tenant patterns:

  • Identifies coordinated attacks using anonymized SHA-256 fingerprints
  • Creates Campaign objects grouping related signals
  • Escalates campaigns when they cross severity thresholds
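
Cross-tenant correlation on anonymized fingerprints might look like this; the fleet-wide salt and key format are assumptions, only the use of SHA-256 comes from the text above:

```python
import hashlib

def anonymize(source_ip: str, fleet_salt: str) -> str:
    """Salted SHA-256 over a source identifier. A fleet-wide salt keeps the
    hash comparable across tenants without exposing raw IPs; the exact
    salting scheme here is an assumption."""
    return hashlib.sha256(f"{fleet_salt}:{source_ip}".encode()).hexdigest()
```

Because the salt is shared across the fleet, the same attacker IP produces the same fingerprint for every tenant, which is what lets the Correlator group related signals into a Campaign.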

6. Broadcasting

The Broadcaster pushes real-time updates to connected dashboards:

  • Dashboards connect to /ws/dashboard with dashboard:read scope
  • Default subscriptions: campaigns, threats, blocklist
  • Supports subscribe/unsubscribe for fine-grained topic control
  • Auto-creates blocklist entries for high-severity threats
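
Topic-control frames from a dashboard client might look like this; the envelope shape is an assumption, while the topic names are the documented defaults:

```python
import json

# Hypothetical subscribe/unsubscribe frames for /ws/dashboard.
subscribe = {"type": "subscribe", "topics": ["campaigns", "threats", "blocklist"]}
unsubscribe = {"type": "unsubscribe", "topics": ["blocklist"]}

print(json.dumps(subscribe))
```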

Fleet Command Flow

Commands are persisted in PostgreSQL. If a sensor is offline, commands are queued and delivered when the sensor reconnects.
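
The queue-and-deliver behavior can be sketched as follows, with an in-memory deque standing in for the PostgreSQL-backed queue:

```python
from collections import deque

class CommandQueue:
    """Illustrative fleet-command delivery: commands queue while the sensor
    is offline and flush, in order, when it reconnects."""
    def __init__(self):
        self.pending = deque()
        self.online = False

    def send(self, cmd, transport):
        # Deliver immediately when connected; otherwise persist for later.
        if self.online:
            transport.append(cmd)
        else:
            self.pending.append(cmd)

    def on_reconnect(self, transport):
        # Flush everything queued while the sensor was offline.
        self.online = True
        while self.pending:
            transport.append(self.pending.popleft())

wire = []
q = CommandQueue()
q.send("update-rules", wire)   # sensor offline: queued, not delivered
q.on_reconnect(wire)           # delivered on reconnect
print(wire)  # ['update-rules']
```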

Config and Rule Distribution

Rules are pushed to sensors using one of three strategies:

| Strategy | Behavior |
| --- | --- |
| immediate | Push to all sensors now |
| canary | Push to a subset first, promote after verification |
| scheduled | Push at a specified time |

Sync state is tracked per sensor in the sensor_sync_state and rule_sync_state tables.
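
Target selection per strategy might be sketched like this; the `canary_fraction` knob is an assumed parameter, not a documented setting:

```python
import random

def select_targets(sensors, strategy, canary_fraction=0.1):
    """Pick which sensors receive a rule push under each strategy.
    Illustrative only; canary_fraction is an assumption."""
    if strategy == "immediate":
        return list(sensors)            # push to the whole fleet now
    if strategy == "canary":
        k = max(1, int(len(sensors) * canary_fraction))
        return random.sample(list(sensors), k)   # subset first, promote later
    if strategy == "scheduled":
        return []                       # queued until the scheduled time
    raise ValueError(f"unknown strategy: {strategy}")
```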

Hunt Query Routing

The Hunt Service routes queries based on their time window:

  • < 24 hours — PostgreSQL for fresh, authoritative data
  • > 24 hours — ClickHouse for historical time-series
  • Mixed ranges — split at the 24h boundary, query both, merge results
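
The routing rule above can be expressed directly; `route_hunt_query` is an illustrative name, not the service's actual API:

```python
from datetime import datetime, timedelta, timezone

def route_hunt_query(start: datetime, end: datetime):
    """Split a hunt query at the 24-hour boundary: recent data goes to
    PostgreSQL, older data to ClickHouse, mixed ranges hit both."""
    boundary = datetime.now(timezone.utc) - timedelta(hours=24)
    if start >= boundary:
        return [("postgres", start, end)]
    if end <= boundary:
        return [("clickhouse", start, end)]
    # Mixed range: split at the boundary and merge results afterwards.
    return [("clickhouse", start, boundary), ("postgres", boundary, end)]
```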

Metric Retention

| Category | Metrics | Retention | Resolution |
| --- | --- | --- | --- |
| Traffic | RPS, bandwidth, latency percentiles, status codes | 90 days | 1 minute |
| Security | Blocks, attacks, campaigns, risk scores | 1 year | Raw events |
| Health | CPU, memory, disk, connection count | 30 days | 1 minute |
| API | Endpoint stats, schema violations, discovery | 90 days | Raw events |

Real-Time vs Historical Storage

| Real-Time (Redis) | Historical (ClickHouse) |
| --- | --- |
| Last 5 minutes of events | Full event archive |
| Live attack feeds | Trend analysis |
| Active session counts | Threat hunting queries |
| War Room state | Compliance reports |
| Sub-second query latency | Complex aggregations |

Licensed under AGPL-3.0 · atlascrew.dev