Log Federation - Dimensigon

Real-World Use Case: Production Incident Response

A production issue hits five servers. To find the root cause, you need to correlate logs across all of them quickly.

1. Centralized Log Querying

Federated Log API

# Query logs from all nodes at once
$ curl https://any-node:20194/api/v1.0/logs \
  -G --data-urlencode "filter=ERROR"

# Results from all 5 nodes, chronologically ordered
[
  {
    "node": "prod-1",
    "timestamp": "2025-02-20T06:23:45.123Z",
    "message": "Database connection timeout"
  },
  {
    "node": "prod-2",
    "timestamp": "2025-02-20T06:23:46.456Z",
    "message": "Orchestration failed: DB unreachable"
  }
]

# Filter by orchestration execution
$ curl https://any-node:20194/api/v1.0/logs \
  -G --data-urlencode "execution_id=exec-123"
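
The same queries can be issued from a script. The sketch below uses only the Python standard library to build query URLs for the endpoint shown above and to merge per-node results by timestamp. The helper names (build_logs_url, merge_chronologically) are hypothetical, and the entry shape is assumed to match the sample output above rather than taken from an API reference.

```python
import heapq
from urllib.parse import urlencode


def build_logs_url(node, **params):
    """Build a query URL for the /api/v1.0/logs endpoint shown above.

    `node` is any cluster node's hostname; `params` become query-string
    parameters such as filter=... or execution_id=...
    """
    base = f"https://{node}:20194/api/v1.0/logs"
    return f"{base}?{urlencode(params)}" if params else base


def merge_chronologically(*per_node_entries):
    """Merge per-node lists (each already sorted) into one ordered list.

    Assumption: every timestamp is an ISO 8601 UTC string in the same
    fixed-width format, so plain string comparison matches time order.
    """
    return list(heapq.merge(*per_node_entries, key=lambda e: e["timestamp"]))


if __name__ == "__main__":
    print(build_logs_url("any-node", filter="ERROR"))
    print(build_logs_url("any-node", execution_id="exec-123"))
```

If a node is queried directly and returns its own entries in time order, merge_chronologically reproduces the cluster-wide ordering shown in the sample response.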

Zero Setup Logging

No ELK stack to maintain
No external logging service
Just query from any cluster node
Complete execution tracing

2. Real-Time Monitoring Dashboard

Built-In Web Dashboard

DM-WebManager provides live log streaming and real-time orchestration monitoring:

Live tail of all logs
Filter by node, level, keyword
Execution progress tracking
Performance metrics

Execution Logs View

2025-02-20 06:23:00 [prod-1] INFO Starting deploy-v2
2025-02-20 06:23:01 [prod-2] INFO Pulling image
2025-02-20 06:23:03 [prod-3] INFO Running health check
2025-02-20 06:23:04 [prod-1] WARN Health slow (2.3s)
2025-02-20 06:23:05 [prod-4] OK   All checks passed
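
Lines in this view follow a regular "timestamp [node] LEVEL message" layout, so they are easy to turn into structured records for further processing. The following is a minimal sketch that assumes exactly the layout shown above; the regular expression and the parse_line helper are illustrative and not part of DM-WebManager.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2025-02-20 06:23:04 [prod-1] WARN Health slow (2.3s)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^\]]+)\] "
    r"(?P<level>\S+)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with ts, node, level and message, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S")
    return record


if __name__ == "__main__":
    print(parse_line("2025-02-20 06:23:04 [prod-1] WARN Health slow (2.3s)"))
```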

Additional Capabilities

🔍 Advanced Log Filtering

Filter by node, level, timestamp range, orchestration ID, or custom regex patterns.

Filter Examples

"ERROR in prod-1 AND after:2025-03-12T14:00"
"execution_id=exec-789 AND level:WARN"
"regex:Database.*timeout AND nodes:[prod-1..prod-5]"
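
The grammar of these filter expressions is not spelled out here, so the sketch below does not try to parse them. Instead it applies the same kinds of criteria (node, level, time lower bound, regex) client-side to entries that have already been fetched. The matches function and the "level" field are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone


def parse_ts(text):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def matches(entry, node=None, level=None, after=None, pattern=None):
    """Return True if the entry satisfies every criterion that is given.

    Criteria left as None are ignored, so they combine like AND.
    `after` must be a timezone-aware datetime.
    """
    if node is not None and entry["node"] != node:
        return False
    if level is not None and entry.get("level") != level:
        return False
    if after is not None and parse_ts(entry["timestamp"]) <= after:
        return False
    if pattern is not None and not re.search(pattern, entry["message"]):
        return False
    return True


if __name__ == "__main__":
    entry = {
        "node": "prod-1",
        "level": "ERROR",
        "timestamp": "2025-03-12T14:05:00.000Z",
        "message": "Database connection timeout",
    }
    cutoff = datetime(2025, 3, 12, 14, 0, tzinfo=timezone.utc)
    print(matches(entry, node="prod-1", level="ERROR", after=cutoff))
```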

📊 Log Aggregation Metrics

Automatically aggregate logs by error type, node, and time window for trend analysis.

Aggregation

"errors_by_node": {
  "prod-1": 23,
  "prod-2": 7
},
"top_errors": [
  {"type": "timeout", "count": 15}
]
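
To make the shape of this output concrete, the following sketch computes the same two summaries from a list of error entries. It assumes each entry carries a "node" and a "type" field; the "type" field and the aggregate helper are hypothetical and only mirror the sample above.

```python
from collections import Counter


def aggregate(entries, top_n=5):
    """Summarize error entries by node and by error type.

    Returns a dict shaped like the sample above: per-node counts plus
    the `top_n` most frequent error types.
    """
    by_node = Counter(e["node"] for e in entries)
    by_type = Counter(e["type"] for e in entries)
    return {
        "errors_by_node": dict(by_node),
        "top_errors": [
            {"type": t, "count": c} for t, c in by_type.most_common(top_n)
        ],
    }


if __name__ == "__main__":
    sample = [
        {"node": "prod-1", "type": "timeout"},
        {"node": "prod-1", "type": "timeout"},
        {"node": "prod-2", "type": "refused"},
    ]
    print(aggregate(sample))
```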

⏱️ Retention Policies

Logs are rotated and retained automatically according to policy: recent logs stay hot, and older logs are archived.

Retention Config

"retention": {
  "hot": "7d",
  "warm": "30d",
  "cold_archive": "1y"
}
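
One way to read this config is as a set of age thresholds. The sketch below classifies a log by age under that reading. It rests on several assumptions that are not confirmed here: each value is a maximum age counted from when the log was written, "d" means days, "y" means 365 days, and anything older than cold_archive is treated as expired.

```python
from datetime import timedelta

# Assumed unit meanings: d = days, y = 365 days.
DAYS_PER_UNIT = {"d": 1, "y": 365}


def parse_duration(text):
    """Convert strings such as '7d' or '1y' into a timedelta."""
    return timedelta(days=int(text[:-1]) * DAYS_PER_UNIT[text[-1]])


def tier_for(age, retention):
    """Return the tier name for a log of the given age (a timedelta)."""
    for tier in ("hot", "warm", "cold_archive"):
        if age <= parse_duration(retention[tier]):
            return tier
    return "expired"  # hypothetical label for logs past every threshold


if __name__ == "__main__":
    retention = {"hot": "7d", "warm": "30d", "cold_archive": "1y"}
    for days in (3, 8, 31, 366):
        print(days, tier_for(timedelta(days=days), retention))
```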

📈 Log-Based Alerting

Create alerts triggered by error patterns, missing heartbeats, or detected anomalies.

Alert Rules

"alert": {
  "name": "high-error-rate",
  "condition": "errors > 100 in 5m",
  "webhook": "https://slack/hook"
}
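
A condition such as "errors > 100 in 5m" can be understood as a sliding-window count. The sketch below parses that one condition form and evaluates it as error timestamps arrive. It illustrates the idea only: the supported condition syntax, the SlidingWindowAlert class, and the assumption that timestamps arrive in non-decreasing order are all introduced for this example.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Only handles the single form "errors > N in Mm" used in the rule above.
CONDITION_RE = re.compile(r"^errors > (\d+) in (\d+)m$")


def parse_condition(condition):
    """Return (threshold, window) for a condition like 'errors > 100 in 5m'."""
    match = CONDITION_RE.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return int(match.group(1)), timedelta(minutes=int(match.group(2)))


class SlidingWindowAlert:
    """Counts errors inside a trailing time window."""

    def __init__(self, condition):
        self.threshold, self.window = parse_condition(condition)
        self.events = deque()

    def record(self, ts):
        """Record one error at time `ts`; return True if the rule fires."""
        self.events.append(ts)
        cutoff = ts - self.window
        # Drop errors that have fallen out of the trailing window.
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events) > self.threshold


if __name__ == "__main__":
    alert = SlidingWindowAlert("errors > 2 in 5m")
    start = datetime(2025, 2, 20, 6, 23)
    for minute in (0, 1, 2, 10):
        print(minute, alert.record(start + timedelta(minutes=minute)))
```

When record returns True, a caller would then post to the configured webhook; that step is left out because the payload format is not described here.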

🔍 Instant Search

Query millions of log lines in milliseconds across all nodes.

📊 No Infrastructure

Logs stored locally on each node. No external dependencies.

⏱️ Real-Time Tailing

Stream logs live as orchestrations execute.

📈 Audit Trail

Complete history of all operations for compliance.


Unified Visibility Across All Nodes