Skip to content

Phase2: Implementation of metrics collectors for Dashboard#464

Open
kushagra0902 wants to merge 14 commits intocameri:mainfrom
kushagra0902:phase2
Open

Phase2: Implementation of metrics collectors for Dashboard#464
kushagra0902 wants to merge 14 commits intocameri:mainfrom
kushagra0902:phase2

Conversation

@kushagra0902
Copy link
Copy Markdown
Contributor

Description

This PR delivers Phase 2 of the dashboard service by replacing placeholder KPI generation with real PostgreSQL-backed metrics. Appropriate indexs are also added in the database for efficient querying.

Related Issue

Solves issue #144 partially by implementing phase 2 of the proposed plan

Motivation and Context

The main motivation is discussed in the issue discussion. TLDR: It helps in solving the problem of monitoring Relay performance by adding a dashboard showing the KPIs discussed.

How Has This Been Tested?

Unit tests have been added for the appropriate files. Other than this manual testing involves spinnig up relay server, and dashboard together and monitoring the dashboard endpoints

Screenshots (if appropriate):

N/A

Types of changes

  • Non-functional change (docs, style, minor refactor)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my code changes.
  • All new and existing tests passed.

Copilot AI review requested due to automatic review settings April 19, 2026 05:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements Phase 2 of the dashboard service by introducing a dedicated HTTP/WebSocket app that serves KPI snapshots and periodically refreshes them from PostgreSQL-backed collectors, alongside DB migrations intended to support efficient KPI querying and change detection.

Changes:

  • Added dashboard-service runtime (Express + WebSocket) with polling scheduler and snapshot caching.
  • Implemented PostgreSQL KPI collector queries (events-by-kind, admitted users, sats paid, top talkers) plus snapshot/update-version plumbing.
  • Added migrations for KPI query indexes and new triggers/state to detect changes, plus unit tests for scheduler/snapshot/app wiring.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
test/unit/dashboard-service/snapshot-service.spec.ts Adds unit coverage for snapshot refresh/sequence behavior and error propagation.
test/unit/dashboard-service/polling-scheduler.spec.ts Adds unit coverage for non-overlapping polling behavior and failure recovery.
test/unit/dashboard-service/app.spec.ts Adds an integration-style unit test asserting health/snapshot/ws endpoints are served.
src/dashboard-service/app.ts Wires Express routes, WebSocket hub, snapshot service, and polling scheduler into a runnable service.
src/dashboard-service/config.ts Adds dashboard-service CLI/env configuration parsing.
src/dashboard-service/index.ts Adds dashboard-service entrypoint with lifecycle/shutdown handling.
src/dashboard-service/types.ts Introduces dashboard KPI and websocket message types.
src/dashboard-service/services/snapshot-service.ts Adds in-memory snapshot caching, fingerprinting, and optional revision-based skipping.
src/dashboard-service/services/kpi-collector-service.ts Implements PostgreSQL-backed KPI aggregation queries.
src/dashboard-service/services/dashboard-update-version-service.ts Reads dashboard revision from DB with fallback-on-error behavior.
src/dashboard-service/polling/polling-scheduler.ts Adds recursive setTimeout scheduler to avoid overlapping polls.
src/dashboard-service/ws/dashboard-ws-hub.ts Adds websocket connection handling and snapshot/tick broadcasts.
src/dashboard-service/api/dashboard-router.ts Adds /api/v1/kpis/snapshot route wiring.
src/dashboard-service/controllers/get-health-controller.ts Adds /healthz controller.
src/dashboard-service/controllers/get-kpi-snapshot-controller.ts Adds KPI snapshot controller.
src/dashboard-service/handlers/request-handlers/get-health-request-handler.ts Adds request handler wrapper for health route.
src/dashboard-service/handlers/request-handlers/get-kpi-snapshot-request-handler.ts Adds request handler wrapper for KPI snapshot route.
package.json Adds dashboard dev/start scripts and dashboard-focused unit test script.
migrations/20260412_020000_add_dashboard_kpi_indexes.js Adds concurrent indexes intended to support KPI queries.
migrations/20260411_210100_add_dashboard_live_notify_triggers.js Adds NOTIFY triggers for dashboard change events.
migrations/20260419_120000_add_dashboard_updated_revision.js Adds dashboard_state + revision trigger mechanism for change detection.
migrations/20260415_010000_add_nip29_group_control_tables.js Adds NIP-29 group control tables (unrelated to dashboard KPIs).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +25
await knex.raw(`
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_pubkey
ON events (event_pubkey);
`)
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration creates idx_events_pubkey on events(event_pubkey), but the events table already defines an index on event_pubkey (created in the initial events table migration). Adding a second btree index on the same column increases write amplification and disk usage without improving plans. Consider dropping idx_events_pubkey and relying on the existing index (or change it to a composite/covering index if you need different ordering).

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +25
await knex.raw(`
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_pubkey
ON events (event_pubkey);
`)
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration creates idx_events_pubkey on events(event_pubkey), but the events table already has a btree index on event_pubkey from its original migration. A duplicate index increases insert/update cost and disk usage without improving query plans. Consider removing idx_events_pubkey (or changing it to a genuinely different composite/covering index if needed).

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +24
UPDATE dashboard_state
SET
revision = revision + 1,
updated_at = NOW()
WHERE id = 1;
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dashboard_updated() performs UPDATE dashboard_state ... WHERE id = 1 on every INSERT/UPDATE/DELETE statement for both events and users. Because event writes are typically single-row statements, this introduces a hot-row contention point and extra WAL writes on high-throughput relays. Consider using a sequence-based revision (nextval) or relying on LISTEN/NOTIFY (added separately) instead of updating a single shared row for every write.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +23
export class KPICollectorService {
public constructor(
private readonly dbClient: DatabaseClient,
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KPICollectorService adds non-trivial SQL aggregation logic, but there are currently no unit tests referencing it anywhere under test/. Adding tests that validate the query result mapping (counts, sums, top talkers ordering) would help catch regressions in the dashboard KPIs.

Copilot uses AI. Check for mistakes.
private snapshot: KPISnapshot = {
sequence: this.sequence,
generatedAt: new Date(0).toISOString(),
status: 'live',
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SnapshotService initializes generatedAt to the Unix epoch and status to 'live'. If the first successful collection returns the default metrics, refresh() treats it as unchanged and will not update generatedAt/sequence, leaving a "live" snapshot with an epoch timestamp indefinitely. Consider initializing status as 'stale' (or forcing the first refresh to update the snapshot) so the first successful refresh always sets a real generatedAt.

Suggested change
status: 'live',
status: 'stale',

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +17
const toNumber = (value: unknown): number => {
if (typeof value === 'number') {
return value
}

if (typeof value === 'string' && value !== '') {
return Number(value)
}

Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toNumber() converts Postgres COUNT/BIGINT/NUMERIC string results via Number(value). For large relays this can exceed Number.MAX_SAFE_INTEGER and silently lose precision, producing incorrect KPI counts. Consider returning BigInt/string from the collector (and updating API types), or detect unsafe conversions and handle them explicitly.

Suggested change
const toNumber = (value: unknown): number => {
if (typeof value === 'number') {
return value
}
if (typeof value === 'string' && value !== '') {
return Number(value)
}
const INTEGER_STRING_PATTERN = /^-?\d+$/
const toNumber = (value: unknown): number => {
if (typeof value === 'number') {
if (!Number.isFinite(value)) {
throw new RangeError(`Cannot convert non-finite number to KPI metric: ${value}`)
}
if (Number.isInteger(value) && !Number.isSafeInteger(value)) {
throw new RangeError(`Cannot convert unsafe integer to KPI metric: ${value}`)
}
return value
}
if (typeof value === 'bigint') {
if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
throw new RangeError(`Cannot convert bigint outside safe integer range to KPI metric: ${value.toString()}`)
}
return Number(value)
}
if (typeof value === 'string' && value !== '') {
if (INTEGER_STRING_PATTERN.test(value)) {
const bigintValue = BigInt(value)
if (bigintValue > BigInt(Number.MAX_SAFE_INTEGER) || bigintValue < BigInt(Number.MIN_SAFE_INTEGER)) {
throw new RangeError(`Cannot convert integer string outside safe integer range to KPI metric: ${value}`)
}
return Number(bigintValue)
}
const numericValue = Number(value)
if (!Number.isFinite(numericValue)) {
throw new RangeError(`Cannot convert non-finite numeric string to KPI metric: ${value}`)
}
return numericValue
}

Copilot uses AI. Check for mistakes.
Comment on lines +97 to +101
.sum<{ total: string | null }>('balance as total')
.first()

const millisats = toNumber(result?.total)
return millisats / 1000
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUM(users.balance) is over a BIGINT column, but getSatsPaidCount converts the result through Number(...) and divides by 1000. If the total exceeds JS safe integer range this will silently produce the wrong satsPaid. Consider fetching/handling the aggregate as BigInt/string and formatting for the API response.

Copilot uses AI. Check for mistakes.
wsPath: '/api/v1/kpis/stream',
pollIntervalMs: 1000,
useDummyData: true,
collectorMode: 'full',
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config literal passed to createDashboardService includes collectorMode: 'full', but DashboardServiceConfig/createDashboardService don't define that option. With ts-node/register this can fail TS compilation due to excess property checks. Remove collectorMode from the test config (or add it to DashboardServiceConfig and implement it).

Suggested change
collectorMode: 'full',

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants