
fix: comprehensive dashboard, backend, and schema improvements#15

Open
MysticRyuujin wants to merge 1 commit into NethermindEth:develop from MysticRyuujin:develop

Conversation


@MysticRyuujin MysticRyuujin commented Dec 11, 2025

This vibe-coded PR fixes a ton of issues I had with the dashboards, including but not limited to:

  • Fixes test names
  • Incorrect Success Rates
    • Fixes issue where Avg Success Rate would read 0%, as would Per-Client Metrics
    • Fixes issue where JSON-RPC would return an error but it would not be recorded as an error
  • Incorrect Response Time Distribution
    • Mean would report Infinity
    • Std Dev would report NaN
  • Baseline Management Fixes
    • I could not save or create new Baselines in the Dashboard
      • I did my best to ensure the save and delete feature is working and properly displays all the required UI elements
      • I have no idea what this is actually supposed to do...
      • I think the button "Set as Baseline" at the top of the page is still broken
  • Fixed Throughput Trends UI
    • Request Throughput now looks more correct and trendy

I also tried to fix some metrics in Grafana for the k6 Prometheus dashboard, specifically Iterations, HTTP Latency Stats (p99/p95/avg/med), and HTTP Latency Distribution (min/avg/med/p90/p95/max).

I have tested all of these changes locally:

docker compose down --remove-orphans --volumes
docker compose up -d --build
go run ./runner/main.go -config ~/tmp/config/mix.yaml -clients ~/tmp/config/clients.yaml -historic -storage-config ./config/storage-example.yaml
open outputs/report.html
open http://localhost:8080

However, there are a handful of things this PR touches that I'm not overly confident in...

  1. I had to make changes to the database schema to make the Baseline features work in the GUI. While I tested everything locally, it's completely unclear to me how any of the baseline functionality was intended to work in the first place, if it ever did. I could use some guidance here.

  2. The outputs/report.html ALWAYS says that connection pooling is 0 and should be enabled, but I don't think these metrics were being captured correctly, if at all; I don't think k6 reports this. I removed that because I wasn't sure what else to do.

  3. Copilot has quite a few suggestions about fixing the API that are currently "out of scope" but maybe should be addressed? 😅

- Sync init-db-clean.sql schema with init-db.sql
- Fix dashboard components (LatencyDistribution, ThroughputChart, BaselineManager)
- Improve error handling and edge cases in runner
- Update API client metrics and server handling
- Fix performance analyzer and data exporter
- Address code review feedback on documentation and error handling
Copilot AI review requested due to automatic review settings December 11, 2025 21:20

Copilot AI left a comment


Pull request overview

This PR implements comprehensive fixes for dashboard metrics, baseline management, and database schema improvements. The changes address critical issues with success rate calculations, response time distributions showing invalid values (Infinity/NaN), and non-functional baseline management features. Additionally, it removes incorrectly tracked connection pooling metrics and updates Grafana dashboards with correct k6 Prometheus metric queries.

Key Changes

  • Metrics Accuracy: Fixed success rate calculations by tracking JSON-RPC errors via k6_checks_rate metrics, and added proper handling for Infinity/NaN values in latency distributions
  • Baseline Management: Enhanced baseline functionality with git branch/commit tracking, proper API integration, and improved UI for baseline CRUD operations
  • Database Schema: Added full_results and git fields to support baseline features, changed ID types from UUID to VARCHAR for better readability, and removed foreign key constraints (with documented risks)
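The Infinity/NaN handling described above comes down to filtering non-finite samples before computing statistics. A minimal TypeScript sketch of that kind of guard (the function and names here are illustrative, not the PR's actual code):

```typescript
interface LatencyStats {
  mean: number
  stdDev: number
}

// Compute mean/std dev over finite, non-negative samples only, so an
// empty or corrupt series yields zeros instead of Infinity/NaN.
function safeLatencyStats(samples: number[]): LatencyStats {
  const valid = samples.filter((v) => Number.isFinite(v) && v >= 0)
  if (valid.length === 0) {
    return { mean: 0, stdDev: 0 }
  }
  const mean = valid.reduce((a, b) => a + b, 0) / valid.length
  const variance =
    valid.reduce((acc, v) => acc + (v - mean) ** 2, 0) / valid.length
  return { mean, stdDev: Math.sqrt(variance) }
}
```

With this shape, an all-invalid input like `[Infinity, NaN]` degrades to `{ mean: 0, stdDev: 0 }` rather than propagating non-finite values into the chart components.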

Reviewed changes

Copilot reviewed 26 out of 28 changed files in this pull request and generated 16 comments.

File Description
runner/types/types.go Added SuccessRate field to ClientMetrics for explicit success tracking
runner/storage/postgres.go Added full_results JSONB column support and improved JSON marshaling error handling
runner/storage/historic.go Extract test name and description from config for better run metadata
runner/storage/grafana_schema.go Added full_results column to benchmark_runs table schema
runner/metrics/results_collection.go Track JSON-RPC errors using k6_checks_rate metrics to fix success rate calculations
runner/main.go Updated prometheus endpoint flag description for clarity
runner/generator/ultimate_html_report.go Removed non-functional connection metrics from HTML reports
runner/generator/k6_generator.go Added logic to properly construct Prometheus remote write URL
runner/exporter/data_exporter.go Removed connection metrics from CSV exports
runner/api/test_types.go Added GitBranch and GitCommit fields to Baseline struct
runner/api/server.go Implemented global trends endpoint for dashboard metrics
runner/api/client_metrics.go Added SuccessRate field computation in client metrics
runner/analyzer/performance_analyzer.go Removed recommendations for unmeasured connection metrics
runner/analysis/baseline.go Enhanced baseline management with git tracking and fallback metrics handling
metrics/init-db.sql Updated schema with new columns, changed UUID to VARCHAR IDs, removed foreign keys
metrics/init-db-clean.sql Applied same schema changes for clean database initialization
metrics/dashboards/k6-dashboard.json Fixed Grafana queries to use correct k6 metric names (iterations, latency stats)
dashboard/src/utils/data-transformers.ts Handle full_results as both string and pre-parsed object
dashboard/src/pages/RunDetails.tsx Integrated proper baseline management hooks with API
dashboard/src/pages/Dashboard.tsx Improved empty states and added warnings about throughput approximations
dashboard/src/hooks/useDetailedMetrics.ts Refactored aggregation logic with better error handling
dashboard/src/components/ThroughputChart.tsx UI improvements for better layout and formatting
dashboard/src/components/LatencyDistribution.tsx Fixed NaN/Infinity handling in statistics calculations
dashboard/src/components/BaselineManager.tsx Added extensive handling for API naming inconsistencies (camelCase vs snake_case)
dashboard/src/api/client.ts Fixed baseline API calls to use snake_case as required by backend
go.sum Removed unused dependency entries
README.md Fixed output path in documentation
.gitignore Removed snapshot directory entries


Comment on lines +1035 to +1061
// Map metric name to database column using a whitelist to prevent SQL injection
validMetricColumns := map[string]string{
"avg_latency": "avg_latency",
"p95_latency": "p95_latency",
"p99_latency": "p99_latency",
"error_rate": "(100.0 - success_rate)",
"throughput": "throughput",
"success_rate": "success_rate",
}

metricColumn, valid := validMetricColumns[metric]
if !valid {
// Default to avg_latency if metric not recognized
metricColumn = "avg_latency"
}

query := fmt.Sprintf(`
SELECT
timestamp,
test_name,
%s as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`, metricColumn)


Copilot AI Dec 11, 2025


While a whitelist approach is used for metric validation which is good, the metric column value is directly interpolated into the SQL query using fmt.Sprintf. Although the value comes from a whitelist, this pattern is risky and could be error-prone if the whitelist is modified incorrectly in the future. Consider using a safer approach such as a switch statement that builds the entire query string, or using parameterized queries with CASE expressions.

Suggested change
// Map metric name to database column using a whitelist to prevent SQL injection
validMetricColumns := map[string]string{
"avg_latency": "avg_latency",
"p95_latency": "p95_latency",
"p99_latency": "p99_latency",
"error_rate": "(100.0 - success_rate)",
"throughput": "throughput",
"success_rate": "success_rate",
}
metricColumn, valid := validMetricColumns[metric]
if !valid {
// Default to avg_latency if metric not recognized
metricColumn = "avg_latency"
}
query := fmt.Sprintf(`
SELECT
timestamp,
test_name,
%s as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`, metricColumn)
// Use a switch statement to select the query string based on the metric to prevent SQL injection
var query string
switch metric {
case "avg_latency":
query = `
SELECT
timestamp,
test_name,
avg_latency as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
case "p95_latency":
query = `
SELECT
timestamp,
test_name,
p95_latency as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
case "p99_latency":
query = `
SELECT
timestamp,
test_name,
p99_latency as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
case "error_rate":
query = `
SELECT
timestamp,
test_name,
(100.0 - success_rate) as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
case "throughput":
query = `
SELECT
timestamp,
test_name,
throughput as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
case "success_rate":
query = `
SELECT
timestamp,
test_name,
success_rate as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
default:
// Default to avg_latency if metric not recognized
query = `
SELECT
timestamp,
test_name,
avg_latency as value
FROM benchmark_runs
WHERE timestamp >= $1
ORDER BY timestamp DESC
LIMIT $2
`
}

Comment on lines +621 to +623
// Outer condition ensures len(run.Clients) > 0, so division is safe
totalRequests := run.TotalRequests / int64(len(run.Clients))
totalErrors := run.TotalErrors / int64(len(run.Clients))

Copilot AI Dec 11, 2025


Division by len(run.Clients) to distribute total requests and errors across clients is overly simplistic and may not reflect actual distribution if clients handled different loads. The comment on line 621 notes that len(run.Clients) > 0 to avoid division by zero, but the distribution itself may be inaccurate. Consider documenting this limitation more clearly or tracking per-client metrics properly.

Comment on lines +473 to +478
// WARNING: This is an overly simplistic approximation that assumes
// a direct inverse relationship without accounting for:
// - Concurrency levels
// - Request batching
// - Network overhead
// - Client-side processing time

Copilot AI Dec 11, 2025


The comment describes the approximation formula as "overly simplistic" and lists multiple factors it doesn't account for (concurrency, batching, network overhead, etc.). While it's good to document these limitations, if the approximation is this inaccurate, consider whether displaying it at all provides value or if it might mislead users. The visible disclaimer on line 466-468 helps, but the extensive warning comments suggest this metric may not be reliable enough for production use.

Comment on lines +336 to +344
* @note Uses snake_case (run_id) as required by backend API.
* TODO: Backend should be standardized to accept camelCase for consistency
* with JavaScript/TypeScript conventions.
*/
async setBaseline(runId: string, name: string): Promise<void> {
await this.makeRequest({
method: 'POST',
url: '/api/baselines',
-      data: { runId, name }
+      data: { run_id: runId, name }

Copilot AI Dec 11, 2025


The API expects snake_case field names (run_id) but the TypeScript frontend uses camelCase conventions. The comment and TODO on lines 336-338 correctly identify that the backend should accept camelCase for consistency with JavaScript/TypeScript conventions. This inconsistency forces the frontend to transform data before sending it, adding unnecessary complexity and potential for bugs.
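One way to contain this mismatch until the backend is standardized (a sketch, not code from the PR; names are illustrative) is a single conversion helper applied at the request boundary instead of hand-mapping fields in every call:

```typescript
// Convert a camelCase key to snake_case: "runId" -> "run_id".
function camelToSnake(key: string): string {
  return key.replace(/([A-Z])/g, (m) => `_${m.toLowerCase()}`)
}

// Shallow-convert a request payload's keys before sending it to the API.
function toSnakeCasePayload(
  obj: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [k, v] of Object.entries(obj)) {
    out[camelToSnake(k)] = v
  }
  return out
}
```

`setBaseline` could then pass `data: toSnakeCasePayload({ runId, name })`, keeping camelCase throughout the TypeScript code and isolating the transformation in one place.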

// NOTE: Schema migrations (such as adding columns) should be handled externally
// using a migration tool or versioned SQL scripts. The baselines table creation
// includes git_branch and git_commit columns for new databases.


Copilot AI Dec 11, 2025


The comment mentions that schema migrations should be handled externally, but there's no migration mechanism in place for existing databases that need the git_branch and git_commit columns added. This could cause runtime errors when the application tries to query these columns on existing databases. Consider adding ALTER TABLE statements with IF NOT EXISTS checks, or providing clear migration instructions in the documentation.

Suggested change
// Ensure git_branch and git_commit columns exist for legacy databases
alterColumnQueries := []string{
`ALTER TABLE baselines ADD COLUMN IF NOT EXISTS git_branch VARCHAR(255)`,
`ALTER TABLE baselines ADD COLUMN IF NOT EXISTS git_commit VARCHAR(255)`,
}
for _, q := range alterColumnQueries {
if _, err := bm.db.ExecContext(ctx, q); err != nil {
bm.log.WithError(err).Warnf("Failed to add column with query: %s", q)
}
}

Comment on lines +44 to +85
// Helper to safely get fields from baselines that might be HistoricRun or Baseline objects
// NOTE: These helpers handle both camelCase and snake_case field names due to API response
// inconsistencies. This is a defensive workaround that masks an underlying API design issue.
// TODO: The backend API should be standardized to use a consistent naming convention
// (either always camelCase for JavaScript/TypeScript frontends or always snake_case throughout)
// to eliminate this complexity and potential for bugs. Once standardized, these helper
// functions can be removed and replaced with direct field access.
function getBaselineField(baseline: any, camelCase: string, snakeCase: string, defaultValue = ''): any {
return baseline[camelCase] ?? baseline[snakeCase] ?? defaultValue
}

// Helper to get metrics from baseline_metrics nested object
function getBaselineMetric(baseline: any, metricName: string, defaultValue: number = 0): number {
// Try baseline_metrics (snake_case from API)
const metrics = baseline.baseline_metrics ?? baseline.baselineMetrics ?? {}

// Map common metric names to their possible field names
const metricMappings: Record<string, string[]> = {
'avgLatency': ['avg_latency_ms', 'avgLatencyMs', 'avg_latency'],
'successRate': ['success_rate', 'successRate'],
'errorRate': ['overall_error_rate', 'overallErrorRate', 'error_rate', 'errorRate'],
'p95Latency': ['p95_latency_ms', 'p95LatencyMs', 'p95_latency'],
'p99Latency': ['p99_latency_ms', 'p99LatencyMs', 'p99_latency'],
}

const possibleNames = metricMappings[metricName] || [metricName]

for (const name of possibleNames) {
if (metrics[name] !== undefined && metrics[name] !== null) {
return Number(metrics[name])
}
}

// Also check top-level fields (for HistoricRun objects used as baselines)
for (const name of possibleNames) {
if (baseline[name] !== undefined && baseline[name] !== null) {
return Number(baseline[name])
}
}

return defaultValue
}

Copilot AI Dec 11, 2025


The extensive helper functions and comments (lines 44-85) are needed to handle inconsistent API responses with both camelCase and snake_case field names. The TODO comment on line 47 correctly identifies this as an API design issue that should be fixed. This workaround adds significant complexity and maintenance burden. The backend API should standardize on one naming convention to eliminate this complexity.

Suggested change
// Helper to safely get fields from baselines that might be HistoricRun or Baseline objects
// NOTE: These helpers handle both camelCase and snake_case field names due to API response
// inconsistencies. This is a defensive workaround that masks an underlying API design issue.
// TODO: The backend API should be standardized to use a consistent naming convention
// (either always camelCase for JavaScript/TypeScript frontends or always snake_case throughout)
// to eliminate this complexity and potential for bugs. Once standardized, these helper
// functions can be removed and replaced with direct field access.
function getBaselineField(baseline: any, camelCase: string, snakeCase: string, defaultValue = ''): any {
return baseline[camelCase] ?? baseline[snakeCase] ?? defaultValue
}
// Helper to get metrics from baseline_metrics nested object
function getBaselineMetric(baseline: any, metricName: string, defaultValue: number = 0): number {
// Try baseline_metrics (snake_case from API)
const metrics = baseline.baseline_metrics ?? baseline.baselineMetrics ?? {}
// Map common metric names to their possible field names
const metricMappings: Record<string, string[]> = {
'avgLatency': ['avg_latency_ms', 'avgLatencyMs', 'avg_latency'],
'successRate': ['success_rate', 'successRate'],
'errorRate': ['overall_error_rate', 'overallErrorRate', 'error_rate', 'errorRate'],
'p95Latency': ['p95_latency_ms', 'p95LatencyMs', 'p95_latency'],
'p99Latency': ['p99_latency_ms', 'p99LatencyMs', 'p99_latency'],
}
const possibleNames = metricMappings[metricName] || [metricName]
for (const name of possibleNames) {
if (metrics[name] !== undefined && metrics[name] !== null) {
return Number(metrics[name])
}
}
// Also check top-level fields (for HistoricRun objects used as baselines)
for (const name of possibleNames) {
if (baseline[name] !== undefined && baseline[name] !== null) {
return Number(baseline[name])
}
}
return defaultValue
}

Comment on lines +213 to +214
method.ErrorCount += failedCount
method.SuccessCount = method.Count - method.ErrorCount

Copilot AI Dec 11, 2025


The calculation uses method.Count to derive failedCount, but then recalculates SuccessCount as "Count - ErrorCount". This assumes that method.Count represents total requests. However, if method.Count was previously calculated excluding certain errors or if there's any filtering, this could lead to incorrect error counting. The comment on line 205-206 acknowledges this concern but doesn't resolve it. Consider adding validation or documentation about what method.Count should represent.

Comment on lines +217 to +219
fmt.Printf("Warning: Success count became negative for method %s (total: %d, errors: %d)\n",
methodName, method.Count, method.ErrorCount)
method.SuccessCount = 0

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning message is printed directly to stdout using fmt.Printf instead of using a proper logging mechanism. This is inconsistent with error logging patterns used elsewhere in the codebase and makes it harder to control log levels or redirect output.

Comment on lines +110 to +115
// Log the error for debugging but also propagate it so the user knows aggregation failed
console.error('Failed to aggregate method metrics:', aggError)
// Set an empty array so the UI can handle the missing data gracefully
(run as any).method_metrics = []
// Re-throw to ensure the error is not silently ignored
throw new Error(`Method metrics aggregation failed: ${aggError instanceof Error ? aggError.message : String(aggError)}`)

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is re-thrown after setting an empty array for method_metrics, which means the empty array set on line 113 will never be used because the exception propagates. Either remove the re-throw to allow graceful degradation with empty data, or remove the empty array assignment since it won't be used. The current code creates confusion about the intended error handling strategy.

Suggested change
// Log the error for debugging but also propagate it so the user knows aggregation failed
console.error('Failed to aggregate method metrics:', aggError)
// Set an empty array so the UI can handle the missing data gracefully
(run as any).method_metrics = []
// Re-throw to ensure the error is not silently ignored
throw new Error(`Method metrics aggregation failed: ${aggError instanceof Error ? aggError.message : String(aggError)}`)
// Log the error for debugging but allow graceful degradation with empty data
console.error('Failed to aggregate method metrics:', aggError)
// Set an empty array so the UI can handle the missing data gracefully
(run as any).method_metrics = []
// Do not re-throw; allow the function to continue and return partial data

Comment on lines +74 to +84
d.log.WithError(err).Error("Failed to marshal clients JSON")
return err
}
methodsJSON, err := json.Marshal(run.Methods)
if err != nil {
d.log.WithError(err).Error("Failed to marshal methods JSON")
return err
}
tagsJSON, err := json.Marshal(run.Tags)
if err != nil {
d.log.WithError(err).Error("Failed to marshal tags JSON")

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling added for JSON marshalling operations, but the errors are logged and then returned, which may result in duplicate error logging if the caller also logs the error. Consider whether both logging and returning is necessary, or if returning the error alone is sufficient for the caller to handle appropriately.

Suggested change
d.log.WithError(err).Error("Failed to marshal clients JSON")
return err
}
methodsJSON, err := json.Marshal(run.Methods)
if err != nil {
d.log.WithError(err).Error("Failed to marshal methods JSON")
return err
}
tagsJSON, err := json.Marshal(run.Tags)
if err != nil {
d.log.WithError(err).Error("Failed to marshal tags JSON")
return err
}
methodsJSON, err := json.Marshal(run.Methods)
if err != nil {
return err
}
tagsJSON, err := json.Marshal(run.Tags)
if err != nil {

@cbermudez97
Contributor

cbermudez97 commented Dec 15, 2025

@MysticRyuujin PR LGTM in general. I have approved but I see some Copilot comments that you might want to check. If they are okay then let me know to merge. 👍

@cbermudez97
Contributor

About the points:

  1. Baseline comparisons were there when I started making changes, so I'm not very familiar with them. However, IIRC they were not working, so any improvements are welcome. The idea is to have a summary for each benchmark run so we can compare the performance of different versions/clients.

  2. Yes, I think that's correct, and the connection pool metric should be removed.

  3. I would just take a look at the DB ones and the other security suggestions for the queries. Besides those, I think the rest are not critical. However, maybe it's just better to address them now instead of leaving them there forever. 😄
