⚙️ dashboard for observability + rerun option #10

Open · wants to merge 21 commits into main

Conversation

baptistecolle (Collaborator)

Changes

  • Added capability to rerun individual benchmarks

    • Modularized backend architecture to support selective benchmark reruns
    • Addresses cases where benchmarks produced erroneous results due to outdated dependencies
  • Implemented an observability dashboard for the LLM Performance Leaderboard (see the sketch after this list)

    • Provides monitoring of benchmark execution status
    • Tracks failed benchmark configurations to facilitate debugging
    • Enhances visibility into the overall health of the leaderboard system
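The description stays at the feature level, so here is a minimal, hypothetical Python/Gradio sketch of what a status dashboard with a per-benchmark rerun hook could look like. All names (BenchmarkRun, load_runs, trigger_rerun) and the sample rows are illustrative assumptions, not code from this PR.

```python
# Illustrative sketch only: a tiny Gradio app that lists benchmark runs,
# shows their status, and exposes a rerun control. Names and data are
# hypothetical, not taken from the LLM Performance Leaderboard codebase.
from dataclasses import dataclass

import gradio as gr


@dataclass
class BenchmarkRun:
    model: str
    backend: str
    status: str  # "success", "failed", or "running"


def load_runs() -> list[BenchmarkRun]:
    # Placeholder data so the sketch is self-contained; the real dashboard
    # would presumably read benchmark artifacts from the Hub instead.
    return [
        BenchmarkRun("meta-llama/Llama-2-7b-hf", "pytorch", "success"),
        BenchmarkRun("mistralai/Mistral-7B-v0.1", "pytorch", "failed"),
    ]


def runs_table() -> list[list[str]]:
    # Flatten runs into rows for the dataframe component.
    return [[r.model, r.backend, r.status] for r in load_runs()]


def trigger_rerun(model: str) -> str:
    # Placeholder: a real implementation would dispatch the benchmark
    # workflow for just this configuration (e.g. via the CI system).
    return f"Rerun requested for {model}"


with gr.Blocks() as dashboard:
    gr.Markdown("## Benchmark observability (sketch)")
    gr.Dataframe(value=runs_table(), headers=["model", "backend", "status"])
    model_box = gr.Textbox(label="Model to rerun")
    status_box = gr.Textbox(label="Result")
    gr.Button("Rerun benchmark").click(trigger_rerun, inputs=model_box, outputs=status_box)

if __name__ == "__main__":
    dashboard.launch()
```

In this sketch the rerun button only reports the request; the actual PR presumably wires selective reruns into the modularized backend so that a single failed configuration can be re-executed without rerunning the full suite.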

Motivation

These changes improve the maintainability and reliability of the LLM Performance Leaderboard by enabling operators to:

  1. Quickly identify and rerun problematic benchmarks (previously, stale results would simply remain in the leaderboard)
  2. Monitor benchmark execution status through a centralized dashboard
  3. Better understand failed configurations in order to identify the root cause of a failure

baptistecolle added the all_benchmarks ([CI] Requires and enables running all benchmark workflows) label on Feb 6, 2025