    Repositories list

    • Python
      0000 Updated Jan 25, 2026
    • Python
      0410 Updated Jan 19, 2026
    • cve-bench

      Public
      CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
      Python
      2914201 Updated Jan 14, 2026
    • PilotDB

      Public
      Online AQP with A Priori Error Guarantees
      Python
      1500 Updated Jan 12, 2026
    • SkyRL

      Public
      SkyRL: A Modular Full-stack RL Library for LLMs
      Python
      240100 Updated Dec 7, 2025
    • drama

      Public
      [SIGMOD'2026] DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries
      Python
      01110 Updated Dec 6, 2025
    • ELT-Bench

      Public
      Python
      62330 Updated Dec 5, 2025
    • TypeScript
      0000 Updated Nov 30, 2025
    • Post-training with Tinker
      Python
      308100 Updated Nov 29, 2025
    • Collection of evals for Inspect AI
      C++
      237000 Updated Nov 17, 2025
    • SWE-bench

      Public
      SWE-bench: Can Language Models Resolve Real-world Github Issues?
      Python
      745000 Updated Nov 14, 2025
    • zk-torch

      Public
      Rust
      83631 Updated Nov 7, 2025
    • BEAT

      Public
      Python
      1410 Updated Nov 3, 2025
    • Python
      2600 Updated Nov 3, 2025
    • leap

      Public
      [VLDB'2025] LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data
      Python
      01900 Updated Nov 3, 2025
    • Java
      0000 Updated Aug 27, 2025
    • Python
      0000 Updated Aug 24, 2025
    • Cuda
      64900 Updated Jul 31, 2025
    • Jupyter Notebook
      0000 Updated Jul 20, 2025
    • UTBoost

      Public
      Python
      0000 Updated Jul 14, 2025
    • tau-bench

      Public
      Code and Data for Tau-Bench
      Python
      180000 Updated Jun 12, 2025
    • rllm

      Public
      Democratizing Reinforcement Learning for LLMs
      Python
      492000 Updated May 10, 2025
    • aider

      Public
      aider is AI pair programming in your terminal
      Python
      3.9k000 Updated May 6, 2025
    • Async pipelined version of Verl
      Python
      13000 Updated Apr 28, 2025
    • AutoGPT

      Public
      AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
      Python
      46k000 Updated Apr 23, 2025
    • Open source replication of Anthropic's Crosscoders for Model Diffing
      Jupyter Notebook
      25000 Updated Apr 21, 2025
    • This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
      Python
      140000 Updated Apr 10, 2025
    • An open science effort to benchmark legal reasoning in foundation models
      Python
      81000 Updated Apr 5, 2025
    • Sky-T1: Train your own O1 preview model within $450
      Python
      345000 Updated Apr 3, 2025
    • lex-glue

      Public
      LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
      Python
      43000 Updated Apr 3, 2025