    Repositories list

    • ML_course (Public)
      EPFL Machine Learning Course, Fall 2025
      Jupyter Notebook · Updated Nov 13, 2025
    • disco (Public)
      DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing any private data.
      TypeScript · Updated Nov 13, 2025
    • Official implementation of "Gradient-Normalized Smoothness for Optimization with Approximate Hessians"
      Jupyter Notebook · Updated Nov 9, 2025
    • Benchmarking Optimizers for LLM Pretraining
      Python · Updated Nov 9, 2025
    • nanoGPT-like codebase for LLM training
      Python · Updated Nov 7, 2025
    • CoMiGS (Public)
      Python · Updated Sep 24, 2025
    • TiMoE (Public)
      A time-aware language modeling framework
      Python · Updated Aug 31, 2025
    • Python · Updated Jul 18, 2025
    • EPFL Course - Optimization for Machine Learning - CS-439
      Jupyter Notebook · Updated Jul 8, 2025
    • Code for the paper "Enhancing Multilingual LLM Pretraining with Model-Based Data Selection"
      Python · Updated May 16, 2025
    • Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
      Python · Updated Oct 30, 2024
    • powersgd (Public)
      Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
      Python · Updated Oct 29, 2024
    • CoBo (Public)
      Python · Updated Oct 22, 2024
    • Exploration of on-device, self-supervised collaborative fine-tuning of large language models with limited local data availability, using Low-Rank Adaptation (LoRA). We introduce three distinct trust-weighted gradient aggregation schemes: weight-similarity-based, prediction-similarity-based, and validation-performance-based.
      Python · Updated Sep 2, 2024
    • SGD with compressed gradients and error-feedback: https://arxiv.org/abs/1901.09847
      Jupyter Notebook · Updated Jul 25, 2024
    • REQ (Public)
      Python · Updated Jun 10, 2024
    • CoTFormer (Public)
      Python · Updated May 22, 2024
    • Python · Updated May 22, 2024
    • Python · Updated Apr 18, 2024
    • Python · Updated Apr 16, 2024
    • DoGE (Public)
      Codebase for ICML submission "DOGE: Domain Reweighting with Generalization Estimation"
      Updated Feb 4, 2024
    • Landmark Attention: Random-Access Infinite Context Length for Transformers
      Python · Updated Dec 20, 2023
    • pam (Public)
      Python · Updated Dec 9, 2023
    • Python · Updated Aug 18, 2023
    • optML-pku (Public)
      Summer school materials
      Updated Aug 4, 2023
    • Code for "Multi-Head Attention: Collaborate Instead of Concatenate"
      Python · Updated Jun 12, 2023
    • Jupyter Notebook · Updated Jun 2, 2023
    • Difficulty-guided text summarization
      Python · Updated May 22, 2023
    • relaysgd (Public)
      Code for the paper "RelaySum for Decentralized Deep Learning on Heterogeneous Data"
      Jupyter Notebook · Updated Apr 21, 2023
    • Tools for experimenting with and using run:ai. These aim to be small, self-contained utilities shared by multiple people.
      Python · Updated Mar 16, 2023