SYRAN (SYmbolic Regression for unsupervised ANomaly detection) learns an ensemble of human-readable equations that describe symbolic invariants: functions that are approximately constant on normal data.

KDD-OpenSource/SYRAN

SYRAN – SYmbolic Regression for unsupervised ANomaly detection

File Overview

  • Model internals and objectives: syran_model.py
  • Training loops and experiment logic: syran_training.py
  • Evaluation: syran_evaluation.py
  • Experiment entry points: run_benchmark.py, run_toy_kepler.py

Installation and dependencies

This project is largely self-contained but relies on a few standard Python libraries. We recommend creating a dedicated conda environment from the dependencies listed in environment.yaml:

conda env create -f environment.yaml
conda activate SYRAN

Specifically, you need:

  • Python 3.10+
  • numpy
  • scikit-learn
  • tqdm (progress bars; used inside phase_search if applicable)

Data format

Anomaly detection benchmark

Each dataset is stored in data/<dataset>.npz with keys:

  • x: training data, shape (n_train, n_features)
  • tx: test data, shape (n_test, n_features)
  • ty: binary test labels, shape (n_test,)
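
The layout above can be checked before launching a run. The following is a minimal sketch, not part of the repository: the helper name load_benchmark and the synthetic demo file are assumptions made for illustration, and only the keys x, tx and ty and their shapes come from the description above.

```python
import os
import tempfile

import numpy as np


def load_benchmark(path):
    """Load x, tx, ty from an .npz file and check the documented shapes.

    Hypothetical helper for illustration; not a function from the SYRAN code.
    """
    with np.load(path) as archive:
        x, tx, ty = archive["x"], archive["tx"], archive["ty"]
    if x.ndim != 2 or tx.ndim != 2:
        raise ValueError("x and tx must be 2-D arrays (n_samples, n_features)")
    if x.shape[1] != tx.shape[1]:
        raise ValueError("x and tx must have the same number of features")
    if ty.shape != (tx.shape[0],):
        raise ValueError("ty must have shape (n_test,)")
    if not np.isin(ty, (0, 1)).all():
        raise ValueError("ty must contain only binary labels (0 or 1)")
    return x, tx, ty


# Build a small synthetic file in the same layout (made-up data, demo only).
rng = np.random.default_rng(0)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(
    demo_path,
    x=rng.normal(size=(100, 4)),
    tx=rng.normal(size=(30, 4)),
    ty=rng.integers(0, 2, size=30),
)

x, tx, ty = load_benchmark(demo_path)
print(x.shape, tx.shape, ty.shape)
```

To check a real dataset, the same helper could be pointed at a file under data/, for example data/APima.npz.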

Kepler toy example

The toy data is stored in toy_data/exoplanet_data.npz with key:

  • data: full dataset, shape (n_samples, 2), columns T and a.
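
To illustrate what a symbolic invariant looks like on data of this shape, here is a hedged sketch. It assumes that T is the orbital period and a the semi-major axis, so that Kepler's third law makes T²/a³ approximately constant. The data is synthetic, the invariant is written by hand rather than learned, and the scoring rule (distance from the median) is an illustrative choice, not necessarily what SYRAN computes.

```python
import numpy as np

# Synthetic stand-in for the (n_samples, 2) array with columns T and a.
# Units are assumed to be years and astronomical units, so T**2 / a**3 is near 1.
rng = np.random.default_rng(0)
a = rng.uniform(0.05, 5.0, size=200)
T = a ** 1.5 * (1.0 + rng.normal(0.0, 0.01, size=200))  # 1% measurement noise
data = np.column_stack([T, a])

# Inject one anomaly: a period three times too long for its orbit.
data[0, 0] *= 3.0

# Hand-written candidate invariant g(T, a) = T^2 / a^3.
g = data[:, 0] ** 2 / data[:, 1] ** 3

# Score each sample by how far the invariant is from its typical value.
center = np.median(g)
score = np.abs(g - center)

print("typical invariant value:", round(float(center), 3))
print("most anomalous sample index:", int(np.argmax(score)))
```

Samples that obey the law produce nearly the same value of g, while the injected sample stands out because its invariant value is roughly nine times larger.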

Running experiments

Examples:

1. Anomaly detection benchmark (single dataset)

python run_benchmark.py \
  --dataset APima \
  --data_root data \
  --output_root results \
  --complexity_weight 0.1 \
  --loss_bound 1.0 \
  --chunk_size 2 \
  --num_chunks 20 \
  --max_phase_iterations 30000 \
  --seed 42
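
To repeat this run over several datasets or seeds, the command can be assembled programmatically. The sketch below only builds and prints command lines from the flags shown above. The helper build_benchmark_command and the second dataset name are hypothetical, and the subprocess call is left commented out because it has to be executed from inside the repository.

```python
import shlex
import sys


def build_benchmark_command(dataset, seed=42, data_root="data", output_root="results"):
    """Return the argv list for one run_benchmark.py call.

    Hypothetical helper; flag names and values mirror the example above.
    """
    return [
        sys.executable, "run_benchmark.py",
        "--dataset", dataset,
        "--data_root", data_root,
        "--output_root", output_root,
        "--complexity_weight", "0.1",
        "--loss_bound", "1.0",
        "--chunk_size", "2",
        "--num_chunks", "20",
        "--max_phase_iterations", "30000",
        "--seed", str(seed),
    ]


# "SomeOtherDataset" is a placeholder; use names that exist under data/.
datasets = ["APima", "SomeOtherDataset"]
seeds = [42, 43]

commands = [build_benchmark_command(d, seed=s) for d in datasets for s in seeds]
for cmd in commands:
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # run from the repo root
```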

2. Kepler toy example

python run_toy_kepler.py \
  --data_path toy_data/exoplanet_data.npz \
  --output_root kepler_results \
  --complexity_weight 0.1 \
  --chunk_size 2 \
  --num_chunks 50 \
  --max_phase_iterations 100 \
  --seed 42
