This code accompanies the paper:

Andrei Semenov, Martin Jaggi, Nikita Doikov. *Gradient-Normalized Smoothness for Optimization with Approximate Hessians*. June 2025.
Abstract:
In this work, we develop new optimization algorithms that use approximate second-order information combined with the gradient regularization technique to achieve fast global convergence rates for both convex and non-convex objectives. The key innovation of our analysis is a novel notion called Gradient-Normalized Smoothness, which characterizes the maximum radius of a ball around the current point that yields a good relative approximation of the gradient field. Our theory establishes a natural intrinsic connection between Hessian approximation and the linearization of the gradient. Importantly, Gradient-Normalized Smoothness does not depend on the specific problem class of the objective functions, while effectively translating local information about the gradient field and Hessian approximation into the global behavior of the method. This new concept equips approximate second-order algorithms with universal global convergence guarantees, recovering state-of-the-art rates for functions with Hölder-continuous Hessians and third derivatives, quasi-self-concordant functions, as well as smooth classes in first-order optimization. These rates are achieved automatically and extend to broader classes, such as generalized self-concordant functions. We demonstrate direct applications of our results for global linear rates in logistic regression and softmax problems with approximate Hessians, as well as in non-convex optimization using Fisher and Gauss-Newton approximations.
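At a high level, each step of such a method combines an approximate Hessian with a regularizer driven by the current gradient norm. The snippet below is only a minimal illustration of this idea, not Algorithm 1 from the paper (which additionally runs an adaptive search over the regularization parameter); the function name and the square-root schedule are assumptions made for this sketch.

```python
import numpy as np

def grad_regularized_step(x, grad_fn, approx_hess_fn, reg_const=1.0):
    """One gradient-regularized step with an approximate Hessian.

    Illustrative sketch only; the adaptive scheme lives in src/methods.py.
    """
    g = grad_fn(x)                                 # gradient at the current point
    A = approx_hess_fn(x)                          # approximate Hessian (Gauss-Newton, Fisher, ...)
    lam = reg_const * np.sqrt(np.linalg.norm(g))   # regularizer scales with the gradient norm
    return x - np.linalg.solve(A + lam * np.eye(len(x)), g)
```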
src/
  methods.py          # Algorithm 1 from the paper, algorithms with other adaptive search schemes, gradient methods
  oracles.py          # LogSumExp, Nonlinear Equations with a linear operator and Chebyshev polynomials, Rosenbrock function, etc.
  approximations.py   # Hessian approximations for the different oracles
  utils.py            # plotting utilities
data/
  mushrooms.txt       # an example dataset; you can add more here
notebooks/
  examples.ipynb      # examples of approximations and comparison of the methods

Simply run the `examples.ipynb` notebook.
At the beginning of the notebook, we provide practical Hessian approximations for each oracle; all of them are compatible with our theory. In particular, we investigate the following approximations:
| Problem | Naming in the paper | Code reference in `src/approximations.py` |
|---|---|---|
| LogSumExp | Weighted Gauss-Newton | `approx_hess_fn_logsumexp` |
| Equations with a linear operator | Fisher Term | `approx_hess_fn_fisher_term` |
| Nonlinear Equations & Rosenbrock | Inexact Hessian | `approx_hess_nonlinear_equations` |
| Nonlinear Equations & Chebyshev polynomials | Inexact Hessian | `approx_hess_fn_chebyshev` |
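To give a concrete feel for this kind of approximation, here is a self-contained numpy sketch for the LogSumExp objective f(x) = log(sum_i exp(a_i^T x - b_i)). Its exact Hessian is A^T (diag(p) - p p^T) A, where p are the softmax weights; a Gauss-Newton-style approximation keeps only the positive semidefinite weighted part A^T diag(p) A. The exact weighting used inside `approx_hess_fn_logsumexp` may differ; treat this as an illustration rather than the repository code.

```python
import numpy as np

def logsumexp_grad_and_approx_hess(A, b, x):
    """Gradient and a Gauss-Newton-style approximate Hessian of
    f(x) = log(sum_i exp(a_i^T x - b_i)).

    Illustrative only; the weighting in src/approximations.py may differ.
    """
    z = A @ x - b
    z -= z.max()                          # shift for numerical stability
    p = np.exp(z)
    p /= p.sum()                          # softmax weights
    grad = A.T @ p                        # exact gradient: A^T p
    hess_approx = A.T @ (p[:, None] * A)  # A^T diag(p) A, drops the rank-one -p p^T term
    return grad, hess_approx
```

Dropping the rank-one term keeps the matrix positive semidefinite, which is the typical structure behind the approximations listed in the table.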
You can also use a fast implementation of our algorithm, provided by the `grad_norm_smooth_for_rank_one` function in `examples.py`. Running the notebook then reproduces the example plots for these problems.
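As an aside on why a rank-one specialization can be fast: if the approximate Hessian has the form diag(d) + u u^T, the regularized Newton system can be solved in O(n) with the Sherman-Morrison formula. This is only an assumption about the implementation, made for illustration; see the actual code for the details.

```python
import numpy as np

def solve_rank_one_regularized(d, u, lam, g):
    """Solve (diag(d) + u u^T + lam * I) x = g in O(n) via Sherman-Morrison.

    A guess at why the rank-one variant is fast; the real implementation may differ.
    """
    denom = d + lam                         # diagonal of the regularized matrix
    y = g / denom                           # (diag(d) + lam I)^{-1} g
    w = u / denom                           # (diag(d) + lam I)^{-1} u
    return y - w * (u @ y) / (1.0 + u @ w)  # rank-one correction
```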
We believe the details provided are clear enough to reproduce the main findings of our paper.
Please do not hesitate to reach out to us if you have questions.
@article{semenov2025gradientnormalizedsmoothness,
title={Gradient-{N}ormalized {S}moothness for {O}ptimization with {A}pproximate {H}essians},
author={Semenov, Andrei and Jaggi, Martin and Doikov, Nikita},
journal={arXiv preprint arXiv:2506.13710},
url={https://arxiv.org/abs/2506.13710},
year={2025}
}
