This project, developed for the Optimization for Data Science course at the University of Padua, addresses the Maximum Clique Problem (MCP) using first-order optimization algorithms.
The MCP is a well-known NP-hard problem that involves finding the largest complete subgraph (a clique) within a given undirected graph. To overcome the combinatorial nature of the problem, it has been reformulated as a continuous optimization problem.
The approach is based on the Motzkin-Straus formulation, which links the clique problem to a quadratic program. To ensure that the local maxima of the objective function correspond to actual maximal cliques, a regularization term has been added.
The goal is to solve the following optimization problem:

$$\max_{x \in \Delta} \; x^\top A x + \Phi(x)$$

where:
- $A$ is the adjacency matrix of the graph.
- $x$ is the vector of optimization variables.
- $\Delta$ is the standard simplex: $\Delta := \{x \in \mathbb{R}^n \mid \sum_i x_i = 1,\; x_i \geq 0\}$.
- $\Phi(x)$ is the regularization function.
Two different regularization functions have been implemented and compared:
- $L_2$ Regularization: $\Phi_{l2}(x) = \frac{\alpha}{2} \|x\|_2^2$
- $L_0$ Approximation: $\Phi_{l0}(x) = \alpha \sum_{i=1}^n (\exp(-\beta x_i) - 1)$
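As a concrete reference, the full objective $x^\top A x + \Phi(x)$ and its gradient can be sketched as below. The function names and the default values of `alpha` and `beta` are illustrative, not the ones used in the notebook:

```python
import numpy as np

def clique_objective(x, A, phi="l2", alpha=1.0, beta=10.0):
    """Regularized Motzkin-Straus objective f(x) = x^T A x + Phi(x)."""
    if phi == "l2":
        reg = 0.5 * alpha * np.dot(x, x)          # (alpha/2) * ||x||_2^2
    else:                                          # smooth L0 approximation
        reg = alpha * np.sum(np.exp(-beta * x) - 1.0)
    return x @ A @ x + reg

def clique_gradient(x, A, phi="l2", alpha=1.0, beta=10.0):
    """Gradient of f: 2*A*x plus the regularizer's gradient."""
    if phi == "l2":
        g_reg = alpha * x
    else:
        g_reg = -alpha * beta * np.exp(-beta * x)  # d/dx of alpha*(exp(-beta*x)-1)
    return 2.0 * A @ x + g_reg
```

For the triangle graph $K_3$ and the barycenter $x = (1/3, 1/3, 1/3)$, the quadratic term equals $2/3$, consistent with the Motzkin-Straus value $1 - 1/\omega(G)$ for clique number $\omega = 3$.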
```
.
├── .gitignore
├── README.md
├── final_project.ipynb
├── requirements.txt
└── results.csv
```
- `final_project.ipynb`: The Jupyter Notebook containing the full code implementation, experiments, and results visualization.
- `requirements.txt`: The Python dependencies required to run the project.
- `results.csv`: The raw data obtained from the experiments.
Ensure you have Python 3.x installed.
- Clone the repository:

  ```
  git clone https://github.com/your-username/repository-name.git
  cd Find-a-maximal-clique-with-optimization-algorithm
  ```

- Create and activate a virtual environment (recommended):

  ```
  python -m venv venv
  # On macOS/Linux:
  source venv/bin/activate
  # On Windows:
  .\venv\Scripts\activate
  ```

- Install the dependencies:

  ```
  pip install -r requirements.txt
  ```
The entire project can be executed through the Jupyter Notebook `final_project.ipynb`.
- Start Jupyter Lab or Jupyter Notebook from the project's root directory:

  ```
  jupyter lab
  ```
- Open the `final_project.ipynb` file.
- Run the cells in order to reproduce the entire workflow:
- Loading libraries and utility functions.
- Defining the algorithms and step-size strategies.
- Running experiments on benchmark graphs.
- Generating analysis tables and plots.
Four first-order optimization algorithms were implemented, each tested with three different step-size strategies.
- Projected Gradient Descent (PGD): Performs a gradient step followed by a projection onto the simplex.
- Frank-Wolfe (FW): A projection-free method that optimizes a linear approximation of the objective function.
- Pairwise Frank-Wolfe (PFW): A variant that moves "mass" between the best and worst vertices, accelerating convergence.
- Away-step Frank-Wolfe (AFW): Improves on standard FW by dynamically choosing between the standard FW direction and an "away" direction that removes weight from unpromising vertices.
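The key subroutine behind PGD is the Euclidean projection onto the simplex $\Delta$. A minimal sketch using the standard sort-based $O(n \log n)$ construction is shown below; the function names and the fixed step-size value are illustrative, not taken from the notebook:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : sum(x) = 1, x >= 0} (sort-based)."""
    n = len(v)
    u = np.sort(v)[::-1]                  # sort in decreasing order
    css = np.cumsum(u)
    # Largest index rho with u_rho + (1 - sum_{i<=rho} u_i)/rho > 0
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)  # shift that makes the sum equal 1
    return np.maximum(v + theta, 0.0)

def pgd_step(x, A, step=0.01):
    """One PGD ascent step on x^T A x (gradient 2*A*x), then project onto the simplex."""
    return project_simplex(x + step * (2.0 * A @ x))
```

Frank-Wolfe methods avoid this projection entirely: on the simplex, their linear subproblem reduces to picking the coordinate with the largest gradient entry.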
- Optimal (Exact Line Search): Computes the optimal step-size at each iteration.
- Armijo: A backtracking rule that ensures sufficient progress.
- Fixed: Uses a constant, predefined step-size.
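For reference, the Armijo backtracking rule for a maximization problem can be sketched as follows; the constants `gamma`, `delta`, and `t0` are illustrative defaults, not the ones tuned in the notebook:

```python
import numpy as np

def armijo_step(x, d, f, grad, gamma=1e-4, delta=0.5, t0=1.0, max_iter=50):
    """Backtracking Armijo rule for maximizing f along direction d.

    Starts from t0 and shrinks the step by delta until the sufficient-increase
    condition f(x + t*d) >= f(x) + gamma * t * <grad(x), d> holds.
    """
    fx = f(x)
    slope = grad(x) @ d   # directional derivative; positive for an ascent direction
    t = t0
    for _ in range(max_iter):
        if f(x + t * d) >= fx + gamma * t * slope:
            return t
        t *= delta
    return t
```

The fixed strategy simply skips this loop, which is why it is the cheapest per iteration; exact line search instead maximizes $f(x + t d)$ in $t$ analytically, which for a quadratic objective means solving a 1-D quadratic at every step.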
The algorithms were evaluated on benchmark graphs from the DIMACS challenge. The analysis yielded the following insights:
- Best Performance: The combination of Projected Gradient Descent (PGD) with $L_0$ regularization and a fixed step-size proved to be the most effective, finding the largest cliques on average.
- Impact of Regularization: The $L_0$ regularization showed a clear superiority, better guiding the algorithms toward sparse solutions that represent cliques.
- Computational Efficiency: Fixed step-size strategies were the fastest. Exact line search, while theoretically powerful, was too slow for practical use with PGD.
- Robustness: The Frank-Wolfe algorithms (FW and AFW) proved to be more stable and less sensitive to the choice of step-size compared to PGD and PFW.
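Since a sparse local solution places its mass on the vertices of a clique (the characteristic vector of a clique of size $k$ assigns $1/k$ to each member), a candidate clique can be read off from the support of $x$. A minimal sketch is shown below; the threshold `tol` and the function name are illustrative assumptions, and the notebook may use a different extraction rule:

```python
import numpy as np

def extract_clique(x, A, tol=1e-6):
    """Read off a candidate clique as the support of the (sparse) solution x.

    Returns the support indices and whether they actually induce a clique in A.
    """
    support = np.flatnonzero(x > tol)
    sub = A[np.ix_(support, support)]
    # A clique of size k has exactly k*(k-1) off-diagonal ones in its submatrix.
    is_clique = sub.sum() == len(support) * (len(support) - 1)
    return support, bool(is_clique)
```

A verification step like this is useful in practice because an algorithm stopped early may return a dense iterate whose support is not yet a clique.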
In conclusion, the project highlighted the strong synergy between the PGD algorithm and a regularization function that accurately models the combinatorial nature of the problem, demonstrating the effectiveness of continuous optimization for solving the maximum clique problem.