Commit 0cbef03

Pushing the docs to dev/ for branch: main, commit 0b5f812a45eef59beb855c18f13ec46aa53be486
1 parent 246ede4 commit 0cbef03

1,612 files changed, +7146 -8706 lines changed

dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 0f61c97b5bf4f7dca35f96af020ec233
+config: 8450bde4b8617db003deeb680cba6fea
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.

dev/_downloads/7996e584c563a930d174772f44af2089/plot_validation_curve.ipynb

Lines changed: 0 additions & 43 deletions
This file was deleted.
Binary file not shown.

dev/_downloads/b49810e68af99a01e25ba2dfc951b687/plot_train_error_vs_test_error.ipynb

Lines changed: 50 additions & 7 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Train error vs Test error\n\nIllustration of how the performance of an estimator on unseen data (test data)\nis not the same as the performance on training data. As the regularization\nincreases the performance on train decreases while the performance on test\nis optimal within a range of values of the regularization parameter.\nThe example with an Elastic-Net regression model and the performance is\nmeasured using the explained variance a.k.a. R^2.\n"
+"\n# Effect of model regularization on training and test error\n\nIn this example, we evaluate the impact of the regularization parameter in a\nlinear model called :class:`~sklearn.linear_model.ElasticNet`. To carry out this\nevaluation, we use a validation curve using\n:class:`~sklearn.model_selection.ValidationCurveDisplay`. This curve shows the\ntraining and test scores of the model for different values of the regularization\nparameter.\n\nOnce we identify the optimal regularization parameter, we compare the true and\nestimated coefficients of the model to determine if the model is able to recover\nthe coefficients from the noisy input data.\n"
 ]
 },
 {
@@ -22,7 +22,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Generate sample data\n\n"
+"## Generate sample data\n\nWe generate a regression dataset that contains many features relative to the\nnumber of samples. However, only 10% of the features are informative. In this context,\nlinear models exposing L1 penalization are commonly used to recover a sparse\nset of coefficients.\n\n"
 ]
 },
 {
@@ -33,14 +33,14 @@
 },
 "outputs": [],
 "source": [
-"import numpy as np\n\nfrom sklearn import linear_model\nfrom sklearn.datasets import make_regression\nfrom sklearn.model_selection import train_test_split\n\nn_samples_train, n_samples_test, n_features = 75, 150, 500\nX, y, coef = make_regression(\n    n_samples=n_samples_train + n_samples_test,\n    n_features=n_features,\n    n_informative=50,\n    shuffle=False,\n    noise=1.0,\n    coef=True,\n)\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False\n)"
+"from sklearn.datasets import make_regression\nfrom sklearn.model_selection import train_test_split\n\nn_samples_train, n_samples_test, n_features = 150, 300, 500\nX, y, true_coef = make_regression(\n    n_samples=n_samples_train + n_samples_test,\n    n_features=n_features,\n    n_informative=50,\n    shuffle=False,\n    noise=1.0,\n    coef=True,\n    random_state=42,\n)\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False\n)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Compute train and test errors\n\n"
+"## Model definition\n\nHere, we do not use a model that only exposes an L1 penalty. Instead, we use\nan :class:`~sklearn.linear_model.ElasticNet` model that exposes both L1 and L2\npenalties.\n\nWe fix the `l1_ratio` parameter such that the solution found by the model is still\nsparse. Therefore, this type of model tries to find a sparse solution but at the same\ntime also tries to shrink all coefficients towards zero.\n\nIn addition, we force the coefficients of the model to be positive since we know that\n`make_regression` generates a response with a positive signal. So we use this\npre-knowledge to get a better model.\n\n"
 ]
 },
 {
@@ -51,14 +51,14 @@
 },
 "outputs": [],
 "source": [
-"alphas = np.logspace(-5, 1, 60)\nenet = linear_model.ElasticNet(l1_ratio=0.7, max_iter=10000)\ntrain_errors = list()\ntest_errors = list()\nfor alpha in alphas:\n    enet.set_params(alpha=alpha)\n    enet.fit(X_train, y_train)\n    train_errors.append(enet.score(X_train, y_train))\n    test_errors.append(enet.score(X_test, y_test))\n\ni_alpha_optim = np.argmax(test_errors)\nalpha_optim = alphas[i_alpha_optim]\nprint(\"Optimal regularization parameter : %s\" % alpha_optim)\n\n# Estimate the coef_ on full data with optimal regularization parameter\nenet.set_params(alpha=alpha_optim)\ncoef_ = enet.fit(X, y).coef_"
+"from sklearn.linear_model import ElasticNet\n\nenet = ElasticNet(l1_ratio=0.9, positive=True, max_iter=10_000)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Plot results functions\n\n"
+"## Evaluate the impact of the regularization parameter\n\nTo evaluate the impact of the regularization parameter, we use a validation\ncurve. This curve shows the training and test scores of the model for different\nvalues of the regularization parameter.\n\nThe regularization `alpha` is a parameter applied to the coefficients of the model:\nwhen it tends to zero, no regularization is applied and the model tries to fit the\ntraining data with the least amount of error. However, it leads to overfitting when\nfeatures are noisy. When `alpha` increases, the model coefficients are constrained,\nand thus the model cannot fit the training data as closely, avoiding overfitting.\nHowever, if too much regularization is applied, the model underfits the data and\nis not able to properly capture the signal.\n\nThe validation curve helps in finding a good trade-off between both extremes: the\nmodel is not regularized and thus flexible enough to fit the signal, but not too\nflexible to overfit. The :class:`~sklearn.model_selection.ValidationCurveDisplay`\nallows us to display the training and validation scores across a range of alpha\nvalues.\n\n"
 ]
 },
 {
@@ -69,7 +69,50 @@
 },
 "outputs": [],
 "source": [
-"import matplotlib.pyplot as plt\n\nplt.subplot(2, 1, 1)\nplt.semilogx(alphas, train_errors, label=\"Train\")\nplt.semilogx(alphas, test_errors, label=\"Test\")\nplt.vlines(\n    alpha_optim,\n    plt.ylim()[0],\n    np.max(test_errors),\n    color=\"k\",\n    linewidth=3,\n    label=\"Optimum on test\",\n)\nplt.legend(loc=\"lower right\")\nplt.ylim([0, 1.2])\nplt.xlabel(\"Regularization parameter\")\nplt.ylabel(\"Performance\")\n\n# Show estimated coef_ vs true coef\nplt.subplot(2, 1, 2)\nplt.plot(coef, label=\"True coef\")\nplt.plot(coef_, label=\"Estimated coef\")\nplt.legend()\nplt.subplots_adjust(0.09, 0.04, 0.94, 0.94, 0.26, 0.26)\nplt.show()"
+"import numpy as np\n\nfrom sklearn.model_selection import ValidationCurveDisplay\n\nalphas = np.logspace(-5, 1, 60)\ndisp = ValidationCurveDisplay.from_estimator(\n    enet,\n    X_train,\n    y_train,\n    param_name=\"alpha\",\n    param_range=alphas,\n    scoring=\"r2\",\n    n_jobs=2,\n    score_type=\"both\",\n)\ndisp.ax_.set(\n    title=r\"Validation Curve for ElasticNet (R$^2$ Score)\",\n    xlabel=r\"alpha (regularization strength)\",\n    ylabel=\"R$^2$ Score\",\n)\n\ntest_scores_mean = disp.test_scores.mean(axis=1)\nidx_avg_max_test_score = np.argmax(test_scores_mean)\ndisp.ax_.vlines(\n    alphas[idx_avg_max_test_score],\n    disp.ax_.get_ylim()[0],\n    test_scores_mean[idx_avg_max_test_score],\n    color=\"k\",\n    linewidth=2,\n    linestyle=\"--\",\n    label=f\"Optimum on test\\n$\\\\alpha$ = {alphas[idx_avg_max_test_score]:.2e}\",\n)\n_ = disp.ax_.legend(loc=\"lower right\")"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"To find the optimal regularization parameter, we can select the value of `alpha`\nthat maximizes the validation score.\n\n## Coefficients comparison\n\nNow that we have identified the optimal regularization parameter, we can compare the\ntrue coefficients and the estimated coefficients.\n\nFirst, let's set the regularization parameter to the optimal value and fit the\nmodel on the training data. In addition, we'll show the test score for this model.\n\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"enet.set_params(alpha=alphas[idx_avg_max_test_score]).fit(X_train, y_train)\nprint(\n    f\"Test score: {enet.score(X_test, y_test):.3f}\",\n)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Now, we plot the true coefficients and the estimated coefficients.\n\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"import matplotlib.pyplot as plt\n\nfig, axs = plt.subplots(ncols=2, figsize=(12, 6), sharex=True, sharey=True)\nfor ax, coef, title in zip(axs, [true_coef, enet.coef_], [\"True\", \"Model\"]):\n    ax.stem(coef)\n    ax.set(\n        title=f\"{title} Coefficients\",\n        xlabel=\"Feature Index\",\n        ylabel=\"Coefficient Value\",\n    )\nfig.suptitle(\n    \"Comparison of the coefficients of the true generative model and \\n\"\n    \"the estimated elastic net coefficients\"\n)\n\nplt.show()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"While the original coefficients are sparse, the estimated coefficients are not\nas sparse. The reason is that we fixed the `l1_ratio` parameter to 0.9. We could\nforce the model to get a sparser solution by increasing the `l1_ratio` parameter.\n\nHowever, we observed that for the estimated coefficients that are close to zero in\nthe true generative model, our model shrinks them towards zero. So we don't recover\nthe true coefficients, but we get a sensible outcome in line with the performance\nobtained on the test set.\n\n"
 ]
 }
 ],
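
Reviewer note (not part of this commit): the new "Model definition" cell describes `ElasticNet` as mixing an L1 and an L2 penalty through `l1_ratio`, scaled by `alpha`. The pure-Python sketch below evaluates the three terms of the objective as given in the scikit-learn `ElasticNet` documentation, ignoring the intercept. The function name `elastic_net_terms` and the toy data are hypothetical and exist only for illustration.

```python
def elastic_net_terms(X, y, w, alpha, l1_ratio):
    """Return (data_fit, l1_penalty, l2_penalty) for a linear model without intercept.

    Follows the objective documented for sklearn.linear_model.ElasticNet:
        1 / (2 * n_samples) * ||y - Xw||^2_2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
    This is an illustrative helper, not scikit-learn code.
    """
    n_samples = len(y)
    # Residual of each sample: y_i - <x_i, w>
    residuals = [
        y_i - sum(x_ij * w_j for x_ij, w_j in zip(x_i, w))
        for x_i, y_i in zip(X, y)
    ]
    data_fit = sum(r * r for r in residuals) / (2 * n_samples)
    l1_penalty = alpha * l1_ratio * sum(abs(w_j) for w_j in w)
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * sum(w_j * w_j for w_j in w)
    return data_fit, l1_penalty, l2_penalty


# Toy data (hypothetical): w reproduces y exactly, so only the penalties remain.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [1.0, 2.0]
w_toy = [1.0, 2.0]

for ratio in (0.7, 0.9, 1.0):
    fit, l1, l2 = elastic_net_terms(X_toy, y_toy, w_toy, alpha=1.0, l1_ratio=ratio)
    print(f"l1_ratio={ratio}: data_fit={fit:.3f}, L1={l1:.3f}, L2={l2:.3f}")
```

Raising `l1_ratio` from 0.7 (old example) to 0.9 (new example) shifts weight from the L2 term to the L1 term, which is the term that drives coefficients exactly to zero; at `l1_ratio=1.0` the L2 term vanishes and the penalty is that of a Lasso.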
Binary file not shown.

dev/_downloads/d7ef5ff0bffa701d573ebc3ef124729a/plot_validation_curve.py

Lines changed: 0 additions & 43 deletions
This file was deleted.
Binary file not shown.
