
Commit 2bcd1a8

Boosting and pickle model completed
1 parent daab459 commit 2bcd1a8

File tree

5 files changed: 331 additions, 0 deletions

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Boosting: Hyperparameters\n",
    "\n",
    "Import [`GradientBoostingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) and [`GradientBoostingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) from `sklearn` and explore the hyperparameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Boosting Algorithm for Classification & Regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'sklearn.ensemble._gb.GradientBoostingClassifier'>\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor\n",
    "\n",
    "print(GradientBoostingClassifier)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'sklearn.ensemble._gb.GradientBoostingRegressor'>\n"
     ]
    }
   ],
   "source": [
    "print(GradientBoostingRegressor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
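
This first notebook stops at printing the two class objects. As a minimal, non-authoritative sketch of the "explore the hyperparameters" step it describes, one could list each estimator's tunable parameters with scikit-learn's get_params(); the three parameters singled out below are the ones the grid search in the second notebook actually tunes:

from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Every tunable hyperparameter and its default value
print(GradientBoostingClassifier().get_params())
print(GradientBoostingRegressor().get_params())

# The three hyperparameters tuned by the grid search in the companion notebook
clf = GradientBoostingClassifier(n_estimators=500, max_depth=3, learning_rate=0.01)
print(clf.n_estimators, clf.max_depth, clf.learning_rate)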
Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Boosting: Fit and evaluate a model\n",
    "\n",
    "Using the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.\n",
    "\n",
    "In this section, we will fit and evaluate a simple Gradient Boosting model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read in Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import joblib\n",
    "import pandas as pd\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
    "warnings.filterwarnings('ignore', category=DeprecationWarning)\n",
    "\n",
    "train_features = pd.read_csv('../Data/train_features.csv')\n",
    "train_labels = pd.read_csv('../Data/train_labels.csv', header=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Hyperparameter tuning\n",
    "\n",
    "![GB](img/gb.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_results(results):\n",
    "    print('BEST PARAMS: {}\\n'.format(results.best_params_))\n",
    "\n",
    "    means = results.cv_results_['mean_test_score']\n",
    "    stds = results.cv_results_['std_test_score']\n",
    "    for mean, std, params in zip(means, stds, results.cv_results_['params']):\n",
    "        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BEST PARAMS: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.624 (+/-0.007) for {'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 5}\n",
      "0.796 (+/-0.115) for {'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 50}\n",
      "0.796 (+/-0.115) for {'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 250}\n",
      "0.811 (+/-0.117) for {'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 500}\n",
      "0.624 (+/-0.007) for {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 5}\n",
      "0.811 (+/-0.069) for {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 50}\n",
      "0.83 (+/-0.074) for {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 250}\n",
      "0.841 (+/-0.077) for {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.624 (+/-0.007) for {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 5}\n",
      "0.822 (+/-0.052) for {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 50}\n",
      "0.818 (+/-0.043) for {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 250}\n",
      "0.828 (+/-0.047) for {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 500}\n",
      "0.624 (+/-0.007) for {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 5}\n",
      "0.817 (+/-0.049) for {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 50}\n",
      "0.822 (+/-0.039) for {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 250}\n",
      "0.8 (+/-0.028) for {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 500}\n",
      "0.624 (+/-0.007) for {'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 5}\n",
      "0.803 (+/-0.059) for {'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 50}\n",
      "0.8 (+/-0.042) for {'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 250}\n",
      "0.79 (+/-0.047) for {'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 500}\n",
      "0.796 (+/-0.115) for {'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 5}\n",
      "0.815 (+/-0.119) for {'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 50}\n",
      "0.818 (+/-0.111) for {'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 250}\n",
      "0.828 (+/-0.092) for {'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 500}\n",
      "0.813 (+/-0.071) for {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 5}\n",
      "0.841 (+/-0.07) for {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50}\n",
      "0.83 (+/-0.039) for {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 250}\n",
      "0.811 (+/-0.036) for {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.813 (+/-0.051) for {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 5}\n",
      "0.824 (+/-0.039) for {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 50}\n",
      "0.809 (+/-0.032) for {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 250}\n",
      "0.803 (+/-0.039) for {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 500}\n",
      "0.817 (+/-0.047) for {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 5}\n",
      "0.796 (+/-0.014) for {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 50}\n",
      "0.796 (+/-0.032) for {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 250}\n",
      "0.798 (+/-0.05) for {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 500}\n",
      "0.794 (+/-0.039) for {'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 5}\n",
      "0.792 (+/-0.031) for {'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 50}\n",
      "0.788 (+/-0.043) for {'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 250}\n",
      "0.794 (+/-0.053) for {'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 500}\n",
      "0.818 (+/-0.099) for {'learning_rate': 1, 'max_depth': 1, 'n_estimators': 5}\n",
      "0.832 (+/-0.081) for {'learning_rate': 1, 'max_depth': 1, 'n_estimators': 50}\n",
      "0.826 (+/-0.077) for {'learning_rate': 1, 'max_depth': 1, 'n_estimators': 250}\n",
      "0.822 (+/-0.081) for {'learning_rate': 1, 'max_depth': 1, 'n_estimators': 500}\n",
      "0.82 (+/-0.061) for {'learning_rate': 1, 'max_depth': 3, 'n_estimators': 5}\n",
      "0.8 (+/-0.024) for {'learning_rate': 1, 'max_depth': 3, 'n_estimators': 50}\n",
      "0.785 (+/-0.037) for {'learning_rate': 1, 'max_depth': 3, 'n_estimators': 250}\n",
      "0.79 (+/-0.03) for {'learning_rate': 1, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.79 (+/-0.032) for {'learning_rate': 1, 'max_depth': 5, 'n_estimators': 5}\n",
      "0.781 (+/-0.034) for {'learning_rate': 1, 'max_depth': 5, 'n_estimators': 50}\n",
      "0.796 (+/-0.025) for {'learning_rate': 1, 'max_depth': 5, 'n_estimators': 250}\n",
      "0.794 (+/-0.021) for {'learning_rate': 1, 'max_depth': 5, 'n_estimators': 500}\n",
      "0.796 (+/-0.042) for {'learning_rate': 1, 'max_depth': 7, 'n_estimators': 5}\n",
      "0.796 (+/-0.031) for {'learning_rate': 1, 'max_depth': 7, 'n_estimators': 50}\n",
      "0.786 (+/-0.047) for {'learning_rate': 1, 'max_depth': 7, 'n_estimators': 250}\n",
      "0.796 (+/-0.041) for {'learning_rate': 1, 'max_depth': 7, 'n_estimators': 500}\n",
      "0.783 (+/-0.022) for {'learning_rate': 1, 'max_depth': 9, 'n_estimators': 5}\n",
      "0.796 (+/-0.055) for {'learning_rate': 1, 'max_depth': 9, 'n_estimators': 50}\n",
      "0.801 (+/-0.046) for {'learning_rate': 1, 'max_depth': 9, 'n_estimators': 250}\n",
      "0.79 (+/-0.034) for {'learning_rate': 1, 'max_depth': 9, 'n_estimators': 500}\n",
      "0.204 (+/-0.115) for {'learning_rate': 10, 'max_depth': 1, 'n_estimators': 5}\n",
      "0.204 (+/-0.115) for {'learning_rate': 10, 'max_depth': 1, 'n_estimators': 50}\n",
      "0.204 (+/-0.115) for {'learning_rate': 10, 'max_depth': 1, 'n_estimators': 250}\n",
      "0.204 (+/-0.115) for {'learning_rate': 10, 'max_depth': 1, 'n_estimators': 500}\n",
      "0.307 (+/-0.195) for {'learning_rate': 10, 'max_depth': 3, 'n_estimators': 5}\n",
      "0.307 (+/-0.195) for {'learning_rate': 10, 'max_depth': 3, 'n_estimators': 50}\n",
      "0.307 (+/-0.195) for {'learning_rate': 10, 'max_depth': 3, 'n_estimators': 250}\n",
      "0.307 (+/-0.195) for {'learning_rate': 10, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.414 (+/-0.258) for {'learning_rate': 10, 'max_depth': 5, 'n_estimators': 5}\n",
      "0.389 (+/-0.181) for {'learning_rate': 10, 'max_depth': 5, 'n_estimators': 50}\n",
      "0.386 (+/-0.171) for {'learning_rate': 10, 'max_depth': 5, 'n_estimators': 250}\n",
      "0.417 (+/-0.271) for {'learning_rate': 10, 'max_depth': 5, 'n_estimators': 500}\n",
      "0.58 (+/-0.186) for {'learning_rate': 10, 'max_depth': 7, 'n_estimators': 5}\n",
      "0.609 (+/-0.194) for {'learning_rate': 10, 'max_depth': 7, 'n_estimators': 50}\n",
      "0.538 (+/-0.171) for {'learning_rate': 10, 'max_depth': 7, 'n_estimators': 250}\n",
      "0.603 (+/-0.187) for {'learning_rate': 10, 'max_depth': 7, 'n_estimators': 500}\n",
      "0.695 (+/-0.124) for {'learning_rate': 10, 'max_depth': 9, 'n_estimators': 5}\n",
      "0.674 (+/-0.102) for {'learning_rate': 10, 'max_depth': 9, 'n_estimators': 50}\n",
      "0.715 (+/-0.12) for {'learning_rate': 10, 'max_depth': 9, 'n_estimators': 250}\n",
      "0.689 (+/-0.107) for {'learning_rate': 10, 'max_depth': 9, 'n_estimators': 500}\n",
      "0.376 (+/-0.007) for {'learning_rate': 100, 'max_depth': 1, 'n_estimators': 5}\n",
      "0.376 (+/-0.007) for {'learning_rate': 100, 'max_depth': 1, 'n_estimators': 50}\n",
      "0.376 (+/-0.007) for {'learning_rate': 100, 'max_depth': 1, 'n_estimators': 250}\n",
      "0.376 (+/-0.007) for {'learning_rate': 100, 'max_depth': 1, 'n_estimators': 500}\n",
      "0.29 (+/-0.102) for {'learning_rate': 100, 'max_depth': 3, 'n_estimators': 5}\n",
      "0.29 (+/-0.102) for {'learning_rate': 100, 'max_depth': 3, 'n_estimators': 50}\n",
      "0.29 (+/-0.102) for {'learning_rate': 100, 'max_depth': 3, 'n_estimators': 250}\n",
      "0.29 (+/-0.102) for {'learning_rate': 100, 'max_depth': 3, 'n_estimators': 500}\n",
      "0.365 (+/-0.201) for {'learning_rate': 100, 'max_depth': 5, 'n_estimators': 5}\n",
      "0.356 (+/-0.189) for {'learning_rate': 100, 'max_depth': 5, 'n_estimators': 50}\n",
      "0.356 (+/-0.189) for {'learning_rate': 100, 'max_depth': 5, 'n_estimators': 250}\n",
      "0.359 (+/-0.19) for {'learning_rate': 100, 'max_depth': 5, 'n_estimators': 500}\n",
      "0.592 (+/-0.082) for {'learning_rate': 100, 'max_depth': 7, 'n_estimators': 5}\n",
      "0.575 (+/-0.095) for {'learning_rate': 100, 'max_depth': 7, 'n_estimators': 50}\n",
      "0.569 (+/-0.097) for {'learning_rate': 100, 'max_depth': 7, 'n_estimators': 250}\n",
      "0.582 (+/-0.092) for {'learning_rate': 100, 'max_depth': 7, 'n_estimators': 500}\n",
      "0.678 (+/-0.107) for {'learning_rate': 100, 'max_depth': 9, 'n_estimators': 5}\n",
      "0.665 (+/-0.13) for {'learning_rate': 100, 'max_depth': 9, 'n_estimators': 50}\n",
      "0.667 (+/-0.096) for {'learning_rate': 100, 'max_depth': 9, 'n_estimators': 250}\n",
      "0.691 (+/-0.075) for {'learning_rate': 100, 'max_depth': 9, 'n_estimators': 500}\n"
     ]
    }
   ],
   "source": [
    "gb = GradientBoostingClassifier()\n",
    "parameters = {\n",
    "    'n_estimators': [5, 50, 250, 500],\n",
    "    'max_depth': [1, 3, 5, 7, 9],\n",
    "    'learning_rate': [0.01, 0.1, 1, 10, 100]\n",
    "}\n",
    "\n",
    "cv = GridSearchCV(gb, parameters, cv=5)\n",
    "cv.fit(train_features, train_labels.values.ravel())\n",
    "\n",
    "print_results(cv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write out pickled model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['../Pickled_Models/GB_model.pkl']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joblib.dump(cv.best_estimator_, '../Pickled_Models/GB_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Binary file not shown.
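
The binary file above is presumably the pickled estimator written by joblib.dump at the end of the second notebook. As a hedged sketch of how that artifact would typically be consumed later, the model can be read back with joblib.load and scored on held-out data; the validation CSVs named here (val_features.csv, val_labels.csv) are assumptions for illustration and are not part of this commit:

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

# Reload the estimator persisted by joblib.dump(cv.best_estimator_, ...)
gb_model = joblib.load('../Pickled_Models/GB_model.pkl')

# Hypothetical held-out split; these file names are assumed, not part of this commit
val_features = pd.read_csv('../Data/val_features.csv')
val_labels = pd.read_csv('../Data/val_labels.csv', header=None)

predictions = gb_model.predict(val_features)
print('Accuracy: {}'.format(round(accuracy_score(val_labels.values.ravel(), predictions), 3)))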

0 commit comments
