Commit 3c92fc6

Random Forest module completed
1 parent 51f5525 commit 3c92fc6

4 files changed

+249
-0
lines changed

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Random Forest: Hyperparameters\n",
+    "\n",
+    "Import [`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) and [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) from `sklearn` and explore the hyperparameters."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Import Random Forest Algorithm for Classification & Regression"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "RandomForestClassifier()\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
+    "\n",
+    "print(RandomForestClassifier())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "RandomForestRegressor()\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(RandomForestRegressor())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
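Review note: the notebook above prints each estimator, and on the scikit-learn version used here the output is just `RandomForestClassifier()` because the repr only lists parameters changed from their defaults. A minimal sketch of one way to see the full hyperparameter set is `get_params()`; the variable names below are illustrative and not part of the commit.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# get_params() returns every constructor argument with its current value,
# including the defaults that the short repr hides.
clf_params = RandomForestClassifier().get_params()
reg_params = RandomForestRegressor().get_params()

# The two hyperparameters tuned later in this commit.
for name in ("n_estimators", "max_depth"):
    print(f"{name}: classifier={clf_params[name]!r}, regressor={reg_params[name]!r}")

# Parameters that exist only on the classifier.
classifier_only = sorted(set(clf_params) - set(reg_params))
print("classifier-only parameters:", classifier_only)
```

The default values printed depend on the installed scikit-learn version, so none are asserted here.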
@@ -0,0 +1,165 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Random Forest: Fit and evaluate a model\n",
+    "\n",
+    "This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).\n",
+    "\n",
+    "In this section, we fit a Random Forest classifier, tune `n_estimators` and `max_depth` with a 5-fold cross-validated grid search, and save the best model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Read in Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import joblib\n",
+    "import pandas as pd\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.model_selection import GridSearchCV\n",
+    "\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
+    "warnings.filterwarnings('ignore', category=DeprecationWarning)\n",
+    "\n",
+    "train_features = pd.read_csv('../Data/train_features.csv')\n",
+    "train_labels = pd.read_csv('../Data/train_labels.csv', header=None)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Hyperparameter tuning\n",
+    "\n",
+    "![RF](img/rf.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def print_results(results):\n",
+    "    print('BEST PARAMS: {}\\n'.format(results.best_params_))\n",
+    "\n",
+    "    means = results.cv_results_['mean_test_score']\n",
+    "    stds = results.cv_results_['std_test_score']\n",
+    "    for mean, std, params in zip(means, stds, results.cv_results_['params']):\n",
+    "        print('{} (+-{}) for {}'.format(round(mean,3), round(std * 2, 3), params))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "BEST PARAMS: {'max_depth': 4, 'n_estimators': 250}\n",
+      "\n",
+      "0.77 (+-0.166) for {'max_depth': 2, 'n_estimators': 5}\n",
+      "0.805 (+-0.101) for {'max_depth': 2, 'n_estimators': 50}\n",
+      "0.8 (+-0.107) for {'max_depth': 2, 'n_estimators': 250}\n",
+      "0.809 (+-0.1) for {'max_depth': 4, 'n_estimators': 5}\n",
+      "0.82 (+-0.13) for {'max_depth': 4, 'n_estimators': 50}\n",
+      "0.824 (+-0.109) for {'max_depth': 4, 'n_estimators': 250}\n",
+      "0.794 (+-0.03) for {'max_depth': 8, 'n_estimators': 5}\n",
+      "0.817 (+-0.059) for {'max_depth': 8, 'n_estimators': 50}\n",
+      "0.822 (+-0.067) for {'max_depth': 8, 'n_estimators': 250}\n",
+      "0.802 (+-0.096) for {'max_depth': 16, 'n_estimators': 5}\n",
+      "0.811 (+-0.031) for {'max_depth': 16, 'n_estimators': 50}\n",
+      "0.811 (+-0.036) for {'max_depth': 16, 'n_estimators': 250}\n",
+      "0.811 (+-0.067) for {'max_depth': 32, 'n_estimators': 5}\n",
+      "0.807 (+-0.032) for {'max_depth': 32, 'n_estimators': 50}\n",
+      "0.815 (+-0.024) for {'max_depth': 32, 'n_estimators': 250}\n",
+      "0.779 (+-0.068) for {'max_depth': None, 'n_estimators': 5}\n",
+      "0.813 (+-0.026) for {'max_depth': None, 'n_estimators': 50}\n",
+      "0.815 (+-0.043) for {'max_depth': None, 'n_estimators': 250}\n"
+     ]
+    }
+   ],
+   "source": [
+    "rf = RandomForestClassifier()\n",
+    "parameters = {\n",
+    "    'n_estimators': [5, 50, 250],\n",
+    "    'max_depth': [2, 4, 8, 16, 32, None],\n",
+    "}\n",
+    "\n",
+    "cv = GridSearchCV(rf, parameters, cv=5)\n",
+    "cv.fit(train_features, train_labels.values.ravel())\n",
+    "\n",
+    "print_results(cv)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Write out pickled model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['../Pickled_Models/RF_model.pkl']"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "joblib.dump(cv.best_estimator_, '../Pickled_Models/RF_model.pkl')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
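Review note: `print_results` in the notebook above lists every grid point in search order. As a possible complement, the same `cv_results_` dictionary can be loaded into a pandas DataFrame and sorted by rank. This is a minimal sketch, not the commit's method: it uses a synthetic dataset from `make_classification` as a stand-in for the Titanic CSVs (which are not part of this diff), a smaller grid so it runs quickly, and the names `search` and `summary` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic data stands in for train_features / train_labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A reduced version of the notebook's grid, to keep the sketch fast.
parameters = {"n_estimators": [5, 20], "max_depth": [2, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0), parameters, cv=5)
search.fit(X, y)

# cv_results_ is a dict of equal-length arrays, so it converts directly to a table.
results = pd.DataFrame(search.cv_results_)
columns = [
    "param_max_depth",
    "param_n_estimators",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
summary = results[columns].sort_values("rank_test_score").reset_index(drop=True)
print(summary)
```

Sorting by `rank_test_score` puts the configuration reported in `best_params_` (or one tied with it) in the first row, which makes near-ties such as the 0.824 versus 0.822 scores in the notebook output easier to spot.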
Binary file not shown.
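Review note: the diff does not name this binary file. If it is the `RF_model.pkl` written by `joblib.dump` in the notebook (an assumption), it could be read back with `joblib.load`. The sketch below shows the round trip on a synthetic dataset and a temporary directory rather than the commit's `../Pickled_Models/` path; the model settings mirror the best parameters reported above only for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic data stands in for the Titanic training set.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=250, max_depth=4, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path; the notebook writes to '../Pickled_Models/RF_model.pkl'.
    path = os.path.join(tmp_dir, "RF_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored estimator should reproduce the original predictions exactly.
same = bool((restored.predict(X) == model.predict(X)).all())
print("Restored model reproduces predictions:", same)
```

Pickled estimators should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.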
