
Commit b8943d6

Comparing and evaluating the best model
1 parent 2bcd1a8 commit b8943d6

File tree

2 files changed: +210 -0 lines changed
@@ -0,0 +1,210 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary: Compare model results and final model selection\n",
    "\n",
    "Using the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.\n",
    "\n",
    "In this section, we will do the following:\n",
    "1. Evaluate all of our saved models on the validation set\n",
    "2. Select the best model based on performance on the validation set\n",
    "3. Evaluate that model on the holdout test set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read in Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import joblib\n",
    "import pandas as pd\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score\n",
    "from time import time\n",
    "\n",
    "val_features = pd.read_csv('../Data/val_features.csv')\n",
    "val_labels = pd.read_csv('../Data/val_labels.csv', header=None)\n",
    "\n",
    "test_features = pd.read_csv('../Data/test_features.csv')\n",
    "test_labels = pd.read_csv('../Data/test_labels.csv', header=None)"
   ]
  },
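  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Aside:* the cell above assumes the feature/label CSVs were already written out earlier in the project. A minimal sketch of how such splits are typically produced with `train_test_split` follows; the source path, label column name, and 60/20/20 split are assumptions, not taken from this repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (assumption): how the train/validation/test CSVs may have been produced\n",
    "# upstream. The path, label column, and 60/20/20 split are illustrative only.\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "titanic = pd.read_csv('../Data/titanic_cleaned.csv')  # hypothetical file\n",
    "features = titanic.drop('Survived', axis=1)\n",
    "labels = titanic['Survived']\n",
    "\n",
    "# First carve off 40%, then split that in half for validation and test\n",
    "X_train, X_temp, y_train, y_temp = train_test_split(features, labels,\n",
    "                                                    test_size=0.4, random_state=42)\n",
    "X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp,\n",
    "                                                test_size=0.5, random_state=42)\n",
    "\n",
    "X_val.to_csv('../Data/val_features.csv', index=False)\n",
    "y_val.to_csv('../Data/val_labels.csv', index=False, header=False)\n",
    "X_test.to_csv('../Data/test_features.csv', index=False)\n",
    "y_test.to_csv('../Data/test_labels.csv', index=False, header=False)"
   ]
  },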
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read in Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "models = {}\n",
    "for mdl in ['LR', 'SVM', 'MLP', 'RF', 'GB']:\n",
    "    models[mdl] = joblib.load('../Pickled_Models/{}_model.pkl'.format(mdl))"
   ]
  },
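  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Aside:* `joblib.load` assumes each model was pickled earlier in the project with `joblib.dump`. A one-line sketch of that presumed save step, shown by round-tripping one of the models just loaded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (assumption): the save step that presumably produced the pickles above.\n",
    "# Re-saving a loaded model round-trips through the same joblib API.\n",
    "joblib.dump(models['LR'], '../Pickled_Models/LR_model.pkl')"
   ]
  },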
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'LR': LogisticRegression(C=1, max_iter=1000),\n",
       " 'SVM': SVC(C=0.1, kernel='linear'),\n",
       " 'MLP': MLPClassifier(activation='tanh', hidden_layer_sizes=(10,), max_iter=1000),\n",
       " 'RF': RandomForestClassifier(max_depth=4, n_estimators=250),\n",
       " 'GB': GradientBoostingClassifier(learning_rate=0.01, n_estimators=500)}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate models on the validation set\n",
    "\n",
    "![Evaluation Metrics](img/eval_metrics.png)"
   ]
  },
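  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, in case `img/eval_metrics.png` does not render, the three metrics reported below are defined in terms of true/false positives (TP/FP) and true/false negatives (TN/FN):\n",
    "\n",
    "$$\\\\text{Accuracy} = \\\\frac{TP + TN}{TP + TN + FP + FN} \\\\qquad \\\\text{Precision} = \\\\frac{TP}{TP + FP} \\\\qquad \\\\text{Recall} = \\\\frac{TP}{TP + FN}$$"
   ]
  },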
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate_model(name, model, features, labels):\n",
    "    # Time only the prediction step so the latency reflects inference cost\n",
    "    start = time()\n",
    "    pred = model.predict(features)\n",
    "    end = time()\n",
    "\n",
    "    accuracy = round(accuracy_score(labels, pred), 3)\n",
    "    precision = round(precision_score(labels, pred), 3)\n",
    "    recall = round(recall_score(labels, pred), 3)\n",
    "\n",
    "    print('{} -- Accuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(\n",
    "        name, accuracy, precision, recall, round((end - start) * 1000, 1)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LR -- Accuracy: 0.775 / Precision: 0.712 / Recall: 0.646 / Latency: 3.0ms\n",
      "SVM -- Accuracy: 0.747 / Precision: 0.672 / Recall: 0.6 / Latency: 5.0ms\n",
      "MLP -- Accuracy: 0.781 / Precision: 0.724 / Recall: 0.646 / Latency: 3.0ms\n",
      "RF -- Accuracy: 0.809 / Precision: 0.83 / Recall: 0.6 / Latency: 38.0ms\n",
      "GB -- Accuracy: 0.815 / Precision: 0.808 / Recall: 0.646 / Latency: 6.0ms\n"
     ]
    }
   ],
   "source": [
    "# Evaluate every saved model on the validation set\n",
    "for name, mdl in models.items():\n",
    "    evaluate_model(name, mdl, val_features, val_labels)"
   ]
  },
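  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the printout, RF leads on precision while GB leads on accuracy, which is why both are carried forward to the test set below. As a minimal sketch (not part of the original notebook), the same validation metrics can be collected into a `DataFrame` so the ranking is programmatic rather than read by eye:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: tabulate the validation metrics computed above so models can be\n",
    "# ranked programmatically; reuses the data, models, and metrics already loaded.\n",
    "rows = []\n",
    "for name, mdl in models.items():\n",
    "    pred = mdl.predict(val_features)\n",
    "    rows.append({'model': name,\n",
    "                 'accuracy': accuracy_score(val_labels, pred),\n",
    "                 'precision': precision_score(val_labels, pred),\n",
    "                 'recall': recall_score(val_labels, pred)})\n",
    "\n",
    "results = pd.DataFrame(rows)\n",
    "results.sort_values('accuracy', ascending=False)"
   ]
  },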
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate best model on test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Random Forest -- Accuracy: 0.799 / Precision: 0.845 / Recall: 0.645 / Latency: 48.0ms\n"
     ]
    }
   ],
   "source": [
    "# Evaluate the top validation performers on the holdout test set\n",
    "evaluate_model('Random Forest', models['RF'], test_features, test_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gradient Boosting -- Accuracy: 0.816 / Precision: 0.852 / Recall: 0.684 / Latency: 6.0ms\n"
     ]
    }
   ],
   "source": [
    "# test set\n",
    "evaluate_model('Gradient Boosting', models['GB'], test_features, test_labels)"
   ]
  },
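  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the test set, Gradient Boosting comes out ahead of Random Forest on all three metrics and runs roughly 8x faster, making it the final model. A sketch of persisting that choice (the `final_model.pkl` filename is an assumption, not from this repo):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (assumption): persist the selected model for downstream use.\n",
    "# The filename 'final_model.pkl' is hypothetical.\n",
    "joblib.dump(models['GB'], '../Pickled_Models/final_model.pkl')"
   ]
  },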
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
