ptyadana
diff --git a/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/01.Random Forest - Hyperparameters.ipynb
Lines changed: 84 additions & 0 deletions b/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/01.Random Forest - Hyperparameters.ipynb
Lines changed: 84 additions & 0 deletions
diff --git a/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/02.Random Forest - Fit and evaluate a model.ipynb
Lines changed: 165 additions & 0 deletions b/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/02.Random Forest - Fit and evaluate a model.ipynb
Lines changed: 165 additions & 0 deletions
diff --git a/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/img/rf.png
76.8 KB b/‎ML - Applied Machine Learning - Algorithms/05.Random Forest/img/rf.png
76.8 KB
diff --git a/‎ML - Applied Machine Learning - Algorithms/Pickled_Models/RF_model.pkl
553 KB b/‎ML - Applied Machine Learning - Algorithms/Pickled_Models/RF_model.pkl
553 KB
@@ -0,0 +1,84 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Random Forest: Hyperparameters\n",
+    "\n",
+    "Import [`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) and [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) from `sklearn` and explore the hyperparameters."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Import Random Forest Algorithm for Classification & Regression"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "RandomForestClassifier()\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
+    "\n",
+    "print(RandomForestClassifier())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "RandomForestRegressor()\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(RandomForestRegressor())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,165 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Random Forest: Fit and evaluate a model\n",
+    "\n",
+    "Using the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.\n",
+    "\n",
+    "In this section, we will fit and evaluate a simple Random Forest model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Read in Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import joblib\n",
+    "import pandas as pd\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.model_selection import GridSearchCV\n",
+    "\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore', category=FutureWarning)\n",
+    "warnings.filterwarnings('ignore', category=DeprecationWarning)\n",
+    "\n",
+    "train_features = pd.read_csv('../Data/train_features.csv')\n",
+    "train_labels = pd.read_csv('../Data/train_labels.csv', header=None)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Hyperparameter tuning\n",
+    "\n",
+    "![RF](img/rf.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def print_results(results):\n",
+    "    print('BEST PARAMS: {}\\n'.format(results.best_params_))\n",
+    "    \n",
+    "    means = results.cv_results_['mean_test_score'] \n",
+    "    stds = results.cv_results_['std_test_score']\n",
+    "    for mean, std, params in zip(means, stds, results.cv_results_['params']):\n",
+    "        print('{} (+-{}) for {}'.format(round(mean,3), round(std * 2, 3), params))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "BEST PARAMS: {'max_depth': 4, 'n_estimators': 250}\n",
+      "\n",
+      "0.77 (+-0.166) for {'max_depth': 2, 'n_estimators': 5}\n",
+      "0.805 (+-0.101) for {'max_depth': 2, 'n_estimators': 50}\n",
+      "0.8 (+-0.107) for {'max_depth': 2, 'n_estimators': 250}\n",
+      "0.809 (+-0.1) for {'max_depth': 4, 'n_estimators': 5}\n",
+      "0.82 (+-0.13) for {'max_depth': 4, 'n_estimators': 50}\n",
+      "0.824 (+-0.109) for {'max_depth': 4, 'n_estimators': 250}\n",
+      "0.794 (+-0.03) for {'max_depth': 8, 'n_estimators': 5}\n",
+      "0.817 (+-0.059) for {'max_depth': 8, 'n_estimators': 50}\n",
+      "0.822 (+-0.067) for {'max_depth': 8, 'n_estimators': 250}\n",
+      "0.802 (+-0.096) for {'max_depth': 16, 'n_estimators': 5}\n",
+      "0.811 (+-0.031) for {'max_depth': 16, 'n_estimators': 50}\n",
+      "0.811 (+-0.036) for {'max_depth': 16, 'n_estimators': 250}\n",
+      "0.811 (+-0.067) for {'max_depth': 32, 'n_estimators': 5}\n",
+      "0.807 (+-0.032) for {'max_depth': 32, 'n_estimators': 50}\n",
+      "0.815 (+-0.024) for {'max_depth': 32, 'n_estimators': 250}\n",
+      "0.779 (+-0.068) for {'max_depth': None, 'n_estimators': 5}\n",
+      "0.813 (+-0.026) for {'max_depth': None, 'n_estimators': 50}\n",
+      "0.815 (+-0.043) for {'max_depth': None, 'n_estimators': 250}\n"
+     ]
+    }
+   ],
+   "source": [
+    "rf = RandomForestClassifier()\n",
+    "parameters = {\n",
+    "    'n_estimators': [5, 50, 250], \n",
+    "    'max_depth': [2, 4, 8, 16, 32, None], \n",
+    "}\n",
+    "\n",
+    "cv = GridSearchCV(rf, parameters, cv=5)\n",
+    "cv.fit(train_features, train_labels.values.ravel())\n",
+    "\n",
+    "print_results(cv)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Write out pickled model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['../Pickled_Models/RF_model.pkl']"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "joblib.dump(cv.best_estimator_, '../Pickled_Models/RF_model.pkl')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}