{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Tree-based and ensemble models in Python\n", "\n", "This is not a standalone tutorial: most of the theory and the important questions are covered in the main tutorial in R. This second tutorial helps you build your own tree-based and ensemble models in Python. I expect that you have already covered the video lectures and the main tutorial in R, so I assume you already know what you are doing and just don't know how to code it in Python yet. While the R tutorial provides more assistance in digesting the material, this Python tutorial pushes you to be more independent, with minimal hints. (I used these notebooks to prepare this tutorial: [1](https://github.com/ageron/handson-ml2/blob/master/06_decision_trees.ipynb), [2](https://github.com/ageron/handson-ml2/blob/master/07_ensemble_learning_and_random_forests.ipynb), [3](https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%208.ipynb#8.3.4-Boosting))\n", "\n", "We keep working with the same COMPAS dataset, and we will use the `sklearn` library. " ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [], "source": [ "# Import the usual libraries\n", "import sys\n", "import sklearn\n", "import numpy as np\n", "import pandas as pd\n", "import os" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data and split it into train and test sets\n", "Let's load the sample of COMPAS data we used in the previous tutorial. For your convenience, I have already transformed all factor variables into dummy variables using the `model.matrix()` function in R and saved the result on GitHub as a CSV file, so you can access the data directly using the link below." ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Two_yr_Recidivismyes | Number_of_Priors | Age_Above_FourtyFiveyes | Age_Below_TwentyFiveyes | FemaleMale | Misdemeanoryes | ethnicityCaucasian |
|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.012530 | 0 | 1 | 0 | 1 | 1 |
| 1 | 1 | -0.400115 | 0 | 1 | 1 | 1 | 0 |
| 2 | 1 | 0.701857 | 1 | 0 | 1 | 1 | 0 |
| 3 | 0 | -0.930922 | 0 | 0 | 1 | 0 | 1 |
| 4 | 1 | -0.545324 | 0 | 1 | 1 | 1 | 0 |