# Practicum 2 - Fairness in ML 

Suppose you take a new job on a hard-hitting data analytics journalism team. Your team (get into groups of 3) is looking into allegations that machine learning classification algorithms are unfair and discriminatory. You are handed a dataset about recidivism, the rate at which criminal defendants recommit crimes. The idea is to use a suspect's criminal record to predict whether they will recidivate, information that a judge can use in determining whether to grant bail. The judge is hesitant to employ a mysterious machine learning algorithm, and asks your team to look into these allegations of discrimination. In particular, he wants you to investigate the naive use of a support vector machine and of logistic regression.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import svm
from sklearn import linear_model as lm

Here is the training data. There are 4,167 individual records and 10 columns: the prediction target, two-year recidivism (`two_year_recid`), in the first column, followed by 9 features.

In [None]:
DTrain = pd.read_csv("propublicaTrain.csv")
DTest = pd.read_csv("propublicaTest.csv")
DTrain.head()

Now let's work with it. In particular, we'll train two different classifiers on this training data. Both will try to predict two-year recidivism from the other 9 features. We will use a support vector machine with a linear kernel and logistic regression.
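
Both of these classifiers are linear: each learns one weight per feature plus an intercept, and predicts 1 when the resulting score is positive. The sketch below illustrates this on made-up random data (the names `X_toy`, `y_toy`, `toy_svm`, and `toy_lr` are hypothetical and unrelated to the ProPublica files); it is only meant to show the shared structure, not the settings used in this practicum.

```python
import numpy as np
from sklearn import svm
from sklearn import linear_model as lm

# Made-up data: 200 rows, 3 features, noisy linear rule for the 0/1 label.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
noise = rng.normal(scale=0.5, size=200)
y_toy = (X_toy[:, 0] + 0.5 * X_toy[:, 1] + noise > 0).astype(int)

toy_svm = svm.SVC(kernel="linear").fit(X_toy, y_toy)
toy_lr = lm.LogisticRegression(solver="liblinear").fit(X_toy, y_toy)

# decision_function returns the signed score w.x + b for each row.
svm_scores = toy_svm.decision_function(X_toy)
lr_scores = toy_lr.decision_function(X_toy)
svm_pred = toy_svm.predict(X_toy)
lr_pred = toy_lr.predict(X_toy)

# One learned weight per feature for each model.
print("SVM weights:", toy_svm.coef_.ravel())
print("LR weights: ", toy_lr.coef_.ravel())
```

The two models differ in the loss they minimize when choosing the weights, so their weights and predictions need not agree even though both draw a linear boundary.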

The models will be trained on the training data, and then the results matrix `R` will hold the test data along with the predictions of the classifiers we trained. The test data was drawn as a random sample from the overall data set. (Note that you may need to give this box a few seconds to run. You will see a sample of your results appear in a table when processing has completed.)

In [None]:
clf = svm.SVC(kernel="linear") # Initializing a stock SVM with a linear kernel
clf.fit(DTrain.iloc[:, 1:], DTrain.two_year_recid) # Training on every column except the target (column 0)
recidSVM = clf.predict(DTest.iloc[:, 1:]) # 0/1 prediction values for the SVM

# Initializing logistic regression. Feel free to play with the parameters.
mod = lm.LogisticRegression(penalty="l2", tol=0.01, 
                            C=1, fit_intercept=True, intercept_scaling=1, solver="liblinear", 
                            max_iter=10, verbose=0, n_jobs=1)
mod.fit(DTrain.iloc[:, 1:], DTrain.two_year_recid) # Fitting to the training data
recidLR = mod.predict(DTest.iloc[:, 1:]) # 0/1 prediction values for logistic regression

R = DTest.copy() # Copy so that adding prediction columns leaves DTest untouched
R['SVM'] = recidSVM
R['LR'] = recidLR
R.head()

Let's see how "well" we did. What I call the "uniform accuracy" is just 1 minus the error rate on the test set. In other words, it is the fraction of individuals in the test set for whom the classifier predicted the correct value of two_year_recid.
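
As a small illustration of that definition, here is a sketch on made-up labels (the helper `uniform_accuracy` and the toy lists are hypothetical, not part of the dataset):

```python
import numpy as np

def uniform_accuracy(y_true, y_pred):
    """Fraction of entries where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Made-up labels: the prediction is wrong only at the third position.
toy_true = [0, 1, 1, 0, 1]
toy_pred = [0, 1, 0, 0, 1]
toy_acc = uniform_accuracy(toy_true, toy_pred)
print(toy_acc)  # 4 of the 5 predictions match
```

Note that this number treats every mistake the same way: it does not distinguish predicting recidivism for someone who did not reoffend from the opposite error.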

In [None]:
print("Uniform Accuracy of SVM: ", np.sum(R.SVM==R.two_year_recid)/R.shape[0] )
print("Uniform Accuracy of Logistic Regression: ", np.sum(R.LR==R.two_year_recid)/R.shape[0] )

OK, so we did all right, nothing too exciting. Those who read the ProPublica article may note that these off-the-shelf algorithms, without any fancy modifications or optimizations, are potentially performing as well as the proprietary COMPAS software! Now we will look into fairness. In particular, here is a plot of the uniform accuracy of both algorithms for the protected and the unprotected racial class.

In [None]:
numRace0 = R[R.race==0].shape[0]
numRace1 = R[R.race==1].shape[0]

AccSVM =[R[(R.two_year_recid == R.SVM)  & (R.race==0)].shape[0]/numRace0, 
         R[(R.two_year_recid == R.SVM) & (R.race==1)].shape[0]/numRace1] 
AccLR = [R[(R.two_year_recid == R.LR)   & (R.race==0)].shape[0]/numRace0, 
         R[(R.two_year_recid == R.LR)  & (R.race==1)].shape[0]/numRace1]

plt.bar(x=range(4), height=np.concatenate((AccSVM, AccLR)), align='center')
plt.xticks(range(4), ('SVM on Race 0', 'SVM on Race 1',
                     'LR on Race 0', 'LR on Race 1'))
plt.show()

Looks like there is nothing interesting going on here. The SVM is slightly more accurate on the unprotected racial group, and logistic regression is slightly more accurate for the protected racial group, but they are similar. That's where you come in.

# To Do

### First
Analyze these results. Start with the SVM. Try to formalize arguments (that is, argue from the data) that:
1. The algorithm is racially biased / unfair.
2. The algorithm is not racially biased / unfair (use a different argument than just uniform accuracy).
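
One possible starting point, offered as a sketch rather than the intended answer, is to split the errors by type within each group. The helper `group_error_rates` and the toy table below are hypothetical; the column names are only chosen to mirror those of `R` so the function could be tried on it.

```python
import pandas as pd

def group_error_rates(df, pred_col, label_col="two_year_recid", group_col="race"):
    """False positive and false negative rate of pred_col within each group."""
    rows = []
    for group, sub in df.groupby(group_col):
        negatives = sub[sub[label_col] == 0]  # did not recidivate
        positives = sub[sub[label_col] == 1]  # did recidivate
        # Guard against groups that lack one of the two outcomes.
        fpr = (negatives[pred_col] == 1).mean() if len(negatives) else float("nan")
        fnr = (positives[pred_col] == 0).mean() if len(positives) else float("nan")
        rows.append({group_col: group, "FPR": fpr, "FNR": fnr})
    return pd.DataFrame(rows).set_index(group_col)

# Made-up data: both groups have 3 of 4 predictions correct.
toy = pd.DataFrame({
    "race":           [0, 0, 0, 0, 1, 1, 1, 1],
    "two_year_recid": [0, 0, 1, 1, 0, 0, 1, 1],
    "SVM":            [0, 1, 1, 1, 0, 0, 0, 1],
})
toy_rates = group_error_rates(toy, "SVM")
print(toy_rates)
```

In this toy table the uniform accuracy is identical across groups, yet all of the errors in one group are false positives and all of the errors in the other are false negatives. Whether anything similar happens in the real results is for you to check.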

### Second
If you finish the first two steps, try to answer these further questions:
1. Find another way to argue that the algorithm is racially biased / unfair.
2. Is one of the algorithms more unfair than the other? Why? How would you summarize the difference between the algorithms?
3. Can an algorithm simultaneously achieve high accuracy and be fair and unbiased on this dataset? Why or why not, and with what measures of bias or fairness?
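
For the last question, it may help to see how one fairness measure interacts with accuracy. The sketch below uses made-up numbers and a hypothetical `perfect` column standing in for a classifier that is always right; it assumes the two groups have different base rates, which you would need to verify on the real data.

```python
import pandas as pd

def rate_by_group(df, col, group_col="race"):
    """Mean of a 0/1 column within each group."""
    return df.groupby(group_col)[col].mean()

# Made-up data in which the groups reoffend at different rates.
toy = pd.DataFrame({
    "race":           [0, 0, 0, 0, 1, 1, 1, 1],
    "two_year_recid": [0, 0, 0, 1, 0, 1, 1, 1],
})
toy["perfect"] = toy["two_year_recid"]  # hypothetical classifier with no errors

base_rates = rate_by_group(toy, "two_year_recid")
selection_rates = rate_by_group(toy, "perfect")

# Gap in the share of each group that is predicted to recidivate.
parity_gap = abs(selection_rates.loc[0] - selection_rates.loc[1])
print(base_rates)
print(parity_gap)
```

Under that assumption, even a classifier with no errors predicts recidivism at different rates for the two groups, so equal prediction rates and perfect accuracy cannot both hold. Other measures, such as equal error rates, behave differently, which is part of what the question asks you to sort out.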