Credit Card Fraud Detection

This dataset is available at Kaggle. It contains transactions made by credit cards in September 2013 by European cardholders.

Initial Observations

This dataset is heavily unbalanced, with only a small minority of the transactions being fraudulent. It also only contains numeric values as a result of a PCA transformation. First, load in the required libraries and take a look at the data.

In [2]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import confusion_matrix
import sklearn
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
In [3]:
df = pd.read_csv("creditcard.csv")
df.head()
Out[3]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 149.62 0
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 2.69 0
2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 378.66 0
3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 123.50 0
4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 69.99 0

5 rows × 31 columns

In [4]:
df.describe()
Out[4]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 ... 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000 284807.000000
mean 94813.859575 1.165980e-15 3.416908e-16 -1.373150e-15 2.086869e-15 9.604066e-16 1.490107e-15 -5.556467e-16 1.177556e-16 -2.406455e-15 ... 1.656562e-16 -3.444850e-16 2.578648e-16 4.471968e-15 5.340915e-16 1.687098e-15 -3.666453e-16 -1.220404e-16 88.349619 0.001727
std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00 1.380247e+00 1.332271e+00 1.237094e+00 1.194353e+00 1.098632e+00 ... 7.345240e-01 7.257016e-01 6.244603e-01 6.056471e-01 5.212781e-01 4.822270e-01 4.036325e-01 3.300833e-01 250.120109 0.041527
min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00 -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01 ... -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00 -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01 0.000000 0.000000
25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01 -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01 ... -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01 -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 5.600000 0.000000
50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02 -5.433583e-02 -2.741871e-01 4.010308e-02 2.235804e-02 -5.142873e-02 ... -2.945017e-02 6.781943e-03 -1.119293e-02 4.097606e-02 1.659350e-02 -5.213911e-02 1.342146e-03 1.124383e-02 22.000000 0.000000
75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01 6.119264e-01 3.985649e-01 5.704361e-01 3.273459e-01 5.971390e-01 ... 1.863772e-01 5.285536e-01 1.476421e-01 4.395266e-01 3.507156e-01 2.409522e-01 9.104512e-02 7.827995e-02 77.165000 0.000000
max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01 3.480167e+01 7.330163e+01 1.205895e+02 2.000721e+01 1.559499e+01 ... 2.720284e+01 1.050309e+01 2.252841e+01 4.584549e+00 7.519589e+00 3.517346e+00 3.161220e+01 3.384781e+01 25691.160000 1.000000

8 rows × 31 columns

In [5]:
print(round(df['Class'].value_counts()[1]/len(df) * 100, 3), "% are fraudulent")
0.173 % are fraudulent

As we can see only a small minority of the dataset contains fraudulent transactions. Therefore, a model could get a high accuracy if it just guessed not fraudulent for all transactions and would not be effective at preventing fraudulent transactions.

In [6]:
sns.countplot('Class', data=df)
plt.title("Distribution of Class")
Out[6]:
Text(0.5, 1.0, 'Distribution of Class')
In [7]:
sns.distplot(df['Amount'].values, color="red")
plt.title("Distribution of Transaction Amount")
plt.xlim([min(df['Amount'].values), max(df['Amount'].values)])
Out[7]:
(0.0, 25691.16)

Data preparation

The Time and Amount columns should also be scaled to match all the other columns.

In [8]:
robust = RobustScaler()

df['scaled_amount'] = robust.fit_transform(df['Amount'].values.reshape(-1, 1))
df['scaled_time'] = robust.fit_transform(df['Time'].values.reshape(-1, 1))

df.drop(['Time', 'Amount'], axis=1, inplace=True)

scaled_amount = df['scaled_amount']
scaled_time = df['scaled_time']

# Insert the scaled columns at the front of the dataframe

df.drop(['scaled_amount', 'scaled_time'], axis=1, inplace=True)
df.insert(0, 'scaled_amount', scaled_amount)
df.insert(1, 'scaled_time', scaled_time)


df.head()
Out[8]:
scaled_amount scaled_time V1 V2 V3 V4 V5 V6 V7 V8 ... V20 V21 V22 V23 V24 V25 V26 V27 V28 Class
0 1.783274 -0.994983 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 0.098698 ... 0.251412 -0.018307 0.277838 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 0
1 -0.269825 -0.994983 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 0.085102 ... -0.069083 -0.225775 -0.638672 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 0
2 4.983721 -0.994972 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 0.247676 ... 0.524980 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 0
3 1.418291 -0.994972 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 0.377436 ... -0.208038 -0.108300 0.005274 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 0
4 0.670579 -0.994960 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 -0.270533 ... 0.408542 -0.009431 0.798278 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 0

5 rows × 31 columns

Now we have our new columns scaled to match the rest of the data.

Basic Model

First, lets see what happens when we run a model on the imbalanced data and see how we can improve.

In [9]:
train_df, test_df = train_test_split(df, test_size=0.2)
y_train = train_df['Class']
X_train = train_df.drop('Class', axis=1)
y_test = test_df['Class']
X_test = test_df.drop('Class', axis=1)
In [10]:
print("Train data has ", round(y_train.value_counts()[1]/len(y_train) * 100, 3), "% of class fraud")
print("Test data has ", round(y_test.value_counts()[1]/len(y_test) * 100, 3), "% of class fraud")
Train data has  0.17 % of class fraud
Test data has  0.183 % of class fraud
In [11]:
model = keras.Sequential([
    keras.layers.Dense(30, input_shape=(30,), activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=20, verbose=2)

test_loss, test_acc = model.evaluate(X_test,  y_test, verbose=2)
print('\nTest accuracy:', test_acc)
Train on 227845 samples
Epoch 1/20
227845/227845 - 10s - loss: 0.0074 - accuracy: 0.9987
Epoch 2/20
227845/227845 - 10s - loss: 0.0032 - accuracy: 0.9994
Epoch 3/20
227845/227845 - 10s - loss: 0.0029 - accuracy: 0.9994
Epoch 4/20
227845/227845 - 10s - loss: 0.0028 - accuracy: 0.9994
Epoch 5/20
227845/227845 - 10s - loss: 0.0026 - accuracy: 0.9995
Epoch 6/20
227845/227845 - 9s - loss: 0.0024 - accuracy: 0.9995
Epoch 7/20
227845/227845 - 10s - loss: 0.0023 - accuracy: 0.9995
Epoch 8/20
227845/227845 - 10s - loss: 0.0022 - accuracy: 0.9995
Epoch 9/20
227845/227845 - 9s - loss: 0.0022 - accuracy: 0.9995
Epoch 10/20
227845/227845 - 9s - loss: 0.0021 - accuracy: 0.9995
Epoch 11/20
227845/227845 - 9s - loss: 0.0020 - accuracy: 0.9995
Epoch 12/20
227845/227845 - 9s - loss: 0.0019 - accuracy: 0.9996
Epoch 13/20
227845/227845 - 9s - loss: 0.0019 - accuracy: 0.9996
Epoch 14/20
227845/227845 - 10s - loss: 0.0019 - accuracy: 0.9996
Epoch 15/20
227845/227845 - 10s - loss: 0.0018 - accuracy: 0.9996
Epoch 16/20
227845/227845 - 10s - loss: 0.0018 - accuracy: 0.9996
Epoch 17/20
227845/227845 - 9s - loss: 0.0018 - accuracy: 0.9996
Epoch 18/20
227845/227845 - 9s - loss: 0.0017 - accuracy: 0.9996
Epoch 19/20
227845/227845 - 9s - loss: 0.0017 - accuracy: 0.9996
Epoch 20/20
227845/227845 - 10s - loss: 0.0018 - accuracy: 0.9996
56962/56962 - 2s - loss: 0.0037 - accuracy: 0.9994

Test accuracy: 0.99942064
In [17]:
predictions = model.predict(X_test)
predicted_class = []
for prediction in predictions:
    predicted_class.append(np.argmax(prediction))
cm = confusion_matrix(y_test, predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[17]:
Text(0.5, 24.0, 'Predicted label')

It appears our model is very accurate, getting over 99% of cases correct. However, to really tell if this is the model we want, we need to look deeper. Due to the imbalanced data, if the model just guessed not fraud for each transaction it would also get over 99% accuracy.
Lets take a look at what it predicts for the whole data set.

In [19]:
test_predictions = model.predict(df.drop('Class', axis=1))
predicted_class = []
for prediction in test_predictions:
    predicted_class.append(np.argmax(prediction))
cm = confusion_matrix(df['Class'], predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[19]:
Text(0.5, 24.0, 'Predicted label')

We can see that this model misses 89 cases of fraud, and also wrongly classifies 28 legitimate transactions as fraud. For a bank, this means 28 customers will either be contacted to validate their details or have their card blocked as a result of this wrong classification and 89 fraudulent transactions will go undetected
There is definitely some room to improve, scaling the data to have half fraud and half non fraud might improve our model.

Balanced Data

First, lets create a dataframe with 492 cases of each class, as we only have 492 cases of fraud in total.

In [28]:
fraud_df = df.loc[df['Class'] == 1]
nfraud_df = df.loc[df['Class'] == 0]
small_df = pd.concat([fraud_df, nfraud_df[:492]])
new_df = small_df.sample(frac=1, random_state=24)
print(new_df['Class'].value_counts()/len(new_df))
1    0.5
0    0.5
Name: Class, dtype: float64

We now have a dataset that is evenly distributed. Now, lets train_test_split this and give it to our model and see if results improve.

In [29]:
new_train_df, new_test_df = train_test_split(new_df, test_size=0.2)
y_train_new = train_df['Class']
X_train_new = train_df.drop('Class', axis=1)
y_test_new = test_df['Class']
X_test_new = test_df.drop('Class', axis=1)

new_model = keras.Sequential([
    keras.layers.Dense(30, input_shape=(30,), activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])

new_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

new_model.fit(X_train_new, y_train_new, epochs=20, verbose=2)

test_loss, test_acc = model.evaluate(X_test_new,  y_test_new, verbose=2)
print('\nTest accuracy:', test_acc)
Train on 227845 samples
Epoch 1/20
227845/227845 - 10s - loss: 0.0058 - accuracy: 0.9991
Epoch 2/20
227845/227845 - 9s - loss: 0.0032 - accuracy: 0.9994
Epoch 3/20
227845/227845 - 9s - loss: 0.0029 - accuracy: 0.9994
Epoch 4/20
227845/227845 - 9s - loss: 0.0027 - accuracy: 0.9994
Epoch 5/20
227845/227845 - 9s - loss: 0.0025 - accuracy: 0.9995
Epoch 6/20
227845/227845 - 9s - loss: 0.0026 - accuracy: 0.9995
Epoch 7/20
227845/227845 - 10s - loss: 0.0024 - accuracy: 0.9995
Epoch 8/20
227845/227845 - 9s - loss: 0.0022 - accuracy: 0.9995
Epoch 9/20
227845/227845 - 10s - loss: 0.0022 - accuracy: 0.9995
Epoch 10/20
227845/227845 - 9s - loss: 0.0021 - accuracy: 0.9994
Epoch 11/20
227845/227845 - 10s - loss: 0.0020 - accuracy: 0.9995
Epoch 12/20
227845/227845 - 10s - loss: 0.0020 - accuracy: 0.9995
Epoch 13/20
227845/227845 - 9s - loss: 0.0020 - accuracy: 0.9995
Epoch 14/20
227845/227845 - 9s - loss: 0.0018 - accuracy: 0.9995
Epoch 15/20
227845/227845 - 10s - loss: 0.0018 - accuracy: 0.9996
Epoch 16/20
227845/227845 - 10s - loss: 0.0018 - accuracy: 0.9996
Epoch 17/20
227845/227845 - 10s - loss: 0.0017 - accuracy: 0.9996
Epoch 18/20
227845/227845 - 9s - loss: 0.0016 - accuracy: 0.9996
Epoch 19/20
227845/227845 - 9s - loss: 0.0017 - accuracy: 0.9996
Epoch 20/20
227845/227845 - 10s - loss: 0.0015 - accuracy: 0.9996
56962/56962 - 1s - loss: 0.0037 - accuracy: 0.9994

Test accuracy: 0.99942064
In [30]:
test_predictions = new_model.predict(df.drop('Class', axis=1))
predicted_class = []
for prediction in test_predictions:
    predicted_class.append(np.argmax(prediction))
cm = confusion_matrix(df['Class'], predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[30]:
Text(0.5, 24.0, 'Predicted label')

Legitimate Transactions Detected (True Negatives): 284293
Legitimate Transactions Incorrectly Detected (False Positives): 13
Fraudulent Transactions Missed (False Negatives): 106
Fraudulent Transactions Detected (True Positives): 386

This model predicts fails to identify 106 transactions as fraud. This is as a result of getting rid of most of the data, only taking 492 out of 284,315 non-fraud transactions. This leads to our model falsely believing that there is an equal amount of fraud and legitimate transactions. We can try to improve our model by adding class weights and changing the design of our Keras model.

Class Weights

First, lets create the weights for each class.

In [31]:
# Pos = class 1, Neg = class 2
pos = df['Class'].value_counts()[1]
neg = df['Class'].value_counts()[0]
total = pos + neg
class_0_weight = (1 / neg)*(total)/2.0
class_1_weight = (1 / pos)*(total)/2.0

class_weights = {0: class_0_weight, 1: class_1_weight}

print("Weight for class 0: {:.2f}".format(class_0_weight))
print("Weight for class 1: {:.2f}".format(class_1_weight))
Weight for class 0: 0.50
Weight for class 1: 289.44

Once we have the weights, we will train the model with the weights in an attempt to improve it. We will also adjust the model slightly. We will be adding a dropout layer will prevent overfitting the model, including validation data and creating a callback to stop the model early if needed.

In [33]:
train_df, val_df = train_test_split(train_df, test_size=0.2)
train_features = train_df.drop('Class', axis=1)
train_labels = train_df['Class']
val_features = val_df.drop('Class', axis=1)
val_labels = val_df['Class']

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_auc',
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)

metrics = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'),
      keras.metrics.BinaryAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
      keras.metrics.AUC(name='auc'),
]


weighted_model = keras.Sequential([
    keras.layers.Dense(16, input_shape=(X_train.shape[-1],), activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='sigmoid')
])

weighted_model.compile(
    optimizer=keras.optimizers.Adam(lr=1e-3),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=metrics
)

weighted_history = weighted_model.fit(
    train_features,
    train_labels,
    epochs=100,
    callbacks=[early_stopping],
    validation_data=(val_features, val_labels),
    class_weight=class_weights
)
WARNING: Logging before flag parsing goes to stderr.
W0707 23:32:42.764910 13900 data_adapter.py:1091] sample_weight modes were coerced from
  ...
    to
  ['...']
W0707 23:32:42.894561 13900 data_adapter.py:1091] sample_weight modes were coerced from
  ...
    to
  ['...']
Train on 145820 samples, validate on 36456 samples
Epoch 1/100
145820/145820 [==============================] - 14s 94us/sample - loss: 0.5225 - tp: 199.0000 - fp: 27146.0000 - tn: 118426.0000 - fn: 49.0000 - accuracy: 0.8135 - precision: 0.0073 - recall: 0.8024 - auc: 0.8614 - val_loss: 0.2964 - val_tp: 55.0000 - val_fp: 760.0000 - val_tn: 35631.0000 - val_fn: 10.0000 - val_accuracy: 0.9789 - val_precision: 0.0675 - val_recall: 0.8462 - val_auc: 0.9168
Epoch 2/100
145820/145820 [==============================] - 11s 76us/sample - loss: 0.2461 - tp: 224.0000 - fp: 6661.0000 - tn: 138911.0000 - fn: 24.0000 - accuracy: 0.9542 - precision: 0.0325 - recall: 0.9032 - auc: 0.9576 - val_loss: 0.2896 - val_tp: 53.0000 - val_fp: 518.0000 - val_tn: 35873.0000 - val_fn: 12.0000 - val_accuracy: 0.9855 - val_precision: 0.0928 - val_recall: 0.8154 - val_auc: 0.9302
Epoch 3/100
145820/145820 [==============================] - 11s 77us/sample - loss: 0.2040 - tp: 220.0000 - fp: 4219.0000 - tn: 141353.0000 - fn: 28.0000 - accuracy: 0.9709 - precision: 0.0496 - recall: 0.8871 - auc: 0.9688 - val_loss: 0.2866 - val_tp: 53.0000 - val_fp: 491.0000 - val_tn: 35900.0000 - val_fn: 12.0000 - val_accuracy: 0.9862 - val_precision: 0.0974 - val_recall: 0.8154 - val_auc: 0.9457
Epoch 4/100
145820/145820 [==============================] - 14s 95us/sample - loss: 0.1670 - tp: 225.0000 - fp: 3377.0000 - tn: 142195.0000 - fn: 23.0000 - accuracy: 0.9767 - precision: 0.0625 - recall: 0.9073 - auc: 0.9763 - val_loss: 0.3403 - val_tp: 52.0000 - val_fp: 282.0000 - val_tn: 36109.0000 - val_fn: 13.0000 - val_accuracy: 0.9919 - val_precision: 0.1557 - val_recall: 0.8000 - val_auc: 0.9387
Epoch 5/100
145820/145820 [==============================] - 12s 81us/sample - loss: 0.1705 - tp: 224.0000 - fp: 3234.0000 - tn: 142338.0000 - fn: 24.0000 - accuracy: 0.9777 - precision: 0.0648 - recall: 0.9032 - auc: 0.9785 - val_loss: 0.3042 - val_tp: 54.0000 - val_fp: 730.0000 - val_tn: 35661.0000 - val_fn: 11.0000 - val_accuracy: 0.9797 - val_precision: 0.0689 - val_recall: 0.8308 - val_auc: 0.9488
Epoch 6/100
145820/145820 [==============================] - 12s 81us/sample - loss: 0.1645 - tp: 227.0000 - fp: 4461.0000 - tn: 141111.0000 - fn: 21.0000 - accuracy: 0.9693 - precision: 0.0484 - recall: 0.9153 - auc: 0.9791 - val_loss: 0.3582 - val_tp: 53.0000 - val_fp: 290.0000 - val_tn: 36101.0000 - val_fn: 12.0000 - val_accuracy: 0.9917 - val_precision: 0.1545 - val_recall: 0.8154 - val_auc: 0.9472
Epoch 7/100
145820/145820 [==============================] - 11s 79us/sample - loss: 0.1744 - tp: 224.0000 - fp: 3625.0000 - tn: 141947.0000 - fn: 24.0000 - accuracy: 0.9750 - precision: 0.0582 - recall: 0.9032 - auc: 0.9823 - val_loss: 0.3310 - val_tp: 54.0000 - val_fp: 502.0000 - val_tn: 35889.0000 - val_fn: 11.0000 - val_accuracy: 0.9859 - val_precision: 0.0971 - val_recall: 0.8308 - val_auc: 0.9533
Epoch 8/100
145820/145820 [==============================] - 12s 81us/sample - loss: 0.1616 - tp: 229.0000 - fp: 3793.0000 - tn: 141779.0000 - fn: 19.0000 - accuracy: 0.9739 - precision: 0.0569 - recall: 0.9234 - auc: 0.9829 - val_loss: 0.3709 - val_tp: 54.0000 - val_fp: 419.0000 - val_tn: 35972.0000 - val_fn: 11.0000 - val_accuracy: 0.9882 - val_precision: 0.1142 - val_recall: 0.8308 - val_auc: 0.9523
Epoch 9/100
145820/145820 [==============================] - 12s 79us/sample - loss: 0.1577 - tp: 228.0000 - fp: 3488.0000 - tn: 142084.0000 - fn: 20.0000 - accuracy: 0.9759 - precision: 0.0614 - recall: 0.9194 - auc: 0.9815 - val_loss: 0.3877 - val_tp: 53.0000 - val_fp: 389.0000 - val_tn: 36002.0000 - val_fn: 12.0000 - val_accuracy: 0.9890 - val_precision: 0.1199 - val_recall: 0.8154 - val_auc: 0.9518
Epoch 10/100
145820/145820 [==============================] - 12s 82us/sample - loss: 0.1736 - tp: 222.0000 - fp: 3125.0000 - tn: 142447.0000 - fn: 26.0000 - accuracy: 0.9784 - precision: 0.0663 - recall: 0.8952 - auc: 0.9773 - val_loss: 0.3946 - val_tp: 53.0000 - val_fp: 442.0000 - val_tn: 35949.0000 - val_fn: 12.0000 - val_accuracy: 0.9875 - val_precision: 0.1071 - val_recall: 0.8154 - val_auc: 0.9510
Epoch 11/100
145820/145820 [==============================] - 12s 79us/sample - loss: 0.1590 - tp: 228.0000 - fp: 3311.0000 - tn: 142261.0000 - fn: 20.0000 - accuracy: 0.9772 - precision: 0.0644 - recall: 0.9194 - auc: 0.9833 - val_loss: 0.4072 - val_tp: 53.0000 - val_fp: 602.0000 - val_tn: 35789.0000 - val_fn: 12.0000 - val_accuracy: 0.9832 - val_precision: 0.0809 - val_recall: 0.8154 - val_auc: 0.9503
Epoch 12/100
145820/145820 [==============================] - 12s 79us/sample - loss: 0.1560 - tp: 229.0000 - fp: 3845.0000 - tn: 141727.0000 - fn: 19.0000 - accuracy: 0.9735 - precision: 0.0562 - recall: 0.9234 - auc: 0.9848 - val_loss: 0.4290 - val_tp: 53.0000 - val_fp: 552.0000 - val_tn: 35839.0000 - val_fn: 12.0000 - val_accuracy: 0.9845 - val_precision: 0.0876 - val_recall: 0.8154 - val_auc: 0.9473
Epoch 13/100
145820/145820 [==============================] - 12s 79us/sample - loss: 0.1483 - tp: 227.0000 - fp: 4245.0000 - tn: 141327.0000 - fn: 21.0000 - accuracy: 0.9707 - precision: 0.0508 - recall: 0.9153 - auc: 0.9843 - val_loss: 0.4528 - val_tp: 53.0000 - val_fp: 411.0000 - val_tn: 35980.0000 - val_fn: 12.0000 - val_accuracy: 0.9884 - val_precision: 0.1142 - val_recall: 0.8154 - val_auc: 0.9517
Epoch 14/100
145820/145820 [==============================] - 12s 80us/sample - loss: 0.1439 - tp: 228.0000 - fp: 3533.0000 - tn: 142039.0000 - fn: 20.0000 - accuracy: 0.9756 - precision: 0.0606 - recall: 0.9194 - auc: 0.9843 - val_loss: 0.4666 - val_tp: 54.0000 - val_fp: 404.0000 - val_tn: 35987.0000 - val_fn: 11.0000 - val_accuracy: 0.9886 - val_precision: 0.1179 - val_recall: 0.8308 - val_auc: 0.9531
Epoch 15/100
145820/145820 [==============================] - 12s 81us/sample - loss: 0.1241 - tp: 230.0000 - fp: 3314.0000 - tn: 142258.0000 - fn: 18.0000 - accuracy: 0.9771 - precision: 0.0649 - recall: 0.9274 - auc: 0.9895 - val_loss: 0.4760 - val_tp: 54.0000 - val_fp: 394.0000 - val_tn: 35997.0000 - val_fn: 11.0000 - val_accuracy: 0.9889 - val_precision: 0.1205 - val_recall: 0.8308 - val_auc: 0.9520
Epoch 16/100
145820/145820 [==============================] - 11s 78us/sample - loss: 0.1434 - tp: 228.0000 - fp: 2942.0000 - tn: 142630.0000 - fn: 20.0000 - accuracy: 0.9797 - precision: 0.0719 - recall: 0.9194 - auc: 0.9876 - val_loss: 0.4983 - val_tp: 53.0000 - val_fp: 422.0000 - val_tn: 35969.0000 - val_fn: 12.0000 - val_accuracy: 0.9881 - val_precision: 0.1116 - val_recall: 0.8154 - val_auc: 0.9539
Epoch 17/100
145820/145820 [==============================] - 12s 80us/sample - loss: 0.1236 - tp: 231.0000 - fp: 3415.0000 - tn: 142157.0000 - fn: 17.0000 - accuracy: 0.9765 - precision: 0.0634 - recall: 0.9315 - auc: 0.9889 - val_loss: 0.5492 - val_tp: 53.0000 - val_fp: 346.0000 - val_tn: 36045.0000 - val_fn: 12.0000 - val_accuracy: 0.9902 - val_precision: 0.1328 - val_recall: 0.8154 - val_auc: 0.9531
Epoch 18/100
145820/145820 [==============================] - 12s 81us/sample - loss: 0.1861 - tp: 226.0000 - fp: 3521.0000 - tn: 142051.0000 - fn: 22.0000 - accuracy: 0.9757 - precision: 0.0603 - recall: 0.9113 - auc: 0.9797 - val_loss: 0.5217 - val_tp: 54.0000 - val_fp: 467.0000 - val_tn: 35924.0000 - val_fn: 11.0000 - val_accuracy: 0.9869 - val_precision: 0.1036 - val_recall: 0.8308 - val_auc: 0.9527
Epoch 19/100
145820/145820 [==============================] - 12s 79us/sample - loss: 0.1589 - tp: 227.0000 - fp: 3819.0000 - tn: 141753.0000 - fn: 21.0000 - accuracy: 0.9737 - precision: 0.0561 - recall: 0.9153 - auc: 0.9851 - val_loss: 0.5254 - val_tp: 54.0000 - val_fp: 477.0000 - val_tn: 35914.0000 - val_fn: 11.0000 - val_accuracy: 0.9866 - val_precision: 0.1017 - val_recall: 0.8308 - val_auc: 0.9514
Epoch 20/100
145820/145820 [==============================] - 11s 77us/sample - loss: 0.1606 - tp: 232.0000 - fp: 4406.0000 - tn: 141166.0000 - fn: 16.0000 - accuracy: 0.9697 - precision: 0.0500 - recall: 0.9355 - auc: 0.9837 - val_loss: 0.4894 - val_tp: 54.0000 - val_fp: 680.0000 - val_tn: 35711.0000 - val_fn: 11.0000 - val_accuracy: 0.9810 - val_precision: 0.0736 - val_recall: 0.8308 - val_auc: 0.9583
Epoch 21/100
145820/145820 [==============================] - 10s 72us/sample - loss: 0.1617 - tp: 227.0000 - fp: 3885.0000 - tn: 141687.0000 - fn: 21.0000 - accuracy: 0.9732 - precision: 0.0552 - recall: 0.9153 - auc: 0.9843 - val_loss: 0.5563 - val_tp: 53.0000 - val_fp: 320.0000 - val_tn: 36071.0000 - val_fn: 12.0000 - val_accuracy: 0.9909 - val_precision: 0.1421 - val_recall: 0.8154 - val_auc: 0.9482
Epoch 22/100
145820/145820 [==============================] - 11s 73us/sample - loss: 0.1542 - tp: 226.0000 - fp: 3106.0000 - tn: 142466.0000 - fn: 22.0000 - accuracy: 0.9785 - precision: 0.0678 - recall: 0.9113 - auc: 0.9863 - val_loss: 0.5508 - val_tp: 55.0000 - val_fp: 601.0000 - val_tn: 35790.0000 - val_fn: 10.0000 - val_accuracy: 0.9832 - val_precision: 0.0838 - val_recall: 0.8462 - val_auc: 0.9499
Epoch 23/100
145820/145820 [==============================] - 11s 77us/sample - loss: 0.1441 - tp: 233.0000 - fp: 3992.0000 - tn: 141580.0000 - fn: 15.0000 - accuracy: 0.9725 - precision: 0.0551 - recall: 0.9395 - auc: 0.9868 - val_loss: 0.5735 - val_tp: 54.0000 - val_fp: 465.0000 - val_tn: 35926.0000 - val_fn: 11.0000 - val_accuracy: 0.9869 - val_precision: 0.1040 - val_recall: 0.8308 - val_auc: 0.9496
Epoch 24/100
145820/145820 [==============================] - 11s 74us/sample - loss: 0.1479 - tp: 227.0000 - fp: 3860.0000 - tn: 141712.0000 - fn: 21.0000 - accuracy: 0.9734 - precision: 0.0555 - recall: 0.9153 - auc: 0.9842 - val_loss: 0.5873 - val_tp: 53.0000 - val_fp: 150.0000 - val_tn: 36241.0000 - val_fn: 12.0000 - val_accuracy: 0.9956 - val_precision: 0.2611 - val_recall: 0.8154 - val_auc: 0.9500
Epoch 25/100
145820/145820 [==============================] - 11s 74us/sample - loss: 0.1563 - tp: 227.0000 - fp: 3683.0000 - tn: 141889.0000 - fn: 21.0000 - accuracy: 0.9746 - precision: 0.0581 - recall: 0.9153 - auc: 0.9829 - val_loss: 0.6188 - val_tp: 54.0000 - val_fp: 451.0000 - val_tn: 35940.0000 - val_fn: 11.0000 - val_accuracy: 0.9873 - val_precision: 0.1069 - val_recall: 0.8308 - val_auc: 0.9485
Epoch 26/100
145820/145820 [==============================] - 11s 76us/sample - loss: 0.1505 - tp: 229.0000 - fp: 4312.0000 - tn: 141260.0000 - fn: 19.0000 - accuracy: 0.9703 - precision: 0.0504 - recall: 0.9234 - auc: 0.9865 - val_loss: 0.6347 - val_tp: 53.0000 - val_fp: 362.0000 - val_tn: 36029.0000 - val_fn: 12.0000 - val_accuracy: 0.9897 - val_precision: 0.1277 - val_recall: 0.8154 - val_auc: 0.9490
Epoch 27/100
145820/145820 [==============================] - 11s 75us/sample - loss: 0.1554 - tp: 229.0000 - fp: 3930.0000 - tn: 141642.0000 - fn: 19.0000 - accuracy: 0.9729 - precision: 0.0551 - recall: 0.9234 - auc: 0.9833 - val_loss: 0.6000 - val_tp: 53.0000 - val_fp: 354.0000 - val_tn: 36037.0000 - val_fn: 12.0000 - val_accuracy: 0.9900 - val_precision: 0.1302 - val_recall: 0.8154 - val_auc: 0.9473
Epoch 28/100
145820/145820 [==============================] - 11s 74us/sample - loss: 0.1415 - tp: 223.0000 - fp: 3911.0000 - tn: 141661.0000 - fn: 25.0000 - accuracy: 0.9730 - precision: 0.0539 - recall: 0.8992 - auc: 0.9862 - val_loss: 0.6234 - val_tp: 54.0000 - val_fp: 449.0000 - val_tn: 35942.0000 - val_fn: 11.0000 - val_accuracy: 0.9874 - val_precision: 0.1074 - val_recall: 0.8308 - val_auc: 0.9455
Epoch 29/100
145820/145820 [==============================] - 11s 73us/sample - loss: 0.1366 - tp: 230.0000 - fp: 3395.0000 - tn: 142177.0000 - fn: 18.0000 - accuracy: 0.9766 - precision: 0.0634 - recall: 0.9274 - auc: 0.9870 - val_loss: 0.7174 - val_tp: 54.0000 - val_fp: 461.0000 - val_tn: 35930.0000 - val_fn: 11.0000 - val_accuracy: 0.9871 - val_precision: 0.1049 - val_recall: 0.8308 - val_auc: 0.9384
Epoch 30/100
145312/145820 [============================>.] - ETA: 0s - loss: 0.1680 - tp: 227.0000 - fp: 4710.0000 - tn: 140354.0000 - fn: 21.0000 - accuracy: 0.9674 - precision: 0.0460 - recall: 0.9153 - auc: 0.9825Restoring model weights from the end of the best epoch.
145820/145820 [==============================] - 11s 74us/sample - loss: 0.1676 - tp: 227.0000 - fp: 4725.0000 - tn: 140847.0000 - fn: 21.0000 - accuracy: 0.9675 - precision: 0.0458 - recall: 0.9153 - auc: 0.9825 - val_loss: 0.7547 - val_tp: 53.0000 - val_fp: 314.0000 - val_tn: 36077.0000 - val_fn: 12.0000 - val_accuracy: 0.9911 - val_precision: 0.1444 - val_recall: 0.8154 - val_auc: 0.9379
Epoch 00030: early stopping
In [34]:
test_predictions_weighted = weighted_model.predict(X_test)
cm = confusion_matrix(y_test, test_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on test data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[34]:
Text(0.5, 24.0, 'Predicted label')
In [35]:
all_predictions_weighted = weighted_model.predict(df.drop('Class', axis=1))
cm = confusion_matrix(df['Class'], all_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[35]:
Text(0.5, 24.0, 'Predicted label')

We have reduced the amount of False Negatives with 38, but the amount of False Postives has increased. In an attempt to improve this further, we run our model on the evenly distributed data.

In [36]:
sample_train_df, sample_val_df = train_test_split(new_train_df, test_size=0.2)
sample_train_features = sample_train_df.drop('Class', axis=1)
sample_train_labels = sample_train_df['Class']
sample_val_features = sample_val_df.drop('Class', axis=1)
sample_val_labels = sample_val_df['Class']

sample_weighted_history = weighted_model.fit(
    sample_train_features,
    sample_train_labels,
    epochs=100,
    callbacks=[early_stopping],
    validation_data=(sample_val_features, sample_val_labels),
    class_weight=class_weights
)
W0707 23:42:51.811549 13900 data_adapter.py:1091] sample_weight modes were coerced from
  ...
    to
  ['...']
W0707 23:42:51.877387 13900 data_adapter.py:1091] sample_weight modes were coerced from
  ...
    to
  ['...']
Train on 629 samples, validate on 158 samples
Epoch 1/100
629/629 [==============================] - 0s 109us/sample - loss: 23.8736 - tp: 283.0000 - fp: 23.0000 - tn: 302.0000 - fn: 21.0000 - accuracy: 0.9300 - precision: 0.9248 - recall: 0.9309 - auc: 0.9757 - val_loss: 26.9609 - val_tp: 75.0000 - val_fp: 4.0000 - val_tn: 72.0000 - val_fn: 7.0000 - val_accuracy: 0.9304 - val_precision: 0.9494 - val_recall: 0.9146 - val_auc: 0.9671
Epoch 2/100
629/629 [==============================] - 0s 116us/sample - loss: 14.6170 - tp: 293.0000 - fp: 44.0000 - tn: 281.0000 - fn: 11.0000 - accuracy: 0.9126 - precision: 0.8694 - recall: 0.9638 - auc: 0.9788 - val_loss: 21.2953 - val_tp: 76.0000 - val_fp: 10.0000 - val_tn: 66.0000 - val_fn: 6.0000 - val_accuracy: 0.8987 - val_precision: 0.8837 - val_recall: 0.9268 - val_auc: 0.9675
Epoch 3/100
629/629 [==============================] - 0s 108us/sample - loss: 12.8326 - tp: 296.0000 - fp: 50.0000 - tn: 275.0000 - fn: 8.0000 - accuracy: 0.9078 - precision: 0.8555 - recall: 0.9737 - auc: 0.9744 - val_loss: 18.2975 - val_tp: 76.0000 - val_fp: 15.0000 - val_tn: 61.0000 - val_fn: 6.0000 - val_accuracy: 0.8671 - val_precision: 0.8352 - val_recall: 0.9268 - val_auc: 0.9650
Epoch 4/100
629/629 [==============================] - 0s 103us/sample - loss: 8.8605 - tp: 296.0000 - fp: 60.0000 - tn: 265.0000 - fn: 8.0000 - accuracy: 0.8919 - precision: 0.8315 - recall: 0.9737 - auc: 0.9774 - val_loss: 16.4870 - val_tp: 78.0000 - val_fp: 17.0000 - val_tn: 59.0000 - val_fn: 4.0000 - val_accuracy: 0.8671 - val_precision: 0.8211 - val_recall: 0.9512 - val_auc: 0.9649
Epoch 5/100
629/629 [==============================] - 0s 113us/sample - loss: 9.0883 - tp: 297.0000 - fp: 72.0000 - tn: 253.0000 - fn: 7.0000 - accuracy: 0.8744 - precision: 0.8049 - recall: 0.9770 - auc: 0.9711 - val_loss: 14.4957 - val_tp: 79.0000 - val_fp: 21.0000 - val_tn: 55.0000 - val_fn: 3.0000 - val_accuracy: 0.8481 - val_precision: 0.7900 - val_recall: 0.9634 - val_auc: 0.9641
Epoch 6/100
629/629 [==============================] - 0s 109us/sample - loss: 8.8053 - tp: 297.0000 - fp: 79.0000 - tn: 246.0000 - fn: 7.0000 - accuracy: 0.8633 - precision: 0.7899 - recall: 0.9770 - auc: 0.9729 - val_loss: 12.9282 - val_tp: 79.0000 - val_fp: 24.0000 - val_tn: 52.0000 - val_fn: 3.0000 - val_accuracy: 0.8291 - val_precision: 0.7670 - val_recall: 0.9634 - val_auc: 0.9642
Epoch 7/100
629/629 [==============================] - 0s 113us/sample - loss: 9.3549 - tp: 294.0000 - fp: 93.0000 - tn: 232.0000 - fn: 10.0000 - accuracy: 0.8362 - precision: 0.7597 - recall: 0.9671 - auc: 0.9631 - val_loss: 11.6858 - val_tp: 79.0000 - val_fp: 24.0000 - val_tn: 52.0000 - val_fn: 3.0000 - val_accuracy: 0.8291 - val_precision: 0.7670 - val_recall: 0.9634 - val_auc: 0.9644
Epoch 8/100
629/629 [==============================] - 0s 129us/sample - loss: 8.2175 - tp: 294.0000 - fp: 89.0000 - tn: 236.0000 - fn: 10.0000 - accuracy: 0.8426 - precision: 0.7676 - recall: 0.9671 - auc: 0.9597 - val_loss: 10.9104 - val_tp: 79.0000 - val_fp: 27.0000 - val_tn: 49.0000 - val_fn: 3.0000 - val_accuracy: 0.8101 - val_precision: 0.7453 - val_recall: 0.9634 - val_auc: 0.9644
Epoch 9/100
629/629 [==============================] - 0s 120us/sample - loss: 4.2633 - tp: 299.0000 - fp: 102.0000 - tn: 223.0000 - fn: 5.0000 - accuracy: 0.8299 - precision: 0.7456 - recall: 0.9836 - auc: 0.9730 - val_loss: 9.9166 - val_tp: 79.0000 - val_fp: 27.0000 - val_tn: 49.0000 - val_fn: 3.0000 - val_accuracy: 0.8101 - val_precision: 0.7453 - val_recall: 0.9634 - val_auc: 0.9638
Epoch 10/100
629/629 [==============================] - 0s 119us/sample - loss: 4.8526 - tp: 297.0000 - fp: 115.0000 - tn: 210.0000 - fn: 7.0000 - accuracy: 0.8060 - precision: 0.7209 - recall: 0.9770 - auc: 0.9692 - val_loss: 9.0297 - val_tp: 81.0000 - val_fp: 28.0000 - val_tn: 48.0000 - val_fn: 1.0000 - val_accuracy: 0.8165 - val_precision: 0.7431 - val_recall: 0.9878 - val_auc: 0.9641
Epoch 11/100
629/629 [==============================] - 0s 106us/sample - loss: 3.5489 - tp: 301.0000 - fp: 123.0000 - tn: 202.0000 - fn: 3.0000 - accuracy: 0.7997 - precision: 0.7099 - recall: 0.9901 - auc: 0.9738 - val_loss: 8.6270 - val_tp: 81.0000 - val_fp: 29.0000 - val_tn: 47.0000 - val_fn: 1.0000 - val_accuracy: 0.8101 - val_precision: 0.7364 - val_recall: 0.9878 - val_auc: 0.9639
Epoch 12/100
 32/629 [>.............................] - ETA: 0s - loss: 2.1954 - tp: 14.0000 - fp: 9.0000 - tn: 9.0000 - fn: 0.0000e+00 - accuracy: 0.7188 - precision: 0.6087 - recall: 1.0000 - auc: 0.9702Restoring model weights from the end of the best epoch.
629/629 [==============================] - 0s 106us/sample - loss: 4.8951 - tp: 297.0000 - fp: 119.0000 - tn: 206.0000 - fn: 7.0000 - accuracy: 0.7997 - precision: 0.7139 - recall: 0.9770 - auc: 0.9662 - val_loss: 8.2261 - val_tp: 81.0000 - val_fp: 30.0000 - val_tn: 46.0000 - val_fn: 1.0000 - val_accuracy: 0.8038 - val_precision: 0.7297 - val_recall: 0.9878 - val_auc: 0.9637
Epoch 00012: early stopping
In [37]:
test_predictions_weighted = weighted_model.predict(X_test_new)
cm = confusion_matrix(y_test_new, test_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on test data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[37]:
Text(0.5, 24.0, 'Predicted label')
In [38]:
all_predictions_weighted = weighted_model.predict(df.drop('Class', axis=1))
cm = confusion_matrix(df['Class'], all_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Out[38]:
Text(0.5, 24.0, 'Predicted label')

We have again decreased the number of False Negatives to 20, but as a result increased the number of False Positives to 21,557. 4.07% of frauds go undetected, while 7.59% of legitimate transactions are wrongly identified as fraud.

Conclusion

There is clearly a decision to be made by banks. Either detect more fraud and risk annoying customers with a high number of legitimate transactions classed as fraudulent, or detect slightly less fraudulent transactions but keep False Positives low.

Some considerations for improvement:

  • Remove the time column as it is unclear how it helps the model predict fraud
  • Setting an initial bias to help with initial learning
  • Alter model layers and parameters