This dataset is available on Kaggle. It contains transactions made with credit cards by European cardholders in September 2013.
The dataset is heavily imbalanced, with only a small minority of the transactions being fraudulent. Apart from Time and Amount, it contains only numeric features produced by a PCA transformation. First, load the required libraries and take a look at the data.
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import confusion_matrix
import sklearn
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
df = pd.read_csv("creditcard.csv")
df.head()
df.describe()
print(round(df['Class'].value_counts()[1]/len(df) * 100, 3), "% are fraudulent")
As we can see, only a small minority of the transactions are fraudulent. A model could therefore achieve high accuracy simply by predicting "not fraudulent" for every transaction, while being useless at actually preventing fraud.
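As a quick sanity check, this is what that trivial baseline would score (a minimal sketch using the df we loaded above):
# A "model" that labels every transaction as legitimate is correct
# whenever the true class is 0, so its accuracy is simply the share of class 0.
baseline_accuracy = (df['Class'] == 0).mean()
print("All-legitimate baseline accuracy: {:.3%}".format(baseline_accuracy))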
sns.countplot(x='Class', data=df)
plt.title("Distribution of Class")
sns.histplot(df['Amount'].values, color="red", kde=True)
plt.title("Distribution of Transaction Amount")
plt.xlim([min(df['Amount'].values), max(df['Amount'].values)])
The Time and Amount columns are the only ones not produced by the PCA transformation, so they should be scaled to a range comparable to the other columns. We use RobustScaler because it is robust to the outliers present in Amount.
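For intuition, this is roughly what RobustScaler computes under the hood (a hand-rolled sketch, assuming the default 25-75 quantile range):
# RobustScaler subtracts the median and divides by the interquartile range,
# so the large outliers in Amount do not dominate the scaling.
amount = df['Amount'].values
q1, median, q3 = np.percentile(amount, [25, 50, 75])
amount_scaled_by_hand = (amount - median) / (q3 - q1)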
robust = RobustScaler()
df['scaled_amount'] = robust.fit_transform(df['Amount'].values.reshape(-1, 1))
df['scaled_time'] = robust.fit_transform(df['Time'].values.reshape(-1, 1))
df.drop(['Time', 'Amount'], axis=1, inplace=True)
scaled_amount = df['scaled_amount']
scaled_time = df['scaled_time']
# Insert the scaled columns at the front of the dataframe
df.drop(['scaled_amount', 'scaled_time'], axis=1, inplace=True)
df.insert(0, 'scaled_amount', scaled_amount)
df.insert(1, 'scaled_time', scaled_time)
df.head()
Now we have our new columns scaled to match the rest of the data.
First, let's see what happens when we run a model on the imbalanced data, and then see how we can improve.
train_df, test_df = train_test_split(df, test_size=0.2)
y_train = train_df['Class']
X_train = train_df.drop('Class', axis=1)
y_test = test_df['Class']
X_test = test_df.drop('Class', axis=1)
print("Train data has ", round(y_train.value_counts()[1]/len(y_train) * 100, 3), "% of class fraud")
print("Test data has ", round(y_test.value_counts()[1]/len(y_test) * 100, 3), "% of class fraud")
model = keras.Sequential([
keras.layers.Dense(30, input_shape=(30,), activation='relu'),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, verbose=2)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)
predictions = model.predict(X_test)
# Take the class with the highest predicted probability for each transaction
predicted_class = np.argmax(predictions, axis=1)
cm = confusion_matrix(y_test, predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
It appears our model is very accurate, getting over 99% of cases correct. However, to really tell if this is the model we want, we need to look deeper. Due to the imbalanced data, if the model just guessed not fraud for each transaction it would also get over 99% accuracy.
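One way to look deeper is the per-class precision and recall instead of overall accuracy; a minimal sketch using scikit-learn on the test predictions above:
from sklearn.metrics import classification_report
# Recall on class 1 shows what fraction of actual fraud we catch,
# which overall accuracy hides almost completely.
print(classification_report(y_test, predicted_class, digits=4))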
Let's take a look at what it predicts for the whole dataset (including the data it was trained on).
test_predictions = model.predict(df.drop('Class', axis=1))
predicted_class = np.argmax(test_predictions, axis=1)
cm = confusion_matrix(df['Class'], predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
We can see that this model misses 89 cases of fraud and wrongly classifies 28 legitimate transactions as fraud. For a bank, this means 28 customers would either be contacted to validate their details or have their card blocked as a result of the wrong classification, while 89 fraudulent transactions would go undetected.
There is definitely some room to improve. Resampling the data so it contains equal numbers of fraudulent and legitimate transactions (random undersampling) might improve our model.
First, let's create a dataframe with 492 cases of each class, as we only have 492 cases of fraud in total.
fraud_df = df.loc[df['Class'] == 1]
nfraud_df = df.loc[df['Class'] == 0]
# Sample 492 legitimate transactions at random; taking the first 492 would bias
# the sample, since the data is ordered by time
small_df = pd.concat([fraud_df, nfraud_df.sample(n=492, random_state=24)])
new_df = small_df.sample(frac=1, random_state=24)
print(new_df['Class'].value_counts()/len(new_df))
We now have a dataset that is evenly distributed. Now, let's split it into train and test sets, give it to our model, and see if the results improve.
new_train_df, new_test_df = train_test_split(new_df, test_size=0.2)
y_train_new = new_train_df['Class']
X_train_new = new_train_df.drop('Class', axis=1)
y_test_new = new_test_df['Class']
X_test_new = new_test_df.drop('Class', axis=1)
new_model = keras.Sequential([
keras.layers.Dense(30, input_shape=(30,), activation='relu'),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dense(2, activation='softmax')
])
new_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
new_model.fit(X_train_new, y_train_new, epochs=20, verbose=2)
test_loss, test_acc = new_model.evaluate(X_test_new, y_test_new, verbose=2)
print('\nTest accuracy:', test_acc)
test_predictions = new_model.predict(df.drop('Class', axis=1))
predicted_class = np.argmax(test_predictions, axis=1)
cm = confusion_matrix(df['Class'], predicted_class)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
Legitimate Transactions Detected (True Negatives): 284293
Legitimate Transactions Incorrectly Detected (False Positives): 13
Fraudulent Transactions Missed (False Negatives): 106
Fraudulent Transactions Detected (True Positives): 386
This model fails to identify 106 fraudulent transactions. This is a result of discarding most of the data, keeping only 492 of the 284,315 legitimate transactions, which leads our model to falsely believe that fraudulent and legitimate transactions occur in equal numbers. We can try to improve our model by adding class weights and changing the design of our Keras model.
First, let's create the weights for each class.
# pos = fraud (class 1), neg = legitimate (class 0)
pos = df['Class'].value_counts()[1]
neg = df['Class'].value_counts()[0]
total = pos + neg
class_0_weight = (1 / neg)*(total)/2.0
class_1_weight = (1 / pos)*(total)/2.0
class_weights = {0: class_0_weight, 1: class_1_weight}
print("Weight for class 0: {:.2f}".format(class_0_weight))
print("Weight for class 1: {:.2f}".format(class_1_weight))
Once we have the weights, we will train the model with them in an attempt to improve it. We will also adjust the model slightly: adding a dropout layer to help prevent overfitting, including validation data, and creating a callback to stop training early if needed.
train_df, val_df = train_test_split(train_df, test_size=0.2)
train_features = train_df.drop('Class', axis=1)
train_labels = train_df['Class']
val_features = val_df.drop('Class', axis=1)
val_labels = val_df['Class']
early_stopping = tf.keras.callbacks.EarlyStopping(
monitor='val_auc',
verbose=1,
patience=10,
mode='max',
restore_best_weights=True)
metrics = [
keras.metrics.TruePositives(name='tp'),
keras.metrics.FalsePositives(name='fp'),
keras.metrics.TrueNegatives(name='tn'),
keras.metrics.FalseNegatives(name='fn'),
keras.metrics.BinaryAccuracy(name='accuracy'),
keras.metrics.Precision(name='precision'),
keras.metrics.Recall(name='recall'),
keras.metrics.AUC(name='auc'),
]
weighted_model = keras.Sequential([
keras.layers.Dense(16, input_shape=(X_train.shape[-1],), activation='relu'),
keras.layers.Dropout(0.5),
keras.layers.Dense(1, activation='sigmoid')
])
weighted_model.compile(
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
loss=keras.losses.BinaryCrossentropy(),
metrics=metrics
)
weighted_history = weighted_model.fit(
train_features,
train_labels,
epochs=100,
callbacks=[early_stopping],
validation_data=(val_features, val_labels),
class_weight=class_weights
)
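Optionally, the returned history can show how training progressed and where early stopping kicked in (a minimal sketch plotting the AUC metric recorded above):
plt.plot(weighted_history.history['auc'], label='train AUC')
plt.plot(weighted_history.history['val_auc'], label='val AUC')
plt.xlabel('Epoch')
plt.ylabel('AUC')
plt.legend()
plt.title('Training and validation AUC per epoch')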
test_predictions_weighted = weighted_model.predict(X_test)
cm = confusion_matrix(y_test, test_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on test data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_predictions_weighted = weighted_model.predict(df.drop('Class', axis=1))
cm = confusion_matrix(df['Class'], all_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
We have reduced the number of False Negatives to 38, but the number of False Positives has increased. In an attempt to improve this further, we run our model on the evenly distributed data.
sample_train_df, sample_val_df = train_test_split(new_train_df, test_size=0.2)
sample_train_features = sample_train_df.drop('Class', axis=1)
sample_train_labels = sample_train_df['Class']
sample_val_features = sample_val_df.drop('Class', axis=1)
sample_val_labels = sample_val_df['Class']
sample_weighted_history = weighted_model.fit(
sample_train_features,
sample_train_labels,
epochs=100,
callbacks=[early_stopping],
validation_data=(sample_val_features, sample_val_labels),
class_weight=class_weights
)
test_predictions_weighted = weighted_model.predict(X_test_new)
cm = confusion_matrix(y_test_new, test_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on test data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_predictions_weighted = weighted_model.predict(df.drop('Class', axis=1))
cm = confusion_matrix(df['Class'], all_predictions_weighted > 0.5)
plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt="d")
plt.title('Confusion matrix on all data')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
We have again decreased the number of False Negatives, this time to 20, but as a result the number of False Positives has increased to 21,557. Now 4.07% of frauds go undetected, while 7.59% of legitimate transactions are wrongly identified as fraud.
There is clearly a decision to be made by banks: either detect more fraud and risk annoying customers with a high number of legitimate transactions flagged as fraudulent, or detect slightly fewer fraudulent transactions but keep False Positives low.
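One lever for this trade-off is the 0.5 decision threshold used above. A rough sketch of sweeping it over the scores we already computed shows how False Positives and False Negatives move in opposite directions:
# Sweep the decision threshold over the existing prediction scores.
for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    preds = (all_predictions_weighted > threshold).ravel()
    tn, fp, fn, tp = confusion_matrix(df['Class'], preds).ravel()
    print("threshold={:.1f}: false positives={}, false negatives={}".format(threshold, fp, fn))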
Some considerations for improvement: