Fraud Detection

This is a national competition held by Capital One that focuses on credit card fraud detection in a large-scale data set using machine learning. Teams were ranked on prediction accuracy, creativity in feature engineering, low total cost, and the final presentation; our team ranked 1300.

Feature engineering: designed three groups of new features (periodic features, velocity features, and impact features) and used Spark to process the 20 GB of raw data into a new 40 GB data set with 98 features.
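The velocity and periodic feature groups can be illustrated with a small single-machine sketch. It assumes each transaction carries a card identifier, a Unix timestamp, and an amount (the `Txn` field names and the function name are hypothetical, not taken from the competition data). The 24-hour trailing window is chosen only for illustration. Impact features are left out because the text does not define them. At the 20 GB scale described above, the same per-card trailing window would more naturally be expressed in Spark as a window partitioned by card, ordered by time, and bounded with `rangeBetween`.

```python
import math
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Txn:
    card_id: str   # hypothetical field names, for illustration only
    ts: int        # Unix timestamp in seconds
    amount: float


def add_features(txns, window_s=24 * 3600):
    """Return one feature dict per transaction, in timestamp order."""
    history = defaultdict(deque)  # card_id -> (ts, amount) pairs still inside the window
    rows = []
    for t in sorted(txns, key=lambda x: x.ts):
        q = history[t.card_id]
        # Drop this card's transactions that fell out of the trailing window.
        while q and q[0][0] <= t.ts - window_s:
            q.popleft()
        # Velocity features: activity on the same card in the preceding window,
        # excluding the current transaction itself.
        count_prev = len(q)
        amount_prev = sum(a for _, a in q)
        # Periodic features: hour of day (UTC) mapped onto the unit circle so
        # that 23:00 and 00:00 end up close together.
        hour = (t.ts % 86400) / 3600
        rows.append({
            "card_id": t.card_id,
            "ts": t.ts,
            "amount": t.amount,
            "txn_count_window": count_prev,
            "amount_sum_window": amount_prev,
            "hour_sin": math.sin(2 * math.pi * hour / 24),
            "hour_cos": math.cos(2 * math.pi * hour / 24),
        })
        q.append((t.ts, t.amount))
    return rows
```

Encoding the hour as a sine/cosine pair, rather than as a raw 0 to 23 integer, lets distance-based and linear models treat midnight as adjacent to late evening.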

Designed a sample-and-ensemble framework that fits 19 statistical models on different samples of the data; the models include gradient-boosted trees (XGBoost), regularized logistic regression, and SVMs. The ensemble achieved 98.72% out-of-sample prediction accuracy, with a false positive rate of 0.6% and a false negative rate of 0.7%.
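One way to read the sample-and-ensemble idea is: draw a different class-balanced sample for each model, fit the model on it, and average the predicted fraud probabilities. The sketch below assumes that reading, since the text does not say how the samples were drawn. It uses balanced undersampling of legitimate transactions as the sampling scheme. For every member it uses a small pure-Python L2-regularized logistic regression instead of the XGBoost/logistic regression/SVM mix, so it runs without third-party libraries. All function names are hypothetical. `error_rates` computes accuracy plus false positive and false negative rates. Normalizing false positives by the number of legitimate transactions and false negatives by the number of frauds is also an assumption, because the text does not give the denominators.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, l2=0.01, lr=0.1, epochs=200, seed=0):
    """L2-regularized logistic regression trained with plain SGD."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            g = p - y[i]  # gradient of the log-loss with respect to the logit
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g   # the bias is not regularized
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def balanced_sample(X, y, rng):
    # Keep every fraud case and draw an equal number of legitimate ones.
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    chosen = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_ensemble(X, y, n_models=19, seed=0):
    # Each member sees a different random sample of the legitimate transactions.
    rng = random.Random(seed)
    models = []
    for k in range(n_models):
        Xs, ys = balanced_sample(X, y, rng)
        models.append(fit_logreg(Xs, ys, seed=k))
    return models


def ensemble_predict(models, x, threshold=0.5):
    # Average the members' fraud probabilities, then apply one threshold.
    score = sum(predict_proba(m, x) for m in models) / len(models)
    return 1 if score >= threshold else 0


def error_rates(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of legitimate txns flagged
    fnr = fn / (fn + tp) if fn + tp else 0.0  # share of frauds missed
    return accuracy, fpr, fnr
```

Undersampling gives each member a balanced view of a heavily imbalanced problem, and averaging across members recovers information from the legitimate transactions that any single sample discards. Mixing in other model types per sample, or tuning `threshold` against the relative cost of missed fraud versus false alarms, would be natural extensions.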