An end-to-end machine learning application that predicts loan default probability using XGBoost with SHAP explainability, a Flask API backend, and a modern React frontend featuring Firebase authentication and real-time prediction history.
This system analyzes loan applicant data to predict the likelihood of default, providing both a risk score and transparent explanations of which factors influence the prediction. It is built for financial institutions and lenders who need interpretable, AI-driven credit risk assessments.
- Machine Learning Pipeline: XGBoost classifier trained on Lending Club historical data (2007-2018)
- Explainable AI: SHAP values showing which features increase/decrease default risk
- REST API: Flask backend with CORS support for predictions
- User Authentication: Firebase email/password authentication with password reset
- Real-time History: Cloud Firestore for storing and displaying user-specific prediction history
- Modern UI: React with Tailwind CSS, glassmorphic design, and smooth animations
- Python 3.8+
- Flask - REST API framework
- XGBoost - Gradient boosting classifier
- SHAP - Model interpretability
- scikit-learn - ML utilities
- imbalanced-learn (SMOTE) - Handling class imbalance
- pandas/numpy - Data processing
- React 19 with Vite
- Tailwind CSS 4 - Styling
- Firebase - Authentication & Firestore database
- Lucide React - Icons
credit-risk-ml/
├── credit_risk_model.py # ML model training & prediction logic
├── api.py # Flask REST API
├── requirements.txt # Python dependencies
├── src/
│ ├── App.jsx # Main React component with auth
│ ├── main.jsx # React entry point
│ ├── index.css # Tailwind configuration
│ └── dashboard.jsx # Alternative dashboard (legacy)
└── package.json # Node.js dependencies
- Python 3.8 or higher
- Node.js 16 or higher
- Firebase account (for authentication and database)
- Clone the repository:
git clone https://github.com/yourusername/credit-risk-ml.git
cd credit-risk-ml
- Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- Download the Lending Club dataset:
- Place accepted_2007_to_2018Q4.csv.gz in the ai_lending_club_loan_data/ directory
- Dataset available from Kaggle Lending Club Dataset
- Train the model (this may take several minutes):
python credit_risk_model.py
This will generate:
- credit_risk_model.pkl - Trained XGBoost model
- shap_explainer.pkl - SHAP explainer
- model_columns.pkl - Feature columns
- Start the Flask API:
python api.py
The API will run on http://127.0.0.1:5001
- Install Node.js dependencies:
npm install
- Configure Firebase:
- Create a Firebase project at console.firebase.google.com
- Enable Email/Password authentication
- Create a Firestore database in test mode
- Copy your Firebase configuration from Project Settings
- Replace the firebaseConfig object in src/App.jsx with your credentials
- Start the development server:
npm run dev
The frontend will run on http://localhost:5173
- Sign Up / Sign In: Create an account or log in with existing credentials
- Enter Loan Data: Fill in the application form with:
- Loan amount and interest rate
- Annual income and employment length
- Loan term, grade, home ownership status
- Loan purpose
- Get Prediction: Click "Calculate Risk Score" to receive:
- Default probability percentage
- Risk level (Low/Medium/High)
- Top 10 features influencing the prediction
- View History: See all your past predictions in the history panel
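The risk level and the top-feature list shown after a prediction can both be derived from the API response. The sketch below is one possible way to do that, not the application's actual logic: the cut-offs `LOW_MAX` and `MEDIUM_MAX` and the helper names are hypothetical, since the thresholds between Low, Medium, and High are not stated here. The sample values are the ones from the example response.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-offs: the Low/Medium/High levels are named in this README,
# but the thresholds that separate them are not.
LOW_MAX = 0.20
MEDIUM_MAX = 0.50


def risk_level(default_probability: float) -> str:
    """Map a default probability in [0, 1] to a coarse risk label."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability < LOW_MAX:
        return "Low"
    if default_probability < MEDIUM_MAX:
        return "Medium"
    return "High"


def top_features(explanation: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n features with the largest absolute SHAP value."""
    ranked = sorted(explanation.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Values copied from the example API response.
response = {
    "default_probability": 0.234,
    "explanation": {"int_rate": 0.045, "annual_inc": -0.023, "loan_amnt": 0.012},
}
level = risk_level(response["default_probability"])
drivers = top_features(response["explanation"])
print(f"{response['default_probability']:.1%} default probability, {level} risk")
for name, value in drivers:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} predicted risk (SHAP {value:+.3f})")
```

Ranking by absolute value keeps strongly protective features (negative SHAP values) in the list alongside the ones that push risk up.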
Predicts loan default probability.
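A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The route name `/predict` and the POST method are assumptions, because the endpoint path is not given in this text; only the host and port come from the setup steps. The sample payload is abbreviated for illustration, and a real call would send every field of the documented request body.

```python
import json
import urllib.request

# Assumptions: the route "/predict" and the POST method are not stated in this
# README; only the host and port (127.0.0.1:5001) come from the setup steps.
API_BASE_URL = "http://127.0.0.1:5001"
PREDICT_PATH = "/predict"  # hypothetical route


def build_predict_request(payload: dict, base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one loan application."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, timeout: float = 10.0) -> dict:
    """Send the request to a running API and decode the JSON response."""
    request = build_predict_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Abbreviated payload for illustration only.
sample = {"loan_amnt": 10000, "int_rate": 11.5, "term": " 36 months", "grade": "B"}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
```

Calling `predict(payload)` requires the Flask API to be running locally; building the request separately makes it possible to inspect the outgoing JSON without a server.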
Request Body:
{
"loan_amnt": 10000,
"int_rate": 11.5,
"annual_inc": 75000,
"emp_length": 5,
"term": " 36 months",
"grade": "B",
"home_ownership": "MORTGAGE",
"purpose": "debt_consolidation",
"dti": 20.0,
"fico_range_low": 690,
"fico_range_high": 694,
"open_acc": 10,
"pub_rec": 0,
"revol_bal": 15000,
"revol_util": 50.0,
"total_acc": 25,
"issue_d": "Dec-2015",
"earliest_cr_line": "Jan-2005"
}
Response:
{
"default_probability": 0.234,
"explanation": {
"int_rate": 0.045,
"annual_inc": -0.023,
"loan_amnt": 0.012,
...
}
}
- Data Loading: Samples 10% of the Lending Club data for efficient training
- Preprocessing:
- Extracts numeric values from term and employment length
- Calculates credit history length from date fields
- One-hot encodes categorical variables
- Handles missing values (drops columns with more than 40% missing values, fills the rest with the column median)
- Class Balancing: Applies SMOTE to training data
- Model Training: XGBoost binary classifier with logloss evaluation
- Explainability: Builds a SHAP TreeExplainer on the trained model for feature importance
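The preprocessing steps listed above can be sketched with pandas as follows. This is an illustrative reconstruction rather than the code in credit_risk_model.py: the function name, the `credit_history_years` column, the choice of years as the unit, and the toy rows (including the `mostly_missing` column) are all assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, max_missing: float = 0.40) -> pd.DataFrame:
    """Apply the listed preprocessing steps to a raw Lending Club style frame."""
    out = df.copy()

    # Pull the numeric part out of strings such as " 36 months" or "10+ years".
    out["term"] = out["term"].astype(str).str.extract(r"(\d+)", expand=False).astype(float)
    out["emp_length"] = out["emp_length"].astype(str).str.extract(r"(\d+)", expand=False).astype(float)

    # Credit history length between the two date fields (years is an assumed unit).
    issued = pd.to_datetime(out["issue_d"], format="%b-%Y")
    earliest = pd.to_datetime(out["earliest_cr_line"], format="%b-%Y")
    out["credit_history_years"] = (issued - earliest).dt.days / 365.25
    out = out.drop(columns=["issue_d", "earliest_cr_line"])

    # One-hot encode the remaining categorical (string) columns.
    out = pd.get_dummies(out, dtype=float)

    # Drop columns with more than 40% missing values, then median-fill the rest.
    out = out.loc[:, out.isna().mean() <= max_missing]
    return out.fillna(out.median(numeric_only=True))


# Toy rows invented for illustration; "mostly_missing" is a hypothetical column.
raw = pd.DataFrame({
    "loan_amnt": [10000, 5000, 20000],
    "term": [" 36 months", " 60 months", " 36 months"],
    "emp_length": ["5 years", "10+ years", "< 1 year"],
    "annual_inc": [75000.0, None, 50000.0],
    "grade": ["B", "A", "B"],
    "mostly_missing": [None, None, 1.0],
    "issue_d": ["Dec-2015", "Jan-2016", "Mar-2017"],
    "earliest_cr_line": ["Jan-2005", "Jun-2000", "Feb-2010"],
})
clean = preprocess(raw)
print(clean.columns.tolist())
```

Note that this simple extraction maps "< 1 year" to 1; a real pipeline may prefer to treat it as 0.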
The model is evaluated on a held-out test set with classification reports showing precision, recall, and F1-scores for both "Good Loan" and "Default" classes.
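As a small illustration of that kind of report, the sketch below calls scikit-learn's `classification_report` on toy labels. The labels are invented purely to show the call and are not results from this model; the encoding 0 = "Good Loan", 1 = "Default" is also an assumption.

```python
from sklearn.metrics import classification_report

# Toy labels invented for illustration only; they are not model results.
# Assumed encoding: 0 = "Good Loan", 1 = "Default".
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

report = classification_report(
    y_true,
    y_pred,
    target_names=["Good Loan", "Default"],
    output_dict=True,
)
for label in ("Good Loan", "Default"):
    scores = report[label]
    print(
        f"{label}: precision={scores['precision']:.2f} "
        f"recall={scores['recall']:.2f} f1={scores['f1-score']:.2f}"
    )
```

For credit risk, recall on the "Default" class is usually the number to watch, since a missed default tends to cost more than a declined good loan.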
The model accepts the following features:
- Loan Details: amount, interest rate, term, grade, purpose
- Applicant Profile: annual income, employment length, home ownership
- Credit Metrics: DTI, FICO scores, open accounts, public records, revolving balance/utilization, total accounts
- Credit History: issue date, earliest credit line (for history length calculation)
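Because the model expects all of these features, it can help to check a payload before sending it. The following is a minimal sketch of such a check: the field names come from the sample request body, while the expected Python types and the `validate_payload` helper are assumptions made for this example and are not part of the project's API.

```python
from typing import Dict, List

NUMBER = (int, float)

# Field names are taken from the sample request body; the expected types
# are assumptions made for this sketch.
REQUIRED_FIELDS: Dict[str, tuple] = {
    "loan_amnt": NUMBER, "int_rate": NUMBER, "annual_inc": NUMBER,
    "emp_length": NUMBER, "term": (str,), "grade": (str,),
    "home_ownership": (str,), "purpose": (str,), "dti": NUMBER,
    "fico_range_low": NUMBER, "fico_range_high": NUMBER,
    "open_acc": NUMBER, "pub_rec": NUMBER, "revol_bal": NUMBER,
    "revol_util": NUMBER, "total_acc": NUMBER,
    "issue_d": (str,), "earliest_cr_line": (str,),
}


def validate_payload(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = []
    for field, allowed in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems


# Deliberately incomplete payload: int_rate is a string and most fields are absent.
incomplete = {"loan_amnt": 10000, "int_rate": "11.5", "grade": "B"}
problems = validate_payload(incomplete)
for problem in problems:
    print(problem)
```

Collecting all problems in one pass, rather than raising on the first one, lets a form report every missing or malformed field at once.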
This project is licensed under the MIT License - see the LICENSE file for details.
- Lending Club for providing the historical loan data
- SHAP library for model interpretability
- Firebase for authentication and database infrastructure
Aiden Kim - ajk041@bucknell.edu
Project Link: https://github.com/A319K/credit-risk-model