I’ve seen a lot of hype around prediction APIs recently. This is obviously a byproduct of the current data science fad.
As a public service, I’m going to show you how you can build your own prediction API … and I’ll do it by creating a very basic version in 10 minutes.
We will build an API that will determine if we should provide credit to someone based on certain demographic information.
We will use Kaggle’s “Give Me Some Credit” dataset as the basis for this example.
Go to the “Give Me Some Credit” page, and download the files.
You will have 4 files:
We will only need the cs-training.csv and DataDictionary.xls files for this project.
Create application folder
Use the following commands to create a directory and move into it.
In the directory, create a file called create_credit_classifier.py. We will incrementally build the program in that file.
Load Data Set
We need to parse the cs-training.csv file so that we can make sense of the data. The following shows the first 5 lines of data from cs-training.csv
We can observe the following fields in the file
DataDictionary.xls contains a description of each column except the first. However, the first column is obviously a row ID column.
For this example, we want to predict if someone is likely to be a credit risk based on past data.
The column “SeriousDlqin2yrs” is our outcome (the value we want to predict), and the rest of the columns are our input features.
We want to create a classifier that, given some input features, can predict the outcome. In order to do that we need to do what is known as “feature extraction”. The following code will do that with pandas.
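As a minimal sketch of this step, the snippet below separates the outcome column from the remaining columns. The function names, the choice to drop rows with missing values, and the use of `index_col=0` for the ID column are assumptions made for illustration.

```python
import pandas as pd

OUTCOME_COLUMN = "SeriousDlqin2yrs"  # the column the text says we want to predict


def split_features_and_outcome(df):
    """Return (features, outcome) from a cs-training.csv style DataFrame."""
    # Assumption: rows with missing values are dropped so the classifier
    # never sees NaN. Filling the gaps instead is an equally valid choice.
    df = df.dropna()
    outcome = df[OUTCOME_COLUMN]
    features = df.drop(columns=[OUTCOME_COLUMN])
    return features, outcome


def load_credit_data(path="cs-training.csv"):
    # index_col=0 treats the unnamed first column as a row id, not a feature.
    return split_features_and_outcome(pd.read_csv(path, index_col=0))
```

Calling `load_credit_data()` from the application folder would return the two pieces needed by the later steps.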
Generate Training and Testing Set
We now have to separate our data into two disjoint sets: a training set, and a testing set.
We have to do this because we will use hold-out validation, a simple form of “cross-validation”, to measure the accuracy of our predictive model.
We will train our classifier on the training set and test its accuracy on the testing set.
Intuitively, our classifier should classify credit risks in the testing set the same way it would in the real world. This makes the testing set a proxy for how it would behave in production.
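One way to sketch the split is with scikit-learn's `train_test_split` helper. The function name `make_train_test_sets`, the 50% test fraction, and the fixed seed are assumptions for illustration, not values taken from the original script.

```python
from sklearn.model_selection import train_test_split


def make_train_test_sets(features, outcome, test_fraction=0.5, seed=0):
    """Split rows into disjoint training and testing sets."""
    # test_fraction and seed are illustrative assumptions.
    # Returns: train_features, test_features, train_outcome, test_outcome
    return train_test_split(
        features, outcome, test_size=test_fraction, random_state=seed
    )
```

Fixing the seed keeps the split reproducible, so accuracy numbers can be compared between runs.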
Define Classifier Type
scikit-learn comes with a bunch of baked-in classifiers. We will use the default Naive Bayes classifier for this example.
To train the model we simply have to feed the classifier the input features and the outcome variable
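A minimal training sketch follows. It assumes the Naive Bayes variant is scikit-learn's `GaussianNB` with default settings, which suits numeric columns; the text does not say which variant was used, and `train_classifier` is a hypothetical helper name.

```python
from sklearn.naive_bayes import GaussianNB


def train_classifier(train_features, train_outcome):
    """Fit a Naive Bayes classifier on the training set."""
    # Assumption: GaussianNB with defaults stands in for the
    # "default Naive Bayes classifier" mentioned in the text.
    classifier = GaussianNB()
    classifier.fit(train_features, train_outcome)
    return classifier
```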
Now that we have our classifier, we verify its predictions against our test set.
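The check can be sketched with scikit-learn's metrics helpers. The `evaluate` function is a hypothetical name; it works with any fitted object that exposes `predict`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate(classifier, test_features, test_outcome):
    """Return (accuracy, confusion matrix) for the held-out test set."""
    predicted = classifier.predict(test_features)
    accuracy = accuracy_score(test_outcome, predicted)
    # Rows are the actual classes, columns are the predicted classes.
    matrix = confusion_matrix(test_outcome, predicted)
    return accuracy, matrix
```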
The output for this script is the following
The output shows roughly 92% accuracy, with the following breakdown of results:
- 55737 true positives
- 110 true negatives
- 212 false positives
- 4076 false negatives
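As a quick sanity check, accuracy can be recomputed from the four counts listed above: correct predictions divided by all predictions. The variable names are illustrative; the numbers are the ones reported in the list.

```python
# Counts as reported in the list above.
true_positives = 55737
true_negatives = 110
false_positives = 212
false_negatives = 4076

total = true_positives + true_negatives + false_positives + false_negatives
# Accuracy = correct predictions / all predictions.
accuracy = (true_positives + true_negatives) / total
print(f"{total} test rows, accuracy {accuracy:.4f}")
```

This works out to just under 93%, in line with the rough figure quoted above.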
With our classifier done, we can save it so that we can use it in a separate program
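One common way to do this is Python's built-in `pickle` module, sketched below. The helper names and the file name `credit_classifier.pkl` are assumptions; the original may have used a different serializer.

```python
import pickle


def save_model(model, path="credit_classifier.pkl"):
    """Write a trained model to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path="credit_classifier.pkl"):
    """Read a model back from disk."""
    # Only unpickle files you created yourself: loading a pickle can run code.
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

The web service can then call `load_model()` once at startup instead of retraining.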
Create Web API
With our model created, we can now create our web service that can decide if we should give credit to someone based on certain demographic information.
Create the file credit_api.py.
Install flask-restplus from the command line
flask-restplus makes creating flask and swagger applications much simpler.
The following code will set up the scaffolding for a flask application
The following code will set up the request parameters for our web service
The following code will take the request parameters, feed them into the model, and determine the eligibility for extending credit.
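The three pieces above (scaffolding, request parameters, prediction) can be sketched in one small app. This sketch uses plain Flask rather than flask-restplus, so it has no swagger interface. The `/credit` route, the `create_app` factory, the `extend_credit` response field, and the reading of class 0 as "no serious delinquency predicted" are all assumptions for illustration.

```python
from flask import Flask, jsonify, request


def create_app(model, feature_names):
    """Build a Flask app that scores one applicant per request."""
    app = Flask(__name__)

    @app.route("/credit", methods=["GET"])  # hypothetical route name
    def credit():
        try:
            # Read each feature from the query string, in training-column order.
            row = [float(request.args[name]) for name in feature_names]
        except (KeyError, ValueError):
            return jsonify(error="missing or non-numeric parameter"), 400
        prediction = int(model.predict([row])[0])
        # Assumption: class 0 means no serious delinquency was predicted.
        return jsonify(extend_credit=(prediction == 0))

    return app
```

Here `model` is any object with a `predict` method, and `feature_names` must list the columns in the same order used for training. Calling `.run()` on the returned app serves it on port 5000 by default.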
You can start the flask app from the command line
And you can use the web interface by visiting localhost:5000
You can also use curl to get a response from the flask app
You can get the complete code on my github repo.
So there you have it: a prediction API built in about 10 mins.
I would never actually put this into production. A real production prediction API would need to handle edge cases, and we would need to do model selection.
However, the basic nuts and bolts of a prediction API are pretty straightforward. There really isn’t any magic to building a prediction engine.