How To Build a Prediction API in 10 Minutes with Flask, Swagger, and SciPy

I’ve seen a lot of hype around prediction APIs recently. This is obviously a byproduct of the current data science fad.

As a public service, I’m going to show you how you can build your own prediction API … and I’ll do it by creating a very basic version in 10 minutes.

We will build an API that will determine if we should provide credit to someone based on certain demographic information.

We will use Kaggle’s “Give Me Some Credit” dataset as the basis for this example.

Go to the “Give Me Some Credit” page, and download the files.

You will have 4 files:

  • cs-training.csv
  • cs-test.csv
  • sampleEntry.csv
  • DataDictionary.xls

We will only need the cs-training.csv and DataDictionary.xls files for this project.

Create application folder

Use the following commands to create a directory and move into it.

In the directory, create a file called create_credit_classifier.py. We will incrementally build the program in that text file.

Load Data Set

We need to parse the cs-training.csv file so that we can make sense of the data. The following shows the first 5 lines of data from cs-training.csv
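If you want to look at those lines yourself, a minimal sketch like the following would print them. The helper name preview_lines is my own invention for illustration, and it assumes cs-training.csv sits in the current directory.

```python
from itertools import islice


def preview_lines(path, n=5):
    """Return the first n lines of a text file, without line endings."""
    with open(path, newline="") as handle:
        return [line.rstrip("\r\n") for line in islice(handle, n)]


# Usage (assumes the Kaggle file is in the current directory):
#     for line in preview_lines("cs-training.csv"):
#         print(line)
```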

We can observe the following fields in the file

  • SeriousDlqin2yrs
  • RevolvingUtilizationOfUnsecuredLines
  • age
  • NumberOfTime30-59DaysPastDueNotWorse
  • DebtRatio
  • MonthlyIncome
  • NumberOfOpenCreditLinesAndLoans
  • NumberOfTimes90DaysLate
  • NumberRealEstateLoansOrLines
  • NumberOfTime60-89DaysPastDueNotWorse
  • NumberOfDependents

DataDictionary.xls contains a description of each column, except for the first column. However, the first column is obviously a row ID column.

For this example, we want to predict if someone is likely to be a credit risk based on past data.

The column “SeriousDlqin2yrs” is our “outcome” feature (the value we want to predict) and the rest of the columns are our “target” features (the inputs to the model).

We want to create a classifier that, given some target features, can predict the outcome feature. In order to do that we need to do what is known as “feature extraction”. The following code will do that with pandas.
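A minimal sketch of that step could look like this. The function name extract_features and the choice to fill missing cells with 0 are my assumptions for illustration; the column name comes from the field list above.

```python
import pandas as pd

OUTCOME_COLUMN = "SeriousDlqin2yrs"


def extract_features(csv_source):
    """Load the credit data and split it into input features and outcome."""
    # The first, unnamed column is a row ID, so use it as the index.
    data = pd.read_csv(csv_source, index_col=0)
    # Assumption: replace any missing cells with 0 so the classifier
    # never receives NaN values. Other strategies (median, dropping rows)
    # are equally reasonable.
    data = data.fillna(0)
    outcome = data[OUTCOME_COLUMN]
    features = data.drop(columns=[OUTCOME_COLUMN])
    return features, outcome


# Usage (assumes the Kaggle file is in the current directory):
#     features, outcome = extract_features("cs-training.csv")
```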

Generate Training and Testing Set

We now have to separate our data into two disjoint sets: a training set, and a testing set.

We have to do this because we will use a simple form of “cross-validation” (holding out a testing set) to measure the accuracy of our predictive model.

We will train our classifier on the training set and test its accuracy on the testing set.

Intuitively, our classifier should classify credit risks in the testing set the same way it would in the real world. This makes the testing set a proxy for how it would behave in production.
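One way to sketch the split is with scikit-learn's train_test_split. The wrapper name, the 40% test fraction, and the fixed seed are all illustrative assumptions on my part, so adjust them to taste.

```python
from sklearn.model_selection import train_test_split


def split_train_test(features, outcome, test_fraction=0.4, seed=0):
    """Split the data into disjoint training and testing sets.

    Returns (train_features, test_features, train_outcome, test_outcome).
    test_fraction and seed are illustrative defaults, not tuned values.
    """
    return train_test_split(
        features, outcome, test_size=test_fraction, random_state=seed
    )
```

Fixing the seed makes the split reproducible, which helps when you compare runs.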

Define Classifier Type

scikit-learn, the machine learning toolkit built on SciPy, comes with a bunch of baked-in classifiers. We will use the default Naive Bayes classifier for this example.
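As a sketch, and assuming the "default" Naive Bayes classifier means scikit-learn's GaussianNB (the variant meant for continuous features), defining it is a one-liner:

```python
from sklearn.naive_bayes import GaussianNB

# Assumption: GaussianNB is the Naive Bayes variant used here, with
# its default settings.
classifier = GaussianNB()
```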

Train Classifier

To train the model we simply have to feed the classifier the target features and the outcome variable.
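A minimal sketch of the training step follows. It again assumes GaussianNB, and train_classifier is a hypothetical helper name.

```python
from sklearn.naive_bayes import GaussianNB


def train_classifier(train_features, train_outcome):
    """Fit a Gaussian Naive Bayes model (assumed variant) and return it."""
    classifier = GaussianNB()
    classifier.fit(train_features, train_outcome)
    return classifier
```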

Validate Classifier

Now that we have our classifier, we validate its predictions against our testing set.
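One way to sketch the validation is with scikit-learn's accuracy_score and confusion_matrix. The helper name evaluate_classifier is hypothetical, and it works with any object that has a predict method.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate_classifier(classifier, test_features, test_outcome):
    """Return (accuracy, confusion matrix) for the held-out testing set."""
    predicted = classifier.predict(test_features)
    accuracy = accuracy_score(test_outcome, predicted)
    # Rows are the actual classes, columns are the predicted classes.
    matrix = confusion_matrix(test_outcome, predicted)
    return accuracy, matrix
```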

The output for this script is the following

The output shows that we have about 93% accuracy (55,847 correct predictions out of 60,135), with the following breakdown of outcomes

  • 55737 true positives
  • 110 true negatives
  • 212 false positives
  • 4076 false negatives

Save Classifier

With our classifier done, we can save it so that we can use it in a separate program.
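A simple sketch of saving and reloading uses Python's built-in pickle module. The helper names and the file name in the usage comment are hypothetical. Keep in mind that you should only unpickle files you created yourself.

```python
import pickle


def save_classifier(classifier, path):
    """Serialize the trained classifier to disk."""
    with open(path, "wb") as handle:
        pickle.dump(classifier, handle)


def load_classifier(path):
    """Load a previously saved classifier (only from trusted files)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage (hypothetical file name):
#     save_classifier(classifier, "credit_classifier.pkl")
#     classifier = load_classifier("credit_classifier.pkl")
```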

Create Web API

With our model created, we can now create our web service that can decide if we should give credit to someone based on certain demographic information.

Create the file credit_api.py.

Install flask-restplus from the command line

flask-restplus makes creating flask and swagger applications much simpler.

The following code sets up the scaffolding for a Flask application

The following code sets up the request parameters for our web service

This code takes the request parameters, feeds them into the model, and determines the eligibility for extending credit.

You can start the flask app from the command line

And you can use the web interface by visiting localhost:5000


You can also use curl to get a response from the flask app

Conclusion

So there you have it: a prediction API built in about 10 mins.

I would never actually put this into production. A real production prediction API would need to handle edge cases, and we would need to do model selection.

However, the basic nuts and bolts of a prediction API are pretty straightforward. There really isn’t any magic to building a prediction engine.


3 thoughts on “How To Build a Prediction API in 10 Minutes with Flask, Swagger, and SciPy”

  1. Awesome tutorial. I did it on Cloud 9 which added some complications because of the ports, but I got there in the end. Thanks for posting this.
