In this blog, we will build an IPL Score Prediction model using Lasso and Ridge regression. We will select the best model based on its performance metrics and hyperparameter tuning. We will also
build an interactive Flask app around the model.
First, we import the necessary libraries.
Next, read the dataset (a CSV file) and display the first 5 records.
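A minimal sketch of these two steps. The inline sample below is a made-up stand-in for the real CSV so that the snippet runs on its own; the column names bat_team, bowl_team, and total are assumptions, and every value is illustrative. With the real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for the real dataset; column names and values are assumptions.
SAMPLE_CSV = """mid,date,venue,bat_team,bowl_team,batsman,bowler,runs,wickets,overs,runs_last_5,wickets_last_5,striker,non-striker,total
1,2008-04-18,Venue A,Team A,Team B,Batsman 1,Bowler 1,1,0,0.1,1,0,0,0,180
1,2008-04-18,Venue A,Team A,Team B,Batsman 1,Bowler 1,2,0,0.2,2,0,1,0,180
1,2008-04-18,Venue A,Team A,Team B,Batsman 2,Bowler 1,2,0,0.3,2,0,0,1,180
1,2008-04-18,Venue A,Team A,Team B,Batsman 2,Bowler 1,6,0,0.4,6,0,4,1,180
1,2008-04-18,Venue A,Team A,Team B,Batsman 2,Bowler 1,7,0,0.5,7,0,5,1,180
1,2008-04-18,Venue A,Team A,Team B,Batsman 1,Bowler 1,7,1,0.6,7,1,1,5,180
"""

# Read the CSV into a DataFrame and show the first 5 records.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.head())
```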
Dropping unnecessary columns:
- For our use case, features like mid, batsman, bowler, striker, and non-striker would not play a significant role, so it’s better to drop them.
- I know that batsmen can change the score, but so many batsmen have played in the IPL that we can’t work with that many categories, so it’s better to drop them.
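Dropping those columns could look roughly like this. The tiny DataFrame is made up for illustration, and the runs and total columns are assumed to be among the ones we keep.

```python
import pandas as pd

# Made-up rows; only the column names matter for this step.
df = pd.DataFrame({
    "mid": [1, 1],
    "batsman": ["Batsman 1", "Batsman 2"],
    "bowler": ["Bowler 1", "Bowler 1"],
    "striker": [10, 12],
    "non-striker": [3, 4],
    "runs": [45, 51],
    "total": [160, 160],
})

# Columns named in the text as not useful for our use case.
irrelevant_columns = ["mid", "batsman", "bowler", "striker", "non-striker"]
df = df.drop(columns=irrelevant_columns)
print(df.columns.tolist())
```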
Preprocessing:
- Convert the date column to a pandas datetime column.
- Then remove the teams that no longer play in the IPL and keep only the consistent teams.
- Also, keep only the data from after the first 5 overs, because the initial stage of a match does not play an important part in deciding the score.
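The three preprocessing steps can be sketched as follows. The bat_team and bowl_team column names, the short list of consistent teams, and the exact cutoff (overs >= 5.0) are assumptions made for this example; the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "date": ["2008-04-18", "2008-04-20", "2016-05-01", "2017-04-05"],
    "bat_team": ["Kolkata Knight Riders", "Deccan Chargers",
                 "Mumbai Indians", "Chennai Super Kings"],
    "bowl_team": ["Royal Challengers Bangalore", "Mumbai Indians",
                  "Chennai Super Kings", "Mumbai Indians"],
    "overs": [4.3, 7.2, 5.0, 12.4],
})

# Illustrative subset only; the real list would hold every team still playing.
consistent_teams = [
    "Kolkata Knight Riders",
    "Royal Challengers Bangalore",
    "Mumbai Indians",
    "Chennai Super Kings",
]

# 1. Convert the date column to a pandas datetime column.
df["date"] = pd.to_datetime(df["date"])

# 2. Keep rows where both the batting and the bowling team are consistent teams.
both_consistent = (df["bat_team"].isin(consistent_teams)
                   & df["bowl_team"].isin(consistent_teams))
df = df[both_consistent]

# 3. Keep only the data from after the first 5 overs (cutoff assumed inclusive).
df = df[df["overs"] >= 5.0]
print(df)
```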
Correcting the names of the venues:
- We use a function f(x) to correct the venue names.
- After that, we remove entries played at foreign grounds like Newlands, St George’s Park, etc.
- We use an XOR operation to remove these grounds. The check for whether a venue is in the ignored stadiums gives True for the entries we want to drop, and XOR-ing that result with True flips it: True XOR True is False, so those entries are left out of df, while False XOR True is True, so every other entry is kept.
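Here is a small sketch of the venue cleanup and the XOR filter. The body of f(x) and the VENUE_FIXES mapping are hypothetical, since the original function is not reproduced here, and the sample rows are made up. XOR with True is the same as negating the mask with `~`, which is the more common pandas spelling.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "venue": [
        "M Chinnaswamy Stadium",
        "Newlands",
        "Punjab Cricket Association Stadium, Mohali",
        "St George's Park",
    ],
    "total": [170, 150, 165, 140],
})

# Hypothetical mapping from an inconsistent venue name to a corrected one.
VENUE_FIXES = {
    "Punjab Cricket Association Stadium, Mohali": "Punjab Cricket Association Stadium",
}


def f(x):
    """Return the corrected venue name, or x unchanged if no fix is known."""
    return VENUE_FIXES.get(x, x)


ignored_stadiums = ["Newlands", "St George's Park"]

df["venue"] = df["venue"].apply(f)

# True where the venue is a foreign ground that we want to drop.
in_ignored = df["venue"].isin(ignored_stadiums)

# XOR with True flips every value, so ignored venues become False.
keep = in_ignored ^ True
df = df[keep]
print(df)
```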
Converting categorical columns to dummy variables.
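One way to do this is with `pd.get_dummies`. Which columns are categorical (bat_team, bowl_team, and venue here) is an assumption for this sketch, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "bat_team": ["Mumbai Indians", "Chennai Super Kings"],
    "bowl_team": ["Chennai Super Kings", "Mumbai Indians"],
    "venue": ["Wankhede Stadium", "MA Chidambaram Stadium"],
    "runs": [61, 48],
})

# One 0/1 column per category value, named "<column>_<value>".
encoded = pd.get_dummies(df, columns=["bat_team", "bowl_team", "venue"], dtype=int)
print(encoded.columns.tolist())
```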
Reordering the columns.
Resetting the index
- The index is no longer contiguous because we dropped many entries.
- So we reset the index and delete the extra index column that the reset creates from the previous index values.
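A short sketch of the reset on made-up data. Passing `drop=True` to `reset_index` skips creating the extra column, which gives the same result as resetting and then deleting it.

```python
import pandas as pd

df = pd.DataFrame({"overs": [4.3, 5.1, 3.2, 9.4]})

# Filtering leaves gaps in the index: the remaining labels are 1 and 3.
df = df[df["overs"] >= 5.0]

# Renumber from 0; drop=True discards the old index instead of keeping it
# as a new "index" column.
df = df.reset_index(drop=True)
print(df)
```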
Scaling our numerical data for the IPL Score Prediction model.
- We scale columns like ‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, and ‘wickets_last_5’ to bring them all onto the same scale.
- We do not scale the whole dataset, because the other columns are just 1s and 0s that represent categorical features, and scaling is only meaningful for numerical values.
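The sketch below scales only the numeric columns and leaves the 0/1 columns alone. `StandardScaler` is an assumption here; the text does not say which scaler was used, and `MinMaxScaler` would slot in the same way. The sample rows are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample rows with one dummy column.
df = pd.DataFrame({
    "runs": [30, 55, 80, 120],
    "wickets": [0, 1, 2, 4],
    "overs": [5.1, 8.3, 11.2, 16.4],
    "runs_last_5": [30, 38, 41, 52],
    "wickets_last_5": [0, 1, 1, 2],
    "bat_team_Mumbai Indians": [1, 0, 1, 0],
})

numeric_cols = ["runs", "wickets", "overs", "runs_last_5", "wickets_last_5"]

# Scale the numeric columns to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)

# Put the scaled numeric columns back next to the untouched 0/1 columns.
df = pd.concat([scaled, df.drop(columns=numeric_cols)], axis=1)
print(df.round(2))
```

Fitting the scaler on the training rows only would keep test-set statistics out of training; the sketch follows the order used in this post, where scaling comes before the split.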
Splitting data for training and testing
- We split using the date column.
- All the data from before 2017 is for training.
- Data from 2017 onward is for testing.
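A date-based split might look like this. The sketch assumes the target column is called total and that 2017 belongs entirely to the test set, so the two sets do not overlap; the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2008-04-18", "2012-05-02", "2016-05-29", "2017-04-05", "2017-05-21"]
    ),
    "runs": [45, 60, 72, 38, 90],
    "total": [160, 175, 181, 149, 200],
})

# Seasons up to 2016 go to training, 2017 onward to testing.
train_mask = df["date"].dt.year <= 2016
test_mask = df["date"].dt.year >= 2017

# The date is only used for splitting, so it is dropped from the features.
X_train = df.loc[train_mask].drop(columns=["total", "date"])
y_train = df.loc[train_mask, "total"]
X_test = df.loc[test_mask].drop(columns=["total", "date"])
y_test = df.loc[test_mask, "total"]

print(X_train.shape, X_test.shape)
```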
Training our Ridge model for IPL Score Prediction.
We also tried Linear Regression, but it overfit badly, so we went with Ridge Regression, whose regularization penalty helps keep the model from overfitting.
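As a rough illustration of training Ridge with hyperparameter tuning, the sketch below runs a grid search over alpha. The synthetic data, the alpha grid, the 5-fold cross-validation, and the scoring choice are all assumptions made for this example, not the settings used on the real dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared features and final scores.
rng = np.random.default_rng(0)
true_coef = np.array([20.0, -8.0, 5.0, 12.0, -3.0])
X_train = rng.normal(size=(200, 5))
y_train = 160 + X_train @ true_coef + rng.normal(scale=5.0, size=200)
X_test = rng.normal(size=(50, 5))
y_test = 160 + X_test @ true_coef + rng.normal(scale=5.0, size=50)

# Try several regularization strengths and keep the best by cross-validation.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out data.
ridge = search.best_estimator_
mae = mean_absolute_error(y_test, ridge.predict(X_test))
print("best alpha:", search.best_params_["alpha"])
print("test MAE:", round(mae, 2))
```

The same grid search works for `Lasso` by swapping the estimator, which makes it easy to compare the two models on the same metric.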
Saving our IPL Score Prediction model for Deployment
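A minimal sketch of saving and reloading a trained model, assuming `pickle` is used for serialization. The file name ridge_model.pkl and the toy training data are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge

# Toy model standing in for the trained Ridge regressor.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([100.0, 120.0, 140.0, 160.0])
model = Ridge(alpha=1.0).fit(X, y)

# Hypothetical file name; written to the temp directory for this example.
model_path = os.path.join(tempfile.gettempdir(), "ridge_model.pkl")

# Save the fitted model to disk.
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)

# Load it back, as the deployed app would do at startup.
with open(model_path, "rb") as fh:
    restored = pickle.load(fh)

print(restored.predict([[4.0]]))
```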
Deploy the model
For deployment we will use the Flask framework. We have made a Python file called ‘app.py’. This code serves the ‘index.html’ and ‘predict.html’ pages. We have used the POST method for the prediction request.
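A simplified sketch of what such an app.py could look like. The route names, the form field names, the prediction template variable, and the ridge_model.pkl path are assumptions for illustration; a real app would also have to rebuild the same dummy columns and scaling that were used in training before calling the model.

```python
import pickle

from flask import Flask, render_template, request

app = Flask(__name__)

# Hypothetical path to the pickled model.
MODEL_PATH = "ridge_model.pkl"

# Hypothetical form fields; the real form would also collect teams and venue.
FIELDS = ["runs", "wickets", "overs", "runs_last_5", "wickets_last_5"]


def load_model(path=MODEL_PATH):
    """Load the pickled model from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


@app.route("/")
def home():
    # Show the input form; expects templates/index.html to exist.
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted form values in a fixed order.
    features = [float(request.form[name]) for name in FIELDS]
    model = load_model()
    prediction = int(model.predict([features])[0])
    # Show the result; expects templates/predict.html to exist.
    return render_template("predict.html", prediction=prediction)
```

With the two templates in a templates folder next to app.py, the app can be started with `flask --app app run`.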










