Exploratory Data Analysis of IPL Dataset

Problem Statement

The task is to analyze ball by ball data from all the way 2008 to 2019. Using this you come up with analysis to form your own dream team for IPL.
For year 2016, 2017 and 2018

  • I.

    1. Find out most valuable player - explain why
    2. Find out most consistent batsman - explain why
    3. Find out most consistent bowler - explain why
  • II.

    1. Find out, worst player - explain why
    2. Find out worst batsmen - explain why
    3. Find out worst bowler - explain why
  • III. Rank top 25 players for the year 2016 to 2019

  • IV. Identify most improved player from 2018 to 2019

  • V. Find out which stadium had the most runs and which scored the least

  • VI. Find out where (which stadium or which bowler) bowlers perform better (Take most wickets while giving least runs)

  • VII. Now the interesting part - I have which will be a dream team to have for 2020 based on historical data

About Data

Indian Premier League (IPL) is a Twenty20 cricket format league in India. It is usually played in April and May every year. As of 2019, the title sponsor of the game is Vivo. The league was founded by Board of Control for Cricket India (BCCI) in 2008.

The data contains result of each match and each delivery for 11 seasons.

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [0]:
Deliveries=pd.read_csv(r'innings_data.csv')
Matches=pd.read_csv(r'match_data.csv')
In [0]:
Deliveries.head()
Out[0]:
id season batsman bowler innings non_striker replacements bowled_over batsman_team player_out fielder_caught_out type_out extras_wides extras_legbyes extras_noballs extras_byes extras_penalty total_extras_runs batsman_runs total_runs
0 335988 2008 AC Gilchrist GD McGrath 1st JC Buttler NaN 0.1 Rajasthan Royals 0 0 0 0 1 0 0 0 1 0 1
1 335988 2008 AC Gilchrist GD McGrath 1st AM Rahane NaN 0.2 Rajasthan Royals 0 0 0 0 0 0 0 0 0 0 0
2 335988 2008 AC Gilchrist GD McGrath 1st AM Rahane NaN 0.3 Rajasthan Royals 0 0 0 0 0 0 0 0 0 4 4
3 335988 2008 Y Venugopal Rao GD McGrath 1st AM Rahane NaN 0.4 Rajasthan Royals 0 0 0 0 0 0 0 0 0 0 0
4 335988 2008 Y Venugopal Rao GD McGrath 1st AM Rahane NaN 0.5 Rajasthan Royals 0 0 0 0 0 0 0 0 0 6 6
In [0]:
Matches.head()
Out[0]:
id season city date team1 team2 toss_winner toss_decision winner eliminator ... overs player_of_match venue umpire1 umpire2 umpire3 first_bat_team first_bowl_team first_bat_score second_bat_score
0 419164 2010 Mumbai 2010-04-24 Royal Challengers Bangalore Deccan Chargers Deccan Chargers bat Royal Challengers Bangalore NaN ... 20 A Kumble Dr DY Patil Sports Academy RE Koertzen SJA Taufel NaN Deccan Chargers Royal Challengers Bangalore 82.0 86.0
1 419131 2010 Delhi 2010-03-29 Delhi Daredevils Kolkata Knight Riders Delhi Daredevils bat Delhi Daredevils NaN ... 20 DA Warner Feroz Shah Kotla SS Hazare SJA Taufel NaN Delhi Daredevils Kolkata Knight Riders 177.0 137.0
2 336021 2008 Mumbai 2008-05-16 Mumbai Indians Kolkata Knight Riders Mumbai Indians field Mumbai Indians NaN ... 20 SM Pollock Wankhede Stadium BR Doctrove DJ Harper NaN Kolkata Knight Riders Mumbai Indians 67.0 68.0
3 980931 2016 Pune 2016-04-22 Rising Pune Supergiants Royal Challengers Bangalore Rising Pune Supergiants field Royal Challengers Bangalore NaN ... 20 AB de Villiers Maharashtra Cricket Association Stadium CB Gaffaney VK Sharma NaN Royal Challengers Bangalore Rising Pune Supergiants 185.0 172.0
4 419163 2010 Mumbai 2010-04-22 Chennai Super Kings Deccan Chargers Chennai Super Kings bat Chennai Super Kings NaN ... 20 DE Bollinger Dr DY Patil Sports Academy BR Doctrove RB Tiffin NaN Chennai Super Kings Deccan Chargers 142.0 104.0

5 rows × 24 columns

Data Cleaning

In [0]:
Deliveries.isnull().sum()*100/Deliveries.shape[0]
Out[0]:
id                     0.00000
season                 0.00000
batsman                0.00000
bowler                 0.00000
innings                0.00000
non_striker            0.00000
replacements          99.98301
bowled_over            0.00000
batsman_team           0.00000
player_out             0.00000
fielder_caught_out     0.00000
type_out               0.00000
extras_wides           0.00000
extras_legbyes         0.00000
extras_noballs         0.00000
extras_byes            0.00000
extras_penalty         0.00000
total_extras_runs      0.00000
batsman_runs           0.00000
total_runs             0.00000
dtype: float64
In [0]:
Matches.isnull().sum()*100/Matches.shape[0]
Out[0]:
id                   0.000000
season               0.000000
city                 1.742627
date                 0.000000
team1                0.000000
team2                0.000000
toss_winner          0.000000
toss_decision        0.000000
winner               1.608579
eliminator          98.927614
dl_applied          97.453083
win_by_runs         54.959786
win_by_wickets      46.648794
result              98.391421
overs                0.000000
player_of_match      0.536193
venue                0.000000
umpire1              0.134048
umpire2              0.134048
umpire3             99.731903
first_bat_team       0.000000
first_bowl_team      0.000000
first_bat_score      0.000000
second_bat_score     0.268097
dtype: float64
In [0]:
print('Deliveries Details')
print('Shape: ',Deliveries.shape)
print('Size: ',Deliveries.size)
print('Dimensions: ',Deliveries.ndim)
Deliveries Details
Shape:  (176573, 20)
Size:  3531460
Dimensions:  2
In [0]:
print('Matches Details')
print('Shape: ',Matches.shape)
print('Size: ',Matches.size)
print('Dimensions: ',Matches.ndim)
Matches Details
Shape:  (746, 24)
Size:  17904
Dimensions:  2
In [0]:
Deliveries=Deliveries.sort_index()
In [0]:
Matches=Matches.sort_values('id')
In [0]:
Deliveries.describe()
Out[0]:
id season bowled_over extras_wides extras_legbyes extras_noballs extras_byes extras_penalty total_extras_runs batsman_runs total_runs
count 1.765730e+05 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000 176573.000000
mean 7.131601e+05 2013.368386 9.528802 0.036823 0.021198 0.004180 0.004961 0.000057 0.067219 1.237432 1.304650
std 2.843665e+05 3.323319 5.677220 0.251613 0.194935 0.070553 0.116674 0.016828 0.343014 1.609116 1.597156
min 3.359820e+05 2008.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 5.012080e+05 2011.000000 4.500000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 5.980470e+05 2013.000000 9.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000
75% 9.809850e+05 2016.000000 14.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000
max 1.178425e+06 2019.000000 19.900000 5.000000 5.000000 5.000000 4.000000 5.000000 7.000000 6.000000 7.000000
In [0]:
Matches.describe()
Out[0]:
id season win_by_runs win_by_wickets overs first_bat_score second_bat_score
count 7.460000e+02 746.000000 336.000000 398.000000 746.0 746.000000 744.000000
mean 7.135520e+05 2013.369973 29.883929 6.251256 20.0 161.356568 147.841398
std 2.845419e+05 3.326906 27.276827 1.821281 0.0 30.537844 31.153406
min 3.359820e+05 2008.000000 1.000000 1.000000 20.0 56.000000 2.000000
25% 5.012092e+05 2011.000000 11.000000 5.000000 20.0 142.000000 131.000000
50% 5.980485e+05 2013.000000 22.000000 6.000000 20.0 163.000000 150.000000
75% 9.809845e+05 2016.000000 39.000000 8.000000 20.0 182.000000 169.000000
max 1.178425e+06 2019.000000 146.000000 10.000000 20.0 263.000000 223.000000
In [0]:
Deliveries.dtypes
Out[0]:
id                      int64
season                  int64
batsman                object
bowler                 object
innings                object
non_striker            object
replacements           object
bowled_over           float64
batsman_team           object
player_out             object
fielder_caught_out     object
type_out               object
extras_wides            int64
extras_legbyes          int64
extras_noballs          int64
extras_byes             int64
extras_penalty          int64
total_extras_runs       int64
batsman_runs            int64
total_runs              int64
dtype: object
In [0]:
Matches.dtypes
Out[0]:
id                    int64
season                int64
city                 object
date                 object
team1                object
team2                object
toss_winner          object
toss_decision        object
winner               object
eliminator           object
dl_applied           object
win_by_runs         float64
win_by_wickets      float64
result               object
overs                 int64
player_of_match      object
venue                object
umpire1              object
umpire2              object
umpire3              object
first_bat_team       object
first_bowl_team      object
first_bat_score     float64
second_bat_score    float64
dtype: object
In [0]:
Deliveries.drop(['replacements','id'],inplace=True,axis=1)
Deliveries['type_out'].replace('[a-zA-z \s]+','1',regex=True,inplace=True)
Deliveries['type_out']=Deliveries['type_out'].astype(int)
In [0]:
print(Deliveries.isnull().sum())
Deliveries.head()
season                0
batsman               0
bowler                0
innings               0
non_striker           0
bowled_over           0
batsman_team          0
player_out            0
fielder_caught_out    0
type_out              0
extras_wides          0
extras_legbyes        0
extras_noballs        0
extras_byes           0
extras_penalty        0
total_extras_runs     0
batsman_runs          0
total_runs            0
dtype: int64
Out[0]:
season batsman bowler innings non_striker bowled_over batsman_team player_out fielder_caught_out type_out extras_wides extras_legbyes extras_noballs extras_byes extras_penalty total_extras_runs batsman_runs total_runs
0 2008 AC Gilchrist GD McGrath 1st JC Buttler 0.1 Rajasthan Royals 0 0 0 0 1 0 0 0 1 0 1
1 2008 AC Gilchrist GD McGrath 1st AM Rahane 0.2 Rajasthan Royals 0 0 0 0 0 0 0 0 0 0 0
2 2008 AC Gilchrist GD McGrath 1st AM Rahane 0.3 Rajasthan Royals 0 0 0 0 0 0 0 0 0 4 4
3 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 0.4 Rajasthan Royals 0 0 0 0 0 0 0 0 0 0 0
4 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 0.5 Rajasthan Royals 0 0 0 0 0 0 0 0 0 6 6
In [0]:
Matches.drop(['id','eliminator','result','umpire3','dl_applied'],inplace=True,axis=1)
Matches['city'].fillna('unknown',inplace=True)
Matches['winner'].fillna('No Result',inplace=True)
Matches['player_of_match'].fillna('unknown',inplace=True)
Matches['umpire1'].fillna('unknown',inplace=True)
Matches['umpire2'].fillna('unknown',inplace=True)
Matches['second_bat_score'].fillna(0,inplace=True)
Matches['win_by_runs'].fillna(0.0,inplace=True)
Matches['win_by_wickets'].fillna(0.0,inplace=True)
In [0]:
Matches.isnull().sum()
Out[0]:
season              0
city                0
date                0
team1               0
team2               0
toss_winner         0
toss_decision       0
winner              0
win_by_runs         0
win_by_wickets      0
overs               0
player_of_match     0
venue               0
umpire1             0
umpire2             0
first_bat_team      0
first_bowl_team     0
first_bat_score     0
second_bat_score    0
dtype: int64
In [0]:
Matches['win_by_runs']=Matches['win_by_runs'].astype(int)
Matches['win_by_wickets']=Matches['win_by_wickets'].astype(int)
In [0]:
Matches.head()
Out[0]:
season city date team1 team2 toss_winner toss_decision winner win_by_runs win_by_wickets overs player_of_match venue umpire1 umpire2 first_bat_team first_bowl_team first_bat_score second_bat_score
143 2008 Bangalore 2008-04-18 Royal Challengers Bangalore Kolkata Knight Riders Royal Challengers Bangalore field Kolkata Knight Riders 140 0 20 BB McCullum M Chinnaswamy Stadium Asad Rauf RE Koertzen Kolkata Knight Riders Royal Challengers Bangalore 222.0 82.0
11 2008 Chandigarh 2008-04-19 Kings XI Punjab Chennai Super Kings Chennai Super Kings bat Chennai Super Kings 33 0 20 MEK Hussey Punjab Cricket Association Stadium, Mohali MR Benson SL Shastri Chennai Super Kings Kings XI Punjab 240.0 207.0
21 2008 Delhi 2008-04-19 Delhi Daredevils Rajasthan Royals Rajasthan Royals bat Delhi Daredevils 0 9 20 MF Maharoof Feroz Shah Kotla Aleem Dar GA Pratapkumar Rajasthan Royals Delhi Daredevils 129.0 132.0
134 2008 Mumbai 2008-04-20 Mumbai Indians Royal Challengers Bangalore Mumbai Indians bat Royal Challengers Bangalore 0 5 20 MV Boucher Wankhede Stadium SJ Davis DJ Harper Mumbai Indians Royal Challengers Bangalore 165.0 166.0
646 2008 Kolkata 2008-04-20 Kolkata Knight Riders Deccan Chargers Deccan Chargers bat Kolkata Knight Riders 0 5 20 DJ Hussey Eden Gardens BF Bowden K Hariharan Deccan Chargers Kolkata Knight Riders 110.0 112.0

Data Analysis

In [0]:
df=Deliveries.loc[Deliveries['season']>=2016]
df=df.groupby('batsman').sum()
df.reset_index(level=0, inplace=True)
df=df[['batsman','batsman_runs']]
df.sort_values(by=['batsman_runs'],ascending=False,inplace=True)
df.head(10)
Out[0]:
batsman batsman_runs
49 DA Warner 1871
232 V Kohli 1826
190 S Dhawan 1805
16 AM Rahane 1626
202 SK Raina 1569
7 AB de Villiers 1498
142 MS Dhoni 1389
108 KL Rahul 1342
181 RG Sharma 1270
22 AT Rayudu 1267
In [0]:
df1=Deliveries.loc[Deliveries['season']>=2016]
df1=df1.groupby('bowler').sum()
df1.reset_index(level=0, inplace=True)
df1=df1[['bowler','type_out']]
In [0]:
df1.sort_values(by=['type_out'],ascending=False,inplace=True)
df1.head(10)
Out[0]:
bowler type_out
69 JJ Bumrah 67
168 SP Narine 66
173 Sandeep Sharma 66
192 YS Chahal 63
185 UT Yadav 61
21 B Kumar 56
146 Rashid Khan 53
103 MJ McClenaghan 49
11 AR Patel 48
68 JD Unadkat 45
In [0]:
import palettable
sns.set_style('darkgrid')
pal=palettable.cartocolors.qualitative.Prism_10.mpl_colors
plt.figure(figsize=[15,7])
plt.title('Highest Run Scorers from 2016-2019')
ax=sns.barplot(x='batsman',y='batsman_runs',data=df.head(10),palette=pal)
t=ax.set(xlabel='Batsman',ylabel='Runs Scored')
In [0]:
pal=palettable.cartocolors.qualitative.Prism_10.mpl_colors
plt.figure(figsize=[15,7])
plt.title('Highest Wicket Takers from 2016-2019')
sns.set_style('darkgrid')
ax=sns.barplot(x='bowler',y='type_out',data=df1.head(10),palette=pal)
t=ax.set(xlabel='Bowlers',ylabel='Wickets Taken')

Calculating MVP Values for Players

In [0]:
def mvpbat(x):
    if x==4:
        return 2.5
    elif x==6:
        return 3.5
    else:
        return 0
def mvpbowl(x):
    if x==1:
        return 2.5
    else:
        return 0
def mvpdotball(x):
    if x==0:
        return 1
    else:
        return 0
def mvpfield(x):
    if x=='0':
        return 0
    else:
        return 2.5
Deliveries['mvp_batsman'] = [mvpbat(x) for x in Deliveries['batsman_runs']]
Deliveries['mvp_bowler'] = [mvpbowl(x) for x in Deliveries['type_out']]
Deliveries['mvp_dot'] = [mvpdotball(x) for x in Deliveries['batsman_runs']]
Deliveries['mvp_fielder']= [mvpfield(x) for x in Deliveries['fielder_caught_out']]
Deliveries['mvp_bowler']+=Deliveries['mvp_dot']
Deliveries.drop('mvp_dot',axis=1,inplace=True)
In [0]:
Deliveries.head()
Out[0]:
season batsman bowler innings non_striker bowled_over batsman_team player_out fielder_caught_out type_out ... extras_legbyes extras_noballs extras_byes extras_penalty total_extras_runs batsman_runs total_runs mvp_batsman mvp_bowler mvp_fielder
0 2008 AC Gilchrist GD McGrath 1st JC Buttler 0.1 Rajasthan Royals 0 0 0 ... 1 0 0 0 1 0 1 0.0 1.0 0.0
1 2008 AC Gilchrist GD McGrath 1st AM Rahane 0.2 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 0 0 0.0 1.0 0.0
2 2008 AC Gilchrist GD McGrath 1st AM Rahane 0.3 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 4 4 2.5 0.0 0.0
3 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 0.4 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 0 0 0.0 1.0 0.0
4 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 0.5 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 6 6 3.5 0.0 0.0

5 rows × 21 columns

In [0]:
Deliveries['mvp_bowler'].describe()
Out[0]:
count    176573.000000
mean          0.526170
std           0.820316
min           0.000000
25%           0.000000
50%           0.000000
75%           1.000000
max           3.500000
Name: mvp_bowler, dtype: float64
In [0]:
Deliveries['mvp_fielder'].describe()
Out[0]:
count    176573.000000
mean          0.088547
std           0.462091
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max           2.500000
Name: mvp_fielder, dtype: float64

Creating Dataframes for MVPs in Batting, Bowling and Overall

In [0]:
mvpbat=Deliveries.loc[Deliveries['season']>=2016]
mvpbat=mvpbat.groupby(['batsman','season']).sum()
mvpbat.reset_index(level=(0,1), inplace=True)

mvpbat.sort_values(by=['batsman_runs'],ascending=False,inplace=True)
mvpbat.head(10)
mvpbat.drop(['mvp_bowler','bowled_over','type_out','extras_wides','extras_legbyes','extras_noballs','extras_byes','total_extras_runs'
         ,'total_runs','extras_penalty'],axis=1,inplace=True)
mvpbat.rename(index=str,columns={'batsman':'player'},inplace=True)
mvpbat.head()
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder
114 DA Warner 2016 755 275.5 50.0
524 V Kohli 2016 661 212.0 57.5
253 KS Williamson 2018 657 241.0 47.5
116 DA Warner 2019 617 211.0 55.0
423 S Dhawan 2016 560 190.0 30.0
In [0]:
mvpbowl=Deliveries.loc[Deliveries['season']>=2016]
mvpbowl=mvpbowl.groupby(['bowler','season']).sum()
mvpbowl.reset_index(level=(0,1), inplace=True)

mvpbowl.sort_values(by=['type_out'],ascending=False,inplace=True)
mvpbowl.head(10)
mvpbowl.drop(['mvp_batsman','mvp_fielder','bowled_over','batsman_runs','extras_wides','extras_legbyes','extras_noballs','extras_byes','total_extras_runs'
         ,'total_runs','extras_penalty'],axis=1,inplace=True)
mvpbowl.rename(index=str,columns={'bowler':'player'},inplace=True)
In [0]:
mvp=mvpbat.merge(mvpbowl,how='outer')
mvp.fillna(0,inplace=True)
mvp['total_value']=mvp['mvp_batsman']+mvp['mvp_bowler']+mvp['mvp_fielder']
In [0]:
mvp.head()
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
0 DA Warner 2016 755.0 275.5 50.0 0.0 0.0 325.5
1 V Kohli 2016 661.0 212.0 57.5 0.0 5.0 274.5
2 KS Williamson 2018 657.0 241.0 47.5 0.0 0.0 288.5
3 DA Warner 2019 617.0 211.0 55.0 0.0 0.0 266.0
4 S Dhawan 2016 560.0 190.0 30.0 0.0 0.0 220.0
In [0]:
mvp.groupby(['player','season']).sum().sort_values('total_value',ascending=False).head(10)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player season
DA Warner 2016 755.0 275.5 50.0 0.0 0.0 325.5
SP Narine 2017 188.0 77.5 15.0 25.0 216.5 309.0
2018 249.0 89.5 12.5 22.0 206.0 308.0
SR Watson 2018 474.0 177.5 25.0 8.0 95.0 297.5
KS Williamson 2018 657.0 241.0 47.5 0.0 0.0 288.5
V Kohli 2016 661.0 212.0 57.5 0.0 5.0 274.5
MC Henriques 2016 206.0 70.5 22.5 11.0 178.5 271.5
BA Stokes 2017 317.0 127.5 15.0 9.0 124.5 267.0
DA Warner 2019 617.0 211.0 55.0 0.0 0.0 266.0
Shakib Al Hasan 2018 216.0 68.0 12.5 14.0 170.0 250.5

Dividing MVP table wrt seasons

In [0]:
season2016=mvp[mvp['season']==2016]
season2017=mvp[mvp['season']==2017]
season2018=mvp[mvp['season']==2018]
season2019=mvp[mvp['season']==2019]

season2016bat=mvpbat[mvpbat['season']==2016]
season2017bat=mvpbat[mvpbat['season']==2017]
season2018bat=mvpbat[mvpbat['season']==2018]
season2019bat=mvpbat[mvpbat['season']==2019]

season2016bowl=mvpbowl[mvpbowl['season']==2016]
season2017bowl=mvpbowl[mvpbowl['season']==2017]
season2018bowl=mvpbowl[mvpbowl['season']==2018]
season2019bowl=mvpbowl[mvpbowl['season']==2019]

Finding MVP for each season

In [0]:
#Most Valuable Player for each season
season2016[season2016['total_value']==max(season2016['total_value'])]
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
0 DA Warner 2016 755.0 275.5 50.0 0.0 0.0 325.5
In [0]:
season2017[season2017['total_value']==max(season2017['total_value'])]
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
146 SP Narine 2017 188.0 77.5 15.0 25.0 216.5 309.0
In [0]:
season2018[season2018['total_value']==max(season2018['total_value'])]
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
97 SP Narine 2018 249.0 89.5 12.5 22.0 206.0 308.0

Finding Most Consistent Batsman and Bowler

In [0]:
#Most Consistent Batsman
top10bat=pd.concat([season2016bat.sort_values('mvp_batsman',ascending=False).head(10),
                 season2017bat.sort_values('mvp_batsman',ascending=False).head(10),
                 season2018bat.sort_values('mvp_batsman',ascending=False).head(10)])
In [0]:
top10bat.groupby('player').sum().sort_values('mvp_batsman',ascending=False)['mvp_batsman'].head(1)
Out[0]:
player
AM Rahane    490.5
Name: mvp_batsman, dtype: float64
In [0]:
#Most Consistent Bowler
top10bowl=pd.concat([season2016bowl.sort_values('mvp_bowler',ascending=False).head(10),
                 season2017bowl.sort_values('mvp_bowler',ascending=False).head(10),
                 season2018bowl.sort_values('mvp_bowler',ascending=False).head(10)])
In [0]:
top10bowl.groupby('player').sum().sort_values('mvp_bowler',ascending=False)['mvp_bowler'].head(1)
Out[0]:
player
JJ Bumrah    541.5
Name: mvp_bowler, dtype: float64

Worst Player, Batsman and Bowler

In [0]:
#Worst Player
season2016[season2016['total_value']==min(season2016['total_value'])].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
542 UBT Chand 2016 1.0 0.0 0.0 0.0 0.0 0.0
In [0]:
season2017[season2017['total_value']==min(season2017['total_value'])].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
550 AD Nath 2017 0.0 0.0 0.0 0.0 0.0 0.0
In [0]:
season2018[season2018['total_value']==min(season2018['total_value'])].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
532 NV Ojha 2018 1.0 0.0 0.0 0.0 0.0 0.0
In [0]:
#Worst Batsman
season2016bat[(season2016bat['mvp_batsman']==min(season2016bat['mvp_batsman']))].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder
505 Swapnil Singh 2016 0 0.0 0.0
In [0]:
season2017bat[(season2017bat['mvp_batsman']==min(season2017bat['mvp_batsman']))].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder
18 AD Nath 2017 0 0.0 0.0
In [0]:
season2018bat[(season2018bat['mvp_batsman']==min(season2018bat['mvp_batsman']))].tail(1)
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder
62 B Stanlake 2018 1 0.0 0.0
In [0]:
#Worst Bowler
season2016bowl[(season2016bowl['mvp_bowler']==min(season2016bowl['mvp_bowler']))].tail(1)
Out[0]:
player season type_out mvp_bowler
213 M Vijay 2016 0 0.0
In [0]:
season2017bowl[(season2017bowl['mvp_bowler']==min(season2017bowl['mvp_bowler']))].tail(1)
Out[0]:
player season type_out mvp_bowler
137 IK Pathan 2017 0 2.0
In [0]:
season2018bowl[(season2018bowl['mvp_bowler']==min(season2018bowl['mvp_bowler']))].tail(1)
Out[0]:
player season type_out mvp_bowler
227 MK Tiwary 2018 0 2.0

Top 25 Players Each Season

In [0]:
#Top 25 For Each Year
top25=season2016.sort_values('total_value',ascending=False).head(25)
top25
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
0 DA Warner 2016 755.0 275.5 50.0 0.0 0.0 325.5
1 V Kohli 2016 661.0 212.0 57.5 0.0 5.0 274.5
132 MC Henriques 2016 206.0 70.5 22.5 11.0 178.5 271.5
9 AB de Villiers 2016 503.0 203.0 35.0 0.0 0.0 238.0
28 SK Raina 2016 416.0 160.5 37.5 5.0 39.5 237.5
177 SR Watson 2016 149.0 48.5 7.5 16.0 180.0 236.0
7 M Vijay 2016 523.0 204.0 30.0 0.0 0.0 234.0
207 AR Patel 2016 109.0 46.5 2.5 18.0 180.0 229.0
107 RA Jadeja 2016 234.0 78.5 15.0 8.0 132.0 225.5
4 S Dhawan 2016 560.0 190.0 30.0 0.0 0.0 220.0
12 G Gambhir 2016 497.0 176.0 42.5 0.0 0.0 218.5
13 AM Rahane 2016 489.0 171.5 45.0 0.0 0.0 216.5
304 B Kumar 2016 39.0 13.5 2.5 21.0 198.5 214.5
23 Q de Kock 2016 429.0 149.5 57.5 0.0 0.0 207.0
169 CH Morris 2016 157.0 62.5 2.5 11.0 140.5 205.5
241 DJ Bravo 2016 73.0 19.5 2.5 17.0 180.5 202.5
213 Harbhajan Singh 2016 106.0 30.5 5.0 13.0 161.5 197.0
38 RV Uthappa 2016 385.0 151.0 45.0 0.0 0.0 196.0
258 P Kumar 2016 61.0 31.5 0.0 15.0 164.5 196.0
554 Mustafizur Rahman 2016 0.0 0.0 0.0 16.0 192.0 192.0
59 DR Smith 2016 331.0 125.5 5.0 4.0 55.0 185.5
456 Sandeep Sharma 2016 8.0 2.5 0.0 21.0 178.5 181.0
31 KK Nair 2016 411.0 163.5 17.5 0.0 0.0 181.0
46 AJ Finch 2016 355.0 133.5 47.5 0.0 0.0 181.0
362 MM Sharma 2016 23.0 5.0 5.0 16.0 169.0 179.0
In [0]:
plt.figure(figsize=[20,7])
plt.title('Top 15 Players of 2016')
sns.set_style('darkgrid')
ax=sns.barplot(x='player',y='total_value',data=top25.head(15))
t=ax.set(xlabel='Player',ylabel='MVP Score')
In [0]:
top25=season2017.sort_values('total_value',ascending=False).head(25)
top25
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
146 SP Narine 2017 188.0 77.5 15.0 25.0 216.5 309.0
66 BA Stokes 2017 317.0 127.5 15.0 9.0 124.5 267.0
135 AR Patel 2017 201.0 68.0 20.0 10.0 150.0 238.0
120 KH Pandya 2017 222.0 82.0 17.5 16.0 135.0 234.5
11 DA Warner 2017 499.0 163.5 65.0 0.0 0.0 228.5
412 JJ Bumrah 2017 14.0 5.0 2.5 24.0 207.0 214.5
6 SPD Smith 2017 533.0 186.5 20.0 0.0 0.0 206.5
167 RA Jadeja 2017 161.0 68.5 17.5 7.0 107.5 193.5
19 G Gambhir 2017 463.0 150.5 42.5 0.0 0.0 193.0
275 PJ Cummins 2017 51.0 11.0 5.0 15.0 172.5 188.5
408 Sandeep Sharma 2017 15.0 8.5 0.0 21.0 178.5 187.0
231 MM Sharma 2017 79.0 34.0 5.0 11.0 147.5 186.5
453 JD Unadkat 2017 9.0 3.5 0.0 19.0 181.5 185.0
276 DT Christian 2017 51.0 17.0 7.5 21.0 160.5 185.0
497 Rashid Khan 2017 4.0 2.5 0.0 18.0 182.0 184.5
44 SK Raina 2017 361.0 120.5 30.0 2.0 30.0 180.5
426 MJ McClenaghan 2017 12.0 0.0 0.0 14.0 177.0 177.0
25 S Dhawan 2017 424.0 149.0 27.5 0.0 0.0 176.5
29 AM Rahane 2017 416.0 161.5 15.0 0.0 0.0 176.5
377 UT Yadav 2017 21.0 6.0 0.0 15.0 170.5 176.5
114 HH Pandya 2017 225.0 77.5 10.0 5.0 85.5 173.0
246 CR Woakes 2017 71.0 33.0 2.5 10.0 136.0 171.5
94 MC Henriques 2017 259.0 89.0 17.5 3.0 64.5 171.0
43 MS Dhoni 2017 367.0 144.0 22.5 0.0 0.0 166.5
39 SV Samson 2017 373.0 152.5 12.5 0.0 0.0 165.0
In [0]:
plt.figure(figsize=[20,7])
plt.title('Top 15 Players of 2017')
sns.set_style('darkgrid')
ax=sns.barplot(x='player',y='total_value',data=top25.head(15))
t=ax.set(xlabel='Player',ylabel='MVP Score')
In [0]:
top25=season2018.sort_values('total_value',ascending=False).head(25)
top25
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
97 SP Narine 2018 249.0 89.5 12.5 22.0 206.0 308.0
16 SR Watson 2018 474.0 177.5 25.0 8.0 95.0 297.5
2 KS Williamson 2018 657.0 241.0 47.5 0.0 0.0 288.5
127 Shakib Al Hasan 2018 216.0 68.0 12.5 14.0 170.0 250.5
292 Rashid Khan 2018 43.0 18.5 2.5 17.0 222.5 243.5
134 HH Pandya 2018 201.0 46.5 27.5 18.0 156.0 230.0
10 AT Rayudu 2018 503.0 190.0 35.0 0.0 0.0 225.0
195 DJ Bravo 2018 122.0 50.0 17.5 14.0 148.0 215.5
5 KL Rahul 2018 535.0 173.0 42.5 0.0 0.0 215.5
32 KD Karthik 2018 411.0 156.5 57.5 0.0 0.0 214.0
141 KH Pandya 2018 192.0 61.0 20.0 11.0 128.5 209.5
18 JC Buttler 2018 467.0 171.0 37.5 0.0 0.0 208.5
14 V Kohli 2018 481.0 175.5 32.5 0.0 0.0 208.0
389 S Kaul 2018 18.0 7.5 2.5 15.0 195.5 205.5
20 SK Raina 2018 461.0 176.0 27.5 0.0 0.0 203.5
271 AJ Tye 2018 53.0 14.5 2.5 18.0 184.0 201.0
262 JD Unadkat 2018 57.0 22.5 7.5 18.0 168.0 198.0
34 AM Rahane 2018 406.0 157.5 40.0 0.0 0.0 197.5
15 RR Pant 2018 480.0 165.0 32.5 0.0 0.0 197.5
21 S Dhawan 2018 453.0 153.0 40.0 0.0 0.0 193.0
162 BA Stokes 2018 167.0 44.0 20.0 12.0 128.0 192.0
197 K Gowtham 2018 122.0 55.0 7.5 10.0 124.0 186.5
36 SS Iyer 2018 401.0 137.0 45.0 0.0 0.0 182.0
22 CA Lynn 2018 444.0 157.5 22.5 0.0 0.0 180.0
410 UT Yadav 2018 14.0 5.0 0.0 18.0 174.0 179.0
In [0]:
plt.figure(figsize=[20,7])
plt.title('Top 15 Players of 2018')
sns.set_style('darkgrid')
ax=sns.barplot(x='player',y='total_value',data=top25.head(15))
t=ax.set(xlabel='Player',ylabel='MVP Score')
In [0]:
top25=season2019.sort_values('total_value',ascending=False).head(25)
top25
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
3 DA Warner 2019 617.0 211.0 55.0 0.0 0.0 266.0
17 SS Iyer 2019 470.0 185.5 50.0 0.0 0.0 235.5
130 HH Pandya 2019 208.0 67.0 15.0 13.0 135.5 217.5
8 KL Rahul 2019 505.0 176.0 25.0 0.0 0.0 201.0
286 Rashid Khan 2019 46.0 19.5 0.0 18.0 178.0 197.5
180 RA Jadeja 2019 141.0 51.5 2.5 13.0 139.5 193.5
333 YS Chahal 2019 30.0 11.0 2.5 21.0 175.5 189.0
30 V Kohli 2019 413.0 152.0 35.0 0.0 0.0 187.0
24 Q de Kock 2019 425.0 158.5 27.5 0.0 0.0 186.0
40 AB de Villiers 2019 372.0 148.0 37.5 0.0 0.0 185.5
318 R Ashwin 2019 34.0 13.0 5.0 16.0 164.0 182.0
324 B Kumar 2019 32.0 15.5 0.0 18.0 166.0 181.5
165 KH Pandya 2019 166.0 54.5 17.5 11.0 106.5 178.5
268 PP Chawla 2019 55.0 21.0 0.0 15.0 149.5 170.5
90 AD Russell 2019 268.0 89.5 17.5 3.0 62.5 169.5
206 SP Narine 2019 110.0 33.0 7.5 12.0 127.0 167.5
221 AR Patel 2019 89.0 31.0 2.5 13.0 128.5 162.0
319 S Gopal 2019 34.0 10.0 5.0 13.0 143.5 158.5
50 CH Gayle 2019 350.0 127.5 27.5 0.0 0.0 155.0
41 JM Bairstow 2019 369.0 134.0 20.0 0.0 0.0 154.0
61 AT Rayudu 2019 326.0 124.5 27.5 0.0 0.0 152.0
556 Imran Tahir 2019 0.0 0.0 0.0 11.0 151.5 151.5
72 RR Pant 2019 297.0 120.5 30.0 0.0 0.0 150.5
421 NA Saini 2019 13.0 6.0 0.0 15.0 144.5 150.5
58 SK Raina 2019 331.0 129.0 17.5 0.0 4.0 150.5
In [0]:
plt.figure(figsize=[20,7])
plt.title('Top 15 Players of 2019')
sns.set_style('darkgrid')
ax=sns.barplot(x='player',y='total_value',data=top25.head(15))
t=ax.set(xlabel='Player',ylabel='MVP Score')

Most Improved Player from 2018 to 2019

In [0]:
mvpimproved=season2018.merge(season2019,left_on='player',right_on='player')
mvpimproved.head()
Out[0]:
player season_x batsman_runs_x mvp_batsman_x mvp_fielder_x type_out_x mvp_bowler_x total_value_x season_y batsman_runs_y mvp_batsman_y mvp_fielder_y type_out_y mvp_bowler_y total_value_y
0 KS Williamson 2018 657.0 241.0 47.5 0.0 0.0 288.5 2019 72.0 25.0 0.0 0.0 0.0 25.0
1 KL Rahul 2018 535.0 173.0 42.5 0.0 0.0 215.5 2019 505.0 176.0 25.0 0.0 0.0 201.0
2 AT Rayudu 2018 503.0 190.0 35.0 0.0 0.0 225.0 2019 326.0 124.5 27.5 0.0 0.0 152.0
3 V Kohli 2018 481.0 175.5 32.5 0.0 0.0 208.0 2019 413.0 152.0 35.0 0.0 0.0 187.0
4 RR Pant 2018 480.0 165.0 32.5 0.0 0.0 197.5 2019 297.0 120.5 30.0 0.0 0.0 150.5
In [0]:
mvpimproved=mvpimproved[['player','total_value_x','total_value_y']]
In [0]:
mvpimproved['improvement']=mvpimproved['total_value_y']-mvpimproved['total_value_x']
In [0]:
player=mvpimproved.sort_values('improvement',ascending=False).head(1)
player
Out[0]:
player total_value_x total_value_y improvement
104 M Ashwin 24.5 128.5 104.0
In [0]:
Stadium=Matches.groupby('city').sum()
Stadium['total_runs']=Stadium['first_bat_score']+Stadium['second_bat_score']
Stadium.reset_index(inplace=True)
In [0]:
Stadium=Stadium.sort_values('total_runs',ascending=False).head()
Stadium
Out[0]:
city season win_by_runs win_by_wickets overs first_bat_score second_bat_score total_runs
22 Mumbai 199306 1269 301 1980 16302.0 15149.0 31451.0
21 Kolkata 155078 871 275 1540 12334.0 11324.0 23658.0
10 Delhi 147011 1040 222 1460 11939.0 10772.0 22711.0
2 Bangalore 132861 1263 232 1320 11081.0 9455.0 20536.0
14 Hyderabad 126908 750 226 1260 9941.0 9246.0 19187.0
In [0]:
plt.figure(figsize=[10,7])
plt.title('Stadiums with most runs')
ax=sns.barplot('city','total_runs',data=Stadium.head(),palette='pastel')
In [0]:
Deliveries['bowled_over']=[1 for x in Deliveries['bowled_over']]
In [0]:
Deliveries.head()
Out[0]:
season batsman bowler innings non_striker bowled_over batsman_team player_out fielder_caught_out type_out ... extras_legbyes extras_noballs extras_byes extras_penalty total_extras_runs batsman_runs total_runs mvp_batsman mvp_bowler mvp_fielder
0 2008 AC Gilchrist GD McGrath 1st JC Buttler 1 Rajasthan Royals 0 0 0 ... 1 0 0 0 1 0 1 0.0 1.0 0.0
1 2008 AC Gilchrist GD McGrath 1st AM Rahane 1 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 0 0 0.0 1.0 0.0
2 2008 AC Gilchrist GD McGrath 1st AM Rahane 1 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 4 4 2.5 0.0 0.0
3 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 1 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 0 0 0.0 1.0 0.0
4 2008 Y Venugopal Rao GD McGrath 1st AM Rahane 1 Rajasthan Royals 0 0 0 ... 0 0 0 0 0 6 6 3.5 0.0 0.0

5 rows × 21 columns

In [0]:
bowler=Deliveries.groupby('bowler').sum()
bowler=bowler[['bowled_over','type_out','total_runs']]
In [0]:
bowler['economy']=6*bowler['total_runs']/bowler['bowled_over']
bowler['average']=bowler['total_runs']/bowler['type_out']
In [0]:
bowler=bowler[bowler['bowled_over']>300]
bowler['eco_norm']=(bowler['economy']-bowler['economy'].mean())/bowler['economy'].std()
bowler['avg_norm']=(bowler['average']-bowler['average'].mean())/bowler['average'].std()
bowler['wicket_to_runs']=bowler['eco_norm']+bowler['avg_norm']
bowler=bowler.sort_values('wicket_to_runs')
bowler.reset_index(inplace=True)
In [0]:
bowler.head(10)
Out[0]:
bowler bowled_over type_out total_runs economy average eco_norm avg_norm wicket_to_runs
0 Washington Sundar 340 20 375 6.617647 18.750000 -3.168019 -1.511227 -4.679246
1 AS Rajpoot 452 22 521 6.915929 23.681818 -2.398183 -0.666092 -3.064275
2 GB Hogg 466 30 563 7.248927 18.766667 -1.538749 -1.508371 -3.047120
3 DT Christian 818 55 995 7.298289 18.090909 -1.411352 -1.624172 -3.035523
4 TG Southee 932 65 1140 7.339056 17.538462 -1.306135 -1.718841 -3.024977
5 PJ Cummins 379 20 448 7.092348 22.400000 -1.942863 -0.885749 -2.828612
6 M Ashwin 402 29 501 7.477612 17.275862 -0.948536 -1.763841 -2.712377
7 M Muralitharan 1577 93 1926 7.327838 20.709677 -1.335088 -1.175410 -2.510498
8 BW Hilfenhaus 390 27 488 7.507692 18.074074 -0.870901 -1.627057 -2.497958
9 R Dhawan 513 30 627 7.333333 20.900000 -1.320904 -1.142795 -2.463700
In [0]:
from palettable.colorbrewer.qualitative import Paired_12_r
plt.figure(figsize=[20,7])
plt.title('Bowlers who give least runs and take up more wicket')
sns.set_style('darkgrid')
ax=sns.barplot(x='bowler',y='wicket_to_runs',data=bowler.head(10),palette=Paired_12_r.mpl_colors)
t=ax.set(xlabel='Bowler',ylabel='Balance b/w Average and Economy (Lower the better)')

Dream Team for IPL 2020

Criteria for choosing players

  • Maximum of 4 foreign players
  • 4 Bowlers
  • 4 Batsman
  • 2 Allrounders (one bowling allrounder and one batting allrounder)
  • 1 Wicket Keeper Batsman
In [0]:
# Start By Looking at Most Consistent Players For Seasons from 2016-2019
consistent_players=pd.concat([season2016.sort_values('total_value',ascending=False).head(20),
                 season2017.sort_values('total_value',ascending=False).head(20),
                 season2018.sort_values('total_value',ascending=False).head(20),
                 season2019.sort_values('total_value',ascending=False).head(20)])
consistent_players_total=consistent_players.groupby('player').sum().sort_values('total_value',ascending=False)
consistent_players_total.drop('season',axis=1,inplace=True)
In [0]:
consistent_players_total.head(30)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player
DA Warner 1871.0 650.0 170.0 0.0 0.0 820.0
SP Narine 547.0 200.0 35.0 59.0 549.5 784.5
V Kohli 1555.0 539.5 125.0 0.0 5.0 669.5
AR Patel 399.0 145.5 25.0 41.0 458.5 629.0
Rashid Khan 93.0 40.5 2.5 53.0 582.5 625.5
KH Pandya 580.0 197.5 55.0 38.0 370.0 622.5
SK Raina 1238.0 457.0 95.0 7.0 69.5 621.5
RA Jadeja 536.0 198.5 35.0 28.0 379.0 612.5
AM Rahane 1311.0 490.5 100.0 0.0 0.0 590.5
S Dhawan 1437.0 492.0 97.5 0.0 0.0 589.5
SR Watson 623.0 226.0 32.5 24.0 275.0 533.5
HH Pandya 409.0 113.5 42.5 31.0 291.5 447.5
AB de Villiers 875.0 351.0 72.5 0.0 0.0 423.5
DJ Bravo 195.0 69.5 20.0 31.0 328.5 418.0
KL Rahul 1040.0 349.0 67.5 0.0 0.0 416.5
G Gambhir 960.0 326.5 85.0 0.0 0.0 411.5
B Kumar 71.0 29.0 2.5 39.0 364.5 396.0
Q de Kock 854.0 308.0 85.0 0.0 0.0 393.0
JD Unadkat 66.0 26.0 7.5 37.0 349.5 383.0
KS Williamson 657.0 241.0 47.5 0.0 0.0 288.5
MC Henriques 206.0 70.5 22.5 11.0 178.5 271.5
BA Stokes 317.0 127.5 15.0 9.0 124.5 267.0
Shakib Al Hasan 216.0 68.0 12.5 14.0 170.0 250.5
SS Iyer 470.0 185.5 50.0 0.0 0.0 235.5
M Vijay 523.0 204.0 30.0 0.0 0.0 234.0
AT Rayudu 503.0 190.0 35.0 0.0 0.0 225.0
JJ Bumrah 14.0 5.0 2.5 24.0 207.0 214.5
KD Karthik 411.0 156.5 57.5 0.0 0.0 214.0
JC Buttler 467.0 171.0 37.5 0.0 0.0 208.5
SPD Smith 533.0 186.5 20.0 0.0 0.0 206.5
In [0]:
consistent_players_total.sort_values('mvp_batsman',ascending=False).head(30)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player
DA Warner 1871.0 650.0 170.0 0.0 0.0 820.0
V Kohli 1555.0 539.5 125.0 0.0 5.0 669.5
S Dhawan 1437.0 492.0 97.5 0.0 0.0 589.5
AM Rahane 1311.0 490.5 100.0 0.0 0.0 590.5
SK Raina 1238.0 457.0 95.0 7.0 69.5 621.5
AB de Villiers 875.0 351.0 72.5 0.0 0.0 423.5
KL Rahul 1040.0 349.0 67.5 0.0 0.0 416.5
G Gambhir 960.0 326.5 85.0 0.0 0.0 411.5
Q de Kock 854.0 308.0 85.0 0.0 0.0 393.0
KS Williamson 657.0 241.0 47.5 0.0 0.0 288.5
SR Watson 623.0 226.0 32.5 24.0 275.0 533.5
M Vijay 523.0 204.0 30.0 0.0 0.0 234.0
SP Narine 547.0 200.0 35.0 59.0 549.5 784.5
RA Jadeja 536.0 198.5 35.0 28.0 379.0 612.5
KH Pandya 580.0 197.5 55.0 38.0 370.0 622.5
AT Rayudu 503.0 190.0 35.0 0.0 0.0 225.0
SPD Smith 533.0 186.5 20.0 0.0 0.0 206.5
SS Iyer 470.0 185.5 50.0 0.0 0.0 235.5
JC Buttler 467.0 171.0 37.5 0.0 0.0 208.5
RR Pant 480.0 165.0 32.5 0.0 0.0 197.5
KD Karthik 411.0 156.5 57.5 0.0 0.0 214.0
RV Uthappa 385.0 151.0 45.0 0.0 0.0 196.0
AR Patel 399.0 145.5 25.0 41.0 458.5 629.0
JM Bairstow 369.0 134.0 20.0 0.0 0.0 154.0
CH Gayle 350.0 127.5 27.5 0.0 0.0 155.0
BA Stokes 317.0 127.5 15.0 9.0 124.5 267.0
HH Pandya 409.0 113.5 42.5 31.0 291.5 447.5
AD Russell 268.0 89.5 17.5 3.0 62.5 169.5
MC Henriques 206.0 70.5 22.5 11.0 178.5 271.5
DJ Bravo 195.0 69.5 20.0 31.0 328.5 418.0
In [0]:
consistent_players_total.sort_values('mvp_bowler',ascending=False).head(30)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player
Rashid Khan 93.0 40.5 2.5 53.0 582.5 625.5
SP Narine 547.0 200.0 35.0 59.0 549.5 784.5
AR Patel 399.0 145.5 25.0 41.0 458.5 629.0
RA Jadeja 536.0 198.5 35.0 28.0 379.0 612.5
KH Pandya 580.0 197.5 55.0 38.0 370.0 622.5
B Kumar 71.0 29.0 2.5 39.0 364.5 396.0
JD Unadkat 66.0 26.0 7.5 37.0 349.5 383.0
DJ Bravo 195.0 69.5 20.0 31.0 328.5 418.0
HH Pandya 409.0 113.5 42.5 31.0 291.5 447.5
SR Watson 623.0 226.0 32.5 24.0 275.0 533.5
JJ Bumrah 14.0 5.0 2.5 24.0 207.0 214.5
S Kaul 18.0 7.5 2.5 15.0 195.5 205.5
Mustafizur Rahman 0.0 0.0 0.0 16.0 192.0 192.0
AJ Tye 53.0 14.5 2.5 18.0 184.0 201.0
MC Henriques 206.0 70.5 22.5 11.0 178.5 271.5
Sandeep Sharma 15.0 8.5 0.0 21.0 178.5 187.0
MJ McClenaghan 12.0 0.0 0.0 14.0 177.0 177.0
YS Chahal 30.0 11.0 2.5 21.0 175.5 189.0
PJ Cummins 51.0 11.0 5.0 15.0 172.5 188.5
UT Yadav 21.0 6.0 0.0 15.0 170.5 176.5
Shakib Al Hasan 216.0 68.0 12.5 14.0 170.0 250.5
P Kumar 61.0 31.5 0.0 15.0 164.5 196.0
R Ashwin 34.0 13.0 5.0 16.0 164.0 182.0
Harbhajan Singh 106.0 30.5 5.0 13.0 161.5 197.0
DT Christian 51.0 17.0 7.5 21.0 160.5 185.0
PP Chawla 55.0 21.0 0.0 15.0 149.5 170.5
MM Sharma 79.0 34.0 5.0 11.0 147.5 186.5
S Gopal 34.0 10.0 5.0 13.0 143.5 158.5
CH Morris 157.0 62.5 2.5 11.0 140.5 205.5
BA Stokes 317.0 127.5 15.0 9.0 124.5 267.0

From the above data, we need to choose

  • 4 Pure Batsmen
  • 4 Bowlers
  • 2 Allrounders
  • 1 Wicket Keeper
  • Maximum of 4 Overseas Players
In [0]:
DAWarner=mvp[mvp['player']=='DA Warner']
DAWarner
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
0 DA Warner 2016 755.0 275.5 50.0 0.0 0.0 325.5
3 DA Warner 2019 617.0 211.0 55.0 0.0 0.0 266.0
11 DA Warner 2017 499.0 163.5 65.0 0.0 0.0 228.5
In [0]:
plt.title("David Warner")
ax=sns.pointplot(x='season',y='mvp_batsman',data=DAWarner)
ticks=np.arange(150,400,step=20)
ax=ax.set(yticks=ticks)

We select David Warner as our first foreign player and We need to select 10 more players and 3 more overseas players. Team So Far:

  1. David Warner
In [0]:
VK=mvp[mvp['player']=='V Kohli']
VK
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
1 V Kohli 2016 661.0 212.0 57.5 0.0 5.0 274.5
14 V Kohli 2018 481.0 175.5 32.5 0.0 0.0 208.0
30 V Kohli 2019 413.0 152.0 35.0 0.0 0.0 187.0
86 V Kohli 2017 271.0 84.5 17.5 0.0 0.0 102.0
In [0]:
plt.title("Virat Kohli")
ax=sns.pointplot(x='season',y='mvp_batsman',data=VK)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

Team So Far:

  1. David Warner
  2. Virat Kohli
In [0]:
SD=mvp[mvp['player']=='S Dhawan']
SD
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
4 S Dhawan 2016 560.0 190.0 30.0 0.0 0.0 220.0
21 S Dhawan 2018 453.0 153.0 40.0 0.0 0.0 193.0
25 S Dhawan 2017 424.0 149.0 27.5 0.0 0.0 176.5
42 S Dhawan 2019 368.0 119.0 27.5 0.0 0.0 146.5
In [0]:
plt.title("Shikar Dhawan")
ax=sns.pointplot(x='season',y='mvp_batsman',data=SD)
ticks=np.arange(150,300,step=20)
ax=ax.set(yticks=ticks)

Team So Far:

  1. D Warner
  2. S Dhawan
  3. V Kohli

We have picked out top order. We will now pick out Middle Order We need: 2 Batsman and 1 Wicket Keeper Batsman. We have 3 overseas players.

We will Skip AM Rahane since he is a top order batsman and we've got it covered. Let us have him as backup. We shall have our next MVP Batsman KL Rahul as backup as well

In [0]:
SR=consistent_players[consistent_players['player']=='SK Raina']
SR
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
28 SK Raina 2016 416.0 160.5 37.5 5.0 39.5 237.5
44 SK Raina 2017 361.0 120.5 30.0 2.0 30.0 180.5
20 SK Raina 2018 461.0 176.0 27.5 0.0 0.0 203.5
In [0]:
plt.title("Suresh Raina")
ax=sns.pointplot(x='season',y='mvp_batsman',data=SR)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)
In [0]:
ABD=mvp[mvp['player']=='AB de Villiers']
ABD
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
9 AB de Villiers 2016 503.0 203.0 35.0 0.0 0.0 238.0
35 AB de Villiers 2018 405.0 155.5 22.5 0.0 0.0 178.0
40 AB de Villiers 2019 372.0 148.0 37.5 0.0 0.0 185.5
126 AB de Villiers 2017 218.0 72.5 15.0 0.0 0.0 87.5
In [0]:
plt.title("AB de Villers")
ax=sns.pointplot(x='season',y='mvp_batsman',data=ABD)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

We have picked 5 batsmen so far, two of which are overseas players

Batting Lineup:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina

Now, we need to pick a wicket keeper.

We have Q deKock who is the highest ranked wicket keeper in our most consistent batsmen dataframe. But choosing him would mean we are left with 1 overseas player. So, we will choose the next best wicket keeper who is an Indian, RR Pant

In [0]:
RP=mvp[mvp['player']=='RR Pant']
RP
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
15 RR Pant 2018 480.0 165.0 32.5 0.0 0.0 197.5
72 RR Pant 2019 297.0 120.5 30.0 0.0 0.0 150.5
92 RR Pant 2017 263.0 76.0 32.5 0.0 0.0 108.5
121 RR Pant 2016 222.0 90.5 12.5 0.0 0.0 103.0
In [0]:
plt.title("Rishab Pant")
ax=sns.pointplot(x='season',y='mvp_batsman',data=RP)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

Batting Lineup so far:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant

We have 2 overseas players remaining. We need to pick 1 batting allrounder, 1 bowling allrounder, 3 pure bowlers.

To choose the batting allrounder, we will look at the players who have highest mvp_bowler value among top 30 batsmen for seasons from 2016-2019

In [0]:
top25bat=pd.concat([season2016.sort_values('mvp_batsman',ascending=False).head(50),
                 season2017.sort_values('mvp_batsman',ascending=False).head(50),
                 season2018.sort_values('mvp_batsman',ascending=False).head(50),
                 season2019.sort_values('mvp_batsman',ascending=False).head(50)])
top25bat.drop('season',axis=1,inplace=True)
top25bat.groupby('player').sum().sort_values('mvp_bowler',ascending=False).head(10)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player
SP Narine 547.0 200.0 35.0 59.0 549.5 784.5
RA Jadeja 659.0 243.5 40.0 38.0 503.0 786.5
KH Pandya 712.0 248.0 67.5 44.0 474.0 789.5
HH Pandya 634.0 191.0 52.5 36.0 377.0 620.5
BA Stokes 606.0 215.0 40.0 28.0 330.0 585.0
AD Russell 614.0 198.5 32.5 20.0 272.0 503.0
MC Henriques 465.0 159.5 40.0 14.0 243.0 442.5
DJ Bravo 199.0 82.0 25.0 21.0 239.5 346.5
Shakib Al Hasan 216.0 68.0 12.5 14.0 170.0 250.5
AR Patel 201.0 68.0 20.0 10.0 150.0 238.0
In [0]:
SP=mvp[mvp['player']=='SP Narine']
SP
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
97 SP Narine 2018 249.0 89.5 12.5 22.0 206.0 308.0
146 SP Narine 2017 188.0 77.5 15.0 25.0 216.5 309.0
206 SP Narine 2019 110.0 33.0 7.5 12.0 127.0 167.5
375 SP Narine 2016 21.0 8.5 0.0 7.0 119.5 128.0
In [0]:
plt.title("Sunil Narine Batting")
ax=sns.pointplot(x='season',y='mvp_batsman',data=SP)
ticks=np.arange(60,200,step=20)
ax=ax.set(yticks=ticks)
In [0]:
RJ=mvp[mvp['player']=='RA Jadeja']
RJ
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
107 RA Jadeja 2016 234.0 78.5 15.0 8.0 132.0 225.5
167 RA Jadeja 2017 161.0 68.5 17.5 7.0 107.5 193.5
180 RA Jadeja 2019 141.0 51.5 2.5 13.0 139.5 193.5
194 RA Jadeja 2018 123.0 45.0 5.0 10.0 124.0 174.0
In [0]:
plt.title("Ravindra Jadeja Batting")
ax=sns.pointplot(x='season',y='mvp_batsman',data=RJ)
ticks=np.arange(60,200,step=20)
ax=ax.set(yticks=ticks)
In [0]:
KP=mvp[mvp['player']=='KH Pandya']
KP
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
120 KH Pandya 2017 222.0 82.0 17.5 16.0 135.0 234.5
141 KH Pandya 2018 192.0 61.0 20.0 11.0 128.5 209.5
165 KH Pandya 2019 166.0 54.5 17.5 11.0 106.5 178.5
184 KH Pandya 2016 132.0 50.5 12.5 6.0 104.0 167.0
In [0]:
plt.title("Krunal Pandya Batting")
ax=sns.pointplot(x='season',y='mvp_batsman',data=KP)
ticks=np.arange(60,200,step=20)
ax=ax.set(yticks=ticks)
In [0]:
HP=mvp[mvp['player']=='HH Pandya']
HP
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
114 HH Pandya 2017 225.0 77.5 10.0 5.0 85.5 173.0
130 HH Pandya 2019 208.0 67.0 15.0 13.0 135.5 217.5
134 HH Pandya 2018 201.0 46.5 27.5 18.0 156.0 230.0
298 HH Pandya 2016 41.0 5.0 2.5 4.0 54.0 61.5
In [0]:
plt.title("Hardik Pandya Batting")
ax=sns.pointplot(x='season',y='mvp_batsman',data=KP)
ticks=np.arange(60,300,step=20)
ax=ax.set(yticks=ticks)

If we observe the graphs of 4 players for mvp_batsman, we observe that HH Pandya and KH Pandya are the most consistent with mvp_bat around 60. RA Jadeja and SR Narine have a dip in form. If we chose HH Pandya, We can need to choose and extra spinner. If we chose KH Pandya, We can need to choose and extra pacer.

Team so far:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant
  7. Hardik/Krunal Pandya

We now need to choose a bowling allrounder. To choose the bowling allrounder, we will look at the players who have highest mvp_batsman value among top 30 bowlers for seasons from 2016-2019

In [0]:
top25bowl=pd.concat([season2016.sort_values('mvp_bowler',ascending=False).head(50),
                 season2017.sort_values('mvp_bowler',ascending=False).head(50),
                 season2018.sort_values('mvp_bowler',ascending=False).head(50),
                 season2019.sort_values('mvp_bowler',ascending=False).head(50)])
top25bowl.drop('season',axis=1,inplace=True)
top25bowl.groupby('player').sum().sort_values('mvp_batsman',ascending=False).head(10)
Out[0]:
batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
player
SR Watson 708.0 248.0 47.5 32.0 355.0 650.5
KH Pandya 712.0 248.0 67.5 44.0 474.0 789.5
RA Jadeja 659.0 243.5 40.0 38.0 503.0 786.5
BA Stokes 606.0 215.0 40.0 28.0 330.0 585.0
SP Narine 568.0 208.5 35.0 66.0 669.0 912.5
AD Russell 614.0 198.5 32.5 20.0 272.0 503.0
HH Pandya 634.0 191.0 52.5 36.0 377.0 620.5
AR Patel 485.0 171.5 25.0 48.0 534.0 730.5
MC Henriques 465.0 159.5 40.0 14.0 243.0 442.5
DR Smith 331.0 125.5 5.0 4.0 55.0 185.5

SR Watson is retired. Since we already have KH Pandya shortlisted, we can go with him.

In [0]:
plt.title("Krunal Pandya Bowling")
ax=sns.pointplot(x='season',y='mvp_bowler',data=KP)
ticks=np.arange(60,200,step=20)
ax=ax.set(yticks=ticks)

Team so far:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant
  7. Hardik Pandya
  8. Krunal Pandya

We need to choose 3 bowlers and we still have 2 overseas players remaining. We can go with either 2 spinners or 2 pacers. If we look at top 2 in most consistent bowlers list, we have Rashid Khan and SR Narine.Both are overseas players and we can afford to pick both of them

In [0]:
RK=mvp[mvp['player']=='Rashid Khan']
RK
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
286 Rashid Khan 2019 46.0 19.5 0.0 18.0 178.0 197.5
292 Rashid Khan 2018 43.0 18.5 2.5 17.0 222.5 243.5
497 Rashid Khan 2017 4.0 2.5 0.0 18.0 182.0 184.5
In [0]:
plt.title("Rashid Khan")
ax=sns.pointplot(x='season',y='mvp_bowler',data=RK)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)
In [0]:
plt.title("Sunil Narine Bowling")
ax=sns.pointplot(x='season',y='mvp_bowler',data=SP)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

We see a dip in form of Sunil Narine from 2017. We shall go with Rashid Khan as of now and keep Sunil Narine as a backup. We have to pick 2 more bowlers and atleast one pacer and one more overseas player can be accommodated. Highest ranked pace bowler is B Kumar in our list.

Batting Lineup:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant
  7. Hardik Pandya
  8. Krunal Pandya
  9. Rashid Khan
In [0]:
BK=mvp[mvp['player']=='B Kumar']
BK
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
304 B Kumar 2016 39.0 13.5 2.5 21.0 198.5 214.5
324 B Kumar 2019 32.0 15.5 0.0 18.0 166.0 181.5
349 B Kumar 2018 25.0 7.0 0.0 8.0 126.0 133.0
499 B Kumar 2017 3.0 0.0 0.0 9.0 138.5 138.5
In [0]:
plt.title("Bhuvaneshwar Kumar")
ax=sns.pointplot(x='season',y='mvp_bowler',data=BK)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

Batting Lineup:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant
  7. Hardik Pandya
  8. Krunal Pandya
  9. Rashid Khan
  10. Bhuvaneshwar Kumar

We now need to pick another bowler. If we go with Pacer, we can go with JD Unadkat, DJ Bravo or JJ Bumrah. We can also go with Sunil Narine if we want a spinner

In [0]:
JU=mvp[mvp['player']=='JD Unadkat']
JU
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
262 JD Unadkat 2018 57.0 22.5 7.5 18.0 168.0 198.0
453 JD Unadkat 2017 9.0 3.5 0.0 19.0 181.5 185.0
474 JD Unadkat 2019 6.0 2.5 0.0 8.0 101.0 103.5
618 JD Unadkat 2016 0.0 0.0 0.0 0.0 8.0 8.0
In [0]:
plt.title("Jaidev Unadkat")
ax=sns.pointplot(x='season',y='mvp_bowler',data=JU)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)
In [0]:
DJ=mvp[mvp['player']=='DJ Bravo']
DJ
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
195 DJ Bravo 2018 122.0 50.0 17.5 14.0 148.0 215.5
235 DJ Bravo 2019 77.0 32.0 7.5 7.0 91.5 131.0
241 DJ Bravo 2016 73.0 19.5 2.5 17.0 180.5 202.5
In [0]:
plt.title("Dwayne Bravo")
ax=sns.pointplot(x='season',y='mvp_bowler',data=DJ)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)
In [0]:
JB=mvp[mvp['player']=='JJ Bumrah']
JB
Out[0]:
player season batsman_runs mvp_batsman mvp_fielder type_out mvp_bowler total_value
409 JJ Bumrah 2018 15.0 6.0 0.0 13.0 160.5 166.5
412 JJ Bumrah 2017 14.0 5.0 2.5 24.0 207.0 214.5
451 JJ Bumrah 2016 9.0 2.5 0.0 18.0 174.0 176.5
527 JJ Bumrah 2019 1.0 0.0 0.0 12.0 144.0 144.0
In [0]:
plt.title("Jasprit Bumrah")
ax=sns.pointplot(x='season',y='mvp_bowler',data=JB)
ticks=np.arange(100,300,step=20)
ax=ax.set(yticks=ticks)

Using the graph above, we can go with JJ Bumrah. So, our playing 11 is, Batting Lineup:

  1. D Warner
  2. S Dhawan
  3. V Kohli
  4. Ab de Villers
  5. Suresh Raina
  6. Rishab Pant
  7. Hardik Pandya
  8. Krunal Pandya
  9. Rashid Khan
  10. Bhuvaneshwar Kumar
  11. Jasprith Bumrah

Bench:

  1. Sunil Narine
  2. AM Rahane
  3. Quinton de Kock
  4. KL Rahul
In [0]: