Cluster of cricketers
This week, we look at the concept of statistical clusters and how they can be used to analyse the on-field performances of cricketers. Intrigued? Read on…
Is Anil Kumble like Murali Kartik? If we pose this question in an open forum there will be howls of protest. 
Or we could ask: Is Virender Sehwag like Adam Gilchrist? … and face the same sort of outraged ire.
So let’s first clarify matters. We’re merely trying to compare the performance of players in the ongoing IPL3.
Recall that both Kumble and Kartik come in to bowl around the 7th-over mark, both bowl tight and both tend to strike with quick wickets. Likewise, Gilchrist and Sehwag come out to open with the identical intention of creating run-scoring mayhem. They might perish early, but won’t have strike rates below 160 or 170.
Elsewhere on this Castrol Cricket portal, readers will find the option to compare two players based on their Castrol Index (CI). Try it out. It is good fun and very illustrative.
But we’re now proposing an even more interesting game based on the CI. If you check out the CI of a player, it is approximately the sum of two components: the batting momentum and the bowling efficiency.
So let us plot a player’s batting momentum (on the x-axis) against his bowling efficiency (on the y-axis). One would expect Sehwag and Gilchrist to be far right on the x-axis, but barely able to take off on the y-axis. On the other hand, we expect both Kumble and Kartik to be high up on the y-axis, but struggling to go far from the origin along the x-axis …
But here’s the key observation: both Kumble and Kartik will plot very close to each other … as indeed would Sehwag and Gilchrist … on the x-y plane. Try the same idea with Rahul Dravid and Mahela Jayawardene and you might find that they too plot rather close to one another.
That’s the key idea in the statistical concept of clustering. Within a cluster we find players with very similar performances, but between clusters performances are very different. So Kumble and Kartik might be in one cluster, Sehwag and Gilchrist in another, and Dravid and Jayawardene in a third.
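For readers who like to tinker, here is a minimal Python sketch of how such clusters might be found. It uses k-means, one common clustering method, and not necessarily the one used in any analysis mentioned here. The scores below are invented purely for illustration and are not real Castrol Index values; the function names and the choice of starting centres are likewise our own assumptions.

```python
from math import dist

# HYPOTHETICAL (batting momentum, bowling efficiency) scores, invented
# for illustration only; these are not real Castrol Index figures.
players = {
    "Sehwag": (9.0, 1.0),
    "Gilchrist": (8.5, 0.5),
    "Kumble": (1.0, 9.0),
    "Kartik": (1.5, 8.5),
    "Dravid": (6.0, 0.5),
    "Jayawardene": (5.5, 1.0),
}

def nearest(point, centres):
    """Index of the centre closest to the point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: dist(point, centres[i]))

def kmeans(points, centres, iterations=10):
    """Plain k-means: assign points to the nearest centre, then move each
    centre to the mean of its points. Repeats a fixed number of times."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            groups[nearest(p, centres)].append(p)
        centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else old
            for g, old in zip(groups, centres)
        ]
    return centres

# Assumption: start from three arbitrarily chosen players as initial centres.
start = [players["Sehwag"], players["Kumble"], players["Dravid"]]
centres = kmeans(list(players.values()), start)
labels = {name: nearest(p, centres) for name, p in players.items()}

for name, cluster in labels.items():
    print(name, "-> cluster", cluster)
```

With these made-up numbers, players who plot close together end up sharing a cluster label. With real data the number of clusters and the starting centres would need more care.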
In a project we did after IPL1, four bright students from the Indian Statistical Institute, Kolkata analyzed the IPL data and proposed nine clusters. Then they proposed something even smarter: they said that we should pay players in the same cluster approximately the same sum of money. If a new player comes along, analyze his cricketing record to see which cluster he is closest to … and pay him accordingly.
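The "closest cluster" step can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the students' actual method: the cluster names, centres and pay bands below are all invented, and closeness is measured by plain straight-line distance on the batting-momentum and bowling-efficiency plot.

```python
from math import dist

# HYPOTHETICAL clusters: centres are (batting momentum, bowling efficiency)
# and pay bands are in arbitrary units. All values are invented.
clusters = {
    "explosive opener": {"centre": (8.75, 0.75), "pay_band": 100},
    "tight spinner": {"centre": (1.25, 8.75), "pay_band": 80},
    "anchor batsman": {"centre": (5.75, 0.75), "pay_band": 70},
}

def closest_cluster(player_point):
    """Name of the cluster whose centre is nearest to the player."""
    return min(clusters, key=lambda name: dist(player_point, clusters[name]["centre"]))

# A made-up newcomer: modest with the bat, strong with the ball.
new_player = (2.0, 8.0)
cluster_name = closest_cluster(new_player)
suggested_pay = clusters[cluster_name]["pay_band"]

print(cluster_name, suggested_pay)
```

Here the newcomer lands nearest the spinners' centre, so he would be offered that cluster's pay band.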
Of course this is for on-field performance. As we have discussed elsewhere, a player also offers an off-field value … and his eventual offer is based on an assessment of both his on-field skills and off-field value. 






