Poker, Probability, Monte Carlo, and R

My daughter just started a business analytics Master's program. For the probability sequence of the core statistics course, one of her assignments is to calculate the probability of single 5 card draw poker hands from a 52-card deck.

I well remember this exercise from back in the day, when I computed all such relevant probabilities using basic combination counting techniques for an intro to probability course. My daughter though, a business undergrad, is less interested in the math than she is in the stats/computation, opining that's where she'll make money.

It's hard to argue that logic, though I thought it might be an analytics "ah-hah" moment for her to connect the probability math with statistics. The population of 5 card draw hands, consisting of 52 choose 5 or 2598960 elements, is pretty straightforward both mathematically and statistically.

So of course ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the "combinatorics" of the counts and probabilities, I undertook two computational exercises. The first was to delineate all possible combinations of 5 card draws from a 52 card deck, counting occurrences of relevant combinations such as 2 pair, a straight, or nothing in a cell loop.

The second and statistical approach revolves on Monte Carlo simulation that's driven from "repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle." So by generating a sizable number of randomly sampled hands, similar logic to that articulated above can be used to tally counts of pertinent combinations. The Law of Large Numbers suggests that as sample size gets large, the statistical counts should look a lot like those probability calculations.

Method 1, grounded in math/probability, produces exact answers, while Method 2, which revolves on sampling, is approximate and variable, with generally more accuracy the larger the sample size.

Why approximate over exact? In this case the probability distribution is simple, but in more complicated instances, the math may be too complex. Better a Monte Carlo statistical estimate than intractable mathematics.

The remainder of this notebook looks at first the probability calculations and then the corresponding sampling-based approximations. The technology is Jupyter Notebook with Microsoft Open R 3.4.4.

For the probability calculations, I draw heavily on R's choose and combn combinatorics functions. The basic sample function feeds the statistical estimation cell's main loop. Simple frequencies of individual hands by suit and rank computed from the data.table package are central to the exercise. Not surprisingly, the notebook is very compute-intensive, consuming almost 2.5 hours from end to end.

Load libraries, set options, assign file names, and migrate to the working directory.

In [32]:
suppressMessages(library(data.table))
suppressMessages(library(tidyverse))

options(warn=-1)
options(digits = 10)
options(scipen = 10)

dir <- "c:/data/poker"

mdata <- "mathcounts.rdata"
sdata <- "statcounts.rdata"

setwd(dir)

cat("\n\n")

Define the 52 card deck by suit and rank in R.

In [33]:
set.seed(987)

rank = c(2:10,"J","Q","K","A")

cards = data.table(id=1:52,
                   suit=c(rep("C",13),rep("D",13),rep("H",13),rep("S",13)),
                   rank=rep(rank,4))
cards$rank <- factor(cards$rank,levels=rank,ordered=TRUE)

nr <- nrow(cards)

draw <- 5

cat("\n\n")

Invoke the choose and combn R combinatorics functions to build a matrix consisting of all choose(52,5) = 2598960 combinations of hands. Establish an every 10% of iterations vector for holding cumulative ongoing counts.

In [34]:
howmany <- choose(52,5)

combos <- cbind(matrix(combn(cards$suit,draw),byrow=TRUE,ncol=draw),
                         matrix(combn(cards$rank,draw),byrow=TRUE,ncol=draw))[1:howmany,]
dim(combos)

q <- as.vector(round(quantile(1:howmany,probs=seq(0,1,.1)))[2:11])

cat("\n\n")
  1. 2598960
  2. 10

Worker function to help track counts for the different types of hands.

In [35]:
addtimes <- function()
{
    data.table(type=c("nothing","1 pair","2 pair","3 of a kind","full house","4 of a kind",
                      "flush","straight","straight flush","royal straight flush","bad"),
               counts=c(ctnothing,ct2111,ct221,ct311,ct32,ct41,
                       ctflush,ctstraight,ctstraightflush,ctroyalstraightflush,ctbad))
}

cat("\n\n")

Compute counts summarizing the 2598960 separate hands. The strategy is as follows: loop through each possibility, first determining if there's a pair or more of ranks. If so, determine if the multiple ranks consist of 1 pair, 2 pairs, 3 of a kind, a full house, or 4 of a kind. Increment the appropriate counter. If not, look for flushes and straights, incrementing counts as appropriate. Snapshot the counts every 10% of processing.

In [36]:
ptmpre <- proc.time()

mathcounts <- list()

ctnothing <- 0
ctroyalstraightflush <- 0
ctstraightflush <- 0
ctstraight <- 0
ctflush <- 0
ct2111 <- 0
ct221 <- 0
ct311 <- 0
ct32 <- 0
ct41 <- 0
ctbad <- 0

ctstr1 <- 0
ctstr2 <- 0


for (i in 1:howmany) 
{
    s <- data.table(suit=combos[i,1:5],rank=factor(combos[i,6:10],levels=rank,ordered=TRUE))[order(rank)]
    
    rr <- s[,.(count=.N),.(rank)][order(-count)]

    nrrr <- nrow(rr)

    if (nrrr == draw)   # no pairs or other multiple ranks. look for straights/flushes.
    {
        ss <- s[,.(count=.N),.(suit)][order(-count)]
        
        nrss <- nrow(ss)
        isflush <- (nrss == 1)
        fc <- as.integer(s$rank)
        isstr1 <- ((fc[5]-fc[1]) == (draw-1))
        isstr2 <- all(fc %in% c(1:4,13))
        if (isstr1) ctstr1 <- ctstr1 + 1              
        if (isstr2) ctstr2 <- ctstr2 + 1
        isstraight <- isstr1 | isstr2
        if (isstraight & isflush) 
        {
            if (all(c("K","A") %in% s$rank)) ctroyalstraightflush <- ctroyalstraightflush + 1
            else ctstraightflush <- ctstraightflush + 1
        }
        else if (isflush) ctflush <- ctflush + 1
        else if (isstraight) ctstraight <- ctstraight + 1
        else ctnothing <- ctnothing + 1
    }       
    else if (all(rr$count==c(2,1,1,1))) # multiple ranks == at least 1 pair.
        ct2111 <- ct2111 + 1
    else if (all(rr$count==c(2,2,1))) 
        ct221 <- ct221 + 1
    else if (all(rr$count==c(3,1,1))) 
        ct311 <- ct311 + 1
    else if (all(rr$count==c(3,2))) 
        ct32 <- ct32 + 1
    else if (all(rr$count==c(4,1))) 
        ct41 <- ct41 + 1
    else
        ctbad <- ctbad + 1
    
    if (i %in% q) mathcounts[[as.character(i)]] <- addtimes()
}

ptmpost <- proc.time()
print(ptmpost-ptmpre)

cat("\n\n")
   user  system elapsed 
4521.35    6.58 4532.55 


Show the final probability counts. Thankfully, they indeed match the mathematical computations.

In [43]:
for (nm in names(mathcounts)) 
{
    mathcounts[[nm]]$pct <- 100*mathcounts[[nm]]$counts/as.numeric(nm)
    mathcounts[[nm]] <- mathcounts[[nm]][order(-pct)]
}

saveRDS(mathcounts,mdata)

mathcounts[length(mathcounts)]

cat("\n\n")
$`2598960` =
typecountspct
nothing 1302540 50.1177394034537
1 pair 1098240 42.2569027611044
2 pair 123552 4.7539015606243
3 of a kind 54912 2.1128451380552
straight 10200 0.3924646781790
flush 5108 0.1965401545233
full house 3744 0.1440576230492
4 of a kind 624 0.0240096038415
straight flush 36 0.0013851694524
royal straight flush 4 0.0001539077169
bad 0 0.0000000000000

Set up to do statistical Monte Carlo simulation.

In [42]:
howmany = choose(52,5)
combos = matrix(ncol=draw,nrow=howmany,byrow=TRUE)

q <- as.vector(round(quantile(1:howmany,probs=seq(0,1,.1)))[2:11])

cat("\n\n")

In [39]:
for (i in 1:howmany) combos[i,] <- sample(nr,draw,replace=FALSE) 

dim(combos)

cat("\n\n")
  1. 2598960
  2. 5

Now use a similar computation strategy to that executed for probability. Rather than delineate all possibilities as above, in this exercise sample 5 cards without replacement many times. Just for kicks, perform choose(52,5) = 2598960 sampling iterations and view cumulative counts/percentages for each 10%.

In [40]:
ptmpre <- proc.time()

statcounts <- list()

ctnothing <- 0
ctroyalstraightflush <- 0
ctstraightflush <- 0
ctstraight <- 0
ctflush <- 0
ct2111 <- 0
ct221 <- 0
ct311 <- 0
ct32 <- 0
ct41 <- 0
ctbad <- 0

ctstr1 <- 0
ctstr2 <- 0

for (i in 1:howmany) 
{
    
    s <- cards[combos[i,]][order(rank)]
    
    rr <- s[,.(count=.N),.(rank)][order(-count)]

    nrrr <- nrow(rr)

    if (nrrr == draw)
    {
        ss <- s[,.(count=.N),.(suit)][order(-count)]

        nrss <- nrow(ss)
        isflush <- (nrss == 1)
        fc <- as.integer(s$rank)
        isstr1 <- ((fc[5]-fc[1]) == (draw-1))
        isstr2 <- all(fc %in% c(1:4,13))
        if (isstr1) ctstr1 <- ctstr1 + 1
        if (isstr2) ctstr2 <- ctstr2 + 1
        isstraight <- isstr1 | isstr2
        if (isstraight & isflush) 
        {
            if (all(c("K","A") %in% s$rank)) ctroyalstraightflush <- ctroyalstraightflush + 1
            else ctstraightflush <- ctstraightflush + 1
        }
        else if (isflush) ctflush <- ctflush + 1
        else if (isstraight) ctstraight <- ctstraight + 1
        else ctnothing <- ctnothing + 1
    }       
    else if (all(rr$count==c(2,1,1,1))) 
        ct2111 <- ct2111 + 1
    else if (all(rr$count==c(2,2,1))) 
        ct221 <- ct221 + 1
    else if (all(rr$count==c(3,1,1))) 
        ct311 <- ct311 + 1
    else if (all(rr$count==c(3,2))) 
        ct32 <- ct32 + 1
    else if (all(rr$count==c(4,1))) 
        ct41 <- ct41 + 1
    else
        ctbad <- ctbad + 1
    
    if (i %in% q) statcounts[[as.character(i)]] <- addtimes()
}

ptmpost <- proc.time()
print(ptmpost-ptmpre)

cat("\n\n")
   user  system elapsed 
4562.47   13.73 4581.83 


Print all stat counts. Note how the figures start to look a lot like the math results as the sample size gets larger, corroborating the law of large numbers. Indeed, a large number of samples is needed to get a reliable feel for rare events like a straight flush.

In [41]:
for (nm in names(statcounts)) 
{
    statcounts[[nm]]$pct <- 100*statcounts[[nm]]$counts/as.numeric(nm)
    statcounts[[nm]] <- statcounts[[nm]][order(-pct)]
}

saveRDS(statcounts,sdata)

statcounts

cat("\n\n")
$`259897`
typecountspct
nothing 130628 50.2614497281615
1 pair 109668 42.1967163914936
2 pair 12181 4.6868567163145
3 of a kind 5503 2.1173772686872
straight 972 0.3739943131317
flush 502 0.1931534415557
full house 384 0.1477508397557
4 of a kind 58 0.0223165330881
straight flush 1 0.0003847678119
royal straight flush 0 0.0000000000000
bad 0 0.0000000000000
$`519793`
typecountspct
nothing 260879 50.1890175512175
1 pair 219595 42.2466250988374
2 pair 24549 4.7228415927109
3 of a kind 10936 2.1039144428648
straight 1943 0.3738026483619
flush 992 0.1908452018400
full house 766 0.1473663554530
4 of a kind 131 0.0252023401623
straight flush 2 0.0003847685521
royal straight flush 0 0.0000000000000
bad 0 0.0000000000000
$`779689`
typecountspct
nothing 391264 50.1820597699852
1 pair 329296 42.2342754611133
2 pair 36849 4.7261151561712
3 of a kind 16425 2.1066091736577
straight 2994 0.3839992612439
flush 1515 0.1943082434150
full house 1147 0.1471099374238
4 of a kind 192 0.0246252031259
straight flush 6 0.0007695375977
royal straight flush 1 0.0001282562663
bad 0 0.0000000000000
$`1039585`
typecountspct
nothing 521701 50.18358287201143
1 pair 438990 42.22742729069773
2 pair 49182 4.73092628308412
3 of a kind 21900 2.10660984912249
straight 3992 0.38399938436972
flush 2041 0.19632834255977
full house 1503 0.14457692252197
4 of a kind 264 0.02539474886613
straight flush 11 0.00105811453609
royal straight flush 1 0.00009619223055
bad 0 0.00000000000000
$`1299480`
typecountspct
nothing 651640 50.1462123310863
1 pair 548963 42.2448210053252
2 pair 61590 4.7395881429495
3 of a kind 27465 2.1135377227814
straight 5081 0.3910025548681
flush 2549 0.1961553852310
full house 1851 0.1424415920214
4 of a kind 326 0.0250869578601
straight flush 13 0.0010004001601
royal straight flush 2 0.0001539077169
bad 0 0.0000000000000
$`1559376`
typecountspct
nothing 781870 50.1399277659782
1 pair 658784 42.2466422466422
2 pair 73945 4.7419608869189
3 of a kind 33063 2.1202711853972
straight 6108 0.3916951395943
flush 3015 0.1933465693970
full house 2196 0.1408255609936
4 of a kind 378 0.0242404654169
straight flush 15 0.0009619232308
royal straight flush 2 0.0001282564308
bad 0 0.0000000000000
$`1819272`
typecountspct
nothing 911798 50.1188387442889
1 pair 768982 42.2686657080415
2 pair 86236 4.7401378133671
3 of a kind 38540 2.1184297894982
straight 7144 0.3926845463460
flush 3518 0.1933740529179
full house 2587 0.1421997370377
4 of a kind 449 0.0246802017510
straight flush 15 0.0008245056264
royal straight flush 3 0.0001649011253
bad 0 0.0000000000000
$`2079168`
typecountspct
nothing 1042274 50.1293786745467
1 pair 878931 42.2732073598670
2 pair 98447 4.7349228152800
3 of a kind 43934 2.1130567611660
straight 8142 0.3915989472712
flush 3997 0.1922403576815
full house 2914 0.1401522147320
4 of a kind 507 0.0243847539016
straight flush 19 0.0009138270693
royal straight flush 3 0.0001442884846
bad 0 0.0000000000000
$`2339064`
typecountspct
nothing 1172527 50.1280426700595
1 pair 988856 42.2757137042851
2 pair 110876 4.7401866729598
3 of a kind 49295 2.1074669183913
straight 9176 0.3922936696046
flush 4477 0.1914013468635
full house 3269 0.1397567574038
4 of a kind 563 0.0240694568426
straight flush 22 0.0009405471590
royal straight flush 3 0.0001282564308
bad 0 0.0000000000000
$`2598960`
typecountspct
nothing 1302994 50.1352079293256
1 pair 1098540 42.2684458398744
2 pair 123115 4.7370871425493
3 of a kind 54803 2.1086511527688
straight 10246 0.3942346169237
flush 4987 0.1918844460861
full house 3612 0.1389786683904
4 of a kind 634 0.0243943731339
straight flush 26 0.0010004001601
royal straight flush 3 0.0001154307877
bad 0 0.0000000000000

I went back to my daughter to discuss the findings, excited at my seeming success demonstrating the connection between probability and statistics. Alas, I don't believe my efforts were fruitful. (: