Benchmarks • cards

Measure the performance of different implementations of cards using bench::mark().

library(cards)
library(reticulate)

phevaluator <- import("phevaluator")

Data Frame

Benchmark the initial implementation using data.frame compared to an integer() approach similar to PH Evaluator card.py.

New Deck

Create a new deck using new_deck_df() and an integer vector.

deck <- new_deck_df()
deck_int <- 0:51
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#>   expression         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_deck_df()   14.8µs   17.3µs    56923.    1.25KB     57.0
bench::mark(0:51)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 0:51              0      1ns 76114582.        0B        0

While new_deck_df() is not designed to be called frequently, using an integer vector is much faster.

Deal

Compare performance of deal_hand_df() to sampling integers:

bench::mark(deal_hand_df(deck))
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 deal_hand_df(deck)   13.5µs   15.6µs    64210.    3.73KB     57.8
bench::mark(sample(deck_int, 5))
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sample(deck_int, 5)   1.44µs   1.84µs   524241.      264B     52.4

deal_hand_df() is about 8 times slower than sample().

Print

Test performance of print_hand_df() against a simple function that prints cards based on integers.

test_hand <- deal_hand_df(deck)
bench::mark(print_hand_df(test_hand))
#> # A tibble: 1 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 print_hand_df(test_hand)   59.3µs   69.2µs    14358.    10.5KB     22.8

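print_hand_int <- function(h) {
  # Labels in card order: index 0 is "2C", index 51 is "AS" (suits cycle C, D, H, S)
  cards <- paste0(rep(c(2:9, "T", "J", "Q", "K", "A"), each = 4), c("C", "D", "H", "S"))
  # Card integers are 0-based, R vectors are 1-based
  paste(cards[h + 1], collapse = " ")
}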
test_hand_int <- sample(0:51, 5)
bench::mark(print_hand_int(test_hand_int))
#> # A tibble: 1 × 6
#>   expression                         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 print_hand_int(test_hand_int)   4.18µs   4.88µs   196738.      928B        0

print_hand_df() is 14-15 times slower than the integer approach.

Evaluate

Test performance of eval_hand_df() with a single hand and with randomly selected hands:

bench::mark(eval_hand_df(test_hand))
#> # A tibble: 1 × 6
#>   expression                   min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand_df(test_hand)   42.1µs   48.3µs    20653.      37KB     16.5
bench::mark(eval_hand_df(deal_hand_df(deck)))
#> # A tibble: 1 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand_df(deal_hand_df(deck))     47µs  62.1µs    16014.      264B     18.5

As expected for a naive poker hand evaluator, performance of eval_hand_df() is poor compared to fast algorithms.

Summary

An implementation using integer vectors would likely be much faster than the first implementation using data.frame. Rank and suit can be derived using integer division and modulo arithmetic respectively, and tabulate() is a faster replacement for rle().

0:51 %/% 4
#>  [1]  0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5  6
#> [26]  6  6  6  7  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10 11 11 11 11 12 12
#> [51] 12 12
tabulate(0:51 %/% 4 + 1, 13)
#>  [1] 4 4 4 4 4 4 4 4 4 4 4 4 4
0:51 %% 4
#>  [1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
#> [39] 2 3 0 1 2 3 0 1 2 3 0 1 2 3
tabulate(0:51 %% 4 + 1, 4)
#> [1] 13 13 13 13

bench::mark(rle(sort(sample(0:51, 5) %/% 4 + 1)))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                         <bch:t> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 rle(sort(sample(0:51, 5)%/%4 + 1))  15.4µs 17.9µs    55880.      264B     16.8
bench::mark(tabulate(sample(0:51, 5) %/% 4 + 1, 13))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 tabulate(sample(0:51, 5)%/%4 + 1, … 2.13µs 2.62µs   371589.      264B     37.2

Note that the tabulate approach is 7 times faster than sorting and run length encoding.
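To illustrate how the rank and suit counts from tabulate() could drive a hand evaluator, here is a minimal sketch. The function name classify_hand_sketch() and the category labels are assumptions made for illustration; this is not the package's eval_hand(), only one way the idea might be applied to integer cards 0 to 51.

```r
# Hypothetical helper, not part of the cards package.
# Assumes rank = card %/% 4 (0 = deuce, 12 = ace) and suit = card %% 4.
classify_hand_sketch <- function(hand) {
  rank_counts <- tabulate(hand %/% 4 + 1, 13)
  suit_counts <- tabulate(hand %% 4 + 1, 4)

  # Multiplicity pattern, e.g. c(3, 2) for a full house
  pattern <- sort(rank_counts[rank_counts > 0], decreasing = TRUE)

  is_flush <- any(suit_counts == 5)
  present <- which(rank_counts == 1)
  is_straight <- length(present) == 5 &&
    (max(present) - min(present) == 4 ||
       identical(present, c(1L, 2L, 3L, 4L, 13L)))  # wheel: A-2-3-4-5

  if (is_straight && is_flush) return("Straight Flush")
  if (pattern[1] == 4) return("Four of a Kind")
  if (identical(pattern, c(3L, 2L))) return("Full House")
  if (is_flush) return("Flush")
  if (is_straight) return("Straight")
  if (pattern[1] == 3) return("Three of a Kind")
  if (identical(pattern, c(2L, 2L, 1L))) return("Two Pair")
  if (pattern[1] == 2) return("One Pair")
  "High Card"
}

classify_hand_sketch(c(50L, 46L, 42L, 38L, 34L))
#> [1] "Straight Flush"
classify_hand_sketch(c(0L, 1L, 2L, 4L, 5L))
#> [1] "Full House"
```

No sorting is needed because tabulate() already places the counts in rank order, which is where the speed advantage over rle(sort()) comes from.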

Integer

Benchmark the second implementation using integer().

New Deck

Create a new deck using new_deck() and new_deck_df().

deck_df <- new_deck_df()
deck <- new_deck()
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#>   expression         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_deck_df()     15µs   17.3µs    57688.    1.25KB     17.3
bench::mark(new_deck())
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_deck()        0     41ns 19923001.        0B        0

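new_deck() is several hundred times faster (about 420 times by median), although timings this close to the timer resolution are noisy.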

Deal

Compare performance of deal_hand_df() and deal_hand():

bench::mark(deal_hand_df(deck_df))
#> # A tibble: 1 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 deal_hand_df(deck_df)   13.6µs   15.7µs    63230.      264B     19.0
bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 deal_hand(deck)    1.6µs   2.01µs   484160.    3.02KB        0

deal_hand() is about 8 times faster.

Print

Test performance of print_hand_df() against print_hand().

test_hand_df <- deal_hand_df(deck_df)
test_hand <- deal_hand(deck)

bench::mark(print_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 print_hand_df(test_hand_df)   60.8µs   69.6µs    14330.        0B     18.7
bench::mark(print_hand(test_hand))
#> # A tibble: 1 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 print_hand(test_hand)   4.22µs   4.96µs   195999.    8.52KB        0

print_hand() is about 14 times faster.

Evaluate

Test performance of eval_hand_df() and eval_hand() with a single hand.

bench::mark(eval_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#>   expression                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand_df(test_hand_df)   39.2µs   45.2µs    22200.        0B     17.8
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(test_hand)   1.35µs   1.76µs   518620.    27.5KB        0

eval_hand() is about 25 times faster, but is still expected to perform poorly compared to fast algorithms.

Multiple Hands

Compare performance evaluating and printing multiple hands.

bench::mark({
  deck <- new_deck_df()
  replicate(50, {
    hand <- deal_hand_df(deck)
    paste0(print_hand_df(hand), ": ", eval_hand_df(hand))
  })
})
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 { deck <- new_deck_df() replicate(… 6.46ms  6.8ms      147.    34.1KB     21.0
bench::mark({
  deck <- new_deck()
  replicate(50, {
    hand <- deal_hand(deck)
    paste0(print_hand(hand), ": ", eval_hand(hand))
  })
})
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 { deck <- new_deck() replicate(50, … 470µs  523µs     1889.      74KB     12.6

Overall, the new implementation is 13-14 times faster.
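As a cross-check, the speedups in this section can be recomputed from the median column of the bench::mark() output. The sketch below copies those medians by hand into a data frame (the column names are made up for illustration) and divides them.

```r
# Median timings copied from the output above, converted to microseconds.
medians_us <- data.frame(
  step       = c("new deck", "deal", "print", "evaluate", "50 hands"),
  df_impl    = c(17.3, 15.7, 69.6, 45.2, 6800),
  int_impl   = c(0.041, 2.01, 4.96, 1.76, 523)
)

# Speedup of the integer implementation over the data.frame implementation
medians_us$speedup <- round(medians_us$df_impl / medians_us$int_impl, 1)
medians_us[, c("step", "speedup")]
#>       step speedup
#> 1 new deck   422.0
#> 2     deal     7.8
#> 3    print    14.0
#> 4 evaluate    25.7
#> 5 50 hands    13.0
```

The new deck figure rests on a 41ns median, which is near the resolution of the timer, so it is best read as an order of magnitude rather than a precise ratio.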

Python

Benchmark the integer() approach against PH Evaluator, called from R using reticulate.

Import

Test performance of phevaluator using reticulate::import(), starting with sample_cards():

bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 deal_hand(deck)    1.6µs   2.01µs   478122.      264B     47.8
bench::mark(phevaluator$sample_cards(5L))
#> # A tibble: 1 × 6
#>   expression                        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 phevaluator$sample_cards(5L)   18.1µs   20.7µs    48328.        0B     4.83

phevaluator$sample_cards() is about 10 times slower than the R integer implementation.

Also test phevaluator$evaluate_cards() against the R integer method. evaluate_cards() expects five to seven integers passed as individual parameters.

bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(test_hand)   1.48µs   1.76µs   525428.        0B     52.5
bench::mark(do.call(phevaluator$evaluate_cards, as.list(test_hand)))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 20.7µs 23.8µs    42420.        0B     8.49
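The do.call() and as.list() combination is what turns one integer vector into separate arguments. A minimal sketch of the difference, using a hypothetical stand-in function n_args_sketch() instead of the Python function so that it runs without reticulate:

```r
# Hypothetical stand-in that reports how many arguments it received.
n_args_sketch <- function(...) length(list(...))

hand <- c(51L, 50L, 49L, 48L, 47L)

# Passing the vector directly supplies a single argument of length 5
n_args_sketch(hand)
#> [1] 1

# as.list() splits the vector, and do.call() passes each element separately
do.call(n_args_sketch, as.list(hand))
#> [1] 5
```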

Surprisingly, phevaluator called this way is only about twice as fast as the original data frame implementation, and far slower than eval_hand(). Test again using some specific hands and avoid the overhead of do.call() and as.list():

four_aces <- c(51L, 50L, 49L, 48L, 47L)
royal_flush <- c(50L, 46L, 42L, 38L, 34L)

bench::mark(eval_hand(four_aces))
#> # A tibble: 1 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(four_aces)    656ns    861ns  1113434.        0B        0
bench::mark(phevaluator$evaluate_cards(51L, 50L, 49L, 48L, 47L))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 phevaluator$evaluate_cards(51L, 50… 18.6µs 21.3µs    47073.        0B     9.42

bench::mark(eval_hand(royal_flush))
#> # A tibble: 1 × 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(royal_flush)   1.31µs    1.6µs   598872.        0B        0
bench::mark(phevaluator$evaluate_cards(50L, 46L, 42L, 38L, 34L))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 phevaluator$evaluate_cards(50L, 46… 18.4µs 21.3µs    47674.        0B     9.54

Calling evaluate_cards() directly doesn’t significantly change the results. Test once more with random hands:

bench::mark(eval_hand(deal_hand(deck)))
#> # A tibble: 1 × 6
#>   expression                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(deal_hand(deck))   2.71µs   3.77µs   261439.      264B     26.1
bench::mark(do.call(phevaluator$evaluate_cards, as.list(deal_hand(deck))))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 22.9µs 26.2µs    38481.      264B     11.5

Conclusion: using phevaluator via reticulate::import() is not a faster way to evaluate hands. It is important to note that phevaluator$evaluate_cards() does more than eval_hand(), as phevaluator ranks all poker hands and eval_hand() only determines the hand rank category.
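To make the difference between an exact hand rank and a hand rank category concrete, the sketch below maps a rank to its category. It assumes the usual numbering of the 7462 distinct five-card hand classes, with 1 as the strongest hand and 7462 as the weakest; the helper name and labels are illustrative and not taken from either package.

```r
# Assumption: rank 1 is the best hand (royal flush), rank 7462 the worst.
# Upper bound of each category, from the number of distinct hands per category:
# 10, 156, 156, 1277, 10, 858, 858, 2860, 1277 (cumulative sums below).
category_upper <- c(10, 166, 322, 1599, 1609, 2467, 3325, 6185, 7462)
category_names <- c("Straight Flush", "Four of a Kind", "Full House", "Flush",
                    "Straight", "Three of a Kind", "Two Pair", "One Pair",
                    "High Card")

# Hypothetical helper: look up the interval (lower, upper] that contains rank
rank_to_category_sketch <- function(rank) {
  stopifnot(all(rank >= 1), all(rank <= 7462))
  category_names[findInterval(rank, c(0, category_upper), left.open = TRUE)]
}

rank_to_category_sketch(c(1, 11, 167, 7462))
#> [1] "Straight Flush" "Four of a Kind" "Full House"     "High Card"
```

An exact rank therefore carries strictly more information than a category: the category can be recovered from the rank with a lookup, but not the other way around.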

C/C++

Benchmark the integer() approach against the C/C++ implementation of PH Evaluator using Rcpp.

The current version only implements eval_hand_phe(), which uses EvaluateCards() and describeCategory() to return the hand rank category.

Evaluate

Test performance of eval_hand() and eval_hand_phe() with a single hand:

bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand(test_hand)   1.48µs   1.84µs   514238.        0B     51.4
bench::mark(eval_hand_phe(test_hand))
#> # A tibble: 1 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eval_hand_phe(test_hand)    369ns    492ns  1980542.    2.24KB        0

Somewhat surprisingly, eval_hand_phe() is only about 4 times faster than eval_hand(). However, eval_hand_phe() doesn’t just evaluate the hand rank category; it also determines the exact hand rank.

Reviewing the benchmarks on the PH Evaluator README and on my own system, the compiled C/C++ implementation should be capable of about 70 million hands per second, while eval_hand_phe() achieves about 2 million per second. This is likely due to the additional overhead of calling into compiled code from R and, more importantly, the additional call to describeCategory(), as the benchmark code only calls EvaluateCards().
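The size of that gap can be estimated from the figures already given. This is back-of-the-envelope arithmetic using the median timing of eval_hand_phe(test_hand) above and the 70 million hands per second quoted for the compiled code; the variable names are illustrative.

```r
median_phe_ns <- 492          # median of eval_hand_phe(test_hand), in ns
native_rate   <- 70e6         # hands per second quoted for compiled C/C++

# Calls per second implied by the median (roughly 2.03 million)
phe_rate <- 1e9 / median_phe_ns

# Time per hand for the compiled code alone (roughly 14.3 ns)
native_ns <- 1e9 / native_rate

# Per-call time not explained by the evaluator itself (roughly 478 ns)
overhead_ns <- median_phe_ns - native_ns

round(c(phe_rate = phe_rate, native_ns = native_ns, overhead_ns = overhead_ns), 1)
```

On these numbers nearly all of each call is spent outside the evaluation itself, which suggests that evaluating many hands per call would matter more than speeding up the evaluator.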

A future implementation could implement the full pheval libraries and the C++ code in card_sampler.h to generate random hands in a standalone R package using Rcpp Modules.