Measure the performance of different implementations of cards using bench::mark().
library(cards)
library(reticulate)
phevaluator <- import("phevaluator")
Data Frame
Benchmark the initial implementation using data.frame against an integer() approach similar to PH Evaluator card.py.
New Deck
Create a new deck using new_deck_df() and an integer vector.
deck <- new_deck_df()
deck_int <- 0:51
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck_df() 14.9µs 17.2µs 57930. 1.25KB 58.0
bench::mark(0:51)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 0:51 0 1ns 67680050. 0B 0
While new_deck_df() is not designed to be called frequently, using an integer vector is much faster.
Deal
Compare performance of deal_hand_df() to sampling integers:
bench::mark(deal_hand_df(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand_df(deck) 13.4µs 15.5µs 63639. 3.73KB 57.3
bench::mark(sample(deck_int, 5))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sample(deck_int, 5) 1.44µs 1.8µs 532720. 264B 107.
deal_hand_df() is about 8 to 9 times slower than sample() (15.5µs versus 1.8µs median).
Test performance of print_hand_df() against a simple function that prints cards based on integers.
test_hand <- deal_hand_df(deck)
bench::mark(print_hand_df(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_df(test_hand) 59.8µs 68.8µs 14368. 10.5KB 20.4
print_hand_int <- function(h) {
cards <- paste0(rep(c(2:9, "T", "J", "Q", "K", "A"), each = 4), c("C", "D", "H", "S"))
paste(cards[h + 1], collapse = " ")
}
test_hand_int <- sample(0:51, 5)
bench::mark(print_hand_int(test_hand_int))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_int(test_hand_int) 4.14µs 4.76µs 200509. 928B 0
print_hand_df() is 14-15 times slower than the integer approach.
Evaluate
Test performance of eval_hand_df() with a single hand and with randomly selected hands:
bench::mark(eval_hand_df(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(test_hand) 42.1µs 47.8µs 20681. 37KB 16.6
bench::mark(eval_hand_df(deal_hand_df(deck)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(deal_hand_df(deck)) 47.2µs 62.2µs 15985. 264B 18.5
As expected for a naive poker hand evaluator, performance of eval_hand_df() is poor compared to fast algorithms.
Summary
An implementation using integer would likely be much faster than the first implementation using data.frame. Rank and suit can be derived using integer division and modulo arithmetic respectively, and tabulate() is a faster replacement for rle().
0:51 %/% 4
#> [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6
#> [26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12
#> [51] 12 12
tabulate(0:51 %/% 4 + 1, 13)
#> [1] 4 4 4 4 4 4 4 4 4 4 4 4 4
0:51 %% 4
#> [1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
#> [39] 2 3 0 1 2 3 0 1 2 3 0 1 2 3
tabulate(0:51 %% 4 + 1, 4)
#> [1] 13 13 13 13
bench::mark(rle(sort(sample(0:51, 5) %/% 4 + 1)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:> <dbl> <bch:byt> <dbl>
#> 1 rle(sort(sample(0:51, 5)%/%4 + 1)) 15.3µs 17.5µs 55210. 264B 16.6
bench::mark(tabulate(sample(0:51, 5) %/% 4 + 1, 13))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 tabulate(sample(0:51, 5)%/%4 + 1, … 2.17µs 2.67µs 370748. 264B 0
Note that the tabulate() approach is about 6 to 7 times faster than sorting and run-length encoding.
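To make the idea concrete, here is a minimal sketch of how rank and suit counts from tabulate() could drive a hand category evaluator. It assumes the 0:51 encoding shown above (rank = id %/% 4, suit = id %% 4). The function name eval_hand_sketch() and the category labels are hypothetical and are not taken from the cards package.

```r
# Hypothetical sketch, not the package's eval_hand(): classify a five-card
# hand given as integers 0..51, where rank = id %/% 4 and suit = id %% 4.
eval_hand_sketch <- function(hand) {
  stopifnot(length(hand) == 5, !anyDuplicated(hand))

  rank_counts <- tabulate(hand %/% 4 + 1, 13)  # cards per rank, 2 through A
  suit_counts <- tabulate(hand %% 4 + 1, 4)    # cards per suit

  # Multiplicity pattern, e.g. "41" for four of a kind, "2111" for one pair
  pattern <- sort(rank_counts[rank_counts > 0], decreasing = TRUE)
  key <- paste(pattern, collapse = "")

  is_flush <- any(suit_counts == 5)
  ranks <- which(rank_counts == 1)
  # Straight: five distinct ranks spanning four steps, or the wheel A-2-3-4-5
  is_straight <- length(ranks) == 5 &&
    (ranks[5] - ranks[1] == 4 || identical(ranks, c(1L, 2L, 3L, 4L, 13L)))

  if (is_straight && is_flush) return("Straight Flush")
  if (is_flush) return("Flush")
  if (is_straight) return("Straight")

  switch(key,
    "41"    = "Four of a Kind",
    "32"    = "Full House",
    "311"   = "Three of a Kind",
    "221"   = "Two Pair",
    "2111"  = "One Pair",
    "11111" = "High Card"
  )
}

eval_hand_sketch(c(51, 50, 49, 48, 47))  # four aces with a king kicker
eval_hand_sketch(c(50, 46, 42, 38, 34))  # A K Q J T in a single suit
```

No sorting or run-length encoding is needed: the sorted multiplicity pattern identifies pairs, trips, and quads, and the positions of the non-zero rank counts identify straights.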
Integer
Benchmark the second implementation using integer().
New Deck
Create a new deck using new_deck() and new_deck_df().
deck_df <- new_deck_df()
deck <- new_deck()
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck_df() 14.8µs 17.2µs 57689. 1.25KB 17.3
bench::mark(new_deck())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck() 41ns 82ns 10928520. 0B 0
new_deck() is about 200 times faster (82ns versus 17.2µs median).
Deal
Compare performance of deal_hand_df() and deal_hand():
bench::mark(deal_hand_df(deck_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand_df(deck_df) 13.3µs 15.9µs 62993. 264B 18.9
bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand(deck) 1.6µs 2.01µs 487704. 3.02KB 0
deal_hand() is about 8 times faster.
Test performance of print_hand_df() against print_hand().
test_hand_df <- deal_hand_df(deck_df)
test_hand <- deal_hand(deck)
bench::mark(print_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_df(test_hand_df) 60µs 69.1µs 14392. 0B 20.8
bench::mark(print_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand(test_hand) 4.14µs 4.8µs 205401. 8.52KB 0
print_hand() is about 14 times faster.
Evaluate
Test performance of eval_hand_df() and eval_hand() with a single hand.
bench::mark(eval_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(test_hand_df) 38.7µs 44.6µs 22498. 0B 18.0
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 943ns 1.19µs 796311. 27.5KB 0
eval_hand() is about 35 times faster, but should still perform poorly compared to fast algorithms.
Multiple Hands
Compare performance evaluating and printing multiple hands.
bench::mark({
deck <- new_deck_df()
replicate(50, {
hand <- deal_hand_df(deck)
paste0(print_hand_df(hand), ": ", eval_hand_df(hand))
})
})
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 { deck <- new_deck_df() replicate(… 6.39ms 6.85ms 146. 34.1KB 20.8
bench::mark({
deck <- new_deck()
replicate(50, {
hand <- deal_hand(deck)
paste0(print_hand(hand), ": ", eval_hand(hand))
})
})
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 { deck <- new_deck() replicate(50, … 468µs 533µs 1856. 74KB 12.6
Overall, the new implementation is 13-14 times faster.
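The ratios in this article are read off from separate bench::mark() runs. As a sketch of an alternative, both expressions can be passed to a single call with relative = TRUE, so that timings are reported as multiples of the fastest expression. The two deal functions below are hypothetical stand-ins written in base R, not the functions exported by cards, and the example only runs the benchmark when the bench package is installed.

```r
# Hypothetical stand-ins for a data frame deal and an integer deal;
# these are NOT the functions from the cards package.
deck_df_standin <- data.frame(id = 0:51)
deal_df_standin <- function(deck) deck[sample(nrow(deck), 5), , drop = FALSE]
deal_int_standin <- function(deck) sample(deck, 5)

if (requireNamespace("bench", quietly = TRUE)) {
  res <- bench::mark(
    df  = deal_df_standin(deck_df_standin),
    int = deal_int_standin(0:51),
    check = FALSE,    # results are random and of different types
    relative = TRUE,  # report timings relative to the fastest expression
    memory = FALSE,   # skip memory profiling to keep the sketch light
    min_time = 0.1
  )
  # With relative = TRUE the fastest row has median 1; the largest value
  # is how many times slower the other expression is.
  slowdown <- max(as.numeric(res$median))
  print(res)
}
```

check = FALSE is needed here because bench::mark() otherwise requires every expression to return the same value, which random deals of different types cannot satisfy.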
Python
Benchmark the integer() approach against PH Evaluator using reticulate.
Import
Test performance of phevaluator using reticulate::import(), starting with sample_cards():
bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand(deck) 1.6µs 2.01µs 488220. 264B 0
bench::mark(phevaluator$sample_cards(5L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$sample_cards(5L) 18.3µs 21.1µs 47407. 0B 9.48
phevaluator$sample_cards() is about 10 times slower than the R integer implementation.
Also test phevaluator$evaluate_cards() against the R integer method. evaluate_cards() expects five to seven integers passed as individual parameters.
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 984ns 1.19µs 804573. 0B 0
bench::mark(do.call(phevaluator$evaluate_cards, as.list(test_hand)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 20.4µs 23.5µs 41422. 0B 12.4
Surprisingly, phevaluator called through reticulate is about 20 times slower than eval_hand() and only about twice as fast as the original data frame implementation. Test again using some specific hands and avoid the overhead of do.call() and as.list():
four_aces <- c(51L, 50L, 49L, 48L, 47L)
royal_flush <- c(50L, 46L, 42L, 38L, 34L)
bench::mark(eval_hand(four_aces))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(four_aces) 697ns 861ns 1081486. 0B 0
bench::mark(phevaluator$evaluate_cards(51L, 50L, 49L, 48L, 47L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$evaluate_cards(51L, 50… 18.7µs 21.3µs 47006. 0B 9.40
bench::mark(eval_hand(royal_flush))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(royal_flush) 1.23µs 1.52µs 612538. 0B 61.3
bench::mark(phevaluator$evaluate_cards(50L, 46L, 42L, 38L, 34L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$evaluate_cards(50L, 46… 18.4µs 21.3µs 47038. 0B 9.41
Calling evaluate_cards() directly doesn’t significantly change the results. Test once more with random hands:
bench::mark(eval_hand(deal_hand(deck)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(deal_hand(deck)) 2.75µs 3.77µs 258085. 264B 25.8
bench::mark(do.call(phevaluator$evaluate_cards, as.list(deal_hand(deck))))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 23.1µs 26.7µs 37618. 264B 11.3
Conclusion: using phevaluator via reticulate::import() is not a faster way to evaluate hands.
It is important to note that phevaluator$evaluate_cards() does more than eval_hand(), as phevaluator ranks all poker hands and eval_hand() only determines the hand rank category.
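To illustrate the difference between an exact hand rank and a hand rank category, the sketch below maps a rank to its category. It assumes the conventional layout of 7,462 distinct five-card hand ranks with 1 as the strongest hand, which is the scheme PH Evaluator is understood to follow; the boundaries should be checked against the library before relying on them. The function name category_from_rank() and the labels are hypothetical.

```r
# Hypothetical helper: map an exact hand rank to a category, assuming the
# conventional 7462-rank layout where 1 is the best hand (an assumption,
# not confirmed by this article).
category_from_rank <- function(rank) {
  stopifnot(all(rank >= 1 & rank <= 7462))
  # First rank of each category, from strongest to weakest
  first_rank <- c(1, 11, 167, 323, 1600, 1610, 2468, 3326, 6186)
  categories <- c(
    "Straight Flush", "Four of a Kind", "Full House", "Flush", "Straight",
    "Three of a Kind", "Two Pair", "One Pair", "High Card"
  )
  # findInterval() counts how many boundaries are <= rank
  categories[findInterval(rank, first_rank)]
}

category_from_rank(c(1, 166, 167, 7462))
#> [1] "Straight Flush" "Four of a Kind" "Full House"     "High Card"
```

Many exact ranks collapse into one category, so an evaluator that returns only the category does strictly less work than one that returns the rank.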
C/C++
Benchmark the integer() approach against the C/C++ implementation of PH Evaluator using Rcpp.
The current version only implements eval_hand_phe(), which uses EvaluateCards() and describeCategory() to return the hand rank category.
Evaluate
Test performance of eval_hand() and eval_hand_phe() with a single hand:
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 1.02µs 1.27µs 758335. 0B 0
bench::mark(eval_hand_phe(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_phe(test_hand) 410ns 492ns 1933874. 2.24KB 0
Somewhat surprisingly, eval_hand_phe() is only about 2.5 times faster than eval_hand(). However, eval_hand_phe() doesn’t just evaluate the hand rank category; it also determines the exact hand rank.
Reviewing the benchmarks on the PH Evaluator README and on my own system, the compiled C/C++ implementation should be capable of about 70 million hands per second, while eval_hand_phe() achieves about 2 million per second. This is likely due to the additional overhead of using R and, more importantly, the additional call to describeCategory(), as the benchmark code only calls EvaluateCards().
A future version could wrap the full pheval libraries and the C++ code in card_sampler.h to generate random hands in a standalone R package using Rcpp Modules.