Measure the performance of different implementations of cards using bench::mark().
library(cards)
library(reticulate)
phevaluator <- import("phevaluator")
Data Frame
Benchmark the initial implementation, which uses a data.frame, against an integer() approach similar to PH Evaluator's card.py.
New Deck
Create a new deck using new_deck_df() and as an integer vector.
deck <- new_deck_df()
deck_int <- 0:51
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck_df() 20.5µs 21.5µs 44850. 1.25KB 53.9
bench::mark(0:51)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 0:51 0 41ns 26275285. 0B 0
While new_deck_df() is not designed to be called frequently, using an integer vector is much faster: the median time is 41ns versus 21.5µs, roughly 500 times faster.
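The two representations hold the same information. As a minimal sketch, assuming the card.py-style encoding implied later in this article (rank from %/% 4, suit from %% 4, ranks ordered 2 to A and suits C, D, H, S as in print_hand_int()), a rank/suit data frame converts to and from integer ids as follows. The data frame layout here is hypothetical and not necessarily what new_deck_df() returns.

```r
# Hypothetical data frame deck (not necessarily the layout new_deck_df()
# returns): one row per card, with integer rank 0 ("2") to 12 ("A") and
# integer suit 0 ("C"), 1 ("D"), 2 ("H"), 3 ("S").
deck_df_sketch <- expand.grid(suit = 0:3, rank = 0:12)[, c("rank", "suit")]

# Integer encoding: one id per card, id = 4 * rank + suit, giving 0:51.
deck_ids <- 4L * deck_df_sketch$rank + deck_df_sketch$suit

# Decoding recovers the data frame columns with %/% and %%.
decoded <- data.frame(rank = deck_ids %/% 4L, suit = deck_ids %% 4L)
```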
Deal
Compare the performance of deal_hand_df() to sampling integers:
bench::mark(deal_hand_df(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand_df(deck) 18.2µs 20µs 47860. 3.73KB 19.2
bench::mark(sample(deck_int, 5))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sample(deck_int, 5) 2.05µs 2.42µs 388419. 264B 0
deal_hand_df() is about 8 times slower than sample() (median 20µs versus 2.42µs).
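Because sample() draws without replacement by default, a single call on the integer deck can also deal several hands that never share a card. The helper below, deal_hands_sketch(), is hypothetical and not part of cards; it is only a sketch of the idea.

```r
# Hypothetical helper (not part of cards): deal n_hands non-overlapping hands
# from one shuffle of an integer deck.
deal_hands_sketch <- function(n_hands, deck = 0:51, hand_size = 5L) {
  # Make sure the deck holds enough cards for every hand
  stopifnot(n_hands * hand_size <= length(deck))
  # sample() draws without replacement, so no card appears twice
  cards <- sample(deck, n_hands * hand_size)
  # One row per hand
  matrix(cards, nrow = n_hands, ncol = hand_size, byrow = TRUE)
}

set.seed(1)
hands <- deal_hands_sketch(4)
```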
Test the performance of print_hand_df() against a simple function that prints cards based on integers.
test_hand <- deal_hand_df(deck)
bench::mark(print_hand_df(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_df(test_hand) 80.2µs 85.4µs 11582. 10.5KB 16.5
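print_hand_int <- function(h) {
  # Lookup table of all 52 labels: ranks 2 to A, each paired with suits C, D, H, S
  cards <- paste0(rep(c(2:9, "T", "J", "Q", "K", "A"), each = 4), c("C", "D", "H", "S"))
  # Card ids are 0-based, while R indexing is 1-based
  paste0(cards[h + 1], collapse = " ")
}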
test_hand_int <- sample(0:51, 5)
bench::mark(print_hand_int(test_hand_int))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_int(test_hand_int) 5.66µs 6.03µs 154248. 928B 0
print_hand_df() is about 14 times slower than the integer approach.
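The lookup table in print_hand_int() also works in reverse: match() turns card labels back into integer ids. parse_hand_int() is a hypothetical helper, shown only to illustrate the round trip.

```r
# Same lookup table as print_hand_int(): position i + 1 holds the label of card id i.
card_labels <- paste0(rep(c(2:9, "T", "J", "Q", "K", "A"), each = 4),
                      c("C", "D", "H", "S"))

# Hypothetical inverse of print_hand_int(): match() returns 1-based
# positions, so subtract 1 to get 0-based card ids.
parse_hand_int <- function(s) {
  match(strsplit(s, " ", fixed = TRUE)[[1]], card_labels) - 1L
}

parse_hand_int("AS KS QS JS TS")  # 51 47 43 39 35
```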
Evaluate
Test the performance of eval_hand_df() with a single hand and with randomly selected hands:
bench::mark(eval_hand_df(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(test_hand) 56.3µs 60.4µs 16320. 37KB 14.4
bench::mark(eval_hand_df(deal_hand_df(deck)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(deal_hand_df(deck)) 63.1µs 80µs 12366. 264B 12.3
As expected for a naive poker hand evaluator, eval_hand_df() performs poorly compared to fast algorithms: at around 60µs per hand, it evaluates only about 16,000 hands per second.
Summary
An implementation using integer() would likely be much faster than the first implementation using data.frame. Rank and suit can be derived using integer division and modulo arithmetic, respectively, and tabulate() is a faster replacement for rle():
0:51 %/% 4
#> [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6
#> [26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12
#> [51] 12 12
tabulate(0:51 %/% 4 + 1, 13)
#> [1] 4 4 4 4 4 4 4 4 4 4 4 4 4
0:51 %% 4
#> [1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
#> [39] 2 3 0 1 2 3 0 1 2 3 0 1 2 3
tabulate(0:51 %% 4 + 1, 4)
#> [1] 13 13 13 13
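To show how these counts could drive evaluation, here is a minimal sketch of a category-only evaluator for five-card hands built on tabulate(). eval_category_sketch() is hypothetical: it is not the eval_hand() benchmarked later, and the category labels are illustrative.

```r
# Hypothetical category-only evaluator for 5-card hands (not the package's
# eval_hand()). Encoding: rank = id %/% 4 (0 = "2", ..., 12 = "A"), suit = id %% 4.
eval_category_sketch <- function(h) {
  rank_counts <- tabulate(h %/% 4 + 1, nbins = 13)
  suit_counts <- tabulate(h %% 4 + 1, nbins = 4)

  is_flush <- max(suit_counts) == 5
  present <- rank_counts > 0

  # Five distinct consecutive ranks form a straight; the wheel (A-2-3-4-5)
  # appears as bins 1, 2, 3, 4 and 13 and is checked separately.
  is_straight <- FALSE
  if (sum(present) == 5) {
    ranks <- which(present)
    is_straight <- (max(ranks) - min(ranks) == 4) ||
      identical(ranks, c(1L, 2L, 3L, 4L, 13L))
  }

  # Rank multiplicities, largest first, e.g. c(3, 2) for a full house
  counts <- sort(rank_counts[present], decreasing = TRUE)

  if (is_straight && is_flush) return("Straight Flush")
  if (counts[1] == 4) return("Four of a Kind")
  if (counts[1] == 3 && counts[2] == 2) return("Full House")
  if (is_flush) return("Flush")
  if (is_straight) return("Straight")
  if (counts[1] == 3) return("Three of a Kind")
  if (counts[1] == 2 && counts[2] == 2) return("Two Pair")
  if (counts[1] == 2) return("One Pair")
  "High Card"
}

eval_category_sketch(c(51L, 50L, 49L, 48L, 47L))  # "Four of a Kind"
```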
bench::mark(rle(sort(sample(0:51, 5) %/% 4 + 1)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:> <dbl> <bch:byt> <dbl>
#> 1 rle(sort(sample(0:51, 5)%/%4 + 1)) 20.5µs 21.9µs 44963. 264B 13.5
bench::mark(tabulate(sample(0:51, 5) %/% 4 + 1, 13))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 tabulate(sample(0:51, 5)%/%4 + 1, … 2.95µs 3.24µs 300685. 264B 30.1
Note that the tabulate() approach is about 7 times faster than sorting and run-length encoding.
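The speedup does not lose information for this purpose: the non-zero entries of tabulate() are exactly the run lengths of the sorted ranks, because both come out in ascending rank order. A quick check in base R:

```r
set.seed(42)
ranks <- sample(0:51, 5) %/% 4 + 1  # 1-based rank of each card

# Run lengths of the sorted ranks...
rle_counts <- rle(sort(ranks))$lengths

# ...equal the non-zero bins of a 13-bin tabulation, in the same order.
tab <- tabulate(ranks, nbins = 13)
tab_counts <- tab[tab > 0]

identical(rle_counts, tab_counts)  # TRUE
```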
Integer
Benchmark the second implementation, which uses integer().
New Deck
Create a new deck using new_deck() and new_deck_df().
deck_df <- new_deck_df()
deck <- new_deck()
bench::mark(new_deck_df())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck_df() 20.3µs 21.6µs 45423. 1.25KB 13.6
bench::mark(new_deck())
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new_deck() 0 82ns 13364698. 0B 0
new_deck() is roughly 260 times faster (median 82ns versus 21.6µs).
Deal
Compare the performance of deal_hand_df() and deal_hand():
bench::mark(deal_hand_df(deck_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand_df(deck_df) 18.2µs 19.4µs 50958. 264B 15.3
bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand(deck) 2.3µs 2.54µs 380449. 3.02KB 0
deal_hand() is about 7.5 times faster.
Test the performance of print_hand_df() against print_hand().
test_hand_df <- deal_hand_df(deck_df)
test_hand <- deal_hand(deck)
bench::mark(print_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand_df(test_hand_df) 80.4µs 84.9µs 11581. 0B 14.5
bench::mark(print_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 print_hand(test_hand) 5.62µs 5.95µs 163167. 8.45KB 0
print_hand() is about 14 times faster.
Evaluate
Test the performance of eval_hand_df() and eval_hand() with a single hand.
bench::mark(eval_hand_df(test_hand_df))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_df(test_hand_df) 56.3µs 59.3µs 16646. 0B 14.4
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 2.13µs 2.38µs 408740. 27.5KB 0
eval_hand() is about 25 times faster, but it is still likely to perform poorly compared to fast algorithms.
Multiple Hands
Compare performance evaluating and printing multiple hands.
bench::mark({
  deck <- new_deck_df()
  replicate(50, {
    hand <- deal_hand_df(deck)
    paste0(print_hand_df(hand), ": ", eval_hand_df(hand))
  })
})
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 { deck <- new_deck_df() replicate(… 7.93ms 8.35ms 120. 34.2KB 16.1
bench::mark({
  deck <- new_deck()
  replicate(50, {
    hand <- deal_hand(deck)
    paste0(print_hand(hand), ": ", eval_hand(hand))
  })
})
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 { deck <- new_deck() replicate(50, … 607µs 651µs 1533. 74KB 8.34
Overall, for 50 hands the integer implementation is about 13 times faster.
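Part of the remaining cost is likely the per-hand function calls inside replicate(). A hypothetical alternative, sketched below with base R only and not benchmarked here, deals every hand up front and indexes the label table once for all of them.

```r
# Same 52-label lookup table as print_hand_int()
card_labels <- paste0(rep(c(2:9, "T", "J", "Q", "K", "A"), each = 4),
                      c("C", "D", "H", "S"))

n_hands <- 50
# Each hand is sampled independently from a full 52-card deck;
# replicate() returns 5 x n_hands, so transpose to one row per hand.
hands <- t(replicate(n_hands, sample(0:51, 5)))

# Indexing a plain vector with a matrix returns a plain vector in
# column-major order; matrix() with the same row count restores the layout.
labels <- matrix(card_labels[hands + 1], nrow = n_hands)
hand_strings <- apply(labels, 1, paste, collapse = " ")
```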
Python
Benchmark the integer() approach against the Python implementation of PH Evaluator using reticulate.
Import
Test the performance of phevaluator using reticulate::import(), starting with sample_cards():
bench::mark(deal_hand(deck))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 deal_hand(deck) 2.25µs 2.54µs 382791. 264B 0
bench::mark(phevaluator$sample_cards(5L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$sample_cards(5L) 24.5µs 25.6µs 38316. 0B 7.66
phevaluator$sample_cards() is about 10 times slower than the R integer implementation.
Also test phevaluator$evaluate_cards() against the R integer method. evaluate_cards() expects five to seven integers passed as individual arguments.
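When the hand is stored as a vector, do.call() with as.list() spreads its elements into separate arguments. The stand-in function below is hypothetical, so the pattern can be shown without Python installed; it only mimics the calling convention of evaluate_cards().

```r
# Hypothetical stand-in (not part of phevaluator) that, like
# evaluate_cards(), takes each card as a separate argument.
takes_separate_cards <- function(...) {
  cards <- c(...)
  stopifnot(length(cards) >= 5, length(cards) <= 7)
  sprintf("%d cards: %s", length(cards), paste(cards, collapse = ", "))
}

hand <- c(51L, 50L, 49L, 48L, 47L)
# as.list() gives a list of length 5; do.call() passes each element
# as its own positional argument.
do.call(takes_separate_cards, as.list(hand))  # "5 cards: 51, 50, 49, 48, 47"
```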
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 2.13µs 2.34µs 415141. 0B 0
bench::mark(do.call(phevaluator$evaluate_cards, as.list(test_hand)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 26.9µs 28.1µs 34834. 0B 6.97
Surprisingly, phevaluator is about 12 times slower than eval_hand() and only about twice as fast as the original data frame implementation. Test again using some specific hands, and avoid the overhead of do.call() and as.list():
four_aces <- c(51L, 50L, 49L, 48L, 47L)
royal_flush <- c(50L, 46L, 42L, 38L, 34L)
bench::mark(eval_hand(four_aces))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(four_aces) 1.07µs 1.15µs 842025. 0B 0
bench::mark(phevaluator$evaluate_cards(51L, 50L, 49L, 48L, 47L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$evaluate_cards(51L, 50… 24.8µs 25.9µs 38062. 0B 7.61
bench::mark(eval_hand(royal_flush))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(royal_flush) 1.84µs 2.05µs 469090. 0B 0
bench::mark(phevaluator$evaluate_cards(50L, 46L, 42L, 38L, 34L))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 phevaluator$evaluate_cards(50L, 46… 24.4µs 25.4µs 38781. 0B 7.76
Calling evaluate_cards() directly doesn't significantly change the results. Test once more with random hands:
bench::mark(eval_hand(deal_hand(deck)))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(deal_hand(deck)) 3.77µs 4.92µs 204364. 264B 0
bench::mark(do.call(phevaluator$evaluate_cards, as.list(deal_hand(deck))))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 do.call(phevaluator$evaluate_cards… 29.9µs 31.3µs 31296. 264B 9.39
Conclusion: using phevaluator via reticulate::import() is not a faster way to evaluate hands.
It is important to note that phevaluator$evaluate_cards() does more than eval_hand(): phevaluator assigns each hand an exact rank among all distinct poker hands, whereas eval_hand() only determines the hand rank category.
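An exact rank can be collapsed into a category for a like-for-like comparison. The sketch below assumes phevaluator follows the standard ordering of the 7,462 distinct five-card hand values (1 = best straight flush, 7462 = worst high card) used by Cactus Kev-style evaluators; rank_to_category() is hypothetical, and the boundaries should be checked against the library's documentation before relying on them.

```r
# Hypothetical helper: map an exact hand rank (1 = best, 7462 = worst) to a
# category, ASSUMING the standard 7,462-class ordering of five-card hands.
rank_to_category <- function(rank) {
  # First (best) rank of each category, in order
  lower_bounds <- c(1, 11, 167, 323, 1600, 1610, 2468, 3326, 6186)
  categories <- c("Straight Flush", "Four of a Kind", "Full House", "Flush",
                  "Straight", "Three of a Kind", "Two Pair", "One Pair",
                  "High Card")
  stopifnot(all(rank >= 1 & rank <= 7462))
  # findInterval() returns the index of the last lower bound <= rank
  categories[findInterval(rank, lower_bounds)]
}

rank_to_category(c(1, 166, 1600, 7462))
```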
C/C++
Benchmark the integer() approach against the C/C++ implementation of PH Evaluator using Rcpp.
The current version only implements eval_hand_phe(), which uses EvaluateCards() and describeCategory() to return the hand rank category.
Evaluate
Test the performance of eval_hand() and eval_hand_phe() with a single hand:
bench::mark(eval_hand(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand(test_hand) 2.09µs 2.34µs 411985. 0B 41.2
bench::mark(eval_hand_phe(test_hand))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 eval_hand_phe(test_hand) 615ns 697ns 1386481. 2.23KB 0
Somewhat surprisingly, eval_hand_phe() is only about 3 times faster than eval_hand(). However, eval_hand_phe() doesn't just evaluate the hand rank category; it also determines the exact hand rank.
Based on the benchmarks in the PH Evaluator README, and on running them on my own system, the compiled C/C++ implementation should be capable of about 70 million hands per second, while eval_hand_phe() achieves about 1.4 million per second. This is likely due to the additional overhead of calling from R and, more importantly, the additional call to describeCategory(), as the benchmark code only calls EvaluateCards().
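The size of that gap follows directly from the figures above: 70 million hands per second quoted for the compiled code, and the 697ns median measured for eval_hand_phe().

```r
# Figures taken from the text above
compiled_rate <- 70e6      # hands per second quoted for compiled PH Evaluator
measured_median <- 697e-9  # median seconds per eval_hand_phe() call

ns_per_hand_compiled <- 1e9 / compiled_rate  # about 14 ns per hand
measured_rate <- 1 / measured_median         # about 1.43 million calls per second
gap <- compiled_rate / measured_rate         # about 49 times
```

If the compiled figure carries over to this machine, roughly 680ns of each eval_hand_phe() call goes to something other than the core evaluation, which is consistent with, but not proof of, the overhead explanation above.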
A future version could include the full pheval libraries, along with the C++ code in card_sampler.h to generate random hands, in a standalone R package using Rcpp Modules.