Risk Value Analysis

risk quantification
Author

John Benninghoff

Published

November 3, 2024

Modified

August 14, 2025

An exploration of the value of cybersecurity risk reduction.


library(poilog)
library(tibble)
library(dplyr)
library(quantrr)
library(formattable)
library(ggplot2)
library(plotly)
library(jbplot)

Background

What is the value of a cybersecurity program? Put another way, how much should an organization pay to reduce the likelihood or the expected impact of a breach? In this analysis, we compare two firms: one with a typical breach rate and impact, and a second that invests to reduce its risk. Using Monte Carlo simulation, we can calculate the value of this risk reduction.

For the analysis, we use a 10-year horizon, which fits the typical executive tenure of 5 to 10 years. (A 2023 study found that CISOs at Fortune 500 companies had served an average of 8.3 years at the company and 4.5 years as CISO.)

Baseline Risk

We can model baseline risk for a typical firm using quantrr and data from the Cyentia 2022 Information Risk Insights Study (IRIS).

The 2022 IRIS found that the upper-bound likelihood of a breach in the next year fit a Poisson log-normal distribution with a mean (\(\mu\)) of -2.284585 and a standard deviation (\(\sigma\)) of 0.8690759 on the log scale.
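A Poisson log-normal count can be read as a two-step draw. First, draw a yearly breach rate whose logarithm is normal with mean \(\mu\) and standard deviation \(\sigma\). Then draw a Poisson count at that rate. The base-R sketch below simulates the model this way to make it concrete. It assumes this mixture definition (with poilog's default nu = 1) and is meant to match rpoilog(..., keep0 = TRUE) in distribution. It does not reproduce poilog's internals or its random stream. The names rate_draws and breaches_mix are illustrative.

```r
# Poisson log-normal as a mixture (illustrative sketch):
# 1) draw a log-normal breach rate for each simulated year,
# 2) draw a Poisson breach count given that year's rate.
set.seed(42)
n <- 1e5
mu <- -2.284585
sig <- 0.8690759

rate_draws <- exp(rnorm(n, mean = mu, sd = sig))  # yearly rate; log(rate) ~ Normal(mu, sig)
breaches_mix <- rpois(n, lambda = rate_draws)     # one count per year, each at its own rate

# share of simulated years with at least one breach
p_one_or_more <- mean(breaches_mix >= 1)
p_one_or_more
```

With 100,000 draws, this share should land close to the "One or more" value for the Poisson log-normal in the table below.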

As was done in the breach rate analysis, we can use trial and error to find a value of \(\lambda\) for a Poisson distribution that approximates these results without increasing the number of breaches. The Poisson fit closely matches the share of years with at least one breach but underestimates the share with multiple breaches:

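runs <- 1e6      # number of simulated years
lambda <- 0.138  # Poisson breach rate, found by trial and error

# simulate annual breach counts under each model (keep0 = TRUE keeps zero-breach years)
breaches_poilog <- rpoilog(runs, mu = -2.284585, sig = 0.8690759, keep0 = TRUE)
breaches_pois <- rpois(runs, lambda = lambda)

# share of simulated years with at least one, two, or three breaches
breach_table <- function(breaches) {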
  years <- length(breaches)
  tibble(
    "One or more" = sum(breaches >= 1) / years,
    "Two or more" = sum(breaches >= 2) / years,
    "Three or more" = sum(breaches >= 3) / years
  )
}

bind_rows(breach_table(breaches_poilog), breach_table(breaches_pois))
# A tibble: 2 × 3
  `One or more` `Two or more` `Three or more`
          <dbl>         <dbl>           <dbl>
1         0.129       0.0165         0.00249 
2         0.128       0.00857        0.000396

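A Poisson distribution with a \(\lambda\) of 0.138 approximates the Poisson log-normal model from the Cyentia IRIS report. It matches the share of years with at least one breach (0.128 vs. 0.129) but gives only about half the share with two or more breaches (0.00857 vs. 0.0165).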
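Poisson tail probabilities have a closed form, so the trial-and-error step can also be checked directly. The sketch below uses base R's ppois() to get the exact Poisson values. It then solves \(1 - e^{-\lambda} = p\) for \(\lambda\), assuming the target p is the Poisson log-normal share of years with at least one breach (0.129) from the table above:

```r
# exact tail probabilities for a Poisson(lambda) annual breach count
lambda <- 0.138
tails <- ppois(0:2, lambda = lambda, lower.tail = FALSE)  # P(X > 0), P(X > 1), P(X > 2)
names(tails) <- c("One or more", "Two or more", "Three or more")
round(tails, 6)

# the rate that exactly matches a target P(at least one breach),
# from P(X >= 1) = 1 - exp(-lambda)
target <- 0.129
lambda_exact <- -log(1 - target)
round(lambda_exact, 4)
```

Solving gives \(\lambda \approx 0.1381\), which is consistent with the 0.138 found by trial and error. The exact tails (about 0.129, 0.0087, and 0.0004) agree with the simulated Poisson row within sampling noise.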

meanlog <- 12.55949
sdlog <- 3.068723

For the impact, we can use the log-normal loss model from IRIS, with a mean (\(\mu\)) of 12.55949 and a standard deviation (\(\sigma\)) of 3.068723 on the log scale (the meanlog and sdlog values above).
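The two log-normal parameters are easier to interpret as dollar amounts. For a log-normal distribution, the median loss is \(e^{\mu}\) and the mean loss is \(e^{\mu + \sigma^2/2}\). The short base-R sketch below evaluates both for the IRIS parameters:

```r
meanlog <- 12.55949
sdlog <- 3.068723

median_loss <- qlnorm(0.5, meanlog = meanlog, sdlog = sdlog)  # equals exp(meanlog)
mean_loss <- exp(meanlog + sdlog^2 / 2)                        # log-normal mean
c(median = median_loss, mean = mean_loss, ratio = mean_loss / median_loss)
```

The median breach costs roughly $285,000, while the mean is roughly $31.6M, more than 100 times larger. That gap indicates a very heavy right tail, which is a plausible driver of the wide NPV ranges later in this analysis.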

Using the baseline parameters, we can simulate security events and losses over the next 10 years:

calc_risk("baseline", lambda, meanlog, sdlog, runs = 10) |>
  mutate(losses = currency(losses, digits = 0))
# A tibble: 10 × 5
    year risk     treatment events losses    
   <int> <chr>    <chr>      <int> <formttbl>
 1     1 baseline none           0 $0        
 2     2 baseline none           0 $0        
 3     3 baseline none           0 $0        
 4     4 baseline none           0 $0        
 5     5 baseline none           0 $0        
 6     6 baseline none           0 $0        
 7     7 baseline none           0 $0        
 8     8 baseline none           0 $0        
 9     9 baseline none           0 $0        
10    10 baseline none           0 $0        
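The calc_risk() output above suggests a compound model: a Poisson number of events each year, with a loss drawn for each event. The base-R sketch below shows one way to simulate that. simulate_losses() is a hypothetical helper written for illustration and is not part of quantrr. It assumes each event's loss is an independent log-normal draw and that losses are summed within the year. quantrr's actual implementation may differ.

```r
# Hypothetical helper (not the quantrr API): Poisson event counts per year,
# with one independent log-normal loss per event, summed within each year.
simulate_losses <- function(lambda, meanlog, sdlog, years) {
  events <- rpois(years, lambda = lambda)
  # rlnorm(0, ...) returns numeric(0), whose sum is 0, so event-free years have no loss
  losses <- vapply(events, function(n) sum(rlnorm(n, meanlog, sdlog)), numeric(1))
  data.frame(year = seq_len(years), events = events, losses = losses)
}

set.seed(1)
sim <- simulate_losses(lambda = 0.138, meanlog = 12.55949, sdlog = 3.068723, years = 10)
sim
```

With \(\lambda = 0.138\), most simulated years have no events, just as in the table above.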

Net Present Value

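We can calculate the (negative) net present value of the baseline security risk over the next ten years by discounting each future year's losses and summing them. A discount rate of 5% is reasonable, and we use the formula \(\mathrm{NPV} = \sum_{t=0}^{9} \frac{R_t}{(1+i)^t}\), where \(R_t\) is the loss in year \(t\) and \(i\) is the discount rate, treating year 1 as \(t = 0\). Each call to calc_risk() draws new random events, so the losses below differ from the previous table: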

rate <- 0.05
baseline_value <- calc_risk("baseline", lambda, meanlog, sdlog, runs = 10) |>
  mutate(
    losses = currency(losses, digits = 0), discount = (1 + rate)^(year - 1),
    value = losses / discount
  )

baseline_value
# A tibble: 10 × 7
    year risk     treatment events losses     discount value     
   <int> <chr>    <chr>      <int> <formttbl>    <dbl> <formttbl>
 1     1 baseline none           0 $0             1    $0        
 2     2 baseline none           0 $0             1.05 $0        
 3     3 baseline none           0 $0             1.10 $0        
 4     4 baseline none           1 $2,521,472     1.16 $2,178,143
 5     5 baseline none           0 $0             1.22 $0        
 6     6 baseline none           0 $0             1.28 $0        
 7     7 baseline none           0 $0             1.34 $0        
 8     8 baseline none           0 $0             1.41 $0        
 9     9 baseline none           2 $6,536,378     1.48 $4,424,078
10    10 baseline none           0 $0             1.55 $0        
baseline_value |>
  group_by(risk) |>
  summarize(npv = sum(value))
# A tibble: 1 × 2
  risk     npv       
  <chr>    <formttbl>
1 baseline $6,602,220
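As a check on the formula, discounting the two loss years in the table and summing them reproduces the reported NPV. The sketch below uses the losses as printed (rounded to whole dollars), so its result can differ from the table by a dollar or so:

```r
rate <- 0.05
# annual losses for years 1-10, copied (rounded) from the table above
losses <- c(0, 0, 0, 2521472, 0, 0, 0, 0, 6536378, 0)
t <- seq_along(losses) - 1   # year 1 is t = 0, so it is not discounted
discount <- (1 + rate)^t     # matches the discount column
npv <- sum(losses / discount)
round(npv)
```

Only the loss years contribute. The year-4 loss is divided by \(1.05^3\) and the year-9 loss by \(1.05^8\).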

The baseline value is highly variable, depending on how many breaches occur over the 10-year period and how large they are. We can forecast this range by running the 10-year simulation 100,000 times (1,000,000 simulated years, grouped into consecutive 10-year blocks):

# simulate 1,000,000 years and split them into 100,000 consecutive 10-year runs
baseline_forecast <- calc_risk("baseline", lambda, meanlog, sdlog, runs = 100000 * 10) |>
  mutate(
    sim = ceiling(year / 10),      # years 1-10 -> sim 1, 11-20 -> sim 2, ...
    year = (year - 1) %% 10 + 1,   # year within each sim, 1 to 10
    discount = (1 + rate)^(year - 1),
    value = losses / discount
  ) |>
  group_by(sim) |>
  summarize(npv = sum(value))
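The sim and year bookkeeping above splits one long run of simulated years into consecutive 10-year blocks. An equivalent base-R way to express the same discounting fills a matrix column by column, so that each column is one 10-year simulation. The sketch below shows this on a small, made-up loss vector. The values are illustrative, not simulation output.

```r
rate <- 0.05
horizon <- 10
# illustrative losses for 3 back-to-back 10-year simulations (30 years total)
losses <- c(rep(0, 9), 1e6,   # sim 1: one loss in year 10
            5e5, rep(0, 9),   # sim 2: one loss in year 1
            rep(0, 10))       # sim 3: no losses

# matrix() fills column-wise, so column k holds years 1-10 of simulation k;
# dividing by a length-10 vector recycles down each column (one factor per year)
loss_matrix <- matrix(losses, nrow = horizon)
npv <- colSums(loss_matrix / (1 + rate)^(0:(horizon - 1)))
npv
```

Like the ceiling(year / 10) grouping, this assumes the simulated years come out in order.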

baseline_forecast |>
  filter(npv != 0) |>
  ggplot(aes(npv)) +
  geom_hist_bw(bins = 100) +
  scale_x_log10(labels = scales::label_currency(scale_cut = scales::cut_short_scale())) +
  labs(x = NULL, y = NULL) +
  theme_quo()

A histogram showing the number of simulations by non-zero 10-year NPV on a log10 dollar scale. On the log scale, the distribution is roughly bell-shaped.

That’s a broad range, from $100 or less to $10B or more, with the most common non-zero value around $1M. But how many runs have no loss?

baseline_forecast |>
  mutate(no_loss = (npv == 0)) |>
  count(no_loss)
# A tibble: 2 × 2
  no_loss     n
  <lgl>   <int>
1 FALSE   74639
2 TRUE    25361

About 25% of the time, there is no loss over the 10-year period.
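The simulated share of loss-free runs can be checked analytically. The sum of 10 independent years of Poisson(\(\lambda\)) counts is a Poisson(\(10\lambda\)) count, so the chance of no breaches in 10 years is \(e^{-10\lambda}\). The base-R sketch below computes this value, along with the same quantity for a halved breach rate:

```r
lambda <- 0.138
horizon <- 10
# total breaches over 10 independent years ~ Poisson(10 * lambda),
# so P(no breaches) = exp(-10 * lambda)
p_no_loss <- dpois(0, lambda = horizon * lambda)
p_no_loss_half <- dpois(0, lambda = horizon * lambda / 2)  # breach rate cut in half
round(c(baseline = p_no_loss, halved = p_no_loss_half), 4)
```

This gives about 25.2%, close to the 25,361 of 100,000 simulated runs. Halving the breach rate raises the loss-free share to about 50.2%.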

Security NPV

What is the NPV of a hypothetical security investment? The key ways we can reduce risk are by lowering the likelihood, by lowering the impact, or both.

Reduce Likelihood

Let’s first look at an investment that reduces the breach rate by half:

# same simulation with the breach rate (lambda) cut in half
likelihood_forecast <- calc_risk("likelihood", lambda / 2, meanlog, sdlog, runs = 100000 * 10) |>
  mutate(
    sim = ceiling(year / 10),      # years 1-10 -> sim 1, 11-20 -> sim 2, ...
    year = (year - 1) %% 10 + 1,   # year within each sim, 1 to 10
    discount = (1 + rate)^(year - 1),
    value = losses / discount
  ) |>
  group_by(sim) |>
  summarize(npv = sum(value))

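To measure the value of this investment, we calculate, for each simulated 10-year period, the difference between the baseline risk and the risk after reducing the likelihood. The two forecasts are drawn independently, so each pairing compares two separate random histories: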

likelihood_return <-
  full_join(baseline_forecast, likelihood_forecast, by = "sim", suffix = c("_base", "_reduced")) |>
  mutate(return = npv_base - npv_reduced)

head(likelihood_return, 10)
# A tibble: 10 × 4
     sim  npv_base npv_reduced     return
   <dbl>     <dbl>       <dbl>      <dbl>
 1     1   423910.   20095468. -19671558.
 2     2  5674768.          0    5674768.
 3     3        0           0          0 
 4     4  7507846.       9064.   7498782.
 5     5   426174.      32469.    393705.
 6     6   827229.          0     827229.
 7     7        0     1108950.  -1108950.
 8     8  5621042.          0    5621042.
 9     9    93934.          0      93934.
10    10 20384828.     775702.  19609126.
summary(likelihood_return)
      sim            npv_base          npv_reduced            return          
 Min.   :     1   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :-2.318e+11  
 1st Qu.: 25001   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:-4.849e+04  
 Median : 50000   Median :2.509e+05   Median :0.000e+00   Median : 5.505e+04  
 Mean   : 50000   Mean   :3.151e+07   Mean   :2.054e+07   Mean   : 1.096e+07  
 3rd Qu.: 75000   3rd Qu.:3.122e+06   3rd Qu.:4.816e+05   3rd Qu.: 2.350e+06  
 Max.   :100000   Max.   :2.259e+11   Max.   :2.318e+11   Max.   : 2.259e+11  
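Expected values are linear, so the expected NPV has a closed form under this compound Poisson/log-normal model. It equals the expected annual loss, \(\lambda e^{\mu + \sigma^2/2}\), times the sum of the ten discount factors. The base-R sketch below assumes the model parameters hold exactly:

```r
lambda <- 0.138
meanlog <- 12.55949
sdlog <- 3.068723
rate <- 0.05

mean_loss <- exp(meanlog + sdlog^2 / 2)   # expected loss per breach
annual_expected <- lambda * mean_loss     # expected loss per year
annuity <- sum(1 / (1 + rate)^(0:9))      # discount factors for t = 0..9, summed
expected_npv_base <- annual_expected * annuity
expected_npv_half <- (lambda / 2) * mean_loss * annuity
expected_return <- expected_npv_base - expected_npv_half
c(base = expected_npv_base, halved = expected_npv_half, return = expected_return)
```

Halving \(\lambda\) halves the expected NPV, so the expected return is about $17.7M out of a baseline of about $35.3M. The simulated means in the summary (about $31.5M baseline, $20.5M reduced, and $11.0M return) differ noticeably from these values, in both directions. A likely reason is that, with \(\sigma \approx 3.07\), a few extreme draws dominate the sample mean. For example, a single run near $2.3e11 adds about $2.3M to a 100,000-run average.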

The NPV of the risk reduction (return) is highly variable, and sometimes negative: even though we’ve reduced the overall risk, in a given 10-year period we might be unlucky and experience a larger breach than in the baseline scenario. Since we can’t plot negative numbers on a log scale, we can examine the data using the empirical cumulative distribution function (CDF), limiting the x-axis to zoom in on the 1% to 99% quantiles:

(likelihood_return |>
  ggplot(aes(return)) +
  stat_ecdf() +
  coord_cartesian(
    xlim = c(quantile(likelihood_return$return, 0.01), quantile(likelihood_return$return, 0.99))
  ) +
  labs(x = NULL, y = NULL) +
  theme_minimal()) |>
  ggplotly()
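The CDF can also be read numerically. As a small illustration, the sketch below uses only the 10 return values printed earlier. That sample is far too small for the numbers themselves to mean much, and in the analysis the same calls would apply to likelihood_return$return. Base R's ecdf() returns the CDF as a function, and the share of negative returns is a simple mean:

```r
# return column from the first 10 rows printed above (illustration only)
returns <- c(-19671558, 5674768, 0, 7498782, 393705,
             827229, -1108950, 5621042, 93934, 19609126)

return_cdf <- ecdf(returns)
return_cdf(0)                     # share of runs with return <= 0 (includes exact zeros)
mean(returns < 0)                 # share of runs where the investment strictly lost value
quantile(returns, c(0.01, 0.99))  # the same kind of limits used for the plot's x-axis
```

The two shares differ because a return of exactly zero occurs whenever neither scenario has a breach in that 10-year period.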