Risk Value Analysis

risk quantification
Author

John Benninghoff

Published

November 3, 2024

Modified

November 26, 2024

An exploration of the value of cybersecurity risk reduction.

Questions/TODO

library(poilog)
library(tibble)
library(dplyr)
library(quantrr)
library(formattable)
library(ggplot2)
library(plotly)
library(jbplot)

Background

What is the value of a cybersecurity program? Put another way, how much should an organization pay to reduce the likelihood of a breach or the expected impact? In this analysis, we compare two firms, one with typical breach rate and impact, and a second that makes investments to reduce their risk. Using Monte Carlo simulation, we can calculate the value of this risk reduction.

For the analysis, we use a 10-year horizon to fit with the typical executive tenure of 5-10 years. (A 2023 study found that CISOs at Fortune 500 companies had served an average of 8.3 years at the company and 4.5 years as CISO.)

Baseline Risk

We can model baseline risk for a typical firm using quantrr and data from the Cyentia 2022 Information Risk Insights Study (IRIS).

The 2022 IRIS found that the upper bound likelihood of a breach in the next year fit a Poisson log-normal distribution with a mean (\(\mu\)) of -2.284585 and a standard deviation (\(\sigma\)) of 0.8690759.

As was done in the breach rate analysis, we can use trial and error to find a reasonable value of \(\lambda\) for a Poisson distribution that approximates these results:

runs <- 1e6
lambda <- 0.138

breaches_poilog <- rpoilog(runs, mu = -2.284585, sig = 0.8690759, keep0 = TRUE)
breaches_pois <- rpois(runs, lambda = lambda)

breach_table <- function(breaches) {
  years <- length(breaches)
  tibble(
    "One or more" = sum(breaches >= 1) / years,
    "Two or more" = sum(breaches >= 2) / years,
    "Three or more" = sum(breaches >= 3) / years
  )
}

bind_rows(breach_table(breaches_poilog), breach_table(breaches_pois))
# A tibble: 2 × 3
  `One or more` `Two or more` `Three or more`
          <dbl>         <dbl>           <dbl>
1         0.129       0.0163         0.00249 
2         0.128       0.00862        0.000409

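A Poisson distribution with a \(\lambda\) of 0.138 closely matches the Poisson log-normal model from the Cyentia IRIS report for the probability of one or more breaches (0.128 vs. 0.129), although it understates the probability of multiple breaches in a year (about 0.0086 vs. 0.0163 for two or more).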
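As a cross-check on the trial-and-error value, the Poisson tail probabilities have a closed form, so \(\lambda\) can also be solved for directly. The sketch below assumes the goal is to match the simulated probability of one or more breaches (0.129 in the table above); `target`, `lambda_solved`, and `tails` are illustrative names, not part of the original analysis.

```r
# Solve P(N >= 1) = 1 - exp(-lambda) for lambda, given a target probability.
target <- 0.129
lambda_solved <- -log(1 - target)

# Exact Poisson tail probabilities for the lambda used in the text.
# ppois(q, lambda, lower.tail = FALSE) returns P(N > q).
lambda <- 0.138
tails <- c(
  "One or more" = ppois(0, lambda, lower.tail = FALSE),
  "Two or more" = ppois(1, lambda, lower.tail = FALSE),
  "Three or more" = ppois(2, lambda, lower.tail = FALSE)
)

round(lambda_solved, 3)
signif(tails, 3)
```

The exact tails should land close to the second row of the simulated table, which is a useful sanity check on the 1e6-run simulation.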

meanlog <- 12.55949
sdlog <- 3.068723

For the impact, we can use the log-normal loss model from IRIS, with a mean (\(\mu\)) of 12.55949 and standard deviation (\(\sigma\)) of 3.068723.
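To get a feel for how heavy-tailed this loss model is, the median and mean of a log-normal distribution follow directly from its parameters. This is a small illustrative check added here, not part of the original analysis; `median_loss` and `mean_loss` are names introduced for the example.

```r
meanlog <- 12.55949
sdlog <- 3.068723

# The median of a log-normal is exp(meanlog);
# the mean adds sdlog^2 / 2 in the exponent.
median_loss <- exp(meanlog)
mean_loss <- exp(meanlog + sdlog^2 / 2)

c(median = median_loss, mean = mean_loss, ratio = mean_loss / median_loss)
```

With a log-scale standard deviation above 3, the mean is on the order of 100 times the median, so a small number of very large losses dominates any average.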

Using the baseline parameters, we can simulate security events and losses over the next 10 years:

calc_risk("baseline", lambda, meanlog, sdlog, runs = 10)
# A tibble: 10 × 4
    year risk     events  losses
   <int> <chr>     <int>   <dbl>
 1     1 baseline      1    262.
 2     2 baseline      0      0 
 3     3 baseline      1 216607.
 4     4 baseline      0      0 
 5     5 baseline      0      0 
 6     6 baseline      0      0 
 7     7 baseline      0      0 
 8     8 baseline      0      0 
 9     9 baseline      0      0 
10    10 baseline      0      0 
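For readers without quantrr installed, the simulation can be approximated in base R. The sketch below is an assumption about the kind of model `calc_risk()` implements (a Poisson event count per year, with each event's loss drawn from a log-normal distribution), not the package's actual code; `sim_risk()` is a hypothetical helper and the seed is arbitrary.

```r
# Hypothetical base R stand-in for quantrr::calc_risk().
sim_risk <- function(risk, lambda, meanlog, sdlog, runs = 10) {
  # Number of events in each simulated year.
  events <- rpois(runs, lambda)
  # Total loss per year: sum of one log-normal draw per event (0 if no events).
  losses <- vapply(
    events,
    function(n) sum(rlnorm(n, meanlog, sdlog)),
    numeric(1)
  )
  data.frame(year = seq_len(runs), risk = risk, events = events, losses = losses)
}

set.seed(1)
out <- sim_risk("baseline", lambda = 0.138, meanlog = 12.55949, sdlog = 3.068723)
out
```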

Net Present Value

We can calculate the (negative) net present value of the baseline security risk over the next ten years by discounting future years. A discount rate of 5% is reasonable, and we use the formula \(\mathrm{NPV} = \sum_{t=0}^{9} \frac{R_t}{(1+i)^t}\), where \(R_t\) is the loss in year \(t + 1\) and \(i\) is the discount rate, so that year 1 is treated as \(t = 0\):

rate <- 0.05
baseline_value <- calc_risk("baseline", lambda, meanlog, sdlog, runs = 10) |>
  mutate(discount = (1 + rate)^(year - 1)) |>
  mutate(value = losses / discount)

baseline_value
# A tibble: 10 × 6
    year risk     events  losses discount   value
   <int> <chr>     <int>   <dbl>    <dbl>   <dbl>
 1     1 baseline      0      0      1         0 
 2     2 baseline      0      0      1.05      0 
 3     3 baseline      0      0      1.10      0 
 4     4 baseline      0      0      1.16      0 
 5     5 baseline      1  11142.     1.22   9166.
 6     6 baseline      0      0      1.28      0 
 7     7 baseline      1 186813.     1.34 139403.
 8     8 baseline      0      0      1.41      0 
 9     9 baseline      0      0      1.48      0 
10    10 baseline      0      0      1.55      0 

baseline_value |>
  group_by(risk) |>
  summarize(npv = currency(sum(value)))
# A tibble: 1 × 2
  risk     npv        
  <chr>    <formttbl> 
1 baseline $148,569.45
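The NPV above can be reproduced by hand from the two non-zero rows of the table. The check below uses the losses as printed (rounded to the nearest dollar), so it should agree with the reported total only to within rounding.

```r
rate <- 0.05

# Losses by year as printed in the table above (rounded).
losses <- c(0, 0, 0, 0, 11142, 0, 186813, 0, 0, 0)
year <- seq_along(losses)

# Year 1 is undiscounted (t = 0), year 2 is discounted once, and so on.
discount <- (1 + rate)^(year - 1)
npv <- sum(losses / discount)

round(npv)
```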

The baseline value is highly variable depending on how many breaches occur over the 10-year period. We can forecast this range by running the 10-year simulation 100,000 times:

baseline_forecast <- calc_risk("baseline", lambda, meanlog, sdlog, runs = 100000 * 10) |>
  mutate(sim = ceiling(year / 10), .before = year) |>
  mutate(year = year %% 10) |>
  mutate(year = if_else(year == 0, 10, year)) |>
  mutate(discount = (1 + rate)^(year - 1)) |>
  mutate(value = losses / discount) |>
  group_by(sim) |>
  summarize(npv = sum(value))
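The `sim` and `year` columns in the pipeline above come from splitting one long run of 1,000,000 simulated years into 100,000 blocks of 10. A base R sketch of the same index arithmetic, using integer division and modulo on a zero-based row index, is shown below; it is an equivalent formulation for illustration, and `horizon` and `row` are names introduced here.

```r
horizon <- 10
row <- 1:25  # a short run standing in for the 1,000,000 simulated years

sim <- (row - 1) %/% horizon + 1   # block number: 1, 1, ..., 2, 2, ...
year <- (row - 1) %% horizon + 1   # year within block: 1 to 10, repeating

head(data.frame(row, sim, year), 12)
```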

baseline_forecast |>
  filter(npv != 0) |>
  ggplot(aes(npv)) +
  geom_hist_bw(bins = 100) +
  scale_x_log10(labels = scales::label_currency(scale_cut = scales::cut_short_scale())) +
  labs(x = NULL, y = NULL) +
  theme_quo()

That’s a broad range, from $100 or less to $10B or more, with the most common non-zero value around $1M. But how many runs have no loss?

baseline_forecast |>
  mutate(no_loss = (npv == 0)) |>
  count(no_loss)
# A tibble: 2 × 2
  no_loss     n
  <lgl>   <int>
1 FALSE   74730
2 TRUE    25270

About 25% of the time, there is no loss over the 10-year period.
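This share can be checked analytically: with Poisson arrivals at rate \(\lambda\) per year, the number of breaches over 10 years is Poisson with rate \(10\lambda\), so the probability of no breach is \(e^{-10\lambda}\). A quick check, assuming the same \(\lambda\) as above (`horizon` and `p_no_breach` are illustrative names):

```r
lambda <- 0.138
horizon <- 10

# Probability of zero events over the whole horizon.
p_no_breach <- exp(-lambda * horizon)

# The same value from the Poisson probability mass function.
all.equal(p_no_breach, dpois(0, lambda * horizon))

round(p_no_breach, 3)
```

The result is close to the simulated 25,270 out of 100,000 runs with no loss.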

Security NPV

What is the NPV of a hypothetical security investment? The key ways we can reduce risk are by lowering the likelihood, by lowering the impact, or both.

Reduce Likelihood

Let’s first look at an investment that reduces the breach rate by half:

likelihood_forecast <- calc_risk("likelihood", lambda / 2, meanlog, sdlog, runs = 100000 * 10) |>
  mutate(sim = ceiling(year / 10), .before = year) |>
  mutate(year = year %% 10) |>
  mutate(year = if_else(year == 0, 10, year)) |>
  mutate(discount = (1 + rate)^(year - 1)) |>
  mutate(value = losses / discount) |>
  group_by(sim) |>
  summarize(npv = sum(value))

To measure the value of this investment, we calculate the difference between the baseline risk and the risk after reducing the likelihood:

likelihood_return <-
  full_join(baseline_forecast, likelihood_forecast, by = "sim", suffix = c("_base", "_reduced")) |>
  mutate(return = npv_base - npv_reduced)

likelihood_return |>
  summary()
      sim            npv_base          npv_reduced            return          
 Min.   :     1   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :-2.535e+11  
 1st Qu.: 25001   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:-4.714e+04  
 Median : 50000   Median :2.502e+05   Median :0.000e+00   Median : 5.637e+04  
 Mean   : 50000   Mean   :3.488e+07   Mean   :1.938e+07   Mean   : 1.551e+07  
 3rd Qu.: 75000   3rd Qu.:3.116e+06   3rd Qu.:4.663e+05   3rd Qu.: 2.361e+06  
 Max.   :100000   Max.   :9.249e+10   Max.   :2.535e+11   Max.   : 9.249e+10  

The NPV of the risk reduction (return) is highly variable. Since we can’t plot negative numbers using a log scale, we can examine the data using the cumulative distribution function (CDF). We limit the x-axis to zoom in to the 1% to 99% quantiles:

(likelihood_return |>
  ggplot(aes(return)) +
  stat_ecdf() +
  coord_cartesian(
    xlim = c(quantile(likelihood_return$return, 0.01), quantile(likelihood_return$return, 0.99))
  ) +
  labs(x = NULL, y = NULL) +
  theme_minimal()) |>
  ggplotly()

Reviewing the data:

  • About 40% of the time, our security investment has a negative or zero return
  • About 15% of the time, the security investment has a negative return of over $1M
  • About 60% of the time, the security investment has a positive return
  • About 32% of the time, the security investment has a positive return of over $1M
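The percentages above are read from the CDF plot; they can also be computed directly from the `return` column. The helper below is a hypothetical sketch (the name `return_shares` and the toy input are illustrative), shown on a small made-up vector rather than the simulation output.

```r
# Share of runs falling in each return category.
return_shares <- function(x, threshold = 1e6) {
  c(
    negative_or_zero = mean(x <= 0),
    loss_over_threshold = mean(x < -threshold),
    positive = mean(x > 0),
    gain_over_threshold = mean(x > threshold)
  )
}

# Toy returns for illustration only, not simulation results.
toy_returns <- c(-2e6, -5e5, 0, 1e3, 3e6)
shares <- return_shares(toy_returns)
shares
```

Applied to the analysis, the call would be `return_shares(likelihood_return$return)`.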

Reduce Impact

Now let’s look at an investment that reduces the breach impact by half:

impact_forecast <- calc_risk("impact", lambda, meanlog + log(0.5), sdlog, runs = 100000 * 10) |>
  mutate(sim = ceiling(year / 10), .before = year) |>
  mutate(year = year %% 10) |>
  mutate(year = if_else(year == 0, 10, year)) |>
  mutate(discount = (1 + rate)^(year - 1)) |>
  mutate(value = losses / discount) |>
  group_by(sim) |>
  summarize(npv = sum(value))
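Adding `log(0.5)` to `meanlog` halves every simulated loss, because scaling a log-normal variable by a constant shifts its log-scale mean by the log of that constant: if \(X \sim \mathrm{Lognormal}(\mu, \sigma)\), then \(0.5X \sim \mathrm{Lognormal}(\mu + \ln 0.5, \sigma)\). A quick check on quantiles, with `p`, `base_q`, and `reduced_q` as illustrative names:

```r
meanlog <- 12.55949
sdlog <- 3.068723
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)

base_q <- qlnorm(p, meanlog, sdlog)
reduced_q <- qlnorm(p, meanlog + log(0.5), sdlog)

# Every quantile of the reduced-impact model is half the baseline quantile.
all.equal(reduced_q, 0.5 * base_q)
```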

To measure the value of this investment, we calculate the difference between the baseline risk and the risk after reducing the impact:

impact_return <-
  full_join(baseline_forecast, impact_forecast, by = "sim", suffix = c("_base", "_reduced")) |>
  mutate(return = npv_base - npv_reduced)

impact_return |>
  summary()
      sim            npv_base          npv_reduced            return          
 Min.   :     1   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :-1.641e+10  
 1st Qu.: 25001   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:-6.496e+05  
 Median : 50000   Median :2.502e+05   Median :1.269e+05   Median : 6.110e+02  
 Mean   : 50000   Mean   :3.488e+07   Mean   :1.465e+07   Mean   : 2.023e+07  
 3rd Qu.: 75000   3rd Qu.:3.116e+06   3rd Qu.:1.575e+06   3rd Qu.: 1.989e+06  
 Max.   :100000   Max.   :9.249e+10   Max.   :1.643e+10   Max.   : 9.249e+10  

Again, the NPV of the risk reduction (return) is highly variable. We again examine the data using the cumulative distribution function (CDF), limiting the x-axis:

(impact_return |>
  ggplot(aes(return)) +
  stat_ecdf() +
  coord_cartesian(
    xlim = c(quantile(impact_return$return, 0.01), quantile(impact_return$return, 0.99))
  ) +
  labs(x = NULL, y = NULL) +
  theme_minimal()) |>
  ggplotly()

Reviewing the data:

  • About 50% of the time, our security investment has a negative or zero return
  • About 22% of the time, the security investment has a negative return of over $1M
  • About 50% of the time, the security investment has a positive return
  • About 30% of the time, the security investment has a positive return of over $1M
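One way to compare the two investments is by expected value, which has a closed form for this model: the expected annual loss is \(\lambda \cdot e^{\mu + \sigma^2/2}\), and the ten discount factors sum to a fixed multiplier. The sketch below is an added cross-check under the same parameters, not part of the original simulation; the variable names are illustrative.

```r
lambda <- 0.138
meanlog <- 12.55949
sdlog <- 3.068723
rate <- 0.05

mean_loss <- exp(meanlog + sdlog^2 / 2)   # expected loss per breach
annuity <- sum(1 / (1 + rate)^(0:9))      # sum of 10 discount factors, year 1 at t = 0

expected_npv <- c(
  baseline = lambda * mean_loss * annuity,
  likelihood = (lambda / 2) * mean_loss * annuity,
  impact = lambda * exp(meanlog + log(0.5) + sdlog^2 / 2) * annuity
)

signif(expected_npv, 3)
```

In expectation the two investments are identical, each removing half of the baseline's expected NPV (roughly $35M under these parameters). The simulated means in the summaries above (3.488e+07, 1.938e+07, and 1.465e+07) differ somewhat from these values, which likely reflects sampling noise: with a log-scale standard deviation above 3, the sample mean is dominated by a handful of extreme draws even across 100,000 runs.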

Analysis

What can we learn from these simulations? While a security investment is more likely than not to have a positive return, it’s not a particularly good bet. Over a reasonable planning horizon for a typical executive, it’s hard for an investment with a variable return to compete with investments that have a clear expected positive return. As a CISO, it’s a reasonable choice to simply maintain the status quo of the baseline risk, as there’s a good chance that there will be no breaches (25%) or breaches with lower impact:

baseline_forecast |>
  pull(npv) |>
  quantile(0.5) |>
  currency()
        50% 
$250,173.50 

Put another way, the analysis helps explain why firms don’t invest more in security: the firms’ managers are better off prioritizing non-security investments, and (potentially) blaming the CISO when breaches do occur, especially if they have limited their risk by purchasing cybersecurity insurance. A rational manager will minimize investments in security unless mandated by insurance or if increasing security spend is more than offset by reductions in premiums.

For the most part, this is what we often see in practice: security leaders struggling to get funding to improve security beyond what is minimally expected by external stakeholders (clients, regulators, and insurers). However, we also see certain larger organizations, like large banks and other financial institutions, invest more in security. Why is this? Work done by VivoSecurity in forecasting data breaches suggests an answer. Vivo found a positive correlation between the size of an organization and the likelihood of a security breach (which has also been identified by others, like Cyentia), and a negative correlation between breach likelihood and the number of CISAs and CISSPs on staff. The correlation was stronger when looking at the effect on larger breaches.

I believe what this correlation shows is that the overall level of security investment at a firm, as measured by the headcount of certified professionals, has a big impact on reducing the likelihood of the largest breaches of $1M or higher. From the same presentation, the Vivo model predicts fairly frequent small breaches (under $100K) at three of the largest Canadian banks, but large breaches are very rare (under 1% for breaches in the $1M-$10M range). The high level of investment at older banks may also be partly explained by the fact that their security programs predate commercial cyber insurance. This insight is not captured in the simple model presented here.

Implications

What are the implications for security? At a macro level, I think this is an argument for regulation, either government regulation or private regulation through the insurance market. Historically we’ve seen both happen in fire safety: government regulation through building codes has reduced the risk of fire and loss of life over time, while on the insurance side, UL (founded as Underwriters Laboratories) was initially funded by fire insurance companies.

At the firm level, I think this means that security leaders shouldn’t present security as an investment. As with safety, I think the main argument for better security is a moral or emotional case: we care about security because we care about our customers, partners, and other stakeholders. Also, people are typically loss-averse, so expressing security risk in those terms will better connect with decision makers. Tail value at risk and loss exceedance curves express loss in this way: “There’s a 5% chance of cybersecurity losses exceeding $780,000 and a 1% chance of losses exceeding $25,000,000 over the next year.” I also think it means security leaders should be mindful of how they spend their limited funds, by maximizing investments in what works.
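Statements like the one quoted can be read off a simulated loss distribution as upper quantiles. The base R sketch below assumes the baseline parameters from earlier and a compound Poisson/log-normal model for annual losses; it illustrates the calculation and is not the code used to produce the quoted figures. The seed and the name `annual_losses` are introduced here.

```r
set.seed(42)
runs <- 1e5
lambda <- 0.138
meanlog <- 12.55949
sdlog <- 3.068723

# Simulate the number of events per year, then sum one log-normal loss per event.
events <- rpois(runs, lambda)
annual_losses <- vapply(
  events,
  function(n) sum(rlnorm(n, meanlog, sdlog)),
  numeric(1)
)

# Loss levels exceeded in 5% and 1% of simulated years.
exceedance <- quantile(annual_losses, c(0.95, 0.99))
exceedance
```

The exact values depend on the seed, but under these assumptions they should be of the same order as the figures in the quote.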

A Counterexample

After completing my initial analysis, I remembered a counterexample: in one of his last presentations, Marcus Ranum described the layered security controls he helped put in place at an entertainment company that “didn’t want to be next” after the 2014 Sony attack. Marcus worked with their security team to implement a combination of encrypted drives, next-gen firewalls, and whitelisting products to dramatically reduce malware attacks against corporate endpoints. One of the surprising outcomes was that the investment in installing the new controls was more than offset by a reduction in operational costs responding to malware.

So it’s clear that security can be a good investment, but why? The conclusions in the initial analysis rely on the fact that security breaches are relatively infrequent, which was not the case for malware response at the company Marcus worked with. Additionally, these low-level infections aren’t likely to make their way into the public dataset used by the Cyentia IRIS report.

High-Frequency Incidents

We can repeat the analysis looking at malware with a hypothetical 50% reduction in frequency. In a large organization, we might expect to clean up a malware infection once a week (\(\lambda = 52\)), with 90% of incidents costing between $200 and $2,000 to clean up and a typical response cost of $600. To simplify the analysis, we just look at the cost of the next year:

lnorm_param(200, 2000, 600)
$meanlog
[1] 6.44961

$sdlog
[1] 0.6999362

$mdiff
[1] -0.0513167

baseline_malware <- calc_risk("baseline malware", lambda = 52, meanlog = 6.44961, sdlog = 0.6999362)

baseline_malware |>
  ggplot(aes(losses)) +
  geom_hist_bw(bins = 100) +
  scale_x_continuous(labels = scales::label_currency(scale_cut = scales::cut_short_scale())) +
  labs(x = NULL, y = NULL) +
  theme_quo()

baseline_malware |>
  summary()
      year            risk               events          losses     
 Min.   :     1   Length:100000      Min.   :23.00   Min.   :16018  
 1st Qu.: 25001   Class :character   1st Qu.:47.00   1st Qu.:36840  
 Median : 50000   Mode  :character   Median :52.00   Median :41676  
 Mean   : 50000                      Mean   :52.01   Mean   :42029  
 3rd Qu.: 75000                      3rd Qu.:57.00   3rd Qu.:46843  
 Max.   :100000                      Max.   :83.00   Max.   :82466  

In this case, the baseline risk is never 0, and falls within a range of about $15K to $80K, with a typical cost of $40K/year.
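The parameters returned by `lnorm_param()` can be reproduced in base R if we assume it treats $200 and $2,000 as the 5th and 95th percentiles of a log-normal distribution, and reports `mdiff` as the relative difference between the typical value and the fitted median. Both assumptions are inferred from the output above rather than taken from the quantrr source; the variable names are illustrative.

```r
low <- 200
high <- 2000
typical <- 600

# Midpoint of the logs gives meanlog; a 90% interval spans
# 2 * qnorm(0.95) standard deviations on the log scale.
meanlog <- (log(low) + log(high)) / 2
sdlog <- (log(high) - log(low)) / (2 * qnorm(0.95))

# Assumed definition: relative gap between the typical cost and the fitted median.
mdiff <- (typical - exp(meanlog)) / exp(meanlog)

# Expected annual cost: events per year times the mean cost per event.
expected_cost <- 52 * exp(meanlog + sdlog^2 / 2)

c(meanlog = meanlog, sdlog = sdlog, mdiff = mdiff, expected_cost = expected_cost)
```

The expected annual cost from this formula is about $42,000, in line with the simulated mean of $42,029.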

What is the value of reducing the likelihood of malware by 50%?

# This scenario halves the event frequency (lambda = 26), so it is named for likelihood.
likelihood_malware <- calc_risk("likelihood malware", lambda = 26, meanlog = 6.44961, sdlog = 0.6999362)

likelihood_malware_return <-
  full_join(baseline_malware, likelihood_malware, by = "year", suffix = c("_base", "_reduced")) |>
  select(c("year", "losses_base", "losses_reduced")) |>
  mutate(return = losses_base - losses_reduced)

likelihood_malware_return |>
  summary()
      year         losses_base    losses_reduced      return      
 Min.   :     1   Min.   :16018   Min.   : 4279   Min.   :-16396  
 1st Qu.: 25001   1st Qu.:36840   1st Qu.:17287   1st Qu.: 14798  
 Median : 50000   Median :41676   Median :20658   Median : 20911  
 Mean   : 50000   Mean   :42029   Mean   :21014   Mean   : 21015  
 3rd Qu.: 75000   3rd Qu.:46843   3rd Qu.:24334   3rd Qu.: 27068  
 Max.   :100000   Max.   :82466   Max.   :50176   Max.   : 64054  

likelihood_malware_return |>
  pull(return) |>
  quantile(0.01)
     1% 
209.851 

While there are still cases where investing in security generates a negative return, over 99% of the time, the return is positive, with an average return of just over $20,000. In this hypothetical example, $20K/year isn’t a big deal, which leads me to conclude that the entertainment company Marcus was working with had a much higher baseline rate of malware incidents, saw a much larger reduction, and probably spent more on typical response.
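The same comparison can be sketched in base R without quantrr. This assumes a compound Poisson/log-normal model for annual response cost and uses fewer runs than the original to keep it quick; `sim_annual_cost()` is a hypothetical helper, the seed is arbitrary, and results will differ slightly from the figures above.

```r
set.seed(2024)

# Hypothetical helper: total response cost for each simulated year.
sim_annual_cost <- function(runs, lambda, meanlog, sdlog) {
  events <- rpois(runs, lambda)
  vapply(events, function(n) sum(rlnorm(n, meanlog, sdlog)), numeric(1))
}

runs <- 1e4
base_cost <- sim_annual_cost(runs, lambda = 52, meanlog = 6.44961, sdlog = 0.6999362)
reduced_cost <- sim_annual_cost(runs, lambda = 26, meanlog = 6.44961, sdlog = 0.6999362)
ret <- base_cost - reduced_cost

# Share of simulated years with a positive return, and the average return.
c(share_positive = mean(ret > 0), mean_return = mean(ret))
```

Because dozens of incidents are summed each year, the annual totals are far less variable relative to their mean than in the rare-breach case, which is why the return is positive in nearly every run.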

So, security can be a good investment, if it reduces the likelihood and/or impact of frequent events, like malware response.