Looking at the data

It looks like there might be a difference between the high and low debt groups; the variance is decidedly higher in the high debt group.

d.both_completed %>%
  ggplot(aes(x=sonarqube_issues, fill=high_debt_version)) + 
  geom_boxplot() +
  labs(
    title = "Number of issues for the different debt levels",
    x ="Number of issues"
  ) +
  scale_y_continuous(breaks = NULL) +
  scale_fill_manual(
    name = "Debt level", 
    labels = c("High debt", "Low debt"), 
    values = c("#7070FF", "lightblue"), 
    guide = guide_legend(reverse = TRUE)
  ) 

Descriptive Statistics:

d.both_completed %>%
  pull(sonarqube_issues) %>% 
  summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   1.659   3.000  12.000
sprintf("Variance: %.2f", var(pull(d.both_completed, sonarqube_issues)))
## [1] "Variance: 6.00"

Initial model

The number of SonarQube issues is modeled using the negative binomial family rather than Poisson since the variance is greater than the mean.
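
As a rough illustration of the overdispersion argument, the sketch below plugs the mean and variance from the descriptive statistics into the negative binomial variance formula used by the mean/shape parameterization, variance = mu + mu^2 / shape. The variable names are made up for this example, and the method-of-moments shape is only a back-of-the-envelope value, not the estimate the model produces.

```r
# Values copied from the descriptive statistics above
issues_mean <- 1.659
issues_var  <- 6.00

# Poisson assumes variance == mean, so a ratio well above 1 signals overdispersion
dispersion_ratio <- issues_var / issues_mean

# Negative binomial (mean/shape form): variance = mu + mu^2 / shape.
# Solving for shape gives a rough method-of-moments value (not the model estimate).
shape_mom <- issues_mean^2 / (issues_var - issues_mean)

# Variance implied by that shape; by construction it reproduces the observed variance
nb_var <- issues_mean + issues_mean^2 / shape_mom

round(c(dispersion_ratio = dispersion_ratio, shape_mom = shape_mom, nb_var = nb_var), 2)
```

A ratio of about 3.6 is far from the value of 1 that a Poisson model would imply, which supports the choice of family.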

We include high_debt_version as a predictor in our model as this variable represents the very effect we want to measure. We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.

Selecting priors

We iterate over the model until we have sane priors, in this case priors that can reasonably fit our data without being too restrictive.

Base model with priors

sonarqube_issues.with <- extendable_model(
  base_name = "sonarqube_issues",
  base_formula = "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.98)
)

Default priors

prior_summary(sonarqube_issues.with(only_priors = TRUE))

Selected priors

prior_summary(sonarqube_issues.with(sample_prior = "only"))

Prior predictive check

pp_check(sonarqube_issues.with(sample_prior = "only"), nsamples = 400, type = "bars") + xlim(-1, 15)

Beta parameter influence

We choose a beta prior that allows for large effects (up to ±10 issues) but is skeptical of any effects larger than ±4 issues.

sim.size <- 1000
sim.intercept <- rnorm(sim.size, 1.5, 1)
sim.beta <- rnorm(sim.size, 0, 1)
sim.beta.diff <- exp(sim.intercept + sim.beta) - exp(sim.intercept)
sim.beta.diff.min <- sim.beta.diff

data.frame(x = sim.beta.diff.min) %>%
  ggplot(aes(x)) +
  geom_density() +
  xlim(-15, 15) +
  labs(
    title = "Beta parameter prior influence",
    x = "Issues difference",
    y = "Density"
  )
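
To put numbers on what the density plot shows, the following sketch repeats the same simulation and reports how much prior mass falls beyond ±4 and ±10 issues. The helper `prior_share_beyond`, the larger simulation size, and the seed are assumptions made for this example only.

```r
set.seed(20210421)  # arbitrary seed, used only to make this sketch reproducible

sim.size <- 10000
sim.intercept <- rnorm(sim.size, 1.5, 1)   # same Intercept prior as the model
sim.beta <- rnorm(sim.size, 0, 1)          # same beta prior as the model

# Difference in expected issues implied by the beta parameter (log link)
sim.beta.diff <- exp(sim.intercept + sim.beta) - exp(sim.intercept)

# Hypothetical helper: share of simulated effects larger than +-limit issues
prior_share_beyond <- function(diff, limit) mean(abs(diff) > limit)

share_4  <- prior_share_beyond(sim.beta.diff, 4)
share_10 <- prior_share_beyond(sim.beta.diff, 10)

c(beyond_4 = share_4, beyond_10 = share_10)
quantile(sim.beta.diff, c(0.025, 0.25, 0.5, 0.75, 0.975))
```

If the share beyond ±10 turned out to be large, that would suggest the prior is looser than intended.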

Model fit

We check the posterior distribution and can see that the model seems to have been able to fit the data well. Sampling also seems to have worked well, as Rhat values are close to 1 and the sampling plots look nice.
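
For intuition about what "Rhat close to 1" means, here is a simplified split-Rhat computed in base R on simulated chains. It is a sketch only: the function `split_rhat` and the simulated draws are invented for this example, and it omits the rank normalization that current Stan tooling applies, so it will not exactly match the values printed in the model summaries.

```r
# Simplified split-Rhat (no rank normalisation), for illustration only.
# draws: numeric matrix with one row per iteration and one column per chain
split_rhat <- function(draws) {
  n <- nrow(draws) %/% 2
  # Split every chain into a first and a second half
  halves <- cbind(
    draws[1:n, , drop = FALSE],
    draws[(nrow(draws) - n + 1):nrow(draws), , drop = FALSE]
  )
  within  <- mean(apply(halves, 2, var))   # W: mean within-chain variance
  between <- n * var(colMeans(halves))     # B: between-chain variance
  var_plus <- (n - 1) / n * within + between / n
  sqrt(var_plus / within)
}

set.seed(1)
good <- matrix(rnorm(4000), ncol = 4)            # four chains exploring the same target
bad  <- good + rep(c(0, 0, 3, 3), each = 1000)   # two chains shifted away from the others

rhat_good <- split_rhat(good)
rhat_bad  <- split_rhat(bad)
c(good = rhat_good, bad = rhat_bad)
```

Chains that agree give a value near 1, while chains that settle in different regions push the value clearly above 1.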

Posterior predictive check

pp_check(sonarqube_issues.with(), nsamples = 200, type = "bars") + xlim(-1, 15)

Summary

summary(sonarqube_issues.with())
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) 
##    Data: as.data.frame(data) (Number of observations: 44) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.50      0.36     0.02     1.29 1.00      835     2026
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  0.73      0.33     0.09     1.39 1.00     2154
## high_debt_versionfalse    -0.66      0.42    -1.50     0.20 1.00     4438
##                        Tail_ESS
## Intercept                  2255
## high_debt_versionfalse     2810
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.17      2.97     0.30     3.98 1.00      971      772
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Sampling plots

plot(sonarqube_issues.with(), ask = FALSE)

Model predictor extensions

# default prior for monotonic predictor
edlvl_prior <- prior(dirichlet(2), class = "simo", coef = "moeducation_level1")

One variable

loo_result <- loo(
  # Benchmark model(s)
  sonarqube_issues.with(),
  # New model(s)
  sonarqube_issues.with("modified_lines"),
  sonarqube_issues.with("work_domain"),
  sonarqube_issues.with("work_experience_programming.s"),
  sonarqube_issues.with("work_experience_java.s"),
  sonarqube_issues.with("education_field"),
  sonarqube_issues.with("mo(education_level)", edlvl_prior),
  sonarqube_issues.with("workplace_peer_review"),
  sonarqube_issues.with("workplace_td_tracking"),
  sonarqube_issues.with("workplace_pair_programming"),
  sonarqube_issues.with("workplace_coding_standards"),
  sonarqube_issues.with("scenario"),
  sonarqube_issues.with("group")
)

Comparison

loo_result[2]
## $diffs
##                                                           elpd_diff se_diff
## sonarqube_issues.with("group")                             0.0       0.0   
## sonarqube_issues.with("work_domain")                      -0.9       2.1   
## sonarqube_issues.with()                                   -1.3       1.5   
## sonarqube_issues.with("workplace_coding_standards")       -1.5       1.4   
## sonarqube_issues.with("scenario")                         -1.7       1.8   
## sonarqube_issues.with("workplace_pair_programming")       -1.7       1.5   
## sonarqube_issues.with("mo(education_level)", edlvl_prior) -1.7       1.5   
## sonarqube_issues.with("workplace_peer_review")            -1.8       1.3   
## sonarqube_issues.with("workplace_td_tracking")            -2.0       1.5   
## sonarqube_issues.with("modified_lines")                   -2.0       1.5   
## sonarqube_issues.with("education_field")                  -2.1       1.6   
## sonarqube_issues.with("work_experience_programming.s")    -2.4       1.6   
## sonarqube_issues.with("work_experience_java.s")           -2.4       1.4
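
One way to read the comparison table is to relate each elpd_diff to its se_diff. The sketch below copies the values from the table and flags models whose difference exceeds two standard errors; the threshold of 2 is a rule of thumb assumed for this example, and the shortened model labels are made up for readability.

```r
# elpd_diff and se_diff copied from the comparison table above
# (the best model, "group", is the reference row and is left out)
loo_diffs <- data.frame(
  model = c(
    "work_domain", "(benchmark)", "workplace_coding_standards", "scenario",
    "workplace_pair_programming", "mo(education_level)", "workplace_peer_review",
    "workplace_td_tracking", "modified_lines", "education_field",
    "work_experience_programming.s", "work_experience_java.s"
  ),
  elpd_diff = c(-0.9, -1.3, -1.5, -1.7, -1.7, -1.7, -1.8, -2.0, -2.0, -2.1, -2.4, -2.4),
  se_diff   = c(2.1, 1.5, 1.4, 1.8, 1.5, 1.5, 1.3, 1.5, 1.5, 1.6, 1.6, 1.4)
)

# How many standard errors each model is behind the best one
loo_diffs$ratio <- abs(loo_diffs$elpd_diff) / loo_diffs$se_diff

# Assumed rule of thumb: only differences beyond 2 standard errors count as clear
loo_diffs$clearly_worse <- loo_diffs$ratio > 2

loo_diffs[order(-loo_diffs$ratio), c("model", "ratio", "clearly_worse")]
```

With these numbers no model is flagged, which is consistent with treating the candidates as roughly equivalent in predictive performance.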

Diagnostics

loo_result[1]
## $loos
## $loos$`sonarqube_issues.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.0  7.9
## p_loo         6.9  1.3
## looic       156.1 15.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   697       
##  (0.5, 0.7]   (ok)       11    25.0%   327       
##    (0.7, 1]   (bad)       5    11.4%   85        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("modified_lines")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.8  7.8
## p_loo         7.0  1.2
## looic       157.5 15.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   782       
##  (0.5, 0.7]   (ok)       11    25.0%   230       
##    (0.7, 1]   (bad)       5    11.4%   187       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("work_domain")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -77.7  7.7
## p_loo         8.7  1.2
## looic       155.3 15.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     32    72.7%   961       
##  (0.5, 0.7]   (ok)        9    20.5%   784       
##    (0.7, 1]   (bad)       3     6.8%   100       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("work_experience_programming.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -79.1  8.3
## p_loo         8.5  2.0
## looic       158.2 16.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   679       
##  (0.5, 0.7]   (ok)       12    27.3%   204       
##    (0.7, 1]   (bad)       4     9.1%   20        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("work_experience_java.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -79.1  8.2
## p_loo         8.1  1.7
## looic       158.3 16.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     30    68.2%   750       
##  (0.5, 0.7]   (ok)        9    20.5%   1054      
##    (0.7, 1]   (bad)       5    11.4%   37        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("education_field")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.9  7.9
## p_loo         7.6  1.2
## looic       157.7 15.8
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   580       
##  (0.5, 0.7]   (ok)       15    34.1%   243       
##    (0.7, 1]   (bad)       2     4.5%   732       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.5  7.8
## p_loo         7.1  1.2
## looic       156.9 15.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   799       
##  (0.5, 0.7]   (ok)       11    25.0%   459       
##    (0.7, 1]   (bad)       5    11.4%   75        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.6  8.0
## p_loo         7.3  1.4
## looic       157.2 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   703       
##  (0.5, 0.7]   (ok)       12    27.3%   235       
##    (0.7, 1]   (bad)       4     9.1%   34        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("workplace_td_tracking")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.7  8.0
## p_loo         7.5  1.4
## looic       157.5 16.0
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     29    65.9%   590       
##  (0.5, 0.7]   (ok)       10    22.7%   145       
##    (0.7, 1]   (bad)       5    11.4%   53        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("workplace_pair_programming")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.4  7.9
## p_loo         6.9  1.3
## looic       156.8 15.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     30    68.2%   305       
##  (0.5, 0.7]   (ok)       12    27.3%   541       
##    (0.7, 1]   (bad)       2     4.5%   98        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("workplace_coding_standards")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.3  8.2
## p_loo         7.9  1.7
## looic       156.5 16.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     29    65.9%   575       
##  (0.5, 0.7]   (ok)       11    25.0%   105       
##    (0.7, 1]   (bad)       4     9.1%   59        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.4  8.1
## p_loo         8.3  1.6
## looic       156.8 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   550       
##  (0.5, 0.7]   (ok)       14    31.8%   134       
##    (0.7, 1]   (bad)       3     6.8%   139       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`sonarqube_issues.with("group")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -76.8  7.7
## p_loo         7.7  1.2
## looic       153.5 15.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   763       
##  (0.5, 0.7]   (ok)       13    29.5%   233       
##    (0.7, 1]   (bad)       3     6.8%   186       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
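
The Pareto k tables above bin each observation by how reliable its importance-sampling estimate is. The sketch below reproduces that binning in base R for a handful of made-up k values, which can help when checking which observations end up in the "bad" bucket. The vector `k` is purely hypothetical and the cut points are the ones printed in the tables.

```r
# Hypothetical Pareto k values, one per observation (not taken from the models above)
k <- c(0.12, 0.45, 0.55, 0.68, 0.72, 0.31, 0.95, 0.50)

# Same intervals as in the diagnostic tables: (-Inf, 0.5], (0.5, 0.7], (0.7, 1], (1, Inf)
bucket <- cut(
  k,
  breaks = c(-Inf, 0.5, 0.7, 1, Inf),
  labels = c("good", "ok", "bad", "very bad")
)

counts <- table(bucket)         # number of observations per bucket
flagged <- which(k > 0.7)       # indices of observations worth a closer look

counts
flagged
```

Every model here has a few observations in the "bad" bucket, so the elpd estimates should be read with some caution. The `loo()` method in brms accepts `moment_match = TRUE` or `reloo = TRUE` for such cases, though neither is used in this analysis.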

Candidate models

We inspect some of our top performing models.

All models seem to have sampled nicely (Rhat = 1 and fluffy plots). They also have about the same fit to the data and similar estimates for the high_debt_version beta parameter.

sonarqube_issues0

We select the simplest model as a baseline.

sonarqube_issues0 <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues0",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(sonarqube_issues0)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta
## above 0.97 may help. See http://mc-stan.org/misc/warnings.html#divergent-
## transitions-after-warmup
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) 
##    Data: d.both_completed (Number of observations: 44) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.49      0.35     0.02     1.30 1.01     1007     1838
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  0.75      0.33     0.11     1.43 1.00     2552
## high_debt_versionfalse    -0.66      0.42    -1.47     0.18 1.00     5800
##                        Tail_ESS
## Intercept                  2449
## high_debt_versionfalse     2938
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.15      2.44     0.30     4.21 1.00     1054      712
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
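
Because the model uses a log link, the population-level estimates are on the log scale. The sketch below back-transforms the point estimates from the summary above to the issue-count scale. It is a rough reading aid under stated assumptions: it uses the rounded posterior means, describes a session with a varying intercept of zero, and exponentiating a posterior mean is not the same as the posterior mean of the exponentiated draws.

```r
# Point estimates and 95% interval copied from the summary above (log scale)
b_intercept <- 0.75            # high debt version is the reference level
b_low_debt  <- -0.66           # high_debt_versionfalse
ci_low_debt <- c(-1.47, 0.18)

# Expected number of issues for a typical session
expected_high <- exp(b_intercept)
expected_low  <- exp(b_intercept + b_low_debt)

# Multiplicative effect of the low debt version on the expected count
rate_ratio    <- exp(b_low_debt)
rate_ratio_ci <- exp(ci_low_debt)

round(c(high = expected_high, low = expected_low, ratio = rate_ratio), 2)
round(rate_ratio_ci, 2)
```

Read this way, the low debt version is associated with roughly half as many issues, while the interval for the ratio still includes 1.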

Random effects

ranef(sonarqube_issues0)
## $session
## , , Intercept
## 
##                              Estimate Est.Error       Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.009987209 0.4616050 -1.0502194 0.9296742
## 6033d90a5af2c702367b3a96 -0.187905851 0.5266761 -1.4824487 0.7626271
## 6034fc165af2c702367b3a98  0.424770535 0.5508350 -0.3151748 1.7620497
## 603500725af2c702367b3a99 -0.149993543 0.5203650 -1.3902790 0.8116594
## 603f97625af2c702367b3a9d  0.434690481 0.5719282 -0.3166209 1.8396017
## 603fd5d95af2c702367b3a9e -0.033234698 0.4784248 -1.0748305 0.9508240
## 60409b7b5af2c702367b3a9f  0.222350182 0.4773440 -0.5626926 1.3970872
## 604b82b5a7718fbed181b336  0.365265498 0.5319191 -0.3830703 1.7044967
## 6050c1bf856f36729d2e5218 -0.152261556 0.5417575 -1.5339989 0.7790176
## 6050e1e7856f36729d2e5219  0.071377411 0.4491386 -0.8238129 1.0987777
## 6055fdc6856f36729d2e521b  0.129041352 0.4553217 -0.7556882 1.2030735
## 60589862856f36729d2e521f -0.296572945 0.6319051 -1.8940926 0.5919265
## 605afa3a856f36729d2e5222 -0.298424247 0.6363673 -1.9319767 0.6196124
## 605c8bc6856f36729d2e5223 -0.009098036 0.4647879 -1.0061963 0.9950005
## 605f3f2d856f36729d2e5224  0.102293653 0.4646004 -0.7892336 1.2281876
## 605f46c3856f36729d2e5225 -0.297769037 0.6253277 -1.9128928 0.5995682
## 60605337856f36729d2e5226 -0.299624015 0.6465762 -1.9051299 0.6031115
## 60609ae6856f36729d2e5228 -0.082241411 0.4572846 -1.1307307 0.8035126
## 6061ce91856f36729d2e522e -0.152201237 0.5380366 -1.4275426 0.7667674
## 6061f106856f36729d2e5231 -0.301762048 0.6090987 -1.8612133 0.5900917
## 6068ea9f856f36729d2e523e -0.021681111 0.4564977 -0.9852528 0.9455471
## 6075ab05856f36729d2e5247  0.057655955 0.4571002 -0.8745741 1.1040265

Sampling plots

plot(sonarqube_issues0, ask = FALSE)

Posterior predictive check

pp_check(sonarqube_issues0, nsamples = 200, type = "bars") + xlim(-1, 15)

sonarqube_issues1

We select the best performing model with one variable.

sonarqube_issues1 <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session) + group",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues1",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(sonarqube_issues1)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) + group 
##    Data: d.both_completed (Number of observations: 44) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.42      0.31     0.01     1.17 1.00     1330     2047
## 
## Population-Level Effects: 
##                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                     0.30      0.43    -0.50     1.21 1.00     3986
## high_debt_versionfalse       -0.68      0.42    -1.51     0.15 1.00     5157
## groupconsultants              0.76      0.59    -0.43     1.90 1.00     4093
## groupfriends                  0.08      0.59    -1.04     1.25 1.00     4374
## groupprofessionalMcontact    -0.59      0.87    -2.28     1.07 1.00     5423
## groupstudents                 0.84      0.52    -0.23     1.82 1.00     4260
##                           Tail_ESS
## Intercept                     2960
## high_debt_versionfalse        3038
## groupconsultants              3118
## groupfriends                  2924
## groupprofessionalMcontact     3008
## groupstudents                 2896
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.17      1.22     0.35     3.68 1.00     1733     1744
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(sonarqube_issues1)
## $session
## , , Intercept
## 
##                             Estimate Est.Error       Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.08793163 0.4254560 -1.1011332 0.7540889
## 6033d90a5af2c702367b3a96 -0.20079067 0.4833733 -1.4731654 0.5983684
## 6034fc165af2c702367b3a98  0.26693379 0.4688284 -0.4313584 1.4474426
## 603500725af2c702367b3a99 -0.16850737 0.4771840 -1.3597238 0.6485513
## 603f97625af2c702367b3a9d  0.27052649 0.4643893 -0.4172192 1.4048541
## 603fd5d95af2c702367b3a9e -0.08280594 0.4339497 -1.0559824 0.7568240
## 60409b7b5af2c702367b3a9f  0.09936763 0.4179527 -0.6686725 1.1184929
## 604b82b5a7718fbed181b336  0.24805256 0.4805990 -0.4677418 1.4732993
## 6050c1bf856f36729d2e5218 -0.08210783 0.4563209 -1.1825350 0.7920952
## 6050e1e7856f36729d2e5219  0.12870440 0.4448802 -0.6728105 1.2342085
## 6055fdc6856f36729d2e521b  0.17392195 0.4605721 -0.5819869 1.3029758
## 60589862856f36729d2e521f -0.19803871 0.5303025 -1.5868327 0.6171970
## 605afa3a856f36729d2e5222 -0.15516621 0.5380156 -1.5332180 0.7607105
## 605c8bc6856f36729d2e5223  0.05054397 0.4094259 -0.7964717 0.9778253
## 605f3f2d856f36729d2e5224  0.18007677 0.4544009 -0.5891896 1.3008193
## 605f46c3856f36729d2e5225 -0.19480853 0.5256472 -1.5345202 0.6579007
## 60605337856f36729d2e5226 -0.21522552 0.5274579 -1.6080612 0.5832447
## 60609ae6856f36729d2e5228 -0.01764193 0.4307146 -1.0229100 0.8797154
## 6061ce91856f36729d2e522e -0.08517517 0.4716213 -1.2050640 0.8007288
## 6061f106856f36729d2e5231 -0.21215550 0.5317851 -1.5469798 0.6390249
## 6068ea9f856f36729d2e523e -0.07986930 0.4309126 -1.1217416 0.7361495
## 6075ab05856f36729d2e5247 -0.02276170 0.4130337 -0.9302098 0.8766495

Sampling plots

plot(sonarqube_issues1, ask = FALSE)

Posterior predictive check

pp_check(sonarqube_issues1, nsamples = 200, type = "bars") + xlim(-1, 15)

Final model

All candidate models look nice and none is significantly better than the others, so we will proceed with the simplest model: sonarqube_issues0.

Variations

We will try a few different variations of the selected candidate model.

All data points

Some participants completed only one scenario. Those have been excluded from the initial dataset to improve sampling of the models. We do, however, want to use all the data we can and will therefore try to fit the model with the complete dataset.
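
The exclusion described above can be sketched in base R as follows. The toy data frame and its values are invented for this example, and the actual construction of d.both_completed and d.completed is not shown in this document, so this is only one plausible way to derive such a subset.

```r
# Toy data: session "s2" completed only one scenario (all values are made up)
d <- data.frame(
  session = c("s1", "s1", "s2", "s3", "s3"),
  high_debt_version = c("true", "false", "true", "false", "true"),
  sonarqube_issues = c(3, 0, 1, 0, 2)
)

# Count completed scenarios per session
scenarios_per_session <- table(d$session)

# Keep only sessions where both scenarios were completed
complete_sessions <- names(scenarios_per_session)[scenarios_per_session == 2]
d.both <- d[d$session %in% complete_sessions, ]

d.both
```

In the real data the summaries show 44 observations from 22 sessions in the restricted set and 51 observations from 29 sessions in the complete set.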

sonarqube_issues0.all <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.97),
  file = "fits/sonarqube_issues0.all",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(sonarqube_issues0.all)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.47      0.30     0.02     1.15 1.00      778     1734
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  0.72      0.30     0.15     1.32 1.00     2881
## high_debt_versionfalse    -0.75      0.38    -1.48     0.02 1.00     5581
##                        Tail_ESS
## Intercept                  3030
## high_debt_versionfalse     3190
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.61      4.83     0.40     5.50 1.01     1073      908
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(sonarqube_issues0.all)
## $session
## , , Intercept
## 
##                              Estimate Est.Error       Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  0.161886044 0.4528918 -0.6593817 1.2613277
## 6033d69a5af2c702367b3a95 -0.008572596 0.4359737 -0.9191957 0.9595582
## 6033d90a5af2c702367b3a96 -0.176302002 0.4858606 -1.3454738 0.6798654
## 6034fc165af2c702367b3a98  0.456057900 0.5284069 -0.2406157 1.6866567
## 603500725af2c702367b3a99 -0.146520835 0.4930724 -1.3590706 0.7570271
## 603f84f15af2c702367b3a9b  0.010150381 0.4816875 -1.0294584 1.0173712
## 603f97625af2c702367b3a9d  0.478607813 0.5573800 -0.2700666 1.7566582
## 603fd5d95af2c702367b3a9e -0.031706417 0.4318265 -0.9877084 0.8982282
## 60409b7b5af2c702367b3a9f  0.238552029 0.4482937 -0.4838794 1.3117787
## 604b82b5a7718fbed181b336  0.405950848 0.5110235 -0.2914744 1.6398606
## 604f1239a7718fbed181b33f -0.088461366 0.5056780 -1.2431911 0.9170678
## 6050c1bf856f36729d2e5218 -0.149131412 0.4884993 -1.2958397 0.7479989
## 6050e1e7856f36729d2e5219  0.083641977 0.4325663 -0.7733990 1.1073544
## 6055fdc6856f36729d2e521b  0.132246488 0.4442384 -0.6740875 1.1882254
## 60579f2a856f36729d2e521e  0.012344744 0.4829837 -0.9913816 1.0484549
## 60589862856f36729d2e521f -0.294598062 0.5687279 -1.7187600 0.5470783
## 605a30a7856f36729d2e5221 -0.127009728 0.5190265 -1.3564036 0.8741563
## 605afa3a856f36729d2e5222 -0.295572385 0.5840496 -1.7493740 0.5511923
## 605c8bc6856f36729d2e5223 -0.009083656 0.4147736 -0.8888632 0.8930097
## 605f3f2d856f36729d2e5224  0.117285653 0.4363941 -0.6869239 1.1279872
## 605f46c3856f36729d2e5225 -0.280752274 0.5522386 -1.6229684 0.5522585
## 60605337856f36729d2e5226 -0.294207156 0.5729919 -1.7446652 0.5807301
## 60609ae6856f36729d2e5228 -0.078975864 0.4561461 -1.1212379 0.8426180
## 6061ce91856f36729d2e522e -0.154804150 0.4799145 -1.3298704 0.7217109
## 6061f106856f36729d2e5231 -0.299956121 0.5865593 -1.7644569 0.5906793
## 60672faa856f36729d2e523c  0.015862527 0.4773318 -0.9958862 1.0135475
## 6068ea9f856f36729d2e523e -0.003588927 0.4213246 -0.9198978 0.8690974
## 606db69d856f36729d2e5243 -0.137276766 0.5491813 -1.4784213 0.8537878
## 6075ab05856f36729d2e5247  0.049210335 0.4322615 -0.8124429 1.0306560

Sampling plots

plot(sonarqube_issues0.all, ask = FALSE)

Posterior predictive check

pp_check(sonarqube_issues0.all, nsamples = 200, type = "bars") + xlim(-1, 15)

With experience predictor

As including all data points didn’t harm the model we will create this variant with all data points as well.

This variation includes the work_experience_programming.s predictor, as it can give further insight into how experience plays a role in the effect we try to measure. This is especially important as our sampling was skewed towards less experienced developers than the population at large.

sonarqube_issues0.all.exp <- brm(
  "sonarqube_issues ~ 1 + high_debt_version + (1 | session) + work_experience_programming.s",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(1.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.99),
  file = "fits/sonarqube_issues0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(sonarqube_issues0.all.exp)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) + work_experience_programming.s 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.48      0.32     0.02     1.16 1.01      843     1538
## 
## Population-Level Effects: 
##                               Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                         0.73      0.30     0.11     1.33 1.00
## high_debt_versionfalse           -0.79      0.38    -1.51    -0.05 1.00
## work_experience_programming.s    -0.22      0.24    -0.70     0.24 1.00
##                               Bulk_ESS Tail_ESS
## Intercept                         2191     2425
## high_debt_versionfalse            4989     2928
## work_experience_programming.s     3558     2635
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.38      1.92     0.39     4.70 1.00     1350     1535
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(sonarqube_issues0.all.exp)
## $session
## , , Intercept
## 
##                               Estimate Est.Error       Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  0.1387331993 0.4787681 -0.7355703 1.3114903
## 6033d69a5af2c702367b3a95 -0.0409921257 0.4427061 -1.0308361 0.8693022
## 6033d90a5af2c702367b3a96 -0.2092045772 0.5291557 -1.5061767 0.6850437
## 6034fc165af2c702367b3a98  0.4418368193 0.5376180 -0.2726670 1.7603479
## 603500725af2c702367b3a99 -0.1825905260 0.5007371 -1.4387202 0.6774876
## 603f84f15af2c702367b3a9b -0.0099472862 0.4802221 -1.0274654 1.0327163
## 603f97625af2c702367b3a9d  0.4554365705 0.5386001 -0.2776636 1.7251454
## 603fd5d95af2c702367b3a9e -0.0543882795 0.4754935 -1.1213730 0.9212237
## 60409b7b5af2c702367b3a9f  0.2046696997 0.4604153 -0.5741948 1.3381165
## 604b82b5a7718fbed181b336  0.3873645878 0.5114101 -0.2937607 1.6177832
## 604f1239a7718fbed181b33f -0.0923220453 0.5079860 -1.2587866 0.9414571
## 6050c1bf856f36729d2e5218 -0.1309724158 0.4786292 -1.2250585 0.7836002
## 6050e1e7856f36729d2e5219  0.0702908691 0.4177468 -0.7557189 1.0300036
## 6055fdc6856f36729d2e521b  0.1170809140 0.4342524 -0.7042097 1.1544104
## 60579f2a856f36729d2e521e  0.0004900546 0.4796501 -0.9936349 1.0265199
## 60589862856f36729d2e521f -0.2274524477 0.5706230 -1.6737325 0.6821970
## 605a30a7856f36729d2e5221 -0.1577676275 0.5460883 -1.5122223 0.8412269
## 605afa3a856f36729d2e5222 -0.2559865072 0.5641741 -1.6809735 0.6312147
## 605c8bc6856f36729d2e5223  0.0064255915 0.4424725 -0.9303494 0.9934114
## 605f3f2d856f36729d2e5224  0.3324069958 0.5995096 -0.5812309 1.8010839
## 605f46c3856f36729d2e5225 -0.3141036317 0.5723557 -1.7846489 0.5453950
## 60605337856f36729d2e5226 -0.3165459109 0.6182186 -1.9174237 0.5772180
## 60609ae6856f36729d2e5228 -0.1186270296 0.4551608 -1.1710651 0.7511623
## 6061ce91856f36729d2e522e -0.1810360391 0.4997436 -1.4197739 0.6822217
## 6061f106856f36729d2e5231 -0.3172017244 0.6033642 -1.9077646 0.5541131
## 60672faa856f36729d2e523c  0.0043945182 0.4776146 -1.0353118 1.0087955
## 6068ea9f856f36729d2e523e -0.0088914153 0.4343647 -0.9455057 0.9523177
## 606db69d856f36729d2e5243 -0.1274824128 0.5316038 -1.4126639 0.8420767
## 6075ab05856f36729d2e5247  0.0380257755 0.4178063 -0.8240417 0.9911064
Loo comparison
loo(
  sonarqube_issues0.all,
  sonarqube_issues0.all.exp
)
## Output of model 'sonarqube_issues0.all':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -88.1  8.2
## p_loo         7.9  1.7
## looic       176.2 16.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     38    74.5%   802       
##  (0.5, 0.7]   (ok)       10    19.6%   213       
##    (0.7, 1]   (bad)       3     5.9%   81        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'sonarqube_issues0.all.exp':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -89.7  8.9
## p_loo         9.9  2.7
## looic       179.4 17.8
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     32    62.7%   920       
##  (0.5, 0.7]   (ok)       13    25.5%   209       
##    (0.7, 1]   (bad)       5     9.8%   121       
##    (1, Inf)   (very bad)  1     2.0%   10        
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##                           elpd_diff se_diff
## sonarqube_issues0.all      0.0       0.0   
## sonarqube_issues0.all.exp -1.6       2.7
Sampling plots
plot(sonarqube_issues0.all.exp, ask = FALSE)

Posterior predictive check
pp_check(sonarqube_issues0.all.exp, nsamples = 200, type = "bars") + xlim(-1, 15)

Final model

  • Fitting the model to all data points did not significantly damage the model, so we use all data points as they are a fairer representation of reality.
  • Adding the experience predictors did not significantly damage the model, so we keep them as they provide useful insight.

This means that our final model, with all data points and experience predictors, is sonarqube_issues0.all.exp.
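
As a rough check of the second point, the difference in expected log predictive density can be compared with its standard error. The sketch below copies elpd_diff and se_diff from the loo output above; the two-standard-error band is a common rule of thumb that we assume here, not a threshold taken from the analysis itself.

```r
# Values copied from the "Model comparisons" table in the loo output.
elpd_diff <- -1.6
se_diff <- 2.7

# Assumed rule of thumb: a difference within about two standard errors
# of zero is not treated as a clear difference in predictive performance.
band <- elpd_diff + c(-2, 2) * se_diff
includes_zero <- band[1] < 0 && band[2] > 0

print(band)
print(includes_zero)
```

Both models also have a few observations flagged by the Pareto k diagnostics, so the comparison itself should be read with some caution.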

Interpreting the model

To begin interpreting the model we look at how its parameters were estimated. As our research is focused on how the outcome of the model is affected we will mainly analyze the \(\beta\) parameters.

\(\beta\) parameters

mcmc_areas(sonarqube_issues0.all.exp, pars = c("b_high_debt_versionfalse", "b_work_experience_programming.s"), prob = 0.95) +
  scale_y_discrete(labels=c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameters densities in sonarqube issues model", subtitle = "Shaded region marks 95% of the density. Line marks the median")
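
Because negbinomial() in brms uses a log link by default, the \(\beta\) parameters live on the log scale, and exponentiating a coefficient gives a multiplicative effect on the expected number of issues. A minimal sketch of that conversion follows; the helper names and the coefficient value are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical helpers: convert a log-scale coefficient into a rate ratio
# and into a percentage change of the expected count.
rate_ratio <- function(b) exp(b)
percent_change <- function(b) (exp(b) - 1) * 100

# Illustrative value only, not an estimate from the fitted model.
b_example <- -0.5

rate_ratio(b_example)
percent_change(b_example)
```

A negative coefficient therefore corresponds to a rate ratio below one, that is, fewer expected issues relative to the reference level.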

Effect sizes

scale_programming_experience <- function(x) {
  (x - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
}
unscale_programming_experience <- function(x) {
  x * sd(d.completed$work_experience_programming) + mean(d.completed$work_experience_programming)
}

post_settings <- expand.grid(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = sapply(c(0, 3, 10, 25, 40), scale_programming_experience)
)

post <- posterior_predict(sonarqube_issues0.all.exp, newdata = post_settings) %>%
  melt(value.name = "estimate", varnames = c("sample_number", "settings_id")) %>%
  left_join(
    rowid_to_column(post_settings, var= "settings_id"),
    by = "settings_id"
  ) %>%
  mutate(work_experience_programming = unscale_programming_experience(work_experience_programming.s)) %>%
  select(
    estimate,
    high_debt_version,
    work_experience_programming
  )

ggplot(post, aes(x=estimate, fill = high_debt_version)) +
  geom_bar(position = "dodge2") +
  scale_fill_manual(
    name = "Debt version",
    labels = c("Low debt", "High debt"),
    values = c("lightblue", "darkblue")
  ) +
  facet_grid(rows = vars(work_experience_programming)) +
  labs(
    title = "SonarQube issues introduced / years of programming experience",
    subtitle = "Estimated for five different experience levels",
    x = "Issues introduced",
    y = "Incidence rate"
  ) + 
  scale_x_continuous(limits = c(-1, 7), breaks = 0:7) +
  scale_y_continuous(limits = NULL, breaks = sapply(c(0.1, 0.3, 0.5), function(x) x*nrow(post) / 10), labels = c("10%","30%","50%")) + 
  theme(legend.position = "top")
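
The helpers scale_programming_experience and unscale_programming_experience are inverses of each other, but the round trip goes through floating-point arithmetic. The sketch below uses a made-up vector of experience values (an assumption standing in for d.completed$work_experience_programming) to show the round trip, and why a tolerance-based comparison may be safer than == when selecting rows by the unscaled value.

```r
# Made-up experience values, used only for illustration.
years <- c(0, 1, 2, 3, 5, 8, 10, 15, 25, 40)

scale_years <- function(x) (x - mean(years)) / sd(years)
unscale_years <- function(x) x * sd(years) + mean(years)

levels_in <- c(0, 3, 10, 25, 40)
levels_out <- unscale_years(scale_years(levels_in))

# Equal up to numerical tolerance ...
all.equal(levels_out, levels_in)

# ... so matching a level with a tolerance avoids depending on exact equality.
is_ten <- abs(levels_out - 10) < 1e-8
levels_out[is_ten]
```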

post.diff <- post %>% filter(high_debt_version == "true")
post.diff$estimate <- post.diff$estimate - filter(post, high_debt_version == "false")$estimate

ggplot(post.diff, aes(x=estimate)) +
  geom_boxplot() +
  xlim(-7, 7) +
  facet_grid(rows = vars(work_experience_programming)) +
  labs(
    title = "SonarQube issues introduced difference / years of programming experience",
    subtitle = "Difference as: high debt issues - low debt issues",
    x = "Difference in number of issues"
  ) +
  scale_y_continuous(breaks = NULL)
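
The subtraction used to build post.diff pairs rows by position, so it relies on the high debt and low debt rows appearing in the same order. A small base R sketch with made-up draws (all names and values here are hypothetical) shows one way to check that alignment before subtracting.

```r
# Made-up settings grid mirroring the structure of post_settings.
settings <- expand.grid(
  high_debt_version = c("false", "true"),
  experience = c(0, 10),
  stringsAsFactors = FALSE
)

# Two made-up draws per setting, stacked setting by setting.
toy <- data.frame(
  settings_id = rep(seq_len(nrow(settings)), each = 2),
  sample_number = rep(1:2, times = nrow(settings)),
  estimate = c(1, 0, 3, 2, 0, 1, 2, 4)
)
toy <- merge(toy, cbind(settings_id = seq_len(nrow(settings)), settings))
toy <- toy[order(toy$settings_id, toy$sample_number), ]

high <- toy[toy$high_debt_version == "true", ]
low <- toy[toy$high_debt_version == "false", ]

# Check that rows pair up on experience level and draw number.
stopifnot(
  identical(high$experience, low$experience),
  identical(high$sample_number, low$sample_number)
)

difference <- high$estimate - low$estimate
difference
```

If the check fails, joining the two halves on experience level and draw number is a more robust alternative to positional subtraction.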

We can then proceed to calculate the ratio between the number of issues introduced in the high debt version and in the low debt version:

d <- post
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.156998

Across all the simulated cases, we find that developers introduce 116% more issues in the high debt version.

d <- post %>% filter(work_experience_programming == 10)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.226831

Considering developers with 10 years of professional programming experience we find that they introduce 123% more issues in the high debt version.

d <- post %>% filter(work_experience_programming == 0)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.289173

Considering developers with no professional programming experience we find that they introduce 129% more issues in the high debt version.

d <- post %>% filter(work_experience_programming == 25)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.061877

Considering developers with 25 years of professional programming experience we find that they introduce 106% more issues in the high debt version.
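
Each percentage above is the ratio of simulated issues minus one, expressed in percent. As a worked check, the sketch below applies that conversion to the four ratios printed above; the helper name percent_more is ours and not part of the original analysis.

```r
# Hypothetical helper: turn a high/low ratio into "percent more".
percent_more <- function(ratio) round((ratio - 1) * 100)

# Ratios copied from the output above.
ratios <- c(
  all_levels = 2.156998,
  years_10 = 2.226831,
  years_0 = 2.289173,
  years_25 = 2.061877
)

percent_more(ratios)
```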

