It looks like there might be a difference between the high and low debt groups; the variance is decidedly higher in the high debt group.
d.both_completed %>%
ggplot(aes(x=sonarqube_issues, fill=high_debt_version)) +
geom_boxplot() +
labs(
title = "Number of issues for the different debt levels",
x ="Number of issues"
) +
scale_y_continuous(breaks = NULL) +
scale_fill_manual(
name = "Debt level",
labels = c("High debt", "Low debt"),
values = c("#7070FF", "lightblue"),
guide = guide_legend(reverse = TRUE)
)
d.both_completed %>%
pull(sonarqube_issues) %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.659 3.000 12.000
sprintf("Variance: %.2f", var(pull(d.both_completed, sonarqube_issues)))
## [1] "Variance: 6.00"
The number of SonarQube issues is modeled using the negative binomial family rather than Poisson since the variance (6.00) is considerably greater than the mean (1.66).
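The overdispersion argument can be illustrated with a small, self-contained sketch. The counts below are simulated and purely hypothetical (they are not the study data, and the `size` value is an arbitrary choice). The point is only that a Poisson model implies a variance-to-mean ratio near 1, whereas the summary statistics reported above imply a ratio well above 1.

```r
# Hypothetical counts, simulated only for illustration (not the study data).
set.seed(42)
counts_poisson <- rpois(1000, lambda = 1.66)
counts_negbin  <- rnbinom(1000, mu = 1.66, size = 0.6)  # size is an arbitrary choice

# Variance-to-mean ratio: close to 1 for Poisson, above 1 when overdispersed.
dispersion_ratio <- function(x) var(x) / mean(x)

ratio_poisson <- dispersion_ratio(counts_poisson)
ratio_negbin  <- dispersion_ratio(counts_negbin)

# Ratio implied by the variance and mean reported in the output above.
ratio_reported <- 6.00 / 1.659

print(c(poisson = ratio_poisson, negbin = ratio_negbin, reported = ratio_reported))
```

A ratio clearly above 1 is the usual informal signal that a negative binomial likelihood, with its extra shape parameter, may describe the counts better than a Poisson likelihood.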
We include high_debt_version as a predictor in our model as this variable
represents the very effect we want to measure. We also include a varying
intercept for each individual to prevent the model from learning too much
from single participants with extreme measurements.
We iterate over the model until we have sane priors, in this case priors that can reasonably fit our data without being too restrictive.
sonarqube_issues.with <- extendable_model(
base_name = "sonarqube_issues",
base_formula = "sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
base_priors = c(
prior(normal(0, 1), class = "b"),
prior(normal(1.5, 1), class = "Intercept"),
prior(exponential(1), class = "sd"),
prior(gamma(0.01, 0.01), class = "shape")
),
family = negbinomial(),
data = d.both_completed,
base_control = list(adapt_delta = 0.98)
)
prior_summary(sonarqube_issues.with(only_priors= TRUE))
prior_summary(sonarqube_issues.with(sample_prior = "only"))
pp_check(sonarqube_issues.with(sample_prior = "only"), nsamples = 400, type = "bars") + xlim(-1, 15)
We choose a beta prior that allows for large effects (±10 issues) but is skeptical of any effects larger than ±4 issues.
sim.size <- 1000
sim.intercept <- rnorm(sim.size, 1.5, 1)
sim.beta <- rnorm(sim.size, 0, 1)
sim.beta.diff <- exp(sim.intercept + sim.beta) - exp(sim.intercept)
sim.beta.diff.min <- sim.beta.diff
data.frame(x = sim.beta.diff.min) %>%
ggplot(aes(x)) +
geom_density() +
xlim(-15, 15) +
labs(
title = "Beta parameter prior influence",
x = "Issues difference",
y = "Density"
)
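To put rough numbers on this prior, one could summarise how much of the simulated prior mass falls within ±4 and ±10 issues. The sketch below repeats the simulation above in a self-contained form. The seed and the names `share_within_4` and `share_within_10` are assumptions added here for illustration; nothing is refitted.

```r
set.seed(1)  # seed added here for reproducibility; not part of the analysis
sim.size <- 1000
sim.intercept <- rnorm(sim.size, 1.5, 1)  # Intercept prior: normal(1.5, 1)
sim.beta <- rnorm(sim.size, 0, 1)         # beta prior: normal(0, 1)

# Implied difference in expected issue count on the outcome scale (log link).
sim.beta.diff <- exp(sim.intercept + sim.beta) - exp(sim.intercept)

# Share of simulated prior draws with an absolute difference within each bound.
share_within_4  <- mean(abs(sim.beta.diff) <= 4)
share_within_10 <- mean(abs(sim.beta.diff) <= 10)

print(c(within_4 = share_within_4, within_10 = share_within_10))
print(quantile(sim.beta.diff, probs = c(0.025, 0.25, 0.5, 0.75, 0.975)))
```

Comparing the two shares shows how quickly the prior mass thins out beyond moderate effects, which is the numeric counterpart of the density plot.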
We check the posterior distribution and can see that the model seems to have been able to fit the data well. Sampling also seems to have worked well, as the Rhat values are close to 1 and the sampling plots look nice.
pp_check(sonarqube_issues.with(), nsamples = 200, type = "bars") + xlim(-1, 15)
summary(sonarqube_issues.with())
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session)
## Data: as.data.frame(data) (Number of observations: 44)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~session (Number of levels: 22)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.50 0.36 0.02 1.29 1.00 835 2026
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.73 0.33 0.09 1.39 1.00 2154
## high_debt_versionfalse -0.66 0.42 -1.50 0.20 1.00 4438
## Tail_ESS
## Intercept 2255
## high_debt_versionfalse 2810
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.17 2.97 0.30 3.98 1.00 971 772
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(sonarqube_issues.with(), ask = FALSE)
# default prior for monotonic predictor
edlvl_prior <- prior(dirichlet(2), class = "simo", coef = "moeducation_level1")
loo_result <- loo(
# Benchmark model(s)
sonarqube_issues.with(),
# New model(s)
sonarqube_issues.with("modified_lines"),
sonarqube_issues.with("work_domain"),
sonarqube_issues.with("work_experience_programming.s"),
sonarqube_issues.with("work_experience_java.s"),
sonarqube_issues.with("education_field"),
sonarqube_issues.with("mo(education_level)", edlvl_prior),
sonarqube_issues.with("workplace_peer_review"),
sonarqube_issues.with("workplace_td_tracking"),
sonarqube_issues.with("workplace_pair_programming"),
sonarqube_issues.with("workplace_coding_standards"),
sonarqube_issues.with("scenario"),
sonarqube_issues.with("group")
)
loo_result[2]
## $diffs
## elpd_diff se_diff
## sonarqube_issues.with("group") 0.0 0.0
## sonarqube_issues.with("work_domain") -0.9 2.1
## sonarqube_issues.with() -1.3 1.5
## sonarqube_issues.with("workplace_coding_standards") -1.5 1.4
## sonarqube_issues.with("scenario") -1.7 1.8
## sonarqube_issues.with("workplace_pair_programming") -1.7 1.5
## sonarqube_issues.with("mo(education_level)", edlvl_prior) -1.7 1.5
## sonarqube_issues.with("workplace_peer_review") -1.8 1.3
## sonarqube_issues.with("workplace_td_tracking") -2.0 1.5
## sonarqube_issues.with("modified_lines") -2.0 1.5
## sonarqube_issues.with("education_field") -2.1 1.6
## sonarqube_issues.with("work_experience_programming.s") -2.4 1.6
## sonarqube_issues.with("work_experience_java.s") -2.4 1.4
loo_result[1]
## $loos
## $loos$`sonarqube_issues.with()`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.0 7.9
## p_loo 6.9 1.3
## looic 156.1 15.7
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 697
## (0.5, 0.7] (ok) 11 25.0% 327
## (0.7, 1] (bad) 5 11.4% 85
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("modified_lines")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.8 7.8
## p_loo 7.0 1.2
## looic 157.5 15.6
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 782
## (0.5, 0.7] (ok) 11 25.0% 230
## (0.7, 1] (bad) 5 11.4% 187
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("work_domain")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -77.7 7.7
## p_loo 8.7 1.2
## looic 155.3 15.3
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 32 72.7% 961
## (0.5, 0.7] (ok) 9 20.5% 784
## (0.7, 1] (bad) 3 6.8% 100
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("work_experience_programming.s")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -79.1 8.3
## p_loo 8.5 2.0
## looic 158.2 16.5
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 679
## (0.5, 0.7] (ok) 12 27.3% 204
## (0.7, 1] (bad) 4 9.1% 20
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("work_experience_java.s")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -79.1 8.2
## p_loo 8.1 1.7
## looic 158.3 16.4
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 30 68.2% 750
## (0.5, 0.7] (ok) 9 20.5% 1054
## (0.7, 1] (bad) 5 11.4% 37
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("education_field")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.9 7.9
## p_loo 7.6 1.2
## looic 157.7 15.8
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 27 61.4% 580
## (0.5, 0.7] (ok) 15 34.1% 243
## (0.7, 1] (bad) 2 4.5% 732
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("mo(education_level)", edlvl_prior)`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.5 7.8
## p_loo 7.1 1.2
## looic 156.9 15.7
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 799
## (0.5, 0.7] (ok) 11 25.0% 459
## (0.7, 1] (bad) 5 11.4% 75
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("workplace_peer_review")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.6 8.0
## p_loo 7.3 1.4
## looic 157.2 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 703
## (0.5, 0.7] (ok) 12 27.3% 235
## (0.7, 1] (bad) 4 9.1% 34
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("workplace_td_tracking")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.7 8.0
## p_loo 7.5 1.4
## looic 157.5 16.0
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 29 65.9% 590
## (0.5, 0.7] (ok) 10 22.7% 145
## (0.7, 1] (bad) 5 11.4% 53
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("workplace_pair_programming")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.4 7.9
## p_loo 6.9 1.3
## looic 156.8 15.9
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 30 68.2% 305
## (0.5, 0.7] (ok) 12 27.3% 541
## (0.7, 1] (bad) 2 4.5% 98
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("workplace_coding_standards")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.3 8.2
## p_loo 7.9 1.7
## looic 156.5 16.4
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 29 65.9% 575
## (0.5, 0.7] (ok) 11 25.0% 105
## (0.7, 1] (bad) 4 9.1% 59
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("scenario")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -78.4 8.1
## p_loo 8.3 1.6
## looic 156.8 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 27 61.4% 550
## (0.5, 0.7] (ok) 14 31.8% 134
## (0.7, 1] (bad) 3 6.8% 139
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## $loos$`sonarqube_issues.with("group")`
##
## Computed from 4000 by 44 log-likelihood matrix
##
## Estimate SE
## elpd_loo -76.8 7.7
## p_loo 7.7 1.2
## looic 153.5 15.5
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 28 63.6% 763
## (0.5, 0.7] (ok) 13 29.5% 233
## (0.7, 1] (bad) 3 6.8% 186
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
We inspect some of our top performing models.
All models seem to have sampled nicely (Rhat = 1 and fluffy plots). They also have about the same fit to the data and similar estimates for the high_debt_version beta parameter.
We select the simplest model as a baseline.
sonarqube_issues0 <- brm(
"sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(1.5, 1), class = "Intercept"),
prior(exponential(1), class = "sd"),
prior(gamma(0.01, 0.01), class = "shape")
),
family = negbinomial(),
data = d.both_completed,
control = list(adapt_delta = 0.97),
file = "fits/sonarqube_issues0",
file_refit = "on_change",
seed = 20210421
)
summary(sonarqube_issues0)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta
## above 0.97 may help. See http://mc-stan.org/misc/warnings.html#divergent-
## transitions-after-warmup
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session)
## Data: d.both_completed (Number of observations: 44)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~session (Number of levels: 22)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.49 0.35 0.02 1.30 1.01 1007 1838
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.75 0.33 0.11 1.43 1.00 2552
## high_debt_versionfalse -0.66 0.42 -1.47 0.18 1.00 5800
## Tail_ESS
## Intercept 2449
## high_debt_versionfalse 2938
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.15 2.44 0.30 4.21 1.00 1054 712
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(sonarqube_issues0)
## $session
## , , Intercept
##
## Estimate Est.Error Q2.5 Q97.5
## 6033d69a5af2c702367b3a95 -0.009987209 0.4616050 -1.0502194 0.9296742
## 6033d90a5af2c702367b3a96 -0.187905851 0.5266761 -1.4824487 0.7626271
## 6034fc165af2c702367b3a98 0.424770535 0.5508350 -0.3151748 1.7620497
## 603500725af2c702367b3a99 -0.149993543 0.5203650 -1.3902790 0.8116594
## 603f97625af2c702367b3a9d 0.434690481 0.5719282 -0.3166209 1.8396017
## 603fd5d95af2c702367b3a9e -0.033234698 0.4784248 -1.0748305 0.9508240
## 60409b7b5af2c702367b3a9f 0.222350182 0.4773440 -0.5626926 1.3970872
## 604b82b5a7718fbed181b336 0.365265498 0.5319191 -0.3830703 1.7044967
## 6050c1bf856f36729d2e5218 -0.152261556 0.5417575 -1.5339989 0.7790176
## 6050e1e7856f36729d2e5219 0.071377411 0.4491386 -0.8238129 1.0987777
## 6055fdc6856f36729d2e521b 0.129041352 0.4553217 -0.7556882 1.2030735
## 60589862856f36729d2e521f -0.296572945 0.6319051 -1.8940926 0.5919265
## 605afa3a856f36729d2e5222 -0.298424247 0.6363673 -1.9319767 0.6196124
## 605c8bc6856f36729d2e5223 -0.009098036 0.4647879 -1.0061963 0.9950005
## 605f3f2d856f36729d2e5224 0.102293653 0.4646004 -0.7892336 1.2281876
## 605f46c3856f36729d2e5225 -0.297769037 0.6253277 -1.9128928 0.5995682
## 60605337856f36729d2e5226 -0.299624015 0.6465762 -1.9051299 0.6031115
## 60609ae6856f36729d2e5228 -0.082241411 0.4572846 -1.1307307 0.8035126
## 6061ce91856f36729d2e522e -0.152201237 0.5380366 -1.4275426 0.7667674
## 6061f106856f36729d2e5231 -0.301762048 0.6090987 -1.8612133 0.5900917
## 6068ea9f856f36729d2e523e -0.021681111 0.4564977 -0.9852528 0.9455471
## 6075ab05856f36729d2e5247 0.057655955 0.4571002 -0.8745741 1.1040265
plot(sonarqube_issues0, ask = FALSE)
pp_check(sonarqube_issues0, nsamples = 200, type = "bars") + xlim(-1, 15)
We select the best performing model with one variable.
sonarqube_issues1 <- brm(
"sonarqube_issues ~ 1 + high_debt_version + (1 | session) + group",
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(1.5, 1), class = "Intercept"),
prior(exponential(1), class = "sd"),
prior(gamma(0.01, 0.01), class = "shape")
),
family = negbinomial(),
data = d.both_completed,
control = list(adapt_delta = 0.97),
file = "fits/sonarqube_issues1",
file_refit = "on_change",
seed = 20210421
)
summary(sonarqube_issues1)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) + group
## Data: d.both_completed (Number of observations: 44)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~session (Number of levels: 22)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.42 0.31 0.01 1.17 1.00 1330 2047
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.30 0.43 -0.50 1.21 1.00 3986
## high_debt_versionfalse -0.68 0.42 -1.51 0.15 1.00 5157
## groupconsultants 0.76 0.59 -0.43 1.90 1.00 4093
## groupfriends 0.08 0.59 -1.04 1.25 1.00 4374
## groupprofessionalMcontact -0.59 0.87 -2.28 1.07 1.00 5423
## groupstudents 0.84 0.52 -0.23 1.82 1.00 4260
## Tail_ESS
## Intercept 2960
## high_debt_versionfalse 3038
## groupconsultants 3118
## groupfriends 2924
## groupprofessionalMcontact 3008
## groupstudents 2896
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.17 1.22 0.35 3.68 1.00 1733 1744
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(sonarqube_issues1)
## $session
## , , Intercept
##
## Estimate Est.Error Q2.5 Q97.5
## 6033d69a5af2c702367b3a95 -0.08793163 0.4254560 -1.1011332 0.7540889
## 6033d90a5af2c702367b3a96 -0.20079067 0.4833733 -1.4731654 0.5983684
## 6034fc165af2c702367b3a98 0.26693379 0.4688284 -0.4313584 1.4474426
## 603500725af2c702367b3a99 -0.16850737 0.4771840 -1.3597238 0.6485513
## 603f97625af2c702367b3a9d 0.27052649 0.4643893 -0.4172192 1.4048541
## 603fd5d95af2c702367b3a9e -0.08280594 0.4339497 -1.0559824 0.7568240
## 60409b7b5af2c702367b3a9f 0.09936763 0.4179527 -0.6686725 1.1184929
## 604b82b5a7718fbed181b336 0.24805256 0.4805990 -0.4677418 1.4732993
## 6050c1bf856f36729d2e5218 -0.08210783 0.4563209 -1.1825350 0.7920952
## 6050e1e7856f36729d2e5219 0.12870440 0.4448802 -0.6728105 1.2342085
## 6055fdc6856f36729d2e521b 0.17392195 0.4605721 -0.5819869 1.3029758
## 60589862856f36729d2e521f -0.19803871 0.5303025 -1.5868327 0.6171970
## 605afa3a856f36729d2e5222 -0.15516621 0.5380156 -1.5332180 0.7607105
## 605c8bc6856f36729d2e5223 0.05054397 0.4094259 -0.7964717 0.9778253
## 605f3f2d856f36729d2e5224 0.18007677 0.4544009 -0.5891896 1.3008193
## 605f46c3856f36729d2e5225 -0.19480853 0.5256472 -1.5345202 0.6579007
## 60605337856f36729d2e5226 -0.21522552 0.5274579 -1.6080612 0.5832447
## 60609ae6856f36729d2e5228 -0.01764193 0.4307146 -1.0229100 0.8797154
## 6061ce91856f36729d2e522e -0.08517517 0.4716213 -1.2050640 0.8007288
## 6061f106856f36729d2e5231 -0.21215550 0.5317851 -1.5469798 0.6390249
## 6068ea9f856f36729d2e523e -0.07986930 0.4309126 -1.1217416 0.7361495
## 6075ab05856f36729d2e5247 -0.02276170 0.4130337 -0.9302098 0.8766495
plot(sonarqube_issues1, ask = FALSE)
pp_check(sonarqube_issues1, nsamples = 200, type = "bars") + xlim(-1, 15)
All candidate models look nice and none is significantly better than the
others, so we will proceed with the simplest model:
sonarqube_issues0
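One way to make "none is significantly better" concrete is to compare each elpd_diff with its standard error. The sketch below copies the values from the loo comparison output above into a plain data frame and flags a model as clearly worse only if elpd_diff plus two standard errors is still below zero. The two-standard-error threshold is a common rule of thumb and an assumption here, not something the analysis itself states.

```r
# elpd_diff and se_diff copied from the loo comparison output above.
loo_diffs <- data.frame(
  model = c("group", "work_domain", "(baseline)", "workplace_coding_standards",
            "scenario", "workplace_pair_programming", "mo(education_level)",
            "workplace_peer_review", "workplace_td_tracking", "modified_lines",
            "education_field", "work_experience_programming.s",
            "work_experience_java.s"),
  elpd_diff = c(0.0, -0.9, -1.3, -1.5, -1.7, -1.7, -1.7, -1.8, -2.0, -2.0,
                -2.1, -2.4, -2.4),
  se_diff = c(0.0, 2.1, 1.5, 1.4, 1.8, 1.5, 1.5, 1.3, 1.5, 1.5, 1.6, 1.6, 1.4)
)

# Rough interval: elpd_diff +/- 2 * se_diff (rule of thumb, an assumption).
loo_diffs$lower <- loo_diffs$elpd_diff - 2 * loo_diffs$se_diff
loo_diffs$upper <- loo_diffs$elpd_diff + 2 * loo_diffs$se_diff

# A model is flagged only if the whole rough interval lies below zero.
loo_diffs$clearly_worse <- loo_diffs$upper < 0

print(loo_diffs[, c("model", "elpd_diff", "se_diff", "upper", "clearly_worse")])
```

Under this rule no model is flagged, which is in line with choosing the simplest candidate.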
We will try a few different variations of the selected candidate model.
Some participants only completed one scenario. Those have been excluded from the initial dataset to improve sampling of the models. We do, however, want to use all the data we can and will therefore try to fit the model with the complete dataset.
sonarqube_issues0.all <- brm(
"sonarqube_issues ~ 1 + high_debt_version + (1 | session)",
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(1.5, 1), class = "Intercept"),
prior(exponential(1), class = "sd"),
prior(gamma(0.01, 0.01), class = "shape")
),
family = negbinomial(),
data = as.data.frame(d.completed),
control = list(adapt_delta = 0.97),
file = "fits/sonarqube_issues0.all",
file_refit = "on_change",
seed = 20210421
)
summary(sonarqube_issues0.all)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session)
## Data: as.data.frame(d.completed) (Number of observations: 51)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~session (Number of levels: 29)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.47 0.30 0.02 1.15 1.00 778 1734
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.72 0.30 0.15 1.32 1.00 2881
## high_debt_versionfalse -0.75 0.38 -1.48 0.02 1.00 5581
## Tail_ESS
## Intercept 3030
## high_debt_versionfalse 3190
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.61 4.83 0.40 5.50 1.01 1073 908
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(sonarqube_issues0.all)
## $session
## , , Intercept
##
## Estimate Est.Error Q2.5 Q97.5
## 6033c6fc5af2c702367b3a93 0.161886044 0.4528918 -0.6593817 1.2613277
## 6033d69a5af2c702367b3a95 -0.008572596 0.4359737 -0.9191957 0.9595582
## 6033d90a5af2c702367b3a96 -0.176302002 0.4858606 -1.3454738 0.6798654
## 6034fc165af2c702367b3a98 0.456057900 0.5284069 -0.2406157 1.6866567
## 603500725af2c702367b3a99 -0.146520835 0.4930724 -1.3590706 0.7570271
## 603f84f15af2c702367b3a9b 0.010150381 0.4816875 -1.0294584 1.0173712
## 603f97625af2c702367b3a9d 0.478607813 0.5573800 -0.2700666 1.7566582
## 603fd5d95af2c702367b3a9e -0.031706417 0.4318265 -0.9877084 0.8982282
## 60409b7b5af2c702367b3a9f 0.238552029 0.4482937 -0.4838794 1.3117787
## 604b82b5a7718fbed181b336 0.405950848 0.5110235 -0.2914744 1.6398606
## 604f1239a7718fbed181b33f -0.088461366 0.5056780 -1.2431911 0.9170678
## 6050c1bf856f36729d2e5218 -0.149131412 0.4884993 -1.2958397 0.7479989
## 6050e1e7856f36729d2e5219 0.083641977 0.4325663 -0.7733990 1.1073544
## 6055fdc6856f36729d2e521b 0.132246488 0.4442384 -0.6740875 1.1882254
## 60579f2a856f36729d2e521e 0.012344744 0.4829837 -0.9913816 1.0484549
## 60589862856f36729d2e521f -0.294598062 0.5687279 -1.7187600 0.5470783
## 605a30a7856f36729d2e5221 -0.127009728 0.5190265 -1.3564036 0.8741563
## 605afa3a856f36729d2e5222 -0.295572385 0.5840496 -1.7493740 0.5511923
## 605c8bc6856f36729d2e5223 -0.009083656 0.4147736 -0.8888632 0.8930097
## 605f3f2d856f36729d2e5224 0.117285653 0.4363941 -0.6869239 1.1279872
## 605f46c3856f36729d2e5225 -0.280752274 0.5522386 -1.6229684 0.5522585
## 60605337856f36729d2e5226 -0.294207156 0.5729919 -1.7446652 0.5807301
## 60609ae6856f36729d2e5228 -0.078975864 0.4561461 -1.1212379 0.8426180
## 6061ce91856f36729d2e522e -0.154804150 0.4799145 -1.3298704 0.7217109
## 6061f106856f36729d2e5231 -0.299956121 0.5865593 -1.7644569 0.5906793
## 60672faa856f36729d2e523c 0.015862527 0.4773318 -0.9958862 1.0135475
## 6068ea9f856f36729d2e523e -0.003588927 0.4213246 -0.9198978 0.8690974
## 606db69d856f36729d2e5243 -0.137276766 0.5491813 -1.4784213 0.8537878
## 6075ab05856f36729d2e5247 0.049210335 0.4322615 -0.8124429 1.0306560
plot(sonarqube_issues0.all, ask = FALSE)
pp_check(sonarqube_issues0.all, nsamples = 200, type = "bars") + xlim(-1, 15)
As including all data points didn’t harm the model, we will create this variant with all data points as well.
This variation includes the work_experience_programming.s predictor, as it
can give further insight into how experience plays a role in the effect we
try to measure. This is especially important as our sample is skewed
towards less experienced developers than the population at large.
sonarqube_issues0.all.exp <- brm(
"sonarqube_issues ~ 1 + high_debt_version + (1 | session) + work_experience_programming.s",
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(1.5, 1), class = "Intercept"),
prior(exponential(1), class = "sd"),
prior(gamma(0.01, 0.01), class = "shape")
),
family = negbinomial(),
data = as.data.frame(d.completed),
control = list(adapt_delta = 0.99),
file = "fits/sonarqube_issues0.all.exp",
file_refit = "on_change",
seed = 20210421
)
summary(sonarqube_issues0.all.exp)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: sonarqube_issues ~ 1 + high_debt_version + (1 | session) + work_experience_programming.s
## Data: as.data.frame(d.completed) (Number of observations: 51)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~session (Number of levels: 29)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.48 0.32 0.02 1.16 1.01 843 1538
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept 0.73 0.30 0.11 1.33 1.00
## high_debt_versionfalse -0.79 0.38 -1.51 -0.05 1.00
## work_experience_programming.s -0.22 0.24 -0.70 0.24 1.00
## Bulk_ESS Tail_ESS
## Intercept 2191 2425
## high_debt_versionfalse 4989 2928
## work_experience_programming.s 3558 2635
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.38 1.92 0.39 4.70 1.00 1350 1535
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(sonarqube_issues0.all.exp)
## $session
## , , Intercept
##
## Estimate Est.Error Q2.5 Q97.5
## 6033c6fc5af2c702367b3a93 0.1387331993 0.4787681 -0.7355703 1.3114903
## 6033d69a5af2c702367b3a95 -0.0409921257 0.4427061 -1.0308361 0.8693022
## 6033d90a5af2c702367b3a96 -0.2092045772 0.5291557 -1.5061767 0.6850437
## 6034fc165af2c702367b3a98 0.4418368193 0.5376180 -0.2726670 1.7603479
## 603500725af2c702367b3a99 -0.1825905260 0.5007371 -1.4387202 0.6774876
## 603f84f15af2c702367b3a9b -0.0099472862 0.4802221 -1.0274654 1.0327163
## 603f97625af2c702367b3a9d 0.4554365705 0.5386001 -0.2776636 1.7251454
## 603fd5d95af2c702367b3a9e -0.0543882795 0.4754935 -1.1213730 0.9212237
## 60409b7b5af2c702367b3a9f 0.2046696997 0.4604153 -0.5741948 1.3381165
## 604b82b5a7718fbed181b336 0.3873645878 0.5114101 -0.2937607 1.6177832
## 604f1239a7718fbed181b33f -0.0923220453 0.5079860 -1.2587866 0.9414571
## 6050c1bf856f36729d2e5218 -0.1309724158 0.4786292 -1.2250585 0.7836002
## 6050e1e7856f36729d2e5219 0.0702908691 0.4177468 -0.7557189 1.0300036
## 6055fdc6856f36729d2e521b 0.1170809140 0.4342524 -0.7042097 1.1544104
## 60579f2a856f36729d2e521e 0.0004900546 0.4796501 -0.9936349 1.0265199
## 60589862856f36729d2e521f -0.2274524477 0.5706230 -1.6737325 0.6821970
## 605a30a7856f36729d2e5221 -0.1577676275 0.5460883 -1.5122223 0.8412269
## 605afa3a856f36729d2e5222 -0.2559865072 0.5641741 -1.6809735 0.6312147
## 605c8bc6856f36729d2e5223 0.0064255915 0.4424725 -0.9303494 0.9934114
## 605f3f2d856f36729d2e5224 0.3324069958 0.5995096 -0.5812309 1.8010839
## 605f46c3856f36729d2e5225 -0.3141036317 0.5723557 -1.7846489 0.5453950
## 60605337856f36729d2e5226 -0.3165459109 0.6182186 -1.9174237 0.5772180
## 60609ae6856f36729d2e5228 -0.1186270296 0.4551608 -1.1710651 0.7511623
## 6061ce91856f36729d2e522e -0.1810360391 0.4997436 -1.4197739 0.6822217
## 6061f106856f36729d2e5231 -0.3172017244 0.6033642 -1.9077646 0.5541131
## 60672faa856f36729d2e523c 0.0043945182 0.4776146 -1.0353118 1.0087955
## 6068ea9f856f36729d2e523e -0.0088914153 0.4343647 -0.9455057 0.9523177
## 606db69d856f36729d2e5243 -0.1274824128 0.5316038 -1.4126639 0.8420767
## 6075ab05856f36729d2e5247 0.0380257755 0.4178063 -0.8240417 0.9911064
loo(
sonarqube_issues0.all,
sonarqube_issues0.all.exp
)
## Output of model 'sonarqube_issues0.all':
##
## Computed from 4000 by 51 log-likelihood matrix
##
## Estimate SE
## elpd_loo -88.1 8.2
## p_loo 7.9 1.7
## looic 176.2 16.3
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 38 74.5% 802
## (0.5, 0.7] (ok) 10 19.6% 213
## (0.7, 1] (bad) 3 5.9% 81
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
##
## Output of model 'sonarqube_issues0.all.exp':
##
## Computed from 4000 by 51 log-likelihood matrix
##
## Estimate SE
## elpd_loo -89.7 8.9
## p_loo 9.9 2.7
## looic 179.4 17.8
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 32 62.7% 920
## (0.5, 0.7] (ok) 13 25.5% 209
## (0.7, 1] (bad) 5 9.8% 121
## (1, Inf) (very bad) 1 2.0% 10
## See help('pareto-k-diagnostic') for details.
##
## Model comparisons:
## elpd_diff se_diff
## sonarqube_issues0.all 0.0 0.0
## sonarqube_issues0.all.exp -1.6 2.7
plot(sonarqube_issues0.all.exp, ask = FALSE)
pp_check(sonarqube_issues0.all.exp, nsamples = 200, type = "bars") + xlim(-1, 15)
This means that our final model, with all data points and the experience
predictor, is sonarqube_issues0.all.exp.
To begin interpreting the model, we look at how its parameters were estimated. As our research is focused on how the outcome of the model is affected, we will mainly analyze the \(\beta\) parameters.
mcmc_areas(sonarqube_issues0.all.exp, pars = c("b_high_debt_versionfalse", "b_work_experience_programming.s"), prob = 0.95) +
scale_y_discrete(labels=c("High debt version: false", "Professional programming experience")) +
ggtitle("Beta parameters densities in sonarqube issues model", subtitle = "Shaded region marks 95% of the density. Line marks the median")
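Because the model uses a log link (mu = log in the summary output), the \(\beta\) parameters are on the log scale, and exponentiating them gives multiplicative effects on the expected number of issues. The sketch below does this for the estimate and 95% interval of high_debt_versionfalse copied from the summary of sonarqube_issues0.all.exp above. Working from the summary values rather than the posterior draws is a simplification made here, so the result is only approximate.

```r
# Estimate and 95% interval for high_debt_versionfalse, copied from the
# summary of sonarqube_issues0.all.exp above (log scale).
b_low_debt <- c(estimate = -0.79, lower = -1.51, upper = -0.05)

# With a log link, exp(beta) is the multiplicative change in the expected
# issue count when moving from the high debt version (the reference level)
# to the low debt version.
rate_ratio_low_vs_high <- exp(b_low_debt)

# Negating beta gives the high-versus-low ratio; the interval bounds swap.
rate_ratio_high_vs_low <- c(
  estimate = exp(-b_low_debt[["estimate"]]),
  lower    = exp(-b_low_debt[["upper"]]),
  upper    = exp(-b_low_debt[["lower"]])
)

print(round(rate_ratio_low_vs_high, 2))
print(round(rate_ratio_high_vs_low, 2))
```

A ratio below 1 in the first vector (or above 1 in the second) corresponds to fewer expected issues in the low debt version.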
scale_programming_experience <- function(x) {
(x - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
}
unscale_programming_experience <- function(x) {
x * sd(d.completed$work_experience_programming) + mean(d.completed$work_experience_programming)
}
post_settings <- expand.grid(
high_debt_version = c("false", "true"),
session = NA,
work_experience_programming.s = sapply(c(0, 3, 10, 25, 40), scale_programming_experience)
)
post <- posterior_predict(sonarqube_issues0.all.exp, newdata = post_settings) %>%
melt(value.name = "estimate", varnames = c("sample_number", "settings_id")) %>%
left_join(
rowid_to_column(post_settings, var= "settings_id"),
by = "settings_id"
) %>%
mutate(work_experience_programming = unscale_programming_experience(work_experience_programming.s)) %>%
select(
estimate,
high_debt_version,
work_experience_programming
)%>%
mutate(estimate = estimate)
ggplot(post, aes(x=estimate, fill = high_debt_version)) +
geom_bar(position = "dodge2") +
scale_fill_manual(
name = "Debt version",
labels = c("Low debt", "High debt"),
values = c("lightblue", "darkblue")
) +
facet_grid(rows = vars(work_experience_programming)) +
labs(
title = "SonarQube issues introduced / years of programming experience",
subtitle = "Estimated for five different experience levels",
x = "Issues introduced",
y = "Incidence rate"
) +
scale_x_continuous(limits = c(-1, 7), breaks = 0:7, labels = as.character(0:7)) +
scale_y_continuous(limits = NULL, breaks = sapply(c(0.1, 0.3, 0.5), function(x) x*nrow(post) / 10), labels = c("10%","30%","50%")) +
theme(legend.position = "top")
post.diff <- post %>% filter(high_debt_version == "true")
post.diff$estimate = post.diff$estimate - filter(post, high_debt_version == "false")$estimate
ggplot(post.diff, aes(x=estimate)) +
geom_boxplot() +
xlim(-7, 7) +
facet_grid(rows = vars(work_experience_programming)) +
labs(
title = "SonarQube issues introduced difference / years of programming experience",
subtitle = "Difference as: high debt issues - low debt issues",
x = "Issues # difference"
) +
scale_y_continuous(breaks = NULL)
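The difference between posterior predictive draws can also be summarised numerically, for example with a highest density interval. The following is a minimal base R sketch of the shortest-interval approach. The helper `hdi_interval` and the simulated draws are hypothetical stand-ins (the draws are not taken from the fitted model), so the printed numbers say nothing about the study itself.

```r
# Simulated stand-ins for posterior predictive draws (hypothetical values).
set.seed(1)
n_draws <- 4000
issues_high <- rnbinom(n_draws, mu = 2, size = 1.4)
issues_low  <- rnbinom(n_draws, mu = 1, size = 1.4)
issue_diff  <- issues_high - issues_low

# Shortest interval containing at least `prob` of the draws.
hdi_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  window <- ceiling(prob * n)            # number of draws inside the interval
  starts <- seq_len(n - window + 1)      # candidate left end points
  widths <- sorted[starts + window - 1] - sorted[starts]
  best <- which.min(widths)              # first of the narrowest windows
  c(lower = sorted[best], upper = sorted[best + window - 1])
}

interval <- hdi_interval(issue_diff)
inside <- issue_diff >= interval[["lower"]] & issue_diff <= interval[["upper"]]

print(interval)
print(mean(inside))  # share of draws covered by the interval
```

Because the draws are integer counts, ties mean the interval typically covers somewhat more than the nominal 95% of the draws.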
We can then proceed to calculate some ratios of introduced issues between the high and low debt versions:
d <- post
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.156998
Across all the simulated cases, we find that developers introduce 116% more issues in the high debt version.
d <- post %>% filter(work_experience_programming == 10)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.226831
Considering developers with 10 years of professional programming experience, we find that they introduce 123% more issues in the high debt version.
d <- post %>% filter(work_experience_programming == 0)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.289173
Considering developers with no professional programming experience, we find that they introduce 129% more issues in the high debt version.
d <- post %>% filter(work_experience_programming == 25)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.061877
Considering developers with 25 years of professional programming experience, we find that they introduce 106% more issues in the high debt version.
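The percentages quoted above follow directly from the printed ratios: a ratio r of high debt to low debt issue totals corresponds to (r - 1) * 100 percent more issues. A small sketch of that arithmetic, using the ratios copied from the output above (the vector names are labels chosen here for illustration):

```r
# Ratios of simulated issue totals (high debt / low debt), copied from the
# output above; the names are labels added for this sketch.
ratios <- c(all = 2.156998, years_0 = 2.289173,
            years_10 = 2.226831, years_25 = 2.061877)

# A ratio r corresponds to (r - 1) * 100 percent more issues.
percent_more <- round((ratios - 1) * 100)

print(percent_more)
```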