Looking at the data

There appears to be a significant difference in the reuse rate between the high and low debt versions.

Constructor reuse

d.completed %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = reused_logic_constructor), position = "fill") + 
  scale_fill_manual("Legend", values = c("lightblue", "darkblue"), labels = c("Reused", "Duplicated")) +
  labs(title = "Constructor reuse") +
  xlab("Debt version") +
  ylab("Ratio of reuse")

Validation reuse

d.completed %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = reused_logic_validation), position = "fill") + 
  scale_fill_manual("Legend", values = c("lightblue", "darkblue"), labels = c("Reused", "Duplicated")) +
  labs(title = "Validation reuse") +
  xlab("Debt version") +
  ylab("Ratio of reuse")
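The ratios shown in the plots can also be tabulated directly. A minimal sketch using toy data (the column names mirror those in d.completed, but the values are invented for illustration):

```r
library(dplyr)

# Toy data mimicking the relevant columns of d.completed
toy <- data.frame(
  high_debt_version = rep(c("true", "false"), each = 4),
  reused_logic_constructor = c("true", "false", "false", "false",
                               "true", "true", "true", "false")
)

# Proportion of sessions that reused the constructor logic, per debt version
toy %>%
  group_by(high_debt_version) %>%
  summarise(reuse_rate = mean(reused_logic_constructor == "true"))
```

The same summary applied to the real data gives the numbers behind the stacked bars above.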

Initial model

For a boolean outcome, the Bernoulli family is the most suitable choice.

We include high_debt_version as a predictor in our model, as this variable represents the very effect we want to measure. We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.

Since constructor and validation reuse may correlate, both outcomes are included in a single multivariate model.

Selecting priors

We iterated on the model until the priors were sane; in this case, priors giving a 50/50 chance of reuse were chosen for both outcomes. The lkj(2) prior makes the model skeptical of strong correlations between the varying intercepts.
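The 50/50 claim can be checked by pushing the normal(0, 1) intercept prior through the inverse-logit link; this sketch assumes only base R:

```r
set.seed(2021)  # arbitrary seed, for reproducibility only

# Draw intercepts from the prior and map them to probabilities
prior_intercepts <- rnorm(1e5, mean = 0, sd = 1)
prior_probs <- plogis(prior_intercepts)

# Symmetry of the prior around 0 implies a mean reuse probability near 0.5
mean(prior_probs)
```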

Base model with priors

reuse.with <- extendable_model(
  base_name = "reuse",
  base_formula = "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.95)
)

Default priors

prior_summary(reuse.with(only_priors = TRUE))
## Setting 'rescor' to FALSE by default for this model

Selected priors

prior_summary(reuse.with(sample_prior = "only"))
## Setting 'rescor' to FALSE by default for this model

Prior predictive check

Constructor reuse
pp_check(reuse.with(sample_prior = "only"), type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse.with(sample_prior = "only"), type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Beta parameter influence

We chose beta parameter priors that allow the beta parameter to account for 100% of the effect, while remaining skeptical of such strong effects.

sim.size <- 1000
sim.intercept <- rnorm(sim.size, 0, 1)
sim.beta <- rnorm(sim.size, 0, 1)
sim.beta.diff <- (plogis(sim.intercept + sim.beta) / plogis(sim.intercept) * 100) - 100

data.frame(x = sim.beta.diff) %>%
  ggplot(aes(x)) +
  geom_density() +
  xlim(-150, 150) +
  labs(
    title = "Beta parameter prior influence",
    x = "Estimate with beta as % of estimate without beta",
    y = "Density"
  )

Model fit

We check the posterior distribution and can see that the model seems to have fit the data well. Sampling also seems to have worked well: Rhat values are close to 1 and the sampling plots look healthy.
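As a reminder of what Rhat measures, the split-chain statistic can be computed by hand on simulated chains. This is an illustrative sketch, not the exact implementation brms/Stan uses (which additionally rank-normalizes the draws):

```r
split_rhat <- function(draws) {
  # draws: iterations x chains matrix; split each chain into two halves
  n <- nrow(draws) %/% 2
  halves <- cbind(draws[1:n, ], draws[(n + 1):(2 * n), ])
  chain_means <- colMeans(halves)
  chain_vars <- apply(halves, 2, var)
  W <- mean(chain_vars)               # within-chain variance
  B <- n * var(chain_means)           # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
# Four well-mixed chains of independent draws: Rhat should be close to 1
chains <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
split_rhat(chains)
```

Values well above 1 would indicate that the chains disagree and have not converged to the same distribution.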

Posterior predictive check

Constructor reuse
pp_check(reuse.with(), type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse.with(), type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Summary

summary(reuse.with())
## Setting 'rescor' to FALSE by default for this model
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + (1 | c | session) 
##    Data: as.data.frame(data) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.83
## sd(reusedlogicconstructor_Intercept)                                      1.93
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.70
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.84
## sd(reusedlogicconstructor_Intercept)                                       0.86
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.24
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.35
## sd(reusedlogicconstructor_Intercept)                                      0.45
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.08
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.76
## sd(reusedlogicconstructor_Intercept)                                      3.90
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.96
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.01
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.01
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1083
## sd(reusedlogicconstructor_Intercept)                                      1218
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1114
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                        840
## sd(reusedlogicconstructor_Intercept)                                       911
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1080
## 
## Population-Level Effects: 
##                                               Estimate Est.Error l-95% CI
## reusedlogicvalidation_Intercept                  -0.04      0.58    -1.13
## reusedlogicconstructor_Intercept                 -0.27      0.57    -1.38
## reusedlogicvalidation_high_debt_versionfalse     -1.61      0.69    -2.99
## reusedlogicconstructor_high_debt_versionfalse    -1.45      0.67    -2.84
##                                               u-95% CI Rhat Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                   1.17 1.00     2806     2730
## reusedlogicconstructor_Intercept                  0.91 1.00     2577     2828
## reusedlogicvalidation_high_debt_versionfalse     -0.33 1.00     3823     2879
## reusedlogicconstructor_high_debt_versionfalse    -0.18 1.00     4128     2915
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Sampling plots

plot(reuse.with(), ask = FALSE)

Model predictor extensions

# default prior for monotonic predictor
edlvl_prior <- c(
  prior(dirichlet(2), class = "simo", coef = "moeducation_level1", resp = "reusedlogicconstructor"),
  prior(dirichlet(2), class = "simo", coef = "moeducation_level1", resp = "reusedlogicvalidation")
)

We use loo to check some possible extensions of the model.

One variable

loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  
  # New model(s)
  reuse.with("work_domain"),
  reuse.with("work_experience_programming.s"),
  reuse.with("work_experience_java.s"),
  reuse.with("education_field"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_peer_review"),
  reuse.with("workplace_td_tracking"),
  reuse.with("workplace_pair_programming"),
  reuse.with("workplace_coding_standards"),
  reuse.with("scenario"),
  reuse.with("group")
)

Comparison

loo_result[2]
## $diffs
##                                                elpd_diff se_diff
## reuse.with("scenario")                          0.0       0.0   
## reuse.with("workplace_peer_review")            -0.1       0.9   
## reuse.with("mo(education_level)", edlvl_prior) -0.2       1.2   
## reuse.with()                                   -0.2       1.0   
## reuse.with("workplace_coding_standards")       -0.2       0.9   
## reuse.with("education_field")                  -0.3       0.9   
## reuse.with("work_experience_java.s")           -0.4       1.5   
## reuse.with("workplace_td_tracking")            -0.6       1.1   
## reuse.with("work_experience_programming.s")    -0.6       1.5   
## reuse.with("workplace_pair_programming")       -0.7       1.2   
## reuse.with("group")                            -0.9       1.4   
## reuse.with("work_domain")                      -1.2       1.6

Diagnostics

loo_result[1]
## $loos
## $loos$`reuse.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        17.8 2.1
## looic        85.8 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   314       
##  (0.5, 0.7]   (ok)       12    27.3%   504       
##    (0.7, 1]   (bad)       9    20.5%   165       
##    (1, Inf)   (very bad)  1     2.3%   124       
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("work_domain")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.8 4.4
## p_loo        18.1 2.2
## looic        87.7 8.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   233       
##  (0.5, 0.7]   (ok)       15    34.1%   368       
##    (0.7, 1]   (bad)       7    15.9%   139       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("work_experience_programming.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.3 4.3
## p_loo        18.5 2.2
## looic        86.7 8.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   362       
##  (0.5, 0.7]   (ok)       15    34.1%   267       
##    (0.7, 1]   (bad)       5    11.4%   179       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("work_experience_java.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.1 4.3
## p_loo        18.6 2.3
## looic        86.1 8.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    56.8%   466       
##  (0.5, 0.7]   (ok)       15    34.1%   319       
##    (0.7, 1]   (bad)       4     9.1%   89        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("education_field")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.0 4.2
## p_loo        18.3 2.1
## looic        85.9 8.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     20    45.5%   377       
##  (0.5, 0.7]   (ok)       14    31.8%   423       
##    (0.7, 1]   (bad)      10    22.7%   143       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.4
## p_loo        18.8 2.2
## looic        85.7 8.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   466       
##  (0.5, 0.7]   (ok)       19    43.2%   278       
##    (0.7, 1]   (bad)       4     9.1%   75        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.1
## p_loo        18.0 2.0
## looic        85.7 8.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   327       
##  (0.5, 0.7]   (ok)       16    36.4%   385       
##    (0.7, 1]   (bad)       2     4.5%   288       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_td_tracking")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.3 4.2
## p_loo        18.4 2.2
## looic        86.5 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    56.8%   216       
##  (0.5, 0.7]   (ok)       13    29.5%   273       
##    (0.7, 1]   (bad)       6    13.6%   102       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_pair_programming")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.4 4.0
## p_loo        17.0 1.9
## looic        86.8 8.0
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   266       
##  (0.5, 0.7]   (ok)       11    25.0%   229       
##    (0.7, 1]   (bad)       9    20.5%   169       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_coding_standards")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        18.3 2.1
## looic        85.9 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    52.3%   408       
##  (0.5, 0.7]   (ok)       14    31.8%   325       
##    (0.7, 1]   (bad)       7    15.9%   176       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.7 4.1
## p_loo        18.1 2.1
## looic        85.4 8.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   417       
##  (0.5, 0.7]   (ok)       15    34.1%   174       
##    (0.7, 1]   (bad)       5    11.4%   118       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("group")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.6 4.3
## p_loo        18.1 2.1
## looic        87.3 8.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   346       
##  (0.5, 0.7]   (ok)       17    38.6%   260       
##    (0.7, 1]   (bad)       6    13.6%   74        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.

Two variables

loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  reuse.with("scenario"),
  reuse.with("workplace_peer_review"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_coding_standards"),
  
  # New model(s)
  reuse.with(c("scenario", "workplace_peer_review")),
  reuse.with(c("scenario", "mo(education_level)"), edlvl_prior),
  reuse.with(c("scenario", "workplace_coding_standards")),
  
  reuse.with(c("workplace_peer_review", "mo(education_level)"), edlvl_prior),
  reuse.with(c("workplace_peer_review", "workplace_coding_standards")),
  
  reuse.with(c("workplace_coding_standards", "mo(education_level)"), edlvl_prior)
)

Comparison

loo_result[2]
## $diffs
##                                                                                     elpd_diff
## reuse.with("scenario")                                                               0.0     
## reuse.with("workplace_peer_review")                                                 -0.1     
## reuse.with("mo(education_level)", edlvl_prior)                                      -0.2     
## reuse.with(c("scenario", "workplace_coding_standards"))                             -0.2     
## reuse.with()                                                                        -0.2     
## reuse.with("workplace_coding_standards")                                            -0.2     
## reuse.with(c("scenario", "mo(education_level)"), edlvl_prior)                       -0.4     
## reuse.with(c("workplace_coding_standards", "mo(education_level)"),     edlvl_prior) -0.6     
## reuse.with(c("workplace_peer_review", "mo(education_level)"),     edlvl_prior)      -0.9     
## reuse.with(c("workplace_peer_review", "workplace_coding_standards"))                -1.0     
## reuse.with(c("scenario", "workplace_peer_review"))                                  -1.1     
##                                                                                     se_diff
## reuse.with("scenario")                                                               0.0   
## reuse.with("workplace_peer_review")                                                  0.9   
## reuse.with("mo(education_level)", edlvl_prior)                                       1.2   
## reuse.with(c("scenario", "workplace_coding_standards"))                              0.5   
## reuse.with()                                                                         1.0   
## reuse.with("workplace_coding_standards")                                             0.9   
## reuse.with(c("scenario", "mo(education_level)"), edlvl_prior)                        0.7   
## reuse.with(c("workplace_coding_standards", "mo(education_level)"),     edlvl_prior)  1.1   
## reuse.with(c("workplace_peer_review", "mo(education_level)"),     edlvl_prior)       1.3   
## reuse.with(c("workplace_peer_review", "workplace_coding_standards"))                 1.1   
## reuse.with(c("scenario", "workplace_peer_review"))                                   0.6

Diagnostics

loo_result[1]
## $loos
## $loos$`reuse.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        17.8 2.1
## looic        85.8 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   314       
##  (0.5, 0.7]   (ok)       12    27.3%   504       
##    (0.7, 1]   (bad)       9    20.5%   165       
##    (1, Inf)   (very bad)  1     2.3%   124       
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.7 4.1
## p_loo        18.1 2.1
## looic        85.4 8.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   417       
##  (0.5, 0.7]   (ok)       15    34.1%   174       
##    (0.7, 1]   (bad)       5    11.4%   118       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.1
## p_loo        18.0 2.0
## looic        85.7 8.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   327       
##  (0.5, 0.7]   (ok)       16    36.4%   385       
##    (0.7, 1]   (bad)       2     4.5%   288       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.4
## p_loo        18.8 2.2
## looic        85.7 8.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   466       
##  (0.5, 0.7]   (ok)       19    43.2%   278       
##    (0.7, 1]   (bad)       4     9.1%   75        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_coding_standards")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        18.3 2.1
## looic        85.9 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    52.3%   408       
##  (0.5, 0.7]   (ok)       14    31.8%   325       
##    (0.7, 1]   (bad)       7    15.9%   176       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("scenario", "workplace_peer_review"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.8 4.3
## p_loo        19.5 2.2
## looic        87.7 8.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     19    43.2%   904       
##  (0.5, 0.7]   (ok)       18    40.9%   236       
##    (0.7, 1]   (bad)       7    15.9%   78        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("scenario", "mo(education_level)"), edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.1 4.5
## p_loo        19.5 2.3
## looic        86.1 8.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   461       
##  (0.5, 0.7]   (ok)       10    22.7%   268       
##    (0.7, 1]   (bad)      10    22.7%   166       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("scenario", "workplace_coding_standards"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.3
## p_loo        18.7 2.1
## looic        85.8 8.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   1043      
##  (0.5, 0.7]   (ok)       17    38.6%   311       
##    (0.7, 1]   (bad)       6    13.6%   193       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("workplace_peer_review", "mo(education_level)"),     edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.6 4.6
## p_loo        19.9 2.4
## looic        87.1 9.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    52.3%   1216      
##  (0.5, 0.7]   (ok)       12    27.3%   232       
##    (0.7, 1]   (bad)       9    20.5%   40        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("workplace_peer_review", "workplace_coding_standards"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.7 4.4
## p_loo        19.2 2.3
## looic        87.3 8.8
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   397       
##  (0.5, 0.7]   (ok)       16    36.4%   206       
##    (0.7, 1]   (bad)       5    11.4%   96        
##    (1, Inf)   (very bad)  1     2.3%   45        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("workplace_coding_standards", "mo(education_level)"),     edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.3 4.4
## p_loo        19.5 2.3
## looic        86.5 8.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     17    38.6%   521       
##  (0.5, 0.7]   (ok)       17    38.6%   245       
##    (0.7, 1]   (bad)      10    22.7%   148       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.

Three variables

loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  reuse.with("scenario"),
  reuse.with("workplace_peer_review"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_coding_standards"),
  reuse.with(c("scenario", "workplace_coding_standards")),
  
  # New model(s)
  reuse.with(c("scenario", "workplace_coding_standards", "mo(education_level)"), edlvl_prior)
)

Comparison

loo_result[2]
## $diffs
##                                                                                                 elpd_diff
## reuse.with("scenario")                                                                           0.0     
## reuse.with("workplace_peer_review")                                                             -0.1     
## reuse.with("mo(education_level)", edlvl_prior)                                                  -0.2     
## reuse.with(c("scenario", "workplace_coding_standards"))                                         -0.2     
## reuse.with()                                                                                    -0.2     
## reuse.with("workplace_coding_standards")                                                        -0.2     
## reuse.with(c("scenario", "workplace_coding_standards", "mo(education_level)"),     edlvl_prior) -0.5     
##                                                                                                 se_diff
## reuse.with("scenario")                                                                           0.0   
## reuse.with("workplace_peer_review")                                                              0.9   
## reuse.with("mo(education_level)", edlvl_prior)                                                   1.2   
## reuse.with(c("scenario", "workplace_coding_standards"))                                          0.5   
## reuse.with()                                                                                     1.0   
## reuse.with("workplace_coding_standards")                                                         0.9   
## reuse.with(c("scenario", "workplace_coding_standards", "mo(education_level)"),     edlvl_prior)  0.8

Diagnostics

loo_result[1]
## $loos
## $loos$`reuse.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        17.8 2.1
## looic        85.8 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   314       
##  (0.5, 0.7]   (ok)       12    27.3%   504       
##    (0.7, 1]   (bad)       9    20.5%   165       
##    (1, Inf)   (very bad)  1     2.3%   124       
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.7 4.1
## p_loo        18.1 2.1
## looic        85.4 8.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   417       
##  (0.5, 0.7]   (ok)       15    34.1%   174       
##    (0.7, 1]   (bad)       5    11.4%   118       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.1
## p_loo        18.0 2.0
## looic        85.7 8.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   327       
##  (0.5, 0.7]   (ok)       16    36.4%   385       
##    (0.7, 1]   (bad)       2     4.5%   288       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.8 4.4
## p_loo        18.8 2.2
## looic        85.7 8.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   466       
##  (0.5, 0.7]   (ok)       19    43.2%   278       
##    (0.7, 1]   (bad)       4     9.1%   75        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with("workplace_coding_standards")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.2
## p_loo        18.3 2.1
## looic        85.9 8.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    52.3%   408       
##  (0.5, 0.7]   (ok)       14    31.8%   325       
##    (0.7, 1]   (bad)       7    15.9%   176       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("scenario", "workplace_coding_standards"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -42.9 4.3
## p_loo        18.7 2.1
## looic        85.8 8.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     21    47.7%   1043      
##  (0.5, 0.7]   (ok)       17    38.6%   311       
##    (0.7, 1]   (bad)       6    13.6%   193       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`reuse.with(c("scenario", "workplace_coding_standards", "mo(education_level)"),     edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -43.2 4.5
## p_loo        20.1 2.4
## looic        86.3 9.0
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     17    38.6%   460       
##  (0.5, 0.7]   (ok)       18    40.9%   268       
##    (0.7, 1]   (bad)       9    20.5%   103       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.

Candidate models

We pick some of our top-performing models as candidates and inspect them more closely.

The candidate models are named and listed in order of complexity.

Reuse0

reuse0 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(reuse0)
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + (1 | c | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.83
## sd(reusedlogicconstructor_Intercept)                                      1.93
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.70
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.84
## sd(reusedlogicconstructor_Intercept)                                       0.86
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.24
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.35
## sd(reusedlogicconstructor_Intercept)                                      0.45
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.08
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.76
## sd(reusedlogicconstructor_Intercept)                                      3.90
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.96
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.01
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.01
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1083
## sd(reusedlogicconstructor_Intercept)                                      1218
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1114
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                        840
## sd(reusedlogicconstructor_Intercept)                                       911
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1080
## 
## Population-Level Effects: 
##                                               Estimate Est.Error l-95% CI
## reusedlogicvalidation_Intercept                  -0.04      0.58    -1.13
## reusedlogicconstructor_Intercept                 -0.27      0.57    -1.38
## reusedlogicvalidation_high_debt_versionfalse     -1.61      0.69    -2.99
## reusedlogicconstructor_high_debt_versionfalse    -1.45      0.67    -2.84
##                                               u-95% CI Rhat Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                   1.17 1.00     2806     2730
## reusedlogicconstructor_Intercept                  0.91 1.00     2577     2828
## reusedlogicvalidation_high_debt_versionfalse     -0.33 1.00     3823     2879
## reusedlogicconstructor_high_debt_versionfalse    -0.18 1.00     4128     2915
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(reuse0)
## $session
## , , reusedlogicvalidation_Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.64448922  1.679742 -5.65318850 0.8033159
## 6033d90a5af2c702367b3a96 -1.62690433  1.684030 -5.62943675 0.9040157
## 6034fc165af2c702367b3a98 -0.76279173  1.369683 -4.00549900 1.4654105
## 603500725af2c702367b3a99  2.51846792  1.769619 -0.10701285 6.6478783
## 603f97625af2c702367b3a9d  2.49516741  1.773428 -0.11299845 6.6805038
## 603fd5d95af2c702367b3a9e -1.65402718  1.721964 -5.84767875 0.8621369
## 60409b7b5af2c702367b3a9f  2.52822202  1.780113 -0.08867287 6.6143128
## 604b82b5a7718fbed181b336 -0.06599373  1.188023 -2.49329925 2.3903343
## 6050c1bf856f36729d2e5218  0.54433520  1.163719 -1.72581600 3.0152225
## 6050e1e7856f36729d2e5219 -1.57684245  1.643184 -5.41257000 0.8466732
## 6055fdc6856f36729d2e521b -0.05666709  1.187775 -2.52282150 2.3371613
## 60589862856f36729d2e521f  0.55057906  1.157042 -1.75923000 3.0030273
## 605afa3a856f36729d2e5222  0.54613712  1.153260 -1.68495875 2.9554012
## 605c8bc6856f36729d2e5223  0.53589562  1.149106 -1.61921350 2.8826883
## 605f3f2d856f36729d2e5224  0.54857289  1.107645 -1.54963100 2.8087348
## 605f46c3856f36729d2e5225 -1.62919069  1.665907 -5.61894275 0.8403877
## 60605337856f36729d2e5226 -1.62417926  1.700964 -5.74833775 0.8687494
## 60609ae6856f36729d2e5228  0.54418186  1.125913 -1.60584325 2.8864072
## 6061ce91856f36729d2e522e -1.63494073  1.666962 -5.67413275 0.7876265
## 6061f106856f36729d2e5231 -1.60880513  1.662427 -5.56378300 0.8064458
## 6068ea9f856f36729d2e523e -1.61402626  1.681033 -5.61390150 0.8417808
## 6075ab05856f36729d2e5247 -1.60730732  1.628466 -5.48658950 0.8400181
## 
## , , reusedlogicconstructor_Intercept
## 
##                              Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.711648279  1.756048 -5.93215200 0.8517546
## 6033d90a5af2c702367b3a96 -1.737330752  1.781254 -6.17996175 0.8482377
## 6034fc165af2c702367b3a98 -0.005344317  1.232504 -2.47904525 2.5154483
## 603500725af2c702367b3a99  2.691630801  1.812021 -0.01676437 6.9697668
## 603f97625af2c702367b3a9d  2.679688515  1.815994 -0.05252160 6.8975563
## 603fd5d95af2c702367b3a9e -1.684212161  1.724919 -5.92130450 0.8809865
## 60409b7b5af2c702367b3a9f  2.707746357  1.883769 -0.03635867 6.9711278
## 604b82b5a7718fbed181b336 -0.804791874  1.425887 -4.05708950 1.5209530
## 6050c1bf856f36729d2e5218  0.589916480  1.223256 -1.80534450 3.2209473
## 6050e1e7856f36729d2e5219 -1.633898531  1.722661 -5.56118700 0.9225588
## 6055fdc6856f36729d2e521b -0.808681736  1.435134 -4.24051725 1.5489745
## 60589862856f36729d2e521f  0.618890238  1.218197 -1.76410250 3.1219453
## 605afa3a856f36729d2e5222  0.601106509  1.146102 -1.65621200 2.9979560
## 605c8bc6856f36729d2e5223  0.626501807  1.187399 -1.59110625 3.1200883
## 605f3f2d856f36729d2e5224  0.617193545  1.162436 -1.65937700 3.1116105
## 605f46c3856f36729d2e5225 -1.697137433  1.735656 -5.98194350 0.8534365
## 60605337856f36729d2e5226 -1.685561549  1.770842 -5.94581475 0.8886053
## 60609ae6856f36729d2e5228  0.598373442  1.190561 -1.74908675 3.0263265
## 6061ce91856f36729d2e522e -1.686388960  1.737622 -5.97792575 0.8796309
## 6061f106856f36729d2e5231 -1.687842757  1.786531 -6.01350625 0.9543536
## 6068ea9f856f36729d2e523e -1.700962104  1.731385 -5.97584575 0.9611543
## 6075ab05856f36729d2e5247 -1.660715559  1.724302 -5.85075025 0.8891074

Sampling plots

plot(reuse0, ask = FALSE)

Posterior predictive check

Constructor reuse
pp_check(reuse0, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse0, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Reuse1

reuse1 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + scenario + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse1",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(reuse1)
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + scenario + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + scenario + (1 | c | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.89
## sd(reusedlogicconstructor_Intercept)                                      2.01
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.69
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.86
## sd(reusedlogicconstructor_Intercept)                                       0.89
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.25
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.37
## sd(reusedlogicconstructor_Intercept)                                      0.46
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.04
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.84
## sd(reusedlogicconstructor_Intercept)                                      4.03
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.97
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.00
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.00
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1141
## sd(reusedlogicconstructor_Intercept)                                      1422
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1248
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                        658
## sd(reusedlogicconstructor_Intercept)                                       917
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1066
## 
## Population-Level Effects: 
##                                               Estimate Est.Error l-95% CI
## reusedlogicvalidation_Intercept                  -0.26      0.67    -1.56
## reusedlogicconstructor_Intercept                 -0.39      0.68    -1.74
## reusedlogicvalidation_high_debt_versionfalse     -1.65      0.68    -3.03
## reusedlogicvalidation_scenariotickets             0.44      0.66    -0.84
## reusedlogicconstructor_high_debt_versionfalse    -1.48      0.69    -2.84
## reusedlogicconstructor_scenariotickets            0.26      0.66    -1.04
##                                               u-95% CI Rhat Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                   1.11 1.00     3706     2811
## reusedlogicconstructor_Intercept                  0.97 1.00     3566     3134
## reusedlogicvalidation_high_debt_versionfalse     -0.36 1.00     4183     2777
## reusedlogicvalidation_scenariotickets             1.73 1.00     6202     2864
## reusedlogicconstructor_high_debt_versionfalse    -0.16 1.00     4877     2512
## reusedlogicconstructor_scenariotickets            1.57 1.00     5575     2896
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(reuse1)
## $session
## , , reusedlogicvalidation_Intercept
## 
##                              Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.695162823  1.676853 -5.62486825 0.8128174
## 6033d90a5af2c702367b3a96 -1.660967495  1.705909 -5.57747550 0.8185200
## 6034fc165af2c702367b3a98 -0.795359147  1.410638 -4.29177825 1.4300918
## 603500725af2c702367b3a99  2.657000081  1.830089 -0.01881306 7.0778410
## 603f97625af2c702367b3a9d  2.580973605  1.796790 -0.04921098 6.8277490
## 603fd5d95af2c702367b3a9e -1.734124510  1.738483 -6.01396500 0.8962419
## 60409b7b5af2c702367b3a9f  2.557759763  1.800641 -0.07594192 6.8860085
## 604b82b5a7718fbed181b336 -0.008325812  1.190826 -2.45745000 2.3916968
## 6050c1bf856f36729d2e5218  0.541533469  1.217123 -1.82525650 3.1873908
## 6050e1e7856f36729d2e5219 -1.666927946  1.694376 -5.75879000 0.8912427
## 6055fdc6856f36729d2e521b -0.072357751  1.231184 -2.58667350 2.5048593
## 60589862856f36729d2e521f  0.550181295  1.210881 -1.66271700 3.1372255
## 605afa3a856f36729d2e5222  0.566024661  1.151619 -1.60985425 3.0708253
## 605c8bc6856f36729d2e5223  0.570066575  1.130430 -1.62389375 2.9595718
## 605f3f2d856f36729d2e5224  0.539482972  1.198337 -1.84284925 3.0192855
## 605f46c3856f36729d2e5225 -1.618076218  1.643822 -5.50430625 0.8056822
## 60605337856f36729d2e5226 -1.732632313  1.799992 -6.09595575 0.8780582
## 60609ae6856f36729d2e5228  0.580551356  1.144888 -1.68345400 2.9726528
## 6061ce91856f36729d2e522e -1.651310798  1.708408 -5.87184950 0.8055595
## 6061f106856f36729d2e5231 -1.705539717  1.656698 -5.66440425 0.7433505
## 6068ea9f856f36729d2e523e -1.629824192  1.677229 -5.34906775 0.8807062
## 6075ab05856f36729d2e5247 -1.684848206  1.723714 -5.79982325 0.8651169
## 
## , , reusedlogicconstructor_Intercept
## 
##                             Estimate Est.Error         Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.82498455  1.849317 -6.237795000 0.8784421
## 6033d90a5af2c702367b3a96 -1.74701832  1.779300 -6.022685500 1.0182968
## 6034fc165af2c702367b3a98  0.04342082  1.242152 -2.354263500 2.6352758
## 603500725af2c702367b3a99  2.81061937  1.844072 -0.007717327 7.1762820
## 603f97625af2c702367b3a9d  2.78983275  1.924277 -0.074214957 7.3334005
## 603fd5d95af2c702367b3a9e -1.82439095  1.812279 -6.089966250 0.8871853
## 60409b7b5af2c702367b3a9f  2.76293483  1.891858 -0.021062767 7.0474240
## 604b82b5a7718fbed181b336 -0.85326719  1.452154 -4.315698250 1.4447720
## 6050c1bf856f36729d2e5218  0.60085146  1.219057 -1.817923250 3.1186740
## 6050e1e7856f36729d2e5219 -1.75462849  1.804687 -6.156250000 0.8432025
## 6055fdc6856f36729d2e521b -0.90331572  1.509559 -4.407148500 1.5385650
## 60589862856f36729d2e521f  0.63263833  1.197972 -1.651103750 3.1786760
## 605afa3a856f36729d2e5222  0.64369916  1.198076 -1.609403250 3.1688813
## 605c8bc6856f36729d2e5223  0.62713389  1.225826 -1.741967000 3.0997880
## 605f3f2d856f36729d2e5224  0.62512667  1.207206 -1.726381250 3.1226045
## 605f46c3856f36729d2e5225 -1.72376876  1.749191 -5.926578500 0.8558221
## 60605337856f36729d2e5226 -1.79606889  1.787076 -6.171988250 0.9625063
## 60609ae6856f36729d2e5228  0.61653287  1.198071 -1.616998250 3.0929363
## 6061ce91856f36729d2e522e -1.74622810  1.791889 -6.141287500 0.8451217
## 6061f106856f36729d2e5231 -1.78647574  1.770122 -5.897125000 0.8413062
## 6068ea9f856f36729d2e523e -1.76607406  1.788823 -6.162233750 0.8254057
## 6075ab05856f36729d2e5247 -1.75660207  1.821276 -5.879109000 0.8899523

Sampling plots

plot(reuse1, ask = FALSE)

Posterior predictive check

Constructor reuse
pp_check(reuse1, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse1, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Reuse2

reuse2 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + scenario + workplace_coding_standards + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse2",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(reuse2)
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + scenario + workplace_coding_standards + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + scenario + workplace_coding_standards + (1 | c | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.92
## sd(reusedlogicconstructor_Intercept)                                      2.07
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.71
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.82
## sd(reusedlogicconstructor_Intercept)                                       0.86
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.22
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.54
## sd(reusedlogicconstructor_Intercept)                                      0.64
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.14
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.76
## sd(reusedlogicconstructor_Intercept)                                      4.01
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.97
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.00
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.00
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1317
## sd(reusedlogicconstructor_Intercept)                                      1589
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1744
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                       1084
## sd(reusedlogicconstructor_Intercept)                                      1969
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1597
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## reusedlogicvalidation_Intercept                           -0.14      0.75
## reusedlogicconstructor_Intercept                          -0.36      0.79
## reusedlogicvalidation_high_debt_versionfalse              -1.64      0.68
## reusedlogicvalidation_scenariotickets                      0.46      0.64
## reusedlogicvalidation_workplace_coding_standardsfalse     -0.28      0.74
## reusedlogicconstructor_high_debt_versionfalse             -1.50      0.70
## reusedlogicconstructor_scenariotickets                     0.25      0.67
## reusedlogicconstructor_workplace_coding_standardsfalse    -0.11      0.76
##                                                        l-95% CI u-95% CI Rhat
## reusedlogicvalidation_Intercept                           -1.61     1.34 1.00
## reusedlogicconstructor_Intercept                          -1.96     1.19 1.00
## reusedlogicvalidation_high_debt_versionfalse              -3.01    -0.33 1.00
## reusedlogicvalidation_scenariotickets                     -0.76     1.71 1.00
## reusedlogicvalidation_workplace_coding_standardsfalse     -1.70     1.18 1.00
## reusedlogicconstructor_high_debt_versionfalse             -2.95    -0.18 1.00
## reusedlogicconstructor_scenariotickets                    -1.08     1.60 1.00
## reusedlogicconstructor_workplace_coding_standardsfalse    -1.61     1.39 1.00
##                                                        Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                            4560     3215
## reusedlogicconstructor_Intercept                           4310     3061
## reusedlogicvalidation_high_debt_versionfalse               5165     3201
## reusedlogicvalidation_scenariotickets                      6273     3149
## reusedlogicvalidation_workplace_coding_standardsfalse      4515     3033
## reusedlogicconstructor_high_debt_versionfalse              5475     2960
## reusedlogicconstructor_scenariotickets                     4797     2343
## reusedlogicconstructor_workplace_coding_standardsfalse     4288     2963
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(reuse2)
## $session
## , , reusedlogicvalidation_Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.72724019  1.729424 -5.76663225 0.8824620
## 6033d90a5af2c702367b3a96 -1.71009588  1.708723 -5.79641525 0.9031177
## 6034fc165af2c702367b3a98 -0.74414633  1.434611 -4.25154900 1.6368215
## 603500725af2c702367b3a99  2.72107734  1.734281  0.02164775 6.6522770
## 603f97625af2c702367b3a9d  2.62006783  1.852629 -0.06818948 7.0577815
## 603fd5d95af2c702367b3a9e -1.74757375  1.734656 -5.82139625 0.7781280
## 60409b7b5af2c702367b3a9f  2.72509629  1.815604  0.03953357 7.1162100
## 604b82b5a7718fbed181b336  0.04097657  1.251696 -2.48596700 2.5700880
## 6050c1bf856f36729d2e5218  0.51494732  1.205913 -1.83810450 3.0532292
## 6050e1e7856f36729d2e5219 -1.65802503  1.726420 -5.73621550 0.9130677
## 6055fdc6856f36729d2e521b -0.17569988  1.263859 -2.76861675 2.2771918
## 60589862856f36729d2e521f  0.50942404  1.177708 -1.75178900 2.9368663
## 605afa3a856f36729d2e5222  0.51982453  1.181801 -1.74835550 2.9743145
## 605c8bc6856f36729d2e5223  0.50061688  1.164279 -1.72087650 2.9150390
## 605f3f2d856f36729d2e5224  0.50876936  1.206301 -1.80681200 3.0395723
## 605f46c3856f36729d2e5225 -1.73787705  1.676404 -5.80324625 0.7288653
## 60605337856f36729d2e5226 -1.80345345  1.714569 -5.83591775 0.7222267
## 60609ae6856f36729d2e5228  0.67748147  1.169618 -1.54658200 3.0790463
## 6061ce91856f36729d2e522e -1.70074400  1.665290 -5.77741925 0.8031437
## 6061f106856f36729d2e5231 -1.75997584  1.734934 -5.85627250 0.8133482
## 6068ea9f856f36729d2e523e -1.63109295  1.734193 -5.84098250 0.9712994
## 6075ab05856f36729d2e5247 -1.64509634  1.732791 -5.95051925 0.9490631
## 
## , , reusedlogicconstructor_Intercept
## 
##                             Estimate Est.Error         Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -1.80952856  1.850799 -6.438907750 0.9194913
## 6033d90a5af2c702367b3a96 -1.83906327  1.836806 -6.294757750 0.8568513
## 6034fc165af2c702367b3a98  0.09418828  1.331059 -2.590392000 2.6774448
## 603500725af2c702367b3a99  2.94507590  1.893184  0.024102138 7.4642418
## 603f97625af2c702367b3a9d  2.83295501  1.917229  0.003376609 7.2194823
## 603fd5d95af2c702367b3a9e -1.82554381  1.778430 -6.245592250 0.8921717
## 60409b7b5af2c702367b3a9f  2.94358988  1.900508  0.039903063 7.5241995
## 604b82b5a7718fbed181b336 -0.82592433  1.528005 -4.441380000 1.6969093
## 6050c1bf856f36729d2e5218  0.60768382  1.257928 -1.787242750 3.1277430
## 6050e1e7856f36729d2e5219 -1.74898559  1.805799 -6.153328000 0.9559280
## 6055fdc6856f36729d2e521b -0.99352040  1.592587 -4.729589750 1.6322363
## 60589862856f36729d2e521f  0.60321616  1.283964 -1.906011250 3.2212508
## 605afa3a856f36729d2e5222  0.60466370  1.229093 -1.754774000 3.1802890
## 605c8bc6856f36729d2e5223  0.59818866  1.248713 -1.731229000 3.2203953
## 605f3f2d856f36729d2e5224  0.59694744  1.237238 -1.860229250 3.1646430
## 605f46c3856f36729d2e5225 -1.84161597  1.777057 -6.068717250 0.8589194
## 60605337856f36729d2e5226 -1.90844404  1.867845 -6.511749750 0.8394590
## 60609ae6856f36729d2e5228  0.73412792  1.236296 -1.712285500 3.1899440
## 6061ce91856f36729d2e522e -1.83475328  1.818216 -6.350759250 0.8747636
## 6061f106856f36729d2e5231 -1.81988540  1.785118 -6.051485750 0.9739698
## 6068ea9f856f36729d2e523e -1.77632682  1.793617 -6.196038750 0.8985561
## 6075ab05856f36729d2e5247 -1.78492474  1.840006 -6.169046750 0.9942407

Sampling plots

plot(reuse2, ask = FALSE)

Posterior predictive check

Constructor reuse
pp_check(reuse2, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse2, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Final model

All candidate models look reasonable; since none is significantly better than the others, we will proceed with the simplest model: reuse0
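As a rough sketch (not part of the original analysis), the claim that no candidate is clearly better could be checked with brms' LOO comparison; the `add_criterion` step and the two-standard-error rule of thumb below are assumptions, not taken from the source.

```r
library(brms)

# Attach LOO criteria to the candidate fits (reuse0..reuse2 are defined above).
reuse0 <- add_criterion(reuse0, "loo")
reuse1 <- add_criterion(reuse1, "loo")
reuse2 <- add_criterion(reuse2, "loo")

# elpd differences within roughly two standard errors of zero suggest
# that no candidate clearly outperforms the others.
loo_compare(reuse0, reuse1, reuse2)
```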

Variations

We will try a few different variations of the selected candidate model.

All data points

Some participants completed only one scenario. They were excluded from the initial dataset to improve the sampling of the models. However, we want to use all the data we can, and will therefore try to fit the model with the complete dataset.
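As an illustrative sketch of the relationship between the two datasets (the column names `session` and `scenario` are assumptions), the restricted dataset could be derived from the complete one by keeping only sessions that completed both scenarios:

```r
library(dplyr)

# Keep only sessions (participants) with both scenarios completed;
# sessions with a single completed scenario remain only in d.completed.
d.both_completed <- d.completed %>%
  group_by(session) %>%
  filter(n_distinct(scenario) == 2) %>%
  ungroup()
```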

reuse0.all <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0.all",
  file_refit = "on_change",
  seed = 20210421
)
Summary
summary(reuse0.all)
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + (1 | c | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.82
## sd(reusedlogicconstructor_Intercept)                                      1.92
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.73
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.79
## sd(reusedlogicconstructor_Intercept)                                       0.83
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.22
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.45
## sd(reusedlogicconstructor_Intercept)                                      0.47
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.17
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.59
## sd(reusedlogicconstructor_Intercept)                                      3.80
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.97
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.00
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.01
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1301
## sd(reusedlogicconstructor_Intercept)                                      1503
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1254
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                       1200
## sd(reusedlogicconstructor_Intercept)                                       905
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1218
## 
## Population-Level Effects: 
##                                               Estimate Est.Error l-95% CI
## reusedlogicvalidation_Intercept                  -0.16      0.54    -1.19
## reusedlogicconstructor_Intercept                 -0.36      0.58    -1.53
## reusedlogicvalidation_high_debt_versionfalse     -1.75      0.66    -3.09
## reusedlogicconstructor_high_debt_versionfalse    -1.62      0.65    -2.95
##                                               u-95% CI Rhat Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                   0.90 1.00     2888     2622
## reusedlogicconstructor_Intercept                  0.77 1.00     3128     2744
## reusedlogicvalidation_high_debt_versionfalse     -0.54 1.00     3400     2098
## reusedlogicconstructor_high_debt_versionfalse    -0.39 1.00     4202     3033
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Random effects
ranef(reuse0.all)
## $session
## , , reusedlogicvalidation_Intercept
## 
##                             Estimate Est.Error         Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  1.52433132  1.613569 -0.969815500 5.1783700
## 6033d69a5af2c702367b3a95 -1.57931215  1.663673 -5.810711250 0.9008065
## 6033d90a5af2c702367b3a96 -1.59139089  1.640972 -5.544601500 0.9003481
## 6034fc165af2c702367b3a98 -0.62410491  1.307430 -3.581760750 1.5665022
## 603500725af2c702367b3a99  2.61102205  1.662121  0.030484688 6.5857627
## 603f84f15af2c702367b3a9b -0.70710294  1.668152 -4.643539500 2.0840563
## 603f97625af2c702367b3a9d  2.68207342  1.720716 -0.005952295 6.6985173
## 603fd5d95af2c702367b3a9e -1.53515976  1.646294 -5.649376250 0.8672848
## 60409b7b5af2c702367b3a9f  2.64053426  1.720147  0.002959409 6.5898188
## 604b82b5a7718fbed181b336  0.03315654  1.192443 -2.364176750 2.5186843
## 604f1239a7718fbed181b33f -1.38184365  1.637852 -5.258658250 1.3136058
## 6050c1bf856f36729d2e5218  0.65427781  1.153967 -1.554867750 3.0870665
## 6050e1e7856f36729d2e5219 -1.57173866  1.662905 -5.629063000 0.9188571
## 6055fdc6856f36729d2e521b  0.02806918  1.163098 -2.375179250 2.4169628
## 60579f2a856f36729d2e521e -0.79264961  1.679038 -4.743765500 2.0493635
## 60589862856f36729d2e521f  0.66974787  1.129378 -1.433695000 3.0534772
## 605a30a7856f36729d2e5221 -0.80703678  1.641389 -4.638936750 2.0203173
## 605afa3a856f36729d2e5222  0.66357750  1.122612 -1.413732250 3.0645868
## 605c8bc6856f36729d2e5223  0.66305916  1.145585 -1.605166250 3.0536628
## 605f3f2d856f36729d2e5224  0.66566948  1.157849 -1.554673750 3.1535625
## 605f46c3856f36729d2e5225 -1.55358132  1.603079 -5.381195000 0.8799558
## 60605337856f36729d2e5226 -1.54732232  1.645383 -5.440211250 0.9519867
## 60609ae6856f36729d2e5228  0.70043553  1.144582 -1.402780750 3.1083285
## 6061ce91856f36729d2e522e -1.53685377  1.627986 -5.490529250 0.9917854
## 6061f106856f36729d2e5231 -1.55189487  1.605967 -5.378386750 0.8988301
## 60672faa856f36729d2e523c -0.74043191  1.656474 -4.704838000 2.1571240
## 6068ea9f856f36729d2e523e -1.57390343  1.673930 -5.748856500 0.9385252
## 606db69d856f36729d2e5243 -0.77615544  1.645051 -4.531027000 2.0059235
## 6075ab05856f36729d2e5247 -1.54696144  1.638185 -5.581241750 0.8568468
## 
## , , reusedlogicconstructor_Intercept
## 
##                            Estimate Est.Error         Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  1.6513637  1.705281 -0.980449675 5.5500160
## 6033d69a5af2c702367b3a95 -1.6029530  1.696962 -5.689778500 0.9606220
## 6033d90a5af2c702367b3a96 -1.6341323  1.736069 -5.883496250 0.9600657
## 6034fc165af2c702367b3a98  0.1029461  1.285825 -2.428427000 2.7107973
## 603500725af2c702367b3a99  2.7875200  1.784280 -0.014277007 6.9738533
## 603f84f15af2c702367b3a9b -0.7843906  1.731413 -4.820681500 2.1859730
## 603f97625af2c702367b3a9d  2.8057268  1.784552  0.008421944 6.9930365
## 603fd5d95af2c702367b3a9e -1.6181059  1.695680 -5.590281750 1.0030633
## 60409b7b5af2c702367b3a9f  2.8022027  1.807808 -0.006584326 6.9946810
## 604b82b5a7718fbed181b336 -0.6685813  1.388766 -3.863349250 1.6474405
## 604f1239a7718fbed181b33f -1.4548001  1.686973 -5.387264500 1.2411622
## 6050c1bf856f36729d2e5218  0.7298783  1.234289 -1.576021750 3.3316175
## 6050e1e7856f36729d2e5219 -1.6154009  1.719899 -5.666797500 0.9926875
## 6055fdc6856f36729d2e521b -0.6774652  1.383467 -3.966178750 1.6536542
## 60579f2a856f36729d2e521e -0.8246259  1.774828 -4.914796250 2.1394963
## 60589862856f36729d2e521f  0.7217280  1.168790 -1.439215250 3.1721455
## 605a30a7856f36729d2e5221 -0.8743952  1.804458 -5.217587250 2.1505523
## 605afa3a856f36729d2e5222  0.7291399  1.193025 -1.533765750 3.2442200
## 605c8bc6856f36729d2e5223  0.7477587  1.165007 -1.485445750 3.1594975
## 605f3f2d856f36729d2e5224  0.7356732  1.198331 -1.457312250 3.2265058
## 605f46c3856f36729d2e5225 -1.6476631  1.711048 -5.710195750 0.9255903
## 60605337856f36729d2e5226 -1.6325260  1.729426 -5.824566500 0.9650235
## 60609ae6856f36729d2e5228  0.7502496  1.190529 -1.538054750 3.3043045
## 6061ce91856f36729d2e522e -1.6604910  1.759366 -6.024402250 0.9926731
## 6061f106856f36729d2e5231 -1.6271320  1.670796 -5.771684500 0.9179115
## 60672faa856f36729d2e523c -0.7914588  1.746501 -4.877514250 2.1338238
## 6068ea9f856f36729d2e523e -1.6245046  1.748432 -5.650251500 1.0124705
## 606db69d856f36729d2e5243 -0.8596535  1.806563 -5.012448250 2.1501285
## 6075ab05856f36729d2e5247 -1.6136623  1.708559 -5.850888750 0.9880117
Sampling plots
plot(reuse0.all, ask = FALSE)

Posterior predictive check
Constructor reuse
pp_check(reuse0.all, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse0.all, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

With experience predictor

As including all data points did not harm the model, we create this variant with all data points as well.

This variation includes the work_experience_programming.s predictor, as it can give further insight into how experience plays a role in the effect we try to measure. This is especially important as our sample skewed towards containing less experienced developers than the population at large.
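The `.s` suffix denotes a standardized version of the predictor. As a small sketch (assuming the usual (x − mean) / sd scaling, which matches the `scale_programming_experience` helper defined later in this document):

```r
# Standardize a vector of years of experience (illustrative values,
# not the study's actual data).
work_experience <- c(0, 3, 10, 25, 40)
work_experience.s <- (work_experience - mean(work_experience)) / sd(work_experience)
round(work_experience.s, 2)
## [1] -0.93 -0.75 -0.34  0.56  1.46
```

Standardizing puts the predictor on a unit scale, so the `normal(0, 1)` prior on its coefficient has a comparable meaning across predictors.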

reuse0.all.exp <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + work_experience_programming.s + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)
Summary
summary(reuse0.all.exp)
##  Family: MV(bernoulli, bernoulli) 
##   Links: mu = logit
##          mu = logit 
## Formula: reused_logic_validation ~ 1 + high_debt_version + work_experience_programming.s + (1 | c | session) 
##          reused_logic_constructor ~ 1 + high_debt_version + work_experience_programming.s + (1 | c | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##                                                                       Estimate
## sd(reusedlogicvalidation_Intercept)                                       1.91
## sd(reusedlogicconstructor_Intercept)                                      1.97
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.73
##                                                                       Est.Error
## sd(reusedlogicvalidation_Intercept)                                        0.81
## sd(reusedlogicconstructor_Intercept)                                       0.85
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)      0.22
##                                                                       l-95% CI
## sd(reusedlogicvalidation_Intercept)                                       0.56
## sd(reusedlogicconstructor_Intercept)                                      0.51
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.16
##                                                                       u-95% CI
## sd(reusedlogicvalidation_Intercept)                                       3.69
## sd(reusedlogicconstructor_Intercept)                                      3.89
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     0.97
##                                                                       Rhat
## sd(reusedlogicvalidation_Intercept)                                   1.00
## sd(reusedlogicconstructor_Intercept)                                  1.00
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept) 1.00
##                                                                       Bulk_ESS
## sd(reusedlogicvalidation_Intercept)                                       1186
## sd(reusedlogicconstructor_Intercept)                                      1369
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1378
##                                                                       Tail_ESS
## sd(reusedlogicvalidation_Intercept)                                        783
## sd(reusedlogicconstructor_Intercept)                                       896
## cor(reusedlogicvalidation_Intercept,reusedlogicconstructor_Intercept)     1031
## 
## Population-Level Effects: 
##                                                      Estimate Est.Error
## reusedlogicvalidation_Intercept                         -0.16      0.57
## reusedlogicconstructor_Intercept                        -0.38      0.58
## reusedlogicvalidation_high_debt_versionfalse            -1.79      0.66
## reusedlogicvalidation_work_experience_programming.s      0.26      0.47
## reusedlogicconstructor_high_debt_versionfalse           -1.63      0.65
## reusedlogicconstructor_work_experience_programming.s     0.31      0.48
##                                                      l-95% CI u-95% CI Rhat
## reusedlogicvalidation_Intercept                         -1.30     0.96 1.00
## reusedlogicconstructor_Intercept                        -1.53     0.74 1.00
## reusedlogicvalidation_high_debt_versionfalse            -3.12    -0.54 1.00
## reusedlogicvalidation_work_experience_programming.s     -0.67     1.21 1.00
## reusedlogicconstructor_high_debt_versionfalse           -2.95    -0.39 1.00
## reusedlogicconstructor_work_experience_programming.s    -0.66     1.24 1.00
##                                                      Bulk_ESS Tail_ESS
## reusedlogicvalidation_Intercept                          2569     2547
## reusedlogicconstructor_Intercept                         3682     3321
## reusedlogicvalidation_high_debt_versionfalse             4478     2812
## reusedlogicvalidation_work_experience_programming.s      2695     2302
## reusedlogicconstructor_high_debt_versionfalse            4569     2989
## reusedlogicconstructor_work_experience_programming.s     2549     2514
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Random effects
ranef(reuse0.all.exp)
## $session
## , , reusedlogicvalidation_Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  1.69440769  1.669757 -1.03472550 5.5714933
## 6033d69a5af2c702367b3a95 -1.54017002  1.722019 -5.68123400 0.9601725
## 6033d90a5af2c702367b3a96 -1.56440549  1.759165 -5.72973025 1.0521998
## 6034fc165af2c702367b3a98 -0.60750281  1.373274 -3.84262425 1.7090468
## 603500725af2c702367b3a99  2.87956516  1.793369  0.06626417 7.0774048
## 603f84f15af2c702367b3a9b -0.79794279  1.785462 -4.95443175 2.2066393
## 603f97625af2c702367b3a9d  2.84001020  1.771612  0.04165200 6.9975305
## 603fd5d95af2c702367b3a9e -1.58100585  1.700567 -5.50799950 1.0505440
## 60409b7b5af2c702367b3a9f  2.85017201  1.783262  0.03730362 7.0165155
## 604b82b5a7718fbed181b336  0.10882545  1.228775 -2.27172200 2.6333652
## 604f1239a7718fbed181b33f -1.44418669  1.737604 -5.65407975 1.4076188
## 6050c1bf856f36729d2e5218  0.66830339  1.154907 -1.50743975 3.0903048
## 6050e1e7856f36729d2e5219 -1.60383489  1.687630 -5.68284750 0.9535508
## 6055fdc6856f36729d2e521b  0.08876662  1.222867 -2.40751125 2.5978193
## 60579f2a856f36729d2e521e -0.80411051  1.720143 -4.84170850 2.1074005
## 60589862856f36729d2e521f  0.26567694  1.330795 -2.37831325 3.0714628
## 605a30a7856f36729d2e5221 -0.83583864  1.791272 -5.01451450 2.1661973
## 605afa3a856f36729d2e5222  0.38890068  1.246211 -2.00175425 2.9543810
## 605c8bc6856f36729d2e5223  0.67401789  1.171177 -1.56932025 3.1122942
## 605f3f2d856f36729d2e5224  0.11490152  1.512102 -2.85034875 3.2663030
## 605f46c3856f36729d2e5225 -1.53612405  1.663945 -5.57183250 0.9840714
## 60605337856f36729d2e5226 -1.61733472  1.709670 -5.64570950 1.0346345
## 60609ae6856f36729d2e5228  0.81336784  1.182931 -1.44218375 3.2653805
## 6061ce91856f36729d2e522e -1.57869587  1.673598 -5.61324125 0.9152119
## 6061f106856f36729d2e5231 -1.57670092  1.683208 -5.57801550 1.0358605
## 60672faa856f36729d2e523c -0.79070451  1.739234 -4.83192975 2.1411885
## 6068ea9f856f36729d2e523e -1.60367217  1.668877 -5.62579500 0.8866286
## 606db69d856f36729d2e5243 -0.93886701  1.758043 -4.90556200 2.0598303
## 6075ab05856f36729d2e5247 -1.60869184  1.760252 -5.92382650 1.1282968
## 
## , , reusedlogicconstructor_Intercept
## 
##                            Estimate Est.Error        Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  1.8106828  1.775726 -1.00387800 5.9914078
## 6033d69a5af2c702367b3a95 -1.6174607  1.784822 -5.91908575 1.0826265
## 6033d90a5af2c702367b3a96 -1.5707289  1.729083 -5.67923250 1.1092988
## 6034fc165af2c702367b3a98  0.1529060  1.281617 -2.40816000 2.7561755
## 603500725af2c702367b3a99  2.9696416  1.888775  0.07114812 7.2944118
## 603f84f15af2c702367b3a9b -0.8366828  1.785836 -5.03745900 2.2163323
## 603f97625af2c702367b3a9d  2.9811799  1.865609  0.09382672 7.2111513
## 603fd5d95af2c702367b3a9e -1.6204577  1.806969 -5.84734450 1.0886143
## 60409b7b5af2c702367b3a9f  2.9579985  1.832993  0.08757456 7.2107488
## 604b82b5a7718fbed181b336 -0.6081398  1.441246 -4.11585300 1.8219030
## 604f1239a7718fbed181b33f -1.4476162  1.770623 -5.73338775 1.3782573
## 6050c1bf856f36729d2e5218  0.7342527  1.213078 -1.52650125 3.2850767
## 6050e1e7856f36729d2e5219 -1.6461784  1.764501 -5.99489800 1.0084595
## 6055fdc6856f36729d2e521b -0.6193632  1.419618 -4.01296625 1.7525353
## 60579f2a856f36729d2e521e -0.8327042  1.791048 -5.06673800 2.2946275
## 60589862856f36729d2e521f  0.3213118  1.375121 -2.37617575 3.2033225
## 605a30a7856f36729d2e5221 -0.8547844  1.881423 -5.35731650 2.2204713
## 605afa3a856f36729d2e5222  0.4464808  1.248476 -1.96266500 3.0384988
## 605c8bc6856f36729d2e5223  0.7178812  1.219858 -1.55502375 3.3135363
## 605f3f2d856f36729d2e5224  0.1586350  1.559585 -2.93105325 3.4420773
## 605f46c3856f36729d2e5225 -1.5890300  1.717338 -5.53698625 1.0393233
## 60605337856f36729d2e5226 -1.6091079  1.762246 -5.64782975 1.0305513
## 60609ae6856f36729d2e5228  0.8779225  1.243293 -1.41096400 3.5594180
## 6061ce91856f36729d2e522e -1.5674709  1.677502 -5.72683900 1.0392560
## 6061f106856f36729d2e5231 -1.5672068  1.736949 -5.79265925 1.1207223
## 60672faa856f36729d2e523c -0.8415536  1.790866 -4.83130600 2.1626418
## 6068ea9f856f36729d2e523e -1.6151410  1.708899 -5.78600800 0.9732339
## 606db69d856f36729d2e5243 -0.9906428  1.812217 -5.26753875 2.0467303
## 6075ab05856f36729d2e5247 -1.6098854  1.777096 -5.97939575 1.0652855
Loo comparison
loo(
  reuse0.all,
  reuse0.all.exp
)
## Output of model 'reuse0.all':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -46.1 4.4
## p_loo        19.4 2.2
## looic        92.2 8.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    49.0%   369       
##  (0.5, 0.7]   (ok)       17    33.3%   249       
##    (0.7, 1]   (bad)       9    17.6%   165       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'reuse0.all.exp':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -47.4 4.7
## p_loo        21.0 2.4
## looic        94.8 9.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    45.1%   214       
##  (0.5, 0.7]   (ok)       17    33.3%   200       
##    (0.7, 1]   (bad)      11    21.6%   108       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##                elpd_diff se_diff
## reuse0.all      0.0       0.0   
## reuse0.all.exp -1.3       1.0
Sampling plots
plot(reuse0.all.exp, ask = FALSE)

Posterior predictive check
Constructor reuse
pp_check(reuse0.all.exp, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")

Validation reuse
pp_check(reuse0.all.exp, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")

Final model

  • Fitting the model to all data points did not significantly damage the model, and this fit will be used as it is a fairer representation of reality.
  • Adding the experience predictors did not significantly damage the model, and they will be used as they provide useful insight.

This means that our final model, with all data points and experience predictors, is reuse0.all.exp

Interpreting the model

To begin interpreting the model we look at how its parameters were estimated. As our research is focused on how the outcome of the model is affected, we will mainly analyze the \(\beta\) parameters.
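Because the model uses a logit link, the \(\beta\) estimates are on the log-odds scale. As a quick, informal reading (using the posterior median of roughly −1.79 for `high_debt_versionfalse` on validation reuse from the summary above; the outcome is coded 1 = duplicated, 0 = reused), exponentiating gives the multiplicative change in the odds of duplication:

```r
# Approximate posterior median for high_debt_version = "false"
# (validation reuse), taken from the model summary above.
beta <- -1.79
round(exp(beta), 3)
## [1] 0.167
```

That is, moving from the high debt to the low debt version multiplies the odds of duplicating the validation logic by roughly 0.17.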

\(\beta\) parameters

Constructor reuse

mcmc_areas(reuse0.all.exp, pars = c("b_reusedlogicconstructor_high_debt_versionfalse", "b_reusedlogicconstructor_work_experience_programming.s"), prob = 0.95) + scale_y_discrete() +
  scale_y_discrete(labels=c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameters densities in reuse constructor model", subtitle = "Shaded region marks 95% of the density. Line marks the median")

Validation reuse

mcmc_areas(reuse0.all.exp, pars = c("b_reusedlogicvalidation_high_debt_versionfalse", "b_reusedlogicvalidation_work_experience_programming.s"), prob = 0.95) + scale_y_discrete() +
  scale_y_discrete(labels=c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameters densities in reuse validation model", subtitle = "Shaded region marks 95% of the density. Line marks the median")

Effects sizes

When we look at the effect size, we consider both outcomes combined.

scale_programming_experience <- function(x) {
  (x - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
}
unscale_programming_experience <- function(x) {
  x * sd(d.completed$work_experience_programming) + mean(d.completed$work_experience_programming)
}

post_settings <- expand.grid(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = sapply(c(0, 3, 10, 25, 40), scale_programming_experience)
)

post <- posterior_predict(reuse0.all.exp, newdata = post_settings) %>%
  melt(value.name = "estimate", varnames = c("sample_number", "settings_id", "model")) %>%
  left_join(
    rowid_to_column(post_settings, var= "settings_id"),
    by = "settings_id"
  ) %>%
  mutate(work_experience_programming = unscale_programming_experience(work_experience_programming.s)) %>%
  select(
    estimate,
    high_debt_version,
    work_experience_programming,
    model
  )

post %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  mutate_at("estimate", 
            function(x) case_when(
              x == 0 ~ "Reused",
              x == 1 ~ "Duplicated"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = estimate), position = "fill") + 
  facet_grid(rows = vars(work_experience_programming)) +
  scale_fill_manual("Legend", values = c("lightblue", "darkblue")) +
  labs(title = "Reuse / programming experience") +
  xlab("Debt version") +
  ylab("Ratio of reuse")

We can then proceed to calculate some likelihoods of duplication:

d <- post
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 1.866586

Given all the simulated cases we find that developers are 87% more likely to duplicate code in our high debt scenario.

d <- post %>% filter(work_experience_programming == 10)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.017271

Considering developers with 10 years of professional programming experience we find that they are 102% more likely to duplicate code in our high debt scenario.

d <- post %>% filter(work_experience_programming == 0)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 2.126187

Considering developers with no professional programming experience we find that they are 113% more likely to duplicate code in our high debt scenario.

d <- post %>% filter(work_experience_programming == 25)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
## [1] 1.783713

Considering developers with 25 years of professional programming experience we find that they are 78% more likely to duplicate code in our high debt scenario.

---
title: "Logic Reuse Model"
author: Hampus Broman & William Levén
date: 2021-05
output: 
  html_document: 
    pandoc_args: [ "-o", "docs/logic_reuse.html" ]
---

```{r include-setup, include=FALSE}
# Load setup file
source(knitr::purl('setup.Rmd', output = tempfile()))
```


## Looking at the data

There appears to be a significant difference in the reuse rate between the high and low debt versions.

### Constructor reuse

```{r plot-constructor}
d.completed %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = reused_logic_constructor), position = "fill") + 
  scale_fill_manual("Legend", values = c("lightblue", "darkblue"), labels = c("Reused", "Duplicated")) +
  labs(title = "Constructor reuse") +
  xlab("Debt version") +
  ylab("Ratio of reuse")
```

### Validation reuse

```{r plot-validation}
d.completed %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = reused_logic_validation), position = "fill") + 
  scale_fill_manual("Legend", values = c("lightblue", "darkblue"), labels = c("Reused", "Duplicated")) +
  labs(title = "Validation reuse") +
  xlab("Debt version") +
  ylab("Ratio of reuse")
```

## Initial model
For a boolean outcome, bernoulli is the most suitable family.

We include `high_debt_version` as a predictor in our model as this variable represents the very effect we want to measure.
We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.

Since they may correlate, constructor and logic reuse are both included in a single multivariate model.

### Selecting priors {.tabset}

We iterate over the model until we have sane priors; in this case, a prior giving a 50/50 chance was chosen in both cases.
The prior `lkj(2)` makes the model skeptical of strong correlations.
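As a small sanity check of the 50/50 claim (a sketch, not part of the original analysis): pushing the `normal(0, 1)` intercept prior through the logistic link shows that it centers the prior reuse probability on 50%, while still allowing a wide range.

```r
# Simulate intercepts from the prior and map them to the
# probability scale with the inverse logit.
set.seed(1)
sim_intercept <- rnorm(10000, 0, 1)
sim_p <- plogis(sim_intercept)
round(quantile(sim_p, c(0.025, 0.5, 0.975)), 2)
```

The median sits at about 0.5, with 95% of the prior mass roughly between 0.12 and 0.88.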

#### Base model with priors

```{r initial-model-definition, class.source = 'fold-show'}
reuse.with <- extendable_model(
  base_name = "reuse",
  base_formula = "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.95)
)
```

#### Default priors

```{r default-priors}
prior_summary(reuse.with(only_priors= TRUE))
```

#### Selected priors

```{r selected-priors, warning=FALSE}
prior_summary(reuse.with(sample_prior = "only"))
```

#### Prior predictive check

##### Constructor reuse

```{r priors-check-constructor, warning=FALSE, message=FALSE}
pp_check(reuse.with(sample_prior = "only"), type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

##### Validation reuse

```{r priors-check-validation, warning=FALSE, message=FALSE}
pp_check(reuse.with(sample_prior = "only"), type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

#### Beta parameter influence

We choose a beta parameter prior that allows the beta parameter to account for 100% of the effect, while remaining skeptical of such strong effects.

```{r priors-beta, warning=FALSE}
sim.size <- 1000
sim.intercept <- rnorm(sim.size, 0, 1)
sim.beta <- rnorm(sim.size, 0, 1)
sim.beta.diff <- (plogis(sim.intercept + sim.beta) / plogis(sim.intercept) * 100) - 100

data.frame(x = sim.beta.diff) %>%
  ggplot(aes(x)) +
  geom_density() +
  xlim(-150, 150) +
  labs(
    title = "Beta parameter prior influence",
    x = "Estimate with beta as % of estimate without beta",
    y = "Density"
  )

```

### Model fit {.tabset}

We check the posterior distribution and can see that the model seems to have fit the data well.
Sampling also seems to have worked well: Rhat values are close to 1 and the sampling plots look healthy.

#### Posterior predictive check

##### Constructor reuse
```{r base-pp-check-constructor, warning=FALSE, message=FALSE}
pp_check(reuse.with(), type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

##### Validation reuse

```{r base-pp-check-validation, warning=FALSE, message=FALSE}
pp_check(reuse.with(), type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

#### Summary

```{r base-summary, warning=FALSE}
summary(reuse.with())
```

#### Sampling plots

```{r base-plot, warning=FALSE, message=FALSE}
plot(reuse.with(), ask = FALSE)
```


## Model predictor extensions {.tabset}

```{r mo-priors}
# default prior for monotonic predictor
edlvl_prior <- c(
  prior(dirichlet(2), class = "simo", coef = "moeducation_level1", resp = "reusedlogicconstructor"),
  prior(dirichlet(2), class = "simo", coef = "moeducation_level1", resp = "reusedlogicvalidation")
)
```

We use `loo` to check some possible extensions to the model.

### One variable {.tabset}

```{r model-extension-1, warning=FALSE, class.source = 'fold-show', message=FALSE}
loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  
  # New model(s)
  reuse.with("work_domain"),
  reuse.with("work_experience_programming.s"),
  reuse.with("work_experience_java.s"),
  reuse.with("education_field"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_peer_review"),
  reuse.with("workplace_td_tracking"),
  reuse.with("workplace_pair_programming"),
  reuse.with("workplace_coding_standards"),
  reuse.with("scenario"),
  reuse.with("group")
)
```

#### Comparison

```{r model-extension-1-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-1-dig, warning=FALSE}
loo_result[1]
```

### Two variables {.tabset}

```{r model-extension-2, warning=FALSE, class.source = 'fold-show', message=FALSE}
loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  reuse.with("scenario"),
  reuse.with("workplace_peer_review"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_coding_standards"),
  
  # New model(s)
  reuse.with(c("scenario", "workplace_peer_review")),
  reuse.with(c("scenario", "mo(education_level)"), edlvl_prior),
  reuse.with(c("scenario", "workplace_coding_standards")),
  
  reuse.with(c("workplace_peer_review", "mo(education_level)"), edlvl_prior),
  reuse.with(c("workplace_peer_review", "workplace_coding_standards")),
  
  reuse.with(c("workplace_coding_standards", "mo(education_level)"), edlvl_prior)
)
```

#### Comparison

```{r model-extension-2-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-2-dig, warning=FALSE}
loo_result[1]
```

### Three variables {.tabset}

```{r model-extension-3, warning=FALSE, class.source = 'fold-show', message=FALSE}
loo_result <- loo(
  # Benchmark model(s)
  reuse.with(),
  reuse.with("scenario"),
  reuse.with("workplace_peer_review"),
  reuse.with("mo(education_level)", edlvl_prior),
  reuse.with("workplace_coding_standards"),
  reuse.with(c("scenario", "workplace_coding_standards")),
  
  # New model(s)
  reuse.with(c("scenario", "workplace_coding_standards", "mo(education_level)"), edlvl_prior)
)
```

#### Comparison

```{r model-extension-3-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-3-dig, warning=FALSE}
loo_result[1]
```

## Candidate models  {.tabset}
We pick some of our top-performing models as candidates and inspect them more closely.

The candidate models are named and listed in order of complexity.

### Reuse0 {.tabset}

```{r reuse0, class.source = 'fold-show', warning=FALSE, message=FALSE}
reuse0 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r reuse0-sum}
summary(reuse0)
```

#### Random effects

```{r reuse0-raneff}
ranef(reuse0)
```

#### Sampling plots

```{r reuse0-plot}
plot(reuse0, ask = FALSE)
```

#### Posterior predictive check

##### Constructor reuse

```{r reuse0-pp-constructor}
pp_check(reuse0, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

##### Validation reuse

```{r reuse0-pp-validation}
pp_check(reuse0, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

### Reuse1 {.tabset}

```{r reuse1, class.source = 'fold-show', warning=FALSE, message=FALSE}
reuse1 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + scenario + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse1",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r reuse1-sum}
summary(reuse1)
```

#### Random effects

```{r reuse1-raneff}
ranef(reuse1)
```

#### Sampling plots

```{r reuse1-plot}
plot(reuse1, ask = FALSE)
```

#### Posterior predictive check

##### Constructor reuse

```{r reuse1-pp-constructor}
pp_check(reuse1, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

##### Validation reuse

```{r reuse1-pp-validation}
pp_check(reuse1, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

### Reuse2 {.tabset}

```{r reuse2, class.source = 'fold-show', warning=FALSE, message=FALSE}
reuse2 <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + scenario + workplace_coding_standards + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse2",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r reuse2-sum}
summary(reuse2)
```

#### Random effects

```{r reuse2-raneff}
ranef(reuse2)
```

#### Sampling plots

```{r reuse2-plot}
plot(reuse2, ask = FALSE)
```

#### Posterior predictive check

##### Constructor reuse

```{r reuse2-pp-constructor}
pp_check(reuse2, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

##### Validation reuse

```{r reuse2-pp-validation}
pp_check(reuse2, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

## Final model 
All candidate models look good and none is significantly better than the others, so we will proceed with the simplest model: `reuse0`.
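
This claim can be checked with approximate leave-one-out cross-validation. A minimal sketch, assuming the three candidate fits from the sections above are still in memory (the chunk name is our own):

```{r reuse-candidates-loo, warning=FALSE}
# Sketch: compare the candidate models with PSIS-LOO. elpd differences
# that are small relative to their standard errors support choosing the
# simplest model, reuse0.
loo(reuse0, reuse1, reuse2)
```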

### Variations {.tabset}
We will try a few different variations of the selected candidate model.

#### All data points {.tabset}

Some participants completed only one scenario. Those were excluded from the initial dataset to improve sampling of the models. We do, however, want to use all the data we can, and will therefore try to fit the model with the complete dataset.

```{r variation.all, class.source = 'fold-show', warning=FALSE, message=FALSE}
reuse0.all <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0.all",
  file_refit = "on_change",
  seed = 20210421
)
```

##### Summary

```{r variation.all-sum, warning=FALSE}
summary(reuse0.all)
```

##### Random effects

```{r variation.all-raneff, warning=FALSE}
ranef(reuse0.all)
```

##### Sampling plots

```{r variation.all-plot, message=FALSE, warning=FALSE}
plot(reuse0.all, ask = FALSE)
```

##### Posterior predictive check

###### Constructor reuse

```{r variation.all-pp-constructor, warning=FALSE, message=FALSE}
pp_check(reuse0.all, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

###### Validation reuse

```{r variation.all-pp-validation, warning=FALSE, message=FALSE}
pp_check(reuse0.all, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```


#### With experience predictor {.tabset}

As including all data points didn't harm the model, we will create this variant with all data points as well.

This variation includes the `work_experience_programming.s` predictor, as it can give further insight into how experience plays a factor in the effect we try to measure. This is especially important as our sample skewed towards containing less experienced developers than the population at large.

```{r variation.all.exp, class.source = 'fold-show', message=FALSE, warning=FALSE}
reuse0.all.exp <- brm(
  "mvbind(
    reused_logic_validation,
    reused_logic_constructor
  ) ~ 1 + high_debt_version + work_experience_programming.s + (1 |c| session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 1), class = "Intercept"),
    prior(exponential(1), class = "sd", resp = reusedlogicconstructor),
    prior(exponential(1), class = "sd", resp = reusedlogicvalidation),
    prior(lkj(2), class = "L")
  ),
  family = bernoulli(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/reuse0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)
```

##### Summary

```{r variation.all.exp-sum}
summary(reuse0.all.exp)
```

##### Random effects

```{r variation.all.exp-raneff}
ranef(reuse0.all.exp)
```

##### Loo comparison

```{r variation.all.exp-loo, warning=FALSE}
loo(
  reuse0.all,
  reuse0.all.exp
)
```

##### Sampling plots

```{r variation.all.exp-plot}
plot(reuse0.all.exp, ask = FALSE)
```

##### Posterior predictive check

###### Constructor reuse

```{r variation.all.exp-pp-constructor, warning=FALSE, message=FALSE}
pp_check(reuse0.all.exp, type = "bars", nsamples = 200, resp = "reusedlogicconstructor")
```

###### Validation reuse

```{r variation.all.exp-pp-validation, warning=FALSE, message=FALSE}
pp_check(reuse0.all.exp, type = "bars", nsamples = 200, resp = "reusedlogicvalidation")
```

### Final model
* Fitting the model to all data points did not significantly damage the model; the full dataset will be used, as it is a fairer representation of reality.
* Adding the experience predictor did not significantly damage the model; it will be kept, as it provides useful insight.

This means that our final model, with all data points and the experience predictor, is `reuse0.all.exp`.

## Interpreting the model
To begin interpreting the model we look at how its parameters were estimated. As our research focuses on how the outcome of the model is affected, we will mainly analyze the $\beta$ parameters.
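
Since the models use a bernoulli family with the default logit link, the $\beta$ parameters are on the log-odds scale, and exponentiating them yields odds ratios. A small sketch, assuming `reuse0.all.exp` is loaded (the chunk name is our own):

```{r interpret-beta-odds, warning=FALSE}
# Sketch: fixef() returns the population-level estimates on the
# log-odds scale; exp() converts them to odds ratios.
exp(fixef(reuse0.all.exp)[, "Estimate"])
```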

### $\beta$ parameters

#### Constructor reuse

```{r interpret-beta-plot-constructor, warning=FALSE, message=FALSE}
mcmc_areas(reuse0.all.exp,
           pars = c("b_reusedlogicconstructor_high_debt_versionfalse",
                    "b_reusedlogicconstructor_work_experience_programming.s"),
           prob = 0.95) +
  scale_y_discrete(labels = c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameter densities in reuse constructor model", subtitle = "Shaded region marks 95% of the density. Line marks the median")
```

#### Validation reuse

```{r interpret-beta-plot-validation, warning=FALSE, message=FALSE}
mcmc_areas(reuse0.all.exp,
           pars = c("b_reusedlogicvalidation_high_debt_versionfalse",
                    "b_reusedlogicvalidation_work_experience_programming.s"),
           prob = 0.95) +
  scale_y_discrete(labels = c("High debt version: false", "Professional programming experience")) +
  ggtitle("Beta parameter densities in reuse validation model", subtitle = "Shaded region marks 95% of the density. Line marks the median")
```

### Effect sizes

When we look at effect sizes, we will look at both outcomes combined.
```{r effect-size, fig.width=8}
scale_programming_experience <- function(x) {
  (x - mean(d.completed$work_experience_programming)) / sd(d.completed$work_experience_programming)
}
unscale_programming_experience <- function(x) {
  x * sd(d.completed$work_experience_programming) + mean(d.completed$work_experience_programming)
}

post_settings <- expand.grid(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = sapply(c(0, 3, 10, 25, 40), scale_programming_experience)
)

post <- posterior_predict(reuse0.all.exp, newdata = post_settings) %>%
  melt(value.name = "estimate", varnames = c("sample_number", "settings_id", "model")) %>%
  left_join(
    rowid_to_column(post_settings, var= "settings_id"),
    by = "settings_id"
  ) %>%
  mutate(work_experience_programming = unscale_programming_experience(work_experience_programming.s)) %>%
  select(
    estimate,
    high_debt_version,
    work_experience_programming,
    model
  )

post %>%
  mutate_at("high_debt_version", 
            function(x) case_when(
              x == "false" ~ "Low debt",
              x == "true" ~ "High debt"
            )) %>%
  mutate_at("estimate", 
            function(x) case_when(
              x == 0 ~ "Reused",
              x == 1 ~ "Duplicated"
            )) %>%
  ggplot(aes(high_debt_version)) +
  geom_bar(aes(fill = estimate), position = "fill") + 
  facet_grid(rows = vars(work_experience_programming)) +
  scale_fill_manual("Legend", values = c("lightblue", "darkblue")) +
  labs(title = "Reuse / programming experience") +
  xlab("Debt version") +
  ylab("Ratio of reuse")

```

We can then proceed to calculate some likelihoods of duplication:


```{r es-a, class.source = 'fold-show'}
d <- post
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Given all the simulated cases we find that developers are `r scales::label_percent()(x - 1)` more likely to duplicate code in our high debt scenario.

```{r es-10, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 10)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with 10 years of professional programming experience we find that they are `r scales::label_percent()(x - 1)` more likely to duplicate code in our high debt scenario.

```{r es-0, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 0)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with no professional programming experience we find that they are `r scales::label_percent()(x - 1)` more likely to duplicate code in our high debt scenario.


```{r es-25, class.source = 'fold-show'}
d <- post %>% filter(work_experience_programming == 25)
d.high <- d %>% filter(high_debt_version == "true") %>% pull(estimate)
d.low <- d %>% filter(high_debt_version == "false") %>% pull(estimate)
x <- sum(d.high) / sum(d.low)
x
```

Considering developers with 25 years of professional programming experience we find that they are `r scales::label_percent()(x - 1)` more likely to duplicate code in our high debt scenario.