Module #11 Assignment - Model Matrices

4/3/25 



As we can see with the results above, the additive linear model demonstrated a strong overall fit, with an R² value of 0.7566, indicating that approximately 75.7% of the variance in VAS scores is explained by the model. The adjusted R², which accounts for the number of predictors, was lower at 0.4969, reflecting the complexity of the model and its many subject-level terms. The F-statistic yielded a p-value of 0.022, confirming that the model is statistically significant overall. Notably, the treatment effect was substantial and negative, suggesting that the active drug significantly reduced pain levels. There was also considerable variation across subjects, which is expected in a crossover design. Although the period effect was dropped due to multicollinearity, the model still effectively captured the key treatment effect and subject differences.
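A minimal sketch of the additive crossover model described above. The column names (subject, period, treatment, VAS) and the simulated values are illustrative assumptions, not the actual assignment data; the design deliberately gives every subject placebo in period 1 and active drug in period 2, which reproduces the period/treatment confounding noted above.

```r
# Simulated crossover data (illustrative only -- not the real VAS data)
set.seed(11)
crossover <- data.frame(
  subject   = factor(rep(1:8, each = 2)),
  period    = factor(rep(1:2, times = 8)),
  treatment = factor(rep(c("placebo", "active"), times = 8),
                     levels = c("placebo", "active"))
)
# Active drug lowers VAS by about 40 points in this simulation
crossover$VAS <- 60 - 40 * (crossover$treatment == "active") +
  rnorm(16, sd = 10)

# Additive model: subject + treatment + period
fit <- lm(VAS ~ subject + treatment + period, data = crossover)
coef(fit)["treatmentactive"]  # negative: active drug reduces pain
coef(fit)["period2"]          # NA: period is aliased with treatment here
```

Because treatment order is identical for every subject, the period dummy is a linear combination of the treatment dummy, so `lm()` returns NA for the period coefficient rather than failing outright.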

These data contain additive effects of subject, period, and treatment. Compare the results with those obtained from t-tests.



Both the additive linear model and the paired t-test indicate a statistically significant reduction in VAS scores under active treatment. The estimated mean treatment effect was −42.875 in both approaches, with a p-value of 0.0056. While the paired t-test treats the subject effect implicitly through pairing, the additive model accounts for subject variability explicitly. However, the period effect could not be evaluated in the additive model due to multicollinearity. Overall, the results from both analyses are consistent, reinforcing the conclusion that the active treatment significantly reduces pain. 

The paired t-test is simpler and directly answers the core question. The additive model is more flexible and informative, especially useful when you're analyzing or reporting on individual subject differences. 
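The equivalence between the two approaches can be sketched as follows. The VAS values here are simulated for illustration (not the assignment's data); in a balanced complete-block design, the treatment coefficient from the additive model matches the paired t-test's mean difference exactly, and the p-values agree.

```r
# Simulated paired VAS scores (illustrative only)
set.seed(2)
placebo <- rnorm(8, mean = 65, sd = 12)
active  <- placebo - 42 + rnorm(8, sd = 6)

# Paired t-test: the subject effect is handled implicitly by pairing
tt <- t.test(active, placebo, paired = TRUE)

# Additive model: the subject effect is modeled explicitly
subj  <- factor(rep(1:8, times = 2))
treat <- factor(rep(c("active", "placebo"), each = 8),
                levels = c("placebo", "active"))
fit   <- lm(c(active, placebo) ~ subj + treat)

# The treatment coefficient equals the paired mean difference,
# and the p-value for that coefficient equals the t-test p-value
coef(fit)["treatactive"]
tt$estimate
```

The additive model generalizes more readily (e.g., to a period effect or covariates), while the paired t-test is the special case with a single treatment contrast per subject.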



12.3. Consider the following definitions
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
x <- 1:8
y <- c(1:4, 8:5)
z <- rnorm(8)

Note:
rnorm() is a built-in R function that generates a vector of normally distributed random numbers; it takes the sample size as its first argument and returns that many random draws. (gl() generates a factor with a given number of levels, repetitions, and total length.)

Your assignment
Generate the model matrices for models z ~ a*b, z ~ a:b, etc. In your blog posting, discuss the implications. Carry out the model fits and notice which models contain singularities. 
Hint: We are looking for:
model.matrix(~ a:b)

lm(z ~ a:b)

In this analysis, three model formulations were explored using two categorical variables, a and b, and a numeric response variable z: a full factorial model (z ~ a * b), an additive model (z ~ a + b), and an interaction-only model (z ~ a:b). The full factorial model, which included both main effects and the interaction term, fit without issue and captured the complete structure of the design. However, its adjusted R² was negative, suggesting overfitting given the small sample size (n = 8). The additive model offered a simpler, more stable fit, also free of singularities, and is likely more appropriate when interaction effects are not strongly suspected or when data are limited. The interaction-only model, by contrast, produced a singularity: the term a2:b2 was dropped because it was linearly dependent on the intercept and the other cell indicators. These results highlight an important implication: when modeling interactions, it is generally necessary to include the associated main effects so that the coefficients remain interpretable. With smaller datasets especially, simpler models like the additive formulation may provide more stable and reliable results without sacrificing much explanatory power.
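The three fits above can be reproduced with a short script (note that the factors come from gl(), and z is random, so the coefficient values themselves will vary from run to run):

```r
# Factors from the exercise definitions
a <- gl(2, 2, 8)        # levels: 1 1 2 2 1 1 2 2
b <- gl(2, 4, 8)        # levels: 1 1 1 1 2 2 2 2
set.seed(123)
z <- rnorm(8)

model.matrix(~ a * b)   # intercept, a2, b2, a2:b2 -- full rank
model.matrix(~ a + b)   # intercept, a2, b2        -- full rank
model.matrix(~ a:b)     # intercept plus four cell dummies: rank-deficient

coef(lm(z ~ a * b))     # all four coefficients estimated
coef(lm(z ~ a:b))       # the a2:b2 term comes back NA (singularity)
```

The interaction-only matrix has five columns but only rank four, because the four cell indicators already sum to the intercept column, so lm() sets the last dependent coefficient (a2:b2) to NA.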



Working through this analysis was definitely a challenge, especially since it was my first time dealing with different linear model structures and the concept of model singularities. Understanding how model formulas like a * b, a + b, and a:b behave in R, and what they actually represent, took some time to grasp. It was especially tricky to interpret what happens behind the scenes when interaction terms are included without main effects, and why singularities occur in those situations. That said, this exercise was very informative. Overall, I feel more confident now in how to approach model building in R, and I'm excited to continue learning and applying these concepts in more complex settings!
