Module #10 Assignment - ANOVA (analysis of variance) and Regression coefficients

 3/28/25

This assignment draws on the Multiple Linear Regression chapter (Chapter 11) of "Introductory Statistics with R", pp. 185-194.

I revised this question, so please follow my description only. Conduct an ANOVA (analysis of variance) and estimate regression coefficients for the cystfibr data, loaded with data("cystfibr"). Note that the dataset is part of the ISwR package in R.

You can choose any variables you like. In your report, you need to state the coefficients (including the intercept) for the variables you chose, under both the ANOVA and the multivariate analysis. I am specifically looking at your interpretation of R results.

Extra clue:
The model code:
i. lm(formula = pemax ~ age + weight + bmp + fev1, data = cystfibr)
ii. anova(lm(pemax ~ age + weight + bmp + fev1, data = cystfibr))
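A minimal runnable version of the two clue lines, assuming the ISwR package is installed (install.packages("ISwR")). The object name `fit` is my own choice, not part of the assignment:

```r
# Assumes the ISwR package is installed
library(ISwR)
data(cystfibr)

# i. Multiple regression of pemax on four chosen predictors
fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)
summary(fit)   # coefficient table, R-squared, overall F-test

# ii. Sequential (Type I) ANOVA table for the same fitted model
anova(fit)
```

Fitting the model once and passing the object to both summary() and anova() guarantees that the two tables describe the same model.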




Interpretation of R results:

Regression Coefficients: The multiple linear regression model examines how age, weight, bmp (body mass as a percentage of normal), and fev1 (forced expiratory volume) predict pemax (maximum expiratory pressure) in cystic fibrosis patients. The model intercept is estimated at 179.30, meaning that when all predictor variables are zero, the predicted pemax is 179.30. This is not directly interpretable here, since no patient has zero age, weight, body mass, or lung function, but the intercept anchors the fitted regression plane.
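One way to see what the intercept means is to ask the fitted model for a prediction at a hypothetical patient with every predictor set to zero. This is a sketch assuming ISwR is installed; `zero_patient` is an illustrative name, not a real case in the data:

```r
library(ISwR)
data(cystfibr)
fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

# Hypothetical patient with all four predictors equal to zero
zero_patient <- data.frame(age = 0, weight = 0, bmp = 0, fev1 = 0)
pred_zero <- predict(fit, newdata = zero_patient)

# The prediction at all-zero predictors is exactly the intercept
all.equal(unname(pred_zero), unname(coef(fit)["(Intercept)"]))

# Observed range of each predictor, to compare against zero
sapply(cystfibr[, c("age", "weight", "bmp", "fev1")], range)
```

Comparing the ranges with zero shows why the intercept is an extrapolation rather than a description of any real patient.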

  • Weight shows a significant positive relationship with pemax (Estimate = 2.69, p = 0.0329): for every additional kg of body weight, pemax increases by approximately 2.69 units, holding the other variables constant.

  • BMP (Estimate = -2.07, p = 0.0204) has a significant negative association with pemax: each additional percentage point of bmp is associated with a pemax about 2.07 units lower, holding the other variables constant.

  • FEV1 is also significant (Estimate = 1.09, p = 0.0470): each additional unit of fev1 is associated with a pemax about 1.09 units higher, holding the other variables constant.

  • Age has a negative estimate (-3.42) but is not statistically significant (p = 0.314), so there is no strong evidence that age affects pemax once weight, bmp, and fev1 are accounted for.
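The estimates and p-values in the bullets above come from the coefficient table of summary(). A sketch of pulling that table out programmatically and checking the "holding other variables constant" reading, assuming ISwR is installed; `base` and `heavier` are hypothetical rows built from the first patient purely for illustration:

```r
library(ISwR)
data(cystfibr)
fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
round(coefs, 4)

# Slopes significant at the 5% level (intercept row dropped)
slopes <- coefs[-1, ]
rownames(slopes)[slopes[, "Pr(>|t|)"] < 0.05]

# Raise weight by 1 kg while keeping age, bmp and fev1 fixed:
# the change in the prediction equals the weight coefficient
base <- cystfibr[1, c("age", "weight", "bmp", "fev1")]
heavier <- base
heavier$weight <- heavier$weight + 1
delta <- predict(fit, newdata = heavier) - predict(fit, newdata = base)
all.equal(unname(delta), unname(coef(fit)["weight"]))
```

The same one-unit check works for any predictor, which is exactly what "partial effect" means in a linear model.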

Overall model fit: The model explains 59.2% of the variance in pemax (R² = 0.5918), and the overall F-test is significant (p = 0.0009), indicating that the predictors collectively have a meaningful relationship with the outcome.
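Both fit statistics can be read from the summary object, and R² can be recomputed by hand as 1 - RSS/TSS. A sketch assuming ISwR is installed; the names `s`, `f`, and `r2_manual` are illustrative:

```r
library(ISwR)
data(cystfibr)
fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)
s <- summary(fit)

s$r.squared   # share of the variance in pemax explained

# Overall F-test of "all four slopes are zero"
f <- s$fstatistic   # named vector: value, numdf, dendf
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
unname(p_overall)

# R-squared by hand: 1 - residual SS / total SS
rss <- sum(residuals(fit)^2)
tss <- sum((cystfibr$pemax - mean(cystfibr$pemax))^2)
r2_manual <- 1 - rss / tss
r2_manual
```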


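ANOVA Results: The ANOVA table from anova() uses sequential (Type I) sums of squares: each variable's contribution to the variation in pemax is measured after the variables listed before it in the formula, but not after those listed after it. According to the F-values and p-values: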

  • Age has the largest F-value (F = 18.44, p < 0.001), even though it is not significant in the regression table. Because age is entered first, this test does not adjust for any other predictor.

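  • FEV1 is also statistically significant (F = 4.48, p = 0.047). Because fev1 is entered last, this test adjusts for all the other predictors and is equivalent to its regression t-test (F = t²), which is why the p-value matches the coefficient table.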

  • BMP is borderline (p = 0.0501), while weight is not significant in the ANOVA (p = 0.2038) even though it was significant in the regression.
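Because fev1 is the last term in the formula, its sequential F-test and its regression t-test are the same test. A sketch that checks this numerically, assuming ISwR is installed; `tab` and `t_fev1` are illustrative names:

```r
library(ISwR)
data(cystfibr)
fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

tab <- anova(fit)   # sequential sums of squares, in formula order
tab

# The last term is adjusted for all the others, so F = t^2
t_fev1 <- coef(summary(fit))["fev1", "t value"]
c(F_anova = tab["fev1", "F value"], t_squared = t_fev1^2)
```

The same identity does not hold for age, weight, or bmp here, because their sequential tests ignore the terms that come after them.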

This shows an important difference between what regression coefficients and the ANOVA table convey. In the regression output, each coefficient is tested while holding all the other variables constant (partial effects). The sequential ANOVA table instead credits each variable only with the variation not already explained by the variables entered before it, so its results depend on the order of the terms in the formula.
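The order dependence is easy to demonstrate: refit the same model with age moved to the end and compare the age rows, then use drop1() to test every term adjusted for all the others. A sketch assuming ISwR is installed; the two model names are illustrative:

```r
library(ISwR)
data(cystfibr)
fit_age_first <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)
fit_age_last  <- lm(pemax ~ weight + bmp + fev1 + age, data = cystfibr)

# Same model and coefficients, but different sequential age rows
anova(fit_age_first)["age", ]
anova(fit_age_last)["age", ]

# drop1() tests each term after adjusting for all the others;
# for single-df terms these F-tests equal the regression t-tests squared
d <- drop1(fit_age_first, test = "F")
d
```

With age entered last, its sequential test reproduces the non-significant regression result for age.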

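So, for example, age was not statistically significant in the regression, but in the ANOVA, where it is entered first and tested without adjusting for anything else, it showed a strong effect. This suggests that age shares much of its explanatory information with the other predictors, such as weight or fev1: once they are in the model, age adds little on its own.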
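To check how much age overlaps with the other predictors, one can look at their pairwise correlations and at a variance inflation factor for age, computed by hand as 1 / (1 - R²) from regressing age on the other three predictors. A sketch assuming ISwR is installed; `preds`, `r2_age`, and `vif_age` are illustrative names:

```r
library(ISwR)
data(cystfibr)
preds <- cystfibr[, c("age", "weight", "bmp", "fev1")]

# Pairwise correlations among the predictors and with pemax
round(cor(cbind(preds, pemax = cystfibr$pemax)), 2)

# Variance inflation factor for age: large values mean that age
# is largely predictable from the other predictors
r2_age <- summary(lm(age ~ weight + bmp + fev1, data = cystfibr))$r.squared
vif_age <- 1 / (1 - r2_age)
vif_age
```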


I really enjoyed this week's content. Years ago I did a biology project that required an ANOVA test, and it was very interesting to use R to perform ANOVA tests and other statistical computations. Revisiting this type of analysis with a more advanced tool like R made the process feel more intuitive and powerful. It also helped solidify my understanding of how statistical models work in real-world research settings. I'm excited to continue exploring R and applying it to more complex datasets in the future.


