Module #7 - Regression Models

1. In this segment of the assignment, we will use the following regression equation: Y = a + bX + e

Where:
Y is the value of the Dependent variable (Y), what is being predicted or explained

a or Alpha, a constant; equals the value of Y when the value of X=0

b or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one-unit change in X.

X is the value of the Independent variable (X), what is predicting or explaining the value of Y

e is the error term; the error in predicting the value of Y, given the value of X (it is not displayed in most regression equations).

A reminder about the lm() function:

lm([target variable] ~ [predictor variables], data = [data source])

The data in this assignment:

x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

1.1 Define the relationship model between the predictor (x) and the response (y) variable.
1.2 Calculate the coefficients.
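One way to work through 1.1 and 1.2 is sketched below. The names fit, b_manual and a_manual are illustrative and do not come from the assignment; the cross-check uses the standard least-squares formulas b = Sxy / Sxx and a = mean(y) - b * mean(x).

```r
# Data from the assignment
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# 1.1 Relationship model: y is the response, x is the predictor
fit <- lm(y ~ x)

# 1.2 Coefficients: "(Intercept)" is a, "x" is b
print(coef(fit))

# Cross-check with the closed-form least-squares formulas
b_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_manual <- mean(y) - b_manual * mean(x)
print(c(a = a_manual, b = b_manual))
```

With these data the intercept comes out near 19.21 and the slope near 3.27, so the fitted line is roughly y = 19.21 + 3.27x.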


2. The following question was posted by Chi Yau, the author of R Tutorial With Bayesian Statistics Using Stan, in his blog post on regression analysis.

Problem - 

Apply the simple linear regression model (see the formula above) to the data set called "visit" (see below), and estimate the discharge duration if the waiting time since the last eruption has been 80 minutes. Note: The full data set ships with R and can be loaded with data(faithful).
> head(visit) 
  discharge  waiting 
1     3.600      79 
2     1.800      54 
3     3.333      74 
4     2.283      62 
5     4.533      85 
6     2.883      55 

Use the formula discharge ~ waiting with data = visit.

2.1 Define the relationship model between the predictor and the response variable

This sets up a linear regression model in which discharge (the eruption duration, named eruptions in faithful) is the dependent variable and waiting (the time since the last eruption) is the independent variable.
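A minimal sketch of this step follows. It assumes, based on the note in the problem statement and the matching head() output, that visit is the built-in faithful data with its eruptions column renamed to discharge. The object name visit_lm is illustrative.

```r
# Assumption: `visit` is `faithful` with `eruptions` renamed to `discharge`
data(faithful)
visit <- data.frame(discharge = faithful$eruptions,
                    waiting   = faithful$waiting)

# 2.1 Relationship model: discharge is the response, waiting the predictor
visit_lm <- lm(discharge ~ waiting, data = visit)
print(visit_lm)
```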
2.2 Extract the parameters of the estimated regression equation with the coefficients function.

This means the estimated regression equation is: discharge = -1.8740 + 0.0756 × waiting. The intercept is -1.8740, and the slope is 0.0756.


2.3 Determine the fit of the eruption duration using the estimated regression equation.

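Using the regression equation with the unrounded coefficients: discharge = -1.874016 + (0.075628 × 80) = 4.176. This means that if the waiting time is 80 minutes, the predicted discharge duration is about 4.18 minutes. (Plugging in the rounded values -1.8740 and 0.0756 gives 4.174 instead, so the rounding matters in the third decimal place.)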
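Steps 2.2 and 2.3 can be sketched as below, again assuming that visit is faithful with eruptions renamed to discharge. The names visit_lm, coeffs, newdata, pred and manual are illustrative. predict() works from the unrounded coefficients, which avoids the small error that comes from plugging four-decimal values in by hand.

```r
# Assumption: `visit` is `faithful` with `eruptions` renamed to `discharge`
data(faithful)
visit <- data.frame(discharge = faithful$eruptions,
                    waiting   = faithful$waiting)
visit_lm <- lm(discharge ~ waiting, data = visit)

# 2.2 Extract the intercept and slope of the fitted line
coeffs <- coefficients(visit_lm)
print(coeffs)

# 2.3 Predicted discharge duration for a waiting time of 80 minutes
newdata <- data.frame(waiting = 80)
pred <- predict(visit_lm, newdata)
print(pred)

# The same value computed by hand from the unrounded coefficients
manual <- coeffs[1] + coeffs[2] * 80
print(unname(manual))
```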






3. We will use a well-known data set in R called mtcars. This data set was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

This data frame contains 32 observations on 11 (numeric) variables.

[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors

To load the mtcars data in R:
R comes with several built-in data sets, which are generally used as demo data for trying out R functions. One of those built-in data sets is mtcars.
In this question, we will use 4 of the variables found in mtcars, selected with the following code:

input <- mtcars[,c("mpg","disp","hp","wt")]
print(head(input))

3.1 Examine the multiple regression model stated above and its coefficients, using 4 variables from mtcars (mpg, disp, hp and wt).
Report the result and explain what the multiple regression model and its coefficients tell us about the data.

input <- mtcars[,c("mpg","disp","hp","wt")]  
lm(formula = mpg ~ disp + hp + wt, data = input)


The multiple linear regression model used in this analysis examines the relationship between fuel efficiency (mpg) and three independent variables: engine displacement (disp), horsepower (hp), and vehicle weight (wt). The regression equation derived from the model is:

mpg = 37.1055 - 0.000937 × disp - 0.031157 × hp - 3.800891 × wt

Each coefficient represents the expected change in miles per gallon (mpg) for a one-unit increase in the respective predictor while the others are held constant. The intercept (37.1055) is the predicted mpg for a car with zero displacement, horsepower, and weight, which is a purely theoretical scenario. Among the predictors, weight (wt) has the largest coefficient in absolute terms, -3.8009: each additional 1,000 lbs is associated with a decrease of about 3.8 mpg. Horsepower (hp) also has a negative effect, with each additional unit of horsepower reducing mpg by about 0.031. Displacement (disp), however, is not a significant predictor in this model: its high p-value (0.92851) suggests that changes in engine size do not meaningfully contribute to variation in mpg once weight and horsepower are accounted for.

The overall fit of the model is strong, with an R-squared value of 0.8268, indicating that 82.7% of the variance in fuel efficiency is explained by the selected variables. The adjusted R-squared value (0.8083) confirms that the model keeps high explanatory power after adjusting for the number of predictors. The residual standard error of 2.639 means that a typical prediction misses the actual value by roughly 2.64 mpg. Additionally, the F-statistic (44.57) and its associated p-value (8.65e-11) show that the overall model is highly significant, meaning at least one of the predictors contributes meaningfully to explaining mpg.
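The figures quoted above can be read off the model summary. The sketch below shows one way to pull them out individually; fit, s, f and p_overall are illustrative names.

```r
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
fit <- lm(mpg ~ disp + hp + wt, data = input)
s <- summary(fit)

# Coefficient table: estimates, standard errors, t values, p-values
print(round(coef(s), 5))

print(s$r.squared)      # R-squared
print(s$adj.r.squared)  # adjusted R-squared
print(s$sigma)          # residual standard error
print(s$fstatistic)     # F value, numerator df, denominator df

# Overall p-value of the F test (summary() prints it but does not store it)
f <- s$fstatistic
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
print(unname(p_overall))
```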

Overall, weight is the most influential factor affecting fuel efficiency, followed by horsepower, while displacement has no significant effect once the other two are in the model. This suggests that strategies for improving fuel efficiency should focus primarily on reducing vehicle weight and optimizing engine power rather than modifying engine displacement.
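Raw coefficients are measured in different units (mpg per cubic inch, per horsepower, per 1,000 lbs), so comparing their sizes directly can mislead. One common way to put the predictors on a shared scale, which is not part of the original assignment, is to refit the model on standardized variables. The names input_std, fit_std and beta_std below are illustrative.

```r
input <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Standardize every column to mean 0 and standard deviation 1
input_std <- as.data.frame(scale(input))

fit_std <- lm(mpg ~ disp + hp + wt, data = input_std)

# Drop the intercept and rank predictors by absolute standardized slope
beta_std <- coef(fit_std)[-1]
print(sort(abs(beta_std), decreasing = TRUE))
```

On this scale wt still has the largest absolute coefficient, followed by hp and then disp, which is consistent with the ranking given above.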

4. With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression to the relation. According to the fitted model, what is the predicted metabolic rate for a body weight of 70 kg? Answer: 1305.39.

The rmr data set ships with ISwR, the R package that accompanies the book, so make sure to install that package first. After installing ISwR, here is a simple illustration to set up the problem.

library(ISwR)
plot(metabolic.rate~body.weight,data=rmr)
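Continuing from the plot, one possible way to fit the line and obtain the prediction is sketched below. It assumes the ISwR package is installed; rmr_lm and pred_70 are illustrative names.

```r
library(ISwR)

# Scatter plot of metabolic rate against body weight
plot(metabolic.rate ~ body.weight, data = rmr)

# Fit the simple linear regression and overlay the fitted line
rmr_lm <- lm(metabolic.rate ~ body.weight, data = rmr)
abline(rmr_lm)

# Predicted metabolic rate for a body weight of 70 kg
pred_70 <- predict(rmr_lm, data.frame(body.weight = 70))
print(pred_70)
```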














