Module #5- Correlation Analysis

This week was particularly interesting to me. After completing Statistics 1 and 2, incorporating the programming language R into mathematical concepts was a new and exciting experience. Learning the syntax and methods of R and then applying them to statistical problems made the subject feel more dynamic. It was really cool to see how programming and math work together, allowing for more efficient calculations, data analysis, and visualization. This hands-on approach gave me a deeper understanding of the concepts and made the learning process even more engaging. To begin with the homework: 

Question 1:

The director of manufacturing at a cookie company needs to determine whether a new machine is able to produce a particular type of cookie according to the manufacturer's specifications, which indicate that cookies should have a mean breaking strength of 70 pounds and a standard deviation of 3.5 pounds. A sample of 49 cookies reveals a sample mean breaking strength of 69.1 pounds. 
A. State the null and alternative hypotheses.

Null (H0): μ = 70. The mean breaking strength is 70 pounds, so the machine is meeting the manufacturer's specifications. 
Alt Hypothesis (H1): μ ≠ 70. The mean breaking strength is not 70 pounds, so the machine is not meeting the manufacturer's specifications. 

B. Is there evidence that the machine is not meeting the manufacturer's specifications for average strength? Use a 0.05 level of significance.

Fail to reject the null hypothesis. The test statistic is z = (69.1 - 70) / (3.5 / √49) = -1.8, which falls between the two-tailed critical values of +/- 1.96. So, at the 0.05 level of significance, there is no evidence that the machine is not meeting the manufacturer's specifications for average strength.
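
As a quick check of part B, the z statistic and critical value can be computed in R. This is a minimal sketch using the numbers from the question; the variable names are my own and not part of the assignment.

```r
# Values from Question 1; variable names are illustrative.
mu0   <- 70     # hypothesized mean breaking strength (pounds)
sigma <- 3.5    # population standard deviation (pounds)
n     <- 49     # sample size
xbar  <- 69.1   # sample mean (pounds)
alpha <- 0.05   # level of significance

z      <- (xbar - mu0) / (sigma / sqrt(n))  # one-sample z statistic
z_crit <- qnorm(1 - alpha / 2)              # two-tailed critical value, about 1.96

reject_null <- abs(z) > z_crit              # FALSE here, so we fail to reject H0

cat("z =", round(z, 2), "| critical value = +/-", round(z_crit, 2),
    "| reject H0:", reject_null, "\n")
```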
C. Compute the p value and interpret its meaning.

The p value is 0.072 (two-tailed, for z = -1.8). 
This means that if the true mean were 70 pounds, there would be about a 7.2% chance of observing a sample mean at least as far from 70 as 69.1. If the p-value is less than the significance level (α = 0.05), we reject the null hypothesis; otherwise we fail to reject it. Since 0.072 is greater than 0.05, we fail to reject the null hypothesis: there is no evidence that the machine is not meeting the manufacturer's specifications.
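
The p value can be reproduced from the standard normal distribution with `pnorm()`. This is a minimal sketch, assuming a two-tailed test and the numbers from the question.

```r
# z statistic from Question 1: (sample mean - hypothesized mean) / standard error
z <- (69.1 - 70) / (3.5 / sqrt(49))

# Two-tailed p-value: probability of a z at least this extreme in either tail
p_value <- 2 * pnorm(-abs(z))

cat("p-value =", round(p_value, 3), "\n")      # about 0.072
cat("reject H0 at 0.05:", p_value < 0.05, "\n")
```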
D. What would be your answer in (B) if the standard deviation were specified as 1.75 pounds?




After changing the standard deviation to 1.75 pounds, the z value is (69.1 - 70) / (1.75 / √49) = -3.6. The critical value of +/- 1.96 remains the same, and -3.6 falls outside it, so the null hypothesis is rejected. At the 0.05 level of significance, there is evidence that the machine is not meeting the manufacturer's specifications for average strength when the standard deviation is 1.75 pounds.
E. What would be your answer in (B) if the sample mean were 69 pounds and the standard deviation is 3.5 pounds?

With these values the z value is (69 - 70) / (3.5 / √49) = -2, which falls just outside the critical value of +/- 1.96, so the null hypothesis is rejected. At the 0.05 level of significance, there is evidence that the machine is not meeting the manufacturer's specifications for average strength when the sample mean is 69 pounds and the standard deviation is 3.5 pounds.
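
Parts D and E each change only one input, so the same z calculation can be rerun with the new values. This is a minimal sketch; `z_stat` is a hypothetical helper name of my own, not something from the assignment.

```r
# Hypothetical helper: z statistic for a one-sample z-test with known sigma.
z_stat <- function(xbar, mu0, sigma, n) {
  (xbar - mu0) / (sigma / sqrt(n))
}

z_crit <- qnorm(1 - 0.05 / 2)  # two-tailed critical value at alpha = 0.05

z_d <- z_stat(xbar = 69.1, mu0 = 70, sigma = 1.75, n = 49)  # part D
z_e <- z_stat(xbar = 69,   mu0 = 70, sigma = 3.5,  n = 49)  # part E

# Collect both scenarios and flag whether |z| exceeds the critical value
results <- data.frame(
  part   = c("D", "E"),
  z      = round(c(z_d, z_e), 2),
  reject = abs(c(z_d, z_e)) > z_crit
)
print(results)
```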


Question 2:
If x̅ = 85, σ = standard deviation = 8, and n = 64, set up a 95% confidence interval estimate of the population mean μ. 

The interval is x̅ +/- 1.96 × σ / √n = 85 +/- 1.96 × 8 / 8 = 85 +/- 1.96. So the 95% confidence interval is (83.04, 86.96), that is, 83.04 < μ < 86.96.
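
The same interval can be checked in R. This is a minimal sketch, assuming a known population standard deviation (so the z critical value applies); the variable names are illustrative.

```r
xbar  <- 85   # sample mean
sigma <- 8    # population standard deviation
n     <- 64   # sample size

z_star <- qnorm(0.975)                # critical value for 95% confidence, about 1.96
margin <- z_star * sigma / sqrt(n)    # margin of error

ci <- c(lower = xbar - margin, upper = xbar + margin)
print(round(ci, 2))                   # lower 83.04, upper 86.96
```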

Question 3, using Correlation Analysis
The correlation coefficient analysis formula:

r = [nΣxy - (Σx)(Σy)] / √([nΣx² - (Σx)²][nΣy² - (Σy)²])

r: The correlation coefficient is denoted by the letter r.

n: Number of values. If we had five people we were calculating the correlation coefficient for, the value of n would be 5.

x: This is the first data variable.

y: This is the second data variable.

Σ: The Sigma symbol (Greek) tells us to calculate the “sum of” whatever is tagged next to it.
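
To see how the pieces of the formula fit together, it can be typed out term by term in R and compared with the built-in `cor()` function. This is a minimal sketch; the `x` and `y` values are made-up illustration data, not the assignment dataset.

```r
# Made-up illustration data (NOT the assignment dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)   # number of paired values

# Each sum in the formula, written out explicitly
numerator   <- n * sum(x * y) - sum(x) * sum(y)
denominator <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
r_manual    <- numerator / denominator

# cor() uses Pearson's method by default, so the two values should agree
r_builtin <- cor(x, y)

cat("manual r =", round(r_manual, 4), "| cor() =", round(r_builtin, 4), "\n")
```

For this toy data both approaches give r of about 0.77.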


Using the dataset downloadable below (i.e. the data you will use to create the vectors are located in the download link below), complete these tasks in RStudio:

x1 <- c(your data) e.g. girls_goals <- c(data1, data2, data3)
x2 <- c(your data) e.g. girls_time <- c(data1, data2, data3)
y1 <- c(your data) e.g. boys_goals ..........
y2 <- c(your data) e.g. boys_time............
Note: recall from the past couple of classes/assignments that x = c(data1, data2, data3) creates a vector of 3 data points and assigns it to the variable x.
Merge all in a dataframe:
df <- data.frame(x1, x2, y1, y2)
Compute the correlations:
cor(df) # Pearson is the default method
cor(df, method="pearson") # As Pearson correlation
cor(df, method="spearman") # As Spearman correlation



A. Spearman correlation: 1. The values have a perfect positive monotonic relationship, meaning their rank orderings agree exactly. 
B. Pearson correlation: This shows a very strong, positive linear correlation between the values. Although they are not all perfectly correlated, the coefficients range from 0.98 to 1, emphasizing the strong and positive nature of the relationship. 
C. The plot for correlation: 



The code for the correlogram is seen below. 
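
The code itself does not appear in this text. As a rough sketch of how a correlogram like this could be drawn with base R only, the example below builds the data frame, computes the Pearson correlation matrix, and draws it as a colored grid. The numbers are placeholder values chosen for illustration (not the assignment dataset), and the plotting approach is an assumption rather than the method actually used.

```r
# Placeholder values for illustration only; substitute the assignment data.
girls_goals <- c(2, 4, 5, 7, 9)
girls_time  <- c(10, 14, 15, 20, 26)
boys_goals  <- c(1, 3, 6, 6, 10)
boys_time   <- c(12, 13, 18, 21, 25)

df <- data.frame(girls_goals, girls_time, boys_goals, boys_time)
corr_matrix <- cor(df, method = "pearson")

# Draw the matrix as a colored grid, from blue (-1) through white (0) to red (+1)
k <- ncol(corr_matrix)
palette_fn <- colorRampPalette(c("blue", "white", "red"))
image(1:k, 1:k, corr_matrix, zlim = c(-1, 1), col = palette_fn(20),
      axes = FALSE, xlab = "", ylab = "", main = "Pearson correlation")
axis(1, at = 1:k, labels = colnames(corr_matrix), cex.axis = 0.7)
axis(2, at = 1:k, labels = rownames(corr_matrix), cex.axis = 0.7)

# Print each coefficient in its cell
text(row(corr_matrix), col(corr_matrix), labels = round(corr_matrix, 2))
```

Packages such as corrplot offer ready-made correlograms, but the base R version avoids any extra installation.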















