Advanced Statistics and Analysis using R Blog

Posts

Showing posts from February, 2025

Module # 7- Regression models

February 26, 2025

1. In this assignment's segment, we will use the following regression equation:

Y = a + bX + e

Where:
Y is the value of the dependent variable, what is being predicted or explained.
a (alpha) is a constant; it equals the value of Y when X = 0.
b (beta) is the coefficient of X, the slope of the regression line; it is how much Y changes for each one-unit change in X.
X is the value of the independent variable, what is predicting or explaining the value of Y.
e is the error term, the error in predicting the value of Y given the value of X (it is not displayed in most regression equations).

A reminder about the lm() function:
lm([target variable] ~ [predictor variables], data = [data source])

1.1 The data in this assignment:
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

1.1 Define the relationship mode...
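
As a minimal sketch of how the lm() reminder applies here, the R code below fits Y = a + bX to the x and y vectors given in the assignment and cross-checks the slope and intercept against the closed-form least-squares formulas. The names `dat`, `fit`, `b_manual`, and `a_manual` are illustrative and do not come from the assignment.

```r
# Data from section 1.1 of the assignment
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# 'dat' and 'fit' are illustrative names, not part of the assignment
dat <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = dat)

# Extract the intercept (a) and the slope (b) from the fitted model
a <- unname(coef(fit)["(Intercept)"])
b <- unname(coef(fit)["x"])

# Cross-check with the closed-form least-squares estimates
b_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_manual <- mean(y) - b_manual * mean(x)

print(c(intercept = a, slope = b))
```

Calling `summary(fit)` on the same object would also report standard errors and R-squared, should the assignment ask for them.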

Module # 6 assignment- Distributions

February 22, 2025

A. Consider a population consisting of the following values, which represent the number of ice cream purchases during the academic year for each of the five housemates: 8, 14, 16, 10, 11

a. Compute the mean of this population.
Mean: 11.8

b. Select a random sample of size 2 out of the five members. See the example used in the PowerPoint presentation, slide #13.

c. Compute the mean and standard deviation of your sample.
Mean: 15
SD: 1.41

d. Compare the mean and standard deviation of your sample to the entire population of this set (8, 14, 16, 10, 11).
The mean for the population was 11.8, with a standard deviation of 2.86. The mean for the sample was 15, with a standard deviation of 1.41. In our sample, the mean is noticeably higher than the population mean, which indicates that this particular sample may not be fully representative of the entire set. Additionally, the lower sample standard deviation suggests that the values in the sample are closer together ...
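
The comparison above can be reproduced with a short R sketch. Two assumptions are made here: the reported population standard deviation of 2.86 is taken to be the divide-by-n (population) form, since R's `sd()` divides by n - 1, and the sample is assumed to be (14, 16), which is the only pair from this population with a mean of 15. The variable names are illustrative.

```r
# Population of ice cream purchases for the five housemates
population <- c(8, 14, 16, 10, 11)

pop_mean <- mean(population)

# Population SD divides by n; R's sd() divides by n - 1,
# so it is computed by hand here
pop_sd <- sqrt(mean((population - pop_mean)^2))

# Assumed sample: inferred from the reported mean of 15 and SD of 1.41,
# not stated explicitly in the post
samp <- c(14, 16)
samp_mean <- mean(samp)
samp_sd <- sd(samp)  # sample SD (n - 1 denominator)

print(round(c(pop_mean = pop_mean, pop_sd = pop_sd,
              samp_mean = samp_mean, samp_sd = samp_sd), 2))
```

To draw a fresh random sample of size 2 instead of the assumed one, `sample(population, size = 2)` could be used; the results would then vary from run to run.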

Module #5- Correlation Analysis

February 13, 2025

This week was particularly interesting to me. After completing Statistics 1 and 2, incorporating the programming language R into mathematical concepts was a new and exciting experience. Learning the syntax and methods of R and then applying them to statistical problems made the subject feel more dynamic. It was really cool to see how programming and math work together, allowing for more efficient calculations, data analysis, and visualization. This hands-on approach gave me a deeper understanding of the concepts and made the learning process even more engaging.

To begin with the homework:

Question 1: The director of manufacturing at a cookie company needs to determine whether a new machine is able to produce a particular type of cookie according to the manufacturer's specifications, which indicate that cookies should have a mean breaking strength of 70 pounds and a standard deviation of 3.5 pounds. A sample of 49 cookies reveals a sample mean breaking strength of 69.1 pounds. A. State the null...
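
As a hedged sketch of how Question 1 could be set up in R, the code below computes a one-sample z statistic from the figures given in the question. It assumes a two-sided test of H0: mu = 70 against H1: mu != 70 with a known standard deviation; the excerpt is cut off before the hypotheses and significance level are stated, so those are assumptions and no reject/fail-to-reject decision is made here. The variable names are illustrative.

```r
# Figures given in Question 1
mu0   <- 70    # specified mean breaking strength (pounds)
sigma <- 3.5   # specified standard deviation (pounds)
n     <- 49    # sample size
xbar  <- 69.1  # observed sample mean (pounds)

# Standard error of the mean and z statistic
se <- sigma / sqrt(n)
z  <- (xbar - mu0) / se

# Two-sided p-value (assumption: the test is two-sided)
p_value <- 2 * pnorm(-abs(z))

print(c(z = z, p_value = p_value))
```

The p-value would then be compared with whatever significance level the assignment specifies.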

Module # 4 Probability theory

February 04, 2025

Feb 4, 2025

This week, we are getting into probability theory. I have taken Statistics previously, so it was fun to jog my memory. I found this week's problems slightly challenging, since they required applying new concepts within R. Bayes' theorem was particularly interesting: it's useful not just in theoretical applications but also in everyday situations, like interpreting predictions or making decisions based on uncertain information. I'm looking forward to mastering these concepts further and using R more effectively to solve complex problems.

To begin with the homework:

A. Based on Table 1, what is the probability of:

        B     B1
A       10    20
A1      20    40

A1. Event A: 0.33
A2. Event B: 0.33
A3. Event A or B: 0.56
A4. P(A or B) = P(A) + P(B)? P(A) + P(B) = 0.67, which is not equal to P(A or B) = 0.56, because A and B overlap.

B. Jane is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actuall...
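
The Table 1 answers can be checked with a short R sketch. It assumes the table is read with rows A and A1 and columns B and B1, as the flattened layout suggests; the object names (`tab`, `p_A`, and so on) are illustrative.

```r
# Table 1 as a 2x2 matrix of counts (rows: A, A1; columns: B, B1)
tab <- matrix(c(10, 20,
                20, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "A1"), c("B", "B1")))

total <- sum(tab)

p_A       <- sum(tab["A", ]) / total   # marginal probability of A
p_B       <- sum(tab[, "B"]) / total   # marginal probability of B
p_A_and_B <- tab["A", "B"] / total     # joint probability of A and B

# Addition rule: subtract the overlap so it is not counted twice
p_A_or_B <- p_A + p_B - p_A_and_B

# Simple sum, which is only valid for mutually exclusive events
p_sum <- p_A + p_B

print(round(c(p_A = p_A, p_B = p_B, p_A_or_B = p_A_or_B, p_sum = p_sum), 2))
```

The gap between `p_A_or_B` and `p_sum` is exactly `p_A_and_B`, which is why the simple sum overstates the probability when the events can occur together.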

Module 3- Descriptive Statistics

February 02, 2025

2/2/25

It is week 3, and this week we are beginning to use R for descriptive statistics. The first question is pasted below:

The following are two sets of data, each consisting of 7 observations (n = 7).
Set#1: 10, 2, 3, 2, 4, 2, 5
Set#2: 20, 12, 13, 12, 14, 12, 15

1. For each set, compute the mean, median, and mode under Central Tendency
Set 1: Mean- 4 Median- 3 Mode- 2
Set 2: Mean- 14 Median- 13 Mode- 12

2. For each set, compute the range, interquartile range, variance, and standard deviation under Variation
Set 1: Range- (2, 10) Interquartile range- 2.5 Variance- 8.33 Standard Deviation- 2.89
Set 2: Range- (12, 20) Interquartile range- 2.5 Variance- 8.33 Standard Deviation- 2.89

3. Compare your results between set#1 vs. set#2 by discussing the differences between the two sets
As seen by the data above, Set#1 has a higher Coefficient of Variation (CV) ...
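
A minimal R sketch for recomputing these measures is given below. Base R has no built-in function for the statistical mode, so `stat_mode()` is a hypothetical helper written for this example, as is `describe()`; `IQR()` uses R's default quantile method, and `var()` and `sd()` use the n - 1 denominator.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# Hypothetical helper: most frequent value (first one if there are ties)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical helper: collect the measures asked for in questions 1-3
describe <- function(v) {
  c(mean   = mean(v),
    median = median(v),
    mode   = stat_mode(v),
    min    = min(v),
    max    = max(v),
    iqr    = IQR(v),           # default quantile method (type 7)
    var    = var(v),           # sample variance (n - 1)
    sd     = sd(v),            # sample standard deviation (n - 1)
    cv     = sd(v) / mean(v))  # coefficient of variation
}

d1 <- describe(set1)
d2 <- describe(set2)

print(round(rbind(set1 = d1, set2 = d2), 2))
```

Because Set#2 is Set#1 shifted up by 10, the spread measures come out identical while the centre measures shift, so the coefficient of variation is larger for the set with the smaller mean.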