Posts

Final Project

 4/24/25 Problem Description: Florida is one of the world’s premier tourism destinations, attracting well over 100 million visitors annually. Yet visitor counts vary dramatically by season—peaking in winter months when travelers escape colder climates, and again in summer when families take vacations between school sessions. This volatility poses operational and strategic challenges for a wide range of stakeholders, from hotel and theme-park operators to transportation agencies and local governments. In an increasingly digital world, consumers signal their travel intentions earlier—often via online search engines. Google Trends data therefore offers a real-time, high-frequency proxy for consumer interest in “Florida vacation” and “beach vacation.” This project seeks to harness those digital search signals to build a timelier, more responsive forecasting model of weekly visitor volumes in Florida. By demonstrating that search interest—especially in “beach” vacations—correlates stron...

Module # 12 assignment- Time Series and Exponential Smoothing Model

4/10/25 The table below represents charges for a student credit card.

a. Construct a time series plot using R.
b. Employ the Exponential Smoothing Model as outlined in Avril Coghlan's notes and report the statistical outcome.
c. Provide a discussion of the time series and Exponential Smoothing Model results that you obtained.

Month   2012   2013
Jan     31.9   39.4
Feb     27     36.2
March   31.3   40.5
Apr     31     44.6
May     39.4   46.8
Jun     40.7   44.7
Jul     42.3   52.2
Aug     49.5   54
Sep     45     48.8
Oct     50     55.8
Nov     50.9   58.7
Dec     58.5   63.4

A. The time series plot illustrates an overall upward trend in student credit card charges from January to December for both 2012 and 2013, with 2013 consistently showing higher monthly charges than 2012. This suggests a year-over-year rise in credit card usage among students. Notably, the most significant increases occurred in February and July, while December recorded the highest charges in both years: $58.5 in 2012 and $63.4 in...
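For readers who want to reproduce parts a and b, here is a minimal sketch in base R. It assumes the 24 monthly values are combined into a single series starting in January 2012, and that "exponential smoothing" means simple exponential smoothing fitted with HoltWinters() (trend and seasonal terms switched off). The variable names are my own, not from the assignment.

```r
# Monthly student credit card charges, copied from the table in the post
charges_2012 <- c(31.9, 27, 31.3, 31, 39.4, 40.7, 42.3, 49.5, 45, 50, 50.9, 58.5)
charges_2013 <- c(39.4, 36.2, 40.5, 44.6, 46.8, 44.7, 52.2, 54, 48.8, 55.8, 58.7, 63.4)

# Assumption: treat both years as one monthly series, Jan 2012 to Dec 2013
charges_ts <- ts(c(charges_2012, charges_2013), start = c(2012, 1), frequency = 12)

# (a) Time series plot
plot(charges_ts, xlab = "Month", ylab = "Charges",
     main = "Student credit card charges")

# (b) Simple exponential smoothing: beta = FALSE drops the trend term,
#     gamma = FALSE drops the seasonal term, alpha is estimated from the data
fit_ses <- HoltWinters(charges_ts, beta = FALSE, gamma = FALSE)

print(fit_ses$alpha)  # estimated smoothing parameter
print(fit_ses$SSE)    # sum of squared one-step-ahead errors
plot(fit_ses)         # observed series with the smoothed fit overlaid
```

An alpha close to 1 means the fit leans mostly on the latest observations, while a value near 0 means older observations still carry weight.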

Module #11 Assignment- model matrices

4/3/25 As we can see with the results above, the additive linear model demonstrated a strong overall fit, with an R² value of 0.7566, indicating that approximately 75.7% of the variance in VAS scores is explained by the model. The adjusted R², which accounts for the number of predictors, was lower at 0.4969, reflecting the complexity of the model and its many subject-level terms. The F-statistic yielded a p-value of 0.022, confirming that the model is statistically significant overall. Notably, the treatment effect was substantial and negative, suggesting that the active drug significantly reduced pain levels. There was also considerable variation across subjects, which is expected in a crossover design. Although the period effect was dropped due to multicollinearity, the model still effectively captured the key treatment effect and subject differences. This data contains additive effects on subjects, period and treatment. Compare the results with those obtained from t ...

Module #10 Assignment - ANOVA (analysis of variance) and Regression coefficients

3/28/25 This is from the Multiple Linear Regression chapter 11 of "Introductory Statistics with R", pp. 185-194. I revised this question, so please follow my description only. Conduct ANOVA (analysis of variance) and Regression coefficients on the cystfibr data: data("cystfibr"). Note that the dataset is part of the ISwR package in R. You can choose any variable you like. In your report, you need to state the result of Coefficients (intercept) to any variables you like both under ANOVA and multivariate analysis. I am specifically looking at your interpretation of R results.

Extra clue: The model code:
i. lm(formula = pemax ~ age + weight + bmp + fev1, data = cystfibr)
ii. anova(lm(pemax ~ age + weight + bmp + fev1, data = cystfibr))

Interpretation of R results: Regression Coefficients: The multiple linear regression model examines how age, weight, bmp (body mass as a percentage of normal), and fev1 (forced expiratory volume) predict pem...
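The two model calls above can be run side by side to compare the coefficient table with the sequential ANOVA table. This is a small sketch, assuming the ISwR package is installed; the guard skips the fit if it is not, and the object name fit is my own.

```r
# Assumption: the ISwR package (which ships the cystfibr data) is installed
if (requireNamespace("ISwR", quietly = TRUE)) {
  data("cystfibr", package = "ISwR")

  # Multiple regression of pemax on the four predictors named in the post
  fit <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

  # Regression coefficients: estimate, std. error, t value, p-value per term
  print(summary(fit)$coefficients)

  # Sequential (Type I) ANOVA: each term is tested after the terms listed before it
  print(anova(fit))
}
```

Because anova() tests terms in sequence, its p-values depend on the order of the predictors in the formula, while the t-tests in the coefficient table adjust each predictor for all the others.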

Module # 9 assignment- prop tables

3/11/25 In this assignment, you have two questions.

1. Your data.frame is
assignment_data <- data.frame(Country = c("France", "Spain", "Germany", "Spain", "Germany", "France", "Spain", "France", "Germany", "France"), age = c(44, 27, 30, 38, 40, 35, 52, 48, 45, 37), salary = c(6000, 5000, 7000, 4000, 8000, 5500, 4500, 6000, 7500, 5000), Purchased = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"))
Generate a simple table in R that consists of four columns: Country, age, salary and Purchased.

2. Generate a contingency table, also known as an r x c table, using the mtcars dataset, i.e. data(mtcars)
assignment9 <- table(mtcars$gear, mtcars$cyl, dnn = c("gears", "cylinders"))
2.1 Add the addmargins() function to report on the sum totals of the rows and columns of ass...
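Both questions can be sketched in a few lines of base R. The data frame and the table() call come from the assignment text. The prop.table() calls are my assumption about where the exercise goes next, based on the post title, since the excerpt is cut off. The names with_totals, cell_props and row_props are also my own.

```r
# Question 1: the data frame from the assignment, printed as a simple table
assignment_data <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"),
  age       = c(44, 27, 30, 38, 40, 35, 52, 48, 45, 37),
  salary    = c(6000, 5000, 7000, 4000, 8000, 5500, 4500, 6000, 7500, 5000),
  Purchased = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes")
)
print(assignment_data)

# Question 2: gears-by-cylinders contingency table from the built-in mtcars data
assignment9 <- table(mtcars$gear, mtcars$cyl, dnn = c("gears", "cylinders"))

# 2.1: append row and column totals
with_totals <- addmargins(assignment9)
print(with_totals)

# Assumed follow-up: proportions of the grand total, then within each row
cell_props <- prop.table(assignment9)
row_props  <- prop.table(assignment9, margin = 1)
print(round(cell_props, 3))
print(round(row_props, 3))
```

Using margin = 2 instead of margin = 1 would give proportions within each cylinder column.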

Module # 8 Assignment- ANOVA test

3/6/25 This week's work was both engaging and insightful. I particularly enjoyed the problems we worked on, as they were highly relevant and informative. The exercises provided a great opportunity to apply statistical concepts in a practical way, reinforcing my understanding of ANOVA tests, t-tests, and data structuring in R. Even though the ANOVA test didn’t show a significant difference, it was a good reminder that not every dataset will give clear results.  1.  A researcher is interested in the effects of drug against stress reaction. She gives a reaction time test to three different groups of subjects: one group that is under a great deal of stress, one group under a moderate amount of stress, and a third group that is under almost no stress. The subjects of the study were instructed to take the drug test during their next stress episode and to report their stress on a scale of 1 to 10 (10 being most pain). High Stress Moderate Stress Low Stress 10...

Module # 7- Regression models

1. In this assignment's segment, we will use the following regression equation

Y = a + bX + e

Where:
Y is the value of the Dependent variable (Y), what is being predicted or explained
a or Alpha, a constant; equals the value of Y when the value of X = 0
b or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one-unit change in X
X is the value of the Independent variable (X), what is predicting or explaining the value of Y
e is the error term; the error in predicting the value of Y, given the value of X (it is not displayed in most regression equations).

A reminder about the lm() function:
lm([target variable] ~ [predictor variables], data = [data source])

1.1 The data in this assignment:
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

1.1 Define the relationship mode...
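As a quick illustration of how the equation maps onto lm(), the sketch below fits Y = a + bX + e to the x and y vectors given above. The object name fit is my own. The intercept corresponds to a and the coefficient on x corresponds to b.

```r
# Data from the assignment
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Fit the simple linear regression y = a + b*x + e
fit <- lm(y ~ x)

# "(Intercept)" is a (alpha); "x" is b (beta, the slope)
print(coef(fit))

# Scatter plot with the fitted regression line
plot(x, y, main = "y versus x with fitted line")
abline(fit)
```

The slope can be checked by hand as the sum of cross-deviations of x and y divided by the sum of squared deviations of x, which is the same quantity lm() reports for b.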