## -Dependent (or response or target) variable vs explanatory (or independent or predictor) variable

### statistics

##### Description

Chapter 10 – Regression Analysis: Estimating Relationships

-Dependent (or response or target) variable vs explanatory (or independent or predictor) variable

-difference between simple and multiple regression

-Creating a scatterplot in Excel

-Keep track of which variable is your X (explanatory, horizontal axis) and which is your Y (dependent, vertical axis)

-What outliers are and how to deal with them

-Correlation

-What does correlation tell us? Strength and direction of the relationship.

-Range from -1 to 1

-Finding correlation in Excel: =CORREL() or regression output
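
To make the correlation notes concrete, here is a minimal Python sketch of the calculation behind =CORREL(). The data (`hours`, `scores`) and the function name `correlation` are made up for illustration and are not from the chapter.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: strength and direction of a linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = correlation(hours, scores)
print(round(r, 3))  # close to +1: strong positive relationship
```

A value near +1 or -1 signals a strong linear relationship; the sign gives the direction.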

-Simple regression

-Method -> Least Squares Estimation: minimizes the sum of the squared residuals.

-This is the regression line Excel provides

-Finding the regression line in Excel

-Formulas: =SLOPE() and =INTERCEPT()

-Argument order: Known Y’s first, then Known X’s, e.g. =SLOPE(known_y's, known_x's)

-Coefficients on the regression output

-Percentage of variation explained R^2

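-R^2 is the percentage of the variation in the dependent variable that is explained by the regression.

-Remember: the coefficient for X in simple regression means that for every one-unit increase in X, Y is expected to change by that amount (an increase if the coefficient is positive, a decrease if it is negative).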

-Finding in Excel: =RSQ(), square the correlation (simple regression only), or look on the regression output
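
The simple regression notes above can be sketched in Python as follows. This is an illustrative version of what =SLOPE(), =INTERCEPT() and =RSQ() compute, using the standard least squares formulas; the data and the function names are hypothetical.

```python
def least_squares(xs, ys):
    """Slope and intercept of the line that minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in Y explained by the regression line."""
    mean_y = sum(ys) / len(ys)
    # Residual = actual Y minus the Y predicted by the line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
slope, intercept = least_squares(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(r2, 2))
```

With this made-up data the slope is about 5.6, so each extra hour is associated with an expected increase of about 5.6 points, and roughly 98% of the variation in scores is explained.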

-Multiple regression

-Each coefficient in multiple regression is the expected change in Y when that particular X increases by one unit and all the other Xs in the equation remain constant.

-Read much the same way as simple regression.

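-Use the Adjusted R^2 with multiple regression: ordinary R^2 never decreases when another X is added, while Adjusted R^2 penalizes extra explanatory variables.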
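
As a hedged illustration of why Adjusted R^2 is preferred in multiple regression, the sketch below applies the usual textbook formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables. The function name and the numbers are assumptions for the example.

```python
def adjusted_r_squared(r2, n_obs, n_explanatory):
    """Adjusted R^2: R^2 penalized for the number of explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_explanatory - 1)

# Hypothetical case: same R^2 of 0.80 from 30 observations,
# once with 2 explanatory variables and once with 10.
few_vars = adjusted_r_squared(0.80, 30, 2)
many_vars = adjusted_r_squared(0.80, 30, 10)
print(round(few_vars, 3), round(many_vars, 3))
```

The same R^2 looks less impressive once it has been "bought" with many explanatory variables, which is the point of the adjustment.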

-Dummy Variables

-A variable that equals 1 if the observation is in a particular category and 0 if it is not.

-You will need one fewer dummy variable than you have categories

-Ex. Gender: one dummy variable where a 1 means the person is female

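-Ex. Quarterly observations: three dummy variables (for four quarters), each equal to 1 when the observation was taken during its quarter. The quarter without a dummy is the reference category.

-The coefficient of a dummy variable is the amount that being in that category adds to the expected outcome, relative to the reference category (the one without a dummy), holding the other Xs constant.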
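
The quarterly example can be sketched in Python like this. The function name, the column names (`is_Q1`, ...) and the choice of Q4 as the reference category are assumptions for illustration; any quarter could serve as the reference.

```python
def quarter_dummies(quarter, reference="Q4"):
    """Encode a quarter as 0/1 dummies, leaving out the reference category."""
    categories = ["Q1", "Q2", "Q3", "Q4"]
    # One fewer dummy than categories: the reference quarter gets all zeros
    return {f"is_{c}": int(quarter == c) for c in categories if c != reference}

print(quarter_dummies("Q2"))  # one dummy is 1, the others 0
print(quarter_dummies("Q4"))  # reference quarter: all dummies are 0
```

Including a fourth dummy would make the dummies add up to 1 for every observation, which duplicates the intercept; that is why one category is left out.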

Chapter 11 – Regression Analysis: Statistical Inference

-Regression assumptions

- 1. There is a population regression line. It joins the means of the dependent variable for all values of the explanatory variables. For any fixed values of the explanatory variables, the mean of the errors is zero.

- 2. For any values of the explanatory variables, the variance (or standard deviation) of the dependent variable is a constant, the same for all such values.

-A fan-shaped scatterplot suggests the data violate this assumption (nonconstant error variance)

- 3. For any values of the explanatory variables, the dependent variable is normally distributed.

- 4. The errors are probabilistically independent.

-Check using the runs test
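
One common form of the runs test can be sketched in Python as below. It counts runs of positive and negative residuals and compares the count with what independence would predict, using a large-sample normal approximation. This may differ in detail from the version in the textbook or an Excel add-in; the residuals are made up, and the sketch assumes both signs occur at least once.

```python
import math
from statistics import NormalDist

def runs_test(residuals):
    """Runs test on residual signs (normal approximation, two-sided)."""
    signs = [r > 0 for r in residuals]  # True = positive residual
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    # A new run starts every time the sign changes
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n_pos * n_neg / n + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n)) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p_value

# Hypothetical residuals in time order that alternate in sign
residuals = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.7, -0.6, 1.0, -1.3]
runs, z, p_value = runs_test(residuals)
print(runs, round(z, 2), round(p_value, 4))
```

Far too many runs (as here) or far too few runs both suggest the errors are not independent.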

-Regression coefficients

-t-test and p-value

-all coefficients have two values associated with them, a t-stat and a p-value

-the null hypothesis for the t-test is “The coefficient for this variable is 0” - bad

-the alternative hypothesis for the t-test is “The coefficient for this variable is not 0” – good

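-the p-value is the probability of getting a t-stat at least as extreme as the one observed if the null hypothesis were true (that is, if the actual coefficient were 0). It is not the probability that the null hypothesis is true.

-if the p-value is small (conventionally below .05), we reject the null hypothesis and conclude that the coefficient is significantly different from 0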
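
As a rough sketch of where the t-stat on the regression output comes from, the Python below computes it for the slope in a simple regression: the coefficient divided by its standard error. The data and the function name are hypothetical. The p-value step is left out because it needs the t distribution with n - 2 degrees of freedom; in Excel, =T.DIST.2T(ABS(t), n-2) should give the two-tailed value.

```python
import math

def slope_t_stat(xs, ys):
    """t-stat for testing whether the slope in a simple regression is 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err_estimate = math.sqrt(sse / (n - 2))
    std_err_slope = std_err_estimate / math.sqrt(sxx)
    return slope / std_err_slope

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
t_stat = slope_t_stat(hours, scores)
print(round(t_stat, 2))
```

A t-stat far from 0 corresponds to a small p-value, which is the case where the null hypothesis of a zero coefficient is rejected.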