# Statistics Final Exam Review Essay Sample


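p-value = the probability, if the null hypothesis were true, of getting a sample result at least as extreme as the one observed. Rejecting the null may still be a mistake (a Type I error); the significance level caps how often that happens.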

ONE SAMPLE
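3 formulas for the p-value, where a = hypothesized value of mu, t = (sample mean – a) / SE and df = sample size – 1:
* T.DIST.RT(t, sample size – 1) -> alternative: mu is bigger than a
* 1 – T.DIST.RT(t, sample size – 1) -> alternative: mu is less than a
* T.DIST.2T(ABS(t), sample size – 1) -> alternative: mu is not equal to a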
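The three one-sample formulas can be sketched in Python. This is a minimal sketch that assumes no statistics library is available. `t_dist_rt` and `one_sample_p_values` are hypothetical helpers: the first integrates the t density numerically with Simpson's rule to stand in for Excel's T.DIST.RT. The sample data are made up.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_dist_rt(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), playing the role of Excel's T.DIST.RT(t, df).

    Simpson's rule over [0, |t|] plus the symmetry of the t density;
    `steps` must be even.
    """
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sample_p_values(data: list[float], a: float) -> dict[str, float]:
    """t statistic and p-values for H0: mu = a against each alternative."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)   # STDEV.S / SQRT(n)
    t = (statistics.mean(data) - a) / se
    df = n - 1
    return {
        "t": t,
        "mu > a": t_dist_rt(t, df),              # T.DIST.RT(t, n - 1)
        "mu < a": 1 - t_dist_rt(t, df),          # 1 - T.DIST.RT(t, n - 1)
        "mu != a": 2 * t_dist_rt(abs(t), df),    # T.DIST.2T(ABS(t), n - 1)
    }


result = one_sample_p_values([5.1, 4.9, 5.6, 5.8, 6.0, 5.4], a=5.0)
print(result)
```

The one-sided p-values always add up to 1. The two-sided p-value is twice the smaller of them.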

If p < significance level, reject the null.

NEVER accept the null; you can only fail to reject it.

TWO SAMPLE
T.TEST gives the p-value directly.
The p-value is the chance that, under the null hypothesis, you would see a difference in sample means as extreme as (or more extreme than) the one you have now. If that probability is small, the difference is likely real rather than due to chance.

* Paired: T.TEST(sample 1, sample 2, # of tails, 1)
* not equal to: number of tails = 2
* greater than or less than: number of tails = 1 (T.TEST does not know the direction, so check that the sample difference points the way the alternative says)
* Type 1 = paired data (ex: every UNC MBA student's salary before entering the program and after graduating: did salaries increase significantly after the MBA?)

* Independent: T.TEST(sample 1, sample 2, # of tails, 2)
* Type 2 = independent samples, assuming equal variances (ex: UNC MBAs vs. Duke MBAs)
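Below is a rough Python analogue of the two T.TEST types above. It assumes the textbook formulas: for paired data, a one-sample t on the differences; for type 2, a pooled-variance t. The names `t_test` and `t_tail` are hypothetical stand-ins, not Excel's implementation. The tail probability comes from numerically integrating the t density, and the salaries are made up.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > |t|), integrating the density with Simpson's rule (even steps)."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def t_test(sample1, sample2, tails: int, test_type: int) -> tuple[float, float]:
    """Hypothetical stand-in for T.TEST, types 1 (paired) and 2 (pooled)."""
    if test_type == 1:
        if len(sample1) != len(sample2):
            raise ValueError("paired samples need the same length")
        diffs = [x - y for x, y in zip(sample1, sample2)]
        n = len(diffs)
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
        df = n - 1
    elif test_type == 2:
        n1, n2 = len(sample1), len(sample2)
        pooled_var = ((n1 - 1) * statistics.variance(sample1)
                      + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
        df = n1 + n2 - 2
    else:
        raise ValueError("this sketch only covers types 1 and 2")
    # tails = 1 halves the two-tailed p-value; the direction is not checked.
    return t, tails * t_tail(t, df)


before = [80, 95, 70, 88, 76]     # made-up salaries (thousands) before the MBA
after = [110, 120, 95, 118, 101]  # same students after graduating
print(t_test(after, before, tails=1, test_type=1))
```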

* Regression coefficient:
* Null hypothesis: THIS regression coefficient = 0
* Alternative hypothesis: THIS particular regression coefficient of interest is not 0
* (read the driver's coefficient and p-value from the regression output, in the coefficients table under the ANOVA)
* If THIS driver's p-value is less than the significance level, the driver has a significant impact on the outcome.
* Run this test for each individual driver.
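For reference, this assumes the standard least-squares output. The t Stat in each coefficient row is the coefficient divided by its standard error. Its p-value is two-tailed with the residual degrees of freedom, where n is the sample size and k is the number of drivers:

$$
t_j = \frac{b_j}{SE(b_j)}, \qquad df_{\text{residual}} = n - k - 1, \qquad p_j = \texttt{T.DIST.2T}\left(|t_j|,\ df_{\text{residual}}\right)
$$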

* Hypothesis test on a regression model as a whole:
* Null: all slope coefficients = 0 (R square = 0)
* Alt: at least ONE slope coefficient is not equal to 0 (R square > 0)
* The p-value is SIGNIFICANCE F in the ANOVA table.

If Significance F < significance level, the regression model is significant as a whole.
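The F statistic behind Significance F can be recomputed from R square. This is a minimal sketch using the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)). The function name `overall_f` and the numbers in the usage line are hypothetical. The p-value step is left to F.DIST.RT rather than reimplemented here.

```python
def overall_f(r_squared: float, n: int, k: int) -> tuple[float, int, int]:
    """Overall F for a regression with k drivers and n observations.

    Returns (F, df_regression, df_residual). Significance F would then be
    F.DIST.RT(F, df_regression, df_residual) in Excel.
    """
    df_reg = k               # "Regression" row of the ANOVA df column
    df_res = n - k - 1       # "Residual" row
    f = (r_squared / df_reg) / ((1 - r_squared) / df_res)
    return f, df_reg, df_res


print(overall_f(0.5, n=23, k=2))  # hypothetical R square, sample size, drivers
```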

* Hypothesis tests for full versus partial regression models:
* Question: is the partial model worse than the full model?
* 2 models on the same data, where one model's drivers are a subset of the other model's drivers.
* Full model: MUST HAVE MORE DRIVERS THAN THE OTHER, MUST CONTAIN ALL THE DRIVERS IN THE PARTIAL MODEL, AND SHOULD BE BASED ON THE SAME DATA
* Partial F test:
* Null: the partial model is as good as the full model (R square full = R square partial)
* Alt: the partial model is less explanatory than the full model (R square full > R square partial)

Adding drivers never decreases R square, so the test asks whether the full model's extra R square is significant.

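* 1) First find the partial F value:

$$
F_{\text{partial}} = \frac{\left(R^2_{\text{full}} - R^2_{\text{partial}}\right) / (\#\text{ variables removed})}{\left(1 - R^2_{\text{full}}\right) / df_{\text{residual}}}
$$

* numerator: the gain in R square per variable removed
* denominator:
* (1 – R square full): percentage not explained by the full model
* / df of residual: sample size – number of drivers in the full model – 1
* It's in the full model's ANOVA output: the df column has Regression, Residual, and Total rows; pick RESIDUAL
* 2) Then find the p-value: F.DIST.RT(partial F, # variables removed, df residual)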
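The partial F computation can be sketched as follows, assuming the standard formula: F = ((R² full − R² partial) / # variables removed) / ((1 − R² full) / df residual), with df residual = sample size − drivers in the full model − 1. The function name `partial_f` and the example R square values are hypothetical. The returned degrees of freedom are the ones to pass to F.DIST.RT.

```python
def partial_f(r2_full: float, r2_partial: float, n: int,
              k_full: int, removed: int) -> tuple[float, int, int]:
    """Partial F statistic plus the two df values for F.DIST.RT."""
    df_res = n - k_full - 1                           # residual df of the FULL model
    numerator = (r2_full - r2_partial) / removed      # extra R square per removed driver
    denominator = (1 - r2_full) / df_res              # unexplained share per residual df
    return numerator / denominator, removed, df_res


# Hypothetical: full model with 5 drivers on 30 rows, partial model drops 2 of them.
print(partial_f(0.80, 0.70, n=30, k_full=5, removed=2))
```

If the partial model drops every driver (R square partial = 0), this reduces to the overall F test for the full model.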

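* Hypothesis tests for autocorrelation in residuals:
* DW (Durbin-Watson) statistic
* DW = SUMXMY2(all but first residual, all but last residual) / SUMSQ(all residuals)
* To look up dL and dU in the DW table:
* n = number of observations
* k = number of drivers
* DW < dL: positive autocorrelation
* dL ≤ DW ≤ dU: inconclusive
* DW > dU: no evidence of positive autocorrelation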
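The spreadsheet DW formula translates directly into Python. This sketch assumes the residuals are listed in time order; `durbin_watson` is a hypothetical helper name, not a library function.

```python
def durbin_watson(residuals: list[float]) -> float:
    """Durbin-Watson statistic for residuals listed in time order."""
    # SUMXMY2(e_2..e_n, e_1..e_(n-1)): squared differences of consecutive residuals
    num = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    # SUMSQ(e_1..e_n)
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals: no change, DW = 0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # sign flips every period: DW = 3
```

Values near 2 suggest no autocorrelation. Residuals that drift slowly (a clear pattern) push DW toward 0, which is the "DW < dL" case.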

* CONFIDENCE INTERVAL FOR POPULATION MEAN
* one sample
* With a chosen level of confidence, the population mean lies between these two values.
* MoE = critical t times the standard error (SE, the estimated standard error of the sample mean)
* critical t = T.INV.2T(significance level, df), where df = sample size – 1
* SE = STDEV.S / SQRT(sample size)
* Exact MoE = critical t * SE
* Exact CI: sample mean ± MoE
* Approximate 95% confidence: use 2 as the critical t, so MoE = 2 * SE
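Both versions of the interval fit in one small function. This is a minimal sketch: `mean_ci` is a hypothetical name, and `critical_t` defaults to the approximate 95% value of 2. For the exact interval, pass the value Excel's T.INV.2T returns (for example, T.INV.2T(0.05, 4) ≈ 2.776 when the sample size is 5).

```python
import math
import statistics


def mean_ci(data: list[float], critical_t: float = 2.0) -> tuple[float, float]:
    """Sample mean ± critical_t * SE, with SE = STDEV.S / SQRT(n)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    moe = critical_t * se
    m = statistics.mean(data)
    return m - moe, m + moe


data = [1, 2, 3, 4, 5]                  # made-up sample
print(mean_ci(data))                    # approximate 95% (critical t = 2)
print(mean_ci(data, critical_t=2.776))  # exact 95% with df = 4
```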

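* Confidence interval for population proportion
* Indicator variables:
* 1 = falls within the category
* 0 = does not
* Sample proportion = average of the indicator column
* Approximate & conservative 95% MoE = 1 / SQRT(sample size) (conservative because it uses the worst case, proportion = 0.5)
* Confidence interval: sample proportion ± MoE
* MoE of 10%: how many students do I need to ask?
* 0.10 = 1 / SQRT(n), so n = 1 / 0.10² = 100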
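The conservative proportion interval and the sample-size question can be sketched like this, assuming 0/1 indicator data. `proportion_ci` and `sample_size_for_moe` are hypothetical helper names, and the 60-out-of-100 survey is made up.

```python
import math


def proportion_ci(indicators: list[int]) -> tuple[float, float]:
    """Conservative 95% CI: sample proportion ± 1 / SQRT(n)."""
    n = len(indicators)
    p_hat = sum(indicators) / n      # average of the 1/0 indicator column
    moe = 1 / math.sqrt(n)
    return p_hat - moe, p_hat + moe


def sample_size_for_moe(moe: float) -> int:
    """Solve moe = 1 / SQRT(n) for n, rounding up to a whole respondent."""
    return math.ceil((1 / moe) ** 2)


survey = [1] * 60 + [0] * 40          # made-up: 60 of 100 students say yes
print(proportion_ci(survey))          # about (0.5, 0.7)
print(sample_size_for_moe(0.10))      # 100
```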

* Confidence interval for two populations' mean difference
* For INDEPENDENT samples:
* Approximate 95% MoE = 2 * SQRT(stdevs.1…)

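* Confidence interval for paired 2 populations' mean difference
* For PAIRED data:
* Take the difference within each pair and throw away the original data
* Then build a one-sample confidence interval for the mean of the differences (mean difference ± MoE)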
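The paired procedure reduces to a one-sample interval on the differences. This is a minimal sketch: `paired_mean_diff_ci` is a hypothetical name, the before/after numbers are made up, and `critical_t` defaults to the approximate 95% value of 2.

```python
import math
import statistics


def paired_mean_diff_ci(after: list[float], before: list[float],
                        critical_t: float = 2.0) -> tuple[float, float]:
    """CI for the mean of (after - before), treating the differences as one sample."""
    if len(after) != len(before):
        raise ValueError("paired data must have the same number of observations")
    diffs = [a - b for a, b in zip(after, before)]  # keep only the differences
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    m = statistics.mean(diffs)
    return m - critical_t * se, m + critical_t * se


print(paired_mean_diff_ci(after=[13, 15, 18], before=[10, 12, 14]))
```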

* Normal distribution estimates:
* About 2/3 (68%) chance that a value falls between -1 and +1 standard deviations of the mean
* About 95% chance that it falls between -2 and +2 standard deviations
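Python's standard library can check these two rules of thumb. This sketch uses `statistics.NormalDist` (a standard normal by default); `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def prob_within(k: float) -> float:
    """P(-k < Z < k) for a standard normal Z."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2):
    print(k, round(prob_within(k), 4))  # 1 0.6827, then 2 0.9545
```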

* Interpretation of regression output elements
* Lags in time series regression
* Standard format of a forecast formula
* Cause, symptom and remedy of multicollinearity
* Autocorrelation: cause is a missing driver (a driver that should be in the model is left out)
* symptom: a clear pattern in the residuals, DW low enough
* remedy: include the missing driver
