Saturday, December 18, 2010

Finding the p value in t-tests


The hypotheses that we have been working on for t-tests, where two sample means are drawn from two independent populations, have all been two-tailed. A one-tailed test is perfectly possible, but we haven’t covered it. So, when you look at the Excel output, you are interested only in the result for the two-tailed test. Excel provides a p value, as shown here.

t-Test: Paired Two Sample for Means

                               Current     Previous
Mean                           0.5468      0.3404
Variance                       0.305139    0.276987
Observations                   25          25
Pearson Correlation            0.880067
Hypothesized Mean Difference   0
df                             24
t Stat                         3.889064
P(T<=t) one-tail               0.000349
t Critical one-tail            1.710882
P(T<=t) two-tail               0.000697
t Critical two-tail            2.063899


This output came from a paired t-test on the Earnings2005 data (you have the file). Because the two-tailed p value is 0.000697, which is smaller than 0.05, we reject the null hypothesis. If we reject the null, we are saying that the two population means are not the same. So look at the row of means at the top, and you can see that ‘current’ is larger than ‘previous’.
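If you want to see where the t Stat and the two-tailed p value come from, here is a rough Python sketch that rebuilds them from the summary figures in the Excel output. It is not how Excel does it internally: the variable names, the `t_pdf` and `two_tailed_p` helpers, and the use of Simpson's rule for the tail area are all my own assumptions for illustration. Because the inputs are rounded, expect the results to agree with Excel only to a few decimal places.

```python
import math

# Summary figures copied from the Excel output (rounded as Excel printed them).
mean_current, mean_previous = 0.5468, 0.3404
var_current, var_previous = 0.305139, 0.276987
n = 25
r = 0.880067  # Pearson correlation between the paired samples

# Variance of the paired differences: s1^2 + s2^2 - 2*r*s1*s2
var_diff = var_current + var_previous - 2 * r * math.sqrt(var_current * var_previous)
std_err = math.sqrt(var_diff / n)          # standard error of the mean difference
t_stat = (mean_current - mean_previous) / std_err
df = n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is approximated with Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 1 - 2 * area_0_to_b


p_two_tail = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.4f}, df = {df}, two-tailed p = {p_two_tail:.6f}")
print("reject the null at 0.05" if p_two_tail < 0.05 else "do not reject the null")
```

The decision rule is the last line: compare the two-tailed p value with 0.05, exactly as we do when reading the Excel table.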

Monster Review Q18 Random Variables


18. Let the random variable X represent the profit made on a randomly selected day by a small clothing store on Main Street.  Assume X is normal with a mean of $360 and a standard deviation of $50. 


What is P(X > $400)?
A) 0.2119     C) 0.7881
B) 0.2881     D) 0.8450
Answer: A
Topic: 4.3 Random Variables






You know that the random variable X is normally distributed, so you can use the Excel =normdist function. Note that we want X greater than $400, so we are looking for the area to the right of X=400. Because Excel ‘adds up’ from left to right, we have to calculate the area up to 400 and then subtract it from 1. You can do all this in one go with
=1-normdist(400,360,50,true). That gives 0.211855, which rounds to 0.2119, the answer in A.
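If you don't have Excel to hand, the same calculation can be sketched in Python with the standard library's `NormalDist` class (Python 3.8 or later). The variable names are just mine for illustration.

```python
from statistics import NormalDist

# Daily profit X is assumed normal with mean $360 and standard deviation $50.
profit = NormalDist(mu=360, sigma=50)

# cdf(400) is the area to the LEFT of 400, so subtract from 1 for the right tail.
p_above_400 = 1 - profit.cdf(400)

print(round(p_above_400, 4))
```

Like normdist with the last argument set to true, `cdf` accumulates from the left, which is why the subtraction from 1 is needed.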

When to use the T distribution?

On pg 333, chapter 8, question 22, the sample size is less than 30, the SD is unknown, but in part (B) of the question it says "the population has a normal distribution".

So should you use the t distribution or the normal distribution? Go for the t distribution because the population standard deviation is unknown. Even if the population is normally distributed, if you don’t know its standard deviation, use the t. In fact I always use the t distribution in applied (real) work. It is safer, giving you a wider margin of error and therefore a wider confidence interval. So the chances of rejecting a true null (and therefore making a Type 1 error) are smaller. Nice question, thanks RB.
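To put a number on "wider", here is a small Python sketch comparing the 95% two-tailed critical values from the normal and t distributions. The sample size is an assumption: I use 24 degrees of freedom (a sample of 25) purely for illustration, since that is the df in the paired t-test output in the earlier post, where Excel reports a two-tailed t critical value of 2.063899. The `t_cdf` and `t_critical` helpers are my own; they use Simpson's rule and bisection rather than any built-in t function.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """CDF of Student's t for x >= 0, by Simpson's rule on the density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3  # 0.5 is the area to the left of zero


def t_critical(conf, df):
    """Two-tailed critical value, found by bisection on t_cdf."""
    target = 1 - (1 - conf) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)  # normal critical value
t_crit = t_critical(0.95, 24)         # t critical value, df = 24 (assumed)

print(f"z critical = {z_crit:.4f}, t critical (df=24) = {t_crit:.4f}")
print(f"the t interval is {t_crit / z_crit:.3f} times as wide")
```

The margin of error is the critical value times the standard error, so a larger critical value means a wider interval for the same data.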

Thursday, December 16, 2010

chi-squared worked example

Chapter 12, question 15. FlightStats, Inc., collects data on the number of flights scheduled and the number of flights flown at major airports throughout the US. FlightStats data showed 56% of flights scheduled at Newark, La Guardia, and Kennedy airports were flown during a three-day storm. All airlines say they always operate within set safety parameters: if conditions are too poor, they do not fly. The following data show a sample of 400 scheduled flights during the snowstorm.
Did it fly   American   Continental   Delta   United   Total
Yes          48         69            68      25       210
No           52         41            62      35       190
Use the chi-squared test of independence with a .05 level of significance to analyze the data. What is your conclusion? Do you have a preference for which airline you would choose to fly during similar snowstorm conditions?

Answers: let’s write the hypotheses:
Ho: There is no relationship between the airline and whether the flight flew
Ha: There is ....

Step 1 Fill in the column totals
Step 2 Calculate the expected values by going rowtotal*columntotal/n
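Mine are

Observed     American   Continental   Delta   United   Total
Yes          48         69            68      25       210
No           52         41            62      35       190
Total        100        110           130     60       400

Expected     American   Continental   Delta   United
Yes          52.5       57.75         68.25   31.5
No           47.5       52.25         61.75   28.5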

Then use =chitest(...) to get the p value. I get p = 0.04109. That is smaller than 0.05, so we can reject the null and say that there is a difference between the airlines. I think I’d probably pick United, because their rate of not flying in bad weather is highest (35 of their 60 flights did not fly). That implies a more safety-conscious attitude. But a bit boring.
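For anyone who wants to check the whole calculation outside Excel, here is a rough Python sketch of the same steps: totals, expected values from rowtotal*columntotal/n, the chi-squared statistic, and the p value. The variable names are my own. One assumption to flag: the p value line uses a closed-form tail formula that is only valid for 3 degrees of freedom, which is what a 2 by 4 table gives; it is not a general replacement for chitest.

```python
import math

airlines = ["American", "Continental", "Delta", "United"]
flew = [48, 69, 68, 25]
not_flown = [52, 41, 62, 35]

observed = [flew, not_flown]
row_totals = [sum(row) for row in observed]         # Yes and No totals
col_totals = [sum(col) for col in zip(*observed)]   # one total per airline
n = sum(row_totals)

# Expected count for each cell: rowtotal * columntotal / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(airlines) - 1)

# Upper-tail probability of chi-squared. This closed form holds ONLY for df = 3.
assert df == 3
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

# Share of each airline's scheduled flights that did not fly
not_flown_rate = {a: nf / c for a, nf, c in zip(airlines, not_flown, col_totals)}

print(f"chi-squared = {chi2:.4f}, df = {df}, p = {p_value:.5f}")
for airline, rate in not_flown_rate.items():
    print(f"{airline}: {rate:.1%} not flown")
```

The last loop is what backs up the choice of airline: it shows each airline's rate of not flying, so you can see which one cancels most readily.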