One Sample Testing

This section shows how to test the null hypothesis that the population mean is equal to some hypothesized value. For example, suppose we have the quiz scores of students.

Assumptions

  1. Each value is sampled independently from each other value;
  2. The values are sampled from a normal distribution.
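Before running the test, it is worth checking assumption 2. A quick sketch of two standard checks in base R, using a hypothetical sample (a normal Q-Q plot for a visual check, and the Shapiro–Wilk test for a formal one):

```r
# Hypothetical sample, used only to illustrate the checks
x <- rnorm(30, mean = 60, sd = 15)

# Visual check: points should fall close to the reference line
qqnorm(x)
qqline(x)

# Formal check: a small p-value (e.g. < 0.05) suggests non-normality
p.val <- shapiro.test(x)$p.value
```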

Hypothesis

The question is whether the population mean of the quiz scores is equal to 60. The hypothesis test is defined as:

  H₀: μ = 60
  H₁: μ ≠ 60

Test Statistic and Formula

We use the standardized difference between the sample mean and the hypothesized mean as the test statistic, and reject the null hypothesis if its absolute value is large.

There are two cases:

1) If the population standard deviation σ is known, the test statistic follows the standard normal distribution under H₀:

  z = (x̄ − μ₀) / (σ/√n)

2) If σ is not known, we plug in the sample standard deviation s, and the test statistic follows Student's t-distribution with n − 1 degrees of freedom under H₀:

  t = (x̄ − μ₀) / (s/√n)

where s is the standard deviation of the sample; the quantity s/√n is called the standard error of the mean.

Examples

Assume our sample is a dataset of 30 students from class 1. We generate the scores from a normal distribution with mean 60 and standard deviation 15.

x <- as.integer(rnorm(30, mean = 60, sd = 15))
# [1] 59 15 57 56 83 66 48 63 44 58 45 62 54 53 73 73 59 57 49 38 67 55 90 44 60 63 61 67 42 44
# Note: rnorm() is random, so without set.seed() your values will differ;
# the sample above is the one used in the calculations below.

Let’s first calculate the p-value under case 1, where σ = 15 is known:

x.mean <- mean(x)
# Calculate the z value
z <- (x.mean - 60) / (15 / sqrt(30))
# Two-sided p-value from the normal distribution
p <- 2 * pnorm(-abs(z))
# approximately 0.25 for the sample above

The p-value is not small enough, so we fail to reject H₀.

However, in real-world data analysis it is very rare that you would know the population standard deviation σ, so case 2 is the common one:

x.mean <- mean(x)
x.sd <- sd(x)
# Calculate the t value
xT <- (x.mean - 60) / (x.sd / sqrt(30))
# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p <- 2 * pt(-abs(xT), df = 30 - 1)
# approximately 0.23 for the sample above

We get a similar result: the p-value is not small enough to reject H₀.

If instead we generate the sample from a normal distribution with mean 30, the p-value is very small, so we can reject H₀ and conclude that the mean of the scores does not equal 60.

x <- as.integer(rnorm(30, mean = 30, sd = 15))
x.mean <- mean(x)
x.sd <- sd(x)
xT <- (x.mean - 60) / (x.sd / sqrt(30))
p <- 2 * pt(-abs(xT), df = 30 - 1)
# p is many orders of magnitude below 0.05
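The case-2 calculation can also be checked against R's built-in t.test(), which performs the one-sample t-test directly. Here the mean-60 sample printed earlier is hard-coded so the result is reproducible:

```r
# The sample printed earlier, hard-coded for reproducibility
x <- c(59, 15, 57, 56, 83, 66, 48, 63, 44, 58, 45, 62, 54, 53, 73,
       73, 59, 57, 49, 38, 67, 55, 90, 44, 60, 63, 61, 67, 42, 44)
res <- t.test(x, mu = 60)   # two-sided one-sample t-test against mu = 60
res$statistic               # t value: (mean(x) - 60) / (sd(x) / sqrt(30))
res$p.value                 # two-sided p-value, well above 0.05
```

A one-sided test is available via the alternative argument, e.g. t.test(x, mu = 60, alternative = "less").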

Two-sample Testing

It is much more common for a researcher to be interested in the difference between two means than in the specific values of the means themselves. This section covers how to test for a difference between the means of two separate groups of observations. That is, we have two samples, x₁ and x₂, drawn independently, and we would like to know whether they come from the same population distribution.

Assumptions

  1. The two samples have the same variance.
  2. The two samples are normally distributed.
  3. Each value is sampled independently from each other value.

The equal variance assumption can be relaxed as long as both sample sizes n₁ and n₂ are large. The normality assumption can be relaxed as long as the population distributions are not highly skewed. However, if either sample is small, these relaxations do not apply and the test may not perform well.
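The equal-variance assumption can itself be tested with an F test of two variances, available in base R as var.test(); a sketch with hypothetical samples:

```r
# Hypothetical samples, used only to illustrate the check
a <- c(2, 3, 4, 5)
b <- c(2, 4, 6)
# H0: the two population variances are equal; a small p-value
# suggests the equal-variance assumption is not reasonable
p.val <- var.test(a, b)$p.value
```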

Hypothesis

The hypothesis test is defined as:

  H₀: μ₁ = μ₂
  H₁: μ₁ ≠ μ₂

Test Statistic

The test statistic is the difference between the two sample means, x̄₁ − x̄₂.

If the variances σ₁² and σ₂² of the populations are known, the standardized statistic is:

  z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)

This test statistic follows the standard normal distribution under H₀.

In most cases, the variances are not known, and we plug in the sample variances s₁² and s₂²:

  t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

If either of the sample sizes is small (generally less than 30), the test statistic should follow the t-distribution with n₁ + n₂ − 2 degrees of freedom instead of the standard normal distribution.

Examples

Suppose we have two samples:

# Generate data
g1 <- c(2, 3, 4, 5)
g2 <- c(2, 4, 6)

It is usually good to construct side-by-side boxplots, which give a visual comparison of the samples and help to identify departures from the test’s assumptions.

df <- data.frame(group = c(rep(1, length(g1)), rep(2, length(g2))), value = c(g1, g2))
boxplot(value ~ group, df)

Then, we calculate the test statistic:

# Test statistic: difference of means
v <- mean(g1) - mean(g2)
# [1] -0.5

The next step is to calculate the standard error of the statistic:

# Standard error of the statistic: sqrt(s1^2/n1 + s2^2/n2)
s <- sqrt(var(g1) / length(g1) + var(g2) / length(g2))
# [1] 1.322876

Finally, calculate the p-value:

tt <- v / s
# [1] -0.3779645
# One-tail test: Pr(t < -0.3779645)
pt(tt, df = 5)
# approximately 0.36
# Two-tail test: Pr(t < -0.3779645 or t > 0.3779645)
pt(tt, df = 5) + pt(-tt, df = 5, lower.tail = FALSE)
# approximately 0.72
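For comparison, base R's t.test() runs this two-sample test directly. By default it uses the unpooled standard error √(s₁²/n₁ + s₂²/n₂) with the Welch correction to the degrees of freedom, so its p-value differs slightly from a df = n₁ + n₂ − 2 calculation:

```r
g1 <- c(2, 3, 4, 5)
g2 <- c(2, 4, 6)
res <- t.test(g1, g2)  # Welch two-sample t-test (the default)
res$statistic          # t value based on the unpooled standard error
res$parameter          # Welch degrees of freedom, not an integer
res$p.value            # two-sided p-value
```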

Computation of Standard Error

However, there is another way to estimate the standard error of the statistic, especially useful when the sample sizes are not equal. We can pool the two samples into a single variance estimate, MSE, with the following formula:

  MSE = (SSE₁ + SSE₂) / (n₁ + n₂ − 2)

  s = √( MSE (1/n₁ + 1/n₂) )

where SSEᵢ is the sum of squared deviations of sample i from its own mean.
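Applied to the two samples above, the MSE-based calculation looks like this (this pooled standard error is also what t.test(..., var.equal = TRUE) uses internally):

```r
g1 <- c(2, 3, 4, 5)
g2 <- c(2, 4, 6)
n1 <- length(g1); n2 <- length(g2)
sse1 <- sum((g1 - mean(g1))^2)         # sum of squared deviations: 5
sse2 <- sum((g2 - mean(g2))^2)         # sum of squared deviations: 8
mse  <- (sse1 + sse2) / (n1 + n2 - 2)  # pooled variance estimate: 2.6
s    <- sqrt(mse * (1 / n1 + 1 / n2))  # pooled standard error
# [1] 1.231531
```

With the pooled standard error, the statistic (mean(g1) - mean(g2)) / s is again compared against the t-distribution with n₁ + n₂ − 2 degrees of freedom.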

Paired Samples Testing

Note that one of the assumptions is that each value is sampled independently, which means each subject must provide only one score. If a subject provides two scores, those scores are not independent. In this section, we study tests where the samples are either paired up in some way or the same subjects are measured twice.

Example

Suppose a researcher wants to see whether teaching students to read using a computer game gives better results than teaching with a tried-and-true phonics method. She randomly selects 20 students and puts them into 10 pairs according to their reading readiness level, age, IQ, and so on. She randomly selects one student from each pair to learn to read via the computer game, and the other learns to read using the phonics method. At the end of the study, each student takes the same reading test.

The data are in pairs, but you’re really interested only in the difference in reading scores (computer reading score – phonics reading score) for each pair, not the reading scores themselves. So, you take the difference between the scores for each pair, and those paired differences make up your new dataset to work with. If the two reading methods are the same, the average of the paired differences should be 0. If the computer method is better, the average of the paired differences should be positive.

Hypothesis

  H₀: μd = 0
  H₁: μd > 0

where μd is the population mean of the paired differences (computer score − phonics score). The one-sided alternative reflects the research question of whether the computer method is better.

Test Statistic

The formula for the test statistic for paired differences is:

  t = d̄ / (s_d/√n)

where d̄ is the mean of the n paired differences and s_d is their standard deviation. When calculating the p-value, if the dataset is small, we should use the t-distribution with n − 1 degrees of freedom instead of the standard normal distribution.
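A minimal sketch of the paired test for the reading-methods study, using made-up scores for the 10 pairs (the data are hypothetical, for illustration only). The manual statistic matches R's built-in t.test() with paired = TRUE:

```r
# Hypothetical reading-test scores for the 10 matched pairs
computer <- c(85, 80, 95, 70, 78, 88, 92, 76, 83, 90)
phonics  <- c(80, 82, 88, 72, 75, 80, 90, 70, 81, 84)

d  <- computer - phonics            # paired differences
n  <- length(d)
tt <- mean(d) / (sd(d) / sqrt(n))   # test statistic

# One-sided p-value: Pr(t > tt) with n - 1 degrees of freedom
p <- pt(tt, df = n - 1, lower.tail = FALSE)

# The built-in paired t-test gives the same statistic and p-value
res <- t.test(computer, phonics, paired = TRUE, alternative = "greater")
```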