Repeated Sampling

Repeated sampling is a method of drawing multiple samples from the same population to obtain a more accurate estimate of its true characteristics. A single sample may not reflect the population well because of random variation, so combining the results of many samples reduces the effect of this sampling error.

By calculating the average (mean) and spread (standard deviation) of the sample results, statisticians can better estimate the true values in the population. Averaging across samples also reduces the influence of unusual data points (outliers) that might distort a single sample. As more samples are taken, the overall average tends to get closer to the actual population mean, which makes repeated sampling a useful tool for improving accuracy in data analysis.
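To make the idea concrete, here is a minimal Python sketch that simulates repeated sampling. It assumes a hypothetical population (the integers 1 to 1000, true mean 500.5) rather than real data, and the function name `mean_of_sample_means` is illustrative only. With more samples, the average of the sample means tends to land closer to the true mean, although any single run can vary.

```python
import random
import statistics


def mean_of_sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (without replacement) and return
    the average of their sample means."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sample_means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.fmean(sample_means)


# Hypothetical population: the integers 1..1000 (true mean 500.5).
population = list(range(1, 1001))
true_mean = statistics.fmean(population)

for k in (5, 50, 2000):
    estimate = mean_of_sample_means(population, sample_size=30, num_samples=k)
    print(f"{k:>5} samples: average of sample means = {estimate:.2f} "
          f"(true mean {true_mean})")
```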

Here are some key things to remember:

  • Sample Size: Larger samples produce sample means with a smaller standard error, so they tend to be more accurate
  • Randomness: Samples should be chosen at random to avoid bias
  • Variation: Results differ from sample to sample, but combining more samples reduces the effect of this variation
  • More Samples, More Accuracy: Additional samples lead to more reliable estimates
  • Normal Distribution: When many sample means are collected, they tend to follow a normal distribution, especially for larger sample sizes (the Central Limit Theorem)
  • Standard Error: Repeated samples help estimate the variability of the sample mean
  • Confidence Intervals: Provide a range of values that is likely to contain the true population value
  • Cost and Resources: More samples require additional time and effort
  • Consistency: Similar results across samples indicate low sampling variability (high precision)
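As a rough illustration of the Sample Size, Standard Error and Confidence Intervals points, the sketch below computes an approximate 95% confidence interval for a mean. It assumes the population standard deviation is known, so the interval is \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\); the inputs are hypothetical and `z_confidence_interval` is an illustrative name, not a library function.

```python
import math


def z_confidence_interval(sample_mean, sigma, n, z_crit=1.96):
    """Return (lower, upper) for a confidence interval around a sample mean,
    assuming the population standard deviation sigma is known.
    z_crit=1.96 corresponds to roughly 95% confidence."""
    standard_error = sigma / math.sqrt(n)
    margin = z_crit * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: sample mean 50, sigma 8, n = 64, so SE = 1.
low, high = z_confidence_interval(50.0, 8.0, 64)
print(f"95% CI with n = 64:  ({low:.2f}, {high:.2f})")

# Quadrupling the sample size halves the standard error and the interval width.
low4, high4 = z_confidence_interval(50.0, 8.0, 256)
print(f"95% CI with n = 256: ({low4:.2f}, {high4:.2f})")
```

Because the width depends on \(1/\sqrt{n}\), larger samples give narrower intervals, but only at the cost of more data collection.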

Hypothesis testing is a statistical method used to determine whether a sample provides enough evidence to support a specific claim or hypothesis about a population. The null hypothesis (\( H_0 \)) represents the default assumption, while the alternative hypothesis (\( H_a \)) is the competing claim being tested. The significance level (\( \alpha \)) is the probability of incorrectly rejecting \( H_0 \) when it is true, and the confidence level (\( 1 - \alpha \)) is the probability of correctly failing to reject \( H_0 \) when it is true. If the \(p\)-value is less than \( \alpha \), we reject \( H_0 \) in favor of \( H_a \); otherwise, we fail to reject \( H_0 \).
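The decision rule for a two-tailed \(Z\)-test can be sketched in Python. This minimal sketch assumes the test statistic follows a standard normal distribution under \( H_0 \) and uses the identity \(P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})\) with the standard library's `math.erfc`; the helper names are hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_decision(z, alpha=0.05):
    """Return the decision and the p-value for a two-tailed z-test."""
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return decision, p


print(z_test_decision(2.5))  # small p-value, so H0 is rejected
print(z_test_decision(0.5))  # large p-value, so we fail to reject H0
```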


Formulas

Mean of the Sample Means

The mean of the sample means is equal to the population mean:

\(\mu_{\bar{x}} = \mu\)

Where:

  • \(\mu_{\bar{x}}\) is the mean of the sample means, i.e. the average of the sample means over all possible samples of the same size drawn from the population
  • \(\mu\) is the population mean, the true average value of the entire population
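For a small enough population, the property \(\mu_{\bar{x}} = \mu\) can be checked exactly by listing every possible sample. The sketch below assumes a hypothetical population of five values and samples of size 2 drawn without replacement, and uses `fractions.Fraction` so no rounding occurs.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical small population so every possible sample can be listed.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

population_mean = Fraction(sum(population), len(population))

# Mean of every possible sample of size n drawn without replacement.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(sample_means), population_mean, mean_of_sample_means)  # 10 6 6
```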

Standard Error of the Sample Means

The standard deviation of the sampling distribution of the sample mean (the standard error), assuming the observations in each sample are independent, is given by:

\(\sigma_{\bar{x}} = \cfrac{\sigma}{\sqrt{n}}\)

Where:

  • \(\sigma_{\bar{x}}\) is the standard error of the sample mean
  • \( \sigma \) is the population standard deviation
  • \(n\) is the sample size
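The following sketch compares the formula \(\sigma/\sqrt{n}\) with a simulation. It assumes, for the simulation only, a normally distributed population with \(\mu = 30\) and \(\sigma = 2\) and samples of size 25; the function names are hypothetical. The simulated standard deviation of the sample means should come out close to the formula's value of 0.4.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_sd_of_means(mu, sigma, n, num_samples=10_000, seed=1):
    """Empirical standard deviation of sample means drawn from a normal
    population (the normality is an assumption of this simulation)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


print(standard_error(2, 25))             # formula: 0.4
print(simulated_sd_of_means(30, 2, 25))  # simulation: close to 0.4
```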

Mean of the Binomial Distribution

The mean of the binomial distribution is calculated as follows:

\(\mu = np \)

Where:

  • \(\mu\) is the mean of the binomial distribution
  • \(n\) is the number of trials
  • \(p\) is the probability of success on a single trial
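The formula \(\mu = np\) can be checked against the definition of the expected value, \(\sum_k k\,P(X = k)\). This sketch assumes hypothetical values \(n = 10\) and \(p = 0.3\) and uses exact fractions, so both sides match exactly.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_from_pmf(n, p):
    """Expected value computed directly as the sum of k * P(X = k)."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


n, p = 10, Fraction(3, 10)  # hypothetical: 10 trials, 30% success chance
print(binomial_mean_from_pmf(n, p), n * p)  # 3 3
```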

Standard Deviation of the Binomial Distribution

The standard deviation of the binomial distribution is found using this formula:

\(\sigma = \sqrt{npq}\)

Where:

  • \(\sigma\) is the standard deviation of the binomial distribution
  • \(n\) is the number of trials
  • \(p\) is the probability of success on a single trial
  • \(q = 1 - p\), the probability of failure
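The standard deviation formula can be verified by computing the variance \(E[(X - \mu)^2]\) directly from the binomial probabilities and comparing it with \(npq\). The sketch assumes hypothetical inputs \(n = 10\) and \(p = 0.3\), so \(npq = 2.1\) and \(\sigma = \sqrt{2.1} \approx 1.449\); the function name is illustrative.

```python
import math
from fractions import Fraction


def binomial_variance_from_pmf(n, p):
    """Variance computed directly as E[(X - mu)^2] from the binomial pmf."""
    mu = n * p
    return sum(
        (k - mu) ** 2 * math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )


n, p = 10, Fraction(3, 10)  # hypothetical: 10 trials, 30% success chance
q = 1 - p
variance = binomial_variance_from_pmf(n, p)
print(variance, n * p * q)   # 21/10 21/10
print(math.sqrt(variance))   # sigma = sqrt(npq), about 1.449
```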

Example

Suppose you are estimating the average height of students in a school, and the true population mean height is 160 cm. You take 5 different samples of 10 students each and calculate the sample mean for each. If the sample means are 158 cm, 162 cm, 159 cm, 161 cm and 160 cm, what is the average of these sample means? How does this relate to the population mean?

We are given the following sample means from 5 different samples of 10 students each: 158 cm, 162 cm, 159 cm, 161 cm, and 160 cm. The true population mean height is 160 cm. Let’s calculate the average of the sample means.

The formula for the average of the sample means is:

\(\text{Average of sample means} = \cfrac{\text{Sum of sample means}}{\text{Number of samples}}\)

First, we can calculate the sum of the sample means:

\(\text{Sum of sample means} = 158 + 162 + 159 + 161 + 160 = 800\)

Next, we divide the sum of the sample means by the number of samples (5):

\(\text{Average of sample means} = \cfrac{800}{5} = 160 \, \text{cm} \)

Therefore, the average of the sample means is 160 cm.

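In this example, the average of the sample means matches the population mean exactly. This illustrates a key property of repeated sampling: the mean of the sampling distribution of the sample mean (\( \mu_{\bar{x}} \)) equals the population mean (\( \mu \)), as shown in the formula \( \mu_{\bar{x}} = \mu \). With only a handful of samples, the average will usually be close to, but not exactly equal to, \( \mu \).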
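The arithmetic in this example can be reproduced with Python's `statistics` module; this simply rechecks the numbers given in the problem.

```python
import statistics

sample_means_cm = [158, 162, 159, 161, 160]  # sample means from the example
population_mean_cm = 160

average = statistics.mean(sample_means_cm)  # sum / number of samples
print(sum(sample_means_cm), len(sample_means_cm), average)  # 800 5 160
print(average == population_mean_cm)                        # True
```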


A researcher claims that the average time to complete a task is \(\mu =\) 30 minutes. To test this claim, they take 8 samples of 25 workers each and record the sample mean times: 29.5, 30.2, 29.8, 30.4, 29.7, 30.1, 29.9, 30.3 minutes. The population standard deviation is known to be \(\sigma =\) 2 minutes.

  1. Calculate the average of the sample means
  2. Using the sampling distribution of the sample mean, determine if the researcher’s claim: \(\mu =\) 30 minutes is supported at a significance level of \(\alpha = 0.05\). (Hint: Compute the \(Z\)-Score for the average of the sample means and compare it to the critical \(Z\)-Value for a two-tailed test at \(\alpha = 0.05\), approximately \(\pm 1.96\))

i. First, we can calculate the Average of the Sample Means.

The sample means are: 29.5, 30.2, 29.8, 30.4, 29.7, 30.1, 29.9, 30.3 minutes. There are 8 samples in total.

The formula for the average of the sample means is:

\(\text{Average of sample means} = \cfrac{\text{Sum of sample means}}{\text{Number of samples}} \)

First, we can determine the sum of the sample means:

\(\text{Sum of Sample Means} = 29.5 + 30.2 + 29.8 + 30.4 + 29.7 + 30.1 + 29.9 + 30.3 = 239.9\)

Next, we divide the sum by the number of samples (8):

\(\text{Average of sample means} = \cfrac{239.9}{8}\)

\(\text{Average of sample means} = 29.9875 \; [\text{minutes}]\)

Therefore, we can determine that the Average of the Sample Means is 29.9875 minutes (or approximately 29.99 minutes).
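The same calculation in Python, using `math.fsum` to limit floating-point rounding error when adding decimal values:

```python
import math

sample_means_min = [29.5, 30.2, 29.8, 30.4, 29.7, 30.1, 29.9, 30.3]

total = math.fsum(sample_means_min)  # accurate floating-point summation
average = total / len(sample_means_min)
print(round(total, 4), round(average, 4))  # 239.9 29.9875
```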


ii. We need to test the null hypothesis:

\( H_0: \mu = 30 \, \text{minutes} \) against the alternative \( H_a: \mu \neq 30 \, \text{minutes} \) at \( \alpha = 0.05 \).

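The sampling distribution of the sample mean is approximately normal by the Central Limit Theorem, since the sample size (\( n = 25 \)) is reasonably large, and because the population standard deviation is known we can use a \(Z\)-test. However, since we are working with the average of 8 sample means, we adjust the standard error: assuming the 8 samples are independent, their average behaves like the mean of a single sample of \( 8 \times 25 = 200 \) workers.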

First, the standard error of the mean for a single sample is:

\( \sigma_{\bar{x}} = \cfrac{\sigma}{\sqrt{n}} \)

\(\sigma_{\bar{x}} = \cfrac{2}{\sqrt{25}}\)

\(\sigma_{\bar{x}} = 0.4 \; [\text{minutes}]\)

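Next, we determine the standard error for the average of 8 sample means by dividing the single-sample standard error by \( \sqrt{8} \), which is the same as using \( n = 25 \cdot 8 = 200 \) in the formula: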

\(\text{SEM}_{\text{avg}} = \cfrac{2}{\sqrt{25 \cdot 8}}\)

\(\text{SEM}_{\text{avg}} = \cfrac{2}{\sqrt{200}} \)

\(\text{SEM}_{\text{avg}} \approx 0.1414 \; [\text{minutes}]\)

Then, we can calculate the \(Z\)-Score for the average of the sample means (\(29.9875\)):

\(z = \cfrac{29.9875 - 30}{0.1414}\)

\(z = \cfrac{-0.0125}{0.1414}\)

\(z \approx -0.0884\)

For a two-tailed test at \(\alpha = 0.05\), the critical \(Z\)-Values are \( \pm 1.96 \). We can compare the \(Z\)-Score against the critical \(Z\)-Value:

\(|z| = 0.0884 < 1.96\)

Since the \(Z\)-Score does not fall in the rejection region (\(|z| < 1.96\)), we fail to reject the null hypothesis.

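At the \(\alpha = 0.05\) significance level, the data do not provide sufficient evidence against the researcher’s claim (\(\mu =\) 30 minutes); the results are consistent with it. Note that failing to reject \(H_0\) does not prove that \(\mu\) is exactly 30 minutes.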
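The full test in part ii can be sketched in a few lines of Python. Like the worked solution, it assumes the 8 samples are independent (no worker appears in more than one sample), so the average of the 8 sample means is treated as the mean of 200 observations. The two-sided \(p\)-value computed with `math.erfc` is an addition for comparison, not part of the original solution.

```python
import math

mu_0 = 30.0          # hypothesized mean under H0 (minutes)
sigma = 2.0          # known population standard deviation (minutes)
n_per_sample = 25
sample_means = [29.5, 30.2, 29.8, 30.4, 29.7, 30.1, 29.9, 30.3]
k = len(sample_means)

average = math.fsum(sample_means) / k
# Assumes independent samples, so the average behaves like the mean
# of one sample of k * n_per_sample = 200 workers.
se_avg = sigma / math.sqrt(n_per_sample * k)
z = (average - mu_0) / se_avg
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

z_crit = 1.96
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"SE = {se_avg:.4f}, z = {z:.4f}, p = {p_value:.3f}, {decision}")
```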


A factory claims that the average time to assemble a product is 45 minutes. A researcher collects 12 samples of 30 workers each and records their mean assembly times. If the population standard deviation is 3 minutes, calculate the standard error of the mean (SEM).

First, we can identify the given values:

  • Population Standard Deviation: \(\sigma = 3\)
  • Sample Size: \(n = 30\)

Next, we calculate the Standard Error of the Mean (SEM):

\(\text{SEM} = \cfrac{\sigma}{\sqrt{n}}\)

\(\text{SEM} = \cfrac{3}{\sqrt{30}} \)

\(\text{SEM} = \cfrac{3}{5.477}\)

\(\text{SEM} \approx 0.548 \; [\text{minutes}] \)

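Therefore, the Standard Error of the Mean is approximately 0.548 minutes. This is the standard error of each individual sample mean (based on \(n = 30\) workers); the number of samples (12) does not enter this calculation.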
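A quick Python check of this arithmetic. The last lines go beyond the question: assuming the 12 samples are independent, they apply the same adjustment as in the previous example to obtain the standard error of the average of all 12 sample means.

```python
import math

sigma = 3.0       # population standard deviation (minutes)
n = 30            # workers per sample
num_samples = 12

sem_single = sigma / math.sqrt(n)
print(f"SEM of one sample mean: {sem_single:.3f} minutes")  # 0.548

# Extension (assumes independent samples): SE of the average of 12 sample means.
sem_average = sigma / math.sqrt(n * num_samples)
print(f"SE of the average of {num_samples} sample means: {sem_average:.3f} minutes")
```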