How Do We Calculate P Value? | From Test To Decision

A p value comes from a test statistic and shows how likely your data, or more extreme data, would be if the null model were true.

A p value looks like one small decimal, yet it comes from a chain of choices. You pick a null claim, decide whether your test is one-tailed or two-tailed, turn the sample result into a test statistic, then read a tail area from a reference distribution. That tail area is the p value.

The part that trips people up is what the number means. A p value is not the chance that the null hypothesis is true. It is the chance of seeing data this far from the null, or farther, if the null model were right. Once that point clicks, the math starts to feel orderly.

How Do We Calculate P Value In A Hypothesis Test?

The logic barely changes across tests. You still need the same four building blocks:

  • A null hypothesis, such as a mean of 50 or no gap between two groups.
  • An alternative hypothesis that sets the tail direction.
  • A sample result, such as a mean, proportion, difference, or correlation.
  • A standard error and a matching reference distribution.

Step 1: Write The Hypotheses

Start with a null statement that includes an equals sign. Then write the alternative. If any departure matters, use a two-tailed test. If only one direction matters, use a one-tailed test. That choice changes the p value because it changes which tail area you count.

Step 2: Compute The Test Statistic

The test statistic tells you how far your sample result sits from the null value after sampling noise is taken into account. In many tests, the skeleton is:

test statistic = (sample estimate - null value) / standard error

That formula turns a raw difference into a standardized distance. Penn State’s lesson on hypothesis testing and tail area lays out this sequence clearly: compute the statistic first, then find the p value from the tail area.
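The skeleton above can be sketched in a few lines of Python. This is a minimal illustration, not a fixed recipe: the function names and the sample numbers (mean 52, standard deviation 8, n = 64) are made up for the example, and only the null value of 50 echoes the text.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def standardized_statistic(estimate: float, null_value: float, standard_error: float) -> float:
    """(sample estimate - null value) / standard error."""
    return (estimate - null_value) / standard_error


# Hypothetical numbers for illustration only.
se = standard_error_of_mean(sd=8, n=64)          # 8 / 8 = 1.0
stat = standardized_statistic(52, 50, se)        # (52 - 50) / 1.0 = 2.0
print(f"SE = {se}, statistic = {stat}")
```

The same two-step pattern (get the standard error, then divide the gap by it) carries over to proportions and differences; only the standard error formula changes.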

Step 3: Read The Tail Area

Once you have the statistic, move to the right reference distribution. A z test uses the normal curve. A t test uses the t distribution. Chi-square and F tests use their own curves. The p value is the area in the tail that is at least as far out as your observed statistic, in the direction named by the alternative hypothesis.

The NIST page on critical values and p values defines the p value this same way. That definition keeps you from drifting into common mistakes.
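For a z statistic, reading the tail area can be sketched with Python's standard library. The function name and the labels "greater", "less", and "two-sided" are choices made for this example, not terms from the sources above.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Tail area under the standard normal curve for an observed z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # right-tailed test
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":         # left-tailed test
        return std_normal.cdf(z)
    if alternative == "two-sided":    # both tails, at least as far out as |z|
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


print(round(p_value_from_z(1.96), 4))             # two tails
print(round(p_value_from_z(1.96, "greater"), 4))  # right tail only
```

The statistic is the same in every branch. Only the tail rule differs, which is why the tail choice has to be settled before you look at the data.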

Step 4: Compare With Alpha

After you get the tail area, compare it with alpha, often 0.05. If p is at or below alpha, you reject the null. If p is above alpha, you fail to reject it. That rule is useful, but it is not the whole story. A small p value does not tell you that the effect is large or that the study was flawless.

The ASA statement on p-values makes that point bluntly: one threshold should not carry the whole argument.
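The comparison in Step 4 is a one-line rule. The sketch below assumes the common convention that a p value exactly equal to alpha counts as a rejection; the function name is made up for the example.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p value with alpha (assumption: p == alpha counts as reject)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    return "reject the null" if p_value <= alpha else "fail to reject the null"


print(decide(0.03))               # below the default alpha of 0.05
print(decide(0.03, alpha=0.01))   # same p value, stricter alpha
```

The second call shows why the label alone is thin evidence: the same p value flips from one decision to the other when alpha changes.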

Worked Example With A Z Test

Say a school says the mean quiz score is 70. You sample 36 students and get a mean of 74. The population standard deviation is known to be 12, so the standard error is 12 / √36 = 2.

  1. Null hypothesis: μ = 70
  2. Alternative hypothesis: μ ≠ 70
  3. z = (74 – 70) / 2 = 2.00
  4. Right-tail area for z = 2.00 is about 0.02275
  5. Because the test is two-tailed, p = 2 × 0.02275 ≈ 0.0455

So the p value is about 0.0455. At alpha 0.05, you reject the null. The raw gap was 4 points, yet the p value came from the standardized gap, not from the raw gap alone.
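The quiz-score example can be checked with a short script. The variable names are chosen for this sketch; the numbers are the ones in the example. Doubling the unrounded tail area gives roughly 0.0455, a hair away from what you get by doubling a tail area already rounded to 0.0228.

```python
import math
from statistics import NormalDist

null_mean = 70.0        # the school's claim
sample_mean = 74.0
population_sd = 12.0    # treated as known, so a z test applies
n = 36

standard_error = population_sd / math.sqrt(n)    # 12 / 6 = 2.0
z = (sample_mean - null_mean) / standard_error   # 4 / 2 = 2.0

right_tail = 1.0 - NormalDist().cdf(z)           # area beyond z in one tail
p_two_tailed = 2.0 * right_tail                  # both tails for a two-tailed test

print(f"SE = {standard_error:.2f}, z = {z:.2f}")
print(f"right tail = {right_tail:.4f}, two-tailed p = {p_two_tailed:.4f}")
```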

Common Tests And Where The P Value Comes From

Software makes the output look automatic, though the same moving parts are always there: a statistic, a distribution, a tail rule, and sometimes degrees of freedom.

Test | Statistic | Where P Comes From
One-sample z test | z = (x̄ – μ0) / SE | Normal tail area
One-sample t test | t = (x̄ – μ0) / SE | t tail with n – 1 df
Two-sample t test | Difference in means / SE | t tail with df set by the test (n1 + n2 – 2 if pooled, Welch approximation otherwise)
Paired t test | Mean difference / SE | t tail with n – 1 df
One-proportion z test | z = (p̂ – p0) / SE | Normal tail area
Two-proportion z test | Difference in proportions / pooled SE | Normal tail area
Chi-square test | Σ (Observed – Expected)² / Expected | Right tail of chi-square
ANOVA F test | Variance ratio | Right tail of F distribution
Correlation test | t built from r and sample size | t tail with n – 2 df
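To show one row of the table in motion, here is a rough sketch of a one-proportion z test. The data (60 successes in 100 trials against a null proportion of 0.5) are hypothetical. The sketch assumes the usual standard error under the null, √(p0(1 – p0) / n), and a sample large enough for the normal approximation.

```python
import math
from statistics import NormalDist

# Hypothetical data, not taken from the article.
successes = 60
n = 100
p0 = 0.5                                          # null proportion

p_hat = successes / n                             # sample proportion
standard_error = math.sqrt(p0 * (1 - p0) / n)     # SE computed under the null
z = (p_hat - p0) / standard_error                 # standardized distance

# Two-tailed p value: area in both normal tails beyond |z|.
p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.2f}, SE = {standard_error:.3f}, z = {z:.2f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

The moving parts match the table row: a statistic, a normal reference curve, and a tail rule.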

Worked Example With A T Test

Now switch to a one-sample t test. A café says the mean wait time is 8 minutes. You sample 25 visits, get a mean of 9.4 minutes, and the sample standard deviation is 3 minutes.

First, compute the standard error: 3 / √25 = 0.6. Next, compute the test statistic:

t = (9.4 - 8.0) / 0.6 = 2.33

With 24 degrees of freedom, a right-tailed t of 2.33 gives a p value a little under 0.015. If your alternative says the mean wait is greater than 8 minutes, that is the final p value. If your alternative is two-tailed, double it.

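The switch from the normal curve to the t curve is the only change from the z example, and it is why t tests are used so often in practice. You rarely know the population standard deviation, so you estimate it from the sample, and the heavier tails of the t curve account for that added uncertainty.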
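The café example can be reproduced without a statistics package. Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_right_tail` are invented for the example, and in practice a library routine would normally do this job.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t): 0.5 minus the Simpson's-rule area from 0 to t (steps must be even)."""
    if t < 0:
        return 1.0 - t_right_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


null_mean, sample_mean, sample_sd, n = 8.0, 9.4, 3.0, 25

standard_error = sample_sd / math.sqrt(n)            # 3 / 5 = 0.6
t_stat = (sample_mean - null_mean) / standard_error  # about 2.33
p_right = t_right_tail(t_stat, df=n - 1)             # one-tailed, mean > 8
p_two = 2.0 * p_right                                # two-tailed version

print(f"t = {t_stat:.2f}, one-tailed p = {p_right:.3f}, two-tailed p = {p_two:.3f}")
```

The one-tailed result lands near 0.014, in line with the "a little under 0.015" reading above.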

Reading Common P Value Ranges

A p value works better as a graded read than as a hard pass-or-fail switch. This table gives a plain reading that fits many classroom and workplace settings.

P Value Range | Plain Reading | Next Check
Above 0.10 | Data fit the null model well | Check sample size and precision
0.05 to 0.10 | Some tension with the null model | Read the interval estimate
0.01 to 0.05 | Clearer tension with the null model | Check whether the effect is meaningful
Below 0.01 | Data are hard to square with the null model | Check design and assumptions

Common Mistakes That Distort The Result

Most mistakes creep in before the last arithmetic step. Watch for these:

  • Using a one-tailed area when the question is two-tailed.
  • Picking the wrong test, such as z when a t test is needed.
  • Ignoring assumptions about independence or spread.
  • Running many tests and only reporting the smallest p value.
  • Reading p as the chance that the null hypothesis is true.

That last line is the biggest trap. The null hypothesis is assumed when the p value is built. The calculation asks how odd the data look under that assumption. It does not flip around and tell you the probability that the assumption itself is true.
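The fourth item in the list, running many tests and reporting only the smallest p value, can be put in numbers. This sketch assumes the tests are independent and that every null hypothesis is true; the function name is made up for the example.

```python
def chance_of_false_alarm(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p value below alpha) across independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20):
    print(f"{k} tests: {chance_of_false_alarm(k):.3f}")
```

With 20 such tests, the chance that at least one dips below 0.05 is about 64%, so the smallest p value from a batch says little unless the whole batch is reported.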

How To Report A P Value Cleanly

Whether you calculate by hand or let software do the heavy lifting, report enough detail for a reader to see where the number came from:

  • Name the test.
  • Give the test statistic.
  • Give the degrees of freedom when the test uses them.
  • State whether the test is one-tailed or two-tailed when that is not obvious.
  • Report the p value itself, not just “below 0.05.”

A neat report line looks like this: “One-sample t test (one-tailed), t(24) = 2.33, p = 0.014.” That gives the reader the method, the tail direction, and the result in one pass.
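If you produce many such lines, a small helper keeps them consistent. The function below is a hypothetical sketch; its name, parameters, and rounding choices (two decimals for the statistic, three for p) are assumptions made for the example, not a standard.

```python
from typing import Optional


def format_report(test_name: str, symbol: str, statistic: float, p_value: float,
                  df: Optional[int] = None, tails: Optional[str] = None) -> str:
    """Build a one-line report: test name, statistic (with df if any), and p value."""
    name = f"{test_name} ({tails})" if tails else test_name
    df_part = f"({df})" if df is not None else ""
    return f"{name}, {symbol}{df_part} = {statistic:.2f}, p = {p_value:.3f}"


line = format_report("One-sample t test", "t", 2.3333, 0.0142,
                     df=24, tails="one-tailed")
print(line)
```

Leaving `df` or `tails` as `None` drops that piece, which suits tests such as the z test that have no degrees of freedom.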

What The Number Tells You

Calculating a p value comes down to three moves: standardize the sample result, find the tail area under the right distribution, and match that area to the test direction. Once you see that pattern, the formulas stop feeling like separate rules and start feeling like small variations of the same idea.

If you want fewer mistakes, slow down at three points every time: the hypotheses, the standard error, and the tail choice. Get those right, and the p value stops looking like a black box.

References & Sources

  • Penn State Eberly College of Science. “Hypothesis Testing.” Shows the test sequence and states that the p value is the tail area more extreme than the test statistic.
  • National Institute of Standards and Technology. “Critical Values and P Values.” Defines the p value and links it to the observed test statistic under the null hypothesis.
  • American Statistical Association. “ASA Statement On P-Values.” Lists six principles that help readers avoid common p value mistakes and overreading one threshold.