
Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test:

  • Null hypothesis (H0): There’s no effect in the population.
  • Alternative hypothesis (Ha or H1): There’s an effect in the population.

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Similarities and differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Other interesting articles
  • Frequently asked questions

Answering your research question with hypotheses

The null and alternative hypotheses offer competing answers to your research question. When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis (H0) answers “No, there’s no effect in the population.”
  • The alternative hypothesis (Ha) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample. Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses.

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.
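Whatever the specific test, the decision it produces follows one shared rule. A minimal sketch in Python, assuming the usual p-value-versus-α comparison described later in this article (the function name and default cutoff are illustrative, not part of any particular test):

```python
# Minimal sketch of the decision rule shared by hypothesis tests:
# reject H0 when the p-value is at or below the significance level (alpha).
# The function name and the default alpha = 0.05 are illustrative choices.

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the standard wording for a hypothesis-test decision."""
    if p_value <= alpha:
        return "reject the null hypothesis"        # evidence favors Ha
    return "fail to reject the null hypothesis"    # never "accept H0"

print(decide(0.003))  # reject the null hypothesis
print(decide(0.21))   # fail to reject the null hypothesis
```

Note that the two possible outputs mirror the only two conclusions a test can reach; neither of them is “the null hypothesis is true.”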


What is a null hypothesis?

The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population (p ≤ α), then we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s a type II error.
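A quick simulation makes the type I error rate concrete: when the null hypothesis is actually true, a test at level α should still (incorrectly) reject it in roughly α of all samples. The sketch below uses a z test on a mean with known standard deviation, chosen only because it needs nothing beyond the standard library; all numbers are invented:

```python
# Simulate the type I error rate: draw many samples from a population where
# H0 is true, run the test on each, and count how often we wrongly reject.
# The rejection rate should land near alpha = 0.05.
import math
import random

def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for H0: population mean = mu0 (sigma known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)

random.seed(1)
alpha, rejections, trials = 0.05, 0, 2000
for _ in range(trials):
    sample = [random.gauss(100, 15) for _ in range(30)]  # H0 is true here
    if z_test_p_value(sample, mu0=100, sigma=15) <= alpha:
        rejections += 1  # each of these is a type I error

print(rejections / trials)  # close to alpha
```

Failing to reject when an effect does exist (a type II error) could be simulated the same way by drawing the samples from a population whose mean differs from mu0.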

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

  • Research question: Does tooth flossing affect the number of cavities?
    Null hypothesis (H0): Tooth flossing has no effect on the number of cavities.
    Two-sample t test: The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2.

  • Research question: Does the amount of text highlighted in the textbook affect exam scores?
    Null hypothesis (H0): The amount of text highlighted in the textbook has no effect on exam scores.
    Linear regression: There is no relationship between the amount of text highlighted and exam scores in the population; β = 0.

  • Research question: Does daily meditation decrease the incidence of depression?
    Null hypothesis (H0): Daily meditation does not decrease the incidence of depression.*
    Two-proportions z test: The proportion of people with depression in the daily-meditation group (p1) is greater than or equal to that in the no-meditation group (p2) in the population; p1 ≥ p2.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

What is an alternative hypothesis?

The alternative hypothesis (Ha) is the other answer to your research question. It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

  • Research question: Does tooth flossing affect the number of cavities?
    Alternative hypothesis (Ha): Tooth flossing has an effect on the number of cavities.
    Two-sample t test: The mean number of cavities per person differs between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 ≠ µ2.

  • Research question: Does the amount of text highlighted in a textbook affect exam scores?
    Alternative hypothesis (Ha): The amount of text highlighted in the textbook has an effect on exam scores.
    Linear regression: There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0.

  • Research question: Does daily meditation decrease the incidence of depression?
    Alternative hypothesis (Ha): Daily meditation decreases the incidence of depression.
    Two-proportions z test: The proportion of people with depression in the daily-meditation group (p1) is less than that in the no-meditation group (p2) in the population; p1 < p2.

Similarities and differences between null and alternative hypotheses

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

  • Null hypothesis (H0): A claim that there is no effect in the population. Written with an equality symbol (=, ≥, or ≤). When the test is statistically significant, it is rejected; otherwise, you fail to reject it.
  • Alternative hypothesis (Ha): A claim that there is an effect in the population. Written with an inequality symbol (≠, <, or >). When the test is statistically significant, it is supported; otherwise, it is not supported.


How to write null and alternative hypotheses

To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable?

  • Null hypothesis (H0): Independent variable does not affect dependent variable.
  • Alternative hypothesis (Ha): Independent variable affects dependent variable.

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

  • Two-sample t test (two groups):
    H0: The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2.
    Ha: The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.

  • ANOVA (three groups):
    H0: The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3.
    Ha: The means of the dependent variable for group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.

  • Correlation test:
    H0: There is no correlation between independent variable and dependent variable in the population; ρ = 0.
    Ha: There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0.

  • Simple linear regression:
    H0: There is no relationship between independent variable and dependent variable in the population; β = 0.
    Ha: There is a relationship between independent variable and dependent variable in the population; β ≠ 0.

  • Two-proportions z test:
    H0: The dependent variable expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2.
    Ha: The dependent variable expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.

Note: The template sentences above assume that you’re performing two-tailed tests (their alternative hypotheses use ≠). Two-tailed tests are appropriate for most studies.
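To show one of these templates carried through to an actual calculation, here is a sketch of the two-proportions z test with invented counts (45 events out of 200 in group 1, 62 out of 200 in group 2); the function name and data are illustrative, not from the article:

```python
# Two-proportions z test for H0: p1 = p2 versus Ha: p1 != p2 (two-tailed).
# Counts are made up for illustration. Standard library only.
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-tailed p-value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # estimate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # 2 * P(Z > |z|)
    return z, p_value

# Hypothetical data: 45/200 with the outcome in group 1, 62/200 in group 2.
z, p = two_proportion_z_test(45, 200, 62, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these particular counts the p-value sits just above 0.05, so at α = 0.05 you would fail to reject H0, a reminder that “close to significant” still means fail to reject.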

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Frequently asked questions

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

Cite this Scribbr article


Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved August 29, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/



Statistics By Jim

Making statistics intuitive

Null Hypothesis: Definition, Rejecting & Examples

By Jim Frost

What is a Null Hypothesis?

The null hypothesis in statistics states that there is no difference between groups or no relationship between variables. It is one of two mutually exclusive hypotheses about a population in a hypothesis test.


  • Null Hypothesis H 0 : No effect exists in the population.
  • Alternative Hypothesis H A : The effect exists in the population.

In every study or experiment, researchers assess an effect or relationship. This effect can be the effectiveness of a new drug, building material, or other intervention that has benefits. There is a benefit or connection that the researchers hope to identify. Unfortunately, no effect may exist. In statistics, we call this lack of an effect the null hypothesis. Researchers assume that this notion of no effect is correct until they have enough evidence to suggest otherwise, similar to how a trial presumes innocence.

In this context, the analysts don’t necessarily believe the null hypothesis is correct. In fact, they typically want to reject it because that leads to more exciting findings about an effect or relationship. The new vaccine works!

You can think of it as the default theory that requires sufficiently strong evidence to reject. Like a prosecutor, researchers must collect sufficient evidence to overturn the presumption of no effect. Investigators must work hard to set up a study and a data collection system to obtain evidence that can reject the null hypothesis.

Related post : What is an Effect in Statistics?

Null Hypothesis Examples

Null hypotheses start as research questions that the investigator rephrases as a statement indicating there is no effect or relationship.

  • Research question: Does the vaccine prevent infections?
    Null hypothesis: The vaccine does not affect the infection rate.
  • Research question: Does the new additive increase product strength?
    Null hypothesis: The additive does not affect mean product strength.
  • Research question: Does the exercise intervention increase bone mineral density?
    Null hypothesis: The intervention does not affect bone mineral density.
  • Research question: As screen time increases, does test performance decrease?
    Null hypothesis: There is no relationship between screen time and test performance.

After reading these examples, you might think they’re a bit boring and pointless. However, the key is to remember that the null hypothesis defines the condition that the researchers need to discredit before suggesting an effect exists.

Let’s see how you reject the null hypothesis and get to those more exciting findings!

When to Reject the Null Hypothesis

So, you want to reject the null hypothesis, but how and when can you do that? To start, you’ll need to perform a statistical test on your data. The following is an overview of performing a study that uses a hypothesis test.

The first step is to devise a research question and the appropriate null hypothesis. After that, the investigators need to formulate an experimental design and data collection procedures that will allow them to gather data that can answer the research question. Then they collect the data. For more information about designing a scientific study that uses statistics, read my post 5 Steps for Conducting Studies with Statistics .

After data collection is complete, statistics and hypothesis testing enter the picture. Hypothesis testing takes your sample data and evaluates how consistent they are with the null hypothesis. The p-value is a crucial part of the statistical results because it quantifies how strongly the sample data contradict the null hypothesis.

When the sample data provide sufficient evidence, you can reject the null hypothesis. In a hypothesis test, this process involves comparing the p-value to your significance level .

Rejecting the Null Hypothesis

Reject the null hypothesis when the p-value is less than or equal to your significance level. Your sample data favor the alternative hypothesis, which suggests that the effect exists in the population. For a mnemonic device, remember—when the p-value is low, the null must go!

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Failing to Reject the Null Hypothesis

Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. The sample data provide insufficient evidence to conclude that the effect exists in the population. When the p-value is high, the null must fly!

Note that failing to reject the null is not the same as proving it. For more information about the difference, read my post about Failing to Reject the Null .

That’s a very general look at the process. But I hope you can see how the path to more exciting findings depends on being able to rule out the less exciting null hypothesis that states there’s nothing to see here!

Let’s move on to learning how to write the null hypothesis for different types of effects, relationships, and tests.

Related posts : How Hypothesis Tests Work and Interpreting P-values

How to Write a Null Hypothesis

The null hypothesis varies by the type of statistic and hypothesis test. Remember that inferential statistics use samples to draw conclusions about populations. Consequently, when you write a null hypothesis, it must make a claim about the relevant population parameter . Further, that claim usually indicates that the effect does not exist in the population. Below are typical examples of writing a null hypothesis for various parameters and hypothesis tests.

Related posts : Descriptive vs. Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Group Means

T-tests and ANOVA assess the differences between group means. For these tests, the null hypothesis states that there is no difference between group means in the population. In other words, the experimental conditions that define the groups do not affect the mean outcome. Mu (µ) is the population parameter for the mean, and you’ll need to include it in the statement for this type of study.

For example, an experiment compares the mean bone density changes for a new osteoporosis medication. The control group does not receive the medicine, while the treatment group does. The null states that the mean bone density changes for the control and treatment groups are equal.

  • Null Hypothesis H0: Group means are equal in the population: µ1 = µ2, or µ1 – µ2 = 0.
  • Alternative Hypothesis HA: Group means are not equal in the population: µ1 ≠ µ2, or µ1 – µ2 ≠ 0.
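To make the group-means case concrete, here is a sketch of a pooled two-sample t test on invented bone-density change scores. The data are made up for illustration, and the critical value 2.101 comes from a t table (df = 18, two-tailed α = 0.05):

```python
# Pooled two-sample t test for H0: mu1 = mu2, on invented change scores.
# Standard library only; 2.101 is the t-table critical value for df = 18
# at two-tailed alpha = 0.05.
from statistics import mean, variance

control = [0.2, -0.1, 0.0, 0.3, 0.1, -0.2, 0.2, 0.0, 0.1, -0.1]
treatment = [0.6, 0.4, 0.8, 0.5, 0.3, 0.7, 0.5, 0.6, 0.4, 0.7]

n1, n2 = len(control), len(treatment)
# Pooled sample variance, then the t statistic for the difference in means.
sp2 = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
t = (mean(treatment) - mean(control)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

T_CRITICAL = 2.101  # t table: df = n1 + n2 - 2 = 18, two-tailed alpha = 0.05
if abs(t) > T_CRITICAL:
    print(f"t = {t:.2f}: reject H0 (mu1 = mu2)")
else:
    print(f"t = {t:.2f}: fail to reject H0")
```

In practice you would report a p-value from statistical software rather than compare against a table, but the logic is the same.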

Group Proportions

Proportions tests assess the differences between group proportions. For these tests, the null hypothesis states that there is no difference between group proportions. Again, the experimental conditions did not affect the proportion of events in the groups. P is the population proportion parameter that you’ll need to include.

For example, a vaccine experiment compares the infection rate in the treatment group to the control group. The treatment group receives the vaccine, while the control group does not. The null states that the infection rates for the control and treatment groups are equal.

  • Null Hypothesis H 0 : Group proportions are equal in the population: p 1 = p 2 .
  • Alternative Hypothesis H A : Group proportions are not equal in the population: p 1 ≠ p 2 .

Correlation and Regression Coefficients

Some studies assess the relationship between two continuous variables rather than differences between groups.

In these studies, analysts often use either correlation or regression analysis . For these tests, the null states that there is no relationship between the variables. Specifically, it says that the correlation or regression coefficient is zero. As one variable increases, there is no tendency for the other variable to increase or decrease. Rho (ρ) is the population correlation parameter and beta (β) is the regression coefficient parameter.

For example, a study assesses the relationship between screen time and test performance. The null states that there is no correlation between this pair of variables. As screen time increases, test performance does not tend to increase or decrease.

  • Null Hypothesis H 0 : The correlation in the population is zero: ρ = 0.
  • Alternative Hypothesis H A : The correlation in the population is not zero: ρ ≠ 0.
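As a sketch of testing H0: ρ = 0 for the screen-time example (with invented numbers), the code below computes Pearson’s r by hand and the usual test statistic t = r·sqrt((n − 2) / (1 − r²)), compared against 2.228 from a t table (df = 10, two-tailed α = 0.05):

```python
# Test H0: rho = 0 using Pearson's r and its t statistic.
# Data are invented for illustration; standard library only.
import math

screen_time = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9]    # hours per day
test_score = [92, 88, 90, 85, 83, 80, 78, 75, 74, 70, 66, 62]

n = len(screen_time)
mx, my = sum(screen_time) / n, sum(test_score) / n
cov = sum((x - mx) * (y - my) for x, y in zip(screen_time, test_score))
sx = math.sqrt(sum((x - mx) ** 2 for x in screen_time))
sy = math.sqrt(sum((y - my) ** 2 for y in test_score))
r = cov / (sx * sy)

t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t:.2f}")  # |t| > 2.228 would reject H0: rho = 0
```

With this strongly linear invented data, |t| is far beyond the critical value, so the sample would lead you to reject H0: ρ = 0.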

For all these cases, the analysts define the hypotheses before the study. After collecting the data, they perform a hypothesis test to determine whether they can reject the null hypothesis.

The preceding examples are all for two-tailed hypothesis tests. To learn about one-tailed tests and how to write a null hypothesis for them, read my post One-Tailed vs. Two-Tailed Tests .

Related post : Understanding Correlation


Reader Interactions


January 11, 2024 at 2:57 pm

Thanks for the reply.

January 10, 2024 at 1:23 pm

Hi Jim, In your comment you state that equivalence test null and alternate hypotheses are reversed. For hypothesis tests of data fits to a probability distribution, the null hypothesis is that the probability distribution fits the data. Is this correct?


January 10, 2024 at 2:15 pm

Those are two separate things, equivalence testing and normality tests. But, yes, you’re correct for both.

Hypotheses are switched for equivalence testing. You need to “work” (i.e., collect a large sample of good quality data) to be able to reject the null that the groups are different to be able to conclude they’re the same.

With typical hypothesis tests, if you have low quality data and a low sample size, you’ll fail to reject the null that they’re the same, concluding they’re equivalent. But that’s more a statement about the low quality and small sample size than anything to do with the groups being equal.

So, equivalence testing makes you work to obtain a finding that the groups are the same (at least to within some amount you define as a trivial difference).

For normality testing, and other distribution tests, the null states that the data follow the distribution (normal or whatever). If you reject the null, you have sufficient evidence to conclude that your sample data don’t follow the probability distribution. That’s a rare case where you hope to fail to reject the null. And it suffers from the problem I describe above where you might fail to reject the null simply because you have a small sample size. In that case, you’d conclude the data follow the probability distribution but it’s more that you don’t have enough data for the test to register the deviation. In this scenario, if you had a larger sample size, you’d reject the null and conclude it doesn’t follow that distribution.

I don’t know of any equivalence testing type approach for distribution fit tests where you’d need to work to show the data follow a distribution, although I haven’t looked for one either!


February 20, 2022 at 9:26 pm

Is a null hypothesis regularly (always) stated in the negative? “there is no” or “does not”

February 23, 2022 at 9:21 pm

Typically, the null hypothesis includes an equal sign. The null hypothesis states that the population parameter equals a particular value. That value is usually one that represents no effect. In the case of a one-sided hypothesis test, the null still contains an equal sign but it’s “greater than or equal to” or “less than or equal to.” If you wanted to translate the null hypothesis from its native mathematical expression, you could use the expression “there is no effect.” But the mathematical form more specifically states what it’s testing.

It’s the alternative hypothesis that typically contains does not equal.

There are some exceptions. For example, in an equivalence test where the researchers want to show that two things are equal, the null hypothesis states that they’re not equal.

In short, the null hypothesis states the condition that the researchers hope to reject. They need to work hard to set up an experiment and data collection that’ll gather enough evidence to be able to reject the null condition.


February 15, 2022 at 9:32 am

Dear sir I always read your notes on Research methods.. Kindly tell is there any available Book on all these..wonderfull Urgent


Have a thesis expert improve your writing

Check your thesis for plagiarism in 10 minutes, generate your apa citations for free.

  • Knowledge Base
  • Null and Alternative Hypotheses | Definitions & Examples

Null and Alternative Hypotheses | Definitions & Examples

Published on 5 October 2022 by Shaun Turney . Revised on 6 December 2022.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis (H 0 ): There’s no effect in the population .
  • Alternative hypothesis (H A ): There’s an effect in the population.

The effect is usually the effect of the independent variable on the dependent variable .

Table of contents

Answering your research question with hypotheses, what is a null hypothesis, what is an alternative hypothesis, differences between null and alternative hypotheses, how to write null and alternative hypotheses, frequently asked questions about null and alternative hypotheses.

The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”, the null hypothesis (H 0 ) answers “No, there’s no effect in the population.” On the other hand, the alternative hypothesis (H A ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample.

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.

The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect”, “no difference”, or “no relationship”. When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

( )
Does tooth flossing affect the number of cavities? Tooth flossing has on the number of cavities. test:

The mean number of cavities per person does not differ between the flossing group (µ ) and the non-flossing group (µ ) in the population; µ = µ .

Does the amount of text highlighted in the textbook affect exam scores? The amount of text highlighted in the textbook has on exam scores. :

There is no relationship between the amount of text highlighted and exam scores in the population; β = 0.

Does daily meditation decrease the incidence of depression? Daily meditation the incidence of depression.* test:

The proportion of people with depression in the daily-meditation group ( ) is greater than or equal to the no-meditation group ( ) in the population; ≥ .

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p 1 = p 2 .

The alternative hypothesis (H A ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect”, “a difference”, or “a relationship”. When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes > or <). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Does tooth flossing affect the number of cavities? Tooth flossing has an on the number of cavities. test:

The mean number of cavities per person differs between the flossing group (µ ) and the non-flossing group (µ ) in the population; µ ≠ µ .

Does the amount of text highlighted in a textbook affect exam scores? The amount of text highlighted in the textbook has an on exam scores. :

There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0.

Does daily meditation decrease the incidence of depression? Daily meditation the incidence of depression. test:

The proportion of people with depression in the daily-meditation group ( ) is less than the no-meditation group ( ) in the population; < .

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question
  • They both make claims about the population
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

A claim that there is in the population. A claim that there is in the population.

Equality symbol (=, ≥, or ≤) Inequality symbol (≠, <, or >)
Rejected Supported
Failed to reject Not supported

To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

The only thing you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable ?

  • Null hypothesis (H 0 ): Independent variable does not affect dependent variable .
  • Alternative hypothesis (H A ): Independent variable affects dependent variable .

Test-specific

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Two-sample t test (or one-way ANOVA with two groups):
  • H0: The mean [dependent variable] does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2.
  • Ha: The mean [dependent variable] differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.

One-way ANOVA with three groups:
  • H0: The mean [dependent variable] does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3.
  • Ha: The means of [dependent variable] for group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.

Correlation test:
  • H0: There is no correlation between [independent variable] and [dependent variable] in the population; ρ = 0.
  • Ha: There is a correlation between [independent variable] and [dependent variable] in the population; ρ ≠ 0.

Simple linear regression:
  • H0: There is no relationship between [independent variable] and [dependent variable] in the population; β = 0.
  • Ha: There is a relationship between [independent variable] and [dependent variable] in the population; β ≠ 0.

Two-proportions test:
  • H0: The [dependent variable] expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2.
  • Ha: The [dependent variable] expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.

Note: The template sentences above assume that you're performing two-tailed tests (the alternative hypotheses use ≠). Two-tailed tests are appropriate for most studies.
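To make the two-sample template concrete, here is a minimal sketch in Python of a two-tailed test of H0: µ1 = µ2 against Ha: µ1 ≠ µ2. It uses a permutation test (our choice for a dependency-free illustration, not a method named in the article), and the group data are invented:

```python
import random
import statistics

def permutation_test(group1, group2, n_perm=10_000, seed=0):
    """Two-tailed test of H0: the group means are equal (mu1 = mu2).

    Repeatedly reshuffles the group labels and counts how often a
    random relabeling produces a mean difference at least as large
    as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group1) - statistics.mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n1]) - statistics.mean(pooled[n1:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical dependent-variable scores for two groups
group1 = [8.1, 7.9, 8.4, 8.0, 8.2, 7.8]
group2 = [6.9, 7.1, 7.4, 7.0, 6.8, 7.2]

p = permutation_test(group1, group2)
# A small p favors Ha (mu1 != mu2); a large p means we fail to reject H0.
```

With these (deliberately well-separated) groups, almost no random relabeling reproduces the observed difference, so the p-value comes out very small.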

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.


Turney, S. (2022, December 06). Null and Alternative Hypotheses | Definitions & Examples. Scribbr. Retrieved 29 August 2024, from https://www.scribbr.co.uk/stats/null-and-alternative-hypothesis/

Shaun Turney

Null Hypothesis Definition and Examples, How to State


Why is it Called the “Null”?

The word “null” in this context means that it’s a commonly accepted fact that researchers work to nullify . It doesn’t mean that the statement is null (i.e. amounts to nothing) itself! (Perhaps the term should be called the “nullifiable hypothesis” as that might cause less confusion).

Why Do I need to Test it? Why not just prove an alternate one?

The short answer is that, as a scientist, you are required to; it's part of the scientific process. Science uses a battery of processes to prove or disprove theories, making sure that any new hypothesis has no flaws. Including both a null and an alternate hypothesis is one safeguard to ensure your research isn't flawed. Not including the null hypothesis in your research is considered very bad practice by the scientific community. If you set out to prove an alternate hypothesis without considering the null, you are likely setting yourself up for failure. At a minimum, your experiment will likely not be taken seriously.


  • Null hypothesis (H0): The world is flat.
  • Alternate hypothesis (H1): The world is round.

Several scientists, including Copernicus, set out to disprove the null hypothesis. This eventually led to the rejection of the null and the acceptance of the alternate. Most people accepted it (the ones that didn't created the Flat Earth Society!). What would have happened if Copernicus had not disproved the null and merely proved the alternate? No one would have listened to him. In order to change people's thinking, he first had to prove that their thinking was wrong.

How to State the Null Hypothesis from a Word Problem

In statistics, you'll be asked to convert a word problem into a hypothesis statement that includes a null hypothesis and an alternate hypothesis. Breaking your problem into a few small steps makes these problems much easier to handle.

Step 1: Figure out the hypothesis from the problem. The hypothesis is the statement of what you expect to happen; in this example, it is the claim that the average recovery time is greater than 8.2 weeks.

Step 2: Convert the hypothesis to math. Remember that the average is sometimes written as μ.

H1: μ > 8.2

Broken down into (somewhat) English, that's H1 (the hypothesis): μ (the average) > (is greater than) 8.2.

Step 3: State what will happen if the hypothesis doesn't come true. If the recovery time isn't greater than 8.2 weeks, there are only two possibilities: the recovery time is equal to 8.2 weeks, or it is less than 8.2 weeks.

H0: μ ≤ 8.2

Broken down again into English, that's H0 (the null hypothesis): μ (the average) ≤ (is less than or equal to) 8.2.

How to State the Null Hypothesis: Part Two

But what if the researcher doesn't have any idea what will happen?

Example Problem: A researcher is studying the effects of a radical exercise program on knee surgery patients. There is a good chance the therapy will improve recovery time, but there's also the possibility it will make it worse. The average recovery time for knee surgery patients is 8.2 weeks.

Step 1: State what will happen if the experiment doesn’t make any difference. That’s the null hypothesis–that nothing will happen. In this experiment, if nothing happens, then the recovery time will stay at 8.2 weeks.

H0: μ = 8.2

Broken down into English, that's H0 (the null hypothesis): μ (the average) = (is equal to) 8.2.

Step 2: Figure out the alternate hypothesis. The alternate hypothesis is the opposite of the null hypothesis. In other words, what happens if our experiment makes a difference?

H1: μ ≠ 8.2

In English again, that's H1 (the alternate hypothesis): μ (the average) ≠ (is not equal to) 8.2.
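As a sketch of how this two-tailed pair could be checked in code, the Python snippet below uses invented recovery-time data and a normal approximation to a one-sample test; the data and helper are illustrative only, and a real analysis would typically use a t-test (e.g. scipy.stats.ttest_1samp):

```python
import math
from statistics import NormalDist, mean, stdev

def two_tailed_p(sample, mu0):
    """Approximate two-tailed test of H0: mu = mu0.

    Uses a normal approximation to the one-sample t statistic;
    adequate for illustration, though small samples really call
    for the t distribution.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical recovery times (weeks) under the exercise program
sample = [7.1, 7.8, 6.9, 7.5, 7.2, 8.0, 7.4, 7.7, 7.0, 7.6, 7.3, 6.8]

z, p = two_tailed_p(sample, mu0=8.2)
# Here the sample mean sits well below 8.2, so p is far below 0.05
# and H0: mu = 8.2 would be rejected.
```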

That’s How to State the Null Hypothesis!



Null Hypothesis Definition and Examples


In a scientific experiment, the null hypothesis is the proposition that there is no effect or no relationship between phenomena or populations. If the null hypothesis is true, any observed difference in phenomena or populations would be due to sampling error (random chance) or experimental error. The null hypothesis is useful because it can be tested and found to be false, which then implies that there is a relationship between the observed phenomena. It may be easier to think of it as a nullifiable hypothesis, or one that the researcher seeks to nullify. The null hypothesis is also known as H0, or the no-difference hypothesis.

The alternate hypothesis, HA or H1, proposes that observations are influenced by a non-random factor. In an experiment, the alternate hypothesis suggests that the experimental or independent variable has an effect on the dependent variable.

How to State a Null Hypothesis

There are two ways to state a null hypothesis. One is to state it as a declarative sentence, and the other is to present it as a mathematical statement.

For example, say a researcher suspects that exercise is correlated to weight loss, assuming diet remains unchanged. The average length of time to achieve a certain amount of weight loss is six weeks when a person works out five times a week. The researcher wants to test whether weight loss takes longer to occur if the number of workouts is reduced to three times a week.

The first step to writing the null hypothesis is to find the (alternate) hypothesis. In a word problem like this, you're looking for what you expect to be the outcome of the experiment. In this case, the hypothesis is "I expect weight loss to take longer than six weeks."

This can be written mathematically as: H1: μ > 6

In this example, μ is the average.

Now, the null hypothesis is what you expect if this hypothesis does not happen. In this case, if weight loss isn't achieved in greater than six weeks, then it must occur at a time equal to or less than six weeks. This can be written mathematically as:

H0: μ ≤ 6
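To show how this one-tailed pair would be evaluated, here is a minimal Python sketch with invented weeks-to-weight-loss data (the helper and numbers are ours, not from the article); it uses a normal approximation for simplicity, where a real analysis would use a one-tailed t-test:

```python
import math
from statistics import NormalDist, mean, stdev

def upper_tail_p(sample, mu0):
    """Approximate one-tailed p-value for H1: mu > mu0
    (so H0: mu <= mu0), using a normal approximation."""
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return 1 - NormalDist().cdf(z)

# Hypothetical weeks to reach the weight-loss goal with three
# workouts per week (instead of five)
sample = [6.8, 7.2, 7.9, 6.5, 7.4, 8.1, 6.9, 7.6, 7.0, 7.3, 6.7, 7.8]

p = upper_tail_p(sample, mu0=6)
# A small p is evidence against H0: mu <= 6, supporting H1: mu > 6.
```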

The other way to state the null hypothesis is to make no assumption about the outcome of the experiment. In this case, the null hypothesis is simply that the treatment or change will have no effect on the outcome of the experiment. For this example, it would be that reducing the number of workouts would not affect the time needed to achieve weight loss:

H0: μ = 6

Null Hypothesis Examples

"Hyperactivity is unrelated to eating sugar " is an example of a null hypothesis. If the hypothesis is tested and found to be false, using statistics, then a connection between hyperactivity and sugar ingestion may be indicated. A significance test is the most common statistical test used to establish confidence in a null hypothesis.

Another example of a null hypothesis is "Plant growth rate is unaffected by the presence of cadmium in the soil ." A researcher could test the hypothesis by measuring the growth rate of plants grown in a medium lacking cadmium, compared with the growth rate of plants grown in mediums containing different amounts of cadmium. Disproving the null hypothesis would set the groundwork for further research into the effects of different concentrations of the element in soil.

Why Test a Null Hypothesis?

You may be wondering why you would want to test a hypothesis just to find it false. Why not just test an alternate hypothesis and find it true? The short answer is that it is part of the scientific method. In science, propositions are not explicitly "proven." Rather, science uses math to determine the probability that a statement is true or false. It turns out it's much easier to disprove a hypothesis than to positively prove one. Also, while the null hypothesis may be simply stated, there's a good chance the alternate hypothesis is incorrect.

For example, if your null hypothesis is that plant growth is unaffected by duration of sunlight, you could state the alternate hypothesis in several different ways. Some of these statements might be incorrect. You could say plants are harmed by more than 12 hours of sunlight or that plants need at least three hours of sunlight, etc. There are clear exceptions to those alternate hypotheses, so if you test the wrong plants, you could reach the wrong conclusion. The null hypothesis is a general statement that can be used to develop an alternate hypothesis, which may or may not be correct.


What is The Null Hypothesis & When Do You Reject The Null Hypothesis

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.


Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It’s the default assumption unless empirical evidence proves otherwise.

The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

The null hypothesis is the statement that a researcher or an investigator wants to disprove.

Testing the null hypothesis can tell you whether your results are due to the effects of manipulating the independent variable or due to random chance.

How to Write a Null Hypothesis

Null hypotheses (H0) start as research questions that the investigator rephrases as statements indicating no effect or relationship between the independent and dependent variables.

It is a default position that your research aims to challenge or confirm.

For example, if studying the impact of exercise on weight loss, your null hypothesis might be:

There is no significant difference in weight loss between individuals who exercise daily and those who do not.

Examples of Null Hypotheses

Research question | Null hypothesis
Do teenagers use cell phones more than adults? | Teenagers and adults use cell phones the same amount.
Do tomato plants exhibit a higher rate of growth when planted in compost rather than in soil? | Tomato plants show no difference in growth rates when planted in compost rather than soil.
Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression.
Does daily exercise increase test performance? | There is no relationship between daily exercise time and test performance.
Does the new vaccine prevent infections? | The vaccine does not affect the infection rate.
Does flossing your teeth affect the number of cavities? | Flossing your teeth has no effect on the number of cavities.

When Do We Reject The Null Hypothesis? 

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This often occurs when the p-value (probability of observing the data given the null hypothesis is true) is below a predetermined significance level.

If the collected data does not meet the expectation of the null hypothesis, a researcher can conclude that the data lacks sufficient evidence to back up the null hypothesis, and thus the null hypothesis is rejected. 

Rejecting the null hypothesis means that a relationship does exist between a set of variables and the effect is statistically significant (p < 0.05).

If the data collected from the random sample are not statistically significant, then the null hypothesis is not rejected, and the researchers can conclude that there is insufficient evidence of a relationship between the variables.

You need to perform a statistical test on your data in order to evaluate how consistent it is with the null hypothesis. A p-value is one statistical measurement used to validate a hypothesis against observed data.

Calculating the p-value is a critical part of null-hypothesis significance testing because it quantifies how strongly the sample data contradicts the null hypothesis.

The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.


Usually, a researcher uses a significance level of 0.05 or 0.01 (corresponding to a 95% or 99% confidence level) as a general guideline for deciding whether to reject or keep the null.

When your p-value is less than or equal to your significance level, you reject the null hypothesis.

In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis.

In this case, the sample data provides insufficient data to conclude that the effect exists in the population.
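The reject / fail-to-reject rule in the last few paragraphs is mechanical enough to write down directly. A toy sketch (the function name is ours, not a standard API):

```python
def nhst_decision(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise fail to reject.

    Note the asymmetry: we never 'accept' H0, we only fail to
    find sufficient evidence against it.
    """
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(nhst_decision(0.03))               # reject H0
print(nhst_decision(0.20))               # fail to reject H0
print(nhst_decision(0.03, alpha=0.01))   # fail to reject H0
```

The last call shows why the significance level matters: the same p-value of 0.03 clears an alpha of 0.05 but not the stricter 0.01.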

Because you can never know with complete certainty whether there is an effect in the population, your inferences about a population will sometimes be incorrect.

When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s called a type II error.

Why Do We Never Accept The Null Hypothesis?

The reason we do not say “accept the null” is because we are always assuming the null hypothesis is true and then conducting a study to see if there is evidence against it. And, even if we don’t find evidence against it, a null hypothesis is not accepted.

A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. 

It is risky to conclude that the null hypothesis is true merely because we did not find evidence to reject it. It is always possible that researchers elsewhere have disproved the null hypothesis, so we cannot accept it as true, but instead, we state that we failed to reject the null. 

One can either reject the null hypothesis, or fail to reject it, but can never accept it.

Why Do We Use The Null Hypothesis?

We can never prove with 100% certainty that a hypothesis is true; we can only collect evidence that supports a theory. However, testing a hypothesis can set the stage for rejecting or failing to reject that hypothesis within a certain confidence level.

The null hypothesis is useful because it can tell us whether the results of our study are due to random chance or the manipulation of a variable (with a certain level of confidence).

A null hypothesis is rejected if the observed data would be very unlikely to occur if it were true, and it is retained if the observed outcome is consistent with the position it holds.

Rejecting the null hypothesis sets the stage for further experimentation to see if a relationship between two variables exists. 

Hypothesis testing is a critical part of the scientific method as it helps decide whether the results of a research study support a particular theory about a given population. Hypothesis testing is a systematic way of backing up researchers’ predictions with statistical analysis.

It helps provide sufficient statistical evidence that either favors or rejects a certain hypothesis about the population parameter. 

Purpose of a Null Hypothesis 

  • The primary purpose of the null hypothesis is to disprove an assumption. 
  • Whether rejected or retained, the null hypothesis can help further progress a theory in many scientific cases.
  • A null hypothesis can be used to ascertain how consistent the outcomes of multiple studies are.

Do you always need both a Null Hypothesis and an Alternative Hypothesis?

The null (H0) and alternative (Ha or H1) hypotheses are two competing claims that describe the effect of the independent variable on the dependent variable. They are mutually exclusive, which means that only one of the two hypotheses can be true. 

While the null hypothesis states that there is no effect in the population, an alternative hypothesis states that there is statistical significance between two variables. 

The goal of hypothesis testing is to make inferences about a population based on a sample. In order to undertake hypothesis testing, you must express your research hypothesis as a null and alternative hypothesis. Both hypotheses are required to cover every possible outcome of the study. 

What is the difference between a null hypothesis and an alternative hypothesis?

The alternative hypothesis is the complement to the null hypothesis. The null hypothesis states that there is no effect or no relationship between variables, while the alternative hypothesis claims that there is an effect or relationship in the population.

It is the claim that you expect or hope will be true. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.

What are some problems with the null hypothesis?

One major problem with the null hypothesis is that researchers typically will assume that accepting the null is a failure of the experiment. However, accepting or rejecting any hypothesis is a positive result. Even if the null is not refuted, the researchers will still learn something new.

Why can a null hypothesis not be accepted?

We can either reject or fail to reject a null hypothesis, but never accept it. If your test fails to detect an effect, this is not proof that the effect doesn’t exist. It just means that your sample did not have enough evidence to conclude that it exists.

We can’t accept a null hypothesis because a lack of evidence does not prove something that does not exist. Instead, we fail to reject it.

Failing to reject the null indicates that the sample did not provide sufficient enough evidence to conclude that an effect exists.

If the p-value is greater than the significance level, then you fail to reject the null hypothesis.

Is a null hypothesis directional or non-directional?

A hypothesis test can contain either a directional alternative hypothesis or a non-directional alternative hypothesis. A directional hypothesis is one that contains the less than ("<") or greater than (">") sign.

A nondirectional hypothesis contains the not equal sign (“≠”).  However, a null hypothesis is neither directional nor non-directional.

A null hypothesis is a prediction that there will be no change, relationship, or difference between two variables.

The directional hypothesis or nondirectional hypothesis would then be considered alternative hypotheses to the null hypothesis.




Chapter 13: Inferential Statistics

Understanding Null Hypothesis Testing

Learning Objectives

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables for a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called  parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 clinically depressed adults and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for clinically depressed adults).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s  r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called  sampling error . (Note that the term error  here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s  r  value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H0 and read as "H-naught"). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship "occurred by chance." The other interpretation is called the alternative hypothesis (often symbolized as H1). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis  in favour of the alternative hypothesis. If it would not be extremely unlikely, then  retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of  d  = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favour of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the  p value . A low  p  value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high  p  value means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the  p  value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called  α (alpha)  and is almost always set to .05. If there is less than a 5% chance of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be  statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to conclude that it is true. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

The Misunderstood  p  Value

The p value is one of the most misunderstood quantities in psychological research (Cohen, 1994). Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the  p  value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the  p  value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The  p  value is really the probability of a result at least as extreme as the sample result  if  the null hypothesis  were  true. So a  p  value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the  p  value is not the probability that any particular  hypothesis  is true or false. Instead, it is the probability of obtaining the  sample result  if the null hypothesis were true.
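One way to internalize what the p value is (and is not): simulate many experiments in which the null hypothesis is true by construction, and check how often p falls at or below .05. The sketch below (Python, normal approximation, invented parameters) should produce false rejections at roughly the alpha rate, because that is exactly what alpha controls:

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)
alpha = 0.05
trials = 2000
false_rejections = 0

for _ in range(trials):
    # H0 is true by construction: the population mean really is 0
    sample = [rng.gauss(0, 1) for _ in range(30)]
    z = mean(sample) / (stdev(sample) / math.sqrt(30))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p <= alpha:
        false_rejections += 1

rate = false_rejections / trials
# rate should land close to alpha (slightly above it here, since the
# normal approximation is a bit liberal for n = 30)
```

This is the Type I error rate in action: even when nothing is going on in the population, about 5% of samples look "significant" by chance.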

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the  p  value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the  p  value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s  d  is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s  d  is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word Yes, then this combination would be statistically significant for both Cohen’s d and Pearson’s r. If it contains the word No, then it would not be statistically significant for either. There is one cell where the decision for d and r would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”.

Table 13.1 How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant
| Sample size           | Weak relationship | Medium-strength relationship | Strong relationship |
| Small (N = 20)        | No                | No                           | d = Maybe, r = Yes  |
| Medium (N = 50)       | No                | Yes                          | Yes                 |
| Large (N = 100)       | d = Yes, r = No   | Yes                          | Yes                 |
| Extra large (N = 500) | Yes               | Yes                          | Yes                 |

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.
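The interplay that Table 13.1 summarizes can be sketched numerically. A rough normal approximation for a two-sample comparison uses z = d·sqrt(n/2) with n participants per group; this is only a sketch, since exact t-based p values and the table’s own conventions differ slightly at small sample sizes.

```python
import math

def approx_p_value(d, n_per_group):
    """Rough two-tailed p value for a two-sample comparison with effect
    size Cohen's d and n participants per group, using the normal
    approximation z = d * sqrt(n / 2)."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal

# Strong relationship, large samples: far below any conventional alpha
p_strong_large = approx_p_value(d=0.50, n_per_group=500)
# Weak relationship, tiny samples: nowhere near significant
p_weak_small = approx_p_value(d=0.10, n_per_group=3)

print(f"d = 0.50, n = 500 per group: p = {p_strong_large:.2e}")
print(f"d = 0.10, n = 3 per group:   p = {p_weak_small:.3f}")
```

The two extremes mirror the examples in the text: a strong effect in large samples would be wildly unlikely under the null hypothesis, while a weak effect in tiny samples is entirely unsurprising.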

Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word  significant  can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the  statistical  significance of a result and the  practical  significance of that result.  Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

Key Takeaways

  • Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
  • The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favour of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
  • The probability of obtaining the sample result if the null hypothesis were true (the  p  value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
  • Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.
Exercises

  • Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
  • Practice: Use Table 13.1 to decide whether each of the following results is statistically significant.
    1. The correlation between two variables is  r  = −.78 based on a sample size of 137.
    2. The mean score on a psychological characteristic for women is 25 ( SD  = 5) and the mean score for men is 24 ( SD  = 5). There were 12 women and 10 men in this study.
    3. In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
    4. In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
    5. A student finds a correlation of  r  = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.

References

  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
  • Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Glossary

  • Parameters: Values in a population that correspond to variables measured in a study.
  • Sampling error: The random variability in a statistic from sample to sample.
  • Null hypothesis testing: A formal approach to deciding between two interpretations of a statistical relationship in a sample.
  • Null hypothesis: The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error.
  • Alternative hypothesis: The idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.
  • Reject the null hypothesis: When the relationship found in the sample would be extremely unlikely, the idea that the relationship occurred “by chance” is rejected.
  • Retain the null hypothesis: When the relationship found in the sample is likely to have occurred by chance, the null hypothesis is not rejected.
  • p value: The probability that, if the null hypothesis were true, the result found in the sample would occur.
  • α (alpha): How low the p value must be before the sample result is considered unlikely in null hypothesis testing.
  • Statistically significant: When there is less than a 5% chance of a result as extreme as the sample result occurring and the null hypothesis is rejected.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


9.1 Null and Alternative Hypotheses

The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

H 0 , the null hypothesis: a statement of no difference between sample means or proportions or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.

H a , the alternative hypothesis: a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 .

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide whether you have enough evidence to reject the null hypothesis. The evidence is in the form of sample data.

After you have examined the evidence, you make a decision. There are two options: reject H 0 if the sample information favors the alternative hypothesis, or do not reject H 0 (decline to reject H 0 ) if the sample information is insufficient to reject the null hypothesis.
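The decision step reduces to comparing the p value with a pre-chosen significance level (a minimal sketch; α = 0.05 is a common convention, not a fixed rule).

```python
def decide(p_value, alpha=0.05):
    """Decision step of null hypothesis testing: reject H0 only when the
    sample result would be sufficiently unlikely if H0 were true."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decide(0.02))  # a result unlikely under H0
print(decide(0.30))  # a result not unlikely under H0
```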

Mathematical Symbols Used in H 0 and H a :

| H 0                          | H a                                                 |
| equal (=)                    | not equal (≠) or greater than (>) or less than (<)  |
| greater than or equal to (≥) | less than (<)                                       |
| less than or equal to (≤)    | greater than (>)                                    |

H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example 9.1

H 0 : No more than 30 percent of the registered voters in Santa Clara County voted in the primary election. p ≤ 0.30
H a : More than 30 percent of the registered voters in Santa Clara County voted in the primary election. p > 0.30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25 percent. State the null and alternative hypotheses.

Example 9.2

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are the following: H 0 : μ = 2.0 H a : μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 66
  • H a : μ __ 66

Example 9.3

We want to test if college students take fewer than five years to graduate from college, on the average. The null and alternative hypotheses are the following: H 0 : μ ≥ 5 H a : μ < 5
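The left-tailed test in Example 9.3 can be sketched in code. The sample values and the use of a z statistic in place of a t statistic are illustrative assumptions.

```python
import math
import statistics

# Hypothetical years-to-graduation for 36 students (invented data)
sample = [4.1, 4.8, 5.2, 3.9, 4.5, 4.7] * 6
mu0 = 5.0  # H0: mu >= 5, Ha: mu < 5

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
z = (mean - mu0) / se
p = 0.5 * math.erfc(-z / math.sqrt(2))  # left-tailed: P(Z <= z)

print(f"mean = {mean:.2f}, z = {z:.2f}, left-tailed p = {p:.2e}")
```

A small left-tailed p value here favors H a : μ < 5.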

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 45
  • H a : μ __ 45

Example 9.4

An article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third of the students pass. The same article stated that 6.6 percent of U.S. students take advanced placement exams and 4.4 percent pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6 percent. State the null and alternative hypotheses. H 0 : p ≤ 0.066 H a : p > 0.066
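Example 9.4 maps onto a one-proportion z-test; the sample counts below are hypothetical, purely to show the mechanics.

```python
import math

p0 = 0.066               # H0: p <= 0.066, Ha: p > 0.066
n, took_exam = 1000, 85  # hypothetical: 85 of 1,000 students took an AP exam

p_hat = took_exam / n
se = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
z = (p_hat - p0) / se
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # right-tailed: P(Z >= z)

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, right-tailed p = {p_value:.4f}")
```

Because H a points in one direction (p > 0.066), only the right tail of the sampling distribution contributes to the p value.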

On a state driver’s test, about 40 percent pass the test on the first try. We want to test if more than 40 percent pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : p __ 0.40
  • H a : p __ 0.40

Collaborative Exercise

Bring to class a newspaper, some news magazines, and some internet articles. In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/9-1-null-and-alternative-hypotheses

© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Enago Academy

What is Null Hypothesis? What Is Its Importance in Research?


Scientists begin their research with a hypothesis that a relationship of some kind exists between variables. The null hypothesis is the opposite, stating that no such relationship exists. The null hypothesis may seem unexciting, but it is a very important aspect of research. In this article, we discuss what the null hypothesis is, how to make use of it, and why you should use it to improve your statistical analyses.

What is the Null Hypothesis?

The null hypothesis can be tested using statistical analysis and is often written as H 0 (read as “H-naught”). Researchers use a significance test to determine how likely the sample result would be if H 0 were true, and therefore whether the observed relationship can reasonably be attributed to chance.

The null hypothesis is not the same as an alternative hypothesis. An alternative hypothesis states that there is a relationship between two variables, while H 0 posits the opposite. Let us consider the following example.

A researcher wants to discover the relationship between exercise frequency and appetite. She asks:

Q: Does increased exercise frequency lead to increased appetite? Alternative hypothesis: Increased exercise frequency leads to increased appetite. H 0 assumes that there is no relationship between the two variables: Increased exercise frequency does not lead to increased appetite.

Let us look at another example of how to state the null hypothesis:

Q: Does insufficient sleep lead to an increased risk of heart attack among men over age 50? H 0 : The amount of sleep men over age 50 get does not increase their risk of heart attack.

Why is Null Hypothesis Important?

Scientists often neglect the null hypothesis in their testing. As shown in the above examples, H 0 is often assumed to be the opposite of the hypothesis being tested. However, it is good practice to state H 0 explicitly and ensure it is carefully worded. To understand why, let us return to our previous example. In this case,

Alternative hypothesis: Getting too little sleep leads to an increased risk of heart attack among men over age 50.

H 0 : The amount of sleep men over age 50 get has no effect on their risk of heart attack.

Note that this H 0 is different from the one in our first example. What if we were to conduct this experiment and find that neither H 0 nor the alternative hypothesis was supported? The experiment would be considered invalid. Take our original H 0 in this case: “the amount of sleep men over age 50 get does not increase their risk of heart attack.” If this H 0 is found to be untrue, and so is the alternative, we can still consider a third hypothesis: perhaps getting insufficient sleep actually decreases the risk of a heart attack among men over age 50. Because we have tested H 0 , we have more information than we would have had if we had neglected it.

Do I Really Need to Test It?

The biggest problem with the null hypothesis is that many scientists see accepting it as a failure of the experiment. They consider that they have not proven anything of value. However, as we have learned from the replication crisis , negative results are just as important as positive ones. While they may seem less appealing to publishers, they can tell the scientific community important information about correlations that do or do not exist. In this way, they can drive science forward and prevent the wastage of resources.

Do you test for the null hypothesis? Why or why not? Let us know your thoughts in the comments below.



Null Hypothesis Examples


The null hypothesis (H 0 ) is the hypothesis that states there is no statistical difference between two sample sets. In other words, it assumes the independent variable does not have an effect on the dependent variable in a scientific experiment .

The null hypothesis is the most powerful type of hypothesis in the scientific method because it’s the easiest one to test with a high confidence level using statistics. If the null hypothesis is accepted, then it’s evidence that any observed differences between two experiment groups are due to random chance. If the null hypothesis is rejected, then it’s strong evidence that there is a true difference between test sets or that the independent variable affects the dependent variable.

  • The null hypothesis is a nullifiable hypothesis. A researcher seeks to reject it because this result strongly indicates observed differences are real and not just due to chance.
  • The null hypothesis may be accepted or rejected, but not proven. There is always a level of confidence in the outcome.

What Is the Null Hypothesis?

The null hypothesis is written as H 0 , which is read as H-zero, H-nought, or H-null. It is associated with another hypothesis, called the alternate or alternative hypothesis H A or H 1 . When the null hypothesis and alternate hypothesis are written mathematically, they cover all possible outcomes of an experiment.

An experimenter tests the null hypothesis with a statistical analysis called a significance test. The significance test determines the likelihood that the results of the test are not due to chance. Usually, a researcher uses a confidence level of 95% or 99% (p-value of 0.05 or 0.01). But, even if the confidence in the test is high, there is always a small chance the outcome is incorrect. This means you can’t prove a null hypothesis. It’s also a good reason why it’s important to repeat experiments.
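A significance test of this kind can be made concrete with an exact binomial test of a fair coin (all numbers hypothetical): under H 0 the coin is fair, and the p value sums the probabilities of every outcome at least as extreme as the observed count.

```python
from math import comb

def binomial_two_tailed_p(heads, flips, p0=0.5):
    """Exact two-tailed p value for H0: P(heads) = p0, summing the
    probabilities of every outcome at least as far from the expected
    count as the observed one."""
    expected = flips * p0
    observed_dev = abs(heads - expected)
    return sum(
        comb(flips, k) * p0**k * (1 - p0) ** (flips - k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_dev
    )

# 60 heads in 100 flips of a coin assumed fair under H0:
p = binomial_two_tailed_p(heads=60, flips=100)
print(f"two-tailed p = {p:.4f}")  # compare against alpha = 0.05 or 0.01
```

At a 95% confidence level (p-value cutoff of 0.05) this particular result would narrowly fail to reject H 0 , while a more extreme count would succeed.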

Exact and Inexact Null Hypothesis

The most common type of null hypothesis assumes no difference between two samples or groups or no measurable effect of a treatment. This is the exact hypothesis . If you’re asked to state a null hypothesis for a science class, this is the one to write. It is the easiest type of hypothesis to test and is the only one accepted for certain types of analysis. Examples include:

There is no difference between two groups H 0 : μ 1  = μ 2 (where H 0  = the null hypothesis, μ 1  = the mean of population 1, and μ 2  = the mean of population 2)

Both groups have a value of 100 (or any other number or quality) H 0 : μ = 100

However, sometimes a researcher may test an inexact hypothesis . This type of hypothesis specifies ranges or intervals. Examples include:

Recovery time from a treatment is the same or worse than a placebo: H 0 : μ ≥ placebo time

There is a 5% or less difference between two groups: H 0 : 95 ≤ μ ≤ 105

An inexact hypothesis offers “directionality” about a phenomenon. For example, an exact hypothesis can indicate whether or not a treatment has an effect, while an inexact hypothesis can tell whether an effect is positive or negative. However, an inexact hypothesis may be harder to test, and some scientists and statisticians disagree about whether it’s a true null hypothesis .

How to State the Null Hypothesis

To state the null hypothesis, first state what you expect the experiment to show. Then, rephrase the statement in a form that assumes there is no relationship between the variables or that a treatment has no effect.

Example: A researcher tests whether a new drug speeds recovery time from a certain disease. The average recovery time without treatment is 3 weeks.

  • State the goal of the experiment: “I hope the average recovery time with the new drug will be less than 3 weeks.”
  • Rephrase the hypothesis to assume the treatment has no effect: “If the drug doesn’t shorten recovery time, then the average time will be 3 weeks or longer.” Mathematically: H 0 : μ ≥ 3

This null hypothesis (an inexact hypothesis) covers both the scenario in which the drug has no effect and the one in which the drug makes the recovery time longer. The alternate hypothesis is that the average recovery time will be less than three weeks:

H A : μ < 3

Of course, the researcher could test the no-effect hypothesis (exact null hypothesis): H 0 : μ = 3

The danger of testing this hypothesis is that rejecting it only implies the drug affected recovery time (not whether it made it better or worse). This is because the alternate hypothesis is:

H A : μ ≠ 3 (which includes μ < 3 and μ > 3)

Even though the no-effect null hypothesis yields less information, it’s used because it’s easier to test using statistics. Basically, testing whether something is unchanged/changed is easier than trying to quantify the nature of the change.

Remember, a researcher hopes to reject the null hypothesis because this supports the alternate hypothesis. Also, be sure the null and alternate hypothesis cover all outcomes. Finally, remember a simple true/false, equal/unequal, yes/no exact hypothesis is easier to test than a more complex inexact hypothesis.
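The difference between the exact (two-tailed) and inexact (one-tailed) formulations shows up directly in the p value. Below is a sketch using hypothetical recovery times and a normal approximation; a t distribution would be more accurate at n = 10.

```python
import math
import statistics

# Hypothetical recovery times in weeks for ten patients on the new drug
times = [2.4, 2.9, 2.6, 3.1, 2.2, 2.8, 2.5, 2.7, 2.3, 2.6]
mu0 = 3.0  # average recovery time without treatment

n = len(times)
mean = statistics.mean(times)
se = statistics.stdev(times) / math.sqrt(n)
z = (mean - mu0) / se

p_one_tailed = 0.5 * math.erfc(-z / math.sqrt(2))  # Ha: mu < 3 (inexact H0)
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))    # Ha: mu != 3 (exact H0)

print(f"mean = {mean:.2f} weeks, z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed:.2e}, two-tailed p = {p_two_tailed:.2e}")
```

With the same data, the one-tailed p value is half the two-tailed one, which is exactly the trade-off: the directional test is more sensitive, but only in the direction it commits to.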

| Question | Null hypothesis | Alternative hypothesis |
| Does chewing willow bark relieve pain? | Pain relief is the same compared with a placebo. (exact) / Pain relief after chewing willow bark is the same or worse versus taking a placebo. (inexact) | Pain relief is different compared with a placebo. (exact) / Pain relief is better compared to a placebo. (inexact) |
| Do cats care about the shape of their food? | Cats show no food preference based on shape. (exact) | Cats show a food preference based on shape. (exact) |
| Do teens use mobile devices more than adults? | Teens and adults use mobile devices the same amount. (exact) / Teens use mobile devices less than or equal to adults. (inexact) | Teens and adults use mobile devices different amounts. (exact) / Teens use mobile devices more than adults. (inexact) |
| Does the color of light influence plant growth? | The color of light has no effect on plant growth. (exact) | The color of light affects plant growth. (exact) |


Null Hypothesis

A hypothesis that states that there is no relationship between two population parameters

What is the Null Hypothesis?

The null hypothesis states that there is no relationship between two population parameters, i.e., between an independent variable and a dependent variable . If a study shows a relationship between the two parameters, the outcome could be due to an experimental or sampling error. However, if the null hypothesis is rejected, there is evidence of a relationship in the measured phenomenon.


The null hypothesis is useful because it can be tested to conclude whether or not there is a relationship between two measured phenomena. It can inform the user whether the results obtained are due to chance or manipulating a phenomenon. Testing a hypothesis sets the stage for rejecting or accepting a hypothesis within a certain confidence level.

Two main approaches to statistical inference involving a null hypothesis can be used: significance testing, developed by Ronald Fisher, and hypothesis testing, developed by Jerzy Neyman and Egon Pearson. In Fisher’s significance testing approach, a null hypothesis is rejected if the observed data would be sufficiently unlikely were the null hypothesis true. In that case, the null hypothesis is rejected and replaced with an alternative hypothesis.

If the observed outcome is consistent with the position held by the null hypothesis, the hypothesis is accepted. In the hypothesis-testing approach of Neyman and Pearson, on the other hand, the null hypothesis is compared against an alternative hypothesis, and a conclusion is reached by deciding between the two. The two hypotheses are differentiated based on the observed data.

  • A null hypothesis refers to a hypothesis that states that there is no relationship between two population parameters.
  • Researchers reject or disprove the null hypothesis to set the stage for further experimentation or research that explains the position of interest.
  • The inverse of a null hypothesis is an alternative hypothesis, which states that there is statistical significance between two variables.

How the Null Hypothesis Works

A null hypothesis is a provisional assumption that requires further testing to determine whether the observed data supports or contradicts it. For example, a null hypothesis statement can be “the rate of plant growth is not affected by sunlight.” It can be tested by measuring the growth of plants in the presence of sunlight and comparing it with the growth of plants in the absence of sunlight.

Rejecting the null hypothesis sets the stage for further experimentation to see whether a relationship between the two variables exists. Rejecting a null hypothesis does not necessarily mean that the experiment failed to produce results; rather, it motivates further experimentation.

To differentiate the null hypothesis from other forms of hypothesis, a null hypothesis is written as H 0 , while the alternate hypothesis is written as H A or H 1 . A significance test is used to establish confidence in a null hypothesis and determine whether the observed data is not due to chance or manipulation of data.

Researchers test the hypothesis by examining a random sample of the plants being grown with or without sunlight. If the outcome demonstrates a statistically significant difference in growth, the null hypothesis is rejected.

Null Hypothesis Example

The annual return of ABC Limited bonds is assumed to be 7.5%. To test whether this is true, we take the null hypothesis to be "the mean annual return of ABC Limited bonds is 7.5%." For the purpose of testing, we provisionally assume the null hypothesis holds.

Any claim that contradicts the stated null hypothesis is taken to be the alternative hypothesis. In this case, the alternative hypothesis is "the mean annual return of ABC Limited bonds is not 7.5%."

We take a sample of the bond's annual returns for the last five years and calculate the sample mean. The result is then compared to the assumed average annual return of 7.5% to test the null hypothesis.

If the sample mean differs significantly from 7.5%, the null hypothesis is rejected and the alternative hypothesis is accepted; if it does not, we fail to reject the null hypothesis.
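
A minimal sketch of such a test follows. The return figures are invented for illustration, and a normal approximation is used for simplicity (with only five observations, a t-test would normally be preferred):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: the population mean equals mu0,
    using a normal approximation to the test statistic."""
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical annual returns (%) over the last five years.
returns = [7.4, 7.6, 7.5, 7.3, 7.7]
p = z_test_p_value(returns, mu0=7.5)
# Here the sample mean is exactly 7.5%, so p is essentially 1:
# the data give no reason to reject the null hypothesis.
```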

What is an Alternative Hypothesis?

An alternative hypothesis is the inverse of a null hypothesis. An alternative hypothesis and a null hypothesis are mutually exclusive, which means that only one of the two hypotheses can be true.

If the samples used to test the null hypothesis lead to its rejection, the alternative hypothesis is supported, and there is a statistically significant relationship between the two variables.

Purpose of Hypothesis Testing

Hypothesis testing is a statistical process for testing an assumption regarding a phenomenon or population parameter. It is a critical part of the scientific method, a systematic approach to assessing theories through observation and determining the probability that a stated hypothesis is true or false.

A good theory can make accurate predictions. For an analyst who makes predictions, hypothesis testing is a rigorous way of backing up those predictions with statistical analysis. It also helps determine whether there is sufficient statistical evidence in favor of a given hypothesis about the population parameter.

Additional Resources

Thank you for reading CFI’s guide to Null Hypothesis. To keep advancing your career, the additional resources below will be useful:

  • Free Statistics Fundamentals Course
  • Coefficient of Determination
  • Independent Variable
  • Expected Value
  • Nonparametric Statistics

F1000Research (via PMC)

  • PMC5635437.1; 2015 Aug 25
  • PMC5635437.2; 2016 Jul 13
  • ➤ PMC5635437.3; 2016 Oct 10

Null hypothesis significance testing: a short tutorial

Cyril Pernet

1 Centre for Clinical Brain Sciences (CCBS), Neuroimaging Sciences, The University of Edinburgh, Edinburgh, UK

Version Changes

Revised. Amendments from Version 2

This v3 includes minor changes that reflect the 3rd reviewer's comments, in particular the theoretical vs. practical difference between Fisher and Neyman-Pearson. Additional information and a reference are also included regarding the interpretation of the p-value in low-powered studies.

Peer Review Summary

Reviewer name(s) and review status:

  • Dorothy Vera Margaret Bishop: Approved with Reservations
  • Stephen J. Senn: Approved
  • Stephen J. Senn: Approved with Reservations
  • Marcel ALM van Assen: Not Approved
  • Daniel Lakens: Not Approved

Although thoroughly criticized, null hypothesis significance testing (NHST) remains the statistical method of choice used to provide evidence for an effect in the biological, biomedical and social sciences. In this short tutorial, I first summarize the concepts behind the method, distinguishing tests of significance (Fisher) from tests of acceptance (Neyman-Pearson), and point to common interpretation errors regarding the p-value. I then present the related concept of confidence intervals and again point to common interpretation errors. Finally, I discuss what should be reported in which context. The goal is to clarify concepts, avoid interpretation errors, and propose reporting practices.

The Null Hypothesis Significance Testing framework

NHST is a method of statistical inference by which an experimental factor is tested against a hypothesis of no effect or no relationship based on a given observation. The method is a combination of the concepts of significance testing developed by Fisher in 1925 and of acceptance based on critical rejection regions developed by Neyman & Pearson in 1928. In the following, I first present each approach, highlighting the key differences and the common misconceptions that result from their combination into the NHST framework (for a more mathematical comparison, along with the Bayesian method, see Christensen, 2005). I next present the related concept of confidence intervals. I finish by discussing practical aspects of using NHST and reporting practice.

Fisher, significance testing, and the p-value

The method developed by Fisher (Fisher, 1934; Fisher, 1955; Fisher, 1959) allows one to compute the probability of observing a result at least as extreme as a test statistic (e.g. a t value), assuming the null hypothesis of no effect is true. This probability, or p-value, (1) is the conditional probability of achieving the observed outcome or a larger one: p(Obs≥t|H0), and (2) is therefore a cumulative probability rather than a point estimate. It is equal to the area under the null probability distribution curve from the observed test statistic to the tail of the null distribution (Turkheimer et al., 2004). The approach is one of 'proof by contradiction' (Christensen, 2005): we pose the null model and test whether the data conform to it.

In practice, it is recommended to set a level of significance (a theoretical p-value) that acts as a reference point to identify significant results, that is, to identify results that differ from the null hypothesis of no effect. Fisher recommended using p=0.05 to judge whether an effect is significant or not, as it is roughly two standard deviations away from the mean for the normal distribution (Fisher, 1934, page 45: 'The value for which p=.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not'). A key aspect of Fisher's theory is that only the null hypothesis is tested, and therefore p-values are meant to be used in a graded manner to decide whether the evidence is worth additional investigation and/or replication (Fisher, 1971, page 13: 'it is open to the experimenter to be more or less exacting in respect of the smallness of the probability he would require […]' and 'no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon'). How small the level of significance should be is thus left to researchers.
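
Fisher's "nearly 2" remark can be checked directly. A quick sketch using Python's standard library (my own illustration, not part of the article):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# Two-sided 5% critical point of the normal distribution:
# the deviation beyond which p < .05.
crit = nd.inv_cdf(1 - 0.05 / 2)

# Conversely, the two-sided p-value at 1.96 standard deviations.
p_at_196 = 2 * (1 - nd.cdf(1.96))

# crit is about 1.96 ("nearly 2") and p_at_196 is about 0.05,
# which is exactly Fisher's correspondence.
```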

What is not a p-value? Common mistakes

The p-value is not an indication of the strength or magnitude of an effect. Any interpretation of the p-value in relation to the effect under study (strength, reliability, probability) is wrong, since p-values are conditioned on H0. In addition, while p-values are uniformly distributed (if all the assumptions of the test are met) when there is no effect, their distribution depends on both the population effect size and the number of participants, making it impossible to infer the strength of an effect from them.

Similarly, 1-p is not the probability of replicating an effect. Often, a small p-value is taken to mean a strong likelihood of getting the same results on another try, but this cannot be inferred, because the p-value is not informative about the effect itself (Miller, 2009). Because the p-value depends on the number of subjects, it can only be used to interpret results in highly powered studies. In low-powered studies (typically with small numbers of subjects), the p-value has a large variance across repeated samples, making it unreliable as an estimate of replication (Halsey et al., 2015).
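
This variance can be seen in a small simulation (my own illustration, not from the article), comparing p-values across repeated experiments at low and high power:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def repeated_p_values(effect, n, n_sim=2000, seed=3):
    """Two-sided z-test p-values across repeated experiments sampled
    from a population with a true effect of `effect` (in SD units)."""
    rng = random.Random(seed)
    nd = NormalDist()
    ps = []
    for _ in range(n_sim):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = mean(sample) / (stdev(sample) / sqrt(n))
        ps.append(2 * (1 - nd.cdf(abs(z))))
    return ps

low_power = repeated_p_values(effect=0.4, n=15)    # small sample
high_power = repeated_p_values(effect=0.4, n=150)  # large sample
# In the low-powered setting the p-values scatter widely over [0, 1];
# in the high-powered setting they bunch tightly near zero.
```

The same true effect produces wildly different p-values from one small-sample replication to the next, which is exactly why a single small-study p-value says little about replication.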

A (small) p-value is not an indication favouring a given hypothesis . Because a low p-value only indicates a misfit of the null hypothesis to the data, it cannot be taken as evidence in favour of a specific alternative hypothesis more than any other possible alternatives such as measurement error and selection bias ( Gelman, 2013 ). Some authors have even argued that the more (a priori) implausible the alternative hypothesis, the greater the chance that a finding is a false alarm ( Krzywinski & Altman, 2013 ; Nuzzo, 2014 ).

The p-value is not the probability of the null hypothesis, p(H0), being true (Krzywinski & Altman, 2013). This common misconception arises from a confusion between the probability of an observation given the null, p(Obs≥t|H0), and the probability of the null given an observation, p(H0|Obs≥t), which is then taken as an indication of p(H0) (see Nickerson, 2000).

Neyman-Pearson, hypothesis testing, and the α-value

Neyman & Pearson (1933) proposed a framework of statistical inference for applied decision making and quality control. In this framework, two hypotheses are proposed: the null hypothesis of no effect and the alternative hypothesis of an effect, along with a control of the long-run probabilities of making errors. The first key concept in this approach is the establishment of an alternative hypothesis along with an a priori effect size. This differs markedly from Fisher, who proposed a general approach for scientific inference conditioned on the null hypothesis only. The second key concept is the control of error rates. Neyman & Pearson (1928) introduced the notion of critical intervals, dichotomizing the space of possible observations into correct vs. incorrect zones. This dichotomization allows one to distinguish correct results (rejecting H0 when there is an effect and not rejecting H0 when there is no effect) from errors (rejecting H0 when there is no effect, the Type I error, and not rejecting H0 when there is an effect, the Type II error). In this context, alpha is the probability of committing a Type I error in the long run, and beta is the probability of committing a Type II error in the long run.
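
The long-run character of alpha can be made concrete with a simulation (my own sketch, not from the article; a z-test with a normal approximation is used for simplicity):

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def type_i_error_rate(alpha=0.05, n=30, n_sim=4000, seed=1):
    """Empirical long-run Type I error rate: draw samples from a
    population in which H0 (mean = 0) is TRUE and count how often a
    two-sided z-test rejects at level alpha."""
    rng = random.Random(seed)
    nd = NormalDist()
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (stdev(sample) / sqrt(n))
        p = 2 * (1 - nd.cdf(abs(z)))
        rejections += p <= alpha
    return rejections / n_sim

rate = type_i_error_rate()
# rate hovers near alpha: even when H0 is true, rejections occur
# at (roughly) the advertised long-run frequency.
```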

The (theoretical) difference in terms of hypothesis testing between Fisher and Neyman-Pearson is illustrated in Figure 1. In the first case, we choose a level of significance for the observed data, say 5%, and compute the p-value. If the p-value is below the level of significance, H0 is rejected. In the second case, we set a critical interval based on the a priori effect size and error rates. If the observed statistic falls outside the critical bounds, it is deemed significantly different from H0. In the NHST framework, the level of significance is (in practice) assimilated to the alpha level, which appears as a simple decision rule: if the p-value is less than or equal to alpha, the null is rejected. It is however a common mistake to conflate these two concepts. The level of significance set for a given sample is not the same as the frequency of acceptance alpha found on repeated sampling, because alpha (a point estimate) is meant to reflect the long-run probability while the p-value (a cumulative estimate) reflects the current probability (Fisher, 1955; Hubbard & Bayarri, 2003).

Figure 1. The figure was prepared with G*Power for a one-sided one-sample t-test, with a sample size of 32 subjects, an effect size of 0.45, and error rates alpha=0.049 and beta=0.80. In Fisher's procedure, only the nil-hypothesis is posed, and the observed p-value is compared to an a priori level of significance. If the observed p-value is below this level (here p=0.05), one rejects H0. In Neyman-Pearson's procedure, the null and alternative hypotheses are specified along with an a priori level of acceptance. If the observed statistical value is outside the critical region (here [-∞, +1.69]), one rejects H0.

Acceptance or rejection of H0?

The acceptance level α can also be viewed as the maximum probability that a test statistic falls into the rejection region when the null hypothesis is true (Johnson, 2013). Therefore, one can only reject the null hypothesis if the test statistic falls into the critical region(s), or fail to reject this hypothesis. In the latter case, all we can say is that no significant effect was observed, but one cannot conclude that the null hypothesis is true. This is another common mistake in using NHST: there is a profound difference between accepting the null hypothesis and simply failing to reject it (Killeen, 2005). In failing to reject, we do not assume that H0 is true; one cannot argue against a theory from a non-significant result (absence of evidence is not evidence of absence). To accept the null hypothesis, tests of equivalence (Walker & Nowacki, 2011) or Bayesian approaches (Dienes, 2014; Kruschke, 2011) must be used.
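
One such equivalence procedure is the two one-sided tests (TOST) approach. The sketch below is my own illustration, with invented data and a normal approximation for simplicity:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def tost_p_value(sample, mu0, margin):
    """Two one-sided tests (TOST) for equivalence.

    H0 here is that the population mean lies OUTSIDE the interval
    [mu0 - margin, mu0 + margin]; rejecting BOTH one-sided nulls
    supports equivalence. Normal approximation used for simplicity."""
    nd = NormalDist()
    se = stdev(sample) / sqrt(len(sample))
    z_low = (mean(sample) - (mu0 - margin)) / se
    z_high = (mean(sample) - (mu0 + margin)) / se
    p_low = 1 - nd.cdf(z_low)   # H0: mean <= mu0 - margin
    p_high = nd.cdf(z_high)     # H0: mean >= mu0 + margin
    return max(p_low, p_high)   # equivalence claimed if this <= alpha

# Hypothetical measurements expected to be equivalent to zero
# within a pre-specified margin of +/-0.1.
data = [0.02, -0.05, 0.01, 0.03, -0.02, 0.00, 0.04, -0.01, 0.02, -0.03,
        0.01, 0.00, -0.02, 0.03, 0.01, -0.01, 0.02, 0.00, -0.04, 0.01]
p_eq = tost_p_value(data, mu0=0.0, margin=0.1)
# A small p_eq supports the claim that the mean is within the margin.
```

Note the reversal relative to NHST: here a small p-value lets one argue *for* (approximate) equality, rather than merely failing to reject it.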

Confidence intervals

Confidence intervals (CIs) are constructs that fail to cover the true value at a rate of alpha, the Type I error rate (Morey & Rouder, 2011), and therefore indicate whether observed values can be rejected by a (two-tailed) test with a given alpha. CIs have been advocated as alternatives to p-values because (i) they allow judging statistical significance and (ii) they provide estimates of effect size. Assuming the CI (a)symmetry and width are correct (but see Wilcox, 2012), they also give some indication of the likelihood that a similar value will be observed in future studies. For future studies of the same sample size, 95% CIs give about an 83% chance of replication success (Cumming & Maillardet, 2006). If sample sizes differ between studies, however, CIs do not guarantee any a priori coverage.

Although CIs provide more information, they are no less subject to interpretation errors (see Savalei & Dunn, 2015 for a review). The most common mistake is to interpret a CI as meaning that the parameter (e.g. the population mean) will fall in that interval X% of the time. The correct interpretation is that, for repeated samples of the same size taken from the same population, X% of the CIs obtained will contain the true parameter value (Tan & Tan, 2010). The alpha value has the same interpretation as in testing against H0, i.e. we accept that 1-alpha CIs are wrong in alpha percent of cases in the long run. This implies that CIs do not allow one to make strong statements about the parameter of interest (e.g. the mean difference) or about H1 (Hoekstra et al., 2014). To make a statement about the probability of a parameter of interest (e.g. the probability of the mean), Bayesian intervals must be used.
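
The repeated-sampling interpretation can be demonstrated with a short simulation (my own sketch, not from the article; a normal approximation replaces the exact t interval):

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_coverage(true_mu=10.0, sigma=2.0, n=25, conf=0.95,
                n_sim=4000, seed=2):
    """Draw many samples from the SAME population and count how often
    the computed interval contains the true mean: about `conf` of the
    time, in the long run."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + conf) / 2)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        half = z * stdev(sample) / sqrt(n)
        m = mean(sample)
        hits += (m - half) <= true_mu <= (m + half)
    return hits / n_sim

coverage = ci_coverage()
# coverage is close to 0.95; any single interval either contains the
# true mean or it does not -- the "95%" describes the procedure.
```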

The (correct) use of NHST

NHST has always been criticized, and yet it is still used every day in scientific reports (Nickerson, 2000). One question to ask oneself is: what is the goal of the scientific experiment at hand? If the goal is to establish a discrepancy with the null hypothesis and/or to establish a pattern of order, NHST is a good tool, because both require ruling out equivalence (Frick, 1996; Walker & Nowacki, 2011). If the goal is to establish the size of an effect or some quantitative value related to it, then NHST is not the method of choice, since testing is conditioned on H0.

While a Bayesian analysis is suited to estimating the probability that a hypothesis is correct, like NHST it does not prove a theory by itself, but adds to its plausibility (Lindley, 2000). No matter what testing procedure is used and how strong the results are, Fisher (1959, p. 13) reminds us that '[…] no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon'. Similarly, the recent statement of the American Statistical Association (Wasserstein & Lazar, 2016) makes it clear that conclusions should be based on the researcher's understanding of the problem in context, along with all summary data and tests, and that no single value (be it a p-value, a Bayes factor, or anything else) can be used to support or invalidate a theory.

What to report and how?

Considering that quantitative reports will always have more information content than binary (significant or not) reports, we can always argue that raw and/or normalized effect sizes, confidence intervals, or Bayes factors must be reported. Reporting everything can however hinder the communication of the main result(s), and we should aim at giving only the information needed, at least in the core of a manuscript. Here I recommend adopting optimal reporting in the results section to keep the message clear, but with detailed supplementary material. When the hypothesis is about the presence/absence or order of an effect, and provided that the study has sufficient power, NHST is appropriate, and it is sufficient to report the actual p-value in the text, since it conveys the information needed to rule out equivalence. When the hypothesis and/or the discussion involve some quantitative value, and because p-values do not inform about the effect, it is essential to report effect sizes (Lakens, 2013), preferably accompanied by confidence or credible intervals. The reasoning is simply that one cannot predict and/or discuss quantities without accounting for variability.

Because scientific progress is obtained by cumulating evidence (Rosenthal, 1991), scientists should also anticipate the secondary use of the data. With today's electronic articles, there is no reason not to include all derived data: means, standard deviations, effect sizes, CIs, and Bayes factors as supplementary tables (or, even better, also share the raw data). It is also essential to report the context in which tests were performed, that is, to report all of the tests performed (all t, F, p values), because of the increased Type I error rate due to selective reporting (the multiple comparisons and p-hacking problems; Ioannidis, 2005). Providing all of this information allows (i) other researchers to directly and effectively compare their results in quantitative terms (replication of effects beyond significance; Open Science Collaboration, 2015), (ii) power to be computed for future studies (Lakens & Evers, 2014), and (iii) results to be aggregated for meta-analyses while minimizing publication bias (van Assen et al., 2014).


Funding Statement

The author(s) declared that no grants were involved in supporting this work.

  • Christensen R: Testing Fisher, Neyman, Pearson, and Bayes. The American Statistician. 2005; 59 ( 2 ):121–126. 10.1198/000313005X20871 [ CrossRef ] [ Google Scholar ]
  • Cumming G, Maillardet R: Confidence intervals and replication: Where will the next mean fall? Psychological Methods. 2006; 11 ( 3 ):217–227. 10.1037/1082-989X.11.3.217 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dienes Z: Using Bayes to get the most out of non-significant results. Front Psychol. 2014; 5 :781. 10.3389/fpsyg.2014.00781 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fisher RA: Statistical Methods for Research Workers . (Vol. 5th Edition). Edinburgh, UK: Oliver and Boyd.1934. Reference Source [ Google Scholar ]
  • Fisher RA: Statistical Methods and Scientific Induction. Journal of the Royal Statistical Society, Series B. 1955; 17 ( 1 ):69–78. Reference Source [ Google Scholar ]
  • Fisher RA: Statistical methods and scientific inference . (2nd ed.). NewYork: Hafner Publishing,1959. Reference Source [ Google Scholar ]
  • Fisher RA: The Design of Experiments . Hafner Publishing Company, New-York.1971. Reference Source [ Google Scholar ]
  • Frick RW: The appropriate use of null hypothesis testing. Psychol Methods. 1996; 1 ( 4 ):379–390. 10.1037/1082-989X.1.4.379 [ CrossRef ] [ Google Scholar ]
  • Gelman A: P values and statistical practice. Epidemiology. 2013; 24 ( 1 ):69–72. 10.1097/EDE.0b013e31827886f7 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Halsey LG, Curran-Everett D, Vowler SL, et al.: The fickle P value generates irreproducible results. Nat Methods. 2015; 12 ( 3 ):179–85. 10.1038/nmeth.3288 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hoekstra R, Morey RD, Rouder JN, et al.: Robust misinterpretation of confidence intervals. Psychon Bull Rev. 2014; 21 ( 5 ):1157–1164. 10.3758/s13423-013-0572-3 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hubbard R, Bayarri MJ: Confusion over measures of evidence (p’s) versus errors ([alpha]’s) in classical statistical testing. The American Statistician. 2003; 57 ( 3 ):171–182. 10.1198/0003130031856 [ CrossRef ] [ Google Scholar ]
  • Ioannidis JP: Why most published research findings are false. PLoS Med. 2005; 2 ( 8 ):e124. 10.1371/journal.pmed.0020124 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Johnson VE: Revised standards for statistical evidence. Proc Natl Acad Sci U S A. 2013; 110 ( 48 ):19313–19317. 10.1073/pnas.1313476110 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Killeen PR: An alternative to null-hypothesis significance tests. Psychol Sci. 2005; 16 ( 5 ):345–353. 10.1111/j.0956-7976.2005.01538.x [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kruschke JK: Bayesian Assessment of Null Values Via Parameter Estimation and Model Comparison. Perspect Psychol Sci. 2011; 6 ( 3 ):299–312. 10.1177/1745691611406925 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Krzywinski M, Altman N: Points of significance: Significance, P values and t -tests. Nat Methods. 2013; 10 ( 11 ):1041–1042. 10.1038/nmeth.2698 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lakens D: Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t -tests and ANOVAs. Front Psychol. 2013; 4 :863. 10.3389/fpsyg.2013.00863 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lakens D, Evers ER: Sailing From the Seas of Chaos Into the Corridor of Stability: Practical Recommendations to Increase the Informational Value of Studies. Perspect Psychol Sci. 2014; 9 ( 3 ):278–292. 10.1177/1745691614528520 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lindley D: The philosophy of statistics. Journal of the Royal Statistical Society. 2000; 49 ( 3 ):293–337. 10.1111/1467-9884.00238 [ CrossRef ] [ Google Scholar ]
  • Miller J: What is the probability of replicating a statistically significant effect? Psychon Bull Rev. 2009; 16 ( 4 ):617–640. 10.3758/PBR.16.4.617 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Morey RD, Rouder JN: Bayes factor approaches for testing interval null hypotheses. Psychol Methods. 2011; 16 ( 4 ):406–419. 10.1037/a0024377 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Neyman J, Pearson ES: On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference: Part I. Biometrika. 1928; 20A ( 1/2 ):175–240. 10.3389/fpsyg.2015.00245 [ CrossRef ] [ Google Scholar ]
  • Neyman J, Pearson ES: On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond Ser A. 1933; 231 ( 694–706 ):289–337. 10.1098/rsta.1933.0009 [ CrossRef ] [ Google Scholar ]
  • Nickerson RS: Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods. 2000; 5 ( 2 ):241–301. 10.1037/1082-989X.5.2.241 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Nuzzo R: Scientific method: statistical errors. Nature. 2014; 506 ( 7487 ):150–152. 10.1038/506150a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Open Science Collaboration. PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015; 349 ( 6251 ):aac4716. 10.1126/science.aac4716 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rosenthal R: Cumulating psychology: an appreciation of Donald T. Campbell. Psychol Sci. 1991; 2 ( 4 ):213–221. 10.1111/j.1467-9280.1991.tb00138.x [ CrossRef ] [ Google Scholar ]
  • Savalei V, Dunn E: Is the call to abandon p -values the red herring of the replicability crisis? Front Psychol. 2015; 6 :245. 10.3389/fpsyg.2015.00245 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tan SH, Tan SB: The Correct Interpretation of Confidence Intervals. Proceedings of Singapore Healthcare. 2010; 19 ( 3 ):276–278. 10.1177/201010581001900316 [ CrossRef ] [ Google Scholar ]
  • Turkheimer FE, Aston JA, Cunningham VJ: On the logic of hypothesis testing in functional imaging. Eur J Nucl Med Mol Imaging. 2004; 31 ( 5 ):725–732. 10.1007/s00259-003-1387-7 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van Assen MA, van Aert RC, Nuijten MB, et al.: Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results. PLoS One. 2014; 9 ( 1 ):e84896. 10.1371/journal.pone.0084896 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Walker E, Nowacki AS: Understanding equivalence and noninferiority testing. J Gen Intern Med. 2011; 26 ( 2 ):192–196. 10.1007/s11606-010-1513-8 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wasserstein RL, Lazar NA: The ASA’s Statement on p -Values: Context, Process, and Purpose. The American Statistician. 2016; 70 ( 2 ):129–133. 10.1080/00031305.2016.1154108 [ CrossRef ] [ Google Scholar ]
  • Wilcox R: Introduction to Robust Estimation and Hypothesis Testing . Edition 3, Academic Press, Elsevier: Oxford, UK, ISBN: 978-0-12-386983-8.2012. Reference Source [ Google Scholar ]

Referee response for version 3

Dorothy Vera Margaret Bishop

1 Department of Experimental Psychology, University of Oxford, Oxford, UK

I can see from the history of this paper that the author has already been very responsive to reviewer comments, and that the process of revising has now been quite protracted.

That makes me reluctant to suggest much more, but I do see potential here for making the paper more impactful. So my overall view is that, once a few typos are fixed (see below), this could be published as is, but I think there is an issue with the potential readership and that further revision could overcome this.

I suspect my take on this is rather different from other reviewers, as I do not regard myself as a statistics expert, though I am on the more quantitative end of the continuum of psychologists and I try to keep up to date. I think I am quite close to the target readership , insofar as I am someone who was taught about statistics ages ago and uses stats a lot, but never got adequate training in the kinds of topic covered by this paper. The fact that I am aware of controversies around the interpretation of confidence intervals etc is simply because I follow some discussions of this on social media. I am therefore very interested to have a clear account of these issues.

This paper contains helpful information for someone in this position, but it is not always clear, and I felt the relevance of some of the content was uncertain. So here are some recommendations:

  • As one previous reviewer noted, it’s questionable that there is a need for a tutorial introduction, and the limited length of this article does not lend itself to a full explanation. So it might be better to just focus on explaining as clearly as possible the problems people have had in interpreting key concepts. I think a title that made it clear this was the content would be more appealing than the current one.
  • P 3, col 1, para 3, last sentence. Although statisticians always emphasise the arbitrary nature of p < .05, we all know that in practice authors who use other values are likely to have their analyses queried. I wondered whether it would be useful here to note that in some disciplines different cutoffs are traditional, e.g. particle physics. Or you could cite David Colquhoun’s paper in which he recommends using p < .001 ( http://rsos.royalsocietypublishing.org/content/1/3/140216) - just to be clear that the traditional p < .05 has been challenged.

What I can’t work out is how you would explain the alpha from Neyman-Pearson in the same way (though I can see from Figure 1 that with N-P you could test an alternative hypothesis, such as the idea that the coin would be heads 75% of the time).

I suggest rewording ‘By failing to reject, we simply continue to assume that H0 is true, which implies that one cannot….’ as ‘In failing to reject, we do not assume that H0 is true; one cannot argue against a theory from a non-significant result.’

I felt most readers would be interested to read about tests of equivalence and Bayesian approaches, but many would be unfamiliar with these and might like to see an example of how they work in practice – if space permitted.

  • Confidence intervals: I simply could not understand the first sentence – I wondered what was meant by ‘builds’ here. I understand about difficulties in comparing CI across studies when sample sizes differ, but I did not find the last sentence on p 4 easy to understand.
  • P 5: The sentence starting: ‘The alpha value has the same interpretation’ was also hard to understand, especially the term ‘1-alpha CI’. Here too I felt some concrete illustration might be helpful to the reader. And again, I also found the reference to Bayesian intervals tantalising – I think many readers won’t know how to compute these and something like a figure comparing a traditional CI with a Bayesian interval and giving a source for those who want to read on would be very helpful. The reference to ‘credible intervals’ in the penultimate paragraph is very unclear and needs a supporting reference – most readers will not be familiar with this concept.

P 3, col 1, para 2, line 2; “allows us to compute”

P 3, col 2, para 2, ‘probability of replicating’

P 3, col 2, para 2, line 4 ‘informative about’

P 3, col 2, para 4, line 2 delete ‘of’

P 3, col 2, para 5, line 9 – ‘conditioned’ is either wrong or too technical here: would ‘based’ be acceptable as alternative wording

P 3, col 2, para 5, line 13 ‘This dichotomisation allows one to distinguish’

P 3, col 2, para 5, last sentence, delete ‘Alternatively’.

P 3, col 2, last para line 2 ‘first’

P 4, col 2, para 2, last sentence is hard to understand; not sure if this is better: ‘If sample sizes differ between studies, the distribution of CIs cannot be specified a priori’

P 5, col 1, para 2, ‘a pattern of order’ – I did not understand what was meant by this

P 5, col 1, para 2, last sentence unclear: possible rewording: “If the goal is to test the size of an effect then NHST is not the method of choice, since testing can only reject the null hypothesis.’ (??)

P 5, col 1, para 3, line 1 delete ‘that’

P 5, col 1, para 3, line 3 ‘on’ -> ‘by’

P 5, col 2, para 1, line 4 , rather than ‘Here I propose to adopt’ I suggest ‘I recommend adopting’

P 5, col 2, para 1, line 13 ‘with’ -> ‘by’

P 5, col 2, para 1 – recommend deleting last sentence

P 5, col 2, para 2, line 2 ‘consider’ -> ‘anticipate’

P 5, col 2, para 2, delete ‘should always be included’

P 5, col 2, para 2, ‘type one’ -> ‘Type I’

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

The University of Edinburgh, UK

I wondered about changing the focus slightly and modifying the title to reflect this to say something like: Null hypothesis significance testing: a guide to commonly misunderstood concepts and recommendations for good practice

Thank you for the suggestion – you indeed saw the intention behind the ‘tutorial’ style of the paper.

  • P 3, col 1, para 3, last sentence. Although statisticians always emphasise the arbitrary nature of p < .05, we all know that in practice authors who use other values are likely to have their analyses queried. I wondered whether it would be useful here to note that in some disciplines different cutoffs are traditional, e.g. particle physics. Or you could cite David Colquhoun’s paper in which he recommends using p < .001 ( http://rsos.royalsocietypublishing.org/content/1/3/140216)  - just to be clear that the traditional p < .05 has been challenged.

I have added a sentence on this citing Colquhoun 2014 and the new Benjamin 2017 on using .005.

I agree that this point is always hard to appreciate, especially because it seems like in practice it makes little difference. I added a paragraph but using reaction times rather than a coin toss – thanks for the suggestion.

Added an example based on new table 1, following figure 1 – giving CI, equivalence tests and Bayes Factor (with refs to easy to use tools)

Changed ‘builds’ to ‘constructs’ (this simply means they are something we build), and added that the implication of probability coverage not being guaranteed when sample sizes change is that we cannot compare CIs.

I changed ‘i.e. we accept that 1-alpha CI are wrong in alpha percent of the times in the long run’ to ‘e.g. a 95% CI is wrong in 5% of the times in the long run (i.e. if we repeat the experiment many times)’ – for Bayesian intervals I simply re-cited Morey & Rouder, 2011.

It is not that the CI cannot be specified, it’s that the interval is no longer predictive of anything! I changed it to ‘If sample sizes, however, differ between studies, there is no guarantee that a CI from one study will be true at the rate alpha in a different study, which implies that CIs cannot be compared across studies, as sample sizes are rarely the same’.

I added ‘(i.e. establish that A > B)’ – we test that conditions are ordered, but without further specification of the probability or the size of that effect.

Yes it works – thx

P 5, col 2, para 2, ‘type one’ -> ‘Type I’ 

Typos fixed, and suggestions accepted – thanks for that.

Stephen J. Senn

1 Luxembourg Institute of Health, Strassen, L-1445, Luxembourg

The revisions are OK for me, and I have changed my status to Approved.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Referee response for version 2

On the whole I think that this article is reasonable, my main reservation being that I have my doubts on whether the literature needs yet another tutorial on this subject.

A further reservation I have is that the author, following others, stresses what in my mind is a relatively unimportant distinction between the Fisherian and Neyman-Pearson (NP) approaches. The distinction stressed by many is that the NP approach leads to a dichotomy accept/reject based on probabilities established in advance, whereas the Fisherian approach uses tail area probabilities calculated from the observed statistic. I see this as being unimportant and not even true. Unless one considers that the person carrying out a hypothesis test (original tester) is mandated to come to a conclusion on behalf of all scientific posterity, then one must accept that any remote scientist can come to his or her conclusion depending on the personal type I error favoured. To operate the results of an NP test carried out by the original tester, the remote scientist then needs to know the p-value. The type I error rate is then compared to this to come to a personal accept or reject decision (1). In fact Lehmann (2), who was an important developer of and proponent of the NP system, describes exactly this approach as being good practice. (See Testing Statistical Hypotheses, 2nd edition P70). Thus using tail-area probabilities calculated from the observed statistics does not constitute an operational difference between the two systems.

A more important distinction between the Fisherian and NP systems is that the former does not use alternative hypotheses(3). Fisher's opinion was that the null hypothesis was more primitive than the test statistic but that the test statistic was more primitive than the alternative hypothesis. Thus, alternative hypotheses could not be used to justify choice of test statistic. Only experience could do that.

Further distinctions between the NP and Fisherian approach are to do with conditioning and whether a null hypothesis can ever be accepted.

I have one minor quibble about terminology. As far as I can see, the author uses the usual term 'null hypothesis' and the eccentric term 'nil hypothesis' interchangeably. It would be simpler if the latter were abandoned.

Referee response for version 1

Marcel A.L.M. van Assen

1 Department of Methodology and Statistics, Tilburg University, Tilburg, Netherlands

Null hypothesis significance testing (NHST) is a difficult topic, with misunderstandings arising easily. Many texts, including basic statistics books, deal with the topic and attempt to explain it to students and anyone else interested. I would refer to a good basic textbook for a detailed explanation of NHST, or to a specialized article for an explanation of the background of NHST. So, what is the added value of a new text on NHST? In any case, the added value should be described at the start of this text. Moreover, the topic is so delicate and difficult that errors, misinterpretations, and disagreements arise easily. I attempted to show this by giving comments on many sentences in the text.

Abstract: “null hypothesis significance testing is the statistical method of choice in biological, biomedical and social sciences to investigate if an effect is likely”. No, NHST is the method to test the hypothesis of no effect.

Intro: “Null hypothesis significance testing (NHST) is a method of statistical inference by which an observation is tested against a hypothesis of no effect or no relationship.” What is an ‘observation’? NHST is difficult to describe in one sentence, particularly here. I would skip this sentence entirely, here.

Section on Fisher; also explain the one-tailed test.

Section on Fisher; p(Obs|H0) does not reflect the verbal definition (the ‘or more extreme’ part).

Section on Fisher; use a reference and citation to Fisher’s interpretation of the p-value

Section on Fisher; “This was however only intended to be used as an indication that there is something in the data that deserves further investigation. The reason for this is that only H0 is tested whilst the effect under study is not itself being investigated.” First sentence, can you give a reference? Many people say a lot about Fisher’s intentions, but the good man is dead and cannot reply… Second sentence is a bit awkward, because the effect is investigated in a way, by testing the H0.

Section on p-value; Layout and structure can be improved greatly, by first again stating what the p-value is, and then statement by statement, what it is not, using separate lines for each statement. Consider adding that the p-value is uniformly distributed under H0 (if all the assumptions of the test are met), and that under H1 the p-value is a function of population effect size and N; the larger each is, the smaller the p-value generally is.

Skip the sentence “If there is no effect, we should replicate the absence of effect with a probability equal to 1-p”. Not insightful, and you did not discuss the concept ‘replicate’ (and do not need to).

Skip the sentence “The total probability of false positives can also be obtained by aggregating results ( Ioannidis, 2005 ).” Not strongly related to p-values, and introduces unnecessary concepts ‘false positives’ (perhaps later useful) and ‘aggregation’.

Consider deleting; “If there is an effect however, the probability to replicate is a function of the (unknown) population effect size with no good way to know this from a single experiment ( Killeen, 2005 ).”

The following sentence; “ Finally, a (small) p-value  is not an indication favouring a hypothesis . A low p-value indicates a misfit of the null hypothesis to the data and cannot be taken as evidence in favour of a specific alternative hypothesis more than any other possible alternatives such as measurement error and selection bias ( Gelman, 2013 ).” is surely not mainstream thinking about NHST; I would surely delete that sentence. In NHST, a p-value is used for testing the H0. Why did you not yet discuss significance level? Yes, before discussing what is not a p-value, I would explain NHST (i.e., what it is and how it is used). 

Also the next sentence “The more (a priori) implausible the alternative hypothesis, the greater the chance that a finding is a false alarm ( Krzywinski & Altman, 2013 ;  Nuzzo, 2014 ).“ is not fully clear to me. This is a Bayesian statement. In NHST, no likelihoods are attributed to hypotheses; the reasoning is “IF H0 is true, then…”.

Last sentence: “As  Nickerson (2000)  puts it ‘theory corroboration requires the testing of multiple predictions because the chance of getting statistically significant results for the wrong reasons in any given case is high’.” What is relation of this sentence to the contents of this section, precisely?

Next section: “For instance, we can estimate that the probability of a given F value to be in the critical interval [+2 +∞] is less than 5%” This depends on the degrees of freedom.

“When there is no effect (H0 is true), the erroneous rejection of H0 is known as type I error and is equal to the p-value.” Strange sentence. The Type I error is the probability of erroneously rejecting the H0 (so, when it is true). The p-value is … well, you explained it before; it surely does not equal the Type I error.

Consider adding a figure explaining the distinction between Fisher’s logic and that of Neyman and Pearson.

“When the test statistics falls outside the critical region(s)” What is outside?

“There is a profound difference between accepting the null hypothesis and simply failing to reject it ( Killeen, 2005 )” I agree with you, but perhaps you may add that some statisticians simply define “accept H0” as obtaining a p-value larger than the significance level. Did you already discuss the significance level, and its most commonly used values?

“To accept or reject equally the null hypothesis, Bayesian approaches ( Dienes, 2014 ;  Kruschke, 2011 ) or confidence intervals must be used.” Is ‘reject equally’ appropriate English? Also using CIs, one cannot accept the H0.

Do you start discussing alpha only in the context of CIs?

“CI also indicates the precision of the estimate of effect size, but unless using a percentile bootstrap approach, they require assumptions about distributions which can lead to serious biases in particular regarding the symmetry and width of the intervals ( Wilcox, 2012 ).” Too difficult, using new concepts. Consider deleting.

“Assuming the CI (a)symmetry and width are correct, this gives some indication about the likelihood that a similar value can be observed in future studies, with 95% CI giving about 83% chance of replication success ( Lakens & Evers, 2014 ).” This statement is, in general, completely false. It very much depends on the sample sizes of both studies. If the replication study has a much, much, much larger N, then the probability that the original CI will contain the effect size of the replication approaches (1-alpha)*100%. If the original study has a much, much, much larger N, then the probability that the original CI will contain the effect size of the replication study approaches 0%.

“Finally, contrary to p-values, CI can be used to accept H0. Typically, if a CI includes 0, we cannot reject H0. If a critical null region is specified rather than a single point estimate, for instance [-2 +2] and the CI is included within the critical null region, then H0 can be accepted. Importantly, the critical region must be specified a priori and cannot be determined from the data themselves.” No. H0 cannot be accepted with Cis.

“The (posterior) probability of an effect can however not be obtained using a frequentist framework.” Frequentist framework? You did not discuss that, yet.

“X% of times the CI obtained will contain the same parameter value”. The same? True, you mean?

“e.g. X% of the times the CI contains the same mean” I do not understand; which mean?

“The alpha value has the same interpretation as when using H0, i.e. we accept that 1-alpha CI are wrong in alpha percent of the times. “ What do you mean, CI are wrong? Consider rephrasing.

“To make a statement about the probability of a parameter of interest, likelihood intervals (maximum likelihood) and credibility intervals (Bayes) are better suited.” ML gives the likelihood of the data given the parameter, not the other way around.

“Many of the disagreements are not on the method itself but on its use.” Bayesians may disagree.

“If the goal is to establish the likelihood of an effect and/or establish a pattern of order, because both requires ruling out equivalence, then NHST is a good tool ( Frick, 1996 )” NHST does not provide evidence on the likelihood of an effect.

“If the goal is to establish some quantitative values, then NHST is not the method of choice.” P-values are also quantitative… this is not a precise sentence. And NHST may be used in combination with effect size estimation (this is even recommended by, e.g., the American Psychological Association (APA)).

“Because results are conditioned on H0, NHST cannot be used to establish beliefs.” It can reinforce some beliefs, e.g., if H0 or any other hypothesis, is true.

“To estimate the probability of a hypothesis, a Bayesian analysis is a better alternative.” It is the only alternative?

“Note however that even when a specific quantitative prediction from a hypothesis is shown to be true (typically testing H1 using Bayes), it does not prove the hypothesis itself, it only adds to its plausibility.” How can we show something is true?

I do not agree with the contents of the last section on ‘minimal reporting’. I prefer ‘optimal reporting’ instead, i.e., reporting the information that is essential to the interpretation of the result to any reader, who may have other goals than the writer of the article. This reporting includes, for sure, an estimate of effect size, and preferably a confidence interval, which is in line with recommendations of the APA.

I have read this submission. I believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

The idea of this short review was to point to common interpretation errors (stressing again and again that we are under H0) in using p-values or CIs, and also to propose reporting practices to avoid bias. This is now stated at the end of the abstract.

Regarding textbooks, it is clear that many fail to clearly distinguish Fisher/Pearson/NHST, see Gliner et al. (2002) J. Exp. Education 71, 83-92. If you have 1 or 2 in mind that you know to be good, I’m happy to include them.

I agree – yet people use it to investigate (not test) if an effect is likely. The issue here is wording. What about adding this distinction at the end of the sentence?: ‘null hypothesis significance testing is the statistical method of choice in biological, biomedical and social sciences used to investigate if an effect is likely, even though it actually tests for the hypothesis of no effect’.

I think a definition is needed, as it offers a starting point. What about the following: ‘NHST is a method of statistical inference by which an experimental factor is tested against a hypothesis of no effect or no relationship based on a given observation’

The section on Fisher has been modified (more or less) as suggested: (1) avoiding talking about one or two tailed tests (2) updating for p(Obs≥t|H0) and (3) referring to Fisher more explicitly (ie pages from articles and book) ; I cannot tell his intentions but these quotes leave little space to alternative interpretations.

The reasoning here is as you state yourself, part 1: ‘a p-value is used for testing the H0’; and part 2: ‘no likelihoods are attributed to hypotheses’; it follows that we cannot favour a hypothesis. It might seem contentious, but the fact is that all we can do is reject the null – how could we favour a specific alternative hypothesis from there? This is explored further down the manuscript (and I now point to that) – note that we do not need to be Bayesian to favour a specific H1; all I’m saying is this cannot be attained with a p-value.

The point was to emphasise that a p-value is not there to tell us a given H1 is true – that can only be achieved through multiple predictions and experiments. I deleted it for clarity.

This sentence has been removed

Indeed, you are right and I have modified the text accordingly. When there is no effect (H0 is true), the erroneous rejection of H0 is known as a Type I error. Importantly, the Type I error rate, or alpha value, is determined a priori. It is a common mistake, but the level of significance (for a given sample) is not the same as the frequency of acceptance alpha found on repeated sampling (Fisher, 1955).
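The distinction drawn here can be illustrated with a small simulation (my own sketch, not part of the paper; sample size, sigma, and trial counts are arbitrary choices): alpha is fixed before seeing the data, the p-value varies from sample to sample, yet the long-run rejection rate under H0 settles near alpha.

```python
import math
import random

def p_value_mean_zero(data):
    """Two-sided z-test p-value for H0: population mean = 0.
    Uses a normal approximation (erf-based CDF) for simplicity."""
    n = len(data)
    m = sum(data) / n
    var = sum((x - m) ** 2 for x in data) / (n - 1)  # sample variance
    z = m / math.sqrt(var / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = random.Random(0)
alpha = 0.05          # fixed a priori: the Type I error rate
trials = 4000
rejections = 0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(40)]  # H0 is true here
    if p_value_mean_zero(sample) <= alpha:             # p varies per sample...
        rejections += 1
print(round(rejections / trials, 3))  # ...but the long-run rate is near alpha
```

(The rate comes out slightly above 0.05 here because the z approximation is used in place of the exact t-test.)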

A figure is now presented – with levels of acceptance, critical region, level of significance and p-value.

I should have clarified further here – I had in mind tests of equivalence. To clarify, I simply state now: ‘To accept the null hypothesis, tests of equivalence or Bayesian approaches must be used.’

It is now presented in the paragraph before.

Yes, you are right, I completely overlooked this problem. The corrected sentence (with a more accurate ref) is now: “Assuming the CI (a)symmetry and width are correct, this gives some indication about the likelihood that a similar value can be observed in future studies. For future studies of the same sample size, a 95% CI gives about an 83% chance of replication success (Cumming & Maillardet, 2006). If sample sizes differ between studies, however, CIs do not guarantee any a priori coverage”.
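The equal-sample-size figure of roughly 83% can be checked with a quick Monte Carlo sketch (my own illustration, with arbitrary n, sigma, and a known-sigma 95% CI for simplicity):

```python
import math
import random

def replication_capture_rate(n=30, sigma=1.0, trials=20000, seed=1):
    """Estimate how often a 95% CI from one study captures the
    sample mean of an exact replication with the same sample size."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # 95% CI half-width (known sigma)
    captures = 0
    for _ in range(trials):
        # Sample means of two independent studies of the same size
        m1 = rng.gauss(0.0, sigma / math.sqrt(n))
        m2 = rng.gauss(0.0, sigma / math.sqrt(n))
        if abs(m1 - m2) <= half_width:
            captures += 1
    return captures / trials

rate = replication_capture_rate()
print(round(rate, 3))  # close to 0.83 when sample sizes match
```

Changing the replication's n in this sketch shows why the guarantee disappears when sample sizes differ.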

Again, I had in mind equivalence testing, but in both cases you are right we can only reject and I therefore removed that sentence.

Yes, p-values must be interpreted in context with effect size, but this is not what people do. The point here is to be pragmatic: dos and don’ts. The sentence was changed.

Not for testing, but for probability, I am not aware of anything else.

Cumulative evidence is, in my opinion, the only way to show it. Even a hard science like physics relies on multiple experiments. In the recent CERN study on finding the Higgs boson, 2 different and complementary experiments ran in parallel – and the cumulative evidence was taken as proof of the true existence of the Higgs boson.

Daniel Lakens

1 School of Innovation Sciences, Eindhoven University of Technology, Eindhoven, Netherlands

I appreciate the author's attempt to write a short tutorial on NHST. Many people don't know how to use it, so attempts to educate people are always worthwhile. However, I don't think the current article reaches its aim. For one, I think it might be practically impossible to explain a lot in such an ultra-short paper - every section would require more than 2 pages to explain, and there are many sections. Furthermore, there are some excellent overviews which, although more extensive, are also much clearer (e.g., Nickerson, 2000 ). Finally, I found many statements to be unclear, and perhaps even incorrect (noted below). Because there is nothing worse than creating more confusion on such a topic, I have extremely high standards before I think such a short primer should be indexed. I note some examples of unclear or incorrect statements below. I'm sorry I can't make a more positive recommendation.

“investigate if an effect is likely” – ambiguous statement. I think you mean, whether the observed DATA is probable, assuming there is no effect?

The Fisher (1959) reference is not correct – Fisher developed his method much earlier.

“This p-value thus reflects the conditional probability of achieving the observed outcome or larger, p(Obs|H0)” – please add 'assuming the null-hypothesis is true'.

“p(Obs|H0)” – explain this notation for novices.

“Following Fisher, the smaller the p-value, the greater the likelihood that the null hypothesis is false.”  This is wrong, and any statement about this needs to be much more precise. I would suggest direct quotes.

“there is something in the data that deserves further investigation” –unclear sentence.

“The reason for this” – unclear what ‘this’ refers to.

“ not the probability of the null hypothesis of being true, p(H0)” – second of can be removed?

“Any interpretation of the p-value in relation to the effect under study (strength, reliability, probability) is indeed wrong, since the p-value is conditioned on H0” - incorrect. A big problem is that it depends on the sample size, and that the probability of a theory depends on the prior.

“If there is no effect, we should replicate the absence of effect with a probability equal to 1-p.” I don’t understand this, but I think it is incorrect.

“The total probability of false positives can also be obtained by aggregating results (Ioannidis, 2005).” Unclear, and probably incorrect.

“By failing to reject, we simply continue to assume that H0 is true, which implies that one cannot, from a nonsignificant result, argue against a theory” – according to which theory? From a NP perspective, you can ACT as if the theory is false.

“(Lakens & Evers, 2014”) – we are not the original source, which should be cited instead.

“ Typically, if a CI includes 0, we cannot reject H0.”  - when would this not be the case? This assumes a CI of 1-alpha.

“If a critical null region is specified rather than a single point estimate, for instance [-2 +2] and the CI is included within the critical null region, then H0 can be accepted.” – you mean practically, or formally? I’m pretty sure only the former.

The section on ‘The (correct) use of NHST’ seems to conclude only Bayesian statistics should be used. I don’t really agree.

“ we can always argue that effect size, power, etc. must be reported.” – which power? Post-hoc power? Surely not? Other types are unknown. So what do you mean?

The recommendation on what to report remains vague, and it is unclear why what should be reported.

This sentence was changed, following as well the other reviewer, to ‘null hypothesis significance testing is the statistical method of choice in biological, biomedical and social sciences to investigate if an effect is likely, even though it actually tests whether the observed data are probable, assuming there is no effect’

Changed, refers to Fisher 1925

I changed the sentence structure a little, which should make explicit that this is the conditional probability.

This has been changed to ‘[…] to decide whether the evidence is worth additional investigation and/or replication (Fisher, 1971 p13)’

My mistake – the sentence structure is now ‘not the probability of the null hypothesis p(H0), of being true’; hope this makes more sense (and this way refers back to p(Obs>t|H0)).

Fair enough – my point was to stress the fact that the p-value and effect size or H1 have very little in common, but yes, the part they have in common has to do with sample size. I left the conditioning on H0 but also point out the dependency on sample size.

The whole paragraph was changed to reflect a more philosophical take on scientific induction/reasoning. I hope this is clearer.

Changed to refer to equivalence testing

I rewrote this so as to show that frequentist analysis can be used – I’m not trying to sell Bayes more than any other approach.

I’m arguing we should report it all; that’s why there is no exhaustive list – I can add one if needed.

Null Hypothesis: What Is It, and How Is It Used in Investing?

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.


A null hypothesis is a type of statistical hypothesis that proposes that no statistical significance exists in a set of given observations. Hypothesis testing is used to assess the credibility of a hypothesis by using sample data. Sometimes referred to simply as the “null,” it is represented as H 0 .

The null hypothesis, also known as “the conjecture,” is used in quantitative analysis to test theories about markets, investing strategies, and economies to decide if an idea is true or false.

Key Takeaways

  • A null hypothesis is a type of conjecture in statistics that proposes that there is no difference between certain characteristics of a population or data-generating process.
  • The alternative hypothesis proposes that there is a difference.
  • Hypothesis testing provides a method to reject a null hypothesis within a certain confidence level.
  • If you can reject the null hypothesis, it provides support for the alternative hypothesis.
  • Null hypothesis testing is the basis of the principle of falsification in science.


Understanding a Null Hypothesis

A gambler may be interested in whether a game of chance is fair. If it is, then the expected earnings per play come to zero for both players. If it is not, then the expected earnings are positive for one player and negative for the other.

To test whether the game is fair, the gambler collects earnings data from many repetitions of the game, calculates the average earnings from these data, then tests the null hypothesis that the expected earnings are not different from zero.

If the average earnings from the sample data are sufficiently far from zero, then the gambler will reject the null hypothesis and conclude the alternative hypothesis—namely, that the expected earnings per play are different from zero. If the average earnings from the sample data are near zero, then the gambler will not reject the null hypothesis, concluding instead that the difference between the average from the data and zero is explainable by chance alone.
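The gambler's test can be sketched as a simple two-sided z-test (an illustrative example, not from the article; the simulated earnings, seed, and standard deviation are made up):

```python
import math
import random

def z_test_mean_zero(data):
    """Two-sided z-test of H0: expected earnings per play = 0."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    z = mean / math.sqrt(var / n)                       # standardized distance from 0
    # Two-sided p-value from the normal CDF (via math.erf)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Simulated earnings from a fair game (true mean 0), hypothetical values
rng = random.Random(42)
earnings = [rng.gauss(0.0, 5.0) for _ in range(200)]
z, p = z_test_mean_zero(earnings)
print(f"z = {z:.2f}, p = {p:.3f}")  # a large p means: do not reject H0
```

If the earnings were instead consistently far from zero, the same test would yield a tiny p-value and the null would be rejected.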

A null hypothesis can only be rejected, not proven.

The null hypothesis assumes that any kind of difference between the chosen characteristics that you see in a set of data is due to chance. For example, if the expected earnings for the gambling game are truly equal to zero, then any difference between the average earnings in the data and zero is due to chance.

Analysts look to reject   the null hypothesis because doing so is a strong conclusion. This requires evidence in the form of an observed difference that is too large to be explained solely by chance. Failing to reject the null hypothesis—that the results are explainable by chance alone—is a weak conclusion because it allows that while factors other than chance may be at work, they may not be strong enough for the statistical test to detect them.

An important point to note is that we are testing the null hypothesis because there is an element of doubt about its validity. Whatever information that is against the stated null hypothesis is captured in the alternative (alternate) hypothesis (H 1 ).

For the examples below, the alternative hypothesis would be:

  • Students score an average that is not equal to seven.
  • The mean annual return of a mutual fund is not equal to 8% per year.

In other words, the alternative hypothesis is a direct contradiction of the null hypothesis.

Null Hypothesis Examples

Here is a simple example: A school principal claims that students in her school score an average of seven out of 10 in exams. The null hypothesis is that the population mean is 7.0. To test this null hypothesis, we record marks of, say, 30 students ( sample ) from the entire student population of the school (say, 300) and calculate the mean of that sample.

We can then compare the (calculated) sample mean to the (hypothesized) population mean of 7.0 and attempt to reject the null hypothesis. (The null hypothesis here—that the population mean is 7.0—cannot be proved using the sample data. It can only be rejected.)

Take another example: The annual return of a particular  mutual fund  is claimed to be 8%. Assume that the mutual fund has been in existence for 20 years. The null hypothesis is that the mean return is 8% for the mutual fund. We take a random sample of annual returns of the mutual fund for, say, five years (sample) and calculate the sample mean. We then compare the (calculated) sample mean to the (claimed) population mean (8%) to test the null hypothesis.

For the above examples, null hypotheses are:

  • Example A: Students in the school score an average of seven out of 10 in exams.
  • Example B: The mean annual return of the mutual fund is 8% per year.

For the purposes of determining whether to reject the null hypothesis (abbreviated H0), said hypothesis is assumed, for the sake of argument, to be true. Then the likely range of possible values of the calculated statistic (e.g., the average score on 30 students’ tests) is determined under this presumption (e.g., the range of plausible averages might range from 6.2 to 7.8 if the population mean is 7.0).

If the sample average is outside of this range, the null hypothesis is rejected. Otherwise, the difference is said to be “explainable by chance alone,” being within the range that is determined by chance alone.
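This decision rule can be written down directly. The plausible range [6.2, 7.8] is taken from the example above; the sample means passed in below are hypothetical:

```python
def decide(sample_mean, lo=6.2, hi=7.8):
    """Rejection-region decision for H0: population mean = 7.0.
    [lo, hi] is the range of sample averages deemed explainable by chance."""
    if lo <= sample_mean <= hi:
        return "fail to reject H0 (difference explainable by chance)"
    return "reject H0"

print(decide(7.3))  # inside the range -> fail to reject
print(decide(5.9))  # outside the range -> reject
```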

How Null Hypothesis Testing Is Used in Investments

As an example related to financial markets, assume Alice sees that her investment strategy produces higher average returns than simply buying and holding a stock . The null hypothesis states that there is no difference between the two average returns, and Alice is inclined to believe this until she can conclude contradictory results.

Refuting the null hypothesis would require showing statistical significance, which can be found by a variety of tests. The alternative hypothesis would state that the investment strategy has a higher average return than a traditional buy-and-hold strategy.

One tool that can determine the statistical significance of the results is the p-value. A p-value represents the probability that a difference as large or larger than the observed difference between the two average returns could occur solely by chance.

A p-value that is less than or equal to 0.05 is often taken as evidence against the null hypothesis. If Alice conducts one of these tests, such as a test using the normal model, resulting in a significant difference between her returns and the buy-and-hold returns (the p-value is less than or equal to 0.05), she can then reject the null hypothesis and conclude the alternative hypothesis.
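The definition above — the probability that a difference as large or larger arises by chance alone — can be estimated directly with a permutation test. This is an illustrative sketch only: the simulated return series, seed, and threshold are made up, and a real analysis would use actual return data.

```python
import random

def permutation_p_value(a, b, trials=5000, seed=0):
    """Estimate P(difference in means as large or larger than observed,
    assuming chance alone) by shuffling the pooled observations."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # under H0, group labels are arbitrary
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical monthly returns for the strategy vs. buy-and-hold
rng = random.Random(7)
strategy = [rng.gauss(0.08, 0.10) for _ in range(60)]
buy_hold = [rng.gauss(0.05, 0.10) for _ in range(60)]
p = permutation_p_value(strategy, buy_hold)
print("reject H0" if p <= 0.05 else "fail to reject H0")
```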

How Is the Null Hypothesis Identified?

The analyst or researcher establishes a null hypothesis based on the research question or problem they are trying to answer. Depending on the question, the null may be identified differently. For example, if the question is simply whether an effect exists (e.g., does X influence Y?), the null hypothesis could be H0: X = 0. If the question is instead whether X is the same as Y, the null would be H0: X = Y. If the question is whether the effect of X on Y is positive, the null would be H0: X ≤ 0, with the alternative being that the effect is greater than zero. If the resulting analysis shows an effect that is statistically significantly different from zero, the null can be rejected.

How Is Null Hypothesis Used in Finance?

In finance , a null hypothesis is used in quantitative analysis. It tests the premise of an investing strategy, the markets, or an economy to determine if it is true or false.

For instance, an analyst may want to see if two stocks, ABC and XYZ, are closely correlated. The null hypothesis would state that the two stocks are not correlated, that is, that the correlation between ABC and XYZ is zero.
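Stated in terms of the correlation coefficient, the null is that the population correlation between the two stocks is zero. A minimal sketch with hypothetical price series (the critical value 2.306 is the two-tailed 5% cutoff for 8 degrees of freedom):

```python
import math

# Hypothetical daily closing prices for two stocks, ABC and XYZ
abc = [50.0, 50.5, 51.2, 50.8, 51.5, 52.0, 51.7, 52.4, 52.9, 53.1]
xyz = [20.0, 20.3, 20.9, 20.6, 21.0, 21.4, 21.2, 21.7, 22.0, 22.2]

n = len(abc)
mean_a, mean_x = sum(abc) / n, sum(xyz) / n
cov = sum((a - mean_a) * (x - mean_x) for a, x in zip(abc, xyz))
var_a = sum((a - mean_a) ** 2 for a in abc)
var_x = sum((x - mean_x) ** 2 for x in xyz)
r = cov / math.sqrt(var_a * var_x)  # Pearson correlation coefficient

# H0: the stocks are uncorrelated (population correlation = 0).
# Under H0, t = r*sqrt(n-2)/sqrt(1-r^2) follows a t distribution with
# n - 2 = 8 degrees of freedom; the two-tailed 5% critical value is ~2.306.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
decision = "reject H0" if abs(t_stat) > 2.306 else "fail to reject H0"
print(f"r = {r:.3f}, t = {t_stat:.2f} -> {decision}")
```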

How Are Statistical Hypotheses Tested?

Statistical hypotheses are tested by a four-step process. The first step is for the analyst to state the two hypotheses so that only one can be right. The second is to formulate an analysis plan, which outlines how the data will be evaluated. The third is to carry out the plan and analyze the sample data. The fourth and final step is to analyze the results and either reject the null hypothesis or conclude that the observed differences are explainable by chance alone.

What Is an Alternative Hypothesis?

An alternative hypothesis is a direct contradiction of a null hypothesis. This means that if one of the two hypotheses is true, the other is false.

A null hypothesis states there is no difference between groups or relationship between variables. It is a type of statistical hypothesis and proposes that no statistical significance exists in a set of given observations. “Null” means nothing.

The null hypothesis is used in quantitative analysis to test theories about economies, investing strategies, and markets to decide if an idea is true or false. Hypothesis testing assesses the credibility of a hypothesis by using sample data. It is represented as H 0 and is sometimes simply known as “the null.”



Statistics in Psychological Research


August 2023


Unlock the power of data with this 10-hour, comprehensive course in data analysis. This course is perfect for anyone looking to deepen their knowledge and apply statistical methods effectively in psychology or related fields.

The course begins with consideration of how researchers define and categorize variables, including the nature of various scales of measurement and how these classifications impact data analysis and interpretation. This is followed by a thorough introduction to the measures of central tendency, variability, and correlation that researchers use to describe their findings, providing an understanding of such topics as which descriptive statistics are appropriate for given research designs, the meaning of a correlation coefficient, and how graphs are used to visualize data.

The course then moves on to a conceptual treatment of foundational inferential statistics that researchers use to make predictions or inferences about a population based on a sample. The focus is on understanding the logic of these statistics, rather than on making calculations. Specifically, the course explores the logic behind null hypothesis significance testing, long a cornerstone of statistical analysis. Learn how to formulate and test hypotheses and understand the significance of p-values in determining the validity of your results. The course reviews how to select the appropriate inferential test based on your study criteria. Whether it’s t-tests, ANOVA, chi-square tests, or regression analysis, you’ll know which test to apply and when.

In keeping with growing concerns about some of the limitations of null hypothesis significance testing, such as its role in the so-called replication crisis, the course also delves into these concerns and possible ways to address them, including introductory consideration of statistical power and alternatives to hypothesis testing like estimation techniques and confidence intervals, meta-analysis, modeling, and Bayesian inference.

Learning objectives

  • Explain various ways to categorize variables.
  • Describe the logic of inferential statistics.
  • Explain the logic of null hypothesis significance testing.
  • Select the appropriate inferential test based on study criteria.
  • Compare and contrast the use of statistical significance, effect size, and confidence intervals.
  • Explain the importance of statistical power.
  • Describe how alternative procedures address the major objections to null hypothesis significance testing.
  • Explain various ways to describe data.
  • Describe how graphs are used to visualize data.
  • Explain the meaning of a correlation coefficient.

This program does not offer CE credit.

More in this series

Introduces the scientific research process and concepts such as the nature of variables for undergraduates, high school students, and professionals.

August 2023 On Demand Training

Introduces the importance of ethical practice in scientific research for undergraduates, high school students, and professionals.

Local means-based fuzzy k -nearest neighbor classifier with Minkowski distance and relevance-complementarity feature weighting

  • Original Paper
  • Open access
  • Published: 31 August 2024
  • Volume 9 , article number  73 , ( 2024 )


  • Mahinda Mailagaha Kumbure 1 &
  • Pasi Luukka 1  

This paper introduces an enhanced fuzzy k -nearest neighbor (FKNN) approach called the feature-weighted Minkowski distance and local means-based fuzzy k -nearest neighbor (FWM-LMFKNN). This method improves classification accuracy by incorporating feature weights, Minkowski distance, and class representative local mean vectors. The feature weighting process is developed based on feature relevance and complementarity. We improve the distance calculations between instances by utilizing feature information-based weighting and Minkowski distance, resulting in a more precise set of nearest neighbors. Furthermore, the FWM-LMFKNN classifier considers the local structure of class subsets by using local mean vectors instead of individual neighbors, which improves its classification performance. Empirical results using twenty different real-world data sets demonstrate that the proposed method achieves statistically significantly higher classification performance than traditional KNN, FKNN, and six other related state-of-the-art methods.


1 Introduction

Fuzzy k -nearest neighbor (FKNN) (Keller et al. 1985 ), a variation of the traditional k -nearest neighbor (KNN) (Cover and Hart 1967 ) classifier, is considered a more robust classifier than the traditional KNN (Derrac et al. 2015 ). In the traditional KNN classifier, each instance in the data set is assigned a single class label based on the majority class of its k nearest neighbors. By contrast, the FKNN classifier assigns a degree of membership to each instance in a specific class in the data set based on the distances of its k nearest neighbors (Keller et al. 1985 ). These membership degrees are used as weights in the classification process, which makes the FKNN more robust to noise at the class boundaries (Keller et al. 1985 ; Maillo et al. 2020 ). The FKNN classification is an active topic of recent research and is being used in various applications (for example, see González et al. 2021 ; Kumar and Thakur 2021 ; Maillo et al. 2020 ).

However, the classical FKNN method has several limitations, such as sensitivity to the choice of the membership function (Derrac et al. 2016 ), number of nearest neighbors ( k ) (Derrac et al. 2015 ), and difficulty in handling high-dimensional data (Maillo et al. 2020 ). To address these limitations, researchers, such as Kassani et al. ( 2017 ) and Biswas et al. ( 2018 ), have proposed several enhancements. Recently, Zeraatkar and Afsari ( 2021 ) introduced two novel extensions of the FKNN classifier by incorporating the concepts of interval-valued fuzzy (IVF) sets, intuitionistic fuzzy (IF) sets, and the resampling method known as SMOTE, with a focus on addressing class imbalance classification problems. The primary purpose of these new FKNN variants, SMOTE-IVF and SMOTE-IVIF, was to enhance the classification performance of instances in the minority class. Moreover, González et al. ( 2021 ) proposed a novel fuzzy KNN method, called MonFKNN, based on monotonic constraints to enhance classification performance by addressing the issue of class noise. Based on the concept of multiple pseudo-metrics of fuzzy parameterized fuzzy soft matrices ( fpfs -matrices), Memis et al. ( 2022 ) introduced the fuzzy parameterized fuzzy soft KNN (FPFS-kNN) classifier. This classifier takes into account the impact of model parameters on classification. A distinctive feature of this method is its use of five distance measures within the fpfs -matrices to generate multiple k nearest neighbors. Additionally, Bian et al. ( 2022 ) proposed a new FKNN approach, the fuzzy KNN method with adaptive nearest neighbors (A-FKNN), which focuses on determining a fixed k value for each test data instance. The core idea of A-FKNN is to find the optimal k value for each training instance during the training phase and to build a decision tree, called the A-FKNN tree. During the testing phase, A-FKNN identifies the optimal k for each testing instance by searching through the A-FKNN tree and then runs FKNN with the optimal k for each testing instance. Regarding specific application contexts, several enhancements of the FKNN algorithm have also been proposed, for example, a boosted particle swarm optimization with FKNN (bSRWPSO-FKNN) classifier for predicting atopic dermatitis disease (Li et al. 2023 ) and a Harris Hawks optimization and Sobol sequence and stochastic fractal search based FKNN (SSFSHHO-FKNN) model for diagnosis of Alzheimer’s disease (Zhang et al. 2023 ).

Furthermore, Kumbure et al. ( 2019 ) particularly focused on FKNN to address class imbalance problems. They proposed a new variant of FKNN that employs class-representative local means, which are locally representative of their respective classes, to enhance classification accuracy. This method was further improved in Kumbure et al. ( 2020 ) by introducing a Bonferroni mean-based local mean computation process that outperforms traditional FKNN and other competitive classifiers. In the present study, we focus on further improving the performance of the local means-based FKNN method.

Our research aims to design and develop a local means-based FKNN classifier that effectively addresses the noise and uncertainty in the data, thereby yielding improved performance. To achieve this goal, we employ a feature weighting process based on fuzzy entropy (De Luca and Termini 1971 ), similarity measures (Luukka 2011 ), and feature selection concepts, such as relevance and complementarity (Ma and Ju 2022 ; Vergara and Estevez 2014 ; Yu and Liu 2004 ), to assess feature importance. We also use the Minkowski distance to calculate distances between instances and employ class representative local mean vectors from class subsets, instead of individual nearest neighbors, to find class memberships.

In the context of feature selection, the theoretical concepts of relevance and complementarity have been widely employed as effective methods to identify optimal feature subsets (Ma and Ju 2022 ). Feature relevance identifies the features that carry valuable information regarding the target variable (Vergara and Estevez 2014 ). By contrast, feature complementarity highlights that combining two or more features, even those that may be individually insufficient, can collectively provide meaningful information about the class variable (Ma and Ju 2022 ; Vergara and Estevez 2014 ). Thus, we use these information-theoretic feature selection measures to generate feature weights that reflect the importance of each feature, estimated from its relevance and complementarity information. This enables us to obtain better feature weights, which can then be utilized, in conjunction with the distance calculation, to identify more suitable nearest neighbors for the new instance. Furthermore, entropy is used to measure the level of uncertainty in the values of a feature (Luukka 2011 ; Vergara and Estevez 2014 ). This can be useful in cases where the data are uncertain or noisy, as it can help identify features that are more informative or relevant for classification. Therefore, we specifically focus on fuzzy entropy and similarity-based relevance and complementarity weighting methods to improve the accuracy of the proposed classifier.

The Minkowski distance, a generalized distance measure, is characterized by a specific parameter (called the order parameter), making it more flexible and adaptable to various types of data distributions and feature spaces. Therefore, by using the Minkowski distance with an appropriate order parameter, the classifier can achieve better performance in the nearest neighbor search for such data sets, as it can handle different types of data distributions more effectively. Accordingly, the Minkowski distance is employed in combination with fuzzy entropy and class prototypes to develop and propose the feature-weighted Minkowski distance-based local mean fuzzy k -nearest neighbor (FWM-LMFKNN) method.

The flowchart in Fig. 1 outlines the steps of the proposed FWM-LMFKNN method. The process consists of two main phases. The first phase generates feature weights by combining the effects of feature relevance and complementarity. The second phase performs the classification, which includes the following steps: calculating the Minkowski distance between training instances ( \(X_i\) ) and the query sample ( y ), finding the k nearest neighbors for each class ( j ), calculating the class representative local mean ( \(V_j\) ), determining the distance between local means and the query sample, computing class memberships, and finally classifying the query sample.

figure 1

Flowchart of the proposed method

The main contributions of this paper can be summarized as follows:

Feature weights that are generated using a combined effect of feature relevance and complementarity are used to weight the distances between testing and training instances in the learning process of the FKNN algorithm.

Minkowski distance is adopted for distance calculation to identify the most reasonable nearest neighbors, thereby achieving better class representative local mean vectors (i.e., class prototypes).

The decision rule on classification is made by considering the membership values, which are calculated using weighted Minkowski distances between the new instance and class representative local mean vectors.

The effectiveness of the proposed FWM-LMFKNN classifier is examined using various real-life data sets in both low- and high-dimensional spaces, covering binary- and multi-class problems. In the empirical analysis, the proposed method is compared with classical KNN (Cover and Hart 1967 ) and FKNN (Keller et al. 1985 ) methods and six more competitive methods, including LM-FKNN (Kumbure et al. 2019 ), LM-PNN (Gou et al. 2014 ), MLM-KHNN (Pan et al. 2017 ), FPFS-kNN (Memis et al. 2022 ), generalized mean distance-based KNN (GMD-KNN) (Gou et al. 2019 ), and interval-valued k-nearest neighbor (IV-KNN) (Derrac et al. 2015 ).

2 Preliminaries

This section briefly discusses the related KNN methods, fuzzy entropy, similarity measure, (relevance- and complementarity-based) feature weighting strategy, and the Minkowski distance.

2.1 Related KNN methods

The KNN algorithm (Cover and Hart 1967 ) is simple and effective, and it is based on the idea that an unseen data instance can be classified by looking at its closest “neighbors” from the training data set. It starts by calculating the distance between the unknown instance and all the instances in the training set. The distance can be computed using various distance metrics, but the Euclidean distance is the most commonly used (Derrac et al. 2015 ). Once the distances are calculated, the k nearest neighbors are selected based on their nearness to the unknown data instance. Then, the class labels of the k nearest neighbors are counted, and the class with the majority of votes is assigned to the unknown instance.
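The steps just described reduce to a few lines; this is a generic sketch with toy data, not any particular library's implementation:

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training instance
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority class among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated classes
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (1.1, 0.9)))  # a point near class A -> "A"
```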

By contrast, the FKNN method (Keller et al. 1985 ) uses fuzzy set theory (Zadeh 1965 ) to assign class membership degrees to each data instance instead of crisp labels. The basic idea of the FKNN is to assign a membership degree to the unknown instance in each class based on the degree of similarity to its k nearest neighbors and their memberships to each class. The membership degree of the unknown instance ( y ) in each class ( j ) represented by its k nearest neighbors is calculated as follows:

\(u_j(y) = \sum _{i=1}^{k} u_{ij}\left( 1/\Vert y - x_i\Vert ^{2/(m-1)}\right) \bigg/ \sum _{i=1}^{k} \left( 1/\Vert y - x_i\Vert ^{2/(m-1)}\right)\)

where \(x_i\) is the i th neighboring instance in the training set, and \(u_{ij}\) is the membership degree of i th neighboring instance in the j th class. \(m>1\) is the fuzzy strength parameter. There are two approaches to calculating \(u_{ij}\) : the crisp membership method and fuzzy membership method—detailed information about these methods can be found in Keller et al. ( 1985 ).
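The membership rule described above can be sketched as follows; the function name, the small epsilon guard against zero distance, and the use of crisp neighbor memberships (each neighbor fully in its own class) are illustrative assumptions:

```python
def fknn_memberships(neighbors, query, m=2.0):
    """Compute class membership degrees for `query` from its k nearest
    neighbors, following the distance-weighted FKNN rule.

    `neighbors` is a list of (vector, {class: membership u_ij}) pairs.
    """
    exponent = 2.0 / (m - 1.0)
    classes = {c for _, u in neighbors for c in u}
    weights = []
    for x, _ in neighbors:
        d = sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5
        weights.append(1.0 / (d ** exponent + 1e-12))  # guard against d = 0
    total = sum(weights)
    return {c: sum(w * u.get(c, 0.0) for w, (_, u) in zip(weights, neighbors)) / total
            for c in classes}

# Crisp memberships: each neighbor belongs fully to its own class
neighbors = [((1.0, 1.0), {"A": 1.0}), ((1.2, 0.9), {"A": 1.0}),
             ((3.0, 3.0), {"B": 1.0})]
mem = fknn_memberships(neighbors, (1.1, 1.0))
print(mem)  # class A should dominate, since both A neighbors are much closer
```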

As previously noted, the KNN and FKNN methods are both affected by the value of parameter k (Yang and Sinaga 2021 ), and are particularly susceptible to the effects of outliers. To deal with these issues, particularly for the KNN classifier, the idea of class representative local mean (LM) vectors was first introduced by Mitani and Hamamoto ( 2006 ). The resulting LM-KNN classifier (Mitani and Hamamoto 2006 ) computes a local mean vector of nearest neighbors from each class. The unknown instance is then assigned to the class represented by the local mean vector that is closest to it. The effectiveness of this method has led to the development of several variants that aim to improve performance by addressing not only the impact of outliers but also the sensitivity to the neighborhood size k . The local mean-based pseudo k -nearest neighbor (LM-PNN) (Gou et al. 2014 ) and multi-local means-based k -harmonic nearest neighbor (MLM-KHNN) (Pan et al. 2017 ) classifiers are examples of variants that have demonstrated improved classification performance. Based on the concept of class prototypes, recent enhancements to the classical FKNN method have been proposed, such as the multi-local power means fuzzy k -nearest neighbor (MLPM-FKNN) (Kumbure et al. 2019 ) and Bonferroni mean-based fuzzy k -nearest neighbor (BM-FKNN) (Kumbure et al. 2020 ) methods. These methods have been successful in achieving appropriate local class prototypes by incorporating various mean operators, such as generalized and Bonferroni means.
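The local-mean idea behind LM-KNN can be sketched like this (a generic illustration with toy data, not the authors' code):

```python
import math

def lmknn_classify(train, query, k=2):
    """Assign `query` to the class whose local mean of the query's k nearest
    same-class neighbors is closest (the LM-KNN rule)."""
    best_class, best_dist = None, float("inf")
    classes = {label for _, label in train}
    for c in classes:
        # k nearest neighbors of the query *within* class c
        members = sorted((x for x, label in train if label == c),
                         key=lambda x: math.dist(x, query))[:k]
        # local mean vector (class prototype) of those neighbors
        local_mean = tuple(sum(col) / len(members) for col in zip(*members))
        d = math.dist(local_mean, query)
        if d < best_dist:
            best_class, best_dist = c, d
    return best_class

train = [((1.0, 1.0), "A"), ((1.3, 0.9), "A"), ((0.8, 1.2), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.8), "B"), ((3.9, 4.1), "B")]
print(lmknn_classify(train, (1.1, 1.0)))  # nearest local mean is class A's
```

Averaging the neighbors before measuring distance is what dampens the influence of a single outlying neighbor.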

In this paper, our objective is to enhance the performance of the local means-based FKNN method by incorporating the Minkowski distance measure, class prototypes, and a feature weighting scheme.

2.2 Minkowski distance

Minkowski distance is a generalization of the Euclidean distance and Manhattan distance. It is a measure of the distance between two points in a metric space, which is defined by a norm. The Minkowski distance between two instances \(x_i\) and \(x_r\) in d -feature space is defined as:

\(d_p(x_i, x_r) = \left( \sum _{l=1}^{d} |x_i^l - x_r^l|^p \right)^{1/p}\)

where the parameter p is called the order of the Minkowski distance. By using different values for p , we can define several different distances; for example, when \(p = 2\) , the Minkowski distance is equivalent to the Euclidean distance, which is the most commonly used distance measure for continuous features. Further, we can obtain the Manhattan distance by setting \(p = 1\) and harmonic distance with \(p = -1\) as examples. Due to this special property of Minkowski distance, it has been used in many applications, such as Bergamasco and Nunes ( 2019 ) and Gueorguieva et al. ( 2017 ). Additionally, the weighted Minkowski distances can be defined with feature weights \(w^{l}\) for \(l=1,\dots ,d\) according to:

\(d_p^{w}(x_i, x_r) = \left( \sum _{l=1}^{d} w^{l}\, |x_i^l - x_r^l|^p \right)^{1/p}\)

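The (weighted) Minkowski distance and its special cases for p = 1 and p = 2 can be checked with a short function; the signature is illustrative:

```python
def minkowski(x, y, p=2, weights=None):
    """(Weighted) Minkowski distance of order p between two vectors.

    With uniform weights, p = 2 gives the Euclidean distance and
    p = 1 the Manhattan distance.
    """
    w = weights if weights is not None else [1.0] * len(x)
    return sum(wi * abs(a - b) ** p for wi, a, b in zip(w, x, y)) ** (1.0 / p)

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x, y, p=2))  # Euclidean: 5.0
print(minkowski(x, y, p=1))  # Manhattan: 7.0
print(minkowski(x, y, p=2, weights=[1.0, 0.0]))  # second feature ignored: 3.0
```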
2.3 Fuzzy entropy

Entropy (H) is a concept used in information theory to measure the uncertainty or randomness of a system or feature (Vergara and Estevez 2014 ). Entropy, usually discussed in a probability space (De Luca and Termini 1971 ), measures the amount of information required to describe the outcome of a random feature. The higher the entropy, the more uncertain or unpredictable the feature is. Fuzzy entropy, first defined by De Luca and Termini ( 1971 ), is an expanded version of classical entropy in the fuzzy sets theory. It is a measure that quantifies the degree of fuzziness of a fuzzy set (Al-sharhan et al. 2001 ). It is defined based on the concept of Shannon entropy (Shannon 1948 ), which is a measure of randomness in a probability distribution. However, fuzzy entropy differs from Shannon entropy, as it deals with vagueness and ambiguity uncertainties rather than probabilistic concepts (Al-sharhan et al. 2001 ). According to De Luca and Termini ( 1971 ) and Luukka ( 2011 ), the fuzzy entropy ( h ) can be defined for a given fuzzy set A defined over a universe U as:

\(h(A) = -\sum _{i=1}^{n} \left[ \mu _A(x_i) \log \mu _A(x_i) + (1 - \mu _A(x_i)) \log (1 - \mu _A(x_i)) \right]\)

where, \(\mu _A(x_i)\) represents the membership degree of an element \(x_i\) in the set A . The fuzzy entropy has been used to find feature importance concerning the target variable in the feature selection process (Luukka 2011 ; Lohrmann et al. 2018 ). It has also been applied for classification problems, especially to improve the KNN performance; for example, see Morente-Molinera et al. ( 2017 ).
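A direct transcription of the De Luca and Termini measure, using natural logarithms and the convention that 0·log 0 = 0:

```python
import math

def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy of a fuzzy set, given its membership
    degrees mu_A(x_i) in [0, 1]."""
    h = 0.0
    for mu in memberships:
        for q in (mu, 1.0 - mu):
            if q > 0.0:          # convention: 0 * log(0) = 0
                h -= q * math.log(q)
    return h

# Fuzziness is maximal at mu = 0.5 and zero at mu in {0, 1}
print(fuzzy_entropy([0.5, 0.5]))  # highest for two elements
print(fuzzy_entropy([0.0, 1.0]))  # 0.0: a crisp set has no fuzziness
```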

2.4 Similarity measure

In feature selection, a similarity measure quantifies the degree of closeness or correlation between two features. It helps assess how similar or related two features are to each other. Łukasiewicz similarity (Łukasiewics 1970 ) is a specific similarity measure based on the Łukasiewicz t-norm, which is a triangular norm used in fuzzy set theory (Zadeh 1965 ). For our study, the Łukasiewicz similarity is utilized together with fuzzy entropy to find feature relevance, and it can be defined according to Luukka et al. ( 2001 ) as follows:

\(S\langle x, y\rangle = \left( \frac{1}{d} \sum _{l=1}^{d} \left( 1 - |x^l - y^l|^p \right) \right)^{1/p}\)

where \(x, y \in [0,1]\) Footnote 1 and \(x, y \in {\mathbb {R}}^d\) . This is chosen because it satisfies monotonicity, symmetricity, and transitivity properties (Luukka et al. 2001 ).
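Assuming the generalized Łukasiewicz form with order parameter p used here (an interpretive reconstruction of the stripped formula), the measure can be sketched as:

```python
def lukasiewicz_similarity(x, v, p=1.0):
    """Generalized Lukasiewicz similarity between two vectors in [0,1]^d.

    Each component contributes 1 - |x_l - v_l|^p; the result is 1 for
    identical vectors and 0 for maximally different ones.
    """
    d = len(x)
    return (sum(1.0 - abs(a - b) ** p for a, b in zip(x, v)) / d) ** (1.0 / p)

print(lukasiewicz_similarity([0.2, 0.8], [0.2, 0.8]))  # identical -> 1.0
print(lukasiewicz_similarity([0.0, 0.0], [1.0, 1.0]))  # maximally different -> 0.0
```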

2.5 Mutual information

Mutual information is a well-known method for measuring the amount of information one random feature provides compared to another feature (Vergara and Estevez 2014 ). This notion has been dominant and valuable in the context of feature selection, where mutual information is measured for each feature concerning the target variable. Features with higher mutual information are considered more relevant, as they contribute more information to predicting or classifying the target variable. In Meyer et al. ( 2008 ) and Vergara and Estevez ( 2014 ), for given X and Y two random features, mutual information (I) is defined as follows:

\(I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \backslash Y) = H(Y) - H(Y \backslash X)\)

where H ( X ) and H ( Y ) represent the entropy of features X and Y , respectively. H ( X ,  Y ) denotes the joint entropy of X and Y , while \(H(X \backslash Y)\) indicates the conditional entropy of X given Y . \(H(Y \backslash X)\) is the conditional entropy of Y given X . I ( X ;  Y ) measures the degree of correlation between features X and Y or the amount of information X covers about Y .
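The identity I(X;Y) = H(X) + H(Y) − H(X,Y) can be verified on a small discrete sample; the data below are hypothetical:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in nats) of a discrete distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c)

# Hypothetical joint sample of two discrete features X and Y
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (0, 0)]
hx = entropy(Counter(x for x, _ in pairs).values())
hy = entropy(Counter(y for _, y in pairs).values())
hxy = entropy(Counter(pairs).values())

# I(X;Y) = H(X) + H(Y) - H(X,Y): the information the two features share
mi = hx + hy - hxy
print(f"I(X;Y) = {mi:.4f} nats")
```

Mutual information is always non-negative and never exceeds the entropy of either feature alone.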

2.6 Feature relevance and complementarity

This subsection briefly discusses the concepts of relevance and complementarity in the context of feature selection. Before briefly describing these concepts, we first introduce some basic notations and terminologies in Table 1 .

The sets mentioned in Table 1 are related as follows: \(X=f_i \cup A \cup \lnot \{f_i,A\}\) , \(\emptyset =f_i \cap A \cap \lnot \{f_i,A\}\) .

2.6.1 Relevance

A feature is considered relevant if individually or together with other features it provides information about class variable C . There are many definitions of “relevance” in the literature, but roughly, they can be divided into probabilistic framework (Kohavi and John 1997 ) and mutual information framework (Meyer et al. 2008 ). The probabilistic framework defines three levels of relevance: strongly relevant, weakly relevant, and irrelevant features. Strongly relevant features give unique information about C and cannot be replaced by other features. Weakly relevant features also provide information about C , but other features can replace them without losing information about C . Irrelevant features do not give information on C , and they can be removed without losing any vital information. Similarly, a mutual information framework is defined into these three categories as given in Eq. ( 6 ).

2.6.2 Complementarity

The notion of complementarity (also called information synergy) signifies that two features working together can carry more information about the target variable than the sum of their individual contributions (Singha and Shenoy 2018 ). It is used to measure the degree of interaction between an individual feature \(f_i\) and feature subset A given C (Vergara and Estevez 2014 ). This can again be measured, for example, by using mutual information [ \(I(f_i;A|C)\) ]. One way to understand the complementarity effect is the following: when the information that A has about C is greater when it interacts with feature \(f_i\) than when it does not, complementarity information is present.

Feature selection is a critical step in preparing data for machine learning models. Most feature selection approaches are based on the notions of relevance, redundancy, and complementarity (Sun et al. 2017 ; Singha and Shenoy 2018 ). Redundancy occurs when multiple features provide the same or similar information about the class variable (Singha and Shenoy 2018 ). Redundant features are often highly correlated with each other and do not offer new or additional information to the model. Therefore, in a typical feature selection task, redundant features are identified and removed. However, to create the feature weighting criterion in the proposed method, we focus solely on the relevance and complementarity concepts and do not consider the redundancy measure. This approach avoids the calculation of pairwise correlations and interdependencies between features, reducing computational time; especially when the number of features is very large, redundancy checks can be computationally expensive. Moreover, complementarity is, in some sense, closely related to redundancy (Sun et al. 2017 ), and the complementarity approach is known to be more efficient than the redundancy approach (Singha and Shenoy 2018 ). For these reasons, relevance and complementarity are used in our study to obtain an efficient and effective feature weighting strategy. In general, relevant and complementary features maximize the mutual information with the class variable, ensuring that feature informativeness is captured reasonably well (Singha and Shenoy 2018 ).

3 The proposed FWM-LMFKNN classifier

This section introduces the feature-weighted Minkowski distance-based local mean fuzzy k-nearest neighbor (FWM-LMFKNN) method, which is based on the concepts of feature weighting, Minkowski distance, and class representative local mean vectors. The feature weighting employs fuzzy entropy and similarity measures, incorporating relevance and complementarity notions. We begin by defining the calculation of feature relevance and complementarity and, subsequently, the weighting strategy for the new classifier.

3.1 Feature weighting based on relevance and complementarity

The calculation of relevance is based on fuzzy entropy and similarity measures. Suppose a training data set \(\{X_i \in {\mathbb {R}}^d, \omega _i\}_{i=1}^n\) consisting of n instances in d -dimensional feature space [i.e., \(X_i = (x_i^1, x_i^2,\ldots , x_i^d)\) ] and t different classes [i.e., \(\omega \in (C= (c_1, c_2, \ldots , c_t)\) )]. Given these, the relevance of features to the class variable is calculated using the following steps:

Normalize feature data into unit interval, that is, \(X_i \rightarrow [0,1]^d\)

Obtain ideal vectors Footnote 2 \(v_j \in {\mathbb {R}}^d\) for each class j from the training set data using arithmetic mean.

\(v_j = \frac{1}{n_j} \sum _{\{i \,:\, \omega _i = c_j\}} X_i\)

where \(n_j\) denotes the number of instances belonging to class j , that is, the mean calculation is restricted to only those samples that belong to class j . \(X_i\) is an instance belonging to j th class.

Compute similarity measure S from each training instance \(X_i = (x_i^1, x_i^2,\ldots , x_i^d)\) to corresponding ideal vector \(v_j = (v_j^1, v_j^2,\ldots , v_j^d)\) as follows:

\(S\langle X_i, v_j\rangle = \left( \frac{1}{d} \sum _{l=1}^{d} \left( 1 - |x_i^l - v_j^l|^p \right) \right)^{1/p}\)

for \(x_i, v_j \in [0, 1]^d\) . For the sake of simplicity, we use \(p=1\) in the proposed method. Notice that the matrix \([S]_{n \times d \times t}\) needs to be reshaped by reducing the dimensions as \(n \times t\) and d .

Compute relevance (denoted by \(f^{REL}\) ) by measuring fuzzy entropy ( h ) for each feature \(l \in [1, d]\) using the similarity values ( \(S \langle x_i, v_j\rangle\) ) from the previous step and Eq. ( 4 ) as:

\(f^{REL}_{l} = \sum _{j=1}^{t} \sum _{i=1}^{n} h\left( S^{l}\langle x_i, v_j\rangle \right)\)

where \(S^l\) is the similarity for feature l of a sample \(X_i\) with ideal vector \(v_j\) of class j is summed over all samples ( \(i=1, \ldots ,n\) ) and classes ( \(j=1,\ldots ,t\) ).
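One plausible reading of steps 1 to 4 can be sketched in Python; the toy data, the function names, and the exact aggregation of entropy over samples and classes are interpretive assumptions rather than the authors' code (normalization is omitted because the data are already in [0,1]):

```python
import math

def fuzzy_entropy(mu):
    """De Luca-Termini fuzzy entropy of a list of membership degrees."""
    return -sum(q * math.log(q) for m in mu for q in (m, 1.0 - m) if q > 0.0)

def relevance_weights(X, y):
    """Per-feature relevance: class ideal vectors (arithmetic means),
    Lukasiewicz similarities with p = 1, then fuzzy entropy per feature.
    Lower entropy -> more relevant feature."""
    d = len(X[0])
    classes = sorted(set(y))
    # Step 2: class ideal vectors (data assumed to lie in [0, 1])
    ideals = {c: [sum(x[l] for x, yi in zip(X, y) if yi == c) /
                  sum(1 for yi in y if yi == c) for l in range(d)]
              for c in classes}
    # Steps 3-4: per-feature similarities to ideal vectors, then entropy
    rel = []
    for l in range(d):
        sims = [1.0 - abs(x[l] - ideals[c][l]) for x in X for c in classes]
        rel.append(fuzzy_entropy(sims))
    return rel

X = [[0.1, 0.9], [0.2, 0.2], [0.9, 0.8], [0.8, 0.3]]  # feature 0 separates classes
y = [0, 0, 1, 1]
print(relevance_weights(X, y))  # feature 0 gets the lower (better) entropy
```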

The calculation of complementarity ( \(f^{COMP}\) ) is performed for \(l=1,\dots ,d\) using following steps:

Compute intersection between l th feature and all the other features by using algebraic product, that is,

\(f^{l} \cap f^{l^{'}} = f^{l} \cdot f^{l^{'}}\)

where \(l^{'}=1,\dots ,d\) and this way, the intersection matrix \(T_1\) is formed.

Add class variable C to the intersection matrix as \(T_2=\{T_1,C\}\) .

Compute correlation Footnote 3 between features, \(Corr(T_2)\) .

Subtract identity matrix I from the correlation matrix and take absolute values from it, that is, \(T_3=|Corr(T_2)-I|\) .

Find the maximum correlation, \(H_c=\max (T_3)\)

Compute complementarity value ( \(f^{COMP}\) ) using the negation of correlation, such as \(f^{COMP}=1-H_{c}\) , and collect the complementarity information in the matrix.
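The complementarity steps can likewise be sketched; the toy data and the pure-Python Pearson correlation helper are illustrative assumptions:

```python
def pearson(u, v):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0

def complementarity(X, y, l):
    """Sketch of the complementarity steps for feature l: algebraic-product
    intersections with every feature, append the class variable, take the
    maximum absolute off-diagonal correlation, and negate it."""
    d = len(X[0])
    # Step 1: intersections of feature l with each feature l' (algebraic product)
    columns = [[x[l] * x[lp] for x in X] for lp in range(d)]
    columns.append(list(y))                      # step 2: add class variable C
    # Steps 3-5: maximum absolute off-diagonal correlation
    h_c = max(abs(pearson(columns[i], columns[j]))
              for i in range(len(columns))
              for j in range(len(columns)) if i != j)
    return 1.0 - h_c                             # step 6: negation

X = [[0.1, 0.9], [0.2, 0.2], [0.9, 0.8], [0.8, 0.3]]
y = [0, 0, 1, 1]
print([complementarity(X, y, l) for l in range(2)])  # values in [0, 1]
```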

In the proposed method, we generate feature weights focusing on the aggregate effect of relevance and complementarity in the learning process. Both relevance and complementarity have a positive influence on feature weights; in fact, their combined effect can offer the best trade-off between the model’s performance and flexibility with small data sets (Singha and Shenoy 2018 ). Moreover, combining relevance with complementarity can identify features that are individually relevant to the class variable and that provide unique information when considered together. This strategy weighs features by considering different characteristics of the class distribution and the features themselves, which ensures the identification of more suitable nearest neighbors for the query instance based on distance, ultimately enhancing the classification performance of the method.

3.2 The FWM-LMFKNN algorithm

Based on the fundamental concepts of the previously discussed classifiers, the feature weighting strategy, Minkowski distance, and local means, we propose an extension of the FKNN method: the FWM-LMFKNN classifier.

The proposed classification method uses a two-phase approach: it first calculates weights for each feature and then applies nearest neighbor classification to an unknown instance. In the first phase, a strategy based on the concepts of relevance and complementarity is used to calculate the feature weights. The second phase finds sets of k nearest neighbors for an unknown instance ( y ) from each class based on the feature-weighted Minkowski distances between y and each training instance, and then calculates a local mean vector for each set of k nearest neighbors. Next, the Minkowski distance between y and each local mean vector is computed (again with feature weights applied), and the fuzzy membership of y in each class is derived. Finally, y is assigned to the class with the highest membership degree. Formally, suppose that we have a training data set \(\{x_i, \omega _i\}_{i=1}^n\) composed of n instances in a d -dimensional feature space [i.e., \(x_i = (x_i^1, x_i^2,\ldots , x_i^d)\) ] with t different classes [i.e., \(\omega _i \in c = (c_1, c_2,\ldots , c_t)\) ]. In the FWM-LMFKNN method, the class label \(\omega ^*\) for a given unknown instance [ \(y = (y^1, y^2,\ldots , y^d)\) ] is obtained as described below.

Calculate the relevance and complementarity measures using the notions presented in Sect. 3.1 and combine their effects by summing them (i.e., \(f^{REL}+f^{COMP}\) ). It is well known that higher entropy values correspond to lower similarity and increased uncertainty in the corresponding feature, while lower entropy values indicate greater similarity and higher importance. This is reflected in the relevance value, which is based on fuzzy entropy and similarity. Therefore, the complement of the relevance value is employed in the feature weights ( w ); the same principle holds for complementarity. Accordingly, for a given feature l , the weight ( \(w^{l}\) ) is defined as:
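The weight equation itself is not reproduced in this extract; the following is one plausible reading (our assumption, not the paper's exact formula), in which the min–max-normalised relevance is complemented and added to the complementarity value before normalising the weights:

```python
import numpy as np

def feature_weights(rel, comp):
    """Hypothetical weighting rule: high fuzzy entropy (rel) means low
    importance, so take the complement of the min-max normalised relevance
    and add the complementarity value; normalise the result to sum to 1."""
    rel = np.asarray(rel, dtype=float)
    comp = np.asarray(comp, dtype=float)
    r = (rel - rel.min()) / (rel.max() - rel.min() + 1e-12)  # scale to [0, 1]
    w = (1.0 - r) + comp                                     # combined effect
    return w / w.sum()                                       # normalised weights

# toy entropies (lower = more relevant) and complementarity values
w = feature_weights([2.0, 5.0, 3.0], [0.4, 0.2, 0.6])
```

Under this reading, feature 0 (lowest entropy, moderate complementarity) receives the largest weight and feature 1 (highest entropy, lowest complementarity) the smallest.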

Compute the Minkowski distance \(d_{mink\_dis}(y, x_i)\) between y and each training instance \(x_i\) according to:

As shown in this computation, the feature weights \(w^{l}\) computed in the previous phase are incorporated into the distance, so that instances closer to y receive a higher weight (lower distance) while instances farther away receive a lower weight (higher distance). This can help mitigate the effects of noise and outliers in the training data.
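A minimal sketch of the feature-weighted Minkowski distance; the exact placement of the weight inside the sum is our assumption of one common weighting form:

```python
import numpy as np

def weighted_minkowski(y, x, w, p=2.0):
    """Feature-weighted Minkowski distance:
    d(y, x) = (sum_l w_l * |y_l - x_l|**p) ** (1/p)."""
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    return (w * np.abs(y - x) ** p).sum() ** (1.0 / p)

# with equal weights and p=2 this reduces to the Euclidean distance
d = weighted_minkowski([0.0, 0.0], [3.0, 4.0], [1.0, 1.0], p=2.0)  # 3-4-5 triangle
```

Setting p=1 recovers the (weighted) Manhattan distance, and shrinking a feature's weight shrinks its contribution to the distance.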

Find the set of k nearest neighbors, \(\{x^{nn}_{ij}\}_{i=1}^k\) of y from each class j based on the ordered distances computed in the previous step. Here, nn stands for “nearest neighbor.”

Compute a local mean vector ( z ) using the set of k nearest neighbors in each class j according to:

Compute the Minkowski distances between y and each local mean vector as:

Compute the fuzzy membership ( \(u_j\) ) of y in class j using the distances \(d_{mink\_dis}(y, z_{j})\) for \(j=1,2,\ldots ,t\) according to:

where \(u_{jj}\) is 1 for the known class and 0 for the other classes. Note that the index j appears twice in \(u_{jj}\) because the number of classes equals the number of local mean vectors.

Return the class \(\omega ^*\) of y , which has the highest membership degree [i.e., \(\omega ^* = \text {arg}\,\max _{\omega _i}\, u_i(y)\) ].
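The classification steps above can be sketched end to end as follows. The membership rule applies Keller's FKNN formula to the class-wise local mean vectors, which is our reading of the text; the helper names are hypothetical:

```python
import numpy as np

def wm_dist(a, b, w, p):
    """Feature-weighted Minkowski distance along the last axis."""
    return (w * np.abs(a - b) ** p).sum(axis=-1) ** (1.0 / p)

def fwm_lmfknn_predict(y, X, labels, w, k=3, p=2.0, m=2.0):
    """Sketch of Phase 2 (classification): per-class k nearest neighbours,
    local means, and a fuzzy membership over the local mean distances."""
    classes = np.unique(labels)
    z = []                                     # local mean vector per class
    for c in classes:
        Xc = X[labels == c]
        d = wm_dist(Xc, y, w, p)               # weighted distances to y
        nn = Xc[np.argsort(d)[:k]]             # k nearest neighbours in class c
        z.append(nn.mean(axis=0))              # local mean of those neighbours
    dz = wm_dist(np.array(z), y, w, p)         # distance y -> each local mean
    inv = (dz + 1e-12) ** (-2.0 / (m - 1.0))   # fuzzy weighting, strength m
    u = inv / inv.sum()                        # membership degree per class
    return classes[np.argmax(u)], u
```

On a toy two-class data set, a query near the first cluster is assigned to class 0 with a membership vector that sums to one.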

By incorporating a combination of relevance and complementarity into the feature weighting process, the FWM-LMFKNN method can effectively handle uncertainty and vagueness in the data, leading to more reasonable decision boundaries. Using the Minkowski distance metric also allows for more flexible and powerful distance computations, further improving the classifier’s performance. The steps of the proposed method discussed under Phase 1 and Phase 2 are summarized as Algorithm 1 and Algorithm 2.

Algorithm 1: FWM-LMFKNN (Phase 1: feature weights generation)

Algorithm 2: FWM-LMFKNN (Phase 2: classification)

To demonstrate the impact of using the Minkowski distance in the proposed method, a simple experiment was conducted, following Karimi and Torabi ( 2022 ), on the Vehicle data set from the UCI repository (Dheeru and Taniskidou 2017 ). Three data instances, labeled \(x_1, x_2\) , and \(x_3\) , were selected from the data set. The Minkowski distances between \(x_1\) and \(x_2\) and between \(x_2\) and \(x_3\) were then calculated for varying values of p , as depicted in Fig. 2 . The results show that when p is less than 4, \(x_3\) is closer to \(x_2\) than \(x_1\) is; conversely, when p is greater than 4, \(x_1\) is closer to \(x_2\) than \(x_3\) is. Accordingly, when \(x_2\) is used as a test instance with \(x_1\) and \(x_3\) serving as training instances, the performance of the FWM-LMFKNN classifier can depend strongly on the value of the parameter p . This clearly indicates that using the Minkowski distance rather than the Euclidean distance allows the proposed method to find nearest neighbors in a more flexible way.
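The same effect can be reproduced with hypothetical 2-D points (not the actual Vehicle instances): the ordering of the two distances from \(x_2\) flips as p grows, because the Minkowski distance trades off a single large coordinate difference against several smaller ones differently for different p:

```python
import numpy as np

def minkowski(a, b, p):
    """Plain (unweighted) Minkowski distance."""
    return (np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** p).sum() ** (1.0 / p)

# hypothetical points: x1 differs from x2 in one coordinate, x3 in two
x1, x2, x3 = [4.0, 0.0], [0.0, 0.0], [3.0, 3.0]
# d(x1, x2) = 4 for every p; d(x2, x3) = 3 * 2**(1/p) shrinks from 6 toward 3
```

At p=1, x1 is the nearer neighbour of x2 (4 < 6); by p=4 the order has flipped (4 > 3·2^{1/4} ≈ 3.57), illustrating why the choice of p changes the nearest-neighbour set.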

Fig. 2: Minkowski distance between \(x_1\) and \(x_2\) , and \(x_2\) and \(x_3\) , with respect to different values of p

Furthermore, incorporating class representative local mean vectors in the FWM-LMFKNN enhances the classifier’s robustness and effectively addresses challenges arising from the data distribution, including class imbalance, feature noise, and the impact of outliers. Previous studies (Kumbure et al. 2019 , 2020 ) have comprehensively examined and demonstrated the significance of utilizing class prototypes in the FKNN classifier. This strategy contributes to the classifier’s ability to handle complex data sets, providing more reliable and accurate classification results.

3.3 Computational complexity analysis

In this subsection, we briefly discuss the computational complexity of the proposed FWM-LMFKNN method. Let n denote the number of training instances, characterized by c classes in a d -dimensional feature space, and let y be the query sample (the sample to be classified). The KNN classifier (Cover and Hart 1967 ) computes the distances from y to all training instances to find the k nearest neighbors and then takes the majority class among them. Therefore, according to Gou et al. ( 2014 , 2019 ), the computational complexity of the KNN method is \(O(nd+nk+k)\) . The FKNN algorithm (Keller et al. 1985 ) extends KNN with an additional step that computes the membership of y in each class based on the distances between y and its k nearest neighbors; its computational complexity is therefore \(O(nd+nk+ck+c)\) . Compared to FKNN, LM-FKNN adds a local mean computation over the set of k nearest neighbors from each class, and thus requires \(O(nd+nk+cdk+cd+c)\) . According to Duarte et al. ( 2019 ), the computational complexity of the Minkowski distance between two points of dimension d is O ( d ); for n data instances, this becomes O ( nd ). The computational complexity of the LM-FKNN method combined with the Minkowski distance is then \(O(2nd+nk+cdk+cd+c)\) . The computation of feature relevance includes calculating the ideal vectors of each class, the similarity measure, and the fuzzy entropy, and therefore requires O (3 ncd ). The calculation of feature complementarity comprises forming the intersection matrix and computing the correlation and complementarity values; its computational complexity is \(O(nd^2+n+n) \approx O(nd^2)\) .
The proposed FWM-LMFKNN combines these steps; its computational complexity is therefore \(O(2nd+nk+cdk+cd+c+3ncd+ nd^2)\) , which, when constant factors are ignored and \(n \gg k, c, d\) , reduces to \(\approx O(nd+ncd+nk+ nd^2)\) .

Based on the above analysis, it is evident that the FWM-LMFKNN method requires more computation time, particularly compared to the classical KNN, FKNN, and LM-FKNN methods. The primary reason for this is the additional computation involved in generating feature weights using relevance and complementarity, as well as the use of the Minkowski distance measure.

4 Experiment

To evaluate the performance of the proposed classifier, a series of experiments were conducted using real-world data sets and comparing the results to well-established baseline models. The following sub-sections describe the data sets used, the models compared, the evaluation metrics, and the experimental procedure.

4.1 Data sets

We used 20 real-world data sets to evaluate the performance of the proposed approach. These data sets are freely available at the UCI (Dheeru and Taniskidou 2017 ) and KEEL (Alcala-Fdez et al. 2011 ) machine learning repositories. Table 2 provides a summary of the main characteristics of the data sets, including the number of instances, features, classes, and the corresponding data repository. The data sets range in size from 106 to 5500 instances and encompass both binary and multi-class problems.

4.2 Testing methodology

This study employed a thirty-fold holdout validation procedure in all experiments. In each run, the data set was randomly divided into training and testing sets using stratified random sampling, with \(30\%\) of the instances allocated to the test set. The average classification accuracies over the 30 splits of each data set, with a \(95\%\) level of confidence, are reported as the final results. This experimental setup was adopted following Gou et al. ( 2014 , 2019 ) and Pan et al. ( 2017 ). As classification accuracy alone may not be sufficient to evaluate a classifier, additional measures such as sensitivity and specificity are often necessary for a more comprehensive evaluation (Kumbure et al. 2019 , 2020 ). For this reason, we also calculated sensitivity and specificity values in our analysis, as they are commonly used measures in this context.
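The validation procedure just described can be sketched as follows; the per-class rounding rule for the test fraction is an implementation assumption:

```python
import numpy as np

def stratified_holdout(labels, test_frac=0.3, n_splits=30, seed=0):
    """Repeated stratified holdout: each split randomly draws test_frac of
    every class for the test set, mirroring the thirty-fold procedure."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(n_splits):
        test_idx = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            rng.shuffle(idx)
            n_test = max(1, int(round(test_frac * idx.size)))
            test_idx.append(idx[:n_test])            # this class's test share
        test = np.concatenate(test_idx)
        train = np.setdiff1d(np.arange(labels.size), test)
        yield train, test

# toy labels: two balanced classes of 10 instances each
labels = np.array([0] * 10 + [1] * 10)
splits = list(stratified_holdout(labels))
```

Each of the 30 splits keeps the class proportions of the full data set in both the training and the test partitions.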

To provide a comprehensive comparison, several well-established models were chosen as baselines for the proposed classifier. These included the classical KNN (Cover and Hart 1967 ), FKNN (Keller et al. 1985 ), and six other competitive classifiers: MLM-KHNN (Pan et al. 2017 ), LM-PNN (Gou et al. 2014 ), LM-FKNN (Kumbure et al. 2019 ), FPFS-kNN (Memis et al. 2022 ), GMD-KNN (Gou et al. 2019 ), and IV-KNN (Derrac et al. 2015 ). The number of nearest neighbors ( k ) and the Minkowski distance parameter ( p ) were systematically varied during validation to optimize the performance of the proposed method and the baseline models (i.e., a grid search was performed). Specifically, k ranged from 1 to 20, and the set of p values considered was \(\{1,1.5,\ldots ,4\}\) for all data sets. The fuzzy strength parameter ( m ) was fixed at 2 for all FKNN-based classifiers throughout the experiments, as suggested by Derrac et al. ( 2015 ). Lastly, statistical tests, namely the Friedman and Bonferroni-Dunn tests, were applied to evaluate the statistical significance of the performance improvement of the proposed method over the benchmark methods.
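The grid search over (k, p) can be sketched as below; `evaluate` is a placeholder (hypothetical) for a function that trains and scores a classifier for one parameter pair:

```python
from itertools import product

def grid_search(evaluate, ks=range(1, 21), ps=(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)):
    """Exhaustive search over (k, p): evaluate(k, p) should return a
    validation score; the best-scoring pair is returned."""
    return max(product(ks, ps), key=lambda kp: evaluate(*kp))

# toy objective with a known optimum at k=5, p=2.0
best_k, best_p = grid_search(lambda k, p: -((k - 5) ** 2) - (p - 2.0) ** 2)
```

In practice `evaluate` would run the holdout procedure for each candidate pair and return the mean validation accuracy.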

5 Results and discussion

This section presents the experimental results of the proposed method on the selected real-world data sets in comparison with the related baseline models. Optimal parameter values are also presented and discussed. Finally, a statistical analysis demonstrates that the proposed method achieved statistically significantly higher performance than the benchmark methods.

5.1 Evaluation of the proposed method

Table 3 presents a comprehensive comparison of the classification accuracy results and corresponding standard deviations of the proposed FWM-LMFKNN classifier and the eight baseline models across the 20 data sets. Note that the highest classification performance for each data set is highlighted in bold.

The results show that the proposed FWM-LMFKNN method outperformed all other classifiers in terms of accuracy on 15 data sets, achieving the highest average accuracy of \(82.50\%\) , and achieved the second-best performance on others (e.g., Balance and Texture). Additionally, the proposed method had the lowest average standard deviation ( \(2.57\%\) ) among all the classifiers, indicating that its performance was more consistent across the different data sets. This suggests that the proposed method was not only more accurate but also more robust than the other classifiers evaluated. The corresponding sensitivity and specificity values (see Tables 4 and 5 ) were reasonable and supported the indications given by the accuracy results. The performance of the LM-FKNN classifier was second best overall (average accuracy of \(81.24\%\) ), indicating the effectiveness of using class representative local means in the FKNN classifier, in line with the related classifiers presented by Kumbure et al. ( 2019 , 2020 ).

Considering the optimal parameter values (see Table 4 ), the proposed method achieved its highest performance with the Minkowski distance parameter in the range 1–3.5 across all data sets. Among these, the Manhattan distance ( \(p=1\) ) appeared to work reasonably well in most cases, which is in line with the previous study by Kumbure and Luukka ( 2022 ). Regarding the parameter k , classification performance improved significantly with higher k values in the proposed method as well as in the local means-based KNN methods. This finding aligns with previous research (Gou et al. 2014 ; Pan et al. 2017 ) demonstrating that multi-local mean vectors with more nearest neighbors represent each class more accurately. This outcome is expected, as more data instances make the local mean vectors more representative.

To further illustrate the performance of the proposed method in comparison to the baseline models, the accuracy results of all classifiers with varying values of parameter k for four selected cases (the Appendicitis, Bupa, Cleveland, and Retinopathy data sets) are presented in Fig. 3 . As these sample cases clearly show, the proposed method generally outperformed the benchmark methods across a range of k values, particularly at high values of k .

Fig. 3: The accuracy of each method with respect to parameter k in the Appendicitis ( a ), Bupa ( b ), Cleveland ( c ), and Retinopathy ( d ) data sets

Furthermore, the classification performance of each method is depicted in the box plots in Fig. 4 , where each box represents the distribution of accuracy values over the 30 validation runs on the Cleveland, Ionosphere, Spambase, and Vehicle data sets. The box plot analysis was conducted for these selected cases to understand the variability of each method’s performance across the runs. As shown in the box plots, the proposed FWM-LMFKNN method had the highest median accuracy among all the classifiers, with a minimal interquartile range (IQR) in all cases considered, indicating that its accuracy values were relatively consistent across runs.

Fig. 4: Box-plot distributions of accuracy by each classifier in the Cleveland ( a ), Ionosphere ( b ), Spambase ( c ), and Vehicle ( d ) data sets

5.2 Statistical analysis for benchmark comparison

To evaluate the significance of this improvement, we performed the Friedman test (Friedman 1937 ) and subsequently conducted the Bonferroni-Dunn test (Dunn 1961 ), following the methodology presented by Demšar ( 2006 ). In the Friedman test, classifiers were ranked individually for each data set—the top-performing classifier received a rank of 1, followed by the second-best with a rank of 2, and so forth, as detailed in Table 6 . Subsequently, we calculated the Friedman statistic using the following formula:

In the formula, \(R_j=\frac{1}{N}\sum _{i=1}^{N}r_i^j\) , where \(r_i^j\) represents the rank of the j th classifier on the i th data set, and there are \(c_n\) classifiers and N data sets. Accordingly, we computed \(\chi ^2=42.9\) based on the averaged ranks presented in Table 6 (here, \(c_n=9\) and \(N=20\) ), with a corresponding p -value of \(9.17 \times 10^{-7}\) . This result offers sufficient evidence to reject the null hypothesis that all classifiers perform equally; in other words, the chosen classifiers exhibited statistically significant differences in mean accuracy at a significance level of 0.05. Since the null hypothesis was rejected, a post-hoc test can now be applied.
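The Friedman statistic can be computed directly from the rank matrix using the standard formula \(\chi ^2_F = \frac{12N}{k(k+1)}\left[\sum _j R_j^2 - \frac{k(k+1)^2}{4}\right]\):

```python
import numpy as np

def friedman_chi2(ranks):
    """Friedman statistic from an (N data sets x k classifiers) rank matrix:
    chi2 = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2/4)."""
    ranks = np.asarray(ranks, dtype=float)
    N, k = ranks.shape
    R = ranks.mean(axis=0)                       # average rank per classifier
    return 12.0 * N / (k * (k + 1)) * ((R ** 2).sum() - k * (k + 1) ** 2 / 4.0)

# three classifiers ranked identically (1, 2, 3) on four data sets
chi2 = friedman_chi2([[1, 2, 3]] * 4)
```

Perfectly consistent rankings maximise the statistic, while identical average ranks drive it to zero, matching the null hypothesis of equal performance.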

Accordingly, we conducted the Bonferroni-Dunn test to compare the performance of the proposed FWM-LMFKNN classifier with each of the other methods, as indicated by Demšar ( 2006 ). In this test, the performances of two classifiers are considered significantly different if the corresponding average rank difference is greater than or equal to the critical difference (CD), which is defined as:

where \(q_\alpha\) represents the critical value from the two-tailed Bonferroni-Dunn test. Applying the test to our analysis, we obtained \(CD=2.401\) ( \(q_{0.05}=2.72\) , \(c_n=9\) , and \(N=20\) ). By comparing this value with the difference in average rank between the FWM-LMFKNN method and each baseline method, we found that the proposed method demonstrated statistically significantly higher performance in terms of mean accuracy compared to all other methods. Table 7 presents the test results, where "Yes" denotes a significant difference between the mean accuracy of the proposed FWM-LMFKNN method and that of the corresponding benchmark classifier.
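The critical difference follows the standard formula \(CD = q_\alpha \sqrt{\frac{c_n(c_n+1)}{6N}}\); a direct evaluation with \(q_{0.05}=2.72\), \(c_n=9\), \(N=20\) gives approximately 2.36, so minor differences from the reported value can arise from the exact \(q_\alpha\) value and rounding used:

```python
import math

def critical_difference(q_alpha, k, N):
    """Bonferroni-Dunn critical difference: CD = q_alpha * sqrt(k(k+1)/(6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

cd = critical_difference(2.72, 9, 20)
```

Any classifier whose average rank trails the proposed method's by at least this margin is judged significantly worse at the 0.05 level.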

5.3 Ablation studies

In machine learning research, an ablation study is used to determine the significance of various components of a model and to evaluate their impact on overall performance (Meyes et al. 2019 ; Kwon and Lee 2024 ). Accordingly, we conducted ablation studies on the main components of our proposed method, namely the feature weights (based on relevance and complementarity) and the Minkowski distance, to demonstrate their effect on classification performance. For the ablation studies, the performance of the original method, LM-FKNN (Kumbure et al. 2019 ), was compared with two variants: one using feature weights based on feature relevance and complementarity, and another using a Minkowski distance metric-based similarity calculation. These were also compared with the proposed method (FWM-LMFKNN), which combines both feature weighting and the Minkowski distance. The models’ performances were compared using the experimental setup described in Sect. 4 . The best average accuracy values (along with their standard deviations) across all data sets are presented in Table 8 .

The results in Table 8 indicate that LM-FKNN with feature weights slightly improved performance compared to the original method, with an increase in average accuracy from 81.54 to 81.69%. By contrast, LM-FKNN with Minkowski distance considerably improved performance, with an increase from 81.54 to 82.45%. This improvement may be due to the parameterized Minkowski distance allowing the classifier to adopt the most suitable distance metric for the data, thus achieving a more accurate set of nearest neighbors. However, the proposed FWM-LMFKNN method, which combines both feature weighting and the Minkowski distance measure, achieved the best overall results, with the highest average accuracy of 82.52%. Although the overall performance difference between the proposed method and LM-FKNN with Minkowski distance was not large, FWM-LMFKNN performed the best on many data sets, highlighting the positive impact of feature weighting. The standard deviation results of the proposed method were also low and reasonable, further supporting its robustness.

6 Conclusion

In this paper, we proposed a new fuzzy k -nearest neighbor method called FWM-LMFKNN based on feature weighting, Minkowski distance, and class representative local mean vectors. To determine suitable feature weights, we developed a feature weighting scheme that considers the combined effect of relevance and complementarity. The proposed method was evaluated on a variety of real-world data sets, and the results show that it outperformed the baseline models in terms of the evaluation metrics used. The use of feature weights and the Minkowski distance allows a more accurate calculation of the distances between a new instance and the training instances, which improves the accuracy of the proposed FWM-LMFKNN method. The ablation study demonstrated the effectiveness of these aspects of the FWM-LMFKNN. Additionally, the proposed method takes the local structure of class subsets into account through local mean vectors, further improving classification performance. The results of this study demonstrate that the proposed method is a powerful tool for classification tasks and can be applied to a wide range of data sets.

Future work includes further testing of various data sets, evaluating the reliability of the proposed method, and investigating the possibility of incorporating other mean operators, such as generalized mean, in the local mean and ideal vector computation.

Data availability

Data used in the manuscript are freely available in the UCI and KEEL repositories.

Code availability

The code of the proposed method is available at  https://github.com/MahindaMK/FWM-LMFKNN-classifier .

Footnote 1: Note that since the similarity degree \(s \in [0,1]\) , it can be used with fuzzy entropy measures even though these are initially defined for fuzzy sets.

Footnote 2: The ideal vectors represent the mean vectors of each class subset in the training set.

Footnote 3: For correlation, the Kendall ( 1938 ) rank correlation is used.

Alcala-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17:255–287

Al-sharhan S, Karray F, Gueaieb W, Basir O (2001) Fuzzy entropy: a brief survey. In: 10th IEEE int. conf. on fuzzy systems, vol. 3, pp 1135–1139

Bergamasco LCC, Nunes FLS (2019) Intelligent retrieval and classification in three-dimensional biomedical images - a systematic mapping. Comput Sci Rev 31:19–38

Bian Z, Vong CM, Wong PK, Wang S (2022) Fuzzy knn method with adaptive nearest neighbors. IEEE Trans Cybern 52(6):5380–5393

Biswas N, Chakraborty S, Mullick SS, Das S (2018) A parameter independent fuzzy weighted k-nearest neighbor classifier. Pattern Recogn Lett 101:80–87

Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27

De Luca A, Termini S (1971) A definition of non-probabilistic entropy in setting of fuzzy set theory. Inf Controls 20:301–312

Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30

Derrac J, Chiclana F, García S, Herrera F (2016) Evolutionary fuzzy k-nearest neighbors algorithm using interval-valued fuzzy sets. Inf Sci 329:144–163 ( Special issue on Discovery Science )

Derrac J, Chiclana F, García S, Herrera F (2015) An interval valued k-nearest neighbors classifier. In: Proc. of the 2015 conf. of the int. fuzzy systems association and the European society for fuzzy logic and technology, pp 378–384. Atlantis Press

Dheeru D, Taniskidou EK (2017) UCI machine learning repository

Duarte FS, Rios RA, Hruschka ER, de Mello RF (2019) Decomposing time series into deterministic and stochastic influences: a survey. Digital Signal Process 95:102582

Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56:52–64

Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675–701

González S, García S, Li S-T, John R, Herrera F (2021) Fuzzy k-nearest neighbors with monotonicity constraints: Moving towards the robustness of monotonic noise. Neurocomputing 439:106–121

Gou J, Zhan Y, Rao Y, Shen X, Wang X, He W (2014) Improved pseudo nearest neighbor classification. Knowl-Based Syst 70:361–375

Gou J, Ma H, Ou W, Zeng S, Rao Y, Yang H (2019) A generalized mean distance-based k-nearest neighbor classifier. Expert Syst Appl 115:356–372

Gueorguieva N, Valova I, Georgiev G (2017) M&MFCM: fuzzy c-means clustering with Mahalanobis and Minkowski distance metrics. Procedia Comput Sci 114:224–233

Karimi Z, Torabi Z (2022) An adaptive k-nearest neighbor classifier using differential evolution with auto-enhanced population diversity for intrusion detection. Research Square

Kassani PH, Teoh ABJ, Kim E (2017) Evolutionary-modified fuzzy nearest-neighbor rule for pattern classification. Expert Syst Appl 88:258–269

Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst 15:580–585

Kendall M (1938) A new measure of rank correlation. Biometrika 30(1–2):81–89

Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 1–2:273–324

Kumar P, Thakur RS (2021) Liver disorder detection using variable-neighbor weighted fuzzy k nearest neighbor approach. Multimed Tools Appl 80:16515–16535

Kumbure MM, Luukka P, Collan M (2019) An enhancement of fuzzy k-nearest neighbor classifier using multi-local power means. In: Proc. of the 11th conf. of the European society for fuzzy logic and technology (EUSFLAT 2019), pp 83–90. Atlantis Press

Kumbure MM, Luukka P (2022) A generalized fuzzy k-nearest neighbor regression model based on minkowski distance. Granular Comput 7:657–671

Kumbure MM, Luukka P, Collan M (2020) A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean. Pattern Recogn Lett 140:172–178

Kwon Y, Lee Z (2024) A hybrid decision support system for adaptive trading strategies: combining a rule-based expert system with a deep reinforcement learning strategy. Decis Support Syst 177:114100

Li Y, Zhao D, Xu Z, Heidari AA, Chen H, Jiang X, Xu S (2023) BSRWPSO-FKNN: a boosted PSO with fuzzy k-nearest neighbor classifier for predicting atopic dermatitis disease. Front Neuroinform 16:1063048

Lohrmann C, Luukka P, Jablonska-Sabuka M, Kauranne T (2018) A combination of fuzzy similarity measures and fuzzy entropy measures for supervised feature selection. Expert Syst Appl 110:216–236

Łukasiewicz J (1970) Selected works. Cambridge University Press, Cambridge

Luukka P (2011) Feature selection using fuzzy entropy measures with similarity classifier. Expert Syst Appl 38:4600–4607

Luukka P, Saastamoinen K, Könönen V (2001) A classifier based on the maximal fuzzy similarity in the generalized łukasiewicz-structure. In: Proceedings of 10th IEEE international conference on fuzzy systems

Ma X-A, Ju C (2022) Fuzzy information-theoretic feature selection via relevance, redundancy, and complementarity criteria. Inf Sci 611:564–590

Maillo J, García S, Luengo J, Herrera F, Triguero I (2020) Fast and scalable approaches to accelerate the fuzzy k-nearest neighbors classifier for big data. IEEE Trans Fuzzy Syst 28(5):874–886

Memis S, Enginoglu S, Erkan U (2022) Fuzzy parameterized fuzzy soft k-nearest neighbor classifier. Neurocomputing 500:351–378

Meyer P, Schretter C, Bontempi G (2008) Information-theoretic feature selection in microarray data using variable complementarity. IEEE J Sel Top Signal Process 2:261–274

Meyes R, Lu M, de Puiseau CW, Meisen T (2019) Ablation studies in artificial neural networks. https://arxiv.org/abs/1901.08644

Mitania Y, Hamamotob Y (2006) A local mean-based nonparametric classifier. Pattern Recogn Lett 27:1151–1159

Morente-Molinera JA, Mezei J, Carlsson C, Herrera-Viedma E (2017) Improving supervised learning classification methods using multigranular linguistic modeling and fuzzy entropy. IEEE Trans Fuzzy Syst 25:1078–1089

Pan Z, Wang Y, Ku W (2017) A new k-harmonic nearest neighbor classifier based on the multi-local means. Expert Syst Appl 67:115–125

Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:623–659

Singha S, Shenoy P (2018) An adaptive heuristic for feature selection based on complementarity. Mach Learn 107:2027–2071

Sun L, Wang J, Wei J (2017) AVC: selecting discriminative features on the basis of AUC by maximizing variable complementarity. BMC Bioinformatics 18:50

Vergara J, Estevez P (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24:175–186

Yang M-S, Sinaga KP (2021) Collaborative feature-weighted multi-view fuzzy c-means clustering. Pattern Recogn 119:108064

Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:207–228

Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353

Zeraatkar S, Afsari F (2021) Interval-valued fuzzy and intuitionistic fuzzy-knn for imbalanced data classification. Expert Syst Appl 184:115510

Zhang Q, Sheng J, Zhang Q, Wang L, Yang Z, Xin Y (2023) Enhanced Harris Hawks optimization-based fuzzy k-nearest neighbor algorithm for diagnosis of Alzheimer’s disease. Comput Biol Med 165:107392

Open Access funding provided by LUT University (previously Lappeenranta University of Technology (LUT)).

Author information

Authors and Affiliations

Business School, LUT University, Yliopistonkatu 34, 53850, Lappeenranta, Finland

Mahinda Mailagaha Kumbure & Pasi Luukka

Contributions

Mahinda Mailagaha Kumbure: Conceptualization, Methodology, Software, Validation, Investigation, Writing - Original Draft. Pasi Luukka: Conceptualization, Methodology, Writing - Review & Editing, Supervision.

Corresponding author

Correspondence to Mahinda Mailagaha Kumbure .

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Mailagaha Kumbure, M., Luukka, P. Local means-based fuzzy k -nearest neighbor classifier with Minkowski distance and relevance-complementarity feature weighting. Granul. Comput. 9 , 73 (2024). https://doi.org/10.1007/s41066-024-00496-0

Download citation

Received : 04 June 2024

Accepted : 10 August 2024

Published : 31 August 2024

DOI : https://doi.org/10.1007/s41066-024-00496-0

Keywords

  • Classification
  • Complementarity
  • Fuzzy entropy
  • Local means
  • Machine learning

COMMENTS

  1. Null hypothesis

    In scientific research, the null hypothesis (often denoted H0) is the claim that the effect being studied does not exist. The null hypothesis can also be described as the hypothesis in which no relationship exists between two sets of data or variables being analyzed. If the null hypothesis is true, any experimentally observed effect is due to chance alone, hence the term "null".

  2. Null & Alternative Hypotheses

    The null hypothesis (H0) answers "No, there's no effect in the population." The alternative hypothesis (Ha) answers "Yes, there is an effect in the population." The null and alternative are always claims about the population. That's because the goal of hypothesis testing is to make inferences about a population based on a sample.

  3. Null Hypothesis: Definition, Rejecting & Examples

    When your sample contains sufficient evidence, you can reject the null and conclude that the effect is statistically significant. Statisticians denote the null hypothesis as H0 and the alternative hypothesis as HA. Null hypothesis H0: no effect exists in the population. Alternative hypothesis HA: the effect exists in the population. In every study or experiment, researchers assess an effect or relationship.

  4. How to Write a Null Hypothesis (5 Examples)

    H0 (null hypothesis): population parameter =, ≤, or ≥ some value. HA (alternative hypothesis): population parameter <, >, or ≠ some value. Note that the null hypothesis always contains the equal sign. We interpret the hypotheses as follows: Null hypothesis: the sample data provides no evidence to support some claim being made by an individual.

  5. Null and Alternative Hypotheses

    The null and alternative hypotheses offer competing answers to your research question. When the research question asks "Does the independent variable affect the dependent variable?", the null hypothesis (H0) answers "No, there's no effect in the population." On the other hand, the alternative hypothesis (HA) answers "Yes, there ...

  6. Null Hypothesis Definition and Examples, How to State

    Step 1: Figure out the hypothesis from the problem. The hypothesis is usually hidden in a word problem, and is sometimes a statement of what you expect to happen in the experiment. The hypothesis in the above question is "I expect the average recovery period to be greater than 8.2 weeks." Step 2: Convert the hypothesis to math.

  7. How to Formulate a Null Hypothesis (With Examples)

    To distinguish it from other hypotheses, the null hypothesis is written as H0 (which is read as "H-nought," "H-null," or "H-zero"). A significance test is used to determine the likelihood that the results supporting the null hypothesis are not due to chance. A confidence level of 95% or 99% is common. Keep in mind, even if the confidence level is high, there is still a small chance the ...

  8. Null Hypothesis Definition and Examples

    Null Hypothesis Examples. "Hyperactivity is unrelated to eating sugar" is an example of a null hypothesis. If the hypothesis is tested and found to be false, using statistics, then a connection between hyperactivity and sugar ingestion may be indicated. A significance test is the most common statistical test used to establish confidence in a ...

  9. An Introduction to Statistics: Understanding Hypothesis Testing and

    HYPOTHESIS TESTING. A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the "alternate" hypothesis, and the opposite ...

  10. Null Hypothesis

    Definition. In formal hypothesis testing, the null hypothesis (H0) is the hypothesis assumed to be true in the population and which gives rise to the sampling distribution of the test statistic in question (Hays 1994). The critical feature of the null hypothesis across hypothesis testing frameworks is that it is stated with enough precision ...

  11. What Is The Null Hypothesis & When To Reject It

    A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior research methods, 43, 679-690. Nickerson, R. S. (2000). Null hypothesis significance testing: a review of an old and continuing controversy. Psychological methods, 5(2), 241. Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test.

  12. 9.1: Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. H0, the null hypothesis: it is a statement of no difference between the variables; they are not related. This can often be considered the status quo, and as a result, if you cannot accept the null it requires some action.

  13. Understanding Null Hypothesis Testing

    A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value. A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that the sample ...

  14. 9.1 Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. H0, the null hypothesis: a statement of no difference between sample means or proportions, or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.

  15. What is Null Hypothesis? What Is Its Importance in Research?

    The null hypothesis is the opposite stating that no such relationship exists. Null hypothesis may seem unexciting, but it is a very important aspect of research. In this article, we discuss what null hypothesis is, how to make use of it, and why you should use it to improve your statistical analyses.

  16. 7.3: The Research Hypothesis and the Null Hypothesis

    This null hypothesis can be written as: H0: X̄ = μ. For most of this textbook, the null hypothesis is that the means of the two groups are similar. Much later, the null hypothesis will be that there is no relationship between the two groups. Either way, remember that a null hypothesis is always saying that nothing is different.

  17. Null Hypothesis Examples

    An example of the null hypothesis is that light color has no effect on plant growth. The null hypothesis (H 0) is the hypothesis that states there is no statistical difference between two sample sets. In other words, it assumes the independent variable does not have an effect on the dependent variable in a scientific experiment.

  18. Null Hypothesis

    A null hypothesis is a theory based on insufficient evidence that requires further testing to prove whether the observed data is true or false. For example, a null hypothesis statement can be "the rate of plant growth is not affected by sunlight." It can be tested by measuring the growth of plants in the presence of sunlight and comparing ...

  19. Null hypothesis significance testing: a short tutorial

    Abstract: "null hypothesis significance testing is the statistical method of choice in biological, biomedical and social sciences to investigate if an effect is likely". No, NHST is the method to test the hypothesis of no effect. I agree - yet people use it to investigate (not test) if an effect is likely.

  20. Null Hypothesis: What Is It, and How Is It Used in Investing?

    The null hypothesis is used in quantitative analysis to test theories about economies, investing strategies, and markets to decide if an idea is true or false. Hypothesis testing assesses the ...

  21. Statistics in psychological research

    In keeping with growing concerns about some of the limitations of null hypothesis significance testing, such as its role in the so-called replication crisis, the course also delves into these concerns and possible ways to address them, including introductory consideration of statistical power and alternatives to hypothesis testing like ...

  22. SAGE Open July-September 2024: 1-19 Financial ...

    Original Research, SAGE Open, July-September 2024: 1-19, The Author(s) 2024 ... Empirical Literature and Hypothesis Development: numerous studies have examined the link between FI ... tory variables. Also, we do not reject the null hypothesis of the Sargan test of over-identification restriction, indicating that there is overall exogeneity of ...

  23. Local means-based fuzzy k-nearest neighbor classifier with ...

    Our research aims to design and develop a local means-based FKNN classifier that effectively addresses the noise and uncertainty in the data, thereby yielding improved performance. ... This result offers sufficient evidence to reject the null hypothesis that all classifiers perform equally. In other words, this result supports the conclusion ...
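Several of the snippets above walk through converting a verbal claim ("the average recovery period is greater than 8.2 weeks") into hypotheses and then checking the p-value. A minimal sketch of that workflow, using made-up recovery times and SciPy's one-sample t-test (the data and threshold here are illustrative only, not taken from any of the cited sources):

```python
from scipy import stats

# Hypothetical recovery periods in weeks (illustrative data only)
sample = [8.6, 9.4, 9.8, 8.9, 9.1, 9.7, 8.8, 9.5, 9.2, 9.0]

# H0: mu <= 8.2  (the null hypothesis contains the equality)
# Ha: mu >  8.2  (one-sided alternative)
t_stat, p_value = stats.ttest_1samp(sample, popmean=8.2, alternative="greater")

# A small p-value means this sample would be unlikely if H0 were true
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f}: {decision}")
```

With these invented numbers the sample mean (9.2) sits well above 8.2, so the test rejects H0 at the 5% level; note that failing to reject would not have proven the null true.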
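Snippet 23 reports rejecting the null hypothesis that all classifiers perform equally; that kind of multi-classifier comparison is commonly done with the Friedman test. A sketch with invented accuracy scores (illustrative numbers, not results from the cited paper):

```python
from scipy import stats

# Hypothetical accuracies of three classifiers on six datasets
# (illustrative values only, not the paper's actual results)
clf_a = [0.81, 0.78, 0.85, 0.80, 0.77, 0.83]
clf_b = [0.84, 0.80, 0.86, 0.83, 0.79, 0.85]
clf_c = [0.87, 0.83, 0.88, 0.86, 0.82, 0.88]

# H0: all classifiers perform equally (same mean ranks across datasets)
stat, p_value = stats.friedmanchisquare(clf_a, clf_b, clf_c)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")
```

Because clf_c beats clf_b, which beats clf_a, on every dataset, the rank ordering is perfectly consistent and the test rejects the null of equal performance; a post-hoc test would then identify which pairwise differences matter.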