How (and because) use outliers function in excel

Contents


An outlier is a value that is significantly higher or lower than most of the values ​​in your data. When using Excel to analyze data, outliers can skew results. As an example, the mean average of a data set could truly reflect its values. Excel provides some useful functions to help you manage your outliers, so let's take a look.

A quick example

In the image below, outliers are reasonably easy to spot: The value of two assigned to Eric and the value of 173 assigned to Ryan. In a data set like this, it's pretty easy to detect and deal with those outliers manually.

Range of values ​​containing outliers

In a larger data set, that will not be the case. Being able to identify outliers and remove them from statistical calculations is essential, and that is what we will see in this post.

How to Find Outliers in Your Data

To find outliers in a data set, we use the following steps:

  1. Find the first and third quartiles (we will talk a little about what they are).
  2. Evaluate the interquartile range (we will also explain it a little later).
  3. Returns the upper and lower limits of our data range.
  4. Use these limits to identify peripheral data points.

The cell range to the right of the dataset seen in the image below will be used to store these values.

Quartile range

Let's get started.

Step one: calculate quartiles

If you divide your data into quarters, each of these sets is called a quartile. The 25% lowest number of numbers in the range constitutes the first quartile, the next 25% the second quartile, and so on. We take this step first because the most commonly used definition of an outlier is a data point that is more than 1,5 interquartile ranges (IQR) below the first quartile and 1,5 interquartile ranges above the third quartile. To set those values, first we have to find out what the quartiles are.

Excel provides a QUARTILE function to calculate quartiles. You need two pieces of information: the matrix and the quarter.

=QUARTILE(Array, Quart)

the training is the range of values you are evaluating. And the quart is a number that represents the quartile that you want to return (as an example, 1 for him 1S t quartile, 2 for the second quartile and so on).

Note: In excel 2010, Microsoft released quartile features. INC and QUARTILE. EXC as improvements to the QUARTILE function. quartile is more backward compatible when working in multiple versions of excel.

Let's go back to our example table.

Quartile range

To calculate the 1S t Quartile we can use the next formula in cell F2.

=QUARTILE(B2:B14,1)

As you enter the formula, Excel provides a list of alternatives for the argument of a quarter.

To calculate the 3rd quartile, we can enter a formula like above in cell F3, but using a three instead of one.

=QUARTILE(B2:B14,3)

Now, we have the quartile data points displayed in the cells.

Values ​​of the first and third quartiles

Step two: examine the interquartile range

The interquartile range (o IQR) is he 50% mean of the values ​​in your data. It is calculated as the difference between the value of the first quartile and the value of the third quartile.

We will use a simple formula in cell F4 that subtracts the 1S t quartile of 3rd pastern:

=F3-F2

Now, we can see our interquartile range shown.

Interquartile value

Step three: return the lower and upper limits

The lower and upper limits are the smallest and largest values ​​of the data range that we want to use. Any value less than or greater than these limit values ​​are the outliers.

We will calculate the lower limit in cell F5 by multiplying the IQR value by 1.5 and later subtracting it from data point Q1:

=F2-(1.5*F4)

Excel formula for lower bound value

Note: The brackets in this formula are not necessary because the multiplication part will be calculated before the subtraction part, but they make the formula easier to read.

To calculate the upper bound in cell F6, we will multiply the IQR by 1.5 again, but this time add to data point Q3:

=F3+(1.5*F4)

Lower and upper limit values

Step four: identify outliers

Now that we have all of our underlying data set up, it's time to identify our peripheral data points, those that are lower than the lower limit value or higher than the upper limit value.

We will use the OR function to perform this logical test and display the values ​​that meet these criteria by entering the next formula in cell C2:

=OR(B2<$F$5,B2>$F$6)

OR function to identify outliers

Then we will copy that value into our cells C3-C14. A TRUE value indicates an outlier and, as you can see, we have two in our data.

Ignore outliers when calculating mean average

The use of the QUARTILE function allows us to calculate the IQR and work with the most used definition of an outlier.. Despite this, when calculating the mean average for a range of values ​​and ignoring outliers, there is a faster and easier to use function. This technique will not identify an outlier like before, but it will allow us to be flexible with what we might consider our portion of outliers.

The function we need is called TRIMMEAN, and you can see its syntax below:

=TRIMMEAN(Array, percent)

the training is the range of values ​​you want to average. the percent is the percentage of data points to exclude from the top and bottom of the data set (you can enter it as a percentage or a decimal value).

We enter the formula below into cell D3 in our example to calculate the average and exclude the 20% outliers.

=TRIMMEAN(B2:B14, 20%)

TRIMMEAN formula for average excluding outliers


There you have two different functions to handle outliers. Whether you want to identify them for some reporting needs or exclude them from calculations as averages, Excel has a function that adapts to your needs.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.