WHAT IS RANGE IN STATISTICS? (WITH EXAMPLES) - DUDAS

The range, also called spread or amplitude, is in statistics the difference between the maximum value and the minimum value of a data set drawn from a sample or a population. If the range is denoted by the letter R and the data by x, the formula for the range is simply:

R = x_max - x_min

where x_max is the maximum value of the data and x_min is the minimum.
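The formula can be sketched in a few lines of Python. This is only an illustration: the function name data_range and the sample values are assumptions, not part of the original text.

```python
def data_range(values):
    """Return R = x_max - x_min for a non-empty collection of numbers."""
    values = list(values)
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical data, chosen only to illustrate the formula.
print(data_range([3, 7, 1, 9]))  # 9 - 1 = 8
```

Note that the data do not need to be sorted first: only the two extreme values matter.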

Figure 1. Range of data corresponding to the population of Cádiz in the last two centuries. Source: Wikimedia Commons.

The range is useful as a simple measure of dispersion: it gives a quick sense of the variability of the data, since it is the length of the interval in which the data lie.

For example, suppose the height of a group of 25 male first-year engineering students at a university is measured. The tallest student in the group is 1.93 m and the shortest 1.67 m. These are the extreme values of the sample data, so the range is:

R = 1.93 m - 1.67 m = 0.26 m, or 26 cm.

The height of the students in this group is distributed along this range.

Advantages and disadvantages

Range is, as we said before, a measure of how spread out the data are. A small range indicates that the data lie close together and the spread is low. A larger range, on the other hand, indicates that the data are more dispersed.

The advantages of calculating the range are obvious: it is very easy and fast to find, as it is a simple difference.

It also has the same units as the data with which it works and the concept is very easy to interpret for any observer.

In the example of the height of engineering students, if the range had been 5 cm, we would say that the students are all approximately the same height. But with a range of 26 cm, we immediately assume that there are students of all intermediate heights in the sample. Is this assumption always correct?

Disadvantages of range as a measure of dispersion

If we look carefully, it may be that in our sample of 25 engineering students only one measures 1.93 m and the remaining 24 have heights close to 1.67 m.

The opposite is equally possible: most of the students are around 1.90 m tall and only one measures 1.67 m. The range is the same in both cases.

Yet the distribution of the data is quite different in the two cases.

The disadvantages of the range as a measure of dispersion arise because it uses only the extreme values and ignores all the others. Since most of the information is lost, the range alone gives no idea of how the sample data are distributed.
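The two situations can be illustrated with a short sketch. The heights below are hypothetical values invented to match the description (25 students, extremes of 1.67 m and 1.93 m); they are not real measurements from the article.

```python
from statistics import median


def data_range(values):
    """Range R = x_max - x_min."""
    return max(values) - min(values)


# Hypothetical heights in metres: 25 students in each sample.
mostly_short = [1.67] + [1.68] * 23 + [1.93]  # one tall student
mostly_tall = [1.67] + [1.90] * 23 + [1.93]   # one short student

for name, sample in [("mostly short", mostly_short), ("mostly tall", mostly_tall)]:
    # Same range in both cases, but a very different median.
    print(name, round(data_range(sample), 2), median(sample))
```

Both samples have a range of 0.26 m, while their medians (1.68 m and 1.90 m) show that the data are concentrated at opposite ends of the interval.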

Another important characteristic is that the range of a sample never decreases as observations are added. If we consider more data, the range either increases or stays the same.
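A small sketch makes this behaviour visible. The starting sample and the new observations are hypothetical numbers chosen for illustration.

```python
def data_range(values):
    """Range R = x_max - x_min."""
    return max(values) - min(values)


sample = [12, 15, 11]              # hypothetical starting sample, range 4
ranges = []
for new_value in [13, 20, 14, 9]:  # hypothetical new observations
    sample.append(new_value)
    ranges.append(data_range(sample))

print(ranges)  # [4, 9, 9, 11]
```

A new value inside the current interval leaves the range unchanged; a value outside it pushes the maximum up or the minimum down, so the range can only grow.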

In any case, the range is useful mainly when working with small samples; using it as the sole measure of dispersion in large samples is not recommended.

The range should be complemented with other measures of dispersion that do take all the data into account: interquartile range, variance, standard deviation and coefficient of variation.
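As a rough illustration, three of these measures can be computed with Python's standard statistics module. The sample is hypothetical, and the sketch assumes the sample versions of variance and standard deviation (dividing by n - 1); the module also offers pvariance and pstdev for a whole population.

```python
from statistics import mean, stdev, variance

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_range = max(sample) - min(sample)       # uses only the two extremes
sample_variance = variance(sample)             # sample variance (divides by n - 1)
sample_stdev = stdev(sample)                   # square root of the sample variance
coeff_variation = sample_stdev / mean(sample)  # spread relative to the mean, unitless

print(sample_range, sample_variance, sample_stdev, coeff_variation)
```

Unlike the range, the last three values change whenever any observation changes, not only the extremes.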

Interquartile range, quartiles and worked example

We have seen that the weakness of the range as a measure of dispersion is that it uses only the extreme values of the data distribution and omits the others.

To avoid this drawback, quartiles are used: three values known as measures of position.

They divide the ordered, ungrouped data into four parts (other widely used measures of position are deciles and percentiles). Their characteristics are:

-The first quartile Q₁ is the value such that 25% of the data are less than Q₁.

-The second quartile Q₂ is the median of the distribution, which means that half (50%) of the data are less than this value.

-Finally, the third quartile Q₃ is the value such that 75% of the data are less than Q₃.

The interquartile range is then defined as the difference between the third quartile Q₃ and the first quartile Q₁ of the data:

Interquartile range = R_Q = Q₃ - Q₁

In this way, the value of R_Q is much less affected by extreme values. For this reason, it is advisable to use it when dealing with skewed distributions, such as the height samples described above, in which a single very tall or very short student sets the range.

- Calculation of quartiles

There are several ways to calculate them. Here we propose one, but in every case it is necessary to know the order number "N_o", which is the position that the respective quartile occupies in the ordered distribution.

That is, whether the term that corresponds to Q₁ is, for example, the second, the third or the fourth value of the distribution, and so on.

First quartile

N_o(Q₁) = (N + 1) / 4

Second quartile or median

N_o(Q₂) = (N + 1) / 2

Third quartile

N_o(Q₃) = 3 (N + 1) / 4

where N is the number of data.

The median is the value that is right in the middle of the distribution. If the number of data is odd there is no problem in finding it, but if it is even, then the two central values are averaged to become one.

Once the order number has been calculated, one of these three rules is followed:

-If the order number has no decimals, the value at that position in the ordered data is the quartile sought.

-If the order number is exactly halfway between two integers, the value at the position given by the integer part is averaged with the following value, and the result is the corresponding quartile.

-In any other case, the order number is rounded to the nearest integer, and the value at that position is the quartile.
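The three rules above can be sketched as a small Python function. The names order_number and quartile are hypothetical, as is the example data; this is one reading of the rules stated here, not a standard library routine.

```python
def order_number(n, k):
    """Order number N_o(Q_k) = k (n + 1) / 4 for quartile k = 1, 2 or 3."""
    return k * (n + 1) / 4


def quartile(sorted_data, k):
    """Quartile k of data already sorted in increasing order."""
    n = len(sorted_data)
    if n < 2:
        raise ValueError("need at least two values")
    pos = order_number(n, k)
    whole = int(pos)
    frac = pos - whole  # always 0, 0.25, 0.5 or 0.75, exact in floating point
    if frac == 0:
        # Rule 1: integer position (positions are 1-based, lists 0-based).
        return sorted_data[whole - 1]
    if frac == 0.5:
        # Rule 2: halfway, so average this value with the next one.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Rule 3: round the position to the nearest integer.
    return sorted_data[round(pos) - 1]


# Hypothetical, already sorted data with N = 6.
data = [1, 3, 5, 7, 9, 11]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 3 6.0 9
```

Here the order numbers are 1.75, 3.5 and 5.25, so Q₁ is the second value, Q₂ is the average of the third and fourth values, and Q₃ is the fifth value.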

Worked example

On a scale of 0 to 20, a group of 16 math I students earned the following marks (points) on a midterm exam:

16, 10, 12, 8, 9, 15, 18, 20, 9, 11, 1, 13, 17, 9, 10, 14

Find:

a) The range of the data.

b) The values of the quartiles Q₁ and Q₃.

c) The interquartile range.

Figure 2. Do the scores on this math test have that much variability? Source: Pixabay.

Solution a

The first step in finding the range is to sort the data in increasing or decreasing order. In increasing order, for example, we have:

1, 8, 9, 9, 9, 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20

Using the formula given at the beginning, R = x_max - x_min:

R = 20 points - 1 point = 19 points.

On a scale of 0 to 20 this is a large range, so by this measure the marks are widely dispersed.

Solution b

N = 16

N_o(Q₁) = (N + 1) / 4 = (16 + 1) / 4 = 17/4 = 4.25

The order number has decimals and is not halfway between two integers, so by the third rule it is rounded to 4. The value in the fourth position of the ordered data is 9 (the fifth value is also 9, so averaging the two would give the same result):

Q₁ = 9

Now we repeat the procedure to find Q ₃:

N_o(Q₃) = 3 (N + 1) / 4 = 3 (16 + 1) / 4 = 51/4 = 12.75

Again the order number has decimals, but since it is not halfway between two integers, it is rounded to 13. The quartile sought occupies the thirteenth position and is:

Q₃ = 16

Solution c

R_Q = Q₃ - Q₁ = 16 - 9 = 7 points.

As we can see, this is much smaller than the range calculated in part a), because the minimum score of 1 point lies far from the rest of the data.
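Since there are several ways to calculate quartiles, it can be instructive to compare this result with a software routine. The sketch below uses quantiles from Python's standard statistics module with method="exclusive", which places the quartiles at the same positions k (N + 1) / 4 but interpolates linearly between neighbouring values instead of rounding. This is a different convention from the one used in this example, so small differences are expected.

```python
from statistics import quantiles

# Marks from the worked example (the function sorts them internally).
marks = [16, 10, 12, 8, 9, 15, 18, 20, 9, 11, 1, 13, 17, 9, 10, 14]

q1, q2, q3 = quantiles(marks, n=4, method="exclusive")

print(q1, q2, q3)               # 9.0 11.5 15.75
print(q3 - q1)                  # interquartile range: 6.75
print(max(marks) - min(marks))  # range: 19
```

With interpolation, Q₃ falls three quarters of the way between the twelfth value (15) and the thirteenth (16), which gives 15.75 instead of 16 and an interquartile range of 6.75 instead of 7. Either way, the interquartile range is far smaller than the range of 19 points.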

