
Mode of Grouped Data

what you'll learn...

overview

This page explains how to find the mode of grouped data, with some examples.

The formula is derived, rather than only stated, so that students can understand how the mode is calculated.

recap

The mode of a data set is the value that occurs the most number of times in the data.

Consider the data: the number of pens carried by 10 students.

$1, 2, 1, 2, 2, 3, 2, 2, 3, 4$

The mode of the data is $2$.

To compute the mode, count the number of times each data value occurs.

$1$ occurs twice
$2$ occurs $5$ times
$3$ occurs twice
$4$ occurs once

From this, it is concluded that $2$ occurs the most often, and so $2$ is the mode of the given data.
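The counting step above can be sketched in a few lines of Python. This is a minimal illustration using the standard-library `Counter`; the variable names are chosen here for illustration only.

```python
from collections import Counter

# Number of pens carried by the 10 students (data from the text).
pens = [1, 2, 1, 2, 2, 3, 2, 2, 3, 4]

# Count how many times each value occurs.
counts = Counter(pens)

# most_common(1) returns the (value, count) pair with the highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(dict(counts))
print("mode:", mode_value, "occurs", mode_count, "times")
```

If two values tie for the highest count, this sketch reports only one of them, so a data set with several modes would need extra handling.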

Consider the data given in the table in frequency form. The mode is the data value with the highest frequency.

The mode is $2$.

mode of grouped data

Consider the grouped data given in the table. Let us see how to find the mode of this grouped data.

In grouped data, the class with the highest frequency is the modal class. There are three ways to report the mode of the data.

The modal class itself is reported. This is too broad a range to use in applications.

The class mark (midpoint) of the modal class is reported. This value is not accurate enough to use in applications.

The value at which the frequency distribution has its maximum within the class interval is calculated. This provides a good approximation to the mode of the data.

The first two are not used as the mode of the data. Finding the mode by the third method is explained in the following sections.
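The first two options can be read directly off a frequency table. The sketch below assumes a hypothetical three-class table; the class limits and frequencies are illustrative and are not taken from the table referenced above.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
table = [
    (85, 90, 3),
    (90, 95, 8),
    (95, 100, 7),
]

# Option 1: the modal class is the class with the highest frequency.
modal_lower, modal_upper, modal_freq = max(table, key=lambda row: row[2])

# Option 2: the class mark is the midpoint of the modal class.
class_mark = (modal_lower + modal_upper) / 2

print("modal class:", modal_lower, "to", modal_upper)
print("class mark:", class_mark)
```

Both results ignore the neighbouring frequencies ($3$ on the left, $7$ on the right), which is the information the third method uses.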

better estimate of mode of grouped data

Consider the histogram of grouped data.

The underlying distribution is a continuous curve. Grouping partitions the data into class intervals.

The mode cannot be computed using the values on the curve, as the only available information is the grouped data.

The mode has to be computed using the values of the histogram. Though we know that the distribution is a continuous curve, the values on the curve are not known. We only have the histogram values.

Consider the class partition and the underlying continuous frequency distribution given in the figure.
For simplicity, only three classes of the histogram are shown. In this example, the lower and upper class limits happen to be chosen such that the modal class is at the exact center of the curve.

Consider another class partition and the underlying continuous frequency distribution given in the figure.

Here, the modal class happens not to be at the center of the underlying curve. The objective of defining a formula for the mode is to compute the approximate position of the maximum of the underlying curve, which is the frequency distribution.

It is noted that the maximum is within the modal class.

It is also noted that the position of the maximum affects the frequency of the classes on either side of the modal class.

If the position of the maximum is towards the left, then the class on the left has higher frequency than the class on the right. (This is shown in the figure.)

If the position of the maximum is towards the right, then the class on the right has higher frequency than the class on the left.
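This effect can be checked numerically. The sketch below assumes, purely for illustration, that the underlying curve is a normal distribution; the peak positions, the spread `sigma`, and the class limits are hypothetical values, not taken from the figures.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal curve."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def class_shares(mu, sigma, l, h):
    """Share of the data in the classes [l-h, l], [l, l+h], [l+h, l+2h]."""
    edges = [l - h, l, l + h, l + 2 * h]
    return [
        normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
        for a, b in zip(edges, edges[1:])
    ]


l, h, sigma = 90.0, 5.0, 4.0  # assumed class limit, class width, spread

centered = class_shares(92.5, sigma, l, h)    # peak at the class center
left_peak = class_shares(91.0, sigma, l, h)   # peak towards the left
right_peak = class_shares(94.0, sigma, l, h)  # peak towards the right

for name, shares in [("centered", centered), ("left", left_peak), ("right", right_peak)]:
    print(name, [round(s, 3) for s in shares])
```

In each case the middle class stays the modal class, while the larger of the two neighbouring shares sits on the side the peak has moved towards.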

Consider a simple grouped data given in the figure.

This has three class intervals:
the first class is from $l - h$ to $l$
the second class is from $l$ to $l + h$
the third class is from $l + h$ to $l + 2h$

The underlying continuous distribution is visualized and shown in the figure.

The position of maximum on the curve is the mode. The position of maximum is to be computed based on the three classes.

Consider the grouped data given in the figure. The objective is to find the position of the maximum, which is taken as the mode of the grouped data.

Straight lines can be used to approximate the rate at which the frequency changes near the maximum. The following approximations are used to derive the mode:

the rates of change (the slopes of the lines) on the two sides of the maximum frequency (the peak point) are equal in magnitude.

the points $(l, p)$ and $(l + h, n)$ lie on either side of the maximum, where $p$ and $n$ are the frequencies of the classes before and after the modal class.

To work out the approximation, a peak position $(m, f)$ is assumed, where $f$ is the frequency of the modal class.

The $m$ in the point $(m, f)$ denotes the mode of the grouped data.


The two line segments have slopes of equal magnitude. Consider the line segment $\overline{AB}$: its slope equals the slope of the line segments on either side of the maximum.

The slope is worked out along this line:
change along the $y$-axis $= (f - p) + (f - n) = 2f - p - n$
change along the $x$-axis $= h$
slope $= (2f - p - n) / h$

$m$ is the mode of the data.

To find $m$, equate this slope to the slope of the line segment connecting $(l, p)$ and $(m, f)$.

$(f - p) / (m - l) = (2f - p - n) / h$

Solving for $m$:

$m = l + \frac{f - p}{2f - p - n} \times h$
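The result can be sanity-checked numerically: with $m$ computed from the formula, the rising slope from $(l, p)$ to $(m, f)$ and the falling slope from $(m, f)$ to $(l + h, n)$ should have the same magnitude. A minimal sketch, with a hypothetical helper name `grouped_mode` and sample values chosen for illustration:

```python
import math


def grouped_mode(l, h, f, p, n):
    """Estimate the mode as m = l + (f - p) / (2f - p - n) * h."""
    denominator = 2 * f - p - n
    if denominator == 0:
        # Happens when 2f = p + n; the estimate is then undefined.
        raise ValueError("2f - p - n is zero; the formula cannot be applied")
    return l + (f - p) / denominator * h


# Sample values: lower limit, class width, and the three frequencies.
l, h, f, p, n = 90, 5, 8, 3, 7

m = grouped_mode(l, h, f, p, n)

# Slope magnitudes on either side of the assumed peak (m, f).
rising_slope = (f - p) / (m - l)
falling_slope = (f - n) / (l + h - m)

print("mode estimate:", round(m, 2))
print("slopes:", round(rising_slope, 4), round(falling_slope, 4))
```

Both slopes also equal $(2f - p - n) / h$, the slope worked out in the derivation.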

The same formula can also be obtained using similar triangles, as illustrated in the figure.

Using the properties of similar triangles, the problem reduces to dividing the line segment $\overline{AB}$ in the ratio $(f - p) : (f - n)$.

Considering only the $x$-axis, the section formula gives $m = l + \frac{f - p}{(f - p) + (f - n)} \times \left((l + h) - l\right)$

$m = l + \frac{f - p}{2f - p - n} \times h$

This makes the formula easier to remember: the mode divides the modal class along the $x$-axis in the ratio of the frequency difference with the class on the left to the frequency difference with the class on the right.
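The section-formula view can be checked with exact fractions. In this sketch, `section_point` is a hypothetical helper written for illustration, and the sample frequencies are assumed values.

```python
from fractions import Fraction


def section_point(a, b, r1, r2):
    """Point dividing the segment from a to b internally in the ratio r1 : r2."""
    return a + Fraction(r1, r1 + r2) * (b - a)


# Sample values: lower limit, class width, and the three frequencies.
l, h, f, p, n = 90, 5, 8, 3, 7

# Divide the modal class [l, l + h] in the ratio (f - p) : (f - n).
m = section_point(l, l + h, f - p, f - n)

# The distances from m to the two class limits are in the same ratio.
left_part = m - l
right_part = (l + h) - m

print("mode estimate:", m, "=", float(m))
print("ratio of parts:", left_part / right_part)
```

A larger difference on the left, $f - p$, pushes the estimate towards the upper limit of the modal class, away from the lower-frequency neighbour.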

examples

Consider the given grouped data in the table.

The mode is given as $m = l + \frac{f - p}{2f - p - n} \times h$

$l = 90$
$h = 5$
$p = 5$
$n = 5$
$f = 8$

Mode
$= 90 + \frac{8 - 5}{16 - 5 - 5} \times 5$
$= 92.5$

Consider the given grouped data in the table.

The mode is given as $m = l + \frac{f - p}{2f - p - n} \times h$

$l = 90$
$h = 5$
$p = 3$
$n = 7$
$f = 8$

Mode
$= 90 + \frac{8 - 3}{16 - 3 - 7} \times 5$
$\approx 94.17$

summary

Modal Class: The class that has the highest frequency is the modal class.

Mode of Grouped Data: The mode of the grouped data is $m = l + \frac{f - p}{2f - p - n} \times h$
where
$l$ is the lower limit of the modal class
$f$ is the frequency of the modal class
$p$ is the frequency of the class just before the modal class
$n$ is the frequency of the class just after the modal class
$h$ is the width of the class interval

The formula is easy to remember: the mode divides the modal class along the $x$-axis in the ratio of the frequency differences, $(f - p) : (f - n)$.
