Box Plots
A common method for presenting data is with box plots (Figure 9). At first glance these plots can be intimidating, but once you know what each part of the box plot represents you can tell a lot about the distribution of the data. These plots consist of three components: 1) the box, which encompasses the middle 50% of the data; 2) the horizontal line within the box, which represents the median value; and 3) the vertical lines extending above and below the box, which indicate the maximum and minimum values, respectively. The location of these components relative to each other illustrates how the data are distributed. We will now look at the four examples presented in Figure 9, describe the average, median, minimum, and maximum values, and investigate how the data influence the box plot.


Figure 9. Examples of Box Plots
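
The three components described above can be worked out directly from a list of values. The short Python sketch below is only an illustration and is not part of the LMVP materials. It assumes the bottom and top of the box are the medians of the lower and upper halves of the sorted data, which is one common convention among several, and the function name box_plot_summary is made up for this example.

```python
from statistics import median

def box_plot_summary(values):
    """Return the five numbers a basic box plot draws.

    Assumption: the box edges (q1 and q3) are the medians of the
    lower and upper halves of the sorted data. Other quartile
    conventions exist and can give slightly different box edges.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # values below the middle of the data
    upper = data[-half:]   # values above the middle of the data
    return {
        "minimum": data[0],      # end of the lower vertical line
        "q1": median(lower),     # bottom of the box
        "median": median(data),  # horizontal line inside the box
        "q3": median(upper),     # top of the box
        "maximum": data[-1],     # end of the upper vertical line
    }

# Values from Example 1
print(box_plot_summary([2, 3, 4, 5, 6, 7, 8, 9]))
```

Under this assumption, the box for Example 1 runs from 3.5 to 7.5 with the median line at 5.5, and the vertical lines end at 2 and 9.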

Example 1 is what is known as an even, or symmetric, distribution (Figure 9). The individual values are: 2, 3, 4, 5, 6, 7, 8 and 9. The average of these values is 5.5, which also happens to be the median value (remember that with an even number of values the median equals the average of the two middle values, in this case 5 and 6). The minimum and maximum values are 2 and 9 respectively, giving us a range of 7. This is considered an even distribution because the mean and median are located in the middle of the range. In other words, the minimum and maximum values are equal distances from the median value. The box plot for these data has the median line in the middle of the box, and the minimum and maximum lines extend equal distances from the box.

Example 2 contains the following values: 2, 3, 4, 5, 6, 7, 10 and 13 (Figure 9). The average equals 6.25 and the median is 5.5. The minimum and maximum values are 2 and 13, giving us a range of 11. An average that is larger than the median suggests that the data are not balanced around the median. Review of the data shows that the two high values (10 and 13) deviate from the median more than the two low values (2 and 3). Data like these are referred to as skewed. When we compare the box plot for Example 2 to the box plot for Example 1, we see two indications that the data are skewed: 1) the vertical line identifying the maximum is longer than the minimum line, and 2) the box extends farther above the median line than it does below.
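
The arithmetic for Examples 1 and 2 can be checked with a few lines of Python. This is a minimal sketch using the standard statistics module. The helper name describe is invented for illustration.

```python
from statistics import mean, median

example_1 = [2, 3, 4, 5, 6, 7, 8, 9]
example_2 = [2, 3, 4, 5, 6, 7, 10, 13]

def describe(values):
    """Return the average, median, minimum, maximum, and range."""
    return {
        "average": mean(values),
        "median": median(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }

for name, values in [("Example 1", example_1), ("Example 2", example_2)]:
    stats = describe(values)
    # A gap of zero means the average and median agree; a positive
    # gap means high values are pulling the average upward.
    gap = stats["average"] - stats["median"]
    print(name, stats, "average minus median:", gap)
```

For Example 1 the gap is 0, and for Example 2 it is 0.75, which matches the skew toward high values described above.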
     
In Example 3 the values are 2, 3, 4, 4.25, 4.75, 7, 8 and 9 (Figure 9). The average is 5.25 and the median is 4.5. The range is 7, with minimum and maximum values of 2 and 9 respectively. We again have skewed data, as the median and average are not equal. This time, instead of extreme high values, the skewing is caused by the values of 4, 4.25 and 4.75 being clumped close together. Comparison of this box plot to Example 1 shows that the plots are the same with the exception of the median line. In Example 3 we are tipped off to the skewness in the data by the fact that the part of the box above the median line is larger than the part below.

Example 4 consists of the following values: 2, 3, 3.25, 3.50, 3.75, 4, 5 and 9 (Figure 9). The average for these data equals 4.19 and the median is 3.625 (the average of the two middle values, 3.50 and 3.75). The minimum, maximum and range are the same as seen in Examples 1 and 3. Again we have skewed data caused by a clumping of low values (similar to Example 3). This time the number of values clumped together is greater, causing the box to be smaller. Clues that the data are skewed are that the median line is not in the center of the box and that the maximum line extends farther away from the box than the minimum line.
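
The visual clues in Examples 3 and 4 can also be expressed as numbers. The sketch below is illustrative only. It assumes the box edges are the medians of the lower and upper halves of the sorted data (the source does not state how the box in Figure 9 was drawn), and the function name box_shape is hypothetical.

```python
from statistics import median

def box_shape(values):
    """Measure the parts of a box plot that reveal skew.

    Assumption: box edges are the medians of the lower and upper
    halves of the sorted data; other conventions may differ slightly.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # bottom of the box
    med = median(data)          # median line
    q3 = median(data[-half:])   # top of the box
    return {
        "box_below_median": med - q1,
        "box_above_median": q3 - med,
        "minimum_line": q1 - data[0],
        "maximum_line": data[-1] - q3,
    }

example_3 = [2, 3, 4, 4.25, 4.75, 7, 8, 9]
example_4 = [2, 3, 3.25, 3.50, 3.75, 4, 5, 9]

print("Example 3:", box_shape(example_3))
print("Example 4:", box_shape(example_4))
```

Under this assumption, Example 3 has equal minimum and maximum lines (1.5 each) but more box above the median (3.0) than below (1.0). Example 4 has a maximum line of 4.5 against a minimum line of 1.125, and its median sits closer to the bottom of the box (0.5 below, 0.875 above).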

For more about Box Plots, try this page:
http://www.lmvp.org/introduction/understanding.htm

 
©The Lakes of Missouri Volunteer Program 2014
The Lakes of Missouri Volunteer Program is operated by employees of the University of Missouri