Table of Contents

Introduction

Tool Use

Boxplot example

CADStat: Statistical Tools for Causal Analysis

Boxplot

Introduction

Boxplots are an efficient graphical method for representing the distribution of a set of data. Usually, the top and bottom of the box mark the 75th and 25th percentiles of the data, respectively, so the height of the box is the interquartile range (IQR). The horizontal line inside the box marks the 50th percentile (the median). Vertical lines (called whiskers) extend from each end of the box to the most extreme data point that lies no more than 1.5 times the IQR from the box. Data points that lie farther than 1.5 times the IQR from the box are plotted as individual points.
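The box and whisker rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CADStat's own code: the function name boxplot_stats is hypothetical, and the quartiles use the standard library's "inclusive" interpolation, which is one of several common conventions, so the values may differ slightly from those that CADStat draws.

```python
from statistics import quantiles


def boxplot_stats(values, whisker=1.5):
    """Return the summary values that a conventional boxplot displays.

    Hypothetical helper for illustration; not part of CADStat.
    """
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics
    # (assumed convention; other conventions give slightly different values).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Everything beyond the fences is drawn as an individual point.
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats["outliers"])  # [50]
```

In this made-up data set the value 50 lies more than 1.5 times the IQR above the box, so it would be plotted as a point, and the upper whisker would stop at 9.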

Tool Use

Select Graph -> Boxplot from the menus. A dialog box will open. Select the data set of interest from the pull-down menu, or browse for a tab-delimited text file.

Select the variable that you wish to plot from the Result pull-down menu.

By default, the module will produce two plots: one in the original units of Result, and one in log-transformed units.
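As a rough sketch of what a log-transformed view involves, the snippet below converts a list of values to logarithms before they would be plotted. The base-10 logarithm and the choice to drop non-positive values are assumptions made for this illustration; this page does not state which base CADStat uses or how it treats zeros and negative values. The function name log_transform is hypothetical.

```python
import math


def log_transform(values):
    """Return (log10 of the positive values, number of values dropped).

    Assumption: base 10, and non-positive values are dropped because
    their logarithm is undefined. CADStat's behavior may differ.
    """
    kept = [math.log10(v) for v in values if v > 0]
    return kept, len(values) - len(kept)


# Made-up values spanning several orders of magnitude, plus one zero.
logged, dropped = log_transform([1, 10, 100, 0])
print(logged)
print(dropped)
```

Values that span several orders of magnitude are compressed onto an evenly spaced scale, which is why a strongly right-skewed variable is often easier to read in the log-transformed plot.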

Text indicating sample size of the data set is included in the plot if Sample Sizes is selected.

You can alter the labels for the plot by modifying Plot Title and Result-Axis.

By default, the labels on the x-axis are printed horizontally, but can be changed to vertical with Rotate X Axis Labels.

Boxplot example

Select Graph -> Boxplot. For the Active Dataset, select mergedData (consult the help page on Loading and merging data to load this example data set).

Screenshot of the boxplot dialog page with the file mergedData.txt, generated previously in the data merging tutorial, selected as the Active Data Set. Under 'Variables', 'Result' has been set to 'sed.log'. The plot type has been set to 'Original'. Grouping has been enabled, and the data are grouped by the 'REF' factor. The plot title has been set to 'Oregon sediment' and the Result-Axis label has been set to 'Percent sand/fines'.

For this example, we have chosen to plot the variable sed.log in its Original units, and we have grouped the data by the variable REF.

Here is the resulting plot, comparing the distributions of sediment in test and reference sites.

Boxplot of Oregon sediment.
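The grouping step in this example can be approximated outside CADStat. The sketch below reads tab-delimited text, splits one column by the levels of a grouping column, and reports the sample size and median of each group, which correspond to the Sample Sizes annotation and the middle line of each box. The column names REF and sed.log come from the example above; the rows, the REF codes "Ref" and "Test", the "NA" missing-value marker, and the function name group_values are all assumptions made for illustration and are not taken from mergedData.txt.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up rows for illustration only; NOT values from mergedData.txt.
SAMPLE = (
    "REF\tsed.log\n"
    "Ref\t0.8\n"
    "Ref\t1.1\n"
    "Ref\t1.0\n"
    "Test\t1.6\n"
    "Test\tNA\n"
    "Test\t1.9\n"
)


def group_values(handle, group_col="REF", value_col="sed.log"):
    """Collect the numeric values of value_col for each level of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        cell = row[value_col].strip()
        # Skip empty cells and the assumed missing-value marker "NA".
        if cell and cell != "NA":
            groups[row[group_col]].append(float(cell))
    return dict(groups)


groups = group_values(io.StringIO(SAMPLE))
for name in sorted(groups):
    vals = groups[name]
    print(f"{name}: n={len(vals)}, median={median(vals)}")
```

To try this on a real file, an open file object can be passed in place of the in-memory sample, for example open("mergedData.txt", newline=""), provided the file has a header row with these column names.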