CH 1 Continued - Categorical Variables in Statistics


Continuing from the previous chapter, we will now look at how to represent categorical data in statistics with an example.

1000 ball bearings classified as:

  • conforming (910)
  • too thick (53)
  • too thin (37)

Here, the data is classified into three categories, which are mutually exclusive and exhaustive. The data is categorical because it is classified into categories and not measured.

Lets make a table of counts for the data.


Table of Counts

Classificationfrequencyf/n - relative frequency proportion
Conforming910.910
too thick53.053
too thin37.037

Bar Chart with MatPlotLib

Python
Output

This will create a bar chart with the proportion of each category, where the height of the bar represents the proportion of the category.


Variable Statistics and Boxplots

Now, lets look at another example

Chromium VS Nickel Simple Dataset

Chromium: 31, 1, 511, 2, 574, 496, 322, 424, 269, 140, 244, 252, 76, 108, 24, 38, 18, 43, 30, 191

Nickel: 23, 22, 55, 39, 283, 34, 159, 37, 61, 34, 163, 140, 32, 23, 54, 837, 64, 354, 376, 471

We are going to find:

  • Mean
  • Median
  • Standard Deviation
  • Quartiles

Lets find these with python.

This will yield the 1-Variable Statistics we would normally get from a Ti-84 Graphing Calculator. And lets include a graph to go along with it.

Python
Output