Data visualization with Pandas

  • In this section we dive into the data visualization options of the Pandas library
  • Visualization gives a better understanding of the underlying data, especially for those who don't actively work with data
  • Another benefit is that deviations are easier to spot in a chart than in data stored in a table
  • Pandas visualization is built on the Matplotlib library
Important: the Matplotlib package must be installed to use plotting with Pandas. In Anaconda, it is preinstalled in the base environment.

Bar chart

  • Below is an example where a DataFrame containing temperature readings from six sensors is visualized
  • The plot() method is used for visualization
  • The following arguments are given:
    • kind: defines the chart type (here "bar")
    • x: sets the column used for the x-axis labels
    • figsize: sets the width and height of the chart area (in inches)
import pandas as pd

sensordata = pd.DataFrame({
    "ID": ["S1","S2","S3","S4","S5","S6"],
    "temp": [21.2,20.3,20.9,21.3,21.0,20.8]
})

  • There is also a horizontal version of the bar chart, selected with kind="barh"
  • Below is a visualization of the same data as above, this time with horizontal bars
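The two bar chart variants described above can be sketched as follows. The figsize value of (8, 4) is an assumption chosen for illustration, since the original figure size is not given here.

```python
import pandas as pd

# Temperature readings from six sensors (same data as above)
sensordata = pd.DataFrame({
    "ID": ["S1", "S2", "S3", "S4", "S5", "S6"],
    "temp": [21.2, 20.3, 20.9, 21.3, 21.0, 20.8],
})

# Vertical bars: sensor IDs on the x-axis, one bar per temperature value.
# figsize=(8, 4) is an illustrative choice, not a required value.
ax_bar = sensordata.plot(kind="bar", x="ID", figsize=(8, 4))

# Horizontal bars: the same call, only the kind argument changes.
ax_barh = sensordata.plot(kind="barh", x="ID", figsize=(8, 4))
```

In a Jupyter notebook the charts render automatically; in a plain script, calling matplotlib.pyplot.show() afterwards displays them.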
Line chart

  • The line chart is the default style of the plot() method when the kind argument is not provided
  • Below is an example where data is loaded from the stocks.json file
  • In this example the y-axis is also defined, and four value types are presented for the stocks (open, close, high and low)
stockdata = pd.read_json("stocks.json")
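Because the contents of stocks.json are not reproduced here, the sketch below loads a small made-up JSON string instead. The column names (date, open, close, high, low), the values, and the figsize are all assumptions for illustration; the real file may be structured differently.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for stocks.json: three days of made-up prices.
json_text = """[
  {"date": "2024-01-02", "open": 100.0, "close": 101.5, "high": 102.0, "low": 99.5},
  {"date": "2024-01-03", "open": 101.5, "close": 100.8, "high": 102.3, "low": 100.1},
  {"date": "2024-01-04", "open": 100.8, "close": 103.2, "high": 103.9, "low": 100.6}
]"""

# read_json accepts a file path or a file-like object such as StringIO.
stockdata = pd.read_json(StringIO(json_text), orient="records")

# kind is omitted, so plot() draws a line chart; y lists the four series.
ax = stockdata.plot(x="date", y=["open", "close", "high", "low"], figsize=(8, 4))
```

Passing a list to y draws one line per column on the same axes, which makes the four price series directly comparable.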
Pie chart

  • A pie chart is a useful visualization style when you want to present the numerical proportions of items
  • Each slice of the pie represents the proportion of a single item
  • Below is an example where the proportions of eight items are illustrated
  • A list of labels containing the percentage for each item is prepared as input
  • The item names are shown as a legend by calling the legend() method on the result of plot()
item_collection = pd.DataFrame({
    "name": ["item 1","item 2","item 3","item 4","item 5","item 6","item 7","item 8"],
    "temp": [120,132,29,23,146,92,111,18]
})

label_list = [str(round(100 * (i / sum(item_collection["temp"])),1)) + ' %' for i in item_collection["temp"]]
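A minimal sketch of how the percentage labels and the legend could be combined into a pie chart is given next. The figsize and the legend placement (loc, bbox_to_anchor) are assumptions chosen so the legend does not cover the pie.

```python
import pandas as pd

item_collection = pd.DataFrame({
    "name": [f"item {n}" for n in range(1, 9)],
    "temp": [120, 132, 29, 23, 146, 92, 111, 18],
})

# Percentage share of each item, used as the slice labels
total = item_collection["temp"].sum()
label_list = [f"{round(100 * value / total, 1)} %" for value in item_collection["temp"]]

# Draw the pie from the "temp" column; labels puts a percentage on each slice.
ax = item_collection.plot(kind="pie", y="temp", labels=label_list, figsize=(6, 6))

# Replace the default legend entries with the item names.
# The placement to the right of the pie is an illustrative choice.
ax.legend(item_collection["name"].tolist(), loc="center left", bbox_to_anchor=(1.0, 0.5))
```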
Area plot

  • An area chart combines features of the line chart and the bar chart: it shows how numeric values progress and how they compare with other groups
  • An area chart can be drawn either in overlapped or in stacked mode
  • Below is an example of the stacked mode (stacked=True is the default)
areadata = pd.DataFrame([[13,21,23],[12,17,20],[14,16,19]], columns=["item 1","item 2","item 3"])
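The stacked area chart for this data can be sketched as follows; the figsize is an assumed value.

```python
import pandas as pd

areadata = pd.DataFrame(
    [[13, 21, 23], [12, 17, 20], [14, 16, 19]],
    columns=["item 1", "item 2", "item 3"],
)

# stacked=True is the default, so each column is drawn on top of the previous one
# and the upper edge of the chart shows the row totals.
ax = areadata.plot(kind="area", figsize=(8, 4))
```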

  • An area chart can also be drawn in overlapped mode by setting the stacked parameter to False
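For comparison, a sketch of the overlapped variant on the same data; again the figsize is an assumption.

```python
import pandas as pd

areadata = pd.DataFrame(
    [[13, 21, 23], [12, 17, 20], [14, 16, 19]],
    columns=["item 1", "item 2", "item 3"],
)

# stacked=False draws every column from the zero baseline, so the areas overlap.
# pandas makes unstacked areas semi-transparent by default to keep them readable.
ax = areadata.plot(kind="area", stacked=False, figsize=(8, 4))
```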
Scatter chart

  • In a scatter chart, data is presented as dots positioned according to two numeric variables
  • A scatter chart is well suited for examining relationships between variables
  • The x and y arguments define which columns are used for the x- and y-axis
dataset = pd.DataFrame({
    "acceleration": [2.2,1.3,1.8,1.5,3.2,2.8,1.3,2.2,2.0,1.9],
    "weight": [100,110,108,102,106,111,109,100,101,107],
    "name": ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10"]
})

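A basic scatter chart of this dataset might look like the sketch below. Which column goes on which axis is not stated here, so placing weight on the x-axis and acceleration on the y-axis is an assumption, as is the figsize.

```python
import pandas as pd

dataset = pd.DataFrame({
    "acceleration": [2.2, 1.3, 1.8, 1.5, 3.2, 2.8, 1.3, 2.2, 2.0, 1.9],
    "weight": [100, 110, 108, 102, 106, 111, 109, 100, 101, 107],
    "name": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10"],
})

# One dot per row; the axis assignment below is an illustrative assumption.
ax = dataset.plot(kind="scatter", x="weight", y="acceleration", figsize=(8, 4))
```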
  • A scatter chart can be made more informative by adding a colormap when a third variable is available in the data
  • Available colormap names are listed in the Matplotlib colormap reference
  • Let's pass the weight column with the c parameter and bind it to the colormap
dataset = pd.DataFrame({
    "acceleration": [2.2,1.3,1.8,1.5,3.2,2.8,1.3,2.2,2.0,1.9],
    "weight": [100,110,108,102,106,111,109,100,101,107],
    "name": ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10"]
})

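A sketch of binding the weight column to a colormap follows. The colormap name "viridis", the axis assignment, and the figsize are assumptions; any Matplotlib colormap name could be used instead.

```python
import pandas as pd

dataset = pd.DataFrame({
    "acceleration": [2.2, 1.3, 1.8, 1.5, 3.2, 2.8, 1.3, 2.2, 2.0, 1.9],
    "weight": [100, 110, 108, 102, 106, 111, 109, 100, 101, 107],
    "name": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10"],
})

# c names the column whose values set each dot's color;
# colormap maps those values onto a color scale ("viridis" is an assumed choice).
ax = dataset.plot(
    kind="scatter",
    x="acceleration",
    y="weight",
    c="weight",
    colormap="viridis",
    figsize=(8, 4),
)
```

pandas adds a colorbar next to the chart, so the color of each dot can be read back as a weight value.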
Outlier value spotting with scatter chart

  • As presented in the numpy lecture, outliers are values that lie far from the rest of the data, for example several standard deviations away from the mean
  • A scatter chart is useful for spotting outlier values
dataset = pd.DataFrame({
    "temp": [22.2,21.3,21.8,40.1,21.5,23.2,22.8,21.3,22.2,22.0,21.9,1.5,23.1,22.2,21.8,20.9,22.8],
    "hum": [100,110,108,130,102,106,111,109,100,101,107,101,102,108,107,104,106]
})

  • As can be seen from the scatter chart, two points lie far from the others: the readings with temperatures 40.1 and 1.5
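The visual check can be backed by a simple numeric one. The sketch below draws the scatter chart and then flags temperatures that lie more than two standard deviations from the mean. The threshold of 2, the axis assignment, and the figsize are assumptions for illustration, not a rule given in this material.

```python
import pandas as pd

dataset = pd.DataFrame({
    "temp": [22.2, 21.3, 21.8, 40.1, 21.5, 23.2, 22.8, 21.3, 22.2,
             22.0, 21.9, 1.5, 23.1, 22.2, 21.8, 20.9, 22.8],
    "hum": [100, 110, 108, 130, 102, 106, 111, 109, 100,
            101, 107, 101, 102, 108, 107, 104, 106],
})

# Temperature on the x-axis and humidity on the y-axis (assumed assignment).
ax = dataset.plot(kind="scatter", x="temp", y="hum", figsize=(8, 4))

# Distance of each temperature from the mean, measured in standard deviations.
z_scores = (dataset["temp"] - dataset["temp"].mean()) / dataset["temp"].std()

# Assumed rule of thumb: more than 2 standard deviations counts as an outlier.
outliers = dataset[z_scores.abs() > 2]
print(outliers)
```

With this data the filter selects the same two rows that stand out in the chart, those with temperatures 40.1 and 1.5.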