Notebook guide#

  • This guide covers the most important points about working with notebook files
  • The following topics are covered:
    • Notebook file content
    • Running notebook files
    • Printing results in notebook

Notebook file content#

  • A notebook file can contain cells of the following types:

    • Raw: raw, unformatted data (plain text)
    • Code: Python code (or code in another supported language) that is executed when the notebook is run
    • Markdown: usually used for documentation. These cells can contain everything that Markdown files (.md) support, including text paragraphs, lists, images and links
  • The cell types are shown in the image below

Notebook cell descriptions
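To make the three cell types concrete, here is a simplified sketch of how they appear inside a notebook file, which is stored as JSON. The cell contents are made up for illustration, and only the fields needed for a minimal notebook in the nbformat 4 layout are included.

```python
import json

# A minimal notebook structure with one cell of each type (contents are illustrative)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some *documentation*."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["a = 1 + 2\n", "a"]},
        {"cell_type": "raw", "metadata": {},
         "source": ["plain text, neither executed nor rendered"]},
    ],
}

# An .ipynb file is this structure serialized as JSON text
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back, each cell reports its own type
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)
```

Only code cells carry an execution count and outputs, which is why they are the only cells affected by the execution order.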

  • Notebook files usually use one or more libraries and load data from external sources
  • Libraries should be imported and data loaded only once, in a single code cell; after that they are available in the entire notebook
  • This is shown in the example below

Library and data loading in notebook files

  • Importing libraries and, especially, loading data many times in a notebook can also slow down the execution of individual cells and of the whole notebook!
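A minimal sketch of this layout is shown below. The comments mark where one cell would end and the next begin, and the inline CSV text is a hypothetical stand-in for an external data source.

```python
# --- Cell 1: import libraries and load data once ---
import io

import pandas as pd

# Hypothetical inline CSV used in place of an external file or URL
csv_text = "name,score\nAnna,4\nBen,5\n"
df = pd.read_csv(io.StringIO(csv_text))

# --- Cell 2 and later: reuse pd and df without importing or loading again ---
mean_score = df["score"].mean()
print(mean_score)
```

Because all cells share the same kernel, `pd` and `df` stay available in every later cell until the kernel is restarted.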

Running notebook files#

  • Notebook files are run from top to bottom, like Python files (.py)
  • This means that the cells are executed one by one
  • The example below shows the execution of five notebook cells

Cell execution in notebook file

  • As can be seen from the image:

    • Markdown cells (first and fourth cell) are rendered
    • Code cells (second, third and fifth cell) are executed in order, so their execution numbers run from 1 to 3
  • The main difference from a regular code file (for example a Python .py file) is that code cells can be executed individually, and thus the execution order may differ from the order of the cells

  • In addition, the cell order can be changed by dragging a cell with the mouse to a different position, as shown in the example image below

Cell order manipulation

  • The code cell at the end of the notebook has been moved to a new position between the second and third cells
  • This may cause problems, as shown in the example image below

Problems in cell execution

  • When the notebook is run with the Run All Cells operation without restarting the kernel, variables defined during earlier runs are still in the kernel's memory, so the cells may appear to work as they did in the original order
  • This may cause confusion, as the user may not notice that the new cell order is broken
  • If the notebook is instead run with the Restart Kernel and Run All Cells operation, the kernel state and the execution numbers are reset, and the notebook is run from top to bottom like any Python file, with new execution numbers generated
  • On the right side of the example image, execution halts in the third code cell because variable b, used in the formula (a = b + c), is not yet defined
Important: To make sure that the order of the code cells is correct and that the notebook is also readable for other people, it is good practice to run the notebook with the Restart Kernel and Run All Cells operation!
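The effect of a restart can be imitated outside a notebook. The sketch below is a simulation, not how Jupyter is implemented: each string stands in for one code cell, a dictionary plays the role of the kernel's memory, and the helper `run_all` is a hypothetical name introduced for this example. It reuses the formula a = b + c from the image.

```python
# Cells in their new order: the cell using b now comes before the cell defining it
cells_in_new_order = [
    "c = 2",
    "a = b + c",  # uses b before it is defined
    "b = 1",
]


def run_all(cells, namespace):
    """Run cells top to bottom in a shared namespace.

    Returns the index of the first cell that raises NameError, or None.
    """
    for index, source in enumerate(cells):
        try:
            exec(source, namespace)
        except NameError:
            return index
    return None


# Kernel not restarted: b is left over from an earlier run, so nothing fails
stale_kernel = {"b": 1}
failed_without_restart = run_all(cells_in_new_order, stale_kernel)

# Kernel restarted: memory is empty, so the second cell fails
fresh_kernel = {}
failed_after_restart = run_all(cells_in_new_order, fresh_kernel)

print(failed_without_restart)  # None
print(failed_after_restart)    # 1
```

The same cells succeed or fail depending only on what is left in memory, which is why a restart is the reliable way to check the cell order.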

Printing results in notebook#

  • This section covers code cells and how their results should be printed
  • If a variable is placed on the last line of a code cell, its value is printed (see the example below)

Printing in notebook files
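As a small sketch of this behavior, the code below represents a single code cell with made-up values. With Jupyter's default settings, only the expression on the last line is displayed automatically; anything earlier in the cell needs an explicit print().

```python
values = [3, 5, 8]
total = sum(values)

# Not on the last line, so print() is needed to show it
print("Values:", values)

# Last line of the cell: in a notebook, its value is shown as the cell output
total
```

When the same code is run as a regular .py file, the last line produces no output, so scripts need print() everywhere.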

  • When a variable containing a DataFrame is placed on the last line of a code cell, Jupyter automatically renders its contents as a table (see the example below)

Printing DataFrames in notebook files

  • By default, some rows and columns of larger DataFrames are hidden (see the example below)

Hidden rows and columns in DataFrames

  • As can be seen from the output, the DataFrame has 3168 rows and 21 columns
  • It is rarely necessary to print all rows, but if required, the row and column display limits can be adjusted with the following pandas settings:
pd.set_option('display.max_columns', value)
pd.set_option('display.max_rows', value)
  • For example, if None is used as the value, all rows and columns are displayed
  • It is usually good practice to print only the first 3-10 rows of a DataFrame unless more are needed
  • This keeps the notebook file more readable
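The sketch below puts these points together. The DataFrame is hypothetical stand-in data with the same shape as in the example output, and `pd.option_context` is used instead of `pd.set_option` (a substitution made here, not part of the guide above) so that the display limits are lifted only temporarily.

```python
import pandas as pd

# Hypothetical stand-in data: 3168 rows and 21 columns
df = pd.DataFrame({f"col_{i}": range(3168) for i in range(21)})
print(df.shape)  # (rows, columns)

# Lift the display limits only inside this block; the previous settings are restored afterwards
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    max_rows_inside = pd.get_option("display.max_rows")
    print(df.head(3))  # all 21 columns are eligible for display here

# Usually a short preview is enough
preview = df.head(5)
preview
```

Using `df.head(5)` on the last line of a cell keeps the output to five rows regardless of how large the DataFrame is.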