Data sources

  • In this lecture we examine different data sources and how to read the datasets they provide, in their different formats, with Python
  • Datasets rarely share one common data format, so it is essential to be familiar with the most common ones

CSV

  • CSV (Comma Separated Values) is a file type where data is stored in plain text
  • The data is tabular, and columns are usually separated with commas (,)
  • A CSV file can also use other separators, such as tab (\t), colon (:) or semicolon (;)
  • Later in this course we are going to use Pandas for data manipulation
  • Pandas can read CSV files itself, so we can use either Pandas or Python's built-in csv library
  • Both approaches are presented below
# Method 1: Using read_csv method in Pandas
import pandas as pd

# Read CSV file containing car data
cars_csv_pandas = pd.read_csv("cars.csv",delimiter=";")
cars_csv_pandas
    Car                               MPG     Cylinders  Displacement  Horsepower  Weight  Acceleration  Model  Origin
 0  STRING                            DOUBLE  INT        DOUBLE        DOUBLE      DOUBLE  DOUBLE        INT    CAT
 1  Chevrolet Chevelle Malibu         18.0    8          307.0         130.0       3504.   12.0          70     US
 2  Buick Skylark 320                 15.0    8          350.0         165.0       3693.   11.5          70     US
 3  Plymouth Satellite                18.0    8          318.0         150.0       3436.   11.0          70     US
 4  AMC Rebel SST                     16.0    8          304.0         150.0       3433.   12.0          70     US
 5  Ford Torino                       17.0    8          302.0         140.0       3449.   10.5          70     US
 6  Ford Galaxie 500                  15.0    8          429.0         198.0       4341.   10.0          70     US
 7  Chevrolet Impala                  14.0    8          454.0         220.0       4354.   9.0           70     US
 8  Plymouth Fury iii                 14.0    8          440.0         215.0       4312.   8.5           70     US
 9  Pontiac Catalina                  14.0    8          455.0         225.0       4425.   10.0          70     US
10  AMC Ambassador DPL                15.0    8          390.0         190.0       3850.   8.5           70     US
11  Citroen DS-21 Pallas              0       4          133.0         115.0       3090.   17.5          70     Europe
12  Chevrolet Chevelle Concours (sw)  0       8          350.0         165.0       4142.   11.5          70     US
13  Ford Torino (sw)                  0       8          351.0         153.0       4034.   11.0          70     US
# Method 2: Using csv library
import csv

with open("cars.csv","r") as f:
    cars_csv = csv.reader(f)
    for row in cars_csv:
        print(row)
['Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin']
['STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT']
['Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US']
['Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US']
['Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US']
['AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US']
['Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US']
['Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US']
['Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US']
['Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US']
['Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US']
['AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US']
['Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe']
['Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US']
['Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US']

  • As the two examples above show, the two approaches produce different output
  • We will use Pandas later in this course, when we convert the data into a dataframe
  • For now, let's modify the data read with the csv library into a format that is easier to read
# Modify csv library example output
with open("cars.csv","r") as f:
    cars_csv = csv.reader(f)
    car_data = []
    for row in cars_csv:
        # Each row is inside a list as a string (one element in each list) so we will access it with [0]
        # In addition, each row will be converted from a string datatype into a list using split() method
        element = row[0].split(";")
        # After splitting, each row will be appended to car_data list
        car_data.append(element)
    print(car_data)
[['Car', 'MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model', 'Origin'], ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT'], ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US'], ['Buick Skylark 320', '15.0', '8', '350.0', '165.0', '3693.', '11.5', '70', 'US'], ['Plymouth Satellite', '18.0', '8', '318.0', '150.0', '3436.', '11.0', '70', 'US'], ['AMC Rebel SST', '16.0', '8', '304.0', '150.0', '3433.', '12.0', '70', 'US'], ['Ford Torino', '17.0', '8', '302.0', '140.0', '3449.', '10.5', '70', 'US'], ['Ford Galaxie 500', '15.0', '8', '429.0', '198.0', '4341.', '10.0', '70', 'US'], ['Chevrolet Impala', '14.0', '8', '454.0', '220.0', '4354.', '9.0', '70', 'US'], ['Plymouth Fury iii', '14.0', '8', '440.0', '215.0', '4312.', '8.5', '70', 'US'], ['Pontiac Catalina', '14.0', '8', '455.0', '225.0', '4425.', '10.0', '70', 'US'], ['AMC Ambassador DPL', '15.0', '8', '390.0', '190.0', '3850.', '8.5', '70', 'US'], ['Citroen DS-21 Pallas', '0', '4', '133.0', '115.0', '3090.', '17.5', '70', 'Europe'], ['Chevrolet Chevelle Concours (sw)', '0', '8', '350.0', '165.0', '4142.', '11.5', '70', 'US'], ['Ford Torino (sw)', '0', '8', '351.0', '153.0', '4034.', '11.0', '70', 'US']]
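  • The manual split() step can also be left to the csv library itself by passing it the separator
  • The sketch below assumes the same semicolon-separated layout as cars.csv, but reads three rows from an in-memory string (sample_text is a stand-in for the file, not part of the original example) so that it runs on its own

```python
import csv
import io

# Stand-in for cars.csv: the header row and the first two cars from the output above
sample_text = (
    "Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n"
    "Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n"
    "Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n"
)

# Passing delimiter=";" makes csv.reader split each row into separate fields
reader = csv.reader(io.StringIO(sample_text), delimiter=";")
rows = list(reader)

print(rows[0])
print(rows[1][0], rows[1][1])
```

  • With the real file, the same delimiter argument can be given to csv.reader(f, delimiter=";") inside the with block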

  • The data is now easier to access, since each field can be referred to by its list index
  • This collection of car data can now be turned into dictionaries as shown below
# First element in the car data collection contains titles which will be used as keys in dictionary
titles = car_data[0]

# Second element contains the information of datatypes so it will be left out from the actual car data
cars = car_data[2:]

# Now we can create a set of dictionaries which we will store into a new list
car_collection = []
# Outer loop will go through sublists which each contains the data for one car
for car in cars:
    # Empty object will be created for each iteration
    obj = {}
    # Inner loop will go through index numbers from 0 to 8 (same amount for titles and cars)
    for j in range(len(car)):
        # Create key-value pairs for the object (match index numbers in titles and cars)
        obj[titles[j]] = car[j]
    # Insert created object to collection
    car_collection.append(obj)
print(car_collection)
[{'Car': 'Chevrolet Chevelle Malibu', 'MPG': '18.0', 'Cylinders': '8', 'Displacement': '307.0', 'Horsepower': '130.0', 'Weight': '3504.', 'Acceleration': '12.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'Buick Skylark 320', 'MPG': '15.0', 'Cylinders': '8', 'Displacement': '350.0', 'Horsepower': '165.0', 'Weight': '3693.', 'Acceleration': '11.5', 'Model': '70', 'Origin': 'US'}, {'Car': 'Plymouth Satellite', 'MPG': '18.0', 'Cylinders': '8', 'Displacement': '318.0', 'Horsepower': '150.0', 'Weight': '3436.', 'Acceleration': '11.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'AMC Rebel SST', 'MPG': '16.0', 'Cylinders': '8', 'Displacement': '304.0', 'Horsepower': '150.0', 'Weight': '3433.', 'Acceleration': '12.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'Ford Torino', 'MPG': '17.0', 'Cylinders': '8', 'Displacement': '302.0', 'Horsepower': '140.0', 'Weight': '3449.', 'Acceleration': '10.5', 'Model': '70', 'Origin': 'US'}, {'Car': 'Ford Galaxie 500', 'MPG': '15.0', 'Cylinders': '8', 'Displacement': '429.0', 'Horsepower': '198.0', 'Weight': '4341.', 'Acceleration': '10.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'Chevrolet Impala', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '454.0', 'Horsepower': '220.0', 'Weight': '4354.', 'Acceleration': '9.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'Plymouth Fury iii', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '440.0', 'Horsepower': '215.0', 'Weight': '4312.', 'Acceleration': '8.5', 'Model': '70', 'Origin': 'US'}, {'Car': 'Pontiac Catalina', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '455.0', 'Horsepower': '225.0', 'Weight': '4425.', 'Acceleration': '10.0', 'Model': '70', 'Origin': 'US'}, {'Car': 'AMC Ambassador DPL', 'MPG': '15.0', 'Cylinders': '8', 'Displacement': '390.0', 'Horsepower': '190.0', 'Weight': '3850.', 'Acceleration': '8.5', 'Model': '70', 'Origin': 'US'}, {'Car': 'Citroen DS-21 Pallas', 'MPG': '0', 'Cylinders': '4', 'Displacement': '133.0', 'Horsepower': '115.0', 'Weight': '3090.', 'Acceleration': '17.5', 
'Model': '70', 'Origin': 'Europe'}, {'Car': 'Chevrolet Chevelle Concours (sw)', 'MPG': '0', 'Cylinders': '8', 'Displacement': '350.0', 'Horsepower': '165.0', 'Weight': '4142.', 'Acceleration': '11.5', 'Model': '70', 'Origin': 'US'}, {'Car': 'Ford Torino (sw)', 'MPG': '0', 'Cylinders': '8', 'Displacement': '351.0', 'Horsepower': '153.0', 'Weight': '4034.', 'Acceleration': '11.0', 'Model': '70', 'Origin': 'US'}]
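  • The nested loop above pairs each title with the value at the same index
  • As a shorter alternative sketch, zip() does the same pairing and dict() turns the pairs into a dictionary
  • The two sample rows are copied from the output above so that the block runs without cars.csv

```python
# Titles and two car rows copied from the car_data output above
titles = ["Car", "MPG", "Cylinders", "Displacement", "Horsepower",
          "Weight", "Acceleration", "Model", "Origin"]
cars = [
    ["Chevrolet Chevelle Malibu", "18.0", "8", "307.0", "130.0", "3504.", "12.0", "70", "US"],
    ["Buick Skylark 320", "15.0", "8", "350.0", "165.0", "3693.", "11.5", "70", "US"],
]

# zip() pairs each title with the value at the same position in the row
car_collection = [dict(zip(titles, car)) for car in cars]

print(car_collection[0]["Car"])
print(car_collection[1]["Horsepower"])
```

  • The values are still strings here, exactly as in the loop-based version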

  • Now we can print the information of a single car as shown in the example below
print(car_collection[10])
{'Car': 'Citroen DS-21 Pallas', 'MPG': '0', 'Cylinders': '4', 'Displacement': '133.0', 'Horsepower': '115.0', 'Weight': '3090.', 'Acceleration': '17.5', 'Model': '70', 'Origin': 'Europe'}

  • Another example would be to show all cars with acceleration value 10 or less
# Show cars with acceleration value 10 or less
for i in car_collection:
    if float(i["Acceleration"]) <= 10:
        print(i)
{'Car': 'Ford Galaxie 500', 'MPG': '15.0', 'Cylinders': '8', 'Displacement': '429.0', 'Horsepower': '198.0', 'Weight': '4341.', 'Acceleration': '10.0', 'Model': '70', 'Origin': 'US'}
{'Car': 'Chevrolet Impala', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '454.0', 'Horsepower': '220.0', 'Weight': '4354.', 'Acceleration': '9.0', 'Model': '70', 'Origin': 'US'}
{'Car': 'Plymouth Fury iii', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '440.0', 'Horsepower': '215.0', 'Weight': '4312.', 'Acceleration': '8.5', 'Model': '70', 'Origin': 'US'}
{'Car': 'Pontiac Catalina', 'MPG': '14.0', 'Cylinders': '8', 'Displacement': '455.0', 'Horsepower': '225.0', 'Weight': '4425.', 'Acceleration': '10.0', 'Model': '70', 'Origin': 'US'}
{'Car': 'AMC Ambassador DPL', 'MPG': '15.0', 'Cylinders': '8', 'Displacement': '390.0', 'Horsepower': '190.0', 'Weight': '3850.', 'Acceleration': '8.5', 'Model': '70', 'Origin': 'US'}
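  • The filter above has to call float() because every value was read as a string
  • One possible way to convert the values once, sketched below, is to use the datatype row (STRING, DOUBLE, INT, CAT) that was left out earlier
  • The converters mapping and the convert_row helper are illustrative names and assumptions, not part of the original example

```python
# Titles and datatype row copied from the CSV output above
titles = ["Car", "MPG", "Cylinders", "Displacement", "Horsepower",
          "Weight", "Acceleration", "Model", "Origin"]
datatypes = ["STRING", "DOUBLE", "INT", "DOUBLE", "DOUBLE",
             "DOUBLE", "DOUBLE", "INT", "CAT"]

# Assumed mapping from the type names used in the file to Python types
converters = {"STRING": str, "DOUBLE": float, "INT": int, "CAT": str}

def convert_row(row):
    """Return a dictionary where each value has the type named in the datatype row."""
    return {
        title: converters[datatype](value)
        for title, datatype, value in zip(titles, datatypes, row)
    }

row = ["Ford Galaxie 500", "15.0", "8", "429.0", "198.0", "4341.", "10.0", "70", "US"]
car = convert_row(row)

# Numeric comparisons now work without calling float() each time
print(car["Acceleration"] <= 10)
print(car["Weight"])
```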

JSON

  • JSON (JavaScript Object Notation) is commonly used as a data format when data is transferred between a client and a server
  • The data structure in JSON is quite similar to the dictionary presented earlier in this course
  • However, whereas dictionaries are used for working with data locally in your program, JSON is used for sending data between programs (for example between a client and a server, as mentioned above)
  • To be precise, JSON is a serialization format and a dictionary is a data structure
  • Below is an example where a JSON file containing stock price data is read using Python's json library
import json

with open('stocks.json') as stockdata:
    stocks = json.load(stockdata)
    print(stocks)
[{'date': '2021-01-04', 'open': 214.49, 'high': 214.72, 'low': 208.22, 'close': 210.22, 'adjusted_close': 208.9345, 'volume': 4055409}, {'date': '2021-01-05', 'open': 210.18, 'high': 211.95, 'low': 209.62, 'close': 211.48, 'adjusted_close': 210.1868, 'volume': 2576065}, {'date': '2021-01-06', 'open': 211.3, 'high': 211.7083, 'low': 209.03, 'close': 211, 'adjusted_close': 209.7097, 'volume': 3083426}, {'date': '2021-01-07', 'open': 213.22, 'high': 213.22, 'low': 210.56, 'close': 211.98, 'adjusted_close': 210.6837, 'volume': 3142014}, {'date': '2021-01-08', 'open': 212.9, 'high': 216.12, 'low': 212.23, 'close': 215.87, 'adjusted_close': 214.5499, 'volume': 2639146}, {'date': '2021-01-11', 'open': 215.09, 'high': 216.12, 'low': 213.1234, 'close': 214.23, 'adjusted_close': 212.9199, 'volume': 2545893}, {'date': '2021-01-12', 'open': 213.69, 'high': 214.33, 'low': 210.94, 'close': 211.6, 'adjusted_close': 210.306, 'volume': 2951952}, {'date': '2021-01-13', 'open': 210.91, 'high': 213.13, 'low': 210.9046, 'close': 212.09, 'adjusted_close': 210.793, 'volume': 2069926}, {'date': '2021-01-14', 'open': 212.1, 'high': 212.67, 'low': 208, 'close': 208.5, 'adjusted_close': 207.225, 'volume': 3666716}, {'date': '2021-01-15', 'open': 207.97, 'high': 210.7, 'low': 207.4203, 'close': 209.91, 'adjusted_close': 208.6264, 'volume': 3593732}, {'date': '2021-01-19', 'open': 210.68, 'high': 211, 'low': 207.88, 'close': 209.09, 'adjusted_close': 207.8114, 'volume': 3331595}, {'date': '2021-01-20', 'open': 210.15, 'high': 214.41, 'low': 209.63, 'close': 213.63, 'adjusted_close': 212.3236, 'volume': 4077731}, {'date': '2021-01-21', 'open': 214.06, 'high': 215.96, 'low': 213.32, 'close': 213.53, 'adjusted_close': 212.2242, 'volume': 2676309}, {'date': '2021-01-22', 'open': 212.51, 'high': 214.14, 'low': 211.17, 'close': 213.38, 'adjusted_close': 212.0751, 'volume': 2196479}, {'date': '2021-01-25', 'open': 212.22, 'high': 214.05, 'low': 210.56, 'close': 213.34, 'adjusted_close': 212.0354, 
'volume': 2615384}, {'date': '2021-01-26', 'open': 212.51, 'high': 215.51, 'low': 212.17, 'close': 215.38, 'adjusted_close': 214.0629, 'volume': 2907817}, {'date': '2021-01-27', 'open': 212.39, 'high': 213.15, 'low': 207, 'close': 207, 'adjusted_close': 205.7342, 'volume': 5464011}, {'date': '2021-01-28', 'open': 208.63, 'high': 210.36, 'low': 205.13, 'close': 206.82, 'adjusted_close': 205.5553, 'volume': 5401592}, {'date': '2021-01-29', 'open': 205.11, 'high': 209.4, 'low': 203.11, 'close': 207.84, 'adjusted_close': 206.569, 'volume': 5203314}, {'date': '2021-02-01', 'open': 208.48, 'high': 209.69, 'low': 206.6, 'close': 207.93, 'adjusted_close': 206.6585, 'volume': 2812476}]
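  • To make the difference between a serialization format and a data structure concrete, the sketch below converts one record to JSON text with json.dumps() and back with json.loads()
  • The record variable is a shortened copy of the first entry in the output above

```python
import json

# Shortened copy of the first record in the stock data above (a Python dictionary)
record = {"date": "2021-01-04", "open": 214.49, "high": 214.72,
          "low": 208.22, "close": 210.22, "volume": 4055409}

# Serialization: dictionary -> JSON text (a plain string that can be sent or saved)
text = json.dumps(record)
print(type(text).__name__)

# Deserialization: JSON text -> dictionary again
restored = json.loads(text)
print(restored == record)
```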

  • We can now retrieve the maximum and minimum values from the stock prices in the following manner:
    • get the maximum value from the high key
    • get the minimum value from the low key
# Initialize empty lists for the daily high and low values
high_values = []
low_values = []

# All daily high and low stock values are recorded in appropriate lists
for i in stocks:
    high_values.append(i["high"])
    low_values.append(i["low"])

# max() and min() functions are used for retrieving the highest and lowest values from the lists
print("Highest stock price was {}".format(max(high_values)))
print("Lowest stock price was {}".format(min(low_values)))
Highest stock price was 216.12
Lowest stock price was 203.11
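  • The lists above give the extreme values but not the days on which they occurred
  • As a possible extension, max() and min() accept a key function, so the whole record can be retrieved
  • The three records below are copied from the data above (only the fields needed here); with the full stocks list the same calls apply

```python
# Three records copied from the stock data above
stocks = [
    {"date": "2021-01-08", "high": 216.12, "low": 212.23},
    {"date": "2021-01-27", "high": 213.15, "low": 207},
    {"date": "2021-01-29", "high": 209.4, "low": 203.11},
]

# key= tells max()/min() which field to compare; the full record is returned
highest_day = max(stocks, key=lambda day: day["high"])
lowest_day = min(stocks, key=lambda day: day["low"])

print("Highest price {} on {}".format(highest_day["high"], highest_day["date"]))
print("Lowest price {} on {}".format(lowest_day["low"], lowest_day["date"]))
```

  • If several days share the same extreme value, max() and min() return the first one they encounter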

XML

  • XML (Extensible Markup Language) is used to store data and transfer it between systems
  • As in HTML, data in XML is placed between tags
  • Below is an example where an XML file containing person data is read
  • Each XML element can be inspected with the following attributes:
    • .tag = Name of the element
    • .text = Text content between tags
import xml.etree.ElementTree as ET

tree = ET.parse("persons.xml")
root = tree.getroot()

person_collection = []

# Gather titles (tag names) from the children of the first element
titles = []
for i in root[0]:
    titles.append(i.tag)

for elem in root:
    person = []
    for subelem in elem:
        person.append(subelem.text)
    person_collection.append(person)
print(person_collection)
[['Brandon', '(016977) 7168', 'Okotoks', 'P.O. Box 513, 562 Convallis St.', 'Sed@ornareegestas.co.uk'], ['Aladdin', '070 8561 5308', 'Abbottabad', 'Ap #885-8100 Nunc. St.', 'Nulla.facilisi.Sed@ornaresagittis.net'], ['Zachery', '0338 179 0202', 'Angol', '7842 Eget Street', 'faucibus.id.libero@molestieorcitincidunt.org'], ['Nehru', '(01440) 80650', 'Purranque', '807-6075 Neque St.', 'tellus.lorem@euaccumsan.com'], ['Beau', '076 8009 0074', 'Graz', 'P.O. Box 148, 7936 Eget Rd.', 'vitae@laoreetipsumCurabitur.ca'], ['Travis', '(01852) 976172', 'Calbuco', 'Ap #179-3553 Vel Av.', 'ut@aceleifendvitae.net'], ['Dominic', '07367 604196', 'Bhiwandi', 'Ap #184-8147 Sed St.', 'Proin.vel.nisl@Duis.net'], ['Luke', '0800 090603', 'Barrow-in-Furness', 'P.O. Box 919, 7538 Interdum St.', 'metus.facilisis@tincidunt.ca'], ['Hunter', '(0115) 763 7644', 'Salon-de-Provence', 'Ap #453-2471 Sed Av.', 'ipsum.Donec.sollicitudin@rutrumurna.co.uk'], ['Rigel', '07618 519575', 'Corroy-le-Ch‰teau', 'Ap #604-5195 Eu Street', 'iaculis.enim.sit@Integertincidunt.com'], ['Wang', '0500 779762', 'Pishin Valley', 'P.O. Box 160, 5359 Semper Rd.', 'enim.Etiam.imperdiet@cursusluctus.edu'], ['Hop', '0800 858 5704', 'Stroud', 'P.O. Box 680, 8832 Pellentesque Rd.', 'Praesent.eu.dui@nasceturridiculusmus.ca']]

  • Now that all the XML data is in a list where each person's information is a sublist, we can make the structure clearer by creating a dictionary from each person's information and storing the dictionaries in a list
persons = []
for lst in person_collection:
    person_obj = {}
    for i in range(len(lst)):
        person_obj[titles[i]] = lst[i]
    persons.append(person_obj)
print(persons)
[{'Name': 'Brandon', 'Phone': '(016977) 7168', 'City': 'Okotoks', 'StreetAddress': 'P.O. Box 513, 562 Convallis St.', 'Email': 'Sed@ornareegestas.co.uk'}, {'Name': 'Aladdin', 'Phone': '070 8561 5308', 'City': 'Abbottabad', 'StreetAddress': 'Ap #885-8100 Nunc. St.', 'Email': 'Nulla.facilisi.Sed@ornaresagittis.net'}, {'Name': 'Zachery', 'Phone': '0338 179 0202', 'City': 'Angol', 'StreetAddress': '7842 Eget Street', 'Email': 'faucibus.id.libero@molestieorcitincidunt.org'}, {'Name': 'Nehru', 'Phone': '(01440) 80650', 'City': 'Purranque', 'StreetAddress': '807-6075 Neque St.', 'Email': 'tellus.lorem@euaccumsan.com'}, {'Name': 'Beau', 'Phone': '076 8009 0074', 'City': 'Graz', 'StreetAddress': 'P.O. Box 148, 7936 Eget Rd.', 'Email': 'vitae@laoreetipsumCurabitur.ca'}, {'Name': 'Travis', 'Phone': '(01852) 976172', 'City': 'Calbuco', 'StreetAddress': 'Ap #179-3553 Vel Av.', 'Email': 'ut@aceleifendvitae.net'}, {'Name': 'Dominic', 'Phone': '07367 604196', 'City': 'Bhiwandi', 'StreetAddress': 'Ap #184-8147 Sed St.', 'Email': 'Proin.vel.nisl@Duis.net'}, {'Name': 'Luke', 'Phone': '0800 090603', 'City': 'Barrow-in-Furness', 'StreetAddress': 'P.O. Box 919, 7538 Interdum St.', 'Email': 'metus.facilisis@tincidunt.ca'}, {'Name': 'Hunter', 'Phone': '(0115) 763 7644', 'City': 'Salon-de-Provence', 'StreetAddress': 'Ap #453-2471 Sed Av.', 'Email': 'ipsum.Donec.sollicitudin@rutrumurna.co.uk'}, {'Name': 'Rigel', 'Phone': '07618 519575', 'City': 'Corroy-le-Ch‰teau', 'StreetAddress': 'Ap #604-5195 Eu Street', 'Email': 'iaculis.enim.sit@Integertincidunt.com'}, {'Name': 'Wang', 'Phone': '0500 779762', 'City': 'Pishin Valley', 'StreetAddress': 'P.O. Box 160, 5359 Semper Rd.', 'Email': 'enim.Etiam.imperdiet@cursusluctus.edu'}, {'Name': 'Hop', 'Phone': '0800 858 5704', 'City': 'Stroud', 'StreetAddress': 'P.O. Box 680, 8832 Pellentesque Rd.', 'Email': 'Praesent.eu.dui@nasceturridiculusmus.ca'}]
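  • The two steps above (collecting tags, then pairing them with text) can also be combined into one
  • The sketch below parses a small XML string with ET.fromstring() and builds one dictionary per person from .tag and .text
  • The element names persons and person are assumptions, since the layout of persons.xml is not shown; the field names and values are taken from the output above

```python
import xml.etree.ElementTree as ET

# Assumed layout of persons.xml, shortened to two persons and three fields
xml_text = """
<persons>
  <person>
    <Name>Brandon</Name>
    <Phone>(016977) 7168</Phone>
    <City>Okotoks</City>
  </person>
  <person>
    <Name>Aladdin</Name>
    <Phone>070 8561 5308</Phone>
    <City>Abbottabad</City>
  </person>
</persons>
"""

root = ET.fromstring(xml_text.strip())

# For every person element, map each child's tag to its text content
persons = [{child.tag: child.text for child in person} for person in root]

print(persons[0])
print(persons[1]["City"])
```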

  • We will filter our data so that we only print persons who:
    1. have brackets () in their phone number and
    2. have their email address ending with either co.uk or com
# Loop through each person dictionary and check the values of the Phone and Email keys
for person in persons:
    # Check that BOTH brackets are found in the phone number
    # Each 'in' test is written out in full, because ('(' and ')' in text) would only test ')'
    has_brackets = "(" in person["Phone"] and ")" in person["Phone"]
    # Check that the email address ends with either '.co.uk' OR '.com'
    has_suffix = person["Email"].endswith((".co.uk", ".com"))
    if has_brackets and has_suffix:
        print(person)
{'Name': 'Brandon', 'Phone': '(016977) 7168', 'City': 'Okotoks', 'StreetAddress': 'P.O. Box 513, 562 Convallis St.', 'Email': 'Sed@ornareegestas.co.uk'}
{'Name': 'Nehru', 'Phone': '(01440) 80650', 'City': 'Purranque', 'StreetAddress': '807-6075 Neque St.', 'Email': 'tellus.lorem@euaccumsan.com'}
{'Name': 'Hunter', 'Phone': '(0115) 763 7644', 'City': 'Salon-de-Provence', 'StreetAddress': 'Ap #453-2471 Sed Av.', 'Email': 'ipsum.Donec.sollicitudin@rutrumurna.co.uk'}

REST API

  • REST APIs (Representational State Transfer Application Programming Interfaces) are built for retrieving, updating, deleting and inserting data
  • API queries run on top of the HTTP protocol, so data manipulation is done using HTTP methods
  • Data can be retrieved directly with a web browser (although this is not very practical, especially with larger data sets)
  • The following HTTP methods can be used for data manipulation:
    • POST (data insertion)
    • GET (data retrieval)
    • PUT (data update)
    • DELETE (data deletion)
  • Data manipulation is based on
    • URL paths,
    • parameters included in the URL and
    • data included in HTTP headers (usually authentication data)
  • The basic idea of a REST API is presented in the image below

[Image: REST API]

  • The user sends a REST API request to the server
    • The request includes an HTTP method (what to do with the data) and a URI (the location and name of the resource)
  • The server responds with the appropriate data, usually in XML or JSON format
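  • The pieces of a request (HTTP method, URL path, URL parameters and headers) can be seen by building a request without sending it
  • The sketch below uses only the standard library; the host api.example.com, the path /cars, the parameter origin and the token value are all placeholders, not a real service

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = "https://api.example.com"            # placeholder host
path = "/cars"                                  # the URL path names the resource
params = urlencode({"origin": "Europe"})        # parameters are appended after '?'
headers = {"Authorization": "Bearer <token>"}   # authentication data goes in a header

# Build (but do not send) requests that use two different HTTP methods
get_request = Request("{}{}?{}".format(base_url, path, params),
                      headers=headers, method="GET")
delete_request = Request("{}{}/10".format(base_url, path),
                         headers=headers, method="DELETE")

print(get_request.get_method(), get_request.full_url)
print(delete_request.get_method(), delete_request.full_url)
```

  • Nothing is transferred over the network here; the objects only describe what would be sent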

  • Below is an example which uses the cat facts API
  • The query is sent using Python's requests library
import requests

endpoint = "/facts"
# ID of the cat fact to retrieve
fact_id = "5894af975cdc7400113ef7f9"
url = "https://cat-fact.herokuapp.com{0}/{1}".format(endpoint, fact_id)

req = requests.get(url=url)

req.status_code
200
  • The server responded with status code 200, which means the HTTP request was successful
  • Since the data is available in JSON format, we can save it to a variable for further processing
cat_data = req.json()
cat_data
{'status': {'verified': True, 'sentCount': 1},
 '_id': '5894af975cdc7400113ef7f9',
 'user': {'name': {'first': 'Alex', 'last': 'Wohlbruck'},
  '_id': '5a9ac18c7478810ea6c06381',
  'photo': 'https://lh3.googleusercontent.com/a/AAcHTteYDsh_-f7iA5ez11zM8FsH9DCICjcvLtNRYbzhgZA=s50'},
 'text': 'The technical term for a cat’s hairball is a bezoar.',
 '__v': 0,
 'source': 'user',
 'updatedAt': '2020-11-25T21:20:03.895Z',
 'type': 'cat',
 'createdAt': '2018-02-27T21:20:02.854Z',
 'deleted': False,
 'used': True}
  • The API returns one cat fact based on the ID given at the end of the URL
  • We can now pick the actual fact text from the response
cat_data["text"]
'The technical term for a cat’s hairball is a bezoar.'
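  • The response is a nested dictionary, so inner values are reached by chaining keys
  • The sketch below uses a shortened copy of the response shown above instead of a live request, and shows .get() as one way to handle a key that might be missing (the key name missing_key is made up for the illustration)

```python
# Shortened copy of the response shown above
cat_data = {
    "status": {"verified": True, "sentCount": 1},
    "_id": "5894af975cdc7400113ef7f9",
    "user": {"name": {"first": "Alex", "last": "Wohlbruck"}},
    "type": "cat",
}

# Chain keys to reach values inside nested dictionaries
first_name = cat_data["user"]["name"]["first"]
verified = cat_data["status"]["verified"]

# .get() returns a default instead of raising KeyError when a key is absent
fallback = cat_data.get("missing_key", "not available")

print(first_name, verified, fallback)
```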