Pandas part 1#

  • Pandas is a Python library used for data analytics
  • Pandas introduces a new data structure called DataFrame
  • DataFrame structure is tabular similar to SQL for example so it has rows and columns
  • Below is an example where employee data is stored in dictionary and this data is then converted to DataFrame
import pandas as pd

dict = [
    {"firstname":"Jennifer","lastname":"Davis","phone":"0613512441","email":"jennifer.davis@company.com"},
    {"firstname":"Mike","lastname":"Johnson","phone":"0513643123","email":"mike.johnson@company.com"},
    {"firstname":"Steve","lastname":"Mclaud","phone":"0561253343","email":"steve.mclaud@company.com"},
    {"firstname":"Lucy","lastname":"Brown","phone":"0613412353","email":"lucy.brown@company.com"},
    {"firstname":"Gary","lastname":"Stone","phone":"0612343521","email":"gary.stone@company.com"}
    ]

employees = pd.DataFrame(dict)
employees
firstname lastname phone email
0 Jennifer Davis 0613512441 jennifer.davis@company.com
1 Mike Johnson 0513643123 mike.johnson@company.com
2 Steve Mclaud 0561253343 steve.mclaud@company.com
3 Lucy Brown 0613412353 lucy.brown@company.com
4 Gary Stone 0612343521 gary.stone@company.com
  • As can be seen from the example above each row will be given a unique key value starting from number 0
  • User defined values can also be used as index values
  • Let's use employee ID numbers as index numbers for the existing DataFrame
employees.index = ["JD111","MJ222","SM333","LB444","GS555"]

employees
firstname lastname phone email
JD111 Jennifer Davis 0613512441 jennifer.davis@company.com
MJ222 Mike Johnson 0513643123 mike.johnson@company.com
SM333 Steve Mclaud 0561253343 steve.mclaud@company.com
LB444 Lucy Brown 0613412353 lucy.brown@company.com
GS555 Gary Stone 0612343521 gary.stone@company.com

Creating DataFrame from imported JSON file#

  • Like we showed above, reading CSV files is quick and easy with pandas
  • Same goes also with JSON files when we read JSON data with read_json() function
stockdata = pd.read_json("stocks.json")
stockdata.head()
date open high low close adjusted_close volume
0 2021-01-04 214.49 214.7200 208.22 210.22 208.9345 4055409
1 2021-01-05 210.18 211.9500 209.62 211.48 210.1868 2576065
2 2021-01-06 211.30 211.7083 209.03 211.00 209.7097 3083426
3 2021-01-07 213.22 213.2200 210.56 211.98 210.6837 3142014
4 2021-01-08 212.90 216.1200 212.23 215.87 214.5499 2639146

Creating DataFrame from imported CSV file#

  • Earlier in this course we learned how CSV file can be read with using either Python or Pandas
  • Next we will read the cars.csv file using read_csv() function from Pandas library and convert it to DataFrame
import pandas as pd

car_data = pd.read_csv("cars.csv",delimiter=";")

car_data
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 STRING DOUBLE INT DOUBLE DOUBLE DOUBLE DOUBLE INT CAT
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US

Printing n rows of DataFrame#

  • head() and tail() methods can be used to print first or last n rows
  • The default value is 5 if no value is provided with these methods
# Show the first five rows
car_data.head()
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 STRING DOUBLE INT DOUBLE DOUBLE DOUBLE DOUBLE INT CAT
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
# Show the last three rows
car_data.tail(3)
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
  • While head() and tail() provide the method of presenting n rows from the beginning and the end of the DataFrame, slicing allows more flexible presentation of rows
  • Slicing is done using [] operator
  • This is similar to slicing string or list like shown previously in Python fundamentals
# Show the rows from 3 to 8 (inclusive) of car data
car_data[3:9]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
# Show the last five rows of car data in reverse order
# [-5:] chooses the rows starting from the fifth last row till the end and [::-1] reverses the output
car_data[-5:][::-1]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US

Selecting data from the DataFrame#

  • When we want to select certain data from the DataFrame, loc and iloc are great tools
    • loc means that rows and columns are selected based on their labels (loc[row_label,column_label])
    • iloc means that rows and columns are selected based on their index numbers (iloc[row_position,column_position])
  • Let's go through a couple of examples with the car data
# Select Horsepower information of Ford Torino using .loc
print(car_data.loc[13,"Horsepower"])

# Select Horsepower information of Ford Torino using .iloc
print(car_data.iloc[13,4])
153.0
153.0

  • Like shown in the example above we could retrieve the data from the single cell
  • If we would like to select a greater portion of the data or all of the data we could utilize slicing technique with .loc and iloc
# Select all car data from the car DataFrame
car_data.loc[:,:]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 STRING DOUBLE INT DOUBLE DOUBLE DOUBLE DOUBLE INT CAT
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
  • In order to select multiple columns from the DataFrame, columns can be given in a list
# Select car, cylinder, displacement and weight information and show six rows starting from index number 1
car_data.loc[1:6,["Car","Cylinders","Displacement","Weight"]]
Car Cylinders Displacement Weight
1 Chevrolet Chevelle Malibu 8 307.0 3504.
2 Buick Skylark 320 8 350.0 3693.
3 Plymouth Satellite 8 318.0 3436.
4 AMC Rebel SST 8 304.0 3433.
5 Ford Torino 8 302.0 3449.
6 Ford Galaxie 500 8 429.0 4341.

Sorting data by column#

  • Data sorting is done with sort_values() method
  • This method takes the target column or columns inside a list and the sort direction (ascending or descending) as an input
  • Below is an example where car data is sorted by weight in descending order
car_data.sort_values(["Weight"], ascending=False)
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 STRING DOUBLE INT DOUBLE DOUBLE DOUBLE DOUBLE INT CAT
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
  • Sorting can be done to multiple columns
  • Direction of sort can be set to each sorted column
  • In the example below car data is first sorted by MPG in descending order and then with Weight column in ascending order
car_data.sort_values(["MPG","Weight"], ascending=[False,True])
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 STRING DOUBLE INT DOUBLE DOUBLE DOUBLE DOUBLE INT CAT
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
  • As can be seen from the output, data is sorted primarily with MPG column and each row with equal MPG value will then be sorted with Weight value in ascending order

Row deletion#

  • DataFrame rows could be removed with drop() method
  • By default, when removing rows or columns from the DataFrame, new DataFrame is always created and changes are not made to the original
  • If we want to make changes to the original, we should use inplace=True definition with drop() method
  • Using inplace=True is the same as reassigning the DataFrame to the same variable again (overwrites the existing variable content)
  • Car data still has extra row with row index number of 0
  • Let's remove this row and save the modified DataFrame
# Remove the first row (index number 0) from the DataFrame and save the changes for the new DataFrame
# This would be the same as car_data = car_data.drop(0)
car_data.drop(0,inplace=True)
car_data
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
6 Ford Galaxie 500 15.0 8 429.0 198.0 4341. 10.0 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
8 Plymouth Fury iii 14.0 8 440.0 215.0 4312. 8.5 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
12 Chevrolet Chevelle Concours (sw) 0 8 350.0 165.0 4142. 11.5 70 US
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
  • Multiple rows can be deleted with one drop method by presenting the index labels or numbers in a list inside the drop() method
# Drop rows with index number 6, 8 and 12
car_data.drop([6,8,12],inplace=True)
car_data
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
1 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
2 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
3 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
4 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
5 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
7 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
9 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
10 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
11 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
13 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US
  • Now we see that dropped index numbers are missing
  • reset_index() method can be used to correct index numbering
  • We use drop=True here so the old index won't be added as a new column
car_data.reset_index(inplace=True,drop=True)
car_data
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US
1 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US
2 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US
3 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US
4 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US
5 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US
6 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US
7 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US
8 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe
9 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US

Adding a new column to DataFrame#

  • There are many ways to insert new column to existing DataFrame:
    • assign method (adds new column to the end of the DataFrame)
    • insert method (adds new column to the DataFrame using the position of choice)
    • with column name as a reference
    • loc with column name as label
  • Below are couple of examples for adding a new column
  • In first example we add color column for cars using the assign method
  • This method takes column name and value or valueset as an input
# Add color column for car DataFrame using assign method
colors = ["red","lightblue","green","red","blue","yellow","green","red","red","black"]

car_data = car_data.assign(Color=colors)

car_data
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin Color
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504. 12.0 70 US red
1 Buick Skylark 320 15.0 8 350.0 165.0 3693. 11.5 70 US lightblue
2 Plymouth Satellite 18.0 8 318.0 150.0 3436. 11.0 70 US green
3 AMC Rebel SST 16.0 8 304.0 150.0 3433. 12.0 70 US red
4 Ford Torino 17.0 8 302.0 140.0 3449. 10.5 70 US blue
5 Chevrolet Impala 14.0 8 454.0 220.0 4354. 9.0 70 US yellow
6 Pontiac Catalina 14.0 8 455.0 225.0 4425. 10.0 70 US green
7 AMC Ambassador DPL 15.0 8 390.0 190.0 3850. 8.5 70 US red
8 Citroen DS-21 Pallas 0 4 133.0 115.0 3090. 17.5 70 Europe red
9 Ford Torino (sw) 0 8 351.0 153.0 4034. 11.0 70 US black
  • In second example we add rating column for cars using insert method
  • The benefit of insert method compared to assign is that you may freely choose the position of the new column
  • Insert method takes four input values:
    • position
    • name
    • values for the column
    • check if column with the same name already exists in the DataFrame (True / False)
  • In the example below we add our rating column to column index position 5 (sixth column in DataFrame)
# Add rating column for car DataFrame using loc
ratings = [4.1,3.8,4.2,4.0,3.9,3.9,4.4,4.2,4.1,3.7]
car_data.insert(5,"Rating",ratings,True)
car_data
Car MPG Cylinders Displacement Horsepower Rating Weight Acceleration Model Origin Color
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 4.1 3504. 12.0 70 US red
1 Buick Skylark 320 15.0 8 350.0 165.0 3.8 3693. 11.5 70 US lightblue
2 Plymouth Satellite 18.0 8 318.0 150.0 4.2 3436. 11.0 70 US green
3 AMC Rebel SST 16.0 8 304.0 150.0 4.0 3433. 12.0 70 US red
4 Ford Torino 17.0 8 302.0 140.0 3.9 3449. 10.5 70 US blue
5 Chevrolet Impala 14.0 8 454.0 220.0 3.9 4354. 9.0 70 US yellow
6 Pontiac Catalina 14.0 8 455.0 225.0 4.4 4425. 10.0 70 US green
7 AMC Ambassador DPL 15.0 8 390.0 190.0 4.2 3850. 8.5 70 US red
8 Citroen DS-21 Pallas 0 4 133.0 115.0 4.1 3090. 17.5 70 Europe red
9 Ford Torino (sw) 0 8 351.0 153.0 3.7 4034. 11.0 70 US black

Column deletion#

  • drop() method can be used for removing columns also from the DataFrame
  • drop() method takes axis parameter with either value of 0 or 1
    • axis=0 (default value) removes rows
    • axis=1 removes columns
  • Let's remove the Model column from the DataFrame
car_data.drop(["Model"],inplace=True,axis=1)
car_data
Car MPG Cylinders Displacement Horsepower Rating Weight Acceleration Origin Color
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 4.1 3504. 12.0 US red
1 Buick Skylark 320 15.0 8 350.0 165.0 3.8 3693. 11.5 US lightblue
2 Plymouth Satellite 18.0 8 318.0 150.0 4.2 3436. 11.0 US green
3 AMC Rebel SST 16.0 8 304.0 150.0 4.0 3433. 12.0 US red
4 Ford Torino 17.0 8 302.0 140.0 3.9 3449. 10.5 US blue
5 Chevrolet Impala 14.0 8 454.0 220.0 3.9 4354. 9.0 US yellow
6 Pontiac Catalina 14.0 8 455.0 225.0 4.4 4425. 10.0 US green
7 AMC Ambassador DPL 15.0 8 390.0 190.0 4.2 3850. 8.5 US red
8 Citroen DS-21 Pallas 0 4 133.0 115.0 4.1 3090. 17.5 Europe red
9 Ford Torino (sw) 0 8 351.0 153.0 3.7 4034. 11.0 US black

Conversion between datatypes#

  • Like in basic Python, with Pandas we sometimes need to convert from datatype to another
  • Below is a table containing the possible datatypes in Pandas
Pandas dtype Python type Usage
object str or mixed Text or mixed numeric and non-numeric values
int64 int Whole numbers
float64 float Floating point numbers
bool bool Boolean values (True/False)
datetime64 datetime Date and time value
timedelta N/A Difference between two datetime values
category N/A Finite list of text values
  • Current datatypes used in DataFrame can be retrieved with dtypes method
# List datatypes for DataFrame
car_data.dtypes
Car              object
MPG              object
Cylinders        object
Displacement     object
Horsepower       object
Rating          float64
Weight           object
Acceleration     object
Origin           object
Color            object
dtype: object
  • As can be seen from the output, we currently have all columns marked as objects
  • In order to ease our future data filtering, let's set our datatypes to match the column values
  • Important: String values will be treated as objects
car_data = car_data.astype({"Car":"string","MPG":"float","Cylinders":"int","Displacement":"float","Horsepower":"float","Weight":"float","Acceleration":"float","Origin":"string"})
car_data.dtypes
Car              string
MPG             float64
Cylinders         int32
Displacement    float64
Horsepower      float64
Rating          float64
Weight          float64
Acceleration    float64
Origin           string
Color            object
dtype: object

Data filtering based on values#

  • With Pandas library we can also filter data using fundamentals we have acquired earlier in this course with Python
  • Below is an example where car data is further filtered so that it matches the following criteria:
    • Acceleration value must be 12.0 or less
    • Horsepower value should be between 100 and 250 (inclusive)
    • Car name should start with letter C
car_data = car_data[(car_data["Acceleration"] <= 12.0) & ((car_data["Horsepower"] >= 100) & (car_data["Horsepower"] <= 250)) & (car_data["Car"].str.startswith("C"))]
  • Let's analyze what we did here
    • All three conditions were combined with & (AND)
    • Car name in Car column is still handled as series so we use str method in order to use string methods from Python for series data
    • Changes are stored in current DataFrame by reassigning the variable

Writing DataFrame to CSV file#

  • DataFrame can be written back to CSV file using .to_csv() method for DataFrame
  • Below is an example where current modified DataFrame containing information of two cars is written into the csv file called cars-filtered.csv
car_data.to_csv("cars-filtered.csv")
  • Finally, let's read the created file in order to verify that it includes the correct data
new_car_data = pd.read_csv("cars-filtered.csv")
new_car_data
Unnamed: 0 Car MPG Cylinders Displacement Horsepower Rating Weight Acceleration Origin Color
0 0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 4.1 3504.0 12.0 US red
1 5 Chevrolet Impala 14.0 8 454.0 220.0 3.9 4354.0 9.0 US yellow