Pandas is built on top of NumPy. It is one of the most widely used data analysis libraries, and it makes data manipulation a lot easier and more intuitive. If you are familiar with NumPy arrays, moving on to pandas will be straightforward. You can convert a NumPy array into a pandas DataFrame by passing the array object to pandas' DataFrame constructor.
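As a minimal sketch of that conversion, assuming a small made-up array and column names chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array; the values and column names are made up for illustration.
array = np.array([[1, 2], [3, 4], [5, 6]])

# Pass the array to the DataFrame constructor and name the columns.
df = pd.DataFrame(array, columns=["a", "b"])
print(df)
```

If you leave out the columns argument, pandas labels the columns 0, 1, and so on.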
The cool thing about pandas is that it can take data from multiple sources (like NumPy arrays, Excel sheets saved in CSV format, or SQL databases) and create a table-like grid of rows and columns, very similar to the format we see in relational databases. If you are familiar with the R language, you will see similarities too.
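Here is a rough sketch of loading the same small table from three kinds of sources. The data, the table name readings, and the in-memory stand-ins (io.StringIO for a CSV file, an in-memory SQLite database for a SQL server) are all assumptions made for illustration.

```python
import io
import sqlite3

import numpy as np
import pandas as pd

# Source 1: a NumPy array (made-up values).
from_array = pd.DataFrame(np.array([[1, 10], [2, 20]]), columns=["id", "value"])

# Source 2: CSV text; io.StringIO stands in for a file on disk.
csv_text = "id,value\n1,10\n2,20\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Source 3: a SQL database; here an in-memory SQLite database with a hypothetical table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE readings (id INTEGER, value INTEGER)")
connection.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 20)])
from_sql = pd.read_sql("SELECT id, value FROM readings", connection)
connection.close()

# Each source ends up as the same grid of rows and columns.
print(from_array.shape, from_csv.shape, from_sql.shape)
```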
Pandas also allows us to easily access a portion of the data using indexing and perform operations on that portion. Performing operations on a portion of the data, especially when it is spread across multiple lists, becomes cumbersome with built-in Python lists.
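To make the comparison concrete, here is a small sketch that computes the same average twice, once with plain lists and once with a DataFrame. The names and ages are made up for illustration.

```python
import pandas as pd

# Made-up data for illustration only.
names = ["Ann", "Bob", "Cat", "Dan"]
ages = [22, 35, 58, 41]

# With built-in lists, you filter by hand and compute the mean yourself.
over_30_ages = [age for age in ages if age > 30]
list_mean = sum(over_30_ages) / len(over_30_ages)

# With a DataFrame, a boolean mask selects the rows and .loc picks the column.
people = pd.DataFrame({"name": names, "age": ages})
frame_mean = people.loc[people["age"] > 30, "age"].mean()

print(list_mean, frame_mean)
```

The same mask can also pull the matching names, people.loc[people["age"] > 30, "name"], without a second loop to keep the two lists in step.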
You can do away with a lot of overhead when you load tabular data into a pandas DataFrame, because the most commonly used summary statistics (count, mean, std, min, and max) are available from a single describe() call.
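A quick sketch of describe() on a made-up single-column frame (the column name score and its values are illustrative only):

```python
import pandas as pd

# Made-up numeric data for illustration only.
scores = pd.DataFrame({"score": [10, 20, 30, 40]})

# describe() returns count, mean, std, min, the quartiles, and max for numeric columns.
summary = scores.describe()
print(summary)
```

The result is itself a DataFrame, so a single statistic can be read with summary.loc["mean", "score"].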
Example #1: Importing Data From CSV File
You can import any CSV file, including an Excel sheet saved as CSV, into a pandas DataFrame, but for the sake of this tutorial, I am going to load this open source Titanic dataset.
import os

import pandas as pd

# Fall back to the current working directory when script_dir is not already defined.
if 'script_dir' not in globals():
    script_dir = os.getcwd()

# Folder and file names; os.path.join adds the right separator for your operating system.
data_directory = 'data'
example_directory = 'PandasExample'
source_file_name = 'titanic.csv'
target_file_name = 'female_dataset.csv'
source_path = os.path.join(script_dir, data_directory, example_directory, source_file_name)
target_path = os.path.join(script_dir, data_directory, example_directory, target_file_name)

# Import the data and show the top five rows.
dataset = pd.read_csv(source_path)
dataset.head(5)
Example #2: Exploring Your Data
Below is some basic exploratory data analysis.
dataset.describe()
# Let's select only the female passengers' data.
female_dataset = dataset[dataset.Sex == "female"]
female_dataset.head(5)
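The same filtering pattern extends to more than one condition. The sketch below uses a tiny made-up frame rather than the real dataset, and the Age column and the cutoff of 18 are assumptions chosen only for illustration.

```python
import pandas as pd

# Tiny made-up stand-in for the passenger data; these are not real Titanic records.
passengers = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Age": [29, 40, 8, 12],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
adult_females = passengers[(passengers["Sex"] == "female") & (passengers["Age"] >= 18)]
print(adult_females)
```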
Example #3: Writing DataFrames To Disk
Let's save the new dataset to a pipe-delimited CSV file. The default delimiter is a comma, but you can use any delimiter. Run the code and check the example folder for the new file.
female_dataset.to_csv(target_path, sep='|')
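To check the result, you can read the file back with the same delimiter. This is a sketch on a made-up frame written to a temporary directory; the file name example.csv and the data are hypothetical, and index=False is an optional choice that the code above does not make.

```python
import os
import tempfile

import pandas as pd

# Made-up frame standing in for female_dataset.
frame = pd.DataFrame({"Name": ["Ann", "Cat"], "Age": [29, 8]})

with tempfile.TemporaryDirectory() as temp_dir:
    # Hypothetical file name inside a throwaway directory.
    path = os.path.join(temp_dir, "example.csv")
    # index=False leaves the row index out of the file.
    frame.to_csv(path, sep="|", index=False)
    # Read the file back, telling read_csv which delimiter was used.
    round_trip = pd.read_csv(path, sep="|")

print(round_trip)
```

Without index=False, to_csv also writes the row index as an unnamed first column, which shows up as an extra column when the file is read back.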
Copyright © 2020, Mass Street Analytics, LLC. All Rights Reserved.