Lesson 30. Looping Over Files In A Directory

Looping over files in a directory is a basic ETL task. In this tutorial, I’m going to introduce you to the syntax using Python’s built-in os module. If you downloaded the code from GitHub, you will find small sample files to work with in the data/FileLoopExample folder.

In later lessons, you will see how it is done with live files.

Examples

Example #1: Loop Over Everything In Folder

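import os

# Build the path to the sample folder. os.getcwd() returns the current working
# directory (in Jupyter, normally the notebook's folder), and os.path.join adds
# the right separator for your operating system, so no hard-coded backslashes.
script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'FileLoopExample'
path = os.path.join(script_dir, data_directory, example_directory)

# os.listdir returns the name of every entry in the folder, subfolders included.
for filename in os.listdir(path):
    print(filename)

Output: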
01SampleFolder
02SampleFolder
03SampleFolder
full200606.csv
full200607.csv
full200608.csv
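
Because os.listdir returns every entry name in arbitrary order, folders included, Example #1 prints the three sample folders alongside the CSV files. If you only want files, one option is os.scandir, whose entries expose an is_file() method. The sketch below is a minimal version of that idea; the helper name list_files and the temporary demo folder are hypothetical, created only so the snippet runs on its own.

```python
import os
import tempfile


def list_files(path):
    """Return the names of regular files (not folders) directly inside path, sorted."""
    with os.scandir(path) as entries:
        # DirEntry.is_file() is False for folders, so the sample subfolders drop out.
        return sorted(entry.name for entry in entries if entry.is_file())


# Hypothetical demo folder that mimics the lesson's layout: one subfolder, two CSVs.
with tempfile.TemporaryDirectory() as demo:
    os.mkdir(os.path.join(demo, '01SampleFolder'))
    for name in ('full200606.csv', 'full200607.csv'):
        open(os.path.join(demo, name), 'w').close()
    for filename in list_files(demo):
        print(filename)  # prints full200606.csv, then full200607.csv
```

To run it against the lesson's sample data, pass the path built in Example #1 instead of the temporary folder. Sorting the names also gives you a repeatable processing order, which os.listdir does not guarantee.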

Example #2: Loop Over Files With A Specific File Extension

import os

script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'FileLoopExample'
path = os.path.join(script_dir, data_directory, example_directory)

for filename in os.listdir(path):
    # Keep only names ending in .csv (str.endswith is case-sensitive).
    if filename.endswith('.csv'):
        print(filename)

Output:
full200606.csv
full200607.csv
full200608.csv
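
Since str.endswith is case-sensitive, a file named FULL200609.CSV would be skipped by Example #2, and a folder whose name happens to end in .csv would slip through. The sketch below is one way to handle both, assuming you want a case-insensitive match on the final extension: it uses pathlib, checks is_file(), and compares the lower-cased suffix. The helper csv_files and the demo folder are hypothetical.

```python
import tempfile
from pathlib import Path


def csv_files(path):
    """Return the names of files directly inside path whose extension is .csv in any case."""
    return sorted(
        p.name
        for p in Path(path).iterdir()
        # Path.suffix is the final extension including the dot, e.g. '.CSV'.
        if p.is_file() and p.suffix.lower() == '.csv'
    )


# Hypothetical demo folder: mixed-case extensions plus a non-CSV file.
with tempfile.TemporaryDirectory() as demo:
    for name in ('full200606.csv', 'FULL200607.CSV', 'notes.txt'):
        (Path(demo) / name).touch()
    print(csv_files(demo))  # prints ['FULL200607.CSV', 'full200606.csv']
```

Uppercase letters sort before lowercase ones, which is why FULL200607.CSV comes first in the printed list.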

Example #3: Loop Over Files In Subdirectories Recursively

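import os

script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'FileLoopExample'
path = os.path.join(script_dir, data_directory, example_directory)

# os.walk visits path and then every folder beneath it. Each step yields a tuple:
# the folder being visited (subdir), the subfolder names in it (dirs), and the
# file names in it (files).
for subdir, dirs, files in os.walk(path):
    for filename in files:
        print(filename)

Output: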
full200606.csv
full200607.csv
full200608.csv
full200509.csv
full200510.csv
full200511.csv
full200512.csv
full200601.csv
full200602.csv
full200603.csv
full200604.csv
full200605.csv
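
Example #3 prints bare file names, so you cannot tell which subfolder a file came from, and an ETL job usually needs the full path to open it. Because os.walk yields the folder it is currently visiting as its first value, joining that folder with each file name gives a full path. The sketch below assumes you only want CSV files and a repeatable order; it sorts the folder list in place, which steers the order in which os.walk descends. The helper walk_csv_paths and the demo tree are hypothetical.

```python
import os
import tempfile


def walk_csv_paths(root):
    """Yield the full path of every .csv file under root, including subfolders."""
    for subdir, dirs, files in os.walk(root):
        # Sorting dirs in place makes os.walk visit subfolders in a predictable order.
        dirs.sort()
        for filename in sorted(files):
            if filename.lower().endswith('.csv'):
                # subdir is the folder being visited, so the join gives a full path.
                yield os.path.join(subdir, filename)


# Hypothetical demo tree: one CSV at the top level and one in a subfolder.
with tempfile.TemporaryDirectory() as demo:
    os.mkdir(os.path.join(demo, '01SampleFolder'))
    open(os.path.join(demo, 'full200606.csv'), 'w').close()
    open(os.path.join(demo, '01SampleFolder', 'full200509.csv'), 'w').close()
    for full_path in walk_csv_paths(demo):
        # Show each path relative to the root so the subfolder is visible.
        print(os.path.relpath(full_path, demo))
```

The top-level file prints first because os.walk walks top-down by default, reporting a folder's own files before descending into its subfolders.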

Copyright © 2020, Mass Street Analytics, LLC. All Rights Reserved.