How do I return all lines in a csv containing specific words or phrases in a specific column?


Posted on 16 Aug 2022 in General Tech, Bugs & Fixes.

manpreet (Best Answer), 2 years ago

I have a CSV file containing a data set (in this case, addresses). I would like to make a second CSV file containing only the entries that have one of a set of phrases in a specific column. For example, I would like to return all the people who currently live in "Viridian", but not those who previously lived there or never lived there.

The example data is:

First Name,Second Name,ID,Home Town,County,Current Town,Street
Sam,Smith,1234,Pallet,North,Orange,Lemon
Jenny,Walton,1456,Viridian,West,York,High View
Alan,Kirk,2378,Orange,West,Viridian,High street
Reese,Small,9840,Minsk,East,Viridian,Ocean Avenue
Audry,Owen,7865,York,South,Blackmarsh,8th Street
Marco,Jefferson,1580,Amsterdam,Central,Oxford,Church Road
Jim,Lowe,5218,Windy City,East,Windy City,Oak
Gillian,Pope,3217,Rome,Central,Rome,Low road

I have previously used this code:

towns = ["Viridian", "Rome"]

with open("addresses.csv") as oldfile, open("Filtered addresses.csv", "w") as newfile:
    for line in oldfile:
        # Matches a town name anywhere in the line, whichever column it is in
        if any(town in line for town in towns):
            newfile.write(line)

However, this returns lines with the specified towns in any column. I just want the ones with the specified towns in the "Current Town" column.

I tried this instead:

import csv

town = ["Viridian", "Rome"]

with open("Filtered addresses.csv", "w", encoding="Latin-1") as newfile:

    reader = csv.reader(open("addresses.csv", 'r', encoding="Latin-1"))

    for data in reader:
        if any(town in data[6] for town in town):
            newfile.write(data)

But this results in an error:

TypeError: write() argument must be str, not list

While altering the code to read:

newfile.write(str(data))

returns some entries but they are formatted as a single long line rather than rows.

What is the best way to achieve my aim? I would like to keep the full row of data in each case.

Thanks!

manpreet, 2 years ago

pandas makes this straightforward:

import pandas as pd

town = ["Viridian", "Rome"]
# Read the csv as a pandas dataframe
original = pd.read_csv("addresses.csv", index_col=False)
# Select rows where the `Current Town` column's value is in `town`
filtered = original[original['Current Town'].isin(town)]
# Save the filtered dataframe; index=False keeps the row index out of the file
filtered.to_csv("Filtered addresses.csv", index=False)
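
Note that `isin` only keeps rows whose value equals one of the listed towns exactly. If the column should merely contain a word or phrase, a rough sketch along these lines may help. The helper name `rows_containing` is made up for illustration, and the case-insensitive matching is an assumption about what you want:

```python
import re

import pandas as pd


def rows_containing(df, column, phrases):
    """Return the rows of df whose `column` contains any of `phrases`."""
    # Escape each phrase so characters like "." or "(" are matched literally
    pattern = "|".join(re.escape(p) for p in phrases)
    # na=False treats empty cells as "no match" instead of raising an error
    mask = df[column].str.contains(pattern, case=False, na=False, regex=True)
    return df[mask]
```

It could be called as `rows_containing(original, "Current Town", ["Viridian", "Rome"])` and the result saved with `to_csv` in the same way.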

If you don't have pandas installed, you can install it by running the following in your command line:

pip install pandas
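
If you would rather stay with the standard library, the second attempt in the question can probably be repaired by writing through `csv.writer` instead of calling `newfile.write` on a list. This is only a sketch: the function name `filter_csv` is hypothetical, the Latin-1 default is copied from the question, and it assumes the first row of the file is the header.

```python
import csv


def filter_csv(src_path, dst_path, column, wanted, encoding="Latin-1"):
    """Copy the header plus every row whose `column` value is in `wanted`.

    Returns the number of data rows written.
    """
    wanted = set(wanted)
    kept = 0
    # newline="" lets the csv module handle line endings itself
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)          # assumes the first row is the header
        col = header.index(column)     # look the column up by name, not position
        writer.writerow(header)
        for row in reader:
            # Skip short or blank rows rather than failing on them
            if len(row) > col and row[col] in wanted:
                writer.writerow(row)   # writerow turns the list back into a CSV line
                kept += 1
    return kept
```

For the example data, `filter_csv("addresses.csv", "Filtered addresses.csv", "Current Town", ["Viridian", "Rome"])` should keep the rows for Alan, Reese and Gillian. Looking the column up by its header name also avoids counting positions by hand: in the example data `data[6]` is actually "Street", while "Current Town" is at index 5.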

