Removing \n, \\n and other unwanted characters from a JSON unicode dictionary with Python



Posted on 16 Aug 2022 under General Tech, Learning Aids/Tools.


Answers (2)

manpreet (Best Answer) 2 years ago

 

I've tried a couple of different solutions to fix my problem with some "funny" newlines within my JSON dictionary and none of them work, so I thought I might make a post. The dictionary is obtained by scraping a website.

I have a json dictionary:

my_dict = {
    u"Danish title": u"Avanceret", 
    u"Course type": u"MScTechnol",
    u"Type of":  u"assessmen",
    u"Date": u"\nof exami",
    u"Evaluation": u"7 step sca",
    u"Learning objectives": u"\nA studen",
    u"Participants restrictions": u"Minimum 10",
    u"Aid": u"No Aid",
    u"Duration of Course": u"13 weeks",
    u"name": u"Advanced u",
    u"Department": u"31\n",
    u"Mandatory Prerequisites": u"31545",
    u"General course objectives": u"\nThe cour",
    u"Responsible": u"\nMartin C",
    u"Location": u"Campus Lyn",
    u"Scope and form": u"Lectures, ",
    u"Point( ECTS )": u"10",
    u"Language": u"English",
    u"number": u"31548",
    u"Content": u"\nThe cour",
    u"Schedule": u"F4 (Tues 1"
}

I have truncated the values to [:10] to reduce clutter, but some of them are 300 characters long. It might not show well here, but some of the values contain many newline characters. I've tried several ways to remove them, such as str.strip and str.replace, but without success, I believe because my values are unicode. By values I mean the value in key, value in my_dict.items().

How do I remove all the newlines appearing in my dictionary? The focus is on the values: some of the newlines are trailing, some are leading and others are in the middle of the content, e.g. \nI have a\ngood\n idea\n.

EDIT

I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.

for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    print key, value
manpreet 2 years ago

If you're trying to remove all \n characters, or any junk character other than letters, digits and spaces, use a regex:

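import re

for key in my_dict.keys():
    # Remove literal backslash-n sequences (two characters) left over from scraping
    my_dict[key] = my_dict[key].replace('\\n', '')
    # Drop everything that is not a letter, digit or space (this includes real newlines)
    my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict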

If you wish to keep any other characters, add them to the character class inside the regex.
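Since the question asks for newlines to become a single space rather than disappear, here is a minimal sketch of that variant. It leaves punctuation and other characters alone and only normalises whitespace. The helper name clean_value, the result name cleaned and the sample entries are made up for illustration and are not the real scraped data. It is written so that it should run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical sample mirroring the question's data; not the real scraped values.
my_dict = {
    u"Department": u"31\n",
    u"Responsible": u"\nMartin C",
    u"Example": u"\nI have a\ngood\n idea\n",
    u"Escaped": u"line one\\nline two",
}

def clean_value(text):
    """Collapse newlines and other whitespace runs into single spaces."""
    # Turn literal backslash-n sequences (two characters) into real newlines first.
    text = text.replace(u"\\n", u"\n")
    # Any run of whitespace, newlines included, becomes one space.
    text = re.sub(r"\s+", u" ", text)
    # Drop the space left behind by leading or trailing newlines.
    return text.strip()

# Build a new dictionary so the cleaned values are actually stored;
# reassigning the loop variable alone would leave the dictionary unchanged.
cleaned = {key: clean_value(value) for key, value in my_dict.items()}

print(cleaned[u"Example"])  # I have a good idea
```

Building a new dictionary avoids modifying my_dict while iterating over it. If you would rather update it in place, assign back with my_dict[key] = clean_value(my_dict[key]) inside a loop over list(my_dict.keys()).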

