Fill nan values with random value from another DataFrame pandas

Manpreet Singh

Posted on 16 Aug 2022 in General Tech (Technology & Software).

Answers (2)

manpreet Best Answer 2 years ago

I have a DataFrame with millions of rows and a lot of NaN values. For example:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     NaN            Drinks
    3     Apple          Technology
    4     NaN            Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     NaN            Drinks
    10    Google         Technology

My idea is to fill each NaN value in Company with one of the 2 most common values for its Area.

For example: if the most frequent companies in the Technology area are Apple and Google, I would like to fill the NaN values where df['Area'] == 'Technology' with one of those values, chosen at random.

I've already created a grouped DataFrame with the most common values; it looks something like this:

Area          Company
Technology    Google
Technology    Apple
Drinks        Coca Cola
Drinks        Pepsi
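
A lookup table like the one above could be built roughly as follows. This is only a sketch under assumptions: the helper name top_companies and its n parameter are made up for illustration, and it runs on the 11-row sample only, so it picks Gatorade rather than Pepsi for Drinks (Pepsi presumably comes from the full data).

```python
import pandas as pd

# The 11-row sample from the question (None stands in for NaN).
df = pd.DataFrame({
    "Company": ["Google", "Coca Cola", None, "Apple", None, "Gatorade",
                "Dell", "Apple", "Coca Cola", None, "Google"],
    "Area": ["Technology", "Drinks", "Drinks", "Technology", "Technology",
             "Drinks", "Technology", "Technology", "Drinks", "Drinks",
             "Technology"],
})


def top_companies(frame, n=2):
    """Hypothetical helper: the n most frequent companies per Area."""
    counts = (
        frame.dropna(subset=["Company"])      # ignore rows with no company
             .groupby(["Area", "Company"])
             .size()
             .reset_index(name="count")
    )
    # Most frequent first within each Area; ties broken alphabetically
    # so the result is deterministic.
    counts = counts.sort_values(
        ["Area", "count", "Company"], ascending=[True, False, True]
    )
    return (
        counts.groupby("Area")
              .head(n)[["Area", "Company"]]
              .reset_index(drop=True)
    )


df1 = top_companies(df)
print(df1)
```

Ties are broken alphabetically here; any other tie-breaking rule would work just as well for the fill step.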

The result should be something like this:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     Pepsi          Drinks
    3     Apple          Technology
    4     Google         Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     Pepsi          Drinks
    10    Google         Technology

I hope you can help me.

Thanks!!!

manpreet 2 years ago

I came up with this solution using random.choice (df1 is the grouped DataFrame with the most common companies per Area):

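import random

# Build one list of candidate companies per Area, align it to the rows of df,
# then pick one candidate at random for every row.
s = df1.groupby('Area').Company.apply(list).reindex(df.Area).apply(random.choice)
s.index = df.index

# fillna only touches the rows where Company is NaN.
df.Company = df.Company.fillna(s)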

df
Out[200]: 
    index   Company        Area
0       0    Google  Technology
1       1  CocaCola      Drinks
2       2  CocaCola      Drinks
3       3     Apple  Technology
4       4    Google  Technology
5       5  Gatorade      Drinks
6       6      Dell  Technology
7       7     Apple  Technology
8       8  CocaCola      Drinks
9       9     Pepsi      Drinks
10     10    Google  Technology
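
A possible variant, in case it helps with millions of rows: the sketch below draws a random company only for the rows that are actually missing, and takes a seed so the result can be reproduced. The function name fill_random and the seed parameter are my own assumptions, not part of the answer above, and it assumes every Area with a missing Company appears in the lookup table (otherwise it raises a KeyError).

```python
import numpy as np
import pandas as pd

# Sample data from the question (None stands in for NaN).
df = pd.DataFrame({
    "Company": ["Google", "Coca Cola", None, "Apple", None, "Gatorade",
                "Dell", "Apple", "Coca Cola", None, "Google"],
    "Area": ["Technology", "Drinks", "Drinks", "Technology", "Technology",
             "Drinks", "Technology", "Technology", "Drinks", "Drinks",
             "Technology"],
})

# Lookup table of the most common companies per Area, as in the question.
df1 = pd.DataFrame({
    "Area": ["Technology", "Technology", "Drinks", "Drinks"],
    "Company": ["Google", "Apple", "Coca Cola", "Pepsi"],
})


def fill_random(frame, lookup, seed=None):
    """Hypothetical helper: fill missing Company values with a random
    candidate from lookup for the same Area. Returns a new DataFrame."""
    rng = np.random.default_rng(seed)
    # Map each Area to its list of candidate companies.
    choices = lookup.groupby("Area")["Company"].apply(list).to_dict()

    out = frame.copy()
    missing = out["Company"].isna()
    # Draw one candidate per missing row; existing values are left alone.
    out.loc[missing, "Company"] = [
        choices[area][rng.integers(len(choices[area]))]
        for area in out.loc[missing, "Area"]
    ]
    return out


filled = fill_random(df, df1, seed=0)
print(filled)
```

Which company each missing row receives depends on the seed, so no fixed output is shown here.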
