Machine Learning, Imputing values that should be blank


Posted on 16 Aug 2022 under General Tech, Learning Aids/Tools.


Answers (2)

manpreet (Best Answer) 2 years ago

Sometimes data sets contain a variable that indicates both whether an event occurred and the value associated with that event.

As an example, say a teacher wants to predict the grades of his students. Some of the students may have been in his class last year, and he can use that grade as a variable. However, maybe only 20% of the students were in his class, so the remaining 80% will have a Null value. Most ML algorithms cannot accept Null values, so the variable would have to be imputed somehow.

I cannot think of an imputation method that would make sense here. The standard mean/mode imputation would imply that all students were in the class, and since the variable is heavily unbalanced and 80% of its values would be imputed, I don't imagine it would hold any valuable information.
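To make the concern concrete, here is a minimal sketch using made-up grades on a 0-100 scale (the numbers and the names `prev_grade`, `was_in_class`, and `filled_grade` are assumptions for illustration, not from any real data set). It shows how mean imputation makes most students look identical, and contrasts it with one common alternative: keeping an explicit "was in class" indicator next to a filled value.

```python
from statistics import mean

# Hypothetical data: None means the student was not in last year's class.
# 2 of 10 students (20%) have a previous grade.
prev_grade = [None, 72.0, None, None, None, None, 88.0, None, None, None]

observed = [g for g in prev_grade if g is not None]
fill = mean(observed)  # mean of the two observed grades

# Plain mean imputation: every missing entry gets the same value.
mean_imputed = [fill if g is None else g for g in prev_grade]

# Alternative encoding (an assumption, not the only option):
# an indicator column plus a placeholder value for the grade.
was_in_class = [0 if g is None else 1 for g in prev_grade]
filled_grade = [0.0 if g is None else g for g in prev_grade]

print(mean_imputed.count(fill), "of", len(prev_grade), "values equal the mean")
print(list(zip(was_in_class, filled_grade)))
```

With the indicator kept alongside the value, a model can at least tell "no previous grade" apart from "average previous grade", which plain mean imputation hides.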

Are there any methods to deal with this scenario?

manpreet 2 years ago

There is usually latent information in the null/missing value, which a tree model or deep network can single out.

It gets a bit trickier in the peculiar case where 80% of the response is missing. It might be a good idea to wait for more data or try imputation with a probabilistic model (maybe a Bayesian model for imputation).

Anyhow, being able to classify the non-passing students is probably just as important though ;)

A boosted zero-or-one-inflated beta regression might help you out here, as you could classify the non-passing students and run a beta regression over the grades at the same time. You would be sharing information in the gradient boosting as well.
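For readers unfamiliar with the model, here is a minimal sketch of the zero-or-one-inflated beta log-density, assuming grades are scaled to [0, 1] and using one common parameterization (point masses `p0` and `p1` at 0 and 1, and a beta part with mean `mu` and precision `phi`). The function name `zoib_logpdf` is hypothetical, other parameterizations exist, and the boosting part is not shown; a boosted version would use a loss like this as its training objective.

```python
import math

def zoib_logpdf(y, p0, p1, mu, phi):
    """Log-density of a zero-or-one-inflated beta distribution.

    p0, p1 : probability mass at exactly 0 and exactly 1 (p0 + p1 < 1)
    mu, phi: mean and precision of the beta part, so that
             a = mu * phi and b = (1 - mu) * phi
    """
    if y == 0.0:
        return math.log(p0)
    if y == 1.0:
        return math.log(p1)
    a, b = mu * phi, (1.0 - mu) * phi
    # Log of the Beta(a, b) density at y, for 0 < y < 1.
    log_beta = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))
    # The continuous part carries the remaining mass 1 - p0 - p1.
    return math.log1p(-(p0 + p1)) + log_beta

# Example: 20% of grades are exactly 0, 10% are exactly 1.
print(zoib_logpdf(0.0, p0=0.2, p1=0.1, mu=0.5, phi=2.0))
print(zoib_logpdf(0.3, p0=0.2, p1=0.1, mu=0.5, phi=2.0))
```

The discrete branches play the role of the classifier (is the grade exactly 0 or 1?) and the beta branch models the grades in between, so both parts are fitted through one likelihood.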

I probably wouldn’t bother with deep learning here, unless you’ve got a substantial amount of data. LightGBM and XGBoost tend to work just as well, if not better, on structured 2D tensors (i.e. tabular data).

