Posted on 16 Aug 2022, this text provides information on Learning Aids/Tools related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
I have a list of around 700 variables which I need to perform a variable cleanup on. What complicates things is there are different numeric codes which flag an invalid value and these differ by the variable type. I wanted to see if I can use some form of unsupervised learning to aid in this task. Would appreciate any advice/suggestions.
Let me elaborate on what I'm working with.
By "variable type" I don't mean the data type, since they're all numeric. I'm trying to classify them into categories like dollar amount, age, or number of something, based on the name of the variable, because the rules for the invalid flags differ by category.
Because of this, I'd like to classify my variables into things like:
Here's an example of what invalid values look like:
Invalid values for a variable of type "Number of items":
Invalid values for a variable of type "Dollar amounts":
Some additional points:
These variables have maximums, like 9 for number of things. But the maximum isn't the most reliable thing to filter by, as the same values can legitimately occur in other types like $ amounts.
The variable names can sometimes indicate the type of variable:
1) It could have a keyword in the variable name like "N_" to indicate that variable is number of items.
2) Sometimes the rule is not so simple and can be confounded by other keywords, e.g. N_ITEMS_PCT_50. This is actually a number of items with a percentage over 50%, rather than a percentage value.
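To make the naming rule concrete, here is a minimal sketch that guesses the category from the leading token of the name only, so that a later keyword such as PCT does not override an "N_" prefix. Only the "N_" prefix comes from my data; the other prefixes (AMT, AGE, PCT), the category labels and the function name classify_by_name are hypothetical placeholders.

```python
# Hypothetical prefix-to-category map. Only "N" is taken from the post;
# the other prefixes are assumed naming conventions for illustration.
PREFIX_RULES = {
    "N": "number_of_items",
    "AMT": "dollar_amount",
    "AGE": "age",
    "PCT": "percentage",
}


def classify_by_name(var_name):
    """Guess a category from the leading token of the name, or None if unknown."""
    tokens = var_name.upper().split("_")
    # Only the first token decides, so "PCT" later in the name is ignored.
    return PREFIX_RULES.get(tokens[0])


names = ["N_ITEMS_PCT_50", "PCT_UTIL", "AMT_BALANCE", "XYZ_1"]
guesses = {name: classify_by_name(name) for name in names}
```

Names that return None would still need another signal (or a manual look), which is where the other features come in.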
Some of the features I collected to help with measuring similarity:
1) The variable names, of course
2) Maximum values of each variable
3) The number of times an invalid flag (for each type) appears in each variable. I would calculate this by counting how many observations fall in the range of invalid values for that type. So for "number of items", I would count the observations with values from 6 to 9. I would then calculate another column for invalid dollar amounts by counting the observations with values from 99996 to 99999.
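As a rough sketch of how features 2) and 3) could be computed per variable, assuming each variable is available as a plain list of numbers: the two ranges are the ones mentioned above, while the sample data, INVALID_RANGES and variable_features are made up for illustration.

```python
# Invalid-code ranges (inclusive) from the examples above; other
# categories would need their own entries.
INVALID_RANGES = {
    "number_of_items": (6, 9),
    "dollar_amount": (99996, 99999),
}


def variable_features(values):
    """Return the maximum and, per category, the count of values in its invalid range."""
    feats = {"max": max(values)}
    for type_name, (lo, hi) in INVALID_RANGES.items():
        feats["n_invalid_" + type_name] = sum(lo <= v <= hi for v in values)
    return feats


# Made-up sample data, one list of observations per variable.
data = {
    "N_ITEMS_PCT_50": [0, 2, 9, 1, 7],
    "AMT_BALANCE": [1200, 99999, 350, 99997, 8],
}
features = {name: variable_features(vals) for name, vals in data.items()}
```

In this toy data the dollar variable also gets one hit in the 6 to 9 range (the legitimate value 8), which is the kind of overlap that makes a single count unreliable on its own.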
I'm interested to see whether this could be a viable approach, as I'd rather automate at least part of this than go through all 700 variables manually. Would appreciate any insight.
Thanks