Detecting spammers with artificially generated target class



Posted on 16 Aug 2022, this text provides information on Learning Aids/Tools related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.


Answers (2)

manpreet (Best Answer) 2 years ago

 

I have a bunch of data related to my company's incoming and outgoing calls, and I'm trying to detect spam callers in order to blacklist them. The features I've extracted from the data are a bit simple right now:

  • Caller Number: Who is calling
  • N_Unique: The number of different phones called
  • N_Answered: The number of times the call was connected
  • N_Missed: The number of times the call was ignored/missed
  • Avg_Duration: The average duration of answered calls

Initially my superior sent me the dataset with a binary field called "IsSpammer", and after a very simple Random Forest (1000 trees, scikit-learn defaults for everything else) I got 100% accuracy, even after cross-validation. I took a closer look at the data and realized that anyone who called more than 20 different numbers inside the company, or had an average call duration of less than 20 seconds, was flagged as a spammer (which may be true). I then asked my supervisor how this field was generated, and he told me he had set it using exactly the rule the random forest discovered.
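To illustrate why the accuracy was perfect, here is a minimal sketch in plain Python. It is not the Random Forest from the post: it swaps in a brute-force threshold search as a stand-in learner, and the data is synthetic. The names `label_by_rule`, `make_callers` and `fit_threshold_rule` are hypothetical. The point is only that when the label is a deterministic function of the features, a learner can reproduce it exactly.

```python
import random

def label_by_rule(n_unique, avg_duration):
    # The labeling rule described in the post: more than 20 distinct
    # numbers called, or average call duration under 20 seconds.
    return n_unique > 20 or avg_duration < 20

def make_callers(n=300, seed=0):
    # Synthetic callers (an assumption for illustration, not real data).
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        n_unique = rng.randint(1, 60)
        avg_duration = rng.randint(1, 120)  # seconds
        rows.append((n_unique, avg_duration,
                     label_by_rule(n_unique, avg_duration)))
    return rows

def fit_threshold_rule(rows):
    # Stand-in learner: try every pair of observed values as thresholds
    # for "n_unique > t_u or avg_duration < t_d" and keep the most accurate.
    best_acc, best_t_u, best_t_d = 0.0, None, None
    unique_values = sorted({u for u, _, _ in rows})
    duration_values = sorted({d for _, d, _ in rows})
    for t_u in unique_values:
        for t_d in duration_values:
            correct = sum((u > t_u or d < t_d) == y for u, d, y in rows)
            acc = correct / len(rows)
            if acc > best_acc:
                best_acc, best_t_u, best_t_d = acc, t_u, t_d
    return best_acc, best_t_u, best_t_d

rows = make_callers()
accuracy, t_unique, t_duration = fit_threshold_rule(rows)
print(accuracy, t_unique, t_duration)
```

The search fits the labels perfectly because they contain no information beyond the two features, so the learned thresholds simply mirror the rule that produced the labels.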

I then realized that every ML problem I've faced came from a Kaggle contest that gave me a clear field to predict and a training set with "correct" values for that field. I don't know how to proceed with ML if I don't have the Spammer/NotSpammer information, since apparently if I come up with a clever (or not) rule, then the best I can hope to discover is the rule I'm using in the first place... I'm not sure whether getting more features and making an obscure rule using all of them would help. Obviously I lack experience in this, so I figured I would ask.

Is this a problem that could benefit from using machine learning?

manpreet 2 years ago

Machine learning is the process of automatically discovering (inductively) the formula/rule that models a certain random variable. If you can discover the rule by yourself, you don't need the machine to aid you.

You (or your boss), as the domain expert, have written a rule to model the target class. As long as this rule is deemed good enough for the business, there's no need to spend resources generating another model.
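If the rule is accepted as is, it can be applied directly with no model at all. A minimal sketch, assuming the thresholds quoted in the question (more than 20 distinct numbers, or an average duration under 20 seconds); the function name, the record layout and the sample callers are made up for illustration:

```python
def is_spammer(n_unique, avg_duration, max_unique=20, min_avg_duration=20):
    # Hypothetical helper encoding the rule from the question.
    return n_unique > max_unique or avg_duration < min_avg_duration

# Made-up example records, not real call data.
callers = [
    {"caller": "555-0101", "n_unique": 35, "avg_duration": 45.0},
    {"caller": "555-0102", "n_unique": 3, "avg_duration": 12.5},
    {"caller": "555-0103", "n_unique": 4, "avg_duration": 180.0},
]

blacklist = [
    c["caller"]
    for c in callers
    if is_spammer(c["n_unique"], c["avg_duration"])
]
print(blacklist)
```

Keeping the thresholds as parameters makes it easy to adjust the rule if the business later decides that 20 is too strict or too lenient.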

