Posted on 16 Aug 2022, this text provides information on Learning Aids/Tools related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.

Answers (2)


manpreet Best Answer 2 years ago

I have a bunch of data related to my company's incoming and outgoing calls, and I'm trying to detect spam callers in order to blacklist them. The features I've extracted from the data are a bit simple right now:

Caller Number: Who is calling
N_Unique: The number of different phones called
N_Answered: The number of times the call was connected
N_Missed: The number of times the call was ignored/missed
Avg_Duration: The average duration of answered calls
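As a rough sketch of how features like these could be derived, assume a raw call log with one row per call. The column names `caller`, `callee`, `answered`, and `duration_s` and the sample rows are hypothetical, not taken from the actual dataset:

```python
import pandas as pd

# Hypothetical call log: one row per call. Column names and values are assumptions.
calls = pd.DataFrame({
    "caller": ["A", "A", "A", "B", "B"],
    "callee": ["x1", "x2", "x2", "x1", "x1"],
    "answered": [True, False, True, True, True],
    "duration_s": [10.0, 0.0, 20.0, 120.0, 60.0],
})

def extract_features(calls: pd.DataFrame) -> pd.DataFrame:
    """Aggregate a per-call log into one feature row per caller."""
    grouped = calls.groupby("caller")
    features = pd.DataFrame({
        "N_Unique": grouped["callee"].nunique(),   # distinct numbers called
        "N_Answered": grouped["answered"].sum(),   # True counts as 1
    })
    features["N_Missed"] = grouped.size() - features["N_Answered"]
    # Average duration over answered calls only; callers with no
    # answered calls are left as NaN here.
    features["Avg_Duration"] = (
        calls[calls["answered"]].groupby("caller")["duration_s"].mean()
    )
    return features

features = extract_features(calls)
print(features)
```

How to treat callers with no answered calls (NaN, zero, or dropped) is a modelling choice this sketch leaves open.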

Initially my superior sent me the dataset with a binary field called "IsSpammer", and after a very simple Random Forest (1000 trees, scikit-learn defaults for everything else) I got 100% accuracy even after cross-validation. I took a closer look at the data and realized that anyone who called more than 20 different numbers inside the company, or whose average call duration was less than 20 seconds, was flagged as a spammer (which may be true). But then I asked my supervisor how this field was generated, and he told me he had set it using exactly the rule the random forest discovered.
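The effect described above can be reproduced on synthetic data. The following is a minimal sketch, not the original experiment: the feature ranges and sample size are invented, and it uses 100 trees rather than 1000 to keep it quick. The label is generated with the same rule (more than 20 unique numbers, or an average duration under 20 seconds), so the forest only has to rediscover two thresholds:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic features; the ranges are assumptions for illustration only.
n_unique = rng.integers(1, 60, size=n)          # distinct numbers called
avg_duration = rng.uniform(1.0, 300.0, size=n)  # seconds
X = np.column_stack([n_unique, avg_duration])

# Label produced by the hand-written rule, not by observed spam behaviour.
y = ((n_unique > 20) | (avg_duration < 20)).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.4f}")
```

The accuracy should come out very close to 1.0, because the target is a deterministic function of the features. Cross-validation cannot reveal this kind of leakage, since the held-out labels were produced by the same rule.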

I then realized that every ML problem I've faced came from a Kaggle contest that gave me a clear field to predict and a training set with "correct" values for that field. I don't know how to proceed with ML if I don't have the Spammer/NotSpammer information, since apparently if I come up with a clever (or not) rule, the best I can hope to discover is the rule I'm using in the first place. I'm not sure whether getting more features and making an obscure rule using all of them would help. Obviously I lack experience in this, so I figured I would ask.

Is this a problem that could benefit from using machine learning?


manpreet 2 years ago

Machine learning is the process of automatically discovering (inductively) the formula/rule that models a certain random variable. If you can discover the rule by yourself, you don't need the machine to aid you.

You (or your boss), as the domain expert, have written a rule to model the target class. As long as this rule is deemed good enough for the business, there's no need to spend resources to generate another model.
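To make the point concrete, the rule described in the question can be applied directly, with no model in between. This is only a sketch: the function name `is_spammer` is hypothetical, and the thresholds are the ones quoted in the question.

```python
def is_spammer(n_unique: int, avg_duration_s: float) -> bool:
    """Hand-written rule from the question: flag callers who reached more
    than 20 different numbers or whose answered calls average under 20 s."""
    return n_unique > 20 or avg_duration_s < 20

# Example callers as (N_Unique, Avg_Duration) pairs; values are made up.
callers = {"caller_a": (35, 90.0), "caller_b": (3, 12.5), "caller_c": (4, 180.0)}
blacklist = [name for name, (u, d) in callers.items() if is_spammer(u, d)]
print(blacklist)
```

A classifier trained on labels from this function can at best reproduce it, so the function itself is the cheaper and more transparent choice.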


Detecting spammers with artificially generated target class

Manpreet Singh
