Machine learning is the process of automatically discovering (inductively) the formula/rule that models a certain random variable. If you can discover the rule by yourself, you don't need the machine to aid you.
You (or your boss), as the domain expert, have written a rule to model the target class. As long as this rule is deemed good enough for the business, there's no need to spend resources to generate another model.
manpreet
Best Answer
2 years ago
I have a bunch of data related to my company's incoming and outgoing calls, and I'm trying to detect spam callers in order to blacklist them. The features I've extracted from the data are a bit simple right now:
Initially my superior sent me the dataset with a binary field called "IsSpammer", and after fitting a very simple Random Forest (1000 trees, scikit-learn defaults for everything else) I got 100% accuracy, even after cross-validation. I took a closer look at the data and realized that anyone who called more than 20 different numbers inside the company or had an average call duration of less than 20 seconds was flagged as a spammer (which may be true). But when I asked my supervisor how this field was generated, he told me he had set it using exactly the rule the random forest discovered.
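To make the circularity concrete, here is a minimal sketch of what happened. It is not the original setup: the callers are synthetic (every combination of 1 to 40 distinct numbers and 1 to 60 seconds average duration), the function names are made up for illustration, and a brute-force threshold search stands in for the random forest. The only thing taken from the question is the labelling rule itself.

```python
def label_by_rule(distinct_numbers, avg_duration):
    # The hand-written rule from the question: more than 20 distinct
    # numbers called, or an average call duration under 20 seconds.
    return distinct_numbers > 20 or avg_duration < 20


def make_callers():
    # Synthetic data (an assumption): one caller per feature combination,
    # labelled purely by the rule above.
    return [
        (d, t, label_by_rule(d, t))
        for d in range(1, 41)
        for t in range(1, 61)
    ]


def fit_thresholds(callers):
    # Stand-in learner: try every rule of the form
    # "distinct > a or duration < b" and keep the most accurate one.
    best = None
    for a in range(0, 41):
        for b in range(0, 62):
            correct = sum((d > a or t < b) == y for d, t, y in callers)
            if best is None or correct > best[0]:
                best = (correct, a, b)
    return best


callers = make_callers()
correct, learned_a, learned_b = fit_thresholds(callers)
accuracy = correct / len(callers)
print(learned_a, learned_b, accuracy)
```

On this synthetic grid the search lands on thresholds of 20 and 20 with an accuracy of 1.0. The "model" has learned nothing about spam; it has only reverse-engineered the rule that produced the labels, which is the same thing the perfect cross-validation score was reporting.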
I then realized every ML problem I've faced came from a Kaggle contest that gave me a clear field to predict, and a training set that had "correct" values for that field. I don't know how to proceed with ML if I don't have Spammer/NotSpammer labels, since apparently if I come up with a clever (or not) rule, then the best I can hope to discover is the rule I'm using in the first place... I'm not sure if getting more features and making an obscure rule using all of them would help. Obviously I lack experience in this, so I figured I would ask.
Is this a problem that could benefit from using machine learning?