manpreet
2 years ago
I'm trying to extract locations from blobs of text (NER/information extraction) and have tried many solutions, all of which are far too inaccurate for my needs: spaCy, Stanford NER, etc.
All of them are only about 80-90% accurate on my dataset (spaCy was around 70%). Another problem I'm having is that none of them gives a probability that means anything for these entities, so I have no confidence score and can't proceed accordingly.
I tried a super naive approach: splitting my blobs into individual words, extracting the surrounding context as features, and also using a location place-name lookup (30-40k place names) as a feature as well. Then I used just a classifier (XGBoost), and the results were much better once I trained it on about 3k manually labelled datapoints (out of 100k total, only 3k were locations): 95% precision for states/countries and about 85% for cities.
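The token-level approach described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the tiny toy dataset, the feature names, and the gazetteer are all made up, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost. The key point is that a plain classifier exposes `predict_proba`, which gives the per-token confidence that off-the-shelf NER taggers lack.

```python
# Sketch of token-level location classification with context-window
# features and a gazetteer flag. Names/data are illustrative only.
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

GAZETTEER = {"san", "francisco", "ca", "austin", "texas"}  # toy place-name lookup

def token_features(tokens, i):
    """Features for token i: the word, casing, gazetteer hit, and neighbours."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_title": tok.istitle(),
        "in_gazetteer": tok.lower() in GAZETTEER,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Toy training data: one label per token ('0' = not a location).
sents = [
    ("I moved to San Francisco CA last year".split(),
     ["0", "0", "0", "city", "city", "state", "0", "0"]),
    ("Austin Texas is growing fast".split(),
     ["city", "state", "0", "0", "0"]),
]
X = [token_features(toks, i) for toks, _ in sents for i in range(len(toks))]
y = [lab for _, labs in sents for lab in labs]

clf = make_pipeline(DictVectorizer(), GradientBoostingClassifier())
clf.fit(X, y)

# predict_proba gives a usable per-token confidence to threshold on,
# unlike the near-meaningless scores from the black-box taggers.
test = "She lives in San Francisco now".split()
probs = clf.predict_proba([token_features(test, i) for i in range(len(test))])
```

With real data you would use far richer context windows and the full 30-40k gazetteer, but the structure is the same: one feature dict per token, one predicted label and probability per token.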
This approach sucks, obviously, but why is it outperforming everything else I've tried? I think the black-box approach to NER just isn't working for my data. I tried spaCy's custom training and it really didn't seem like it was going to work. Not having a confidence score for each entity is kind of a killer too, since the probability these tools give you is almost meaningless.
Is there some way I can approach this problem better to improve my results even more? Shallow NLP with 2/3/4-grams? Another problem with my approach is that the classifier's output isn't a sequential entity; it's literally just classified word blobs that somehow need to be clustered back into one entity, i.e. San Francisco, CA comes out as 'city', 'city', '0', 'state' with no concept of them being the same entity.
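The re-clustering step mentioned above can be done by grouping consecutive tokens that share the same location label into one span. This is a hedged sketch, not the author's code: the function name and labels are illustrative, and it is a simplified version of the BIO-tag decoding that sequence labellers use.

```python
# Merge per-token classifier output back into multi-word entity spans
# by grouping runs of identical location labels.
from itertools import groupby

def merge_entities(tokens, labels, location_labels=("city", "state", "country")):
    """Group consecutive tokens with the same location label into one span."""
    spans, i = [], 0
    for label, group in groupby(labels):
        n = len(list(group))  # length of this run of identical labels
        if label in location_labels:
            spans.append((" ".join(tokens[i:i + n]), label))
        i += n
    return spans

tokens = ["San", "Francisco", ",", "CA"]
labels = ["city", "city", "0", "state"]
print(merge_entities(tokens, labels))
# → [('San Francisco', 'city'), ('CA', 'state')]
```

One caveat: plain run-grouping would wrongly merge two distinct adjacent cities into one span, which is exactly why BIO/IOB schemes add a "begin" marker; switching the classifier's labels to B-city/I-city would fix that without changing the overall approach.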
spacy example:
example blob: