Issues with Dates in Apache OpenNLP


Posted on 16 Aug 2022 under General Tech, Learning Aids/Tools.


Answers (1)

manpreet, Best Answer, 2 years ago

For a recent project to help me learn NLP, I am working on a number of documents, each of which contains a date. What I would like to do is read the unstructured text, identify the date or dates within it, convert them into a numeric format, and possibly set them in the document's metadata. (Note: since the documents being used all contain pseudo-information, the actual metadata of the files being read in is false.)

Recently I have been attempting to use OpenNLP in conjunction with Lucene to do so, and it works to a degree. However, if the date is written as "13 January 1990" or "2010/01/05", OpenNLP identifies only "January 1990" and "2010" respectively, not the entire date. Other date formats may have issues as well; I have yet to try them all. While I recognise that OpenNLP works on a statistical basis rather than a format basis, I can't help feeling that I'm making an elementary mistake.

Am I making a mistake? If not, is there an easy way to rectify this?
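
One possible workaround, sketched here rather than taken from OpenNLP itself, is to add a small rule-based pass for the date formats that the statistical model splits. The example below uses only the Java standard library (`java.util.regex` and `java.time`) and covers just the two formats quoted above. The class name `DateExtractor`, the method `extractDates`, and the reading of "2010/01/05" as year/month/day are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a rule-based fallback, not part of OpenNLP or Lucene.
final class DateExtractor {

    // Matches "13 January 1990": day, full English month name, four-digit year.
    private static final Pattern DAY_MONTH_YEAR = Pattern.compile(
        "\\b\\d{1,2} (?:January|February|March|April|May|June|July|August"
            + "|September|October|November|December) \\d{4}\\b",
        Pattern.CASE_INSENSITIVE);

    // Matches "2010/01/05"; assumed here to mean year/month/day.
    private static final Pattern YEAR_MONTH_DAY =
        Pattern.compile("\\b\\d{4}/\\d{2}/\\d{2}\\b");

    // STRICT resolution rejects impossible dates such as 31 February.
    private static final DateTimeFormatter DMY = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("d MMMM uuuu")
        .toFormatter(Locale.ENGLISH)
        .withResolverStyle(ResolverStyle.STRICT);

    private static final DateTimeFormatter YMD = DateTimeFormatter
        .ofPattern("uuuu/MM/dd")
        .withResolverStyle(ResolverStyle.STRICT);

    // Returns every recognised date in the order it appears in the text.
    static List<LocalDate> extractDates(String text) {
        Map<Integer, LocalDate> byOffset = new TreeMap<>();
        collect(DAY_MONTH_YEAR, DMY, text, byOffset);
        collect(YEAR_MONTH_DAY, YMD, text, byOffset);
        return new ArrayList<>(byOffset.values());
    }

    private static void collect(Pattern pattern, DateTimeFormatter formatter,
                                String text, Map<Integer, LocalDate> out) {
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            try {
                // Key by start offset so results keep document order.
                out.put(m.start(), LocalDate.parse(m.group(), formatter));
            } catch (DateTimeParseException e) {
                // Looked like a date but is not a valid one (e.g. 2010/13/45); skip it.
            }
        }
    }

    public static void main(String[] args) {
        String sample = "Signed on 13 January 1990 and revised 2010/01/05.";
        // Prints [1990-01-13, 2010-01-05]
        System.out.println(extractDates(sample));
    }
}
```

`LocalDate.toString()` yields the ISO form (for example `1990-01-13`), which could serve as the numeric format to store as metadata. A pass like this would complement the statistical name finder rather than replace it: each additional date format needs its own pattern, and ambiguous orderings such as day/month versus month/day still have to be decided per corpus.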

I understand that I may be able to construct my own trained model from a training data set. Is the data set used for the Apache OpenNLP models freely available, so that I may extend it? Are there any others that are freely available?

Is there a better way to do this? I've heard of Apache UIMA; the main reason I went with OpenNLP is its mention in Taming Text, published by Manning. I should note that the extraction of dates is the first stage of the project, and other data will be extracted later as well.

Many thanks for any response.
