Posted on 16 Aug 2022, this text provides information on Syllabus Queries related to Course Queries. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
I'm just looking for advice on how I can get my code to run faster. It's pretty quick right now when searching through 30 three-page PDFs, but I imagine that once there are thousands of files to search it will take longer than I'd like. I could change SearchOption.AllDirectories to TopDirectoryOnly. I've done some testing, though, and it seems that what takes the longest is searching within the files, not enumerating the directory.
The major bottleneck is most likely in the ReadPdfFile method as we are dealing with a PDF file.
In your ReadPdfFile method, a PdfReader is created to read through every page of the document looking for searchText, and the numbers of the pages on which searchText is found are stored in a List named pages. Once the reader has run through every page, the method returns null or the file name, depending on whether the number of matching pages is 0.
What you could do is return as soon as you have found the text, so that you don't have to look through the rest of the document for nothing.
The method has been renamed to better reflect what it actually does, and the return type has been changed to bool, since we only need to know whether the file contains the search text.
// Assumes iTextSharp 5.x, plus: using System.IO; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public bool SearchPdfFile(string fileName, string searchText)
{
    /* Technically speaking this should not happen, since "you" are calling it,
       therefore this should be handled critically.
       if (!File.Exists(fileName)) return false; // original workflow
    */
    if (!File.Exists(fileName))
        throw new FileNotFoundException("File not found", fileName);

    using (PdfReader reader = new PdfReader(fileName))
    {
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // A fresh strategy per page: a reused SimpleTextExtractionStrategy
            // keeps the text of earlier pages in its buffer.
            var strategy = new SimpleTextExtractionStrategy();
            var currentPageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
            if (currentPageText.Contains(searchText))
                return true; // stop at the first page that matches
        }
    }
    return false;
}
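Since the method now returns a bool, it can be used directly as a filter over the directory listing. The following is only a sketch of how that might look: the class name PdfSearch, the method FindMatchingFiles and the containsText delegate are made up for illustration and do not come from the original code. It uses Directory.EnumerateFiles, which hands back paths lazily, so the first matches can be shown before the whole tree has been walked.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PdfSearch
{
    // Lazily yields the path of every *.pdf file under `root` for which
    // `containsText` returns true. `containsText` is a hypothetical hook;
    // in the scenario above it would wrap a call to SearchPdfFile.
    public static IEnumerable<string> FindMatchingFiles(
        string root,
        SearchOption option,
        Func<string, bool> containsText)
    {
        // EnumerateFiles streams paths as they are discovered, whereas
        // GetFiles builds the complete array before returning.
        return Directory.EnumerateFiles(root, "*.pdf", option)
                        .Where(containsText);
    }
}
```

A call might look like PdfSearch.FindMatchingFiles(folder, SearchOption.AllDirectories, path => SearchPdfFile(path, searchText)), where folder and searchText are whatever the caller already has. Passing the check in as a delegate keeps the directory walk independent of the PDF library, so the two parts can be timed separately.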
manpreet
Best Answer
2 years ago