This is a walkthrough for installing Tesseract on Windows and configuring it so you can use it programmatically from Python.
As a bonus, I show how you can parallelize execution on Windows with Pool from the multiprocessing package.
Also, the recon screenshot tool for Bug Bounties I was thinking of is GoWitness. https://github.com/sensepost/gowitness
Use your powers for good... or evil. Whatever, I'm a YouTuber, not a cop.
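If you just want the gist of the setup before watching, here is a minimal sketch of pointing pytesseract at the Tesseract exe and pulling text out of an image. It is not the exact code from the video. The install path is an assumption (a common default for the Windows installer, so check your own machine), and the helper names find_tesseract and extract_text are made up for illustration. It assumes you have run pip install pytesseract pillow.

```python
import os
import shutil

# ASSUMPTION: a common default location for the Windows installer.
# Change this to wherever tesseract.exe actually lives on your machine.
ASSUMED_TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=ASSUMED_TESSERACT_EXE):
    """Return the path to the tesseract executable, or None if not found.

    Prefers whatever is on PATH, then falls back to the given file path.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    return fallback if os.path.isfile(fallback) else None


def extract_text(image_path, tesseract_exe=None):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so the helper above works even without these packages.
    import pytesseract
    from PIL import Image

    exe = tesseract_exe or find_tesseract()
    if exe is None:
        raise FileNotFoundError("tesseract executable not found")

    # Tell pytesseract which binary to call (not needed if it is on PATH).
    pytesseract.pytesseract.tesseract_cmd = exe

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)
```

Adding the Tesseract folder to your Path makes the tesseract_cmd line unnecessary, but setting it explicitly is handy on machines where you cannot edit environment variables.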
00:00 Intro/What is Tesseract
00:55 Installing pytesseract with pip
01:55 Installing Tesseract OCR exe
04:00 Extracting text
04:12 Adding Tesseract to Path
09:45 Creating a function to classify documents
19:50 A bit about what an image file really is
22:10 Handling errors gracefully
22:35 Adding Return Codes
26:15 Vertical Scaling using Multiprocessing
30:00 Putting it all together
35:00 Demo of how multiprocessing doesn't work in a notebook on Windows
36:14 Conclusion
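The chapters on classifying documents, return codes, and multiprocessing can be sketched roughly like this. Again, this is an illustration and not the code from the video: the keyword table, the labels, the return code values, and the function names are all hypothetical. The part that matters on Windows is that the worker function lives at the top level of a .py file and the Pool is created under an if __name__ == "__main__" guard. Windows starts worker processes by spawning a fresh interpreter that re-imports your module, which is also why a worker function defined in a notebook cell tends not to work there.

```python
import os
import sys
from multiprocessing import Pool

# Hypothetical return codes, for illustration only.
OK = 0
MISSING_FILE = 1
OCR_FAILED = 2

# Hypothetical keyword table: label -> words that suggest that label.
KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "login_page": ("password", "sign in"),
}


def classify_text(text):
    """Return the first label whose keywords appear in the text."""
    lowered = text.lower()
    for label, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return label
    return "unknown"


def process_image(path):
    """OCR one image and return (path, return_code, label or None)."""
    if not os.path.isfile(path):
        return path, MISSING_FILE, None
    try:
        import pytesseract
        from PIL import Image

        with Image.open(path) as img:
            text = pytesseract.image_to_string(img)
    except Exception:
        # Deliberately broad: one bad file (or a missing package)
        # should not take down the whole batch.
        return path, OCR_FAILED, None
    return path, OK, classify_text(text)


def run_parallel(paths, workers=4):
    """Spread the images across worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(process_image, paths)


if __name__ == "__main__":
    # Usage: python this_script.py image1.png image2.png ...
    image_paths = sys.argv[1:]
    if image_paths:
        for row in run_parallel(image_paths):
            print(row)
```

Returning a code instead of raising keeps the results list the same length as the input list, so you can see afterwards which files failed and why.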
How to Install Tesseract OCR on Windows and use it with Python | NatokHD