
Mindee docTR - Probably the Best Open-Source OCR

15.3K views
Apr 18, 2022
15:32

Do you want to build an ML pipeline to automate data extraction from business documents (receipts, invoices, forms)? Then your first step should be to integrate OCR for text extraction. OCR quality must be good, because the whole pipeline depends on the quality of the initial text extraction. If the extracted text is accurate, the downstream ML models can classify it properly. I spent time researching available OCR solutions, and I think Mindee docTR is currently one of the best open-source options. Check the video, where I run and show multiple tests.

Mindee docTR GitHub: https://github.com/mindee/doctr
SRD Receipts dataset: https://expressexpense.com/blog/free-receipt-images-ocr-machine-learning-dataset/
Sparrow GitHub: https://github.com/katanaml/sparrow/tree/main/sparrow-ocr

0:00 Introduction
2:41 Mindee docTR
5:27 Test 1
7:43 Test 2
9:12 Test 3
11:58 Test 4
13:19 Test 5
14:21 Summary

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/andrejusb
- LinkedIn: https://www.linkedin.com/in/andrej-baranovskij/
- Medium: https://medium.com/@andrejusb

#OCR #MachineLearning #Python
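For readers who want to try the OCR step described above, here is a minimal sketch in Python. It is not the code from the video or from the Sparrow repository. It assumes the `python-doctr` package is installed with one of its deep learning backends, and that the exported result follows docTR's nested pages / blocks / lines / words layout. The helper names `run_doctr` and `extract_lines`, and the `min_confidence` threshold of 0.5, are illustrative choices rather than part of docTR.

```python
from typing import Any, Dict, List


def run_doctr(image_path: str) -> Dict[str, Any]:
    """Run docTR's pretrained end-to-end OCR predictor on a single image.

    Assumes `pip install python-doctr` plus a supported backend; pretrained
    weights are downloaded the first time the predictor is created.
    """
    # Imported lazily so the parsing helper below works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    doc = DocumentFile.from_images(image_path)
    # export() converts the result into plain nested dicts and lists.
    return model(doc).export()


def extract_lines(export: Dict[str, Any], min_confidence: float = 0.5) -> List[str]:
    """Flatten an exported OCR result into a list of text lines.

    Words whose recognition confidence is below `min_confidence` are dropped,
    and lines left with no words are skipped.
    """
    lines: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["value"]
                    for word in line.get("words", [])
                    if word.get("confidence", 0.0) >= min_confidence
                ]
                if words:
                    lines.append(" ".join(words))
    return lines
```

With a receipt image on disk (the file name `receipt.jpg` is hypothetical), `extract_lines(run_doctr("receipt.jpg"))` would return the recognized text line by line, ready to pass to a downstream classifier. Filtering on confidence is one simple way to keep low-quality words out of later pipeline stages; the right threshold depends on the documents.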

