OCR, or Optical Character Recognition, is a technology that extracts text from images. It is useful for a variety of tasks, from digitizing old documents to creating machine-readable text from images. Many online OCR tools are available, but they can come with ads, a risk of malware, and slow processing. In this article, we'll explore how to use Bash and Tesseract to perform OCR quickly and easily on your own computer.
The Bash Script
Here's the simple Bash script that we'll be using for OCR:
cd ~/Playground && tesseract ~/Playground/tesseract.png output && pbcopy < output.txt && rm tesseract.png output.txt
Let's break down what each part of this script does.
Storing and Naming Throwaway Files
First, it's important to have a safe place to store and create throwaway files. For this, we'll use the directory ~/Playground. This directory is a great place to store temporary files that we don't need to keep around permanently.
We'll also need an easy-to-remember name for our disposable image file. In this case, we'll use tesseract.png. This makes it easy to specify the input file for Tesseract without having to remember a long and complex filename.
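As a minimal sketch, the directory and file name can be kept in one place so the rest of a script never hard-codes a path. The function name playground_image and the PLAYGROUND override are illustrative assumptions, not part of the original script.

```bash
# Hypothetical helper: make sure the throwaway directory exists and print
# the path of the disposable image inside it.
# PLAYGROUND is an assumed override; it defaults to ~/Playground.
playground_image() {
    dir="${PLAYGROUND:-$HOME/Playground}"
    mkdir -p "$dir" || return 1
    printf '%s\n' "$dir/tesseract.png"
}
```

A script could then capture the path once with image="$(playground_image)" and reuse it for every later step.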
Now that we have our image file, we can use Tesseract to extract the text. The tesseract command takes two arguments: the input image and an output base name, to which Tesseract appends the .txt extension. In this case, we're using tesseract ~/Playground/tesseract.png output. This tells Tesseract to use tesseract.png as the input file and to write the recognized text to output.txt.
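Because the second argument is only a base name, Tesseract also accepts the special value stdout there, which prints the recognized text to standard output instead of creating a .txt file. The sketch below shows that variant; the function name ocr_to_stdout is a hypothetical helper, and it assumes tesseract is installed and on your PATH.

```bash
# Hypothetical helper: run Tesseract on an image and print the recognized text.
# Passing "stdout" as the output base makes Tesseract write to standard
# output instead of creating <base>.txt.
ocr_to_stdout() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    tesseract "$image" stdout
}
```

This form is convenient when the text is going straight into a pipe, since there is no intermediate file to delete afterwards.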
Copying Text to Clipboard
Once Tesseract has finished processing the image, we'll want to copy the resulting text to the clipboard. For this, we'll use the pbcopy command, which copies text to the clipboard on macOS.
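Since pbcopy only exists on macOS, a script that may run elsewhere needs a fallback. The following is a sketch under assumptions: copy_to_clipboard is a hypothetical name, and wl-copy and xclip are Linux tools that may or may not be installed on a given system.

```bash
# Hypothetical helper: copy standard input to the clipboard with whichever
# tool is available. pbcopy is the macOS tool used in this article; wl-copy
# (Wayland) and xclip (X11) are assumed Linux alternatives.
copy_to_clipboard() {
    if command -v pbcopy >/dev/null 2>&1; then
        pbcopy
    elif command -v wl-copy >/dev/null 2>&1; then
        wl-copy
    elif command -v xclip >/dev/null 2>&1; then
        xclip -selection clipboard
    else
        echo "no clipboard tool found" >&2
        return 1
    fi
}
```

It reads from a pipe or a redirect, for example copy_to_clipboard < output.txt.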
Finally, we'll want to clean up the temporary files that we created. We can do this using the rm command. In this case, we're deleting both tesseract.png and output.txt.
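Putting the steps together, the one-liner can be written as a small function with explicit paths, so it behaves the same regardless of the current directory and only deletes the files once the text has been copied. This is a sketch rather than a definitive version: the name ocr_clipboard and the optional directory argument are assumptions made for illustration.

```bash
# Hypothetical wrapper around the article's pipeline.
# $1 is an assumed optional directory argument; it defaults to ~/Playground.
ocr_clipboard() {
    dir="${1:-$HOME/Playground}"
    image="$dir/tesseract.png"
    base="$dir/output"

    if [ ! -f "$image" ]; then
        echo "missing image: $image" >&2
        return 1
    fi

    # Tesseract appends .txt to the output base name.
    tesseract "$image" "$base" || return 1

    # Copy the recognized text to the clipboard (macOS).
    pbcopy < "$base.txt" || return 1

    # Remove both throwaway files only after the copy succeeded.
    rm -f "$image" "$base.txt"
}
```

If either Tesseract or pbcopy fails, the function returns early and leaves the image in place so the run can be retried.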
Using Bash and Tesseract, we can perform OCR quickly and easily on our own computers. By storing temporary files in a safe location and using simple, memorable filenames, we can streamline the OCR process and avoid the pitfalls of online OCR tools. Give it a try and see how it works for you!
OCR Successful. Text Copied!