Use of Morphological Operations in Text Reading





An application of erosion and dilation to reading the text on a receipt.



We will use morphological operations (erode and dilate) in a practical application! In this activity, we shall use them to extract handwritten or printed text that has lines running through it. We shall also draw on concepts we have learned in previous activities.
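For readers new to these two operators, here is a tiny sketch in Python using SciPy's `ndimage.binary_erosion` and `ndimage.binary_dilation` (not the software used in this activity) on a toy 7x7 image. The 3x3 square structuring element is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import ndimage

# Toy binary image: a 3x3 white (True) square on a 7x7 black background.
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True

# Illustrative structuring element: a 3x3 square.
se = np.ones((3, 3), dtype=bool)

# Erosion keeps a pixel only if the structuring element, centered there,
# fits entirely inside the foreground.
eroded = ndimage.binary_erosion(img, structure=se)

# Dilation turns a pixel on if the structuring element, centered there,
# touches at least one foreground pixel.
dilated = ndimage.binary_dilation(img, structure=se)

print(eroded.sum())   # only the center pixel of the square survives
print(dilated.sum())  # the square grows to 5x5
```

Erosion shrinks white regions and dilation grows them; thinning the letters later on relies on the shrinking behavior.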





To start, we take a picture (or scan) of some text. For our purpose, we used an image of a receipt form from MicroData Systems and Management Inc.; as written at the top, it is a Demo Checklist (see Figure 1a). The image was a little tilted, so we corrected it with an external program, GIMP. (Thank you, GIMP! :P) We then cropped the portion of the image inside the red box in Figure 1a to get Figure 1b. We chose this portion because it contains handwritten text along with the lines of the form.


Figure 1. The original receipt form (a) and the cropped image (b) to be used in this experiment.


And now we’re ready to roll…


First, we ‘clean’ the image by removing the lines. Remember our activity entitled ‘Enhancement in the Frequency Domain’? The one where we removed the lines from an image of the moon and the weave pattern from a painting? We’ll do that once more. If you are unfamiliar with what I am talking about, you’d better check out my previous blog post. 😛 Basically, we go to the Fourier domain, remove the unwanted lines there, and then go back to the ‘real’ world. Figure 2a shows the Fourier transform of the cropped image. We then create a mask (Figure 2b) to remove the lines. The ‘cross’ in the spectrum corresponds to the horizontal and vertical lines in the original image. We then multiply the FFT of the original image by the mask, producing Figure 2c.
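A minimal sketch of the mask-and-multiply step in Python/NumPy, assuming a grayscale image stored as a 2D array. The helper names `cross_mask` and `filter_spectrum`, and the `half_width` and `keep_radius` values, are hypothetical; the actual mask in Figure 2b was made by hand for this particular image. Keeping a small block at the center preserves the zero-frequency (average brightness) term, which the cross would otherwise wipe out.

```python
import numpy as np

def cross_mask(shape, half_width=2, keep_radius=5):
    """Hypothetical helper: mask for a centered (fftshift-ed) spectrum.

    Zeros a horizontal and a vertical band through the center (the 'cross').
    Horizontal lines in the image put their energy on the vertical axis of
    the spectrum, and vertical lines on the horizontal axis. A small square
    around the center is kept so the DC / low-frequency content survives.
    """
    rows, cols = shape
    cr, cc = rows // 2, cols // 2  # fftshift places zero frequency here
    mask = np.ones(shape, dtype=float)
    mask[cr - half_width:cr + half_width + 1, :] = 0.0  # horizontal arm
    mask[:, cc - half_width:cc + half_width + 1] = 0.0  # vertical arm
    mask[cr - keep_radius:cr + keep_radius + 1,
         cc - keep_radius:cc + keep_radius + 1] = 1.0   # keep the center
    return mask

def filter_spectrum(image, mask):
    """Multiply the centered FFT of a grayscale image by the mask."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return spectrum * mask

# Usage on a stand-in image (random values, not the receipt).
img = np.random.default_rng(0).random((64, 64))
mask = cross_mask(img.shape)
F_filtered = filter_spectrum(img, mask)
```

The band widths would need tuning by eye for a real scan, much like drawing the mask by hand.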


Figure 2. The Fourier Spectrum. The original spectrum (a) of the cropped image and the mask used (b). When the original spectrum and the mask are multiplied, the filtered Fourier spectrum (c) is obtained.


Next, we bring Figure 2c back to the ‘real’ world by taking its inverse FFT. Figure 3 compares the original image with the inverse FFT of Figure 2c. As we can see, the lines are removed while the handwritten text is still visible. (Although I really can’t read some of the handwriting. T.T)
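The inverse step can be sketched the same way in Python/NumPy (the helper name `spectrum_to_image` is hypothetical). The toy demo uses a synthetic blank ‘form’ with a single dark horizontal rule rather than the receipt: zeroing the vertical axis of its centered spectrum, except the DC term, leaves a uniform page.

```python
import numpy as np

def spectrum_to_image(centered_spectrum):
    """Go back to the spatial domain: undo fftshift, inverse FFT, keep the real part.

    The imaginary part is only round-off noise when the mask is symmetric
    about the zero-frequency point.
    """
    return np.real(np.fft.ifft2(np.fft.ifftshift(centered_spectrum)))

# Synthetic 'form': white page with a dark horizontal rule on row 20.
img = np.ones((64, 64))
img[20, :] = 0.0
F = np.fft.fftshift(np.fft.fft2(img))

# A horizontal rule puts all of its energy in the vertical axis of the
# centered spectrum (column 32). Zero that column but restore the DC term,
# which holds the mean brightness.
dc = F[32, 32]
F[:, 32] = 0.0
F[32, 32] = dc

restored = spectrum_to_image(F)
print(restored.min(), restored.max())  # the rule is gone: the page is flat
```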


Figure 3. The original cropped image (a) and the filtered cropped image (b). Notice that the horizontal and vertical lines have vanished.


Now we binarize Figure 3b and then process it so that each handwritten letter is one pixel thick. *whew* First, we binarize it using im2bw with a threshold of 0.45. Figure 4a shows the black-and-white image. Next, we use the erode morphological operator to make the letters one pixel thick. However, to my dismay :’( I was not able to make the letters fully one pixel thick. After trying different structuring elements, I found that applying a horizontal structuring element followed by a vertical one gives letters that are more or less 2 pixels thick. Figure 4b shows the eroded image.
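Here is a rough Python equivalent of these two steps, using SciPy rather than the im2bw and erode functions used above. Two assumptions: the gray levels are rescaled to [0, 1] before applying the 0.45 threshold, and the image is inverted so the dark text becomes the foreground (True), since erosion shrinks the foreground. The helper names and the structuring-element length of 3 are hypothetical.

```python
import numpy as np
from scipy import ndimage

def binarize(gray, level=0.45):
    """im2bw-style threshold: pixels brighter than `level` become True (white)."""
    gray = np.asarray(gray, dtype=float)
    # Assumption: rescale to [0, 1] first, because the inverse-FFT output is
    # not guaranteed to lie in that range. `or 1.0` avoids dividing by zero.
    gray = (gray - gray.min()) / (np.ptp(gray) or 1.0)
    return gray > level

def thin_strokes(bw_text, length=3):
    """Erode with a horizontal line element, then with a vertical one.

    `bw_text` must have the text as True (foreground).
    """
    horizontal = np.ones((1, length), dtype=bool)
    vertical = np.ones((length, 1), dtype=bool)
    out = ndimage.binary_erosion(bw_text, structure=horizontal)
    return ndimage.binary_erosion(out, structure=vertical)

# Toy example: a dark 5-pixel-thick stroke on white paper.
gray = np.ones((9, 12))
gray[2:7, 2:10] = 0.1
bw = binarize(gray)   # paper -> True, stroke -> False
text = ~bw            # invert so the stroke is the foreground
thin = thin_strokes(text)
print(text.sum(), thin.sum())
```

Note that a horizontal erosion followed by a vertical one is equivalent to a single erosion with a 3x3 square, so each pass trims one pixel from every side of a stroke rather than guaranteeing a one-pixel result.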


Figure 4. The binarized version of the filtered image (a) and the eroded image (b). The red box in (b) is zoomed in as seen in (c).


As you can see in Figure 4, the erode morphological operation makes the letters thinner. Figure 4c is a zoom-in of the red box in Figure 4b. As you can see, we were not able to make the letters exactly one pixel thick; at some points the strokes are still thick. Other structuring elements, or combinations of them, may be able to make the letters one pixel thick. One may also define a custom erosion algorithm that does this.
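As one example of such an algorithm, here is a sketch of Zhang-Suen thinning, a standard iterative thinning method (not the method used in this activity, and not plain erosion). It deletes boundary pixels in two alternating sub-passes but keeps any pixel whose removal could split a stroke or erase an end point. It is written in plain Python/NumPy for clarity rather than speed, and expects the text as True (foreground).

```python
import numpy as np

def zhang_suen_thin(binary):
    """Zhang-Suen thinning of a boolean image (foreground = True)."""
    # Pad with zeros so every foreground pixel has 8 neighbors.
    img = np.pad(np.asarray(binary, dtype=np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            rows, cols = np.nonzero(img)
            for r, c in zip(rows, cols):
                # Neighbors P2..P9, clockwise starting from north.
                p2, p3, p4, p5, p6, p7, p8, p9 = (int(v) for v in (
                    img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                    img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                    img[r, c - 1], img[r - 1, c - 1]))
                nb = [p2, p3, p4, p5, p6, p7, p8, p9]
                b = sum(nb)  # number of foreground neighbors
                # Number of 0 -> 1 transitions around the ring P2..P9, P2.
                a = sum(nb[i] == 0 and nb[(i + 1) % 8] == 1 for i in range(8))
                if step == 0:  # removes south-east boundary and north-west corners
                    cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:          # removes north-west boundary and south-east corners
                    cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if 2 <= b <= 6 and a == 1 and cond:
                    to_delete.append((r, c))
            # Delete simultaneously, after the whole sub-pass has been checked.
            for r, c in to_delete:
                img[r, c] = 0
            if to_delete:
                changed = True
    return img[1:-1, 1:-1].astype(bool)

# Usage: a 3-pixel-thick bar.
bar = np.zeros((5, 10), dtype=bool)
bar[1:4, 1:9] = True
thin = zhang_suen_thin(bar)
```

Here the 3-pixel-thick bar thins to a single-pixel line along its middle row, and a stroke that is already one pixel thick is left unchanged. One known caveat of this method is that an isolated 2x2 blob is erased completely.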


I have also applied this method to ‘easier’ parts of the Demo Checklist. Figure 5 shows my results. Figure 5a shows the original cropped image. As you can see, there are only horizontal lines, the text is not handwritten, and the lines and the text do not overlap. Figure 5b shows the image after the lines are removed; other elements are gone as well. Next, we binarize it, as seen in Figure 5c. The text is now cleanly isolated! Lastly, we apply the erode function, again with a horizontal and then a vertical structuring element. Figure 5d shows the result.


Figure 5. Another example. The original cropped image (a) is filtered to remove the lines (b). It is then binarized (c) and eroded (d).


We zoom in on “DEMO CHECKLIST” in Figure 5d. As seen in Figure 6, most of the letters are one pixel thick, which is better than the result for the handwriting. However, some letters are still thicker than one pixel. What we can infer from this is that, generally, letters that are not uniform (handwritten) or that have lines over them are harder to reduce to one pixel than computer-printed letters without lines over them.
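One rough way to put a number on ‘most of the letters are one pixel thick’ is to count how many foreground pixels sit inside a completely filled 2x2 block, since an ideal one-pixel-thick stroke contains no such block. The `thick_fraction` helper below is a hypothetical heuristic, not a measurement made in this activity, and it expects the text as True.

```python
import numpy as np

def thick_fraction(bw):
    """Hypothetical heuristic: share of foreground pixels that belong to at
    least one all-foreground 2x2 block (0.0 = no thick spots found)."""
    b = np.asarray(bw, dtype=bool)
    # True at the top-left corner of every fully filled 2x2 block.
    block = b[:-1, :-1] & b[1:, :-1] & b[:-1, 1:] & b[1:, 1:]
    thick = np.zeros_like(b)
    # Mark all four pixels of each such block.
    thick[:-1, :-1] |= block
    thick[1:, :-1] |= block
    thick[:-1, 1:] |= block
    thick[1:, 1:] |= block
    total = b.sum()
    return float(thick.sum() / total) if total else 0.0

# A one-pixel line plus a separate 2x2 blob.
mixed = np.zeros((6, 6), dtype=bool)
mixed[0, :] = True
mixed[3:5, 3:5] = True
print(thick_fraction(mixed))
```

Applied to the eroded images, a lower value for the printed text than for the handwriting would be consistent with what Figures 4c and 6 show by eye.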


Figure 6. Zoomed in part of Figure 5d.