Semi-automatic Segmentation & Alignment of Handwritten



HPP closeness-based line segmentation


This method is based purely on the HPP of the image. Since the HPP peaks at the y-values where many black pixels accumulate, each peak roughly corresponds to the middle of a text line, so the line y-values can be read directly from the HPP. Each connected component of the page is then assigned to the line whose y-value is closest to the component's middle y-value. Finally, the same method as in GMM-based line segmentation is applied to obtain the line bounding box, i.e. the extremes in all directions over all components assigned to that line.
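
As a rough illustration, the sketch below implements this idea with NumPy and SciPy. It is not the thesis implementation; the binarised input array, the peak-distance parameter, and the helper name `segment_lines_hpp` are assumptions made for the example.

```python
# Minimal sketch of HPP closeness-based line segmentation (illustrative, not the
# thesis code). `binary` is assumed to be a 2-D array with ink pixels = 1 and
# background = 0; `min_peak_distance` is an illustrative tuning parameter.
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import label, find_objects

def segment_lines_hpp(binary, min_peak_distance=30):
    hpp = binary.sum(axis=1)                                # horizontal projection profile
    peaks, _ = find_peaks(hpp, distance=min_peak_distance)  # y-values of the line centres

    labeled, _ = label(binary)                              # connected components
    components = find_objects(labeled)                      # one (row, col) slice pair each

    # Assign every component to the line whose y-value is closest to its middle y.
    lines = {int(p): [] for p in peaks}
    for sl in components:
        cy = (sl[0].start + sl[0].stop) / 2
        nearest = int(peaks[np.argmin(np.abs(peaks - cy))])
        lines[nearest].append(sl)

    # Line bounding box = extremes in all directions of the assigned components.
    boxes = []
    for comps in lines.values():
        if comps:
            y0 = min(s[0].start for s in comps); y1 = max(s[0].stop for s in comps)
            x0 = min(s[1].start for s in comps); x1 = max(s[1].stop for s in comps)
            boxes.append((x0, y0, x1, y1))
    return boxes
```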


    1. Word segmentation


Once the lines have been found, each line can be segmented further into words. For this, two slightly different methods were developed.


For each line, the gaps in the text are found both in the original image and in a 45-degree slanted version of it. A gap is defined as one or more consecutive pixel columns in which every pixel has an intensity of zero (white). The gaps are found by iterating over the pixel columns and summing the values in each column, i.e. by computing the vertical projection profile; the starting and ending column of each gap are also recorded.
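
A minimal sketch of this gap search is given below, assuming the line has already been cropped and binarised as a NumPy array with ink pixels greater than zero; the function name and the (start, end) return format are chosen for the example only.

```python
# Gap detection via the vertical projection profile (illustrative sketch).
# A gap is a maximal run of columns whose column sum is zero.
def find_gaps(line_img):
    vpp = line_img.sum(axis=0)                # vertical projection profile
    gaps, start = [], None
    for x, col_sum in enumerate(vpp):
        if col_sum == 0 and start is None:    # a gap starts at this column
            start = x
        elif col_sum != 0 and start is not None:
            gaps.append((start, x - 1))       # record (start column, end column)
            start = None
    if start is not None:                     # gap running to the right edge
        gaps.append((start, len(vpp) - 1))
    return gaps
```

The same function would simply be applied a second time to the 45-degree slanted copy of the line, which can be produced with a shear transform.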


Next, the intersection of the gaps found in the original image and in the slanted image is taken. Once all gaps have been found for each line, a minimum-width threshold is selected: gaps narrower than the threshold are not counted as word dividers but as part of a word, while gaps wider than the threshold are considered word dividers (blank spaces). The gaps are filtered accordingly and the middle point of each remaining gap is calculated.
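
Continuing the sketch above, the filtering step might look as follows; the threshold value is an assumption and would in practice be tuned or estimated from the gap-width distribution.

```python
# Keep only gaps at least `threshold` columns wide and reduce them to their middle
# points, which then act as cut positions between words (illustrative sketch).
def gap_cut_points(gaps, threshold=10):
    cuts = []
    for start, end in gaps:
        if (end - start + 1) >= threshold:   # wide enough to be a blank space
            cuts.append((start + end) // 2)  # middle point of the gap
    return cuts
```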


The most extreme coordinates of the connected components located between two gaps are taken as the bounding coordinates of the word. The difference between the two methods is that in method 1 the connected components are computed from the whole image, while in method 2 they are computed from each line image separated from the rest of the page. An example of the word segmentation output can be seen in Figure 9b.
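
The sketch below illustrates the spirit of method 2, where the connected components are taken from the isolated line image; the grouping by cut points and the helper names are assumptions made for the example.

```python
# Group the line's connected components by the word-divider cut points and take the
# extremes of each group as the word bounding box (illustrative sketch of method 2).
import numpy as np
from scipy.ndimage import label, find_objects

def word_boxes(line_img, cuts):
    edges = [0] + sorted(cuts) + [line_img.shape[1]]
    labeled, _ = label(line_img)
    groups = [[] for _ in range(len(edges) - 1)]
    for sl in find_objects(labeled):
        cx = (sl[1].start + sl[1].stop) / 2            # middle x of the component
        idx = int(np.searchsorted(edges, cx)) - 1      # which pair of cuts it lies between
        groups[max(0, min(idx, len(groups) - 1))].append(sl)

    boxes = []
    for comps in groups:
        if comps:
            x0 = min(s[1].start for s in comps); x1 = max(s[1].stop for s in comps)
            y0 = min(s[0].start for s in comps); y1 = max(s[0].stop for s in comps)
            boxes.append((x0, y0, x1, y1))
    return boxes
```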






Figure 9: Example outputs from segmentation and alignment. (a) Example of the output lines (red) and the connected components (green) from HPP closeness-based line segmentation. (b) Example output of the word segmentation algorithm using method 1. (c) Example output of Intersection-over-Union-based alignment.

Alignment


This section describes the different alignment algorithms evaluated in the pipeline. The thesis handles three types of alignment depending on the available data. If GT coordinates for the word bounding boxes are available, they are used for the alignment, as this gives the most accurate result. Note that when the GT coordinates are used, they are first transformed with the same transformation matrix as in 4.1.1, so that the box locations given for the image with borders correspond to the correct positions in the border-removed image. If these coordinates are not available, the transcript text is used for the alignment instead. If no GT is available at all, the option to manually transcribe the text is given (see 4.4).
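
As an illustration of the coordinate transformation mentioned above, the sketch below maps GT boxes with a 3x3 matrix using OpenCV; the matrix `M` is assumed to be the one produced by the border-removal step in 4.1.1, and the function name is hypothetical.

```python
# Map axis-aligned GT boxes into the border-removed image with a 3x3 transformation
# matrix M (illustrative sketch; cv2.perspectiveTransform does the point mapping).
import numpy as np
import cv2

def transform_boxes(boxes, M):
    out = []
    for x0, y0, x1, y1 in boxes:
        corners = np.float32([[x0, y0], [x1, y0], [x1, y1], [x0, y1]]).reshape(-1, 1, 2)
        warped = cv2.perspectiveTransform(corners, M).reshape(-1, 2)
        out.append((warped[:, 0].min(), warped[:, 1].min(),
                    warped[:, 0].max(), warped[:, 1].max()))
    return out
```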




