Linear alignment is the most basic of the alignment algorithms tested. Assume we have a vector W⃗ = (w1, w2, ..., wn) containing all the segmented words in the document and a vector T⃗ = (t1, t2, ..., tm) containing all of the transcript words. The algorithm simply assigns the words in the order of the vectors: wi is assigned to ti. Its biggest weakness is that it assumes the word segmentation is perfect every time and correctly ordered. If the length of W⃗ does not match the length of T⃗ (n ≠ m), the alignment cannot be completed for all words, and the assignments it does make will most likely be wrong.
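The behaviour described above can be illustrated with a minimal sketch. The function name `linear_align` and the sample data are hypothetical and chosen only for illustration; segmented words are represented here by simple string labels rather than image regions.

```python
from typing import List, Tuple


def linear_align(segmented: List[str], transcript: List[str]) -> List[Tuple[str, str]]:
    """Pair the k-th segmented word with the k-th transcript word."""
    # zip stops at the shorter sequence, so surplus items stay unaligned.
    return list(zip(segmented, transcript))


# Hypothetical over-segmentation: the word "quick" was split into two boxes.
segmented = ["seg_the", "seg_qu", "seg_ick", "seg_brown"]
transcript = ["the", "quick", "brown"]

pairs = linear_align(segmented, transcript)
unaligned = segmented[len(pairs):]

for segment, word in pairs:
    print(segment, "->", word)
print("unaligned:", unaligned)
```

In this example a single extra box shifts every later assignment by one position: `seg_ick` receives "brown", and `seg_brown` receives nothing. This is the failure mode that arises when the two vectors differ in length.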
Distance-based alignment is generally a better approach than linear alignment if GT coordinates for the bounding boxes are available, because the segmentation will not always be precise. If the text is under-segmented or over-segmented, linear alignment fails because there are either too few or too many bounding boxes. Distance-based alignment instead takes the coordinates of each word bounding box into account and matches each GT bounding box to the closest newly segmented bounding box. The algorithm loops through each GT bounding box and, for each one, loops through all the segmented bounding boxes, calculating the distance between the corresponding points of the two boxes according to Equation 1.
$$\operatorname{dist}\big(p_1(x_1, y_1),\, p_2(x_2, y_2)\big) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{1}$$
The algorithm then takes the minimum distance calculated for each GT bounding box and saves the corresponding segmented bounding box together with the string content attached to the GT bounding box. If the image is under-segmented (i.e. there are too few bounding boxes, so that one bounding box may contain several words), the algorithm assigns multiple annotations to that bounding box. The method works, but it is sensitive to poor segmentation: if the segmented boxes are badly misplaced, or if the GT coordinates are not accurate enough, the algorithm will align some boxes incorrectly.
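The nested loop and the minimum-distance selection can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the box layout `(x_min, y_min, x_max, y_max)`, the choice of box centres as the compared points, and the names `center`, `dist`, and `distance_align` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Return the centre of a box (assumed here to be the compared point)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def dist(p1: Point, p2: Point) -> float:
    """Euclidean distance between two points, as in Equation 1."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def distance_align(gt: List[Tuple[Box, str]], segmented: List[Box]) -> Dict[int, List[str]]:
    """Map each segmented box index to the GT strings assigned to it.

    Assumes `segmented` contains at least one box.
    """
    assignments: Dict[int, List[str]] = defaultdict(list)
    for gt_box, text in gt:  # outer loop over GT boxes
        gt_center = center(gt_box)
        # Inner loop over all segmented boxes; keep the nearest one.
        nearest = min(
            range(len(segmented)),
            key=lambda k: dist(gt_center, center(segmented[k])),
        )
        assignments[nearest].append(text)
    return dict(assignments)


# Hypothetical under-segmented line: "hello" and "world" share one box.
gt_boxes = [
    ((0, 0, 10, 10), "hello"),
    ((12, 0, 22, 10), "world"),
    ((40, 0, 50, 10), "again"),
]
segmented_boxes = [(0, 0, 22, 10), (41, 1, 51, 11)]

result = distance_align(gt_boxes, segmented_boxes)
print(result)
```

In this example the merged box at index 0 receives both "hello" and "world", which corresponds to the multiple-annotation case for under-segmented images. A segmented box that is never the nearest match for any GT box simply does not appear in the result.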