Semi-automatic Segmentation & Alignment of Handwritten

Performance for parameters set by Bayesian optimisation

Tables 4 and 5 summarise the mean performance of method 1 and method 2 using parameters set by the Bayesian optimisation technique. In terms of the IoU values, Bayesian optimisation slightly outperforms setting parameters manually, although, as stated earlier, the IoU value does not provide all information.

Table 4: Performance metrics for method 1 using parameters set by Bayesian optimisation based on method 1. Data presented as mean (standard deviation). IoU=Intersection Over Union, 1=Method 1, EsB=Error segmentation boxes, EaB=Error alignment boxes, EW=Error words, EL=Error lines.

Data set	IoU1	EsB1	EaB1	EW1	EL1
Labour’s Memory	0.334 (0.070)	0.160 (0.190)	0.278 (0.170)	0.044 (0.148)	0.084 (0.203)
IAM	0.750 (0.154)	0.175 (0.094)	0.182 (0.099)	0.095 (0.037)	0.006 (0.028)

Table 5: Performance metrics for method 2 using parameters set by Bayesian optimisation based on method 2. Data presented as mean (standard deviation). IoU=Intersection Over Union, 2=Method 2, EsB=Error segmentation boxes, EaB=Error alignment boxes, EW=Error words, EL=Error lines.

Data set	IoU2	EsB2	EaB2	EW2	EL2
Labour’s Memory	0.359 (0.032)	0.108 (0.104)	0.176 (0.010)	0.009 (0.040)	0.039 (0.062)
IAM	0.850 (0.078)	0.066 (0.039)	0.136 (0.055)	0.059 (0.059)	0.006 (0.028)

Performance on manually made ground truth for one image

As expected, the general performance of both methods 1 and 2 improves when the self-made GT is used. The most substantial change is the IoU value for method 2, which rises by 48.4 percentage points (from 0.337 to 0.821) with the new GT, while method 1 improves by 14.7 percentage points (from 0.337 to 0.484).

Table 6: Performance metrics for one image from Labour’s Memory, comparing original GT to self-made GT on method 1/2 using manually set parameters. Data presented as mean. IoU=Intersection Over Union, 1=Method 1, 2=Method 2, EsB=Error segmentation boxes, EaB=Error alignment boxes, EW=Error words, EL=Error lines.

GT/Method	IoU	EsB	EaB	EW	EL
Original 1	0.337	0	0.297	0	0
Original 2	0.337	0.495	0.165	0	0
Self-made 1	0.484	0	0.163	0	0
Self-made 2	0.821	0	0.054	0	0
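
The improvements quoted for Table 6 correspond to differences between IoU values on a 0 to 1 scale, that is, percentage points rather than relative changes. A small Python check, using only the IoU values from the table, illustrates the two readings. The helper name iou_gain is introduced here purely for illustration and is not part of the thesis code.

```python
# IoU values copied from Table 6 (one image from Labour's Memory).
iou = {
    ("original", 1): 0.337,
    ("original", 2): 0.337,
    ("self-made", 1): 0.484,
    ("self-made", 2): 0.821,
}


def iou_gain(method):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    before = iou[("original", method)]
    after = iou[("self-made", method)]
    absolute_pp = (after - before) * 100
    relative_pct = (after - before) / before * 100
    return absolute_pp, relative_pct


for method in (1, 2):
    absolute_pp, relative_pct = iou_gain(method)
    print(f"method {method}: +{absolute_pp:.1f} points, +{relative_pct:.1f}% relative")
```

Read as relative changes, the same table values would correspond to considerably larger percentages, so the distinction matters when comparing against other work.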

Visualisation of the algorithm pipeline

The first step of the pipeline is to set the threshold value for the binarisation, as shown in Figure 17. Once the threshold is set, the algorithm runs connected-component analysis and segments the lines, which can then be modified by first removing lines and then adding lines (Figure 18). Word segmentation begins with setting the min_gap parameter, which roughly decides where blank spaces should be (Figure 19). Once the word segmentation is completed, the words can be modified in the same way as the lines. The alignment depends on what data is available, so three choices are given: manually transcribing the document, as in Figure 20; linear alignment with the GT transcript; or intersection-over-union-based alignment using the GT coordinates of the bounding boxes.
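
The role of a min_gap parameter can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a binarised text line has been reduced to a per-column ink count and that a run of at least min_gap empty columns marks a blank between words. The function name split_words and the column_ink input are hypothetical; the thesis algorithm itself builds on connected components and may use min_gap differently.

```python
def split_words(column_ink, min_gap):
    """Split a text line into word spans.

    column_ink: ink-pixel count per image column of a binarised line
                (an assumed input representation).
    min_gap:    minimum run of empty columns treated as a blank between words.
    Returns a list of (start, end) column spans, with end exclusive.
    """
    words = []
    start = None     # first column of the word currently being built
    last_ink = None  # most recent column that contained ink
    for x, ink in enumerate(column_ink):
        if ink == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run is wide enough: close the current word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


profile = [0, 3, 4, 0, 2, 0, 0, 0, 5, 6, 0]
print(split_words(profile, min_gap=3))  # [(1, 5), (8, 10)]
print(split_words(profile, min_gap=1))  # [(1, 3), (4, 5), (8, 10)]
```

In this toy profile, a larger min_gap merges nearby strokes into one word, while a smaller value splits more aggressively, which is the trade-off the slider in Figure 19 exposes.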

Figure 17: The noise removal slider window. The slider adjusts the threshold value for the binary threshold.

Figure 18: Removal and addition of lines. (a) The line removal window. Placing a red dot removes the line box in which it is located. (b) The line adding window.

Figure 19: The min_gap adjusting window. The slider adjusts the value for the min_gap parameter.

Figure 20: The manual transcription process

Discussion

The primary aim of this thesis was to develop a semi-automatic alignment algorithm with integrated self-learning, mainly for historical document images. Achieving self-learning without neural network architectures was a challenge, but it was accomplished by using Bayesian optimisation.

One important goal of the project was to evaluate the algorithm’s performance, in this case for two different methods and two different ways of setting parameters. The algorithm, together with the data used, has many performance aspects that need to be considered to reach a reasonable conclusion. The IoU is the most general metric for the algorithm’s performance, but it misses information such as how many words are missed and how many words are subject to segmentation errors. Also, since the IoU value is calculated after the alignment has been performed, it does not count words or boxes that are not aligned. This is intentional, as the metric is mainly used to indicate the alignment performance.
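
As a reminder of what the IoU measures, the following Python sketch computes it for axis-aligned bounding boxes and averages it over aligned pairs only. The (x1, y1, x2, y2) box format and the function names are assumptions made for this illustration; they are not taken from the thesis implementation.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_aligned_iou(pairs):
    """Mean IoU over aligned (segmented_box, gt_box) pairs.

    Boxes that were never aligned are simply absent from `pairs`,
    so they do not lower the score.
    """
    if not pairs:
        return 0.0
    return sum(box_iou(s, g) for s, g in pairs) / len(pairs)


pairs = [
    ((0, 0, 10, 10), (0, 0, 10, 10)),  # perfect overlap
    ((0, 0, 10, 10), (5, 0, 15, 10)),  # half-shifted box
]
print(mean_aligned_iou(pairs))
```

Because only aligned pairs enter the average, a missed word leaves this score unchanged, which is why the additional error metrics are needed alongside the IoU.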

Performance analysis

The tables in the results chapter show that method 2 performs better than method 1 in every performance metric and scenario, except for the segmentation boxes error when parameters are set manually for Labour’s Memory. This metric (EsB) is interpreted as the error between the total number of boxes extracted by the segmentation and the number of boxes in the GT. For the alignment to have a chance of being 100% successful, this error (for an individual image) does not need to be 0%; instead, the segmentation needs to produce at least as many boxes as are found in the GT. It is better for the segmentation to over-segment the document than to under-segment it, as bad boxes will be filtered out in the alignment.
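
The point that a non-zero box-count error can still allow a complete alignment can be made concrete with a small Python sketch. The formula below, a relative difference in box counts, is only an assumed form of such an error and is not necessarily the exact EsB definition used in the thesis; both function names are hypothetical.

```python
def box_count_error(n_segmented, n_gt):
    """Assumed box-count error: relative difference between the number of
    segmented boxes and the number of GT boxes (not the confirmed EsB formula)."""
    return abs(n_segmented - n_gt) / n_gt


def full_alignment_possible(n_segmented, n_gt):
    """A complete alignment needs at least one segmented box per GT box."""
    return n_segmented >= n_gt


n_gt = 100
for n_segmented in (110, 90):
    print(
        n_segmented,
        box_count_error(n_segmented, n_gt),
        full_alignment_possible(n_segmented, n_gt),
    )
```

Under this assumed definition, over-segmenting by ten boxes and under-segmenting by ten boxes give the same error, yet only the over-segmented case leaves room for every GT box to be matched.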

A limitation to note is the use of the Connected Components technique. Since this is the main technique used to separate the ROI from other parts of the image, a word that it fails to detect cannot be segmented out individually at all. Between method 1 and method 2, there seems to be no difference in the number of segmented lines, as the error is equal for the manual experiment. However, for parameters set by Bayesian optimisation, method 2 seems to have slightly better line segmentation. To gain more insight into which line segmentation method is best, the IoU values for the line boxes would need to be checked, which was not carried out due to time constraints. As for the word segmentation, as stated earlier, method 2 seems to be superior. It is important to note that the placement of the segmented boxes is as important as, if not more important than, the number of segmented boxes being close to the GT. The EsB value only includes information about the segmentation and no information about the resulting alignment.

When comparing which strategy is best for parameter setting, manual or Bayesian optimisation, it is seen that the latter performs better overall, with an IoU increase across all tests. This can be interpreted as the algorithm being more accurate for the boxes it manages to align. One downside of setting parameters manually is that it comes with a learning curve: knowing how the parameters affect the algorithm well enough to get the best result could take time to master. This could be one reason why Bayesian optimisation is preferable in some instances. Although the IoU is better, the alignment boxes error (EaB) is worse across all tests except one, which indicates that the algorithm is more prone to under-segmenting when the parameters are set by Bayesian optimisation. Under-segmentation can be interpreted as some aligned words being clumped together in the same box or being missed. The word error (EW) indicates that words are missed more often in the alignment, which is a consequence of worse segmentation in the sense that fewer boxes are segmented out. Although there are some disadvantages to using Bayesian optimisation for setting the parameters, the results indicate that it is a reasonable and more time-efficient choice when multiple images need processing.
