The importance of ground truth quality
According to the performance metrics in chapter 6, Results, the data set IAM performs considerably better than Labour’s Memory. This is caused by multiple factors that make the IAM data set a better fit for the algorithm. The images in IAM contain much less noise in general. This causes the segmentation algorithm to generally not be disturbed by noise in the image since it is more easily filtered out using noise removal and thresholding techniques. The IAM data set also consists of manually verified labels, making the GT more consistent and accurate. This will consequently make the alignment more accurate since the IoU value will be more in line with what is true. However, the GT for IAM includes box labels for individual symbols such as [ . ... , : ; ” - ) ( ] and treats them as words, which will, in turn, be neglected in the alignment since the developed algorithm is limited to segmenting words. This observation may potentially explain the absence of over-segmentation by the algorithm on the IAM data set (Figure 16 b).
In the case of the Labour’s Memory data set, the GT is generated with the software tool Transkribus developed by READ COOP, described in chapter 2.4. The quality of this GT could be questioned if it has enough value to use as a comparison for the evaluation
experiment. In Figure 11c, a comparison between the GT and the resulting segmentation from the algorithm can be seen. The number of words on a page is fairly accurate but will often miss a few. The size and location of the bounding box labels are not nearly accurate enough to give a faultless evaluation. The GT boxes tend to all have the same height and do not encapsulate the words thoroughly enough, resulting in multiple issues during the alignment and IoU evaluation. The experimental results in Table 6 are made to get an overview of the effects of the quality of the GT on the performance metrics. And as seen, the effects are significant. To note is that because this experiment only is carried out for one image because of time constraints, the values will not give a fair representation of the effects of the whole data set. Albeit, they indicate how the quality affects the performance evaluation.
Dostları ilə paylaş: |