Semi-automatic Segmentation & Alignment of Handwritten

Date: 07.09.2023



Subject Theory

This chapter gives theoretical background on the thesis subject and is divided into five parts. The first part describes image segmentation. The second describes alignment. The third gives a brief introduction to HTR systems. The fourth covers work related to this project. The last part describes machine learning in the context of this thesis.

Segmentation

In Optical Character Recognition (OCR) and HTR, segmentation is the process of dividing a document image into regions of interest (ROI) at the desired segmentation level: document, text line, word, or character. Analysing the layout of handwritten documents is often complex because of degradation of the document due to age, differing writing styles, the presence of decorations, variations in text size, and other irregularities (Capobianco et al. 2018). A recurring problem when extracting words from historical handwritten documents is that the size of, and spacing between, characters and words are not homogeneous throughout the document. For example, words on one line might overlap with words on the line above or below, or the spacing between characters might make it difficult to tell whether they form one word or several (Isaac 2020).

Alignment

Text alignment is the procedure of linking a document image to its transcription. An alignment conventionally includes the transcript and the coordinates locating each ROI. Aligning a document image is necessary to supplement the image so that the full potential of the available digital tools can be used. However, the digital transcription produced by an HTR model is not automatically linked to the document image. Without a means of correlating the transcription with its counterpart in the image, using the two together is challenging (De Gregorio et al. 2022). This connection between image and transcript is especially important when building deep learning models, as labelled data is crucial for training.
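As an illustration of what an alignment holds, the sketch below pairs each transcript with the coordinates of its ROI and then uses that link to cut out labelled image crops of the kind needed for training. The record layout is an assumption made for this example: the class AlignedRegion, its field names, the top-left-corner pixel convention, and the toy transcripts are all hypothetical and do not reflect the format used in this thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedRegion:
    """One alignment entry: a transcript linked to an ROI in the image."""
    transcript: str   # text written inside the region
    level: str        # segmentation level, e.g. "line" or "word"
    x: int            # left edge of the ROI in pixels (assumed convention)
    y: int            # top edge of the ROI in pixels
    width: int
    height: int


def crop(image: List[List[int]], region: AlignedRegion) -> List[List[int]]:
    """Cut the ROI described by `region` out of a row-major image."""
    rows = image[region.y : region.y + region.height]
    return [row[region.x : region.x + region.width] for row in rows]


# Toy binary image containing two "words" on one text line.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Hypothetical alignment: made-up transcripts with their coordinates.
alignment = [
    AlignedRegion("ab", "word", x=0, y=1, width=2, height=2),
    AlignedRegion("cd", "word", x=3, y=1, width=2, height=2),
]

# The alignment turns image and transcript into (crop, label) pairs.
training_pairs = [(crop(image, r), r.transcript) for r in alignment]
print(training_pairs[0])  # ([[1, 1], [1, 0]], 'ab')
```

Without the coordinates, the transcripts "ab" and "cd" could not be tied to any part of the image, so no labelled crops could be produced from them.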

Figure 2: A typical HTR system workflow, based on Figure 1 in (Neto et al. 2020)
