Research Objective
To address the scarcity of training data in historical handwritten document recognition by proposing a deep-learning-based recognizer that separates the optical model from the language model, enabling multilingual transductive transfer learning.
Research Findings
Separating optical and language models with a letter-n-grams vector as a pivot enables multilingual transfer learning in handwriting recognition, showing promise despite current data limitations. Future work could involve adding attention mechanisms, increasing data, or refining the encoder architecture.
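As a minimal sketch of the pivot representation (the paper's exact n-gram inventory and encoding are not given in this summary, so the vocabulary, the 1-to-3-gram range, and the helper name `letter_ngram_vector` are illustrative assumptions), the following Python snippet maps a word to a multi-hot vector over a shared letter-n-gram vocabulary, which is what lets languages with the same alphabet share the interface:

```python
from itertools import chain

def letter_ngram_vector(word, vocab, n_values=(1, 2, 3)):
    """Map a word to a multi-hot vector over a fixed letter-n-gram vocabulary.

    `vocab` is an ordered list of letter n-grams shared across the languages
    (which must use the same alphabet); the resulting vector is the
    language-independent pivot between the optical and language models.
    """
    ngrams = set(chain.from_iterable(
        (word[i:i + n] for i in range(len(word) - n + 1)) for n in n_values))
    return [1.0 if g in ngrams else 0.0 for g in vocab]

# The same vocabulary serves words from different languages with one alphabet.
vocab = ["c", "h", "a", "t", "ch", "ha", "at", "cha", "hat"]
print(letter_ngram_vector("chat", vocab))  # French "chat": every entry is 1.0
print(letter_ngram_vector("hat", vocab))   # English "hat" shares most n-grams
```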
Research Limitations
The amount of data used for training is still too low for effective transfer learning, leading to suboptimal results. The encoder's noisy output (e.g., predicting extra letter-n-grams) degrades decoder performance. The approach is constrained to languages sharing the same alphabet.
1: Experimental Design and Method Selection:
The study uses an encoder-decoder strategy inspired by machine translation and image captioning. The optical encoder is a fully convolutional network (FCN) that processes grayscale word images, and the language decoder is a recurrent neural network (RNN) that generates the character sequence. The interface between the two is a vector of letter n-grams, chosen so that the optical model remains language-independent.
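The paper's exact layer configuration is not reproduced in this summary, so the following PyTorch sketch only illustrates the overall split: a fully convolutional optical encoder that emits a multi-label letter-n-gram vector, and a GRU language decoder that generates a character sequence conditioned on that vector. The class names, layer sizes, and the use of the n-gram vector to initialize the GRU state are assumptions.

```python
import torch
import torch.nn as nn

class OpticalEncoder(nn.Module):
    """FCN sketch: grayscale word image -> letter-n-gram scores (multi-label)."""
    def __init__(self, n_ngrams):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # collapse spatial dimensions
        )
        self.head = nn.Conv2d(128, n_ngrams, 1)  # 1x1 conv keeps it fully convolutional

    def forward(self, image):                     # image: (B, 1, H, W)
        scores = self.head(self.features(image))
        return torch.sigmoid(scores.flatten(1))   # (B, n_ngrams) pivot vector

class LanguageDecoder(nn.Module):
    """GRU sketch: letter-n-gram vector -> character sequence logits."""
    def __init__(self, n_ngrams, n_chars, hidden=256):
        super().__init__()
        self.init_state = nn.Linear(n_ngrams, hidden)
        self.embed = nn.Embedding(n_chars, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_chars)

    def forward(self, ngram_vec, prev_chars):     # teacher-forced previous characters
        h0 = torch.tanh(self.init_state(ngram_vec)).unsqueeze(0)  # (1, B, hidden)
        outputs, _ = self.gru(self.embed(prev_chars), h0)
        return self.out(outputs)                  # (B, T, n_chars)
```

Because each n-gram is predicted independently, the encoder ends with a sigmoid rather than a softmax, and the decoder can in principle be retrained on text alone, which is what makes the multilingual transfer setting possible.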
2: Sample Selection and Data Sources:
Datasets include RIMES (French), George Washington (English), Los Esposalles (Spanish), Google Books (French and Italian digitized books), and French Wikipedia. These are used for training and evaluation, with Los Esposalles (ESP) as the target dataset for transfer learning.
3: List of Experimental Equipment and Materials:
No specific hardware or software brands/models are mentioned; the focus is on neural network architectures (e.g., FCN, GRU) and datasets.
4: Experimental Procedures and Operational Workflow:
The optical encoder is trained on the George Washington (GW) and RIMES datasets, and the language decoder is trained on vocabularies drawn from the various text sources. Evaluation metrics include recall, precision, accuracy, and F1-score for the encoder, and character recognition rate (CRR) and word recognition rate (WRR) for the decoder.
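The precise metric definitions are not given in this summary; the sketch below shows one common way to obtain CRR from the normalized character edit distance and WRR from exact word matches. The helper names `edit_distance` and `crr_wrr` are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def crr_wrr(predictions, references):
    """Character and word recognition rates over paired prediction/reference lists."""
    char_errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    exact_words = sum(p == r for p, r in zip(predictions, references))
    return 1.0 - char_errors / total_chars, exact_words / len(references)

print(crr_wrr(["chat", "hat"], ["chat", "hut"]))  # one substituted character: (~0.857, 0.5)
```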
5: Data Analysis Methods:
Performance is assessed using statistical metrics (e.g., recall, precision) and edit-distance calculations. Optimization uses the Adam optimizer with a learning rate of 0.0001.
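A minimal sketch of that setup, assuming PyTorch: the learning rate matches the reported 0.0001, while the placeholder model, the 0.5 decision threshold, and the micro-averaged precision/recall/F1 over the encoder's multi-hot n-gram outputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)  # placeholder standing in for the encoder/decoder pair
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # reported learning rate 0.0001

def ngram_prf(pred_probs, targets, threshold=0.5):
    """Micro-averaged precision, recall and F1 for multi-hot n-gram predictions."""
    pred = (pred_probs >= threshold).float()
    tp = (pred * targets).sum()
    precision = tp / pred.sum().clamp(min=1)
    recall = tp / targets.sum().clamp(min=1)
    f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)
    return precision.item(), recall.item(), f1.item()

probs = torch.tensor([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]])
truth = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(ngram_prf(probs, truth))  # (1.0, 0.75, ~0.857): one true n-gram missed
```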