Research Objective
To propose a supervised learning-based Optical Character Recognition (OCR) system for languages written in the Nastalique Urdu script, converting printed or handwritten text into digital format with high accuracy.
Research Findings
The proposed supervised learning-based OCR system achieves up to 98.4% accuracy for Nastalique Urdu script, the highest reported for Urdu OCR. It is simple to implement and effective for both printed and handwritten text, providing a foundation for future developments in Urdu OCR systems.
Research Limitations
The study does not cover diacritics (Airabs) or the aspirated family of characters. It is limited to the Nastalique font and may not generalize to other Urdu writing styles. The test sample is small (4 instances), and noise reduction techniques are not integrated.
1:Experimental Design and Method Selection:
The study uses a supervised learning approach with a grid-based feature extraction method. A 4x8 grid is designed to map characters, where each cell's state (on/off) represents a feature. Various supervised learning algorithms (e.g., Naïve Bayes, Random Forest) are applied for classification.
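The grid mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `grid_features`, the reading of "4x8" as 4 rows by 8 columns, and the rule that a cell is "on" when it contains at least `min_ink` ink pixels are all assumptions made for the example.

```python
from typing import List

# 4x8 grid from the text; treating it as 4 rows by 8 columns is an assumption.
GRID_ROWS, GRID_COLS = 4, 8


def grid_features(glyph: List[List[int]], rows: int = GRID_ROWS,
                  cols: int = GRID_COLS, min_ink: int = 1) -> List[int]:
    """Map a binary glyph image (1 = ink, 0 = background) to rows*cols on/off features.

    A cell is switched on when it holds at least `min_ink` ink pixels
    (hypothetical threshold; the source does not state the rule).
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for r in range(rows):
        # Integer boundaries cover the whole image even if sizes do not divide evenly.
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(1 if ink >= min_ink else 0)
    return features


# Toy glyph: an 8x16 image with a vertical stroke in the two leftmost pixel columns.
stroke = [[1 if x < 2 else 0 for x in range(16)] for _ in range(8)]
stroke_features = grid_features(stroke)
```

With this toy glyph, only the first cell of each grid row is on, so the 32-element vector contains exactly four ones. A classifier then sees each character as one such binary vector plus its label.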
2:Sample Selection and Data Sources:
The dataset includes 129 instances covering 40 Urdu alphabets, 10 numerals, and 3 special characters in Nastalique font, considering different writing styles based on position in words.
3:List of Experimental Equipment and Materials:
No specific equipment is mentioned; the study relies on software tools like Weka 3.8.0 for machine learning.
4:Experimental Procedures and Operational Workflow:
Steps include image acquisition (scanned documents or camera images), segmentation (direct segmentation technique to divide text into characters), feature extraction (mapping characters to the grid to generate binary data), dataset generation (creating labeled instances), and training/testing with algorithms in Weka.
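As a rough illustration of the dataset generation step, the sketch below writes labeled binary feature vectors in ARFF, the text format that Weka loads. The relation name, the attribute names `cell_0` to `cell_31`, and the class labels `alif` and `bay` are hypothetical placeholders; the source does not describe the actual file layout.

```python
from typing import List, Sequence, Tuple


def to_arff(instances: Sequence[Tuple[List[int], str]],
            relation: str = "urdu_grid", n_features: int = 32) -> str:
    """Serialize (features, label) pairs as an ARFF document.

    Labels are assumed to be simple tokens without spaces or commas,
    so they need no quoting in the nominal class declaration.
    """
    labels = sorted({label for _, label in instances})
    lines = [f"@relation {relation}", ""]
    # One nominal {0,1} attribute per grid cell.
    lines += [f"@attribute cell_{i} {{0,1}}" for i in range(n_features)]
    lines.append("@attribute class {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for features, label in instances:
        if len(features) != n_features:
            raise ValueError("every instance must have n_features values")
        lines.append(",".join(str(v) for v in features) + "," + label)
    return "\n".join(lines) + "\n"


# Two made-up instances, only to show the output shape.
sample = [([1] + [0] * 31, "alif"), ([0] * 31 + [1], "bay")]
arff_text = to_arff(sample)
```

The resulting string can be saved with a `.arff` extension and opened in the Weka Explorer, where the last attribute serves as the class to predict.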
5:Data Analysis Methods:
Performance is evaluated using accuracy, precision, recall, F-score, and other metrics from Weka outputs. Algorithms like Naïve Bayes and Random Forest are compared.
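For readers who want to check such figures by hand, the sketch below computes accuracy and per-class precision, recall, and F-score from true and predicted labels using the standard definitions. It is not Weka's code, and the labels in the example are made up; Weka additionally reports averaged values that are not reproduced here.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]
                      ) -> Tuple[float, Dict[str, Tuple[float, float, float]]]:
    """Return overall accuracy and {label: (precision, recall, f_score)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_score)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, metrics


# Toy example with hypothetical labels: one "alif" is misread as "bay".
accuracy, metrics = per_class_metrics(
    ["alif", "alif", "bay", "bay"],
    ["alif", "bay", "bay", "bay"],
)
```

In this toy case three of four predictions are correct, so accuracy is 0.75; "alif" has precision 1.0 and recall 0.5, while "bay" has precision 2/3 and recall 1.0.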