Automated transformation of
PDF to DOCX for Arabic documents
INTELLIGENT DOCUMENT PROCESSING
How we designed software dedicated to converting Arabic
PDF documents into Microsoft Word documents.
Industry – University/Schools
Geography – Middle East
Status – Product deployed
Microsoft Word API
FILE FORMAT CONVERSION
AWS
OCR.space
Challenge
- Handling the right-to-left writing pattern and fonts of Arabic documents.
- Identifying appropriate conversion tools that support the Arabic language.
- Ensuring content is kept intact during the transformation from PDF document to Microsoft Word document.
- Managing simultaneous conversion of multiple documents.
- Supporting conversion of both machine-encoded and image-encoded PDFs to DOCX.
Technologies
ASP.NET
Microsoft Word API
OCR.space
Angular
AWS
Solution
- Microsoft Word API and OCR.space were selected to convert machine-encoded PDFs and image-encoded PDFs, respectively. Compared with other tools such as Amazon Rekognition, Google Vision, and Azure Computer Vision, the selected tools were better suited to converting Arabic-language documents. The tools were used to recognize images, tables, and text in each document, as well as its layout.
- The software was deployed on Amazon Web Services. Admin users were given monitoring capabilities through the administrative interface.
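The routing between the two conversion paths can be illustrated with a minimal sketch. It assumes that the number of extractable characters per page is already known from some upstream PDF text extractor, and that a page with very little extractable text is a scanned image. The threshold, the `PdfJob` type, and the two converter functions are hypothetical placeholders and do not call the real Microsoft Word API or OCR.space; the thread pool is only one possible way to handle several documents at once, not necessarily how the deployed product does it.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical threshold: pages with fewer extractable characters than this
# are treated as scanned images. The value is not taken from the case study.
MIN_CHARS_PER_PAGE = 20


@dataclass
class PdfJob:
    name: str
    # Extractable characters on each page, assumed to be computed upstream
    # by whatever PDF text extractor is in use.
    chars_per_page: List[int]


def classify(job: PdfJob) -> str:
    """Return 'machine' if most pages carry a text layer, else 'image'."""
    if not job.chars_per_page:
        return "image"
    text_pages = sum(1 for n in job.chars_per_page if n >= MIN_CHARS_PER_PAGE)
    return "machine" if text_pages * 2 > len(job.chars_per_page) else "image"


def convert_with_word_api(job: PdfJob) -> str:
    # Placeholder for the machine-encoded path (Microsoft Word API in the text).
    return f"{job.name}.docx via word-api"


def convert_with_ocr(job: PdfJob) -> str:
    # Placeholder for the image-encoded path (OCR.space in the text).
    return f"{job.name}.docx via ocr"


# Map each document class to the converter that handles it.
ROUTES: Dict[str, Callable[[PdfJob], str]] = {
    "machine": convert_with_word_api,
    "image": convert_with_ocr,
}


def convert(job: PdfJob) -> str:
    """Classify one document and send it down the matching path."""
    return ROUTES[classify(job)](job)


def convert_batch(jobs: List[PdfJob], max_workers: int = 4) -> List[str]:
    """Convert several documents concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert, jobs))
```

Calling `convert_batch` with a mix of typed and scanned documents sends each one down its own path while capping the number of conversions that run at the same time through `max_workers`.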
Impact
- The product automates extraction of text, tables, images, charts, etc. from Arabic documents.
- The administrative interface provided statistical and usage information.
Find out how enterprise-level project
management experience helped us make
seamless deliveries during Covid-19