Tool
Lisān OCR
Read Lisān ud-Daʿwat pages into editable text, in your browser.
Upload a scan, photo, or PDF of a Lisān ud-Daʿwat page — typically set in FatemiMaqala or Kanz al-Marjaan — and this tool recognises the text and drops it into an editable box rendered in FatemiMaqala. Multi-page PDFs are read page by page. Recognition runs entirely in your browser; the image isn't uploaded for OCR. (You can separately, and only if you opt in, contribute a correction to help train the model — see below.)
Default model: a custom recogniser trained on Lisān ud-Daʿwat text rendered in FatemiMaqala and Kanz al-Marjaan (voweled and plain), plus real legacy pages. It reads the extended Urdu/Persian letters that stock Arabic OCR misses and captures the iʿrāb when present. Getting the base letters right is the priority; vocalization is a nice-to-have. Treat the output as a draft to correct, not a finished transcription, and use Export .docx to keep working in Word.
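To make "base letters first" concrete, here is a minimal sketch of how a draft could be scored against a corrected reference while ignoring the iʿrāb. It is an illustration, not the tool's own code: the names `stripIrab`, `editDistance`, and `baseLetterErrorRate` are hypothetical, and the set of code points treated as vowel marks is an assumption.

```typescript
// Assumption: the marks in U+064B..U+065F (tanwin, fatha, damma, kasra,
// shadda, sukun and related marks) plus superscript alef U+0670 are what
// counts as i'rab here. The recogniser's real list may differ.
const IRAB = /[\u064B-\u065F\u0670]/g;

// Remove vowel marks, leaving only the base letters.
function stripIrab(text: string): string {
  return text.replace(IRAB, "");
}

// Levenshtein distance, counted over Unicode code points.
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[t.length];
}

// Share of base letters that need correcting; vowel marks are ignored
// on both sides, so a missing haraka does not count as an error.
function baseLetterErrorRate(recognised: string, reference: string): number {
  const ref = Array.from(stripIrab(reference));
  const rec = stripIrab(recognised);
  if (ref.length === 0) return rec.length === 0 ? 0 : 1;
  return editDistance(rec, ref.join("")) / ref.length;
}
```

With this measure, a draft that gets every letter right but drops all the vowel marks scores 0, which matches the stated priority. A separate score over the marks alone could track vocalization.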
Recognised text
Help improve the model
Optional and off by default. If you turn this on, the image you uploaded and your corrected text are sent to an open Lisān ud-Daʿwat OCR training set, so the recogniser improves over time. Please don't contribute anything private or sensitive. See the privacy policy.
How it works & where it's headed
- Now: in-browser recognition via Tesseract.js (WASM), no backend.
- Now (opt-in): a correction loop. If you choose to contribute, your edits become real-world training data that improves the model over time.
- Next: a custom recogniser trained on synthetic lines, that is, Unicode Lisān ud-Daʿwat text rendered in FatemiMaqala and Kanz al-Marjaan with scan-like augmentation, in both voweled and plain forms. The aim is to read the extended letters that stock Arabic OCR misses and to capture the iʿrāb when it is present.
- Then: page layout detection for multi-line scans.
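
For the layout step, one classic way to split a multi-line scan into text lines is a horizontal projection profile: count the ink in each pixel row and treat runs of inked rows as line bands. The sketch below shows that idea only; it is not necessarily the approach this tool will take, and the `Bitmap` type, `findLineBands` function, and `minInk` threshold are hypothetical names chosen for illustration.

```typescript
// Assumption: the page is already binarised, with 1 = ink and 0 = background.
type Bitmap = number[][];

interface LineBand {
  top: number;    // first inked row of the band (inclusive)
  bottom: number; // last inked row of the band (inclusive)
}

// Scan rows top to bottom; a band opens at the first row with at least
// `minInk` inked pixels and closes at the row before the next gap.
function findLineBands(page: Bitmap, minInk = 1): LineBand[] {
  const bands: LineBand[] = [];
  let start = -1; // -1 means we are currently in a gap between lines
  for (let y = 0; y < page.length; y++) {
    const ink = page[y].reduce((sum, px) => sum + px, 0);
    if (ink >= minInk && start < 0) {
      start = y;
    } else if (ink < minInk && start >= 0) {
      bands.push({ top: start, bottom: y - 1 });
      start = -1;
    }
  }
  // Close a band that runs to the bottom edge of the page.
  if (start >= 0) bands.push({ top: start, bottom: page.length - 1 });
  return bands;
}
```

On voweled pages the marks above and below a line can show up as thin bands of their own, so a real implementation would likely need to merge bands that sit very close together or raise `minInk`. Skewed or curved scans would also need straightening first.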