"Ideal" Proofreading Utility
From Project Gutenberg, the first producer of free electronic books (ebooks).
PrufIt (suggested name) is a design for an "ideal" proofreading utility. It was designed by a proofreader (and programmer) to make it easier for a proofreader to compare etext with a scanned image.
Summary
PrufIt displays a textbox, from one to several lines high, in the middle of a window showing a scanned image. The image is always split horizontally just beneath the line of text currently being proofed: the top part is shown above the textbox and the bottom part beneath it. As the user scrolls the image, the textbox is kept filled with the corresponding lines of text to be proofed.
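The synchronization described above can be sketched as a lookup from a scroll position to a split point and a set of textbox lines. This is only an illustration: the `Line` record, the function names, and the choice to fill the textbox with the current line plus the lines that follow it are assumptions, not part of the PrufIt design.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One line of text and its vertical extent in the scan (hypothetical record)."""
    top: int      # y of the line's top edge, in image pixels
    bottom: int   # y of the line's bottom edge, in image pixels
    text: str


def current_line_index(lines, scroll_y):
    """Return the index of the last line whose top edge is at or above scroll_y."""
    tops = [ln.top for ln in lines]
    return max(0, bisect_right(tops, scroll_y) - 1)


def split_view(lines, scroll_y, box_lines=1):
    """Return (split_y, textbox_lines) for a given scroll position.

    split_y is the image row just beneath the current line, where the
    image is cut into a top part and a bottom part.
    Assumption: the textbox shows the current line and the next
    box_lines - 1 lines.
    """
    i = current_line_index(lines, scroll_y)
    split_y = lines[i].bottom
    textbox = [ln.text for ln in lines[i:i + box_lines]]
    return split_y, textbox


page = [
    Line(10, 30, "It was the best of times,"),
    Line(35, 55, "it was the worst of times,"),
    Line(60, 80, "it was the age of wisdom,"),
]
split_y, textbox = split_view(page, scroll_y=40, box_lines=2)
```

Here `scroll_y=40` falls inside the second line, so the image would be cut at row 55 and the textbox would hold the second and third lines.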
Ideally, OCR software would generate positioning information to support the synchronization of scrolling and textbox filling. The current version of GOCR, an open source OCR program, is capable of finding the text regions in a scanned image and could be modified to serve this purpose. All one should need to do is:
- Add code to export the region positions and dimensions
- Remove the character recognition code
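As a rough idea of what the exported positioning information might look like, the sketch below writes text regions to JSON and reads them back. The `Region` fields and the JSON layout are hypothetical; GOCR does not produce this format, and nothing here calls GOCR itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Region:
    """Position and size of one text region, in image pixels (assumed layout)."""
    x: int
    y: int
    width: int
    height: int


def export_regions(regions):
    """Serialize regions to a JSON string, one object per region."""
    return json.dumps([asdict(r) for r in regions], indent=2)


def load_regions(payload):
    """Rebuild Region objects from a string written by export_regions."""
    return [Region(**item) for item in json.loads(payload)]


regions = [Region(x=40, y=100, width=720, height=22),
           Region(x=40, y=126, width=715, height=22)]
payload = export_regions(regions)
restored = load_regions(payload)
```

A proofreading client would only need the `y` and `height` of each region to decide where to split the image.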
Until then, a project manager would need to determine the line heights in the book, then make this adjustment information available for the proofreaders. The proofreaders would normally only need to align each image once.
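The manual fallback could be as simple as a uniform grid. The sketch below assumes that every line in the book has the same height, that the project manager supplies the top of the first line and the line height, and that each proofreader supplies a single vertical offset per image. All names are illustrative.

```python
def line_bounds(first_top, line_height, line_count, offset=0):
    """Return (top, bottom) pixel rows for each line on a page.

    first_top and line_height come from the project manager's measurements;
    offset is the proofreader's one-time alignment for this image.
    Values are rounded so that fractional line heights still give pixel rows.
    """
    start = first_top + offset
    return [
        (round(start + i * line_height), round(start + (i + 1) * line_height))
        for i in range(line_count)
    ]


unaligned = line_bounds(first_top=100, line_height=20, line_count=3)
aligned = line_bounds(first_top=100, line_height=20, line_count=3, offset=5)
```

Because the offset is applied once to the whole grid, a scan that sits slightly high or low on the page needs only one adjustment.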
HTML and JavaScript Implementation
Implementing this as a web page with JavaScript would be great, but it would require negative coordinates for positioning the scanned image to achieve the "split" effect. This author isn't sure such positioning is possible. The only other solution for a JavaScript implementation would be to pre-slice the images into lines of text and arrange them dynamically at run time.
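Both options reduce to simple arithmetic on the split position. CSS does accept negative values for properties such as `background-position` and `top`, so the offsets computed in this sketch could be applied to two clipped panes, one above the textbox and one below. The function names, the pane layout, and the generated style string are assumptions made for illustration; the crop boxes use the (left, upper, right, lower) order that image libraries such as Pillow expect.

```python
def pane_offsets(split_y, top_pane_height):
    """Vertical image offsets for the two panes, in pixels.

    Top pane: its bottom edge shows image row split_y, so the image is
    shifted by top_pane_height - split_y (negative once split_y exceeds
    the pane height).
    Bottom pane: its top edge shows image row split_y, so the image is
    shifted up by split_y.
    """
    return top_pane_height - split_y, -split_y


def pane_style(image_url, offset_y):
    """Build an inline CSS string that shows the image shifted by offset_y."""
    return (
        f"background-image: url('{image_url}'); "
        f"background-position: 0px {offset_y}px; "
        "background-repeat: no-repeat; overflow: hidden;"
    )


def slice_boxes(bounds, image_width):
    """Crop boxes (left, upper, right, lower) for pre-slicing a page into lines."""
    return [(0, top, image_width, bottom) for top, bottom in bounds]


top_offset, bottom_offset = pane_offsets(split_y=500, top_pane_height=200)
top_css = pane_style("page.png", top_offset)
boxes = slice_boxes([(100, 120), (120, 140)], image_width=800)
```

With negative offsets a single image serves both panes, while pre-slicing trades that simplicity for one small image per line of text.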
Note
Many technical details have already been worked out for this proposed project, but they were not considered appropriate for this article.
Links
PrufIt has a diagram, and more ideas for the PrufIt project.
GOCR, an open source OCR program.