"Ideal" Proofreading Utility

From Project Gutenberg, the first producer of free electronic books (ebooks).


PrufIt (suggested name) is a design for an "ideal" proofreading utility. It was designed by a proofreader (and programmer) to make it easier for proofreaders to compare etext against the scanned page image.


Summary

PrufIt displays a textbox, from one to several lines high, in the middle of a window showing a scanned image. The image is always split horizontally just beneath the line currently being proofed: the top part is shown above the textbox and the bottom part beneath it. As the user scrolls the image, the textbox is kept filled with the corresponding lines of text to be proofed.
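The split-and-fill behaviour reduces to a small computation. The following is a minimal Python sketch, not part of the original design. It assumes that the bottom edge of every text line in the image is already known (the hypothetical `line_bottoms` list, one image row per line, in ascending order), that `text` holds the etext lines in the same order, and that a multi-line textbox shows the current line together with the lines directly above it. All names are illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class SplitView:
    split_y: int        # image row where the image is cut
    top_rows: range     # image rows drawn above the textbox
    bottom_rows: range  # image rows drawn below the textbox
    text_lines: list    # etext lines placed in the textbox


def current_line_at(line_bottoms, y):
    """Index of the first line whose bottom edge is at or below image row y.

    line_bottoms is a hypothetical ascending list of image rows, one per line.
    """
    i = bisect_left(line_bottoms, y)
    return min(i, len(line_bottoms) - 1)  # scrolled past the last line: stay on it


def split_view(line_bottoms, text, current, image_height, box_lines=1):
    """Cut the image just beneath line `current` and fill the textbox.

    Assumption: the textbox shows `box_lines` lines ending at the current
    line, i.e. the lines visible directly above the cut.
    """
    split_y = line_bottoms[current]
    first = max(0, current - box_lines + 1)
    return SplitView(
        split_y=split_y,
        top_rows=range(0, split_y),
        bottom_rows=range(split_y, image_height),
        text_lines=text[first:current + 1],
    )


if __name__ == "__main__":
    bottoms = [40, 80, 120]
    text = ["first", "second", "third"]
    line = current_line_at(bottoms, 50)  # scrolled to row 50, so line 1
    view = split_view(bottoms, text, line, image_height=160, box_lines=2)
    print(view.split_y, view.text_lines)  # prints: 80 ['first', 'second']
```

Using `bisect` keeps the scroll-to-line lookup logarithmic in the number of lines, which matters little for one page but costs nothing.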

Ideally, OCR software would generate the positioning information needed to keep scrolling and textbox filling synchronized. GOCR, an open source OCR program, can already find the text regions in a scanned image and could be modified to serve this purpose. Only two changes should be needed:

  1. Add code to export the region positions and dimensions
  2. Remove the character recognition code
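The article does not specify what the exported positions would look like. The Python sketch below assumes a hypothetical plain-text format, one region per line as `x y width height` in pixels with `#` comments allowed; the format is invented for illustration and is not meant to describe any existing GOCR option. It shows how a viewer might read such an export and derive each line's bottom edge, which is where the image would be split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int

    @property
    def bottom(self):
        # First image row below the region.
        return self.y + self.height


def parse_regions(export_text):
    """Parse a hypothetical 'x y width height' export, one region per line."""
    regions = []
    for raw in export_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        x, y, w, h = (int(v) for v in line.split())  # ValueError if malformed
        regions.append(Region(x, y, w, h))
    # Text is proofed top to bottom, so order regions by vertical position.
    regions.sort(key=lambda r: (r.y, r.x))
    return regions


def line_bottoms(regions):
    """Bottom edge of each region, in proofing order."""
    return [r.bottom for r in regions]
```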

Until then, a project manager would need to determine the line heights in the book and make this alignment information available to proofreaders, who would normally need to align each image only once.
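With a line height supplied by the project manager, aligning an image once amounts to finding a single vertical offset. A minimal Python sketch, assuming evenly spaced lines and that the proofreader aligns by marking the bottom edge of the first text line (`first_bottom`, a hypothetical input); pages with headings or uneven spacing would need more than one number.

```python
def uniform_line_bottoms(first_bottom, line_height, n_lines):
    """Bottom edge (image row) of each text line, assuming even spacing.

    first_bottom: row the proofreader marked under the first line
                  (the one-time, per-image alignment).
    line_height:  pixels per line, as determined by the project manager;
                  may be fractional.
    """
    if line_height <= 0 or n_lines < 0:
        raise ValueError("line_height must be positive and n_lines non-negative")
    # Rounding each position separately, rather than adding a rounded height
    # repeatedly, keeps fractional line heights from drifting down the page.
    return [round(first_bottom + i * line_height) for i in range(n_lines)]
```

For example, `uniform_line_bottoms(52, 31, 4)` gives `[52, 83, 114, 145]`.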

HTML and JavaScript Implementation

Implementing this as a webpage with JavaScript would be great, but it would require positioning the scanned image at negative coordinates to achieve the "split" effect. This author isn't sure such positioning is possible. The only other solution for a JavaScript implementation would be to pre-slice the images into lines of text and arrange them dynamically at run time.
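For the pre-slicing approach, each line strip is a horizontal band of the page image running from the previous line's bottom edge down to its own. A minimal Python sketch, assuming the bottom edge of every text line is known (the hypothetical `line_bottoms` list). The boxes follow the `(left, upper, right, lower)` convention that Pillow's `Image.crop` accepts, but no image library is called here.

```python
def line_strip_boxes(line_bottoms, image_width, image_height=None):
    """Crop boxes (left, upper, right, lower) for pre-sliced line strips.

    Strip i runs from the bottom edge of line i-1 (or the top of the page)
    down to the bottom edge of line i, so stacking the strips in order
    reproduces the page without gaps or overlaps. If image_height is given,
    one extra strip covers any margin below the last line.
    """
    boxes = []
    upper = 0
    for bottom in line_bottoms:
        if bottom <= upper:
            raise ValueError("line_bottoms must be positive and strictly increasing")
        boxes.append((0, upper, image_width, bottom))
        upper = bottom
    if image_height is not None and image_height > upper:
        boxes.append((0, upper, image_width, image_height))
    return boxes
```

At run time, the page would then place strips up to and including the current line above the textbox and the remaining strips below it.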

Note

Many technical details have already been worked out for this proposed project but were not considered appropriate for this article.

Links

PrufIt has a diagram and more ideas for the PrufIt project.

GOCR, an open source OCR program.