Uses of Metadata

From Project Gutenberg, the first producer of free electronic books (ebooks).

There have been various attempts to create software that turns Project Gutenberg into a user-friendly virtual library. This article describes some issues and concepts for designing such software. Much of the proposed solution revolves around storing metadata in a separate file for each etext. Metadata files make it possible to apply virtual corrections, additions and enhancements to etexts without ever modifying the etexts themselves. The software would apply the information when it loads an etext, and only to the copy it keeps in RAM while running.

The metadata could include many kinds of useful or interesting information that would not be appropriate to include in the etext itself. The most important type considered here, though, is "fancy-formatting" data.

The reason to store "fancy-formatting" data, rather than rely on a program such as GutenMark to generate it automatically, is this: although much of the fancy formatting for an etext can be generated automatically and correctly, some of it will often come out wrong, because there are too many cases a program cannot anticipate unless it is extremely sophisticated, and therefore excessively large, slow and complex. So, rather than trying to write a perfect program, one can write a program that formats most of the text correctly and uses a small amount of external data to tell it how to avoid specific mistakes. The result is faster, "perfect" formatting that uses much less data. The degree of perfection depends only on human reviewers spotting imperfections and telling the program about them so that it can generate corrective metadata.
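The idea of an automatic formatter guided by a small per-etext metadata file can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the JSON layout, the field names `replace` and `no_italics`, the line-number addressing, and the single underscore-to-italics rule are hypothetical and are not taken from GutenMark or from any existing Project Gutenberg tool.

```python
import json
import re

# Hypothetical metadata for one etext, stored apart from the etext itself.
# Line numbers are 0-based. Both field names are invented for this sketch.
METADATA_JSON = """
{
  "replace": [{"line": 1, "old": "teh", "new": "the"}],
  "no_italics": [2]
}
"""

# Naive automatic rule: text wrapped in underscores is treated as italics.
ITALIC = re.compile(r"_(\w[^_]*)_")


def auto_format(line):
    """Apply the automatic rule: _text_ becomes <i>text</i>."""
    return ITALIC.sub(r"<i>\1</i>", line)


def render(etext_lines, metadata):
    """Return a formatted in-memory copy; the input list is never modified."""
    skip = set(metadata.get("no_italics", []))
    fixes = {}
    for fix in metadata.get("replace", []):
        fixes.setdefault(fix["line"], []).append(fix)

    rendered = []
    for number, line in enumerate(etext_lines):
        # Virtual corrections from the metadata come first.
        for fix in fixes.get(number, []):
            line = line.replace(fix["old"], fix["new"])
        # The automatic rule runs everywhere except where a reviewer
        # has recorded that it produces a mistake.
        if number not in skip:
            line = auto_format(line)
        rendered.append(line)
    return rendered


etext = [
    "It was a _very_ long night.",
    "He opened teh door.",
    "The file was named my_old_notes.",
]
rendered = render(etext, json.loads(METADATA_JSON))
for line in rendered:
    print(line)
```

In this sketch the third line is the kind of case an automatic rule gets wrong: left to itself, the rule would treat `_old_` as italics. A single entry in the metadata suppresses that mistake, so neither a complete set of formatting instructions nor a more sophisticated program is needed. The original `etext` list stays untouched throughout.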

Procedure for Generating Formatting Metadata:

So, why not just store complete HTML tags along with the positions to insert them in the metadata?...

Metadata Types to Consider Including

These could also be used for tooltip-style definitions.