Gutenberg:Mirroring How-To



Project Gutenberg is always seeking sites to mirror (copy) our collection. This can bring the collection closer to people in your region. This HOW-TO describes how to set up a mirror.

The Project Gutenberg eBook collection may be distributed by FTP, HTTP, rsync, or other means. BitTorrent, peer-to-peer (p2p) networks, and other distribution methods are also well suited to many of the files.

For example, these URLs point to the same content:

The collection is about 350 GB (as of March 2008) and is expected to keep growing. New eBooks are added almost every day, so mirroring nightly is desirable. There are over 875,000 files in 24 languages and dozens of different file formats.

Our experience has been that a static IP address and a T1 (about 1.5 Mbps symmetric) or faster permanent network connection are the minimum for a public mirror. (Of course, you can build a private mirror over DSL or a cable modem, but sharing it with the world requires somewhat more bandwidth.)
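To get a feel for what these figures mean for a first full copy, here is a rough back-of-the-envelope sketch. The function name transfer_time is hypothetical, and the calculation assumes decimal units and ignores protocol overhead, so real transfers will take longer.

```shell
# transfer_time SIZE_GB RATE_KBPS
# Prints a rough transfer time as "<days>d <hours>h".
# Assumption: decimal units (1 GB = 8,000,000 kilobits), no protocol overhead.
transfer_time() {
    size_gb=$1
    rate_kbps=$2
    seconds=$(( size_gb * 8000000 / rate_kbps ))
    days=$(( seconds / 86400 ))
    hours=$(( (seconds % 86400) / 3600 ))
    echo "${days}d ${hours}h"
}

# Example: the 350 GB collection over a T1-class link (about 1500 kbps)
#   transfer_time 350 1500     prints 21d 14h
# The same collection over a link ten times faster
#   transfer_time 350 15000    prints 2d 3h
```

On a link at the stated minimum, the initial copy is a matter of weeks; the nightly updates afterwards only move the files that changed.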

Currently, the best place to mirror from is our master download site at ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/

Most mirrors use rsync (easiest and recommended) or the Perl mirror software (requires some configuration). Note that wget and cURL are not recommended, because they need to "touch" hundreds of thousands of files just to fetch the few that were updated recently. Here is an overview of each recommended method:

  1. Rsync: Available for all Unix systems; standard on Linux; part of Cygwin for Windows. The last argument is the local directory for the mirror destination:
    rsync -avHS --delete --delete-after ftp@ftp.ibiblio.org::gutenberg /home/ftp/pub/mirrors/gutenberg
  2. Perl Mirror software: Available from http://sunsite.org.uk/packages/mirror/ (among other places). We can help you set this up for a Unix system. The mirror Perl software has been reported to work with Perl for WinNT, as well as Unix/Linux/BSD. Note that the wu-ftpd software patch supplied with the program must be applied for it to work!
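The rsync command above can be wrapped in a small script so that it is easy to run unattended. The following is only a sketch: the function name sync_mirror, the lock directory path, and the RSYNC override variable are assumptions added for illustration, and the lock is an extra precaution (not part of the instructions above) to keep a new run from starting while a slow earlier run is still in progress.

```shell
#!/bin/sh
# Sketch of a mirror wrapper. Names and paths are illustrative assumptions.

# rsync source exactly as given in the how-to.
MIRROR_SRC="ftp@ftp.ibiblio.org::gutenberg"

# sync_mirror DEST [LOCKDIR]
# Runs the rsync command unless another run already holds the lock.
sync_mirror() {
    dest=$1
    lock=${2:-/tmp/gutenberg-mirror.lock}

    # mkdir is atomic, so a directory works as a simple lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "mirror run already in progress ($lock exists); skipping" >&2
        return 1
    fi

    # RSYNC can be overridden (for example RSYNC=echo) to preview the command.
    status=0
    ${RSYNC:-rsync} -avHS --delete --delete-after "$MIRROR_SRC" "$dest" || status=$?

    # Release the lock and report rsync's exit status.
    rmdir "$lock"
    return "$status"
}

# Example (destination path taken from the command above):
#   sync_mirror /home/ftp/pub/mirrors/gutenberg
```

If the script is killed mid-run, the lock directory is left behind and has to be removed by hand before the next run will start.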

For any mirror method, run a daily job to check for newly updated files. Unix and Linux systems typically use cron for this; Windows systems can use the Task Scheduler. We can help you set up the mirroring software, or with any other details, if you would like.
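As an illustration of the daily job on a Unix or Linux system, the sketch below builds a crontab line that runs a mirror script once a day. The helper name cron_entry, the script path, and the log path are all hypothetical; substitute whatever your site uses.

```shell
# cron_entry MINUTE HOUR SCRIPT LOGFILE
# Prints a crontab line that runs SCRIPT once a day at HOUR:MINUTE
# and appends its output to LOGFILE.
cron_entry() {
    printf '%s %s * * * %s >> %s 2>&1\n' "$1" "$2" "$3" "$4"
}

# Example: run a (hypothetical) mirror script every night at 03:30.
#   cron_entry 30 3 /usr/local/bin/gutenberg-mirror.sh /var/log/gutenberg-mirror.log
#
# To append the line to the current user's crontab:
#   (crontab -l 2>/dev/null; cron_entry 30 3 /usr/local/bin/gutenberg-mirror.sh \
#       /var/log/gutenberg-mirror.log) | crontab -
```

The five leading fields are minute, hour, day of month, month, and day of week; the asterisks mean "every", so the job runs daily.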

We'll add your site to the list of mirrors, so people can find you. The FTP directories are the only part we offer for mirroring. If you would like to build your own search software, you can download the Project Gutenberg catalog in XML/RDF format via http://www.gutenberg.org/feeds/. We do not distribute the central search software at www.gutenberg.org, however.

Once you tell us your mirror is active (email mirrors_AT_pglaf.org), we'll announce it in our next weekly and monthly newsletters. After a month or so (to confirm stability) we'll add you to the mirror list and download facility at http://www.gutenberg.org/.

You might want to view our mirror list to check whether the geographical location of your server would be a good addition to the list.

Thanks for your interest in helping Project Gutenberg reach more readers.