Primes to One Trillion

The Gutenberg text "The First 100,000 Prime Numbers", EBook #65, 
lists the primes up to 1,318,699. This somewhat more ambitious 
version lists the primes up to one trillion (1,000,000,000,000 or 
1E12).

Introduction

I became interested in prime numbers after hearing about 
Goldbach's Conjecture:

"Every even integer greater than 2 can be expressed as the sum of 
two primes."

Verifying this requires a source of primes. Short lists (or 
programs to generate them) are widely available. Really long 
lists are scarce, except for
primos.mat.br.

To make these lists more accessible, I have reformatted them into 
files of about 55MB each, a size easily managed by ordinary text 
editors and viewers. The file names correspond to the range of 
primes each file contains:
00000000000_to_00100000000.txt	Zero to 100 Million
00100000000_to_00200000000.txt	100 Million to 200 Million
etc.	

The leading zeros make the file names sort in the same order as 
their contents. Longer lists can be assembled with the DOS copy 
command. Move the required prime txt files to a temporary 
directory and use:

copy *.txt longList.txt
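The naming rule can be sketched in a few lines of C++. This is a minimal illustration, assuming each file spans exactly 100,000,000 (1E8) integers and that both bounds are zero-padded to 11 digits, as in the two names above. Values of 1E11 and above simply print wider here, and this sketch does not guess how the actual files name those ranges. `fileNameFor` is a hypothetical helper, not part of the programs package.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: name of the txt file whose range contains `value`,
// assuming 1E8-wide files and 11-digit zero padding (an assumption taken
// from the example names, not a documented rule).
std::string fileNameFor(std::uint64_t value) {
    const std::uint64_t width = 100000000ULL;          // 1E8 integers per file
    const std::uint64_t lo = (value / width) * width;  // round down to a file boundary
    const std::uint64_t hi = lo + width;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%011llu_to_%011llu.txt",
                  static_cast<unsigned long long>(lo),
                  static_cast<unsigned long long>(hi));
    return buf;
}
```

For example, `fileNameFor(150000000)` yields `00100000000_to_00200000000.txt`, the second file in the list above.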



Prime Text Files

This is a collection of 10,000 files, occupying about 486GB of 
disk space in their unzipped native txt format. Since adjacent 
primes have about 90% identical leading digits, the compressed 
(zip) versions total 61GB. Each zip file contains 100 txt files.

Do not use Windows Explorer to copy or move large numbers of 
files at a time. Use DOS copy or xcopy for large copies. I find 
Beyond Compare (Scooter Software) handy for keeping track of 
large numbers of files.

  0000.zip    000G to  010G (0 to 1E10)
  0010.zip    010G to  020G
  0020.zip    020G to  030G
  0030.zip    030G to  040G
  0040.zip    040G to  050G
  0050.zip    050G to  060G
  0060.zip    060G to  070G
  0070.zip    070G to  080G
  0080.zip    080G to  090G
  0090.zip    090G to  100G

  0100.zip    100G to  110G
  0110.zip    110G to  120G
  0120.zip    120G to  130G
  0130.zip    130G to  140G
  0140.zip    140G to  150G
  0150.zip    150G to  160G
  0160.zip    160G to  170G
  0170.zip    170G to  180G
  0180.zip    180G to  190G
  0190.zip    190G to  200G
 
  0200.zip    200G to  210G
  0210.zip    210G to  220G
  0220.zip    220G to  230G
  0230.zip    230G to  240G
  0240.zip    240G to  250G
  0250.zip    250G to  260G
  0260.zip    260G to  270G
  0270.zip    270G to  280G
  0280.zip    280G to  290G
  0290.zip    290G to  300G

  0300.zip    300G to  310G
  0310.zip    310G to  320G
  0320.zip    320G to  330G
  0330.zip    330G to  340G
  0340.zip    340G to  350G
  0350.zip    350G to  360G
  0360.zip    360G to  370G
  0370.zip    370G to  380G
  0380.zip    380G to  390G
  0390.zip    390G to  400G

  0400.zip    400G to  410G
  0410.zip    410G to  420G
  0420.zip    420G to  430G
  0430.zip    430G to  440G
  0440.zip    440G to  450G
  0450.zip    450G to  460G
  0460.zip    460G to  470G
  0470.zip    470G to  480G
  0480.zip    480G to  490G
  0490.zip    490G to  500G

  0500.zip    500G to  510G
  0510.zip    510G to  520G
  0520.zip    520G to  530G
  0530.zip    530G to  540G
  0540.zip    540G to  550G
  0550.zip    550G to  560G
  0560.zip    560G to  570G
  0570.zip    570G to  580G
  0580.zip    580G to  590G
  0590.zip    590G to  600G

  0600.zip    600G to  610G
  0610.zip    610G to  620G
  0620.zip    620G to  630G
  0630.zip    630G to  640G
  0640.zip    640G to  650G
  0650.zip    650G to  660G
  0660.zip    660G to  670G
  0670.zip    670G to  680G
  0680.zip    680G to  690G
  0690.zip    690G to  700G

  0700.zip    700G to  710G
  0710.zip    710G to  720G
  0720.zip    720G to  730G
  0730.zip    730G to  740G
  0740.zip    740G to  750G
  0750.zip    750G to  760G
  0760.zip    760G to  770G
  0770.zip    770G to  780G
  0780.zip    780G to  790G
  0790.zip    790G to  800G

  0800.zip    800G to  810G
  0810.zip    810G to  820G
  0820.zip    820G to  830G
  0830.zip    830G to  840G
  0840.zip    840G to  850G
  0850.zip    850G to  860G
  0860.zip    860G to  870G
  0870.zip    870G to  880G
  0880.zip    880G to  890G
  0890.zip    890G to  900G

  0900.zip    900G to  910G
  0910.zip    910G to  920G
  0920.zip    920G to  930G
  0930.zip    930G to  940G
  0940.zip    940G to  950G
  0950.zip    950G to  960G
  0960.zip    960G to  970G
  0970.zip    970G to  980G
  0980.zip    980G to  990G
  0990.zip    990G to 1000G
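The table follows a simple rule: each archive covers 1E10 integers and is named for its lower bound in units of G (1E9), padded to four digits. A minimal C++ sketch of that rule follows, together with the position of a value's txt file among the 100 files in its archive, assuming those files are ordered by range. Both function names are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: zip archive whose range contains `value`.
// Each archive spans 1E10 integers; its name is its lower bound
// expressed in G (1E9), zero-padded to four digits.
std::string zipNameFor(std::uint64_t value) {
    const std::uint64_t span = 10000000000ULL;          // 1E10 per archive
    const std::uint64_t startInG = (value / span) * 10;  // lower bound / 1E9
    char buf[16];
    std::snprintf(buf, sizeof buf, "%04llu.zip",
                  static_cast<unsigned long long>(startInG));
    return buf;
}

// Hypothetical helper: index (0..99) of the 1E8-wide txt file inside
// that archive, assuming the files are numbered in range order.
int txtIndexInZip(std::uint64_t value) {
    return static_cast<int>((value % 10000000000ULL) / 100000000ULL);
}
```

For example, 15,000,000,000 (15G) falls in `0010.zip`, in the 51st txt file (index 50).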


Additional prime files will be posted here.

PrimeC File Format and Miscellaneous C++ Programs

While working with primes, I developed the primec format, a file 
or array representation for primes that is roughly the same size 
as the compressed (zipped) txt representation and supports fast 
access, both sequential and direct. The exact location of the 
primality specification of any number in the file (or memory 
array) is computed with a few instructions and no search.

If you wish to examine and experiment with the C++ programs used 
to reformat these prime lists and test Goldbach's Conjecture, 
download the "programs.zip" package. It contains "Generating and 
Analyzing Prime Numbers", a description of the content and use of 
these files, including the primec file format.

Primec Format

The primec format exploits the fact that all primes greater than 
5 end in the decimal digits 1, 3, 7, or 9. Every block of 20 
consecutive integers starting at a multiple of 20 contains exactly 
eight such numbers, so the primality of 20 successive numbers can 
be specified in one 8-bit byte. The file begins with the complete 
binary representation of:

  The beginning of the sequence
  The end of the sequence
  A check sum of all data bytes
  (All three are 8 bytes for this implementation).
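As a sketch, the header could be modeled in C++ as a plain struct. The field names, the native byte order, and the absence of padding are assumptions; the checksum algorithm is not specified here, so the sketch only carries that field and does not compute it. `PrimeCHeader` and `readHeader` are hypothetical names, not the author's classes.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout of the 24-byte primec header: three unsigned 64-bit
// fields in the order listed above, ASSUMED stored in native byte order.
struct PrimeCHeader {
    std::uint64_t start;     // beginning of the sequence (a multiple of 20)
    std::uint64_t end;       // end of the sequence (a multiple of 20)
    std::uint64_t checksum;  // check sum of all data bytes (algorithm not specified)
};
static_assert(sizeof(PrimeCHeader) == 24, "header should occupy 24 bytes");

// Copies the header out of an in-memory file image. The primality
// data bytes begin immediately after it, at offset 24.
bool readHeader(const std::vector<std::uint8_t>& file, PrimeCHeader& h) {
    if (file.size() < sizeof(PrimeCHeader)) return false;
    std::memcpy(&h, file.data(), sizeof(PrimeCHeader));
    return true;
}
```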

The first and last values are multiples of twenty, so they are 
never prime. Therefore, when the upper boundary of one file is the 
same number as the lower boundary of the next, no prime appears in 
both files.

The primality of any number in the range of the file is 
determined as follows:

If the number ends in 0, 2, 4, 5, 6, or 8, it is not prime.

Otherwise, its specifying byte is at the following offset from the 
first data byte after the header (using integer division):

  ( value - start ) / 20

Within that byte, the bit is selected by the remainder 
( value - start ) mod 20: remainders 1, 3, 7, 9, 11, 13, 17, and 19 
select bits 0 through 7, as shown in the following table.
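A minimal C++ sketch of this lookup, checked against the 20-to-60 example in the table below. It assumes `start` is a multiple of 20 and at least 20 (no small-value special case), that `data` holds only the bytes after the header, and that bit 0 is the most significant bit of the byte, which is what the table's hex values imply. `isPrimeC` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Bit number (0..7) for each remainder (value - start) % 20, or -1 when
// the value ends in 0, 2, 4, 5, 6 or 8 and so cannot be prime.
constexpr int kBitForRemainder[20] = {-1, 0, -1, 1, -1, -1, -1, 2, -1, 3,
                                      -1, 4, -1, 5, -1, -1, -1, 6, -1, 7};

// Direct lookup in primec data bytes (header already stripped).
// Assumes start is a multiple of 20, start >= 20, and bit 0 = MSB.
bool isPrimeC(const std::vector<std::uint8_t>& data, std::uint64_t start,
              std::uint64_t value) {
    if (value < start) return false;                // not covered by this data
    const std::uint64_t delta = value - start;
    const int bit = kBitForRemainder[delta % 20];
    if (bit < 0) return false;                      // ends in 0, 2, 4, 5, 6 or 8
    const std::uint64_t offset = delta / 20;        // byte offset, no search
    if (offset >= data.size()) return false;        // not covered by this data
    return (data[offset] & (0x80u >> bit)) != 0;
}
```

With the two bytes 0x5A and 0xE5 and `start` = 20, the value 47 maps to byte 1, bit 2 (mask 0x20), which is set, so 47 is reported prime.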

The only tedious programming tasks were:

Special-case code for values less than 20, whose primes include 2 
and 5 (which do not end in 1, 3, 7, or 9) and exclude 1 and 9 
(which do). All larger values follow the same simple pattern.

The increment and decrement operators for the corresponding 
iterators, which must search forward (or backward) for the next 
true bit, marking the next (or previous) prime number.
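The forward search behind the increment operator can be sketched as a scan for the next set bit, starting just after the current value. This is a minimal illustration, not the author's iterator class: `nextPrime` is a hypothetical name, values below 20 are not special-cased, `value` is assumed to be at least `start`, and bit 0 is taken to be the most significant bit of each byte. A decrement would be the mirror image, scanning backward.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Returns the smallest prime greater than `value` recorded in `data`
// (primec bytes after the header), or nullopt if the data runs out.
std::optional<std::uint64_t> nextPrime(const std::vector<std::uint8_t>& data,
                                       std::uint64_t start,
                                       std::uint64_t value) {
    // Offset within a 20-block for bits 0..7.
    static const int kResidue[8] = {1, 3, 7, 9, 11, 13, 17, 19};
    const std::uint64_t delta = value - start;
    std::uint64_t byte = delta / 20;
    const int rem = static_cast<int>(delta % 20);

    // First bit in the current byte whose value lies beyond `value`.
    int bit = 0;
    while (bit < 8 && kResidue[bit] <= rem) ++bit;

    // Scan forward bit by bit, then byte by byte, for the next true bit.
    for (; byte < data.size(); ++byte, bit = 0) {
        for (; bit < 8; ++bit) {
            if (data[byte] & (0x80u >> bit))
                return start + byte * 20 + kResidue[bit];
        }
    }
    return std::nullopt;  // no further prime in this file
}
```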

This table shows the layout and content of a file covering 20 to 
60. The first 24 bytes (start value, end value, check sum) are not 
shown. Bit 0 is the most significant bit of each byte, as the Hex 
row shows.

Byte    0------------------------------| 1----------------------------
Bit     0   1   2   3    4   5   6   7   0   1   2   3   4   5   6   7
Value  21  23  27  29   31  33  37  39  41  43  47  49  51  53  57  59
Prime   F   T   F   T    T   F   T   F   T   T   T   F   F   T   F   T
Hex    5A                               E5
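The two hex values can be reproduced by encoding the range directly. This sketch uses slow trial division, which is fine for a tiny range like this one but is not how one would produce data up to 1E12. It assumes bit 0 is the most significant bit and that both bounds are multiples of 20; `encodePrimeC` and `isPrimeSlow` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Trial division; adequate only for small illustrative values.
bool isPrimeSlow(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Builds primec data bytes (header omitted) for [start, end), assuming
// both are multiples of 20 and start >= 20. Bit 0 is the most
// significant bit of each byte.
std::vector<std::uint8_t> encodePrimeC(std::uint64_t start, std::uint64_t end) {
    static const int kResidue[8] = {1, 3, 7, 9, 11, 13, 17, 19};
    std::vector<std::uint8_t> bytes;
    for (std::uint64_t base = start; base < end; base += 20) {
        std::uint8_t b = 0;
        for (int bit = 0; bit < 8; ++bit)
            if (isPrimeSlow(base + kResidue[bit]))
                b |= static_cast<std::uint8_t>(0x80u >> bit);
        bytes.push_back(b);
    }
    return bytes;
}
```

Calling `encodePrimeC(20, 60)` gives the bytes 0x5A and 0xE5 shown in the table.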

As primes become larger, their density falls off as roughly 
1/ln(n), so the density of true bits falls off too. Meanwhile, the 
number of digits (binary or decimal) needed to represent each prime 
grows as ln(n). For text, the two effects roughly cancel, while 
primec always spends one byte per 20 integers. Thus, a sequence of 
primes represented as primec is always competitive in size with the 
corresponding sequence in ASCII text or binary, besides providing 
fast direct access by value:

    bool isPrime(value).

The results for the largest 64-bit primes (the range 
18446744073707000000 to 18446744073709551558) are:

Format           Size (KB)
64 Bit Binary       445
Txt                1150
Zip Txt             114
PrimeC              125
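The PrimeC figure can be checked with a little arithmetic. Assuming a 24-byte header plus one data byte per 20 integers (rounded up), the range above needs 24 + ceil(2,551,558 / 20) = 24 + 127,578 = 127,602 bytes, about 125 KB with 1 KB = 1,024 bytes. The same calculation in C++, with the hypothetical name `primecBytes`:

```cpp
#include <cstdint>

// Estimated primec file size in bytes for [start, end]: a 24-byte header
// plus one data byte per 20 integers, rounded up (an assumed layout).
std::uint64_t primecBytes(std::uint64_t start, std::uint64_t end) {
    const std::uint64_t dataBytes = (end - start + 19) / 20;
    return 24 + dataBytes;
}
```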


Programs

Among the programs in the "programs.zip" package are:

  BuildTxtPrime   Create a file of primes in txt format.
  TxtToPrimeC     Convert txt to primec format.
  Goldbach        Verify Goldbach's conjecture from zero to 1E12.
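The idea behind a Goldbach check can be illustrated for small limits with an in-memory sieve of Eratosthenes instead of the prime files. This sketch is not the author's Goldbach program: `sieve` and `firstGoldbachFailure` are hypothetical names, and a limit like 1E12 would need the file-based approach described above rather than a single in-memory sieve.

```cpp
#include <cstdint>
#include <vector>

// Sieve of Eratosthenes: prime[k] is true exactly when k is prime, 0..n.
std::vector<bool> sieve(std::uint32_t n) {
    std::vector<bool> prime(n + 1, true);
    prime[0] = false;
    if (n >= 1) prime[1] = false;
    for (std::uint64_t p = 2; p * p <= n; ++p)
        if (prime[p])
            for (std::uint64_t m = p * p; m <= n; m += p) prime[m] = false;
    return prime;
}

// Returns the first even n in [4, limit] that is NOT the sum of two
// primes, or 0 if every such n has at least one decomposition n = p + q.
std::uint32_t firstGoldbachFailure(std::uint32_t limit) {
    const std::vector<bool> prime = sieve(limit);
    for (std::uint32_t n = 4; n <= limit; n += 2) {
        bool found = false;
        for (std::uint32_t p = 2; p <= n / 2 && !found; ++p)
            found = prime[p] && prime[n - p];
        if (!found) return n;
    }
    return 0;
}
```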

Among the more than 15 classes and utilities are:

  PrimeGenerator     Create prime numbers in a given range.

  Directory          A vector of strings containing the names of 
                     files in a file directory.

  Progress           A class to manage the periodic reporting
                     of program activity.

  PrimeCVector       Abstract class providing the algorithms 
                     to access primec data.

  PrimeCFileWriter   Create a primec file.

  PrimeCFileReader   Read a primec file.
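For readers without the package, the kind of work PrimeGenerator does can be illustrated with a sieve of Eratosthenes restricted to a window [lo, hi). This is a standard technique shown as an assumption, not the author's implementation. `primesInRange` is a hypothetical name, and it sieves the whole window in one piece, so the window must fit in memory; a practical generator would process a large window in fixed-size blocks.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Primes in [lo, hi) by a sieve of Eratosthenes restricted to that window.
std::vector<std::uint64_t> primesInRange(std::uint64_t lo, std::uint64_t hi) {
    std::vector<std::uint64_t> result;
    if (hi <= lo) return result;

    // limit = floor(sqrt(hi - 1)): every composite below hi has a prime
    // factor no larger than this.
    std::uint64_t limit = 0;
    while ((limit + 1) * (limit + 1) <= hi - 1) ++limit;

    // Base primes up to limit, from an ordinary sieve.
    std::vector<bool> small(limit + 1, true);
    std::vector<std::uint64_t> base;
    for (std::uint64_t p = 2; p <= limit; ++p) {
        if (!small[p]) continue;
        base.push_back(p);
        for (std::uint64_t m = p * p; m <= limit; m += p) small[m] = false;
    }

    // Cross out multiples of each base prime inside the window, starting
    // at p*p or the first multiple >= lo, whichever is larger, so that p
    // itself is never crossed out.
    std::vector<bool> candidate(hi - lo, true);
    for (std::uint64_t p : base) {
        const std::uint64_t first = std::max(p * p, (lo + p - 1) / p * p);
        for (std::uint64_t m = first; m < hi; m += p) candidate[m - lo] = false;
    }

    for (std::uint64_t v = lo; v < hi; ++v)
        if (v >= 2 && candidate[v - lo]) result.push_back(v);
    return result;
}
```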

I hope you find them useful.

If you have any questions, observations, or bug reports concerning 
the C++ programming or the content of the prime files, send an 
email (after changing "at" to "@"):

primes1e12 at earthlink.net

I embarked on this project as a programming challenge. I am not a 
mathematician. I have no deep insight into prime number theory. 
Please confine messages to programming issues. Here are some 
references:

 Prime Numbers: http://www.primos.mat.br/indexen.html
 Wikipedia: List of Prime Numbers (with numerous references): https://en.wikipedia.org/wiki/List_of_prime_numbers
 The Math Forum: http://mathforum.org/dr.math/faq/faq.prime.num.html
 The Prime Pages: https://primes.utm.edu

The program files are also posted on
http://home.earthlink.net/~primes1E12.
Corrections and additions will be posted there as they occur.

Don Kostuch
October, 2018.