Approximating the prime counting function

Jon Maiga, 2021-08-08

In the end of 18th century Gauss guessed that there where about xlog(x)\frac{x}{log(x)} primes up to xx. This was later proved as the prime number theorem together with the more accurate logarithmic integral, li(x)li(x) as an approximation.

Here we show a similar estimation based on an observation of the number of primes less than a square. From that we create a continuous function to approximate the prime counting function π(x)\pi(x). The approximation appears to yield similar results to the Riemann prime counting function, R(x)R(x).

An observation of the number of primes up to a square

While playing with sequencedb.net it was observed that the simple recurrence

a(n)=n1log(n)+a(n1) a(n)=\frac{n-1}{\log(n)} + a(n-1)

seems to yield a good estimate of the number of primes between 00 and n2n^2.

nn a(n)a(n) π(n2)\pi(n^2) a(n)π(n2)a(n)-\pi(n^2) li(n2)π(n2)li(n^2)-\pi(n^2)
10 25 25 0 5
100 1226 1229 -3 17
1000 78521 78498 23 130
10000 5761505 5761455 50 754
100000 455050326 455052511 -2185 3104
1000000 37607907843 37607912018 -4175 38263

If we expand the recurrence a(n)a(n) for n>1n>1 it looks like this

a(n)=1log(2)+2log(3)++n1log(n)a(n)=\frac{1}{\log(2)}+\frac{2}{\log(3)}+\cdots+\frac{n-1}{\log(n)}

So it seems like n1log(n)\frac{n-1}{\log(n)} approximates the number of primes between the two consecutive squares (n1)2(n-1)^2 and n2n^2

n1log(n)π(n2)π((n1)2)\frac{n-1}{\log(n)} \approx \pi(n^2)-\pi((n-1)^2)

a(n)

Comparison with the PNT

The a(n)a(n) approximation is close to what we get from the prime number theorem li(n2)k=1nddkli(k2)=k=1nklog(k)li(n^2) \approx \sum_{k=1}^{n}\frac{d}{dk}li(k^2) = \sum_{k=1}^{n}\frac{k}{log(k)}, this gives li(n2)li((n1)2)nlog(n)π(n2)π((n1)2)li(n^2)-li((n-1)^2) \approx \frac{n}{log(n)} \approx \pi(n^2)-\pi((n-1)^2).

While Gauss correctly guessed (apparently at the age of 15!) that there are about nlog(n)\frac{n}{log(n)} primes up to nn, the version with the logarithmic integral improves this guess by saying, there are about nlog(n)\frac{n}{log(n)} between two consecutive squares.

The observed formula a(n)a(n) states that there are about n1log(n)\frac{n-1}{log(n)} between two consecutive squares, so it’s a bit more conservative than the lili version.

Making our approximation work for all x

We can express the recurrence, a(n)a(n) as the sum

a(n)=k=2nk1log(k)π(n2)a(n)=\sum_{k=2}^{n}\frac{k-1}{\log(k)} \approx \pi(n^2)

This discrete sum estimates the number of primes less than a given square. But what if we want to approximate something that is not a perfect square like a(30)a(30)? Calculus to the rescue! Let’s replace the sum by an continuous function by taking the integral

x1log(x)dx=li(x2)li(x)+C\int \frac{x-1}{\log(x)}dx=li(x^2)-li(x) + C

where lili is the logarithmic integral. We set c=0c=0 and we also adjust the offset so we get

π(x2)b(x)=li((x+12)2)li(x+12)\pi(x^2)\approx b(x)=li((x+\frac{1}{2})^2)-li(x+\frac{1}{2})

Sure enough, giving it a test drive yields results very close to a(n)b(n)a(n) \approx b(n).

With our continuous prime counting function b(x)b(x) we should now be able to approximate the number of primes for any number. To do that, we just take the square root of xx and pass it in to bb. Let’s create a new function that counts the primes up to nn instead of n2n^2, that is c(n)=b(n)c(n)=b(\sqrt{n}). I also expanded the argument to the first lili

π(n)c(n)=li(n+n+14)li(n+12)\pi(n) \approx c(n)=li(n+\sqrt{n}+\frac{1}{4})-li(\sqrt{n}+\frac{1}{2})

Voilà - we have a prime counting function! Some tests show promising results (see table later for more results and comparisons)

c(10)=4.84π(10)=4c(10)=4.84 \approx \pi(10)=4
c(100)=25.95π(100)=25c(100)=25.95 \approx \pi(100)=25
c(1000)=168.57π(1000)=168c(1000)=168.57 \approx \pi(1000)=168
c(10000)=1227.78π(10000)=1229c(10000)=1227.78 \approx \pi(10000)=1229
c(100000)=9586.04π(100000)=9592c(100000)=9586.04 \approx \pi(100000)=9592
c(1000000)=78522.3π(1000000)=78498c(1000000)=78522.3 \approx \pi(1000000)=78498

Riemann prime counting function

We already mentioned that π(n)li(n)\pi(n) \approx li(n) so we will use that in our comparisons. Further, we add another well known approximation, namely the Riemann prime counting formula

R(x)=n1μ(n)nli(x1/n)R(x)=\sum_{n-1}^{\infin} \frac{\mu(n)}{n}li(x^{1/n})

where μ\mu is the Möbius function. One interesting thing is that c(n)c(n) seems to be stay pretty close to R(n)R(n).

c(n)

See tables below for a comparison between c(n)c(n), li(n)li(n) and R(n)R(n).

Conclusion

My gut feeling is that c(x)c(x) is slightly behind R(x)R(x) in terms of accuracy (at least for n<1012n<10^{12}), but at the same time c(x)c(x) seems simpler. While R(x)R(x) uses the Möbius function and the logarithmic integral, c(x)c(x) only uses the logarithmic integral.

c(n)

c(n)

Mathematica

a[n_Integer]:=Sum[(k-1)/Log[k], {k,2,n}]
b[x_]:=LogIntegral[(x+1/2)^2]-LogIntegral[x+1/2]
c[x_]:=LogIntegral[x+Sqrt@x+1/4]-LogIntegral[Sqrt@x+1/2]

Tables

In all the comparison tables below the values are rounded to the nearest integer.

Power of ten

The mathematica version of π\pi only works to about 101410^{14}, so for this table I resorted to the great Online Encyclopedia of Integer Sequences, entry A006880 - “the number of primes < 10^n”.

k,n=10kk, n=10^k c(n)π(n)c(n)-\pi(n) R(n)π(n)R(n)-\pi(n) li(n)π(n)li(n)-\pi(n)
1 1 1 2
2 1 1 5
3 1 0 10
4 -2 -2 17
5 -6 -5 38
6 24 29 130
7 73 88 339
8 51 97 754
9 -207 -79 1701
10 -2183 -1828 3104
11 -3298 -2318 11588
12 -4174 -1476 38263
13 -13218 -5773 108971
14 -39818 -19200 314890
15 15868 73218 1052619
16 166763 327052 3214632
17 -1048475 -598255 7956589
18 -4772209 -3501366 21949555
19 20279712 23884333 99877775
20 -15163730 -4891825 222744644
21 -115833916 -86432204 597394254
22 -211645366 -127132665 1932355208
23 789410232 1033299853 7250186216
24 -2365439590 -1658989719 17146907278
25 -3888233187 -1834784714 55160980939
26 -23137628496 -17149335456 155891678121
27 -35051971558 -17535487934 508666658006
28 -226144684235 -174760519827 1427745660374

Power of two

nn c(n)π(n)c(n)-\pi(n) R(n)π(n)R(n)-\pi(n) li(n)π(n)li(n)-\pi(n)
2 1 1 0
4 1 0 1
8 0 0 1
16 1 0 3
32 0 0 3
64 0 0 4
128 0 0 5
256 1 0 7
512 -1 -1 7
1024 0 0 9
2048 1 1 12
4096 -1 -1 13
8192 2 2 20
16384 -3 -3 20
32768 3 4 32
65536 4 6 42
131072 -3 -2 45
262144 6 9 69
524288 -18 -14 64
1048576 5 11 113
2097152 -12 -5 129
4194304 -20 -10 167
8388608 0 15 249
16777216 20 40 351
33554432 -147 -119 296
67108864 -53 -14 541
134217728 32 85 830
268435456 -141 -70 935
536870912 102 199 1555
1073741824 -521 -388 1448
2147483648 -5 175 2666
4294967296 232 477 3861
8589934592 -1354 -1021 3586
17179869184 -1399 -948 5335
34359738368 1471 2084 10664
68719476736 980 1811 13544
137438953472 -1196 -70 15995
274877906944 -718 810 22831
549755813888 -6550 -4477 25740
1099511627776 -2673 140 41645

Sequences

The 100 first terms are rounded to the nearest integer below

a(n)

0,1,3,5,8,11,14,17,21,25,29,33,38,43,48,53,59,65,71,77,84,91,98,105,113,120,128,136,145,153,162,171,180,189,199,209,219,229,239,250,260,271,283,294,305,317,329,341,354,366,379,392,405,418,432,445,459,473,487,502,516,531,546,561,577,592,608,624,640,656,672,689,706,723,740,757,775,792,810,828,847,865,883,902,921,940,959,979,998,1018,1038,1058,1079,1099,1120,1141,1162,1183,1204,1226

b(n)

1,3,5,7,9,12,15,18,22,26,30,35,39,44,49,55,60,66,72,79,85,92,99,106,114,121,129,137,146,154,163,172,181,190,200,210,220,230,240,251,262,273,284,295,307,318,330,343,355,367,380,393,406,419,433,447,460,474,489,503,518,532,547,563,578,593,609,625,641,657,674,690,707,724,741,759,776,794,812,830,848,866,885,903,922,941,961,980,1000,1020,1039,1060,1080,1100,1121,1142,1163,1184,1205,1227

c(n)

1,2,2,3,3,3,4,4,5,5,5,5,6,6,6,7,7,7,8,8,8,8,9,9,9,9,10,10,10,10,11,11,11,11,12,12,12,12,13,13,13,13,14,14,14,14,15,15,15,15,15,16,16,16,16,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,21,21,21,21,21,22,22,22,22,22,23,23,23,23,23,24,24,24,24,25,25,25,25,25,26,26,26

Yes I know, the function names are very creative.