Showing posts with label C language. Show all posts
Showing posts with label C language. Show all posts

Wednesday, January 1, 2020

Making Primes More Accessible with Forth and C

Some FaceBook friends in a Forth language group (Forth 2020/ForthWin) were playing around with dates and prime numbers, and I remembered some posts from some years back, generating lists of prime numbers and counting primes in various languages. One of the posts was a program in Forth for generating lists of small primes.


You say you didn't know 20,000,003 was a small number, much less a small prime?

It's small enough to calculate by this simple method on modern personal computers, so, yes, it's small. In terms of cryptology and cryptography, also, it is small.

Well, it turns out the Forth version was directly usable (on modern hardware, if you set the terminal emulator to remember enough lines) to determine if dates written as decimal numbers like this:
 20191231
for yesterday, are prime. But you have to go searching through the list for the number yourself.

It would be easier to be able to use the computer to look for it for you. This is easy to do in Forth. (Taking a clue from the BASIC version.)

Here's the code I'm going to start with, from the post on small primes in Forth, modified to only count, not print, the primes:


( Archetypical implementation of the counting sieve of eratosthenes in FORTH -- fig, bif-c --  )
( using more of the FORTH idiom. )
( Copyright Joel Rees, 2015, 2020 )
( By Joel Rees, Amagasaki, Japan, 2015, 2020 )
( All rights reserved. )
( Permission granted by the author to use this code )
( for any purpose,  )
( on condition that substantial use  )
( shall retain this copyright and permission notice. )


VOCABULARY sieve-local ( Make a local symbol table. )
sieve-local DEFINITIONS ( Switch to the local vocabulary. )


30000000 CONSTANT MAXSIEVE
MAXSIEVE 1 - 2 / CONSTANT FINALPASS

5 CONSTANT DISPWIDTH ( enough digits to display MAXSIEVE )


0 VARIABLE sieve ( Some old FORTHs don't provide a default behavior for CREATE )
( gforth will leave the zero on the stack. Some old FORTHs need an initial value. )

   HERE sieve - DUP ( Some old FORTHs don't provide a CELL width. )
   MAXSIEVE SWAP - ALLOT ( Allocate the rest of the byte array. )

   CONSTANT CELLWIDTH ( To show how this can be done. )
( Arrays are handled differently by different Forths. )
( If CELLWIDTH were just to save a wasted entry in the sieve array, )
( it wouldn't be worth it here. )
( I'm assuming there are likely to be other uses. )

: sieveInit ( -- adr )
0 sieve C!      ( 0 is not prime. )
0 sieve 1+ C!   ( 1 is not prime. )
sieve MAXSIEVE 2 DO    ( set flags to true for 2 to MAXSIEVE. )
   -1 OVER I + C! LOOP  ( sieve pointer -- still on stack. )
;

: primePass ( adr prime -- adr )
MAXSIEVE OVER DUP + DO    ( start at first multiple -- double. )
   OVER I + 0 SWAP C!     ( clear at this multiple. )
   DUP +LOOP              ( next multiple )
DROP ;      ( sieve address still on stack. )

: findPrimes ( adr -- adr )
FINALPASS 2 DO   ( clear flags at all multiples. )
   DUP I + C@ IF ( don't bother if not prime. )
      I primePass
   ENDIF
LOOP ;           ( sieve still on stack. )

: countPrimes ( adr -- )
0 SWAP
MAXSIEVE 0 DO
   DUP I + C@ IF
      SWAP 1+ SWAP
   ENDIF
LOOP DROP . ." Primes less than " MAXSIEVE . ;


FORTH DEFINITIONS

: sieveMain ( -- )
[ sieve-local ] sieveInit
findPrimes
countPrimes ;


sieveMain


People who do a lot of Forth will have a few complaints about this code. It runs fast, and it leaves the prime sieve in memory for later querying, but the queries are just slightly arcane:

sieve 20000003 + c@ .
will print "255", which tells you that 20,000,003 is prime. So, we define a query word:
: isprime? sieve + c@ if 
  ." Yes." 
else 
  ." No." 
endif 
And now you can type
20000003 isprime?
and it responds, Yes. And you can type
20000001 isprime?
and it responds, No. (You know 20000002 is not prime, so no need to test that, right? Right?)

Very convenient. Let's compare that to doing something similar in C.

(Incidentally, a full functionality prime query and factoring command, the primes (source) utility command (manual), can be found in bsd games, available in most Linux distros as well as in BSD, and in Cygwin, as well, for MSWindows sufferers.)

We'll need to pick one of the existing prime sieves and add some code to read numbers from the input. The prime counting routine deals with the largest primes, so I'll borrow it:


/* Archetypical queriable implementation of the counting sieve of eratosthenes in C
** By Joel Rees, Amagasaki, Japan, 2015, 2018, 2020
** All rights reserved.
** Permission granted by the author to use this code
** for any purpose,
** on condition that substantial use
** shall retain this copyright and permission notice.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#define MAXSIEVE 30000000L /* Maximum testable of 29999999L */

#define FINALPASS ( ( MAXSIEVE + 1 ) / 2 )

char * sieve;

int main( int argc, char * argv[] )
{
    unsigned long index, prime;
    unsigned long count = 0;
    unsigned long maxsieve = MAXSIEVE;
    unsigned long finalpass = FINALPASS;
    unsigned parameterX = 1;
    char * unscanned;

    if ( argc > 1 )
    {
       if ( ( strcmp( "--help", argv[ 1 ] ) == 0 )
            || ( ( argv[ 1 ][ 0 ] == '-' )
                 && ( ( argv[ 1 ][ 1 ] == 'h' ) || ( argv[ 1 ][ 1 ] == '?' ) ) ) )
       {
          printf( "usage:\n%s [ -max MAX ] ( prime < MAX ) ...\n", argv[ 0 ] );
          return EXIT_SUCCESS;
       }
       else if ( ( argc > 2 ) && ( strcmp( "-max", argv[ 1 ] ) == 0 ) )
       {
          maxsieve = strtoul( argv[ 2 ], &unscanned, 0 );
          if ( ! ( unscanned > argv[ 2 ] ) )
          {
             printf( "max: %s invalid, using %lu\n", argv[ 2 ], MAXSIEVE );
             maxsieve = MAXSIEVE;
          }
          parameterX = 3;
       }
    }
    finalpass = ( maxsieve + 1 ) / 2;
    sieve = malloc( maxsieve );
    if ( sieve == NULL )
    {
       printf( "Can't allocate %lu byte array, quitting.\n", maxsieve );
       return EXIT_FAILURE;
    }

    sieve[ 0 ] = 0;
    sieve[ 1 ] = 0;
    for ( index = 2; index < maxsieve; index = index + 1 )
    {   sieve[ index ] = 1;
    }

    for ( prime = 2; prime < finalpass; prime = prime + 1)
    {
        if ( sieve[ prime ] == 1 )
        {   for ( index = prime + prime; index < maxsieve; index = index + prime )
            {   sieve[ index ] = 0;
            }
        }
    }

    for ( index = 2; index < maxsieve; index = index + 1 )
    {
        if ( sieve[ index ]  != 0 )
        {
            ++count;
        }
    }
    printf( "%lu primes less than %lu\n", count, maxsieve );
    for ( ; parameterX < argc; ++parameterX )
    {
        unsigned long parameter = strtoul( argv[ parameterX ], &unscanned, 0 );
        if ( ! ( unscanned > argv[ parameterX ] ) || ( parameter >= maxsieve ) )
        {
            printf( "parameter %s invalid\n", argv[ parameterX ] );
        }
        else
        {
            printf( "%lu is%sprime\n", parameter,
                    ( sieve[ parameter ] == 0 ) ? " not " : " " );
        }
    }
    return EXIT_SUCCESS;
}


Hmm. But the C program also counts.

It's not really very interactive. But ...

I should note that the Forth program does not make use of isprime? except in the querying phase. If the definition of the sieve and of isprime? somehow get modified so they don't agree, who's to know?

This kind of hidden dependency is called implicit coupling, and should be avoided in good design. Therefore, while I'm adding the counting code, I'll also redefine the sieve construction to use the same code as the query. This will slow it down just slightly, unless your Forth interpreter has some built-in leaf-flattening optimizations. You probably won't notice.

I'll also remove printPrimes, since we don't want millions of primes scrolling off the screen:


( Forth-like archetypical implementation of the sieve of eratosthenes in FORTH -- fig, bif-c --  )
( using even more of the FORTH idiom. )
( Copyright Joel Rees, 2015, 2020 )
( By Joel Rees, Amagasaki, Japan, 2015, 2020 )
( All rights reserved. )
( Permission granted by the author to use this code )
( for any purpose,  )
( on condition that substantial use  )
( shall retain this copyright and permission notice. )



VOCABULARY sieve-local ( Make a local symbol table. )
sieve-local DEFINITIONS ( Switch to the local vocabulary. )


30000000 CONSTANT MAXSIEVE
MAXSIEVE 1 - 2 / CONSTANT FINALPASS


0 VARIABLE sieve ( Some old FORTHs don't provide a default behavior for CREATE )
( gforth will leave the zero there. Some old FORTHs need an initial value. )

   HERE sieve - DUP ( Some old FORTHs don't provide a CELL width. )
   MAXSIEVE SWAP - ALLOT ( Allocate the rest of the byte array. )

   CONSTANT CELLWIDTH ( To show how this can be done. )
( Arrays are handled differently by different Forths. )
( If CELLWIDTH were just to save a wasted entry in the sieve array, )
( it wouldn't be worth it here. )
( I'm assuming there are likely to be other uses.

: sieveCell ( index -- adr )
   sieve + ; ( This level of indirection can slow the loop down a lot. )

: initAsPrime ( index -- )
   sieveCell -1 swap C! ; ( The swap can also slow things down. )

: makeNotPrime ( index -- )
   sieveCell 0 swap C! ; ( The swap can also slow things down. )

: ?prime ( index -- flag )
   sieveCell C@ ;

: sieveInit ( -- )
0 makeNotPrime      ( 0 is not prime. )
1 makeNotPrime      ( 1 is not prime. )
MAXSIEVE 2 DO    ( set flags to true for 2 to MAXSIEVE. )
   I initAsPrime LOOP
;

: primePass ( prime -- )
MAXSIEVE OVER DUP + DO    ( start at first multiple -- double. )
   I makeNotPrime
   DUP +LOOP              ( next multiple )
DROP ;

: findPrimes ( -- )
FINALPASS 2 DO   ( clear flags at all multiples. )
   I ?prime IF ( don't bother if not prime. )
      I primePass
   ENDIF
LOOP ;

: countPrimes ( -- count )
0
MAXSIEVE 0 DO
   I ?prime IF 1+
   ENDIF
LOOP
;

FORTH DEFINITIONS

: sieveMain ( -- )
[ sieve-local ] sieveInit
findPrimes
countPrimes . ." primes less than " MAXSIEVE . ;

FORTH DEFINITIONS

: isPrime? ( index -- ) ( Say whether it's prime or not. )
[ sieve-local ]
   DUP MAXSIEVE < 0= IF
      ." Too big!" CR
   ELSE
      ?prime IF
         ." Yes."
      ELSE
         ." No."
      ENDIF
   ENDIF
;

FORTH DEFINITIONS

sieveMain


Running time for the C program on a more recent and much higher spec machine than the old one I did this on before:
$ time ./queriable 20191231 20200101
1857859 primes less than 30000000
20191231 is prime
20200101 is not prime

real    0m1.069s
user    0m1.044s
sys    0m0.024s
Comparing that to the running time of the gforth code with all the dependencies explicit (telling gforth to use a big enough dictionary):
$ time gforth -m 400M queriable.fs -e "20191231 isPrime? 20200101 isPrime? bye"
1857859 primes less than 30000000 Yes.No.
real    0m2.938s
user    0m2.937s
sys    0m0.000s
And comparing that to the time for the counting forth version at the top of this post, with all the implicit linkage:
$ time gforth -m 400M sievecount.fs -e "20191231 isprime? 20200101 isprime? bye"
1857859 Primes less than 30000000 Yes.No.
real    0m1.848s
user    0m1.844s
sys    0m0.004s
Someday, I want to write a Forth capable of the leaf flattening optimizations I mentioned above.

(Note that this current machine seems to be about 4 times as fast at primes as the machine I developed the earlier code on. The processor and memory are that much faster. But, for another measure of processor speed, see this post from way back in 2012, https://reiisi.blogspot.com/2012/08/too-late-to-complain-performance-of.html, and compare those times with this for the pi test:
time echo "scale=4000;a(1)" | bc -l
...
real    0m11.219s
user    0m11.218s
sys    0m0.000s
So, yes, Intel did "fulfill" their roadmap. For higher spec, more expensive machines.

Just for the perspective, we'll note that IBM has kept ahead of Intel's pace with their PPC and Freescale has been bought out. Wasteland.

Incidentally, this machine is from about the last batch of CPUs that doesn't have the currently trending vulnerabilities in speculative execution. Speculative execution? Heh. And they didn't expect the vulnerabilities? Well, ...)

Sunday, February 24, 2019

Taking the Sieve Too Far

A post on a Fortran program for counting primes in the Vintage {computers | microprocessors | microcontrollers } FB group inspired me to dig an old post on small primes back up and refurbish it a bit.

(If you don't know how this works, refer to the post on small primes.)


/* Archetypical implementation of the sieve of eratosthenes in C
** By Joel Rees, Amagasaki, Japan, 2015, 2018, 2020
** All rights reserved.
** Permission granted by the author to use this code
** for any purpose,
** on condition that substantial use
** shall retain this copyright and permission notice.
*/

#include <stdio.h>
#include <stdlib.h>

#define MAXSIEVE 100000000L

#define FINALPASS ( ( MAXSIEVE + 1 ) / 2 )

char * sieve;

int main( int argc, char * argv[] )
{
    unsigned long index, prime;
    unsigned long count = 0;
    /* BUG! unsigned long maxsieve, finalpass; JMR20200101 */
    unsigned long maxsieve = MAXSIEVE;
    unsigned long finalpass = FINALPASS;
    char * unscanned;

    if ( argc > 1 )
    {
       maxsieve = strtoul( argv[ 1 ], &unscanned, 0 );
       if ( ! ( unscanned > argv[ 1 ] ) || ( maxsieve > MAXSIEVE ) )
       {  maxsieve = MAXSIEVE;
       }
    }
    finalpass = ( maxsieve + 1 ) / 2;
    sieve = malloc( maxsieve );
    if ( sieve == NULL )
    {
       printf( "Can't allocate %lu byte array, quitting.\n", maxsieve );
       return EXIT_FAILURE;
    }

    sieve[ 0 ] = 0;
    sieve[ 1 ] = 0;
    for ( index = 2; index < maxsieve; index = index + 1 )
    {   sieve[ index ] = 1;
    }

    for ( prime = 2; prime < finalpass; prime = prime + 1)
    {
        if ( sieve[ prime ] == 1 )
        {   for ( index = prime + prime; index < maxsieve; index = index + prime )
            {   sieve[ index ] = 0;
            }
        }
    }

    for ( index = 2; index < maxsieve; index = index + 1 )
    {
        if ( sieve[ index ]  != 0 )
        {
            ++count;
        }
    }
    printf( "%lu primes less than %lu\n", count, maxsieve );
    return EXIT_SUCCESS;
}



This version only counts primes, and it alows you to specify the largest number.

Execution time on this old 1.2 GHz single processor:

$ time ./primecount 100000000
5761455 primes less than 100000000

real    0m14.008s
user    0m13.733s
sys    0m0.196s

It uses the same sieve-in-array approach as the small primes example, so you probably should not run it at a hundred million if you don't have at least a gigabyte of RAM.

I hope there's no bug and that's the correct answer.

A note:

I had the array statically allocated at one point, and the compile actually took several seconds to write that big empty array. But the execute time was not quite a second shorter. The extra execution time for the dynamic array may have simply been the system thrashing after the previous version (because this box only has one gigabyte of RAM and a 100 Meg of a Gig is enough to push things around in virtual memory).

[JMR-2018-02-28
And here's how it runs under Cygwin on a much more recent 2.5 GHz dual core Intel CPU:
$ time ./primecount 100000000
5761455 primes less than 100000000

real    0m11.349s
user    0m11.107s
sys     0m0.062s

It's only using one core, but why doesn't the double clock speed halve the execution time? I dunno. You tell me.

I could use the second core by forking the process and having the second process lag behind the first one. The algorithm becomes semi-non-obvious:

First, the first processor sets the flags in the lower half of the array, and the second process sets the flags in the upper half. Maybe bus contention would not eat up the savings in halving the load.

We'll start the parent process on a single pass at multiples of 2, and it will quickly try to do this to the first part of the sieve array:

0123456789 10111213141516171819 20212223242526272829
0011010101 0101010101 0101010101

We'll start the child process on a single pass at multiples of 3, and it will quickly try to do this to the first part of the sieve array:

0123456789 10111213141516171819 20212223242526272829
0011110110 1101101101 1011011011

Note that the effects are independent of each other.

Now, if either the parent or child finishes its first single pass before the other finishes even starts its first, the only damage is an unneeded extra pass on 4, and maybe even 6, 8, and 9. For small arrays, trying to use both processors could end up being (slightly) slower. For large arrays, we can be pretty sure that there won't be more than a few unnecessary passes.

Ideally, by the time one or the other finishes its first pass, the other will have already made its way through the first 15 of its pass, and it will have

0123456789 10111213141516171819 20212223242526272829
0011010100 0101000101 0001010001

and this is good enough to let the process avoid wasting a pass on 4 and start 5.

But the next process to finish will also see 5. (Check for yourself why probing the second or third in the pass isn't sufficient.)

So it looks like we need a shared variable for each to store its current pass on. And there will be some non-zero probability of read-interleaved write, so we should probably protect it with some semaphore or other mutual-exclusion technique.

This shared variable will help at the end, as well, when passes become very quick.

Lack of such mutual exclusion won't make it give the wrong result, it will just result in wasted passes.

Maybe I'll do the code for multiple processes later on.
]

Saturday, September 24, 2016

Calculating π (pi) in C

Maybe I need a better hobby.

Or maybe math really is this fun.

Okay, where this came from -- I gave my son a challenge, and he wrote a short program in perl to calculate π by the area of a circle.

So I wrote one in my favorite calculator, bc.

Now, its a nice program, and I can check the accuracy with bc itself, using the arctangent function. But bc is not as fast as C, and this old intel processor is not that fast, either.

And the program had a bug. Well, in my original explanation, I forgot to mention the problem of rounding. Bc always rounds down. Or, rather, it just drops fractional bits past the current scale This produced some notable effects, which are partially resolved in bc by doubling the scale.

Perhaps one reason bc is not as well known as it might be is that bc doesn't help you with rounding.

And it gives you scale instead of precision or accuracy.

So the limits of computers bite you pretty quick with bc, where other similar solutions help you ignore the limits -- sometimes until it's too late.

Which is part of the reason I'm writing this rant.

Anyway, I also wrote a nice little program for calculating π in standard C, using the language's ordinary math. It uses the same basic approach as I used in the bc version, except that the bc version I showed has a few more bells and whistles.

(I actually started with a very minimalistic version in bc, then wrote the C program and uploaded it to the pastebin linked above, then added the comments and extra features to the bc version before uploading it to a pastebin, if you are wondering why.)

So, if you want a full explanation of the method C source, look at my rants on the bc version (linked above) and the comments in the bc source. I'll limit my comments here to details of C programming and the standard C run-time environment, and issues of rounding and accuracy.

(Modifying the C source to do midpoint and so on is something I leave for you to play with.)

Here's the source of the standard C program, with extra comments:


/* Computing the value of pi by integrating the area of a (half of a) circle.
// by Joel Matthew Rees, 1 August 2013
// Copyright Joel Matthew Rees
//
// Fair use acknowledged.
//
// All rights to this expression of the method reserved.
//
// (Starting from scratch isn't that hard,
// and you'll like the results better.)
// -- Really, writing small programs like this from scratch
// is an excellent way to learn the basic grammar and runtime
// of a programming language. DIY!
// 
//
// Compile with the right libraries:
// cc -Wall -lm -o halfpiArea halfpiArea.c
//
// C language programs have to be compiled. You can't just run them.
// You have to save the source as halfpiArea.c
// and then run the compiler command above in a shell:
// http://joels-programming-fun.blogspot.com/2013/08/get-shell.html
//
// The compiler will catch simple errors in the source,
// and try to give you hints on how to fix them.
// Learning how to read the hints is a topic for another rant or more.
//
// If the compiler succeeds, the command line above will save
// the executable program in a file called halfpiArea,
// which you then run by typing 
// ./halfpiArea
// in the shell.
//
//
*/


/* The following lines help the compiler to figure out
// what functions you are using from the standard library
// and how you should be using them:
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/* They literally include information about those functions
// into this program.
*/


/* The main function of a C program in a standard runtime environment
// is pretty much always called "main()".
// And it has two or three standard arguments.
// I only tell the program about two:
// argc is the integer count of the command line arguments,
// and argv[] is the array of command line arguments, as character strings.
// I'll tell you more about the command line arguments below.
*/
int main( int argc, char * argv[] )
{
   /* All the variables here are declared as
   // double length floating point.
   // Accuracy is usually more than 11 useful digits,
   // Exponent range is usually more than about 10^(+-235).
   // Simplify way too much,
   // you can write numbers from 1e-235 to 1e235,
   // and, if we get the math write,
   // we can guarantee at least 11 good digits for pi.
   //
   // (Actually, the math normally supported in modern C libraries
   // is IEEE 754, which would be 10^(+-308) and at least 15 good digits.
   // Check C Language Data Types and IEEE 754 on Wikipedia, stackexchange,
   // etc., for more information.
   //
   // If you want to know what your compiler gives,
   // #include <limits.h> and play with the math.)
   */
   double area = 0.0; /* Sum the area here. */
   double deltaX = 0.025; /* Width of the rectangles (by default 1/40). */
   double lastX = -1.0; /* Track the left side. */
   double x; /* The right side. */
   double almostPi; /* The approximation of pi. */
   /* You will notice that we declare the names and types
   // of the variables before we use them.
   // C also allows us to set the initial value as we declare them.
   */

   if ( argc > 1 ) /* Command line arguments? */
   {  deltaX = strtod( argv[ 1 ], NULL ); /* Assume it's a different width. */
   }
   /* This allows us to set a different width on the command line.
   // 1/11 would be 0.09090909090909090909 to twenty decimal places:
   // ./halfpiArea 0.09090909090909090909
   // 1/8 == 0.125, 1/10 == 0.1, 1/100 == 0.01, 1/200000 == 0.000005
   */

   /* Note that starting from the half delta converges at least one decimal digit faster.
   // (That's because the height of the rectangle crosses the circle at midpoint,
   // rather than left edge or right. Draw the pictures to see why.
   // This is right edge, but changing to midpoint or left edge is trivial.)
   */
   for ( x = -1.0 + deltaX; x <= 1.0; x += deltaX )
   {
      double ysquared = 1.0 - ( x * x );
      double y = sqrt( ysquared );
      /* deltaX = x - lastX; Reminder for humans, only! */
      lastX = x;
      area += deltaX * y;
      /* Suppress long listings of the areas.*/
      if ( deltaX > 0.00005 ) /* 1/20000 */
      {  printf( "(%17.15g,%17.15g): %17.15g\n", x, y, area );
         /* %17.15g is a floating point field
         // of 17 digits with 15 digits to the right of the decimal.
         */
      }
   }

   almostPi = area * 2;
   printf( "Almost Pi: %17.15g\n", almostPi );
   return EXIT_SUCCESS;
}



Calculating π to eleven digits works okay with C doubles. But if you look at the listing of the areas, you see some odd differences, because C always rounds but bc always does as many exact digits as you tell it to with the scale variable.

Play with different widths, compare to the bc version, and see what you can learn.

(Modern C, from about C99, allows you to declare floating point types that use alternative rounding.)

And you should notice that C is very fast when compared to bc.

When you want to check the real value of π, remember:
echo "scale=41; a(1)*4;" | bc -l
from the shell command line, or
scale=41
a(1)*4
in bc, adjust scale to one more than the number of good digits you want.

And don't try 
./halfpiArea 0.0000000000000000000000000000000000000001
unless you really want to watch your computer contemplate its navel until you give up and shut it down. Start with
./halfpiArea 0.00001
and work your way up to 0.0000001 or so, where you get really noticeable waits.

And see if you can figure out whether midpoint is really better than right side, as I said it would be, when you start using widths smaller than 1/100 or so.

Oh. And play with the source code a bit, too.

[JMR201611182140:
I decided to post a few hints about playing with the programs:
http://joels-programming-fun.blogspot.com/2016/11/using-to-check-your-floating-point.html.  
]


[JMR201611292224:
Hopefully, I'll get a chance to post sample source with the GMP gnu multi-precision library "real soon now". (I've got one running, it just seems to take more time to write the rant than the source code.)
Here it is, demonstrating the principle with GMP:
http://joels-programming-fun.blogspot.com/2016/11/using-gmp-to-approximate-pi.html.
]