Quantcast
Channel: Sieve of Eratosthenes - Finding Primes Python - Stack Overflow
Viewing all articles
Browse latest Browse all 28

Answer by Arty for Sieve of Eratosthenes - Finding Primes Python

$
0
0

Thanks for interesting question!

Right now I wrote from scratch two versions of classical Sieve of Eratosthenes.

One is single-core (CPU core), another one is multi-core (using all CPU cores).

Main speedup of both (single and multi core) versions was due to using Numpy, official Python package. Install in once through command python -m pip install numpy. Another great speedup of multi-core version was due to usage of all CPU cores, which gives almost speedup N times for N cores, compared to single core version.

Right now I have only two-core laptop. My program produced following timings:

Computing primes less than 8MNumber of CPU cores 2Original time 72.57 secSimple time 5.694 secSimple boost (compared to original) 12.745xMulti core time 2.642 secMulti core boost (compared to simple) 2.155x

Original above means your code from your question's body, the second one that you optimized already. Simple is my single core version. Multi is my multi core version.

As you see above computing primes less than 8 Million took 72 seconds on your original code, my single core version took 5.7 seconds, which is 12.7x times faster than your code, and my 2-core version took 2.6 seconds, which is 2.1x times faster than my single core and 27x times faster than your original code.

In other words I got 27x times speedup in my multi-core code compared to your code, really a lot! And this is only on 2-core laptop. If you have 8 cores or more then you'll get much bigger speedup. But remember that real speedup on multi core machine will be only for quite big prime number limit, try 64 Million limit or bigger, for this modify line primes_limit_M = 8 in my code to set amount of Millions.

Just to dwell into details. My single core version is almost like your code, but uses Numpy which makes any array operations very fast, instead of using pure pythonic loops with lists.

Multi core version also uses Numpy, but splits array with sieved range into as many pieces as there are CPU cores on your machine, each piece of array having equal size. Then every CPU core sets boolean flags in its own part of array. This technique gives speedup only till you hit speed limit of your memory (RAM), so after some point with growth of amount of CPU cores you don't get extra speedup.

By default I use all CPU cores in multi core version, but you may experiment by setting less cores than you have on your machine, this may give even better speedup, because it is not guaranteed that most of cores will give exactly fastest result. Tweak amount of cores through changing line cpu_cores = mp.cpu_count() to something like cpu_cores = 3.

Try it online!

def SieveOfEratosthenes_Original(end):    import numpy as np    limit = end - 1    limitn = limit+1    primes = dict()    for i in range(2, limitn): primes[i] = True    for i in primes:        factors = range(i,limitn, i)        for f in factors[1:]:            primes[f] = False    return np.array([i for i in primes if primes[i]==True], dtype = np.uint32)def SieveOfEratosthenes_Simple(end):    # https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes    import numpy as np    composites = np.zeros((end,), dtype = np.uint8)    composites[:2] = 1    for p, c in enumerate(composites):        if c:            continue        composites[p * p :: p] = 1    return np.array([p for p, c in enumerate(composites) if not c],        dtype = np.uint32)def SieveOfEratosthenes_Task(small_primes, begin, end):    import numpy as np    composites = np.zeros((end - begin,), dtype = np.uint8)    offsets = np.full((len(small_primes),), begin, dtype = np.uint32)    offsets = small_primes - offsets % small_primes    offsets[offsets == small_primes] = 0    for off, p in zip(offsets, small_primes):        composites[off :: p] = 1    return np.array([begin + i for i, c in enumerate(composites) if not c],        dtype = np.uint32)def SieveOfEratosthenes_MultiCore(end, *, nthreads = None):    import math, multiprocessing as mp, numpy as np    end_small = math.ceil(math.sqrt(end)) + 1    small_primes = SieveOfEratosthenes_Simple(end_small)    if nthreads is None:        nthreads = mp.cpu_count()    block = (end - end_small + nthreads - 1) // nthreads    with mp.Pool(nthreads) as pool:        return np.concatenate([small_primes] + pool.starmap(SieveOfEratosthenes_Task, [            (small_primes, min(end_small + ithr * block, end), min(end_small + (ithr + 1) * block, end))            for ithr in range(nthreads)]))def Test():    import time, numpy as np, multiprocessing as mp    primes_limit_M = 8    cpu_cores = mp.cpu_count()    end = primes_limit_M * 2 ** 20    print(f'Computing primes less than {primes_limit_M}M')    print('Number of CPU cores', cpu_cores, flush = True)    tim_orig = time.time()    res_orig = SieveOfEratosthenes_Original(end)    tim_orig = time.time() - tim_orig    print('Original time', round(tim_orig, 3), 'sec', flush = True)    tim_simple = time.time()    res_simple = SieveOfEratosthenes_Simple(end)    tim_simple = time.time() - tim_simple    print('Simple time', round(tim_simple, 3), 'sec', flush = True)    assert np.all(res_orig == res_simple)    print(f'Simple boost (compared to original) {tim_orig / tim_simple:.3f}x')    tim_multi = time.time()    res_multi = SieveOfEratosthenes_MultiCore(end, nthreads = cpu_cores)    tim_multi = time.time() - tim_multi    print('Multi core time', round(tim_multi, 3), 'sec', flush = True)    assert np.all(res_simple == res_multi)    print(f'Multi core boost (compared to simple) {tim_simple / tim_multi:.3f}x')if __name__ == '__main__':    Test()

Viewing all articles
Browse latest Browse all 28

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>