-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On Inefficiencys #2
Comments
Hello! Just checking in. I'll read in detail tomorrow and respond more. Note: when it comes to primality checks, the "Optimized" version (the default) makes choices to select which algorithm will perform better for the size of the number. Smaller numbers will typically use Polynomial, and with larger numbers, Miller-Rabin will be used with some https://github.com/Open-NET-Libraries/Open.Numeric.Primes/blob/master/source/Optimized.cs |
Providing the option to use it, is fine. |
I'm attempting to use this feature of C# |
This is done instead of |
|
Since I'm really adapting what someone else wrote, I'm not sure how this could be improved. |
So I'm aware of the wrap-around possibility and try to avoid issues with that where possible. |
It would be great if you cloned the repo, made a branch with the suggested changes and I can test. |
Here are the current speed test results I have: (thanks for the trial division help/fix)
If you are correct, then at least one of these numbers should improve. |
-Disclaimer : Im only good at writing fast Code, usually C/C++ or Asm, Not "nice" or "good" Code in a sense of readability.
But judging by the documented methods and this "ReSharper"(i dont know what that is, but i assume it helps with readability)
You seem to be good at it / care about it ...
In the File "Open.Numeric.Primes/source/MillerRabin.cs"
So using an unproven algorithm in a prime testing library is (in my opinion) not a good idea.
The miller test is "assuming the truth of the generalized Riemann hypothesis (GRH)"
So it might be incorrect. I would recommend the AKS, because its also runs in polynomial time an // but is a little bit slower
"does not rely on unproven assumptions."
https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test
But since you decided to use it i will point out a few problems:
Line 45 "
ref readonly var b = ref ar[i]; // i dont know why you would use "ref readonly var" instead of "ulong"
var a = value - 2; // I dont know what that is, that's not part of the algorithm
var now = a > b // again, no clue, that's not part of the algorithm
? Pow(in b, in d, in value) // again, no clue, that's not part of the algorithm
: Pow(in a, in d, in value); // again, no clue, But nice try avoiding branch miss using Conditional moves xDD
"
that can be completely replaced by "ulong now = Pow(a, d, value);" // not sure, what the "in" keyword means ...
but let me take a look at the Pow function ...
okey looks complicated, it looks like a really cryptic recursive D&C Power algorithm.
So that really slow. each recursion will get written on the Stack + Call Overhead + Pipeline flushes + unnecessary Overhead.
Write on without recursion ( i assume it would be at least 10 times faster ). I leave it as an exercise to the reader, i think you
can figure it out. Lets see what "Mul" does ...
Oh Okey, i think i need to explain (simplified) branches.
before a processor can execute a Instruction, it needs to be "decoded" (interpreted)
this step can take a long time, usually longer then the execution of the instruction itself. // on all x86 Systems
so the processor starts decoding many instructions ahead of its current instruction, so the next instruction
is finished, before the processor executes it. These already decode instruction a stored in a pipeline.
So every time your execution path splits up (like at a if statement) the processor "guesses" a path and
continues decoding. If the execution Path follows the prediction, every thing is fine, but if not, the processor "flushes"
the Pipeline, and starts decoding the other execution Path. That takes a lot of time.
To avoid that, there are simple heuristics, for the prediction, like for example, loops are always
predicted true because on say 10 Iterations, 10 / 11 predictions will be correct, otherwise you got like 1 / 11. very bad.
Okey, thats no good Multiplikation algorithm, but some of the mistakes are common,
so lets cover them, and go for alternatives later.
First of all : Its wrong, it will fail for Numbers bigger than 2^63, because the integer result
greater than 2^64 will wrap around, so THIS comparison wont catch them. Plus "(now > mod)"
should be "(now >= mod)", its se same Speed, but more reliable, because it reduces the risk of overflow, see above.
So remember this branch stuff ? with loop conditions are always predicted to be true, and we want to avoid miss predictions ?
In Line 72 and 74 the "while (now >= mod) now -= mod;" // I corrected the Condition
loops ensure, that 0 <= now < mod.
So lets go though one Loop Iteration:
now <<= 1;
// Now we got 0<= 2now < 2mod
// As we see, if we subtract mod only if (now >= mod)
// we got it back to 0 <= now < mod without a Loop
// but lets see what the loop does
while (now >= mod) now -= mod; // Corrected again
if now >= mod is wrong, the processor flushes the pipeline, ouch. // again, loops are always predicted to be true
if now >= mod is true, the processor predicts the next Loop Iteration to be true again, wich is wrong EVERY time
// So the processor flushes the pipeline, ouch,
// either way, the Pipeline gets flushed, BIG OUCH
next we got another if statement, wich again will be miss predicted to some amount, bad. // a naive approximation : 50%
So before we get to the same bad correction loop again, lets cover the case where the addition ("now += b;") happens.
// lets assume, 0 <= b < mod, so b is already reduced, wich it should be, else you should reduce it at the start of the function.
// Now we got 0<= now + b < mod + b < mod * 2
// if we subtract mod only if (now >= mod)
// were again back to 0 <= now < mod, without the while loop, which again will always lead to a pipeline flush
okey, now to the solution : cmove or conditional move.
a conditional move acts like this ;
if(Condition)
x = a;
else
x = b;
but without breaking the Execution Path, so it cant get miss predicted.
Good compilers will notice the code above, and generate the cmove instruction, but not the C# Compiler (at least in my tests).
So we need to give him a little hint:
x = Condition ? a : b;
This almost always generates a cmove instruction, IF and only IF a and be are already computed,
so no "cheating" like
x = 0 < a ? a : -a; // x = |a| mathematically speaking
because -a needs to be computed. again, good compilers my recognize this, and do this for you, but last time i checked,
C# didn't. if the Compiler doesn't generate a cmove, it generates branches, like above, and they will sometimes be miss
predicted, so in speed intensives Code, like inner loops avoid them whenever you can.
The optimization of this loop is again left as an exercise to the reader xD. // i love to say that xDD
When you are done with the loop in line 69 you can remove the loop in line 68, because it wont save much time but generates
at least one(possibly even more) branch miss prediction, witch alone will take more time as 10 - 20 times the line 69's loop body
also consider "Inlining" your function through
using System.Runtime.CompilerServices;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static ulong Mul(ulong a, ulong b, ulong mod) { ... }
this will eliminate the method call overhead and increase code locality at the (negatable)cost of a larger executable,
if the function is very big, for relatively small functions it reduces the overall size. For more information:
https://en.wikipedia.org/wiki/Inline_expansion
The previously mentioned alternatives are also left as an exercise for the reader
but i encourage you to not just google, because i think you can do it yourself.
Let me know, what you came up with.
Okey, back to my own work, if you need performance advice, please give me the specific part of the Code,
but i cannot think myself into a hole (especially multi file) project and search all slow parts myself, figuring out what needs to
be fast isn't that hard.
PS : I just noticed the
"/* Based on: https://stackoverflow.com/questions/4236673/sample-code-for-fast-primality-testing-in-c-sharp#4236870 */"
comment, this mess of a multiplication and power algorithm isn't your fault, and the wrong implemented unproven prime
testing algorithm wasnt your idea, but thats a good example, why you should just take every thing from google. xDD
The text was updated successfully, but these errors were encountered: