Opened 9 years ago

Closed 8 years ago

#3368 closed enhancement (wontfix)

[PATCH] Switch to compiler implementations of sqrt

Reported by: Bruno Manganelli Owned by:
Priority: Should Have Milestone:
Component: Core engine Keywords: patch
Cc: Patch:


On MSVC, GCC and Clang the builtin sqrt function is anywhere from 4 to 8 times faster than the isqrt64 function, while producing exactly the same result on all test cases [ test here ].

All the argumets passed to isqrt64 during an hour long session have been saved to a file and the above test has been run on such file, with the same result output from the two function calls. File and source here.

I suspect the ad-hoc function is not needed at all on any compiler and platform, but I haven't tested other compilers other than the three above.

Attachments (1)

maths.patch (1.5 KB ) - added by Bruno Manganelli 9 years ago.

Download all attachments as: .zip

Change History (15)

by Bruno Manganelli, 9 years ago

Attachment: maths.patch added

comment:1 by Doménique, 9 years ago

Interesting, I looked into this late in 2013. I also found the custom isqrt64 function to be noticeably slower but at the time I had slightly different results on my test set. Any difference causes issues as the multiplayer game relies on exactly deterministic behaviour, which is why the custom function is avoiding floating point calculations at the cost of speed. Any speedup would be directly noticeable as the sqrt function is called very often by the pathfinder.

I just wanted to add my previous experience. I see you have some code available to check behaviour on exotic platforms that may cause trouble, that's certainly helpful.

comment:2 by kanetaka, 9 years ago

Your test doesn't cover whole 64 bit unsigned, I think the builtin sqrt(int) is much faster but not compatible with isqrt64().

comment:3 by Itms, 9 years ago

bmanga95, thanks for contributing. Can you answer or address kanetaka's remark before we look into integrating that patch? Thanks in advance!

comment:4 by Bruno Manganelli, 9 years ago

Sorry for the late reply. I ran some tests today on the higher end of the range and I did find a discrepancy (by one unit) between the compiler and the custom version when the number is above 18446744073709550591. This is probably due to the increased error in the conversion from u64 to double. Using sqrtl fixes the problem on GCC and Clang. MSVC, however, does not support the 80 bit long double extension and long double is effectively an alias for double.

A quick fix is to define and use

u32 isqrt64(u64 x)


return (x > 18446744065119617025) ? 4294967295 : (u32)sqrt((double)x);


However, I am not sure this is even worth solving, given that those are roughly the last 1000 numbers that an u64 can represent and hence an extremely tiny fraction of the range. Also in 700mb of data collected from the game not a single sqrt call required numbers remotely close to those that cause the issue. Thoughts about this?

dvangennip, as for the multiplayer problems, could you please elaborate further? Do you mean that by using doubles we may get inconsistent results?

comment:5 by Stan, 9 years ago

dvangennip, as for the multiplayer problems, could you please elaborate further? Do you > mean that by using doubles we may get inconsistent results?

Yes that what he means, we can't really assume that the result of the functions will be the same on all Windows Mac Os, and Linux Distros. We need to be sure, else the game may go out of sync in a multiplayer match.

Could you submit an updated patch ? (If something is worth adding) so it can get commited right away.

comment:6 by Doménique, 9 years ago

stanislas69 essentially said what I wanted to reply with. Any discrepancy would generate out-of-sync errors for the game in multiplayer. So that needs some testing on different platforms. Perhaps a commands.txt can be used to reliably test a game and evaluate if any platform gives a different final result?

But the troubling numbers you found are indeed well out of the range of the usual game requirements. I think, if your patch behaves consistently for all platforms within the range of data that 0 A.D. requires, it would be good to reap the performance benefits.

It may be wise to add a little caveat as a comment in the source code where relevant, with a link to this trac ticket. If ever any issues crop up with unexplained out-of-sync errors this could point in the right direction.

Is there any guess as to how much effect this patch may have on overall game performance? If it's significant it may be easier/more acceptable to take on the slight risk.

comment:7 by Itms, 9 years ago

Milestone: Alpha 19Alpha 20

I'm pushing this to the next milestone because we're close to feature freeze for A19. You can still work on it but it won't be committed before the release happens.

Thanks for your work so far :)

comment:8 by wraitii, 9 years ago

In 17207:

Change the shape rasterization to not use DistanceToSquare, which often called sqrt. This apparently reduces total turn time by as much as 5% (!)
Refs #3368

comment:9 by Itms, 8 years ago

Keywords: review removed

Any updates on this?

We would at the very least need a "release candidate" version of the patch that can be tested.

comment:10 by elexis, 8 years ago

Milestone: Alpha 20Backlog

comment:11 by wraitii, 8 years ago

Current patch fails on the comparison test with "4294967295u" on MSVC2013. Overall I remain in favor of shipping with it. Compiler implementations can benefit from sse_2 and generally remove this hurdle in the short_range pathfinder, it should save around 20 to 40% of the time spent in ComputeShortPath.

Otherwise we could keep both versions and choose when we're sure the numbers are going to be small enough.

comment:12 by wraitii, 8 years ago

Keywords: review added
Milestone: BacklogAlpha 21

comment:13 by elexis, 8 years ago

Keywords: review removed

comment:14 by elexis, 8 years ago

Milestone: Alpha 21
Resolution: wontfix
Status: newclosed

The relevant excerpt of that:

17:13 <@leper> #3368 if it fails it isn't MP safe so remove review and close as invalid or wontfix
17:13 < WildfireBot`> #3368 ([PATCH] Switch to compiler implementations of sqrt) –
17:14 < wraitii> leper: it fails on stupidly huge numbers. I would wager that we at least could use it in the short-range pathfinder, where numbers are much smaller (and bounded)
17:14 <@leper> #3990 I still don't think that is the way to go, but if someone wants to handle and be responsible for all the bugs that will cause, meh
17:15 < WildfireBot`> #3990 ([PATCH] ConfigDB complains about empty strings) –
17:15 <@leper> wraitii: so we should also use floats a little in case it works on one platform&OS combination?
17:16 < wraitii> different situation imo
17:16 < wraitii> the short range pathfinder has bounded distances
17:16 < wraitii> we could compute how many bits of precision we need
17:17 < wraitii> and check that no decent compiler would have an error margin above that
17:17 < wraitii> But again, will reprofile
17:17 < wraitii> and well the whole approach can be considered broken
17:17 < wraitii> so maybe it shouldn't be fixed

20-40% saved time sounds significant.However changing some arguments (perhaps the size of tiles f.e.?) might easily break it. Every new usage of sqrt somewhere else in the simulation will likely break. So if this optimization were used, every new contributor that might think about using sqrt in the C++ part of the simulation would have to be informed of this pitfall and would have to test on a number of platforms. Even then the deviations between the different compiler implementations can be completely arbitrary, so it will be wrong to assume any value found in tests as accurate, or accurate for the future.

Note: See TracTickets for help on using tickets.