Opened 11 years ago

Last modified 8 years ago

#2109 new enhancement

[PATCH] Optimisation of Fixed Point Trigonometic Approximations

Reported by: Thanduel Owned by: Thanduel
Priority: Should Have Milestone: Backlog
Component: Core engine Keywords: patch
Cc: Patch:

Description (last modified by Thanduel)

The algorithm currently used in the atan2_aprox has a very large error term for its performance.

To improve it I suggest the use of a CORDIC algorithm. A CORDIC algorithm calculates the angle by working out, through a binary search of precomputed values, what angle rotation would reduce the original angle to zero. All the numerical operations are done in power of 2 operations.

This should at least double the performance for a slightly smaller error (3.576deg vs 4.5deg) term or give slightly faster performance for a sub 0.5 deg error term.

The algorithm currently used in the SIN and COS approximation is very accurate but very slow. I propose to improve this with a CORDIC algorithm as well. For slightly worse accuracy the performance would be about double.

Attachments (8)

Atan2.patch (4.0 KB ) - added by Thanduel 11 years ago.
patch for fixed.h and fixed.cpp with cordic version of atan2_approx
Atan2.2.patch (4.0 KB ) - added by Thanduel 11 years ago.
sincos_approx.patch (5.7 KB ) - added by Thanduel 11 years ago.
patch to replace current sincos_approx with cordic sincos approx
Benchmark_Fixed.zip (11.7 KB ) - added by Thanduel 11 years ago.
benchmarking project for both atan2 and sincos approximations
cordic_patch.patch (7.8 KB ) - added by Thanduel 9 years ago.
play tested patch including cordic atan2 and sincos
2109rc1.patch (8.8 KB ) - added by fsincos 8 years ago.
sincos.patch (3.8 KB ) - added by fsincos 8 years ago.
atan2cordic.patch (2.8 KB ) - added by fsincos 8 years ago.

Download all attachments as: .zip

Change History (50)

comment:1 by Thanduel, 11 years ago

I am currently working on a patch for this and will submit it soon.

comment:2 by sanderd17, 11 years ago

Note that our code also has a second atan and atan2 implementation in JS. It's used by the JS scripts (as the native spidermonkey function wasn't platform consistent and caused OOS problems). That algorithm works with a simple taylor polynome, so it's likely it can be done better too. You can find it in binaries/data/mods/public/globalscripts/Math.js.

If your algorithm turns out to be better, it can be used there too. I don't know how much performance advantage it will give though, as I don't think the atan function is used very often in the JS scripts. But while you're doing all the research on the algorithms, it should be a simple language translation exercise.

comment:3 by Thanduel, 11 years ago

I will take a look, the same algorithm can be applied to sin and cos approximations of angles calculation of the hypotenuse as well as things like calculating a sqrt or log. It is, however, unclear whether or not it would provide a performance improvement in all of those circumstances.

In my benchmarks averaged over 500000 samples the cordic algorithm improves the performance of the atan2 function by a factor of about 2 for an error term of 3.576deg.

comment:4 by sanderd17, 11 years ago

Well, be sure to include some benchmarks on real-world examples (not on special cases, unless they happen a lot in the code), as well how you did those benchmarks (so we can repeat them). It would help us decide weather it's better or not.

I don't think we're too concerned about precision (most angle stuff is for visual things), the biggest concern is that it has to be cross-platform.

comment:5 by Thanduel, 11 years ago

The fixed point trigonometric functions are used in the game logic for example the atan2 approximation is used to determine what direction a unit is facing. The reason for using the fixed point library in 0AD is to prevent OOS errors. The precision of this routine is 100% the accuracy of the routine is bounded by an error of +- 3.57deg for 5 iterations which is the way it is currently in the patch.

I've attached the patch I will attach main of the benchmarking program in a moment. I looked at the javascript maths functions but I'll need to chat with someone more experienced with javascript to know what the best way to implement this algorithm is and how to benchmark it.

I have made the constructor from specified value public in order to be able to construct a table of angles at compile time. I don't think it will create the table at compile time if I use the static FromInt method but I may be wrong.

comment:6 by sanderd17, 11 years ago

Can you make your patch from the svn root directory? That makes it show up nicer in trac.

If you want to chat, be sure to drop by on #0ad-dev on Quakenet.

To check performance of JS code, you can use the spidermonkey stand-alone interpreter (usually the "js" command on Linux). Use +(new Date) to get the current time in milliseconds, so you can get the time difference.

In javascript, you can't access the internal value of the number.

But anyway, if you don't want to do the JS optimisations, the C++ optimisations will be welcome anyway (if they're as good as promised ;)

comment:7 by Thanduel, 11 years ago

Sure its actually the first patch file I've ever made so I wasn't sure what the procedure was. I just realised I need to fix that patch file as well to put the types back to the 0ad standard i64 u64 etc.

by Thanduel, 11 years ago

Attachment: Atan2.patch added

patch for fixed.h and fixed.cpp with cordic version of atan2_approx

comment:8 by Thanduel, 11 years ago

Description: modified (diff)
Summary: Optimisation of Fixed Point atan2_approxOptimisation of Fixed Point Trigonometic Approximations

by Thanduel, 11 years ago

Attachment: Atan2.2.patch added

by Thanduel, 11 years ago

Attachment: sincos_approx.patch added

patch to replace current sincos_approx with cordic sincos approx

by Thanduel, 11 years ago

Attachment: Benchmark_Fixed.zip added

benchmarking project for both atan2 and sincos approximations

comment:9 by Thanduel, 11 years ago

Both the atan2 and sincos approximations implemented using the CORDIC algorithm in the benchmarking project double the performance of the functions. The benchmarking project includes benchmarking for both of the new functions.

I have included two separate patches for the actual functions so that they may be applied separately. I'm not sure how to delete attachments but the older of the two Atan2 patch files should be disregarded.

comment:10 by Thanduel, 11 years ago

Keywords: review patch added; Fixed Point Optimisation removed
Milestone: BacklogAlpha 15
Summary: Optimisation of Fixed Point Trigonometic Approximations[PATCH] Optimisation of Fixed Point Trigonometic Approximations

comment:11 by Thanduel, 11 years ago

Keywords: Fixed Point Optimisation added

comment:12 by FeXoR, 11 years ago

AFAICS there's no JS implementation patch included for RMS (Random Map Scripts).

If you don't want to do it I'm willing to do this part (though due to your experience it would be better if you could give it a try).

Actually trigonometric functions are used several thousand times in some RMS so it would really be a big gain to have those.

To have faster functions for squareroot, exp and log would also be nice (especially squareroot for vector length/distance calculations).

Examples of extensive use:

  • RMS Deep Forest: binaries/data/mods/public/maps/random/deep_forest.js
  • RMGEN lib to place walls: binaries/data/mods/public/maps/random/rmgen/wall_builder.js
Last edited 11 years ago by FeXoR (previous) (diff)

comment:13 by wraitii, 10 years ago

Has anybody tested wether this worked properly and actually improved speed?

comment:14 by leper, 10 years ago

Milestone: Alpha 15Alpha 16

comment:15 by Stan, 10 years ago

Pushing to A17 to have time to implement it in JS and to test it.

comment:16 by Josh, 10 years ago

Keywords: review patch Fixed Point Optimisation removed
Milestone: Alpha 16Alpha 17

FeXoR: Could you please look at committing this?

in reply to:  16 ; comment:17 by leper, 10 years ago

Keywords: patch review added

Replying to Josh:

FeXoR: Could you please look at committing this?

You do know that FeXoR doesn't have commit access?

in reply to:  17 comment:18 by Josh, 10 years ago

Replying to leper:

Replying to Josh:

FeXoR: Could you please look at committing this?

You do know that FeXoR doesn't have commit access?

I forgot about that. If he could review/integrate this patch with JS and upload the patch, I'll commit it.

comment:19 by Radagast, 10 years ago

I don't get the point of the discussion here. Won't JavaScript finally use the SpiderMonkey trigonometry functions?

And if the JavaScript side should also get a CORDIC implementation then this looks more like a separate ticket to me. Therefore we just had to test the performance effect of the patch of Thanduel.

I think we could really need this improvement. Why do we push it away just because of JavaScript which seems like a different problem. (at least I can't find any trigonometric functions for Random map on the C++ side currently.)

I hope FeXoR sees this ticket. And Yves too. He will know what to do.

in reply to:  19 comment:20 by Josh, 10 years ago

Replying to Radagast:

I don't get the point of the discussion here. Won't JavaScript finally use the SpiderMonkey trigonometry functions?

And if the JavaScript side should also get a CORDIC implementation then this looks more like a separate ticket to me. Therefore we just had to test the performance effect of the patch of Thanduel.

I think we could really need this improvement. Why do we push it away just because of JavaScript which seems like a different problem. (at least I can't find any trigonometric functions for Random map on the C++ side currently.)

I hope FeXoR sees this ticket. And Yves too. He will know what to do.

Because at this point we can't be sure the spidermonkey functions are entirely deterministic across architectures and platforms.

comment:21 by sanderd17, 10 years ago

To be correct, we're sure they're not consistent across platforms, as they were the cause of previous OOS problems.

comment:22 by Stan, 10 years ago

They shouldn't be used then.

comment:23 by Thanduel, 10 years ago

The optimisations made here cannot be translated into javascript. The CORDIC algorithm relies on the ability to perform multiplications and divisions by bitshifting. Javascript only deals in floating point numbers and thus these algorithms would be horribly slow in javascript. I can spend some time investigating potential algorithms for javascript but I do think it is a separate issue and certainly shouldn't hold back this patch.

comment:24 by sanderd17, 10 years ago

JS supports bit shifting (f.e. 1 << 3 == 8), and behind the scenes, the IonMonkey compiler notices if something is floating point or integer (but breaking that assertion causes a huge slowdown, as it has to be compiled again).

But you're right it shouldn't hold back this patch. Though I thought it was pretty trivial. Just copy this algorithm to the JS functions (after the validation checks wrt NaN, Infinity, ...), remove some type info, and it should work. JS and C syntax for function bodies is almost equal.

comment:25 by sanderd17, 10 years ago

I just tried implementing it, and it's not as simple indeed. Most values are floats to start with, while in C++, you start with fixed. the conversion to something integer would add too much time to the algorithm to make up with the rest of the algorithm.

So I guess, for JS, we can't do a lot better than our current algorithm (which doesn't even have branching, loops or any type changes, so is very easy to compile for IonMonkey).

comment:26 by Itms, 10 years ago

Hi Thanduel, I just took a look at your patch and unfortunately it seems to raise some problems.

First, a quick profiling makes me think sincos doesn't specifically need to be improved.

For atan, even if I do observe an improvement of 1/2 the original time spent on this, there are also strong orientation problems for units, who often move perpendicularly to the direction they're facing. This is especially visible for cavalry units.

If the current implementation can be improved, you should try to, but if it's not the case I think this is far too inconvenient for players.

If it's the first case I'd like to point out that your patch adds useless initial lines to the modified files, and you should also change the copyright year to 2014. We use tabs and not quadruple spaces for indentation, and you need a space between if/for and the parenthesis (but everything here is just for coding conventions).

comment:27 by Thanduel, 10 years ago

Thanks for the review, hopefully I'll have some time to have a look at this over the next week or so.

Despite the fact that sincos might not require improvement it does make sense to have both trigonometric functions use the same type of approximation. It makes the job of future maintainers slightly easier.

I'll see to the coding convention issues this is actually the first patch I've ever submitted to a project so I'm not surprised there were a few oversights. I shall smite them with my mighty vim hammer despite the fact that it makes me sad to use tabs.

I will run a comparison between the old algorithm and the new algorithm there are probably boundary conditions where something is going wrong but it shouldn't be a big issue to sort out.

Thanks again for taking the time to review this.

comment:28 by Itms, 10 years ago

Keywords: review removed

Hi Thanduel, any progress on this? If not, there is no hurry, but I'll mark this ticket as reviewed to clean up a bit our review queue.

Thanks for your work so far!

comment:29 by Itms, 10 years ago

Milestone: Alpha 17Alpha 18

comment:30 by Itms, 9 years ago

Milestone: Alpha 18Backlog

Backlogging due to lack of progress.

comment:31 by Thanduel, 9 years ago

Hi, I've finally had a chance to come back to this. I've found the bug (hopefully the only one). I just need to test it in game. Should get a chance to do that tomorrow evening.

by Thanduel, 9 years ago

Attachment: cordic_patch.patch added

play tested patch including cordic atan2 and sincos

comment:32 by Thanduel, 9 years ago

these are now play tested, I also tested the outputs of the algorithm against in game input data that I logged by writing the inputs to a file. So far as I can tell there are no further issues.

comment:33 by Thanduel, 9 years ago

Keywords: review added
Milestone: BacklogAlpha 19

comment:34 by Itms, 9 years ago

Keywords: review removed

Thanks a lot for your work! Here are some comments, mainly about code organization and style.

  • It is bad to make the explicit constructor public only for this algorithm. What would be nice is to group everything the CORDIC algorithms need in a structure inside CFixed. That would also allow you to get rid of the ugly declarations of the lines 146 to 161.
  • Improving the documentation of your code would be nice. The code is nicely commented but the functions themselves lack explanation. Are the comments line 379 and 385 of Fixed.h still accurate? Do you have some link to a documentation for CORDIC algorithms that you could use to replace the comment of the former line 146? Are the values in tan_angle_table_rad angles (like the comment suggests) or tan(angles) (like the name suggests)?
  • Comments should be // foo and not //foo
  • Why using foo* -1 instead of -foo? You could also use the - operator of the CFixeds, btw.
  • Make sure to always use exact same types. For CFixed_15_16s, T is i32 and not just int, for instance, so internal values should be i32s.
  • If loop_var is not used outside the loop, just define it as for (int loop_var = ...
  • Everytime you define a CFixed with its internal value (for instance line 259), please write in a comment what's a human-readable value for it, or even better, define it with FromInt or FromDouble.

Looking forward to the next version of the patch! :)

comment:35 by Itms, 9 years ago

Milestone: Alpha 19Alpha 20

Hi, we're hopefully getting close to the next release, so I think it will be a bit late to integrate that when it's ready. You don't have to wait for the release to happen to propose a new patch though ;)

comment:36 by elexis, 8 years ago

Milestone: Alpha 20Backlog

Backlogging due to lack of progress.

comment:37 by fsincos, 8 years ago

I attached a patch that should improve the precision of sincos_approx (from ~4*10-4 to ~1.5*10-4) while reducing the amount of multiplications used. It uses a similar approach to the "original" implementation and as such, there is no improvement for atan (arctan(z) has singularities for z=+-i and thus polynomial approximations are much worse when compared to those of sin(z) resp. cos(z) (no singularities)).

Perhaps the CORDIC algorithms are better, in the meantime we can use this.

Typos/optimizations in my patch: c1: 36364 -> 36365 c2: 3604 -> 3605 c3: 46340 -> 46341 c5: 703 -> 708 Error 1.37*10-4 -> 1.27*10-4 scsum <-> scdif (How could this happen?)

Last edited 8 years ago by fsincos (previous) (diff)

comment:38 by fsincos, 8 years ago

I have polished Thanduel's atan2 patch slightly. The error bound should improve to about 1.79 degrees (down from 3.576). Since we don't really need CFixed_15_16 multiplication for this, I've "integerized" the whole algorithm. There is also a slight difference in how the angle is transformed into [-pi/2, pi/2] and [-pi/4, pi/4], respectively.

Please don't mind the changes in test_Fixed.h.

comment:39 by Itms, 8 years ago

Hi, thanks for your patch. Correct me if I'm wrong, but there are typos in it? Please fix them and upload a new patch :) Also please read SubmittingPatches when doing so!

by fsincos, 8 years ago

Attachment: 2109rc1.patch added

comment:40 by fsincos, 8 years ago

I have fixed the typos and improved the algorithm further. The attached patch is play-tested and has roughly the same performance as Thanduel's version, with half its error. It can be viewed as a "release candidate".

By the way, this algorithm is not well-suited for JS because it needs a lot of branching. I could help find something similar to what I did for sincos_approx, reducing the degree of the approximations to ~8-10 (down from 12) for cos and ~11 (down from 15) for atan, preserving the symmetries.

In order to proceed further, I'd like to know a couple of things. (1): Does the result of atan2_approx have to be in a specific interval, e.g. [-pi,pi)? (2): Are there certain angles where equality should hold? (3): What is the expected range of input values? (Overflow for |(x,y)| > 19910.8, rounding errors for small |(x,y)|) (4): Is this more to your liking than what Thanduel did or should I change the style back to his? (5): Is it OK if the patched game fails some tests but works fine otherwise (see also (2))?

comment:41 by sanderd17, 8 years ago

For the overflow and large error on points close to the origin, perhaps you could start with normalising the point. Chose an easy norm (like the maximum norm: norm(x,y) = max(|x|,|y|) ), and divide x and y by that norm. That results in one fixed division (you don't need to divide a number with itself, that's just 1)

I also wonder if you can't deduce info from that normalised form (with the maximum norm). As when x ends up to be one, y is exactly the tangents, and you have the mirrored situation for y being one.

It's just a thought though. If it doesn't help, then so be it.

A bigger issue for me is that you changed the tests. F.e. sin and cos used to return exactly 0 and 1 when evaluated in 0. But now with the float version, you allow a difference. As the game has never been tested with a small difference, this could cause problems. Making the difference smaller isn't a problem of course.

I didn't really look into the sincos changes yet.

by fsincos, 8 years ago

Attachment: sincos.patch added

by fsincos, 8 years ago

Attachment: atan2cordic.patch added

comment:42 by fsincos, 8 years ago

I have updated the patches. The sincos patch can be ignored for now since it's not clear if the special angles cause breakages; I improved it a bit anyway. The atan2 patch has the shifting to prevent overflow (which would have happened in every previous version including the current svn implementation).

Note: See TracTickets for help on using tickets.