Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating point error in currentTime and stepSize? #14

Open
mrindar opened this issue Mar 9, 2017 · 2 comments
Open

Floating point error in currentTime and stepSize? #14

mrindar opened this issue Mar 9, 2017 · 2 comments
Labels

Comments

@mrindar
Copy link
Contributor

mrindar commented Mar 9, 2017

Had some discussions with Christian Steinebach where he mentioned that simulations they have performed in a tool using HLA had issues due to floating point inaccuracies in current simulation time and step size.

Example:
The situation below is what we would expect with a stepSize set to 0.1

timePoint currentTime stepSize
    0         0.0       0.1
    1         0.1       0.1
    2         0.2       0.1 

The situation below is what can happen with a stepSize set to 0.1

timePoint   currentTime          stepSize
    0           0.0              0.100...01
    1           0.100...01       0.099...99
    2           0.199...99       0.100...01

Example End

The situation described by the above example will probably cause user code in the fmu on the following form to brake:

if(stepSize > 0.1) {
 //do stuff
}

Also, if the step size in the execonf file is set to 0.1, but internally coral increments currentTime with a step size of 0.100...01, things might start to drift?

I'm not 100% sure whether this is a problem for us now, but this was Christians experience from the HLA tool, so we should probably give it some thought.

@kyllingstad
Copy link
Member

This is a fundamental problem of floating-point arithmetic: It is only possible to exactly represent numbers on the form m*(2^q), where m and q are integers. (And m and q can only occupy a certain number of bits, of course, so their size is limited.) Any number which is not representable like this gets rounded to the nearest representable number. So for example -0.25 = -1*(2^-2) and 7 = 7*(2^0) can be represented exactly, while your example of 0.1 can not.

So you're right, if you start at 0 and add 0.1 ten times, you don't get 1. Assuming double precision, you actually get a number which is 1.1e-16 too small. (I wrote a small program to check.) If you keep adding 0.1 until you reach what should have been one million, the discrepancy has grown to -1.6e-4. But if we're talking simulated seconds here, that would amount to 278 hours, which is a pretty long simulation. :-)

So it's definitely a thing, and there is going to be some drift, but I am having trouble coming up with cases where that would be a problem. The important thing is that if fmiDoStep() is at one point called with currentCommunicationPoint=t and communicationStepSize=dt, then the next call should have currentCommunicationPoint=t+dt. User code should just accept the time points it is given, and not worry too much about whether dt is exactly so-and-so big.

But I am very interested to hear some more specific examples of where that might pose a problem!

@kyllingstad
Copy link
Member

kyllingstad commented Mar 9, 2017

Just a few more thoughts:

Firstly, I guess there could be an issue if user code keeps time in single precision, i.e., as a float rather than a double. Then the discrepancies could be bigger, and you also have a mismatch between the master's time and the slave's internal time.

Secondly, I thought of a case where small errors in time step length could really matter, and that is if user code performs numerical differentiation using the step size (or a fraction thereof) as the time differential. For example,

        f(t) - f(t-dt)
f'(t) = --------------
              dt

Thirdly, the FMI 2.0 spec (sec. 4.2.2) says the following about why fmi2DoStep takes both the current communication point and the step size, which might be relevant:

Formally argument currentCommunicationPoint is not needed. It is present in order to handle a mismatch between the master and the FMU state of the slave: The currentCommunicationPoint and the FMU state of the slaves defined by former fmi2DoStep or fmi2SetFMUState calls have to be consistent with respect to each other. For example, if the slave does not use the update formula for the independent variable as required above, 𝑡𝑐 𝑖+1 = 𝑡𝑐 𝑖 + ℎ𝑐 𝑖 (using argument 𝑡𝑐 𝑖 = currentCommunicationPoint of fmi2DoStep) but uses internally an own update formula, such as 𝑡𝑐 𝑠,𝑖+1 = 𝑡𝑐 𝑠,𝑖 + ℎ𝑐 𝑠,𝑖 then the slave could use as time increment ℎ𝑐 𝑠,𝑖 := (𝑡𝑐 𝑖 − 𝑡𝑐 𝑠,𝑖) + ℎ𝑐 𝑖 (instead of ℎ𝑐 𝑠,𝑖 := ℎ𝑐 𝑖) to avoid a mismatch between the master time 𝑡𝑐 𝑖+1 and the slave internal time 𝑡𝑐 𝑠,𝑖+1 for large i.

(Some formatting was lost in the paste; the s'es and i's should be subscripts.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants