coral_0.8.0-beta1: Random crash after changing config files #29

Open
mrindar opened this issue Mar 22, 2017 · 8 comments
@mrindar
Contributor

mrindar commented Mar 22, 2017

I don't have enough information on this issue yet, but since I started working with coral_0.8.0-beta1, coralslave randomly crashes at startup after I have made changes to .execonf or .sysconf.

This is a typical scenario:

  1. coralslaveprovider /path/to/fmus -o out/
  2. coralmaster run .execonf .sysconf
  3. coralslaves spawn and simulation continues and terminates normally
  4. make a change in .execonf, e.g. stop time
  5. coralmaster run .execonf .sysconf
  6. spawned coralslave crashes
    1. Which spawned coralslave crashes seems random (sorry about that)
  7. close all spawned coralslave windows
  8. close coralslaveprovider
  9. coralslaveprovider --clean-cache
  10. repeat 1 and 2
  11. coralslaves spawn and simulation continues and terminates normally

It should be noted that running coralslaveprovider --clean-cache does not always work; sometimes I have to repeat the process a couple of times before the coralslaves stop crashing. Also, this doesn't always happen when a change is made to .execonf or .sysconf, but every time it has happened, it has been after a change to one of them.

Some more investigation is definitely needed on this issue
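
Since the crash is intermittent, one way to gather data is to repeat step 2 many times and record which runs fail. The following is only a rough sketch, not part of coral: `MASTER_CMD` reuses the command from the steps above, while `run_once` and `count_failures` are hypothetical helper names. It assumes that coralslaveprovider is already running and that a failed run makes coralmaster exit with a nonzero code, which has not been verified here.

```python
import subprocess
from typing import Callable, List, Sequence

# Command taken from the steps above; the config file names are placeholders.
MASTER_CMD = ["coralmaster", "run", ".execonf", ".sysconf"]


def run_once(cmd: Sequence[str]) -> int:
    """Run one simulation and return the exit code of the process."""
    return subprocess.run(cmd).returncode


def count_failures(
    n_runs: int,
    runner: Callable[[Sequence[str]], int] = run_once,
) -> List[int]:
    """Return the 1-based indices of runs whose exit code was nonzero.

    ASSUMPTION: a failed run makes coralmaster exit with a nonzero code.
    """
    failed = []
    for i in range(1, n_runs + 1):
        if runner(MASTER_CMD) != 0:
            failed.append(i)
    return failed
```

Calling `count_failures(20)` would then give the list of failed runs. If a crashed coralslave only brings up the Windows crash dialog and coralmaster still exits with zero, this check would miss it and a different failure signal would be needed.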

@kyllingstad
Member

It's hard to say which changes since 0.7 could cause this. The first thing that comes to mind would be the timeout stuff, but that shouldn't be affected by changing configuration files. Actually, I can't think of anything that would cause some kind of memory effect wrt. the configuration files.

Some things you could do which could possibly narrow this down:

  • Try disabling all timeouts (i.e., set them to -1). If things start to run smoothly then, enable them one at a time and try to observe which one triggers the issue.
  • Try going back to 0.7.1 for a while and see if the problems occur there too.

@mrindar
Contributor Author

mrindar commented Apr 7, 2017

So it appears that changing the configuration files has nothing to do with it. I've run a few simulations (10-ish) and 2 crashed, with no changes to the configuration files. If I try to start a new simulation after the crash (without running --clean-cache on the slave provider), I get a
[warning] GetSlaveTypes request to slave provider 61926fa9-6de0-456c-afa5-69034f0c37c9 failed (timed out Error: Slave type not found: SomeFmu
error message, and it always seems to be for the same FMU, but that's not necessarily the one that crashes. The coralslave.exe that crashes isn't always for the same FMU.

EDIT
I'm sorry to say, but this isn't consistent either :P

@mrindar
Contributor Author

mrindar commented Apr 7, 2017

When I tried to press 'Debug' in the coralslave.exe crash window, Visual Studio gave me this message:
Unhandled exception at 0x743B170C (mswsock.dll) in coralslave.exe: 0xC0000005: Access violation executing location 0x743B170C.
I don't know whether it means anything.

@mrindar
Contributor Author

mrindar commented Apr 7, 2017

I'm curious whether this is similar to the ZeroMQ crash we've struggled with in the past, except that back then the crash took a proper chokehold on the CPU and a hard restart was required. However, that crash also seemed rather random, much like this one, and mswsock.dll sounds like socket stuff, if I put on my Sherlock Holmes hat.

@mrindar
Contributor Author

mrindar commented Apr 7, 2017

Oh! This is interesting.
So,

  1. I run a simulation
  2. A 'coralslave.exe has crashed' window shows up
  3. I press 'Debug'
  4. Visual Studio fires up and shows this exception: Unhandled exception at 0x743B170C (mswsock.dll) in coralslave.exe: 0xC0000005: Access violation executing location 0x743B170C.
  5. I press 'Continue'
  6. The coralslave continues and the simulation starts and finishes normally

@mrindar
Contributor Author

mrindar commented Apr 7, 2017

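| Simulation Number | Number of coralslaves.exe crashed | Pressing 'Continue' after crash worked |
|---|---|---|
| 1 | 1 | yes |
| 2 | 1 | yes |
| 3 | 0 | N/A |
| 4 | 0 | N/A |
| 5 | 1 | yes |
| 6 | 1 | yes |
| 7 | 0 | N/A |
| 8 | 1 | yes |
| 9 | 0 | N/A |
| 10 | 0 | N/A |
| 11 | 0 | N/A |
| 12 | 3 | yes |
| 13 | 0 | N/A |
| 14 | 0 | N/A |
| 15 | 0 | N/A |
| 16 | 0 | N/A |
| 17 | 0 | N/A |
| 18 | 0 | N/A |
| 19 | 0 | N/A |
| 20 | 0 | N/A |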

So, not very scientific, but approximately 6 in 20 simulations crash. An interesting one is simulation 12, where 3 coralslaves crashed: the first one crashed, I pressed 'Continue' in the debugger, then another one crashed, and so on. Also, pressing 'Continue' in the debugger when a crash occurs seems to work every time.
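
For reference, the numbers in the table above can be tallied with a few lines of Python. This is just a sketch of the arithmetic; the list is copied by hand from the table, and the variable names are made up for illustration.

```python
# Crashed-slave counts per simulation, copied from the table above (runs 1-20).
crashes_per_run = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0]

# Number of simulations in which at least one coralslave crashed.
runs_with_crash = sum(1 for n in crashes_per_run if n > 0)

# Total number of crashed coralslave processes across all simulations.
total_crashed_slaves = sum(crashes_per_run)

# Fraction of simulations affected.
crash_rate = runs_with_crash / len(crashes_per_run)

print(f"{runs_with_crash} of {len(crashes_per_run)} runs had at least one crash ({crash_rate:.0%})")
print(f"{total_crashed_slaves} coralslave processes crashed in total")
```

With only 20 runs the rate is a rough estimate, so a crash-free series of similar length on another machine would not by itself rule the problem out.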

@kyllingstad
Member

I've run your setup 50–60-ish times myself now, using coral 0.8.0-beta1, and it didn't crash once. So this is going to be hard to track down, I guess. You're still on Windows 7?

@mrindar
Contributor Author

mrindar commented May 2, 2017

Awesome :P Still on Windows 7, yes

@kyllingstad kyllingstad removed their assignment Aug 17, 2023