rednet.lookup, rednet.host and gps.locate infinite loop | turtle.forward does not return #1995
Comments
It could also be that the processes were killed silently, or simply not resumed after they yielded; however, this does not happen at any other spot that yields / uses os.pullEvent, except for those loops.
CC does not have a concept of processes, so that's not likely what is happening. rednet.host is most likely just an echo effect of rednet.lookup, because it performs a lookup when hosting (to ensure it is not duplicating an existing hosted name).
If you are flooding the event queue, getting the turtles to randomly delay when they start might help.
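A minimal sketch of that staggered-startup idea for CC:Tweaked. The 5-second window and the ID-based delay are assumptions, not anything from this thread; tune them to your swarm size.

```lua
-- startup.lua: spread boot activity over a short window so that dozens of
-- turtles don't flood the rednet/GPS channels at the same instant.
local JITTER_WINDOW = 5 -- seconds; arbitrary choice

-- Derive a per-machine delay from the computer ID so it is stable across
-- restarts (use math.random instead if you prefer true jitter).
local delay = (os.getComputerID() % (JITTER_WINDOW * 20)) / 20
sleep(delay)

-- ...continue with gps.locate / rednet.host / rednet.lookup here...
```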
Yep, if you have a massive amount of modem traffic and rednet hosting in the world, queue overflow is possible. If you want to control a swarm of computers, you can try using websockets, where you can pack multiple messages into a single event.
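A sketch of that batching idea, assuming a server (not shown) that concatenates pending messages into one JSON array per websocket frame. The URL and the `handleMessage` helper are placeholders.

```lua
-- Receive many logical messages per queued event by letting the server
-- batch them into one JSON array per websocket frame.
local ws, err = http.websocket("ws://localhost:8080") -- placeholder URL
if not ws then error("websocket failed: " .. tostring(err)) end

while true do
  local frame = ws.receive()               -- one event per frame...
  local batch = textutils.unserialiseJSON(frame)
  for _, msg in ipairs(batch) do           -- ...but many messages inside it
    handleMessage(msg)                     -- hypothetical handler
  end
end
```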
Yeah, I just dumped the folder from my host turtle, nothing pretty. The main logic for bluenet is in classBluenetNode; bluenet.lua itself is basically just for opening and closing the channels as a global API. Lookup and host are the original rednet implementation. The actual messaging is handled by the nodes to reduce function calls to the global API. I have 60+ turtles in my test world, but I made sure that startup happens in steps, though with no delay. During startup, no new events/modem_messages are queued without the host being online in the first place, except for the DNS/GPS messages of course. Those turtles fail before the host even boots.
True, but also quite annoying. I don't know the limitations of the queue, but I can't imagine it being so low that not even 60 computers at startup are supported. It also only happens during lag, which in my mind only slows down the code rather than dramatically increasing the number of events being queued.
The limit of the queue is 256; however, this limit is per computer, which means you should be careful if you are using parallel to queue tasks at the same time.
It's "only" 60 turtles, so I wouldn't say that's massive, but it is a considerable amount. As mentioned in my previous reply, I don't know enough about the Java side of the queue implementation to give any useful reply to this. Is it possible to use websockets for direct turtle communication without using HTTP?
Oh, 256 is lower than expected. That might explain it, but let's say I have 256 turtles: wouldn't that automatically make rednet.lookup fail each time they start? My goal for this project was to support as many turtles as possible, but for my test world I stuck to just 60. I have some sleep(0) statements in the other code, like pathfinding etc., though not during startup. If I remember correctly, sleep also queues a new timer event, which is why you recommend using a global tick? Could I alternatively just call coroutine.yield if I want to avoid the timeout?
coroutine.yield cannot be used to replace sleep(0), because it may not wait for a single tick, or it may wait forever (until an event is queued). sleep(0) queues a timer event that will be fired after exactly one tick.
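For reference, sleep is (roughly) built on top of a timer event, which is why every call adds one entry to the queue. This is a simplified sketch of the stock behaviour, not the verbatim bios code:

```lua
-- Roughly what sleep(t) does under the hood: queue a timer, then discard
-- every event until that specific timer fires. sleep(0) therefore still
-- costs one timer event and waits one full tick.
local function mySleep(t)
  local id = os.startTimer(t)
  repeat
    local _, fired = os.pullEvent("timer")
  until fired == id
end
```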
So if I understand correctly, the queue is being flooded with more than 256 events. According to my current setup, for each request:
If those messages are not handled/pulled in time and the queue gets above 256, the initial timer event gets discarded. Somewhat unrelated to the problem itself, I decided to get rid of rednet entirely. This is mainly to avoid a duplicate "while true do os.pullEvent()" loop and, in turn, a duplicate check of the modem_messages. It took me a while to figure out how to manipulate the bios, but here's how to kill rednet.run:
startup.lua:
I actually have a coroutine library that allows you to create infinite timers by iterating ticks, as I mentioned above: https://github.com/zyxkad/cc/blob/master/coroutinex.lua#L488
Note that this should only be the case when a timeout is passed. If you're receiving 10k messages a second, you probably don't need a timeout in the first place! Though yes, we should cancel the timer if it expires.
Very interesting, I'll look into it in more detail in the next few days. Thanks!
Not in testing, but I need the timeout to trigger onNoAnswer events in "real" applications. While I send lots of messages, they all come from different clients and require ACKs (like MQTT QoS 1). If one client/broker has an issue or lags behind, this must result in an appropriate reaction for that specific message. If I don't check for missed messages via timers, a client might get "stuck" without being able to resolve the issue. Once the timer runs out, the clients have to republish their accumulated logs, and they are only allowed to discard them after they have received an ack. The 10k messages are quite excessive for testing purposes, but they still highlight the overhead caused by the timers.
Several functions accept a "timeout" argument, which is implemented by starting a timer, and then racing the desired output against the timer event. However, if the timer never wins, we weren't cancelling the timer, and so it was still queued. This is especially problematic if dozens or hundreds of rednet (or websocket) messages are received in quick succession, as we could fill the entire event queue, and stall the computer. See #1995
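The fix described above amounts to cancelling the timer on the non-timeout path of the race. A hedged sketch of that pattern (the function name is illustrative, not the library's actual code; os.cancelTimer is a real CC:Tweaked API):

```lua
-- Race a rednet message against a timeout, and cancel the timer when the
-- message wins so the unfired timer event never sits in the queue.
local function receiveWithTimeout(timeout)
  local timer = os.startTimer(timeout)
  while true do
    local event, p1, p2 = os.pullEvent()
    if event == "rednet_message" then
      os.cancelTimer(timer) -- the fix: don't leave the timer pending
      return p1, p2         -- sender ID, message
    elseif event == "timer" and p1 == timer then
      return nil            -- timed out
    end
  end
end
```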
Minecraft Version
1.21.x
Version
1.21.1-fabric-1.113.1
Details
During high lag situations, especially when loading the world, rednet.lookup does not return / resume. I observed this behaviour in gps.locate and rednet.host as well. Since rednet.host uses rednet.lookup, this is somewhat plausible. Because gps.locate uses a very similar loop to rednet.lookup, I assume the issue lies with the timer (os.startTimer) event being used as an exit condition. I don't know if it is possible for a process to miss a timer event; however, this looks like the only commonality between those three scenarios.
Except for rednet, no other process is using os.pullEvent in the cases where I encountered this issue (except for the bios, I guess). This usually happens after repeatedly calling rednet.lookup to wait for the host to come online. (It would be nice to be able to manually set the timeout for rednet.lookup, by the way.)
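For context, the exit-condition pattern in question looks roughly like this (a simplified shape of the rednet.lookup / gps.locate wait loop, not the verbatim library code):

```lua
-- A timer event is the ONLY exit condition of this loop, so if that event
-- is ever lost (e.g. pushed out of a full 256-entry queue), the loop
-- never terminates and the caller hangs.
local timer = os.startTimer(2)
while true do
  local event, p1 = os.pullEvent()
  if event == "modem_message" then
    -- collect responses here...
  elseif event == "timer" and p1 == timer then
    break -- sole exit: if this event is dropped, we wait forever
  end
end
```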
I only encountered this when loading the world, which results in my turtles being stuck because they call gps.locate, rednet.host and rednet.lookup on startup. The only fix is to manually restart the turtles, or to preload the world into memory and join again.
Quickly reloading the shaders using "R" helps with reproducing this. No error message is being displayed and even after 30+ minutes the turtles do not recover.
Screenshots:
rednet.lookup using parallel.waitForAll, 2 processes
forced parallel lookups all locking at once instead of just one
rednet.lookup, singular main process, 6th call
rednet.host using parallel.waitForAll, 2 processes (bluenet is a more lightweight implementation of rednet, but it uses the default rednet.lookup and rednet.host because their response mechanism is started in the bios)
gps.locate, singular main process, 2nd call
UPDATE: 18.10.2024
After removing rednet entirely, the only bug I still couldn't get rid of was the one seen in the last screenshot about gps.locate. It seems this is actually an issue with turtle.forward. During stress testing I found that the first turtle.forward statement does not return or throw any errors. Since it was right before gps.locate, I initially assumed the issue lay with the GPS.
turtle.forward indefinitely locking up
This only happens when immediately calling turtle.forward after opening three shells.
I got rid of the bug by opening the shells after initializing the orientation of my turtle.
turtle.forward does get stuck, but the turtle itself did in fact move a block forward and was not blocked.