Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not run forever in request worker threads #3726

Closed
wants to merge 3 commits into from

Conversation

geo2a
Copy link
Collaborator

@geo2a geo2a commented Feb 21, 2024

I've added code that tracks the worked threads spawned in Kore.JsonRpc.Server.srv, and polls them on their status when spawning a new one. It turns out, that they never exit, because we will spawn a new worker for every request.

We end up with logs like when submitting several copies of the same requests one after the other:

.build/kore/bin/kore-rpc test/rpc-server/simplify/and-simplify/definition.kore --module TEST --server-port 12341 --log-level info
...
    Verifying the definition TimeSpec {sec = 0, nsec = 1467} []
mkore-rpc: [173420029] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 27
kore-rpc: [173421622] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [173422274] Info (LogJsonRpcServer):
      ThreadId 27: ThreadBlocked BlockedOnSTM

kore-rpc: [173423187] Info (LogJsonRpcServer):
    Process request 1 simplify
kore-rpc: [178823488] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 35
kore-rpc: [178824515] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [178824876] Info (LogJsonRpcServer):
      ThreadId 35: ThreadBlocked BlockedOnSTM
      ThreadId 27: ThreadBlocked BlockedOnSTM

kore-rpc: [178825417] Info (LogJsonRpcServer):
    Process request 1 simplify
kore-rpc: [181816914] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 43
kore-rpc: [181817559] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [181817869] Info (LogJsonRpcServer):
      ThreadId 43: ThreadBlocked BlockedOnSTM
      ThreadId 35: ThreadBlocked BlockedOnSTM
      ThreadId 27: ThreadBlocked BlockedOnSTM

kore-rpc: [181818666] Info (LogJsonRpcServer):
    Process request 1 simplify

and with change we get the threads to stop:

.build/kore/bin/kore-rpc test/rpc-server/simplify/and-simplify/definition.kore --module TEST --server-port 12341 --log-level info
kore-rpc: [2199] Info (LogMessage):
    Reading the input file TimeSpec {sec = 0, nsec = 607689} []
kore-rpc: [26521] Info (LogMessage):
    Parsing the file TimeSpec {sec = 0, nsec = 1467} []
kore-rpc: [74874] Info (LogMessage):
    Verifying the definition TimeSpec {sec = 0, nsec = 1397} []
kore-rpc: [137888] Info (LogMessage):
    Executing TimeSpec {sec = 0, nsec = 59132870} []
kore-rpc: [143984] Info (LogMessage):
    Reading the input file TimeSpec {sec = 0, nsec = 531772} []
kore-rpc: [171454] Info (LogMessage):
    Parsing the file TimeSpec {sec = 0, nsec = 1467} []
kore-rpc: [217430] Info (LogMessage):
    Verifying the definition TimeSpec {sec = 0, nsec = 1886} []
kore-rpc: [22047663] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 27
kore-rpc: [22048451] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [22049142] Info (LogJsonRpcServer):
      ThreadId 27: ThreadBlocked BlockedOnSTM

kore-rpc: [22049871] Info (LogJsonRpcServer):
    Process request 1 simplify
kore-rpc: [24260908] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 36
kore-rpc: [24261398] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [24261677] Info (LogJsonRpcServer):
      ThreadId 36: ThreadBlocked BlockedOnSTM
      ThreadId 27: ThreadFinished

kore-rpc: [24262226] Info (LogJsonRpcServer):
    Process request 1 simplify
kore-rpc: [25892950] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 45
kore-rpc: [25893404] Info (LogJsonRpcServer):
    Status of worker threads:
kore-rpc: [25893620] Info (LogJsonRpcServer):
      ThreadId 45: ThreadBlocked BlockedOnSTM
      ThreadId 36: ThreadFinished
      ThreadId 27: ThreadFinished

kore-rpc: [25894108] Info (LogJsonRpcServer):
    Process request 1 simplify

@@ -188,7 +205,7 @@ srv respond handlers = do

liftIO $
forkIO $
forever $
id $
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this id is here just to preserve formatting, must be removed if we merge this

@@ -757,10 +755,7 @@ runServer port serverState mainModule runSMT Log.LoggerEnv{logAction} = do
flip runLoggingT logFun $
jsonRpcServer
srvSettings
( \req parsed ->
log (InfoJsonRpcProcessRequest (getReqId req) parsed)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already emit a similar log message on receiving requests in Kore.JsonRpc.Server.srv

@goodlyrottenapple
Copy link
Contributor

How exactly is this being tested? From the logs, it would appear that each request is using a brand new client connection (since the request ID is always 1). If this is the case, the old behaviour is to be expected, unless the client connections were actually closed, in which case it's definitely a bug.

@jberthold
Copy link
Member

How exactly is this being tested? From the logs, it would appear that each request is using a brand new client connection (since the request ID is always 1). If this is the case, the old behaviour is to be expected, unless the client connections were actually closed, in which case it's definitely a bug.

Good point. It looks very much like each request uses a fresh connection/session in these tests but something is amiss either way:

  • We reuse a connection created by kore-rpc-client when we pass several commands to one invocation of the program. In that case, we would want to see the thread being created and then finished when the connection is closed (but it is not finished).
  • If this test was using several invocations of kore-rpc-client, there would probably be different sessions, each one creating its worker thread. Again, these threads should be terminated when the session is closed but they aren't.

An important part of understanding this is to understand how exactly the sessions in jsonRpcServer work, whether each session has its own main thread (i.e., whether there is another, implicit, forkIO in the server infrastructure code).

@jberthold
Copy link
Member

jberthold commented Feb 22, 2024

Upon closer inspection, the thread status logging itself unfortunately creates a misleading picture:

  • I added logging to the jsonRpcServer worker function (within runGeneralTCPServer. This shows that indeed there is a new thread for every session - actually 3 of them (the first one's ID is logged). These threads come from inside the Conduit engine room, and I assume they terminate or get killed when the session is closed.

  • Worker threads for each session are indeed killed, unless we log them in the added code:
    By adding the worker ThreadId to the global list, we create a reference from the main thread (actually, any thread) to the worker thread. This prevents worker threads from receiving a BlockedIndefinitely exception during garbage collection, which would otherwise terminate the orphan workers.

I changed the code to use Weak references to ThreadId (there is a dedicated mkWeakThreadId) and the logging confirms that these threads disappear. Running three invocations of kore-rpc-clients with 3 requests each, I am getting the following output on the third invocation:

kore-rpc: [30711541] Info (LogJsonRpcServer):
    Json RPC Request handler in thread ThreadId 44
kore-rpc: [30711800] Info (LogJsonRpcServer):
    Spawned worker with thread id ThreadId 48
kore-rpc: [30711907] Info (LogJsonRpcServer):
    Status of worker threads: 
      Just (ThreadId 48): ThreadBlocked BlockedOnSTM
      Nothing: Gone
      Nothing: Gone
kore-rpc: [7223752] Info (LogJsonRpcServer):
    Process request 1 execute
kore-rpc: [7225970] Info (LogJsonRpcServer):
    Process request 1 execute
kore-rpc: [7227838] Info (LogJsonRpcServer):
    Process request 1 execute
kore-rpc: [7229554] Info (LogJsonRpcServer):
    Process request 1 execute

i.e., the previous two worker threads have been removed (by garbage collection).

So I think all is well with the current server code in that respect.

Code changes to avoid holding thread references:

diff --git a/kore-rpc-types/src/Kore/JsonRpc/Server.hs b/kore-rpc-types/src/Kore/JsonRpc/Server.hs
index fe3d8b7cf..9c05b183c 100644
--- a/kore-rpc-types/src/Kore/JsonRpc/Server.hs
+++ b/kore-rpc-types/src/Kore/JsonRpc/Server.hs
@@ -18,7 +18,7 @@ import Control.Concurrent (forkIO, throwTo)
 import Control.Concurrent.STM qualified as GHC
 import Control.Concurrent.STM.TChan (newTChan, readTChan, writeTChan)
 import Control.Exception (Exception (fromException), catch, mask, throw)
-import Control.Monad (forM, forM_)
+import Control.Monad (forM, forM_, forever)
 import Control.Monad.IO.Class (MonadIO (liftIO))
 import Control.Monad.Logger (MonadLoggerIO)
 import Control.Monad.Logger qualified as Log
@@ -33,6 +33,7 @@ import Data.Conduit.Network (ServerSettings, appSink, appSource, runGeneralTCPSe
 import Data.Conduit.TMChan (closeTBMChan, sinkTBMChan, sourceTBMChan)
 import Data.Maybe (catMaybes)
 import Data.Text qualified as Text
+import System.Mem.Weak qualified as GHC
 import GHC.Conc.Sync qualified as GHC
 import Kore.JsonRpc.Types (FromRequestCancellable (isCancel), ReqException (..), rpcJsonConfig)
 import Network.JSONRPC hiding (encodeConduit, runJSONRPCT)
@@ -88,6 +89,8 @@ jsonRpcServer ::
     m a
 jsonRpcServer serverSettings respond handlers =
     runGeneralTCPServer serverSettings $ \cl ->
+        liftIO GHC.myThreadId >>= \me ->
+        Log.logInfoNS "kore-rpc" (Text.pack $ "Json RPC Request handler in thread " <> show me) >>
         runJSONRPCT
             -- we have to ensure that the returned messages contain no newlines
             -- inside the message and have a trailing newline, which is used as the delimiter
@@ -101,7 +104,7 @@ jsonRpcServer serverSettings respond handlers =
 data JsonRpcHandler = forall e. Exception e => JsonRpcHandler (e -> Log.LoggingT IO ErrorObj)
 
 {-# NOINLINE workerList #-}
-workerList :: GHC.TVar [GHC.ThreadId]
+workerList :: GHC.TVar [GHC.Weak GHC.ThreadId]
 workerList = IO.unsafePerformIO $ GHC.newTVarIO []
 
 srv ::
@@ -130,13 +133,14 @@ srv respond handlers = do
              in loop
     newWorkerTid <- spawnWorker reqQueue
     Log.logInfoN $ "Spawned worker with thread id " <> Text.pack (show newWorkerTid)
-    atomically $ GHC.modifyTVar workerList (newWorkerTid :)
-    Log.logInfoN "Status of worker threads: "
+    weakWorkerTid <- liftIO $ GHC.mkWeakThreadId newWorkerTid
+    atomically $ GHC.modifyTVar workerList (weakWorkerTid :)
     workerIds <- atomically $ GHC.readTVar workerList
-    workerStatuses <- forM workerIds $ \tId -> do
-        status <- liftIO $ GHC.threadStatus tId
-        pure $ "  " <> show tId <> ": " <> show status
-    Log.logInfoN . Text.pack . unlines $ workerStatuses
+    workerStatuses <- forM workerIds $ \wkTId -> do
+        mbTId <- liftIO $ GHC.deRefWeak wkTId
+        status <- liftIO $ maybe (pure "Gone") (fmap show . GHC.threadStatus) mbTId
+        pure $ "  " <> show mbTId <> ": " <> status
+    Log.logInfoN . Text.pack . unlines $  "Status of worker threads: " : workerStatuses
 
     mainLoop newWorkerTid
   where
@@ -205,7 +209,7 @@ srv respond handlers = do
 
         liftIO $
             forkIO $
-                id $
+                forever $
                     bracketOnReqException
                         (atomically $ readTChan reqQueue)
                         (withLog . processReq)

@geo2a
Copy link
Collaborator Author

geo2a commented Feb 22, 2024

Closing this as this is not a valid solution, see #3727 instead.

@geo2a geo2a closed this Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants