
2014 09 18

Andre Merzky edited this page Sep 17, 2014 · 18 revisions
  • Agenda:

    • open TODOs
      • TODO SJ: doc is missing a 10,000-foot overview (SJ?)
      • TODO SJ: contact Helmut about SuperMUC allocation
      • DONE AM: reduce testing frequency further after release
      • WIP AM: follow up with Anjani on EC2
      • WIP AM: RP slide decks
      • TODO AM: test SuperMUC, check out GSI contexts
      • DONE AM: suspend testing on FG
      • TODO MS: verify state model wrt. data staging
      • TODO AM: better name for STATE_X
    • open agenda items
      • WIP AM: AGENDA: cleanup
      • WIP AM: AGENDA: configuration files
      • TODO AM: AGENDA: test suite granularity
      • TODO AM: AGENDA: performance PTY / SHELL / SAGA
      • TODO AM: AGENDA: discuss how to ensure test coverage
      • TODO AM: AGENDA: discuss #307, async call semantics
      • TODO AM: AGENDA: student project: plotting
    • MS.8
      • What are goals for the next couple of weeks?
      • (check on open tickets)
    • eval tutorial with Indiana: online, Scott and Abhinav
      • TODO IU: provide application code
    • configuration files
      • AM: we could re-use what we had in OWMS (code exists in utils):
        • RP resource configs remain as is
        • user configs can override those default settings, e.g.:
          # $HOME/radical/pilot.cfg
          {
              "resources" : {
          
                  # add a custom host
                  "boskop" : {
                      "defaults"                    : "localhost",
                      "pilot_agent"                 : "radical-pilot-agent-multicore_testing.py",
                      "lrms"                        : "TORQUE",
                      "task_launch_method"          : "SSH",
                      "mpi_launch_method"           : "MPIRUN",
                      "global_virtenv"              : "$HOME/ve/"
                  },
          
                  # change some user specific variable in existing RP config entries
                  "*.futuregrid.org" : {
                      "username" : "merzky"
                  },
                  "sierra.futuregrid.org" : {
                      "default_queue"    : "batch"
                  },
              }
          }
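
The override scheme above can be sketched as a small merge step. This is only an illustration, not the OWMS or RP code: `DEFAULTS`, `USER` and `merge_resource_configs` are hypothetical names, the default values are invented, and it assumes that `"defaults"` names an existing entry to inherit from and that keys containing wildcards patch every matching resource.

```python
import copy
import fnmatch

# Hypothetical default resource configs, standing in for the ones shipped with RP.
DEFAULTS = {
    "localhost": {"lrms": "FORK", "task_launch_method": "LOCAL"},
    "sierra.futuregrid.org": {"lrms": "PBS", "default_queue": "normal"},
    "india.futuregrid.org": {"lrms": "PBS", "default_queue": "normal"},
}

# User overrides, loosely mirroring the pilot.cfg example from the notes.
USER = {
    "boskop": {"defaults": "localhost", "lrms": "TORQUE",
               "task_launch_method": "SSH"},
    "*.futuregrid.org": {"username": "merzky"},
    "sierra.futuregrid.org": {"default_queue": "batch"},
}


def merge_resource_configs(defaults, user):
    """Return a new dict with the user entries applied on top of the defaults."""
    merged = copy.deepcopy(defaults)
    for key, overrides in user.items():
        if any(ch in key for ch in "*?["):
            # Wildcard entry: patch every known resource whose name matches.
            for name in fnmatch.filter(list(merged), key):
                merged[name].update(overrides)
        elif key in merged:
            # Existing resource: overwrite only the given settings.
            merged[key].update(overrides)
        else:
            # New resource: start from the entry named by "defaults" (assumed
            # semantics), then apply the remaining settings.
            base = copy.deepcopy(defaults.get(overrides.get("defaults"), {}))
            base.update({k: v for k, v in overrides.items() if k != "defaults"})
            merged[key] = base
    return merged


merged = merge_resource_configs(DEFAULTS, USER)
print(merged["sierra.futuregrid.org"])
```

In this sketch entries are applied in file order, so a specific host entry listed after a wildcard entry wins for the settings they share.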
          
    • cleanup modes
      • 1: cleanup database entries: session.close (cleanup=True)
      • 2: terminate pilots: session.close (terminate=True)
      • 3: clean pilot sandbox: pilot_description.cleanup = True
      • 4: clean unit sandbox: unit_description.cleanup = True
      • 1, 2 are enacted by RP/Application on clean application shutdown
      • 3, 4 are enacted by agent on clean pilot shutdown
      • 1, 2 can be performed after application finishes, via radicalpilot-cleanup
      • 3, 4 cannot be performed after application finishes (yet)
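
The four cleanup modes and who enacts them can be written down as a small lookup table. This is a sketch for reference only; `CleanupMode`, `CLEANUP_MODES` and `post_mortem_modes` are hypothetical names and not part of RP.

```python
from collections import namedtuple

# One record per cleanup mode listed in the notes (names are illustrative).
CleanupMode = namedtuple("CleanupMode", "number target enacted_by post_mortem")

CLEANUP_MODES = [
    CleanupMode(1, "database entries", "application", True),
    CleanupMode(2, "pilots",           "application", True),
    CleanupMode(3, "pilot sandbox",    "agent",       False),
    CleanupMode(4, "unit sandbox",     "agent",       False),
]


def post_mortem_modes(modes=CLEANUP_MODES):
    """Modes that can still be performed after the application has finished."""
    return [m.number for m in modes if m.post_mortem]


print(post_mortem_modes())
```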
    • STATE_X:
      • AM: should be SCHEDULING: the CU has reached the scheduler but has not yet been assigned to a pilot (e.g., if none is free to run the CU).
        # scheduler
        for task in wait_q :
            task.state = SCHEDULING

            # wait until some pilot is free to run this task
            while True :
                pilot = find_free_pilot (task)
                if  pilot :
                    task.pilot = pilot
                    break

            # a pilot has been assigned: hand the task over
            task.state = PENDING_EXECUTION
            submit_task_to_pilot (task)
        
      • but: SCHEDULING is already used within the agent:
        # agent
        for task in mongodb.find ({'pid'   : my_pid,
                                   'state' : PENDING_EXECUTION}) :
            task.state = SCHEDULING

            # wait until enough cores are free to run this task
            while True :
                cores = find_free_cores (task)
                if  cores :
                    task.cores = cores
                    break

            # cores have been assigned: start the task
            task.state = EXECUTING
            submit_task_to_cores (task)
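
The name clash can be made visible with a toy run of both scheduling steps. This is a simplified sketch, not RP code: `Task`, `schedule_to_pilot`, `schedule_to_cores` and the pilot dictionaries are hypothetical, and the waiting loops are replaced by a single attempt.

```python
# State names as used in the notes.
SCHEDULING = "SCHEDULING"
PENDING_EXECUTION = "PENDING_EXECUTION"
EXECUTING = "EXECUTING"


class Task:
    """Toy compute unit that records every state it passes through."""

    def __init__(self, name, cores):
        self.name = name
        self.cores = cores
        self.pilot = None
        self.history = []

    def advance(self, state):
        self.history.append(state)


def schedule_to_pilot(task, pilots):
    """Application side: pick a pilot with enough free cores, if any."""
    task.advance(SCHEDULING)
    for pilot in pilots:
        if pilot["free_cores"] >= task.cores:
            task.pilot = pilot["name"]
            task.advance(PENDING_EXECUTION)
            return True
    return False


def schedule_to_cores(task, pilot):
    """Agent side: reserve cores on the pilot, if enough are free."""
    task.advance(SCHEDULING)
    if pilot["free_cores"] >= task.cores:
        pilot["free_cores"] -= task.cores
        task.advance(EXECUTING)
        return True
    return False


pilots = [{"name": "pilot.0", "free_cores": 4}]
task = Task("cu.0", cores=2)
if schedule_to_pilot(task, pilots):
    schedule_to_cores(task, pilots[0])

print(task.history)
# → ['SCHEDULING', 'PENDING_EXECUTION', 'SCHEDULING', 'EXECUTING']
```

The same state name shows up twice in the history, so an observer cannot tell from the state alone whether the unit is waiting for a pilot or waiting for cores.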
        
  • Notes:
