
2014 06 05

Andre Merzky edited this page Jun 11, 2014 · 3 revisions
  • Who: Mark, Matteo, Shantenu, Ole, AndreM, Antons

  • Agenda:

    • open TODOs
      • DONE AM: re-check 4096 ceiling
      • TODO AM: micro benchmarks for RP
      • DONE MS: MPI agent...
      • TODO AT: start on Cray agent, based on AT's scripts
      • TODO OW: get quantitative EnMDTK requirements
      • DONE AM: re-check with Matteo on exact problem
      • INV. AM: support MS on MPI agent?
      • TODO AM: address port forwarding ... (mom-nodes, head-nodes, ...)
      • TODO OW: check with Sergeij on Cray/BlueGene issue
      • DONE AT: put Archer scripts into repo
      • TODO DINESH: check out port forwarding solution from Mark
    • MS-7 checkpoints:
      • May 8:
        • OW: OK simple MPI support for Stampede complete (prototype)
        • AT, OW: ?? draft architecture for Cray agent
      • May 15:
        • MS: ?? implementation proposal for MPI support beyond Stampede
        • AM: ?? MPI integration tests set up
        • OW: ?? first prototype of non-MPI agent for Cray
        • ALL: ?? agree on implementation plan for Cray agent
      • May 22:
        • AM: ?? MPI support integrated in Troy
      • May 29:
        • MS: ?? MPI support complete
        • OW/AT: ?? prototype for Cray agent -- reconfirm implementation plan
        • ALL: ?? documentation of performance bottleneck
      • June 5:
        • AM, OW: ?? integration tests for MPI stable for multiple use cases / machines
        • ALL: ?? agreement on optimization targets
    • status reports
    • ticket prioritization / milestones
    • (?) what role does scheduling play at the agent level?
  • Notes:

    • AM: tickets, tests, integration, 4096 checks

    • TODO AM: triage, this is probably not #104

    • TODO OW,DINESH: HT-BAC as scalability testing for MPI

    • Mark:

      • order of module load matters :( preexec...
      • TODO MS: document in resource configs
      • multicore and MPI support working
      • switched by resource config file
      • should be moved to CU level (#156)
      • removed multiple exec workers (no partitions)
      • multi threads for data remain...
      • merge with Cray track: ignore for now...
      • TODO MS: preexec does not work across nodes...
    • Ole:

      • resource configuration revamped
      • TODO OW: check
    • Antons:

      • Archer down...
    • TODO AM:

      • workdir
    • release:

      • TODO MS: sort tickets into tutorial milestones
  • will data be merged?
  • will #156 be done?
  • put data on agenda again