Split collections into 2 #593

Open
wants to merge 9 commits into
base: main
from


lldelisle
Contributor

See #583

lldelisle marked this pull request as draft November 8, 2024 15:10

Test Results (powered by Planemo)

Test Summary

Test State    Count
Total         3
Passed        1
Error         1
Failure       1
Skipped       0
Errored Tests
  • ❌ Split_collection_using_tabular.ga_0

    Execution Problem:

    • File [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection/group_asignment.txt] does not exist - parent directory [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection] does exist, cwd is [/home/runner/work/iwc/iwc]
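
The error above reports that the referenced file is missing while its parent directory exists, which often means the name in the test definition does not match the file on disk. The following is a minimal sketch for checking that, not part of Planemo: `suggest_existing` is a hypothetical helper, and the file names in the demo are made up rather than taken from the repository.

```python
import difflib
import tempfile
from pathlib import Path


def suggest_existing(missing_path, cutoff=0.8):
    """Return names in the parent directory that closely match a missing file name."""
    missing = Path(missing_path)
    # Nothing to suggest if the file exists or the parent directory is absent.
    if missing.exists() or not missing.parent.is_dir():
        return []
    candidates = [p.name for p in missing.parent.iterdir()]
    return difflib.get_close_matches(missing.name, candidates, n=3, cutoff=cutoff)


# Demo with made-up names (hypothetical, not from the repository).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample_table.txt").write_text("a\t1\n")
    suggestions = suggest_existing(Path(tmp) / "sample_tabel.txt")
    print(suggestions)
```

Running such a check against the path in the error message would show whether a similarly named file sits in the same directory.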
      
Failed Tests
  • ❌ Split_collection_using_comma_separated_list.ga_0

    Problems:

    • Output collection 'collection_first_group': failed to find identifier 'cat1_1' in the tool generated elements []
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: Groups:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpp6_pn0hn/job_working_directory/000/11/configs/tmpvomaori0' '/tmp/tmpp6_pn0hn/job_working_directory/000/11/outputs/dataset_0d51c252-b5b3-4dae-b753-4d69d6231f6f.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "ccd2a75ca0ed11efb5c6618888f353e2"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 2, "src": "hdca"}]}
      • Step 4: Create a dataset from text:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • times=1; yes -- '1,1,1,2,3' 2>/dev/null | head -n $times >> '/tmp/tmpp6_pn0hn/job_working_directory/000/12/outputs/dataset_122efa88-af32-488c-bd82-41dea16a24f0.dat';

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ccd2a75ca0ed11efb5c6618888f353e2"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              token_set [{"__index__": 0, "line": "1,1,1,2,3", "repeat_select": {"__current_case__": 0, "repeat_select_opts": "user", "times": "1"}}]
      • Step 5: Replace comma by back to line:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/86755160afbf/text_processing/find_and_replace' -o '/tmp/tmpp6_pn0hn/job_working_directory/000/13/outputs/dataset_b82ee5cc-f610-4735-90cd-76c90e209252.dat' -g    -r ',' '\n' '/tmp/tmpp6_pn0hn/files/1/2/2/dataset_122efa88-af32-488c-bd82-41dea16a24f0.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ccd2a75ca0ed11efb5c6618888f353e2"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              find_and_replace [{"__index__": 0, "caseinsensitive": false, "find_pattern": ",", "global": true, "is_regex": true, "replace_pattern": "\\n", "searchwhere": {"__current_case__": 0, "searchwhere_select": "line"}, "skip_first_line": false, "wholewords": false}]
      • Step 6: Put side by side identifiers and groups:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/tmpp6_pn0hn/galaxy-dev/tools/filters/pasteWrapper.pl' '/tmp/tmpp6_pn0hn/files/0/d/5/dataset_0d51c252-b5b3-4dae-b753-4d69d6231f6f.dat' '/tmp/tmpp6_pn0hn/files/b/8/2/dataset_b82ee5cc-f610-4735-90cd-76c90e209252.dat' T '/tmp/tmpp6_pn0hn/job_working_directory/000/14/outputs/dataset_190210ea-92a6-460b-9211-ff72ccf98c19.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "ccd2a75ca0ed11efb5c6618888f353e2"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              delimiter "T"
      • Step 7: Unlabelled step:

        • step_state: scheduled

        • Subworkflow Steps
          • Step 1: Input Dataset Collection:

            • step_state: scheduled
          • Step 2: identifier mapping:

            • step_state: scheduled
          • Step 3: get the first group value:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • env -i $(which awk) --sandbox -v FS='	' -v OFS='	' --re-interval -f '/tmp/tmpp6_pn0hn/job_working_directory/000/15/configs/tmp73gtl51n' '/tmp/tmpp6_pn0hn/files/1/9/0/dataset_190210ea-92a6-460b-9211-ff72ccf98c19.dat' > '/tmp/tmpp6_pn0hn/job_working_directory/000/15/outputs/dataset_2038971d-b463-4da9-b3b3-8c21d87b0638.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  code "NR==1{print $2}"
                  dbkey "?"
          • Step 4: convert to parameter:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  dbkey "?"
                  param_type "text"
                  remove_newlines true
          • Step 5: make filter condition:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  components [{"__index__": 0, "param_type": {"__current_case__": 0, "component_value": "c2 == \"", "select_param_type": "text"}}, {"__index__": 1, "param_type": {"__current_case__": 0, "component_value": "1", "select_param_type": "text"}}, {"__index__": 2, "param_type": {"__current_case__": 0, "component_value": "\"", "select_param_type": "text"}}]
                  dbkey "?"
          • Step 6: filter tabular to get only lines with first group:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • python '/tmp/tmpp6_pn0hn/galaxy-dev/tools/stats/filtering.py' '/tmp/tmpp6_pn0hn/files/1/9/0/dataset_190210ea-92a6-460b-9211-ff72ccf98c19.dat' '/tmp/tmpp6_pn0hn/job_working_directory/000/18/outputs/dataset_44ab842e-cac2-46e6-947e-914e1a093393.dat' '/tmp/tmpp6_pn0hn/job_working_directory/000/18/configs/tmpmroapp_c' 2 "str,int" 0

                Exit Code:

                • 0

                Standard Output:

                • Filtering with c2 == "1", 
                  kept 0.00% of 5 valid lines (5 total lines).
                  

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  cond "c2 == \"1\""
                  dbkey "?"
                  header_lines "0"
          • Step 7: keep only identifiers:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • perl '/tmp/tmpp6_pn0hn/galaxy-dev/tools/filters/cutWrapper.pl' '/tmp/tmpp6_pn0hn/files/4/4/a/dataset_44ab842e-cac2-46e6-947e-914e1a093393.dat' 'c1' T '/tmp/tmpp6_pn0hn/job_working_directory/000/19/outputs/dataset_dc1dd85a-4876-4dc1-b009-8e55d3b1df27.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  columnList "c1"
                  dbkey "?"
                  delimiter "T"
          • Step 8: Split collection into 2:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __workflow_invocation_uuid__ "ccd2a75da0ed11efb5c6618888f353e2"
                  how {"__current_case__": 0, "filter_source": {"values": [{"id": 19, "src": "hda"}]}, "how_filter": "remove_if_absent"}
                  input {"values": [{"id": 2, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • 3443cf8fc2a993fb
      • history_state

        • ok
      • invocation_id

        • 65b5a0093bcdce8d
      • invocation_state

        • scheduled
      • workflow_id

        • 50b51245ddf901fd
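
In the failed test above, the filter step is called with column types "str,int" and the condition c2 == "1", and it keeps 0.00% of 5 lines. One plausible reading, which is an assumption and not confirmed in this thread, is that column 2 is cast to an integer before the condition is evaluated, so a comparison with the quoted string "1" is never true. The sketch below only imitates that behaviour; `keep` is a hypothetical helper, and every identifier except 'cat1_1' is made up.

```python
# Rows as they might look after pasting identifiers next to the groups "1,1,1,2,3".
# Only 'cat1_1' appears in the log; the other identifiers are invented for illustration.
rows = [
    ("cat1_1", "1"),
    ("cat1_2", "1"),
    ("cat1_3", "1"),
    ("cat2_1", "2"),
    ("cat3_1", "3"),
]


def keep(rows, condition, cast_c2):
    """Evaluate a filter condition per row after casting column 2 with cast_c2."""
    kept = []
    for c1, c2 in rows:
        names = {"c1": c1, "c2": cast_c2(c2)}
        # eval is used only to mimic a textual condition; builtins are disabled.
        if eval(condition, {"__builtins__": {}}, names):
            kept.append(c1)
    return kept


as_int = keep(rows, 'c2 == "1"', int)   # integer 1 compared with the string "1"
as_str = keep(rows, 'c2 == "1"', str)   # string compared with string
unquoted = keep(rows, "c2 == 1", int)   # integer compared with integer

print(as_int, as_str, unquoted)
```

Under this assumption, either keeping column 2 as text or building the condition without quotes would select the first group; which of the two fits the workflow is a design decision for the author.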
Passed Tests
  • ✅ Split_collection_by_pattern_in_identifiers.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: pattern:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpp6_pn0hn/job_working_directory/000/26/configs/tmps3ybtech' '/tmp/tmpp6_pn0hn/job_working_directory/000/26/outputs/dataset_ecaeaadb-3138-4094-a833-fcf95abf84b9.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "21be76c4a0ee11efb5c6618888f353e2"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 5, "src": "hdca"}]}
      • Step 4: Select identifiers with pattern:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • grep -P -A 0 -B 0 --no-group-separator  -i -- 'cat1' '/tmp/tmpp6_pn0hn/files/e/c/a/dataset_ecaeaadb-3138-4094-a833-fcf95abf84b9.dat' > '/tmp/tmpp6_pn0hn/job_working_directory/000/27/outputs/dataset_eedd0674-a2ba-4ba4-9163-901987ccad10.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "21be76c4a0ee11efb5c6618888f353e2"
              case_sensitive "-i"
              chromInfo "/tmp/tmpp6_pn0hn/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              color "NOCOLOR"
              dbkey "?"
              invert ""
              lines_after "0"
              lines_before "0"
              regex_type "-P"
              url_paste "cat1"
      • Step 5: Split collection into 2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "21be76c4a0ee11efb5c6618888f353e2"
              how {"__current_case__": 0, "filter_source": {"values": [{"id": 31, "src": "hda"}]}, "how_filter": "remove_if_absent"}
              input {"values": [{"id": 5, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • 50b51245ddf901fd
      • history_state

        • ok
      • invocation_id

        • 50b51245ddf901fd
      • invocation_state

        • scheduled
      • workflow_id

        • 9233b6e0ca46fef0
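
The passing workflow above selects identifiers with a case-insensitive regex (grep -P -i with the pattern 'cat1') and feeds the result to the split step. A rough Python equivalent is sketched below. It assumes the split produces a matching and a non-matching collection, which is inferred from the step label "Split collection into 2". The function name and the identifiers are hypothetical, and Python's `re` module is not identical to PCRE, although simple patterns like this one behave the same.

```python
import re


def split_by_pattern(identifiers, pattern):
    """Split identifiers into those matching the pattern (ignoring case) and the rest."""
    regex = re.compile(pattern, re.IGNORECASE)
    matching = [ident for ident in identifiers if regex.search(ident)]
    rest = [ident for ident in identifiers if not regex.search(ident)]
    return matching, rest


# Made-up identifiers for illustration.
matching, rest = split_by_pattern(["cat1_1", "CAT1_2", "cat2_1"], "cat1")
print(matching, rest)
```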


Test Results (powered by Planemo)

Test Summary

Test State    Count
Total         3
Passed        1
Error         1
Failure       1
Skipped       0
Errored Tests
  • ❌ Split_collection_using_tabular.ga_0

    Execution Problem:

    • File [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection/group_asignment.txt] does not exist - parent directory [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection] does exist, cwd is [/home/runner/work/iwc/iwc]
      
Failed Tests
  • ❌ Split_collection_using_comma_separated_list.ga_0

    Problems:

    • Output collection 'collection_first_group': failed to find identifier 'cat1_1' in the tool generated elements []
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: Groups:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpc2qy7ps4/job_working_directory/000/11/configs/tmp7ds1a737' '/tmp/tmpc2qy7ps4/job_working_directory/000/11/outputs/dataset_4a22dd7f-c9da-4190-ae86-22d3a915943f.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "5b9b1424a0ee11ef87454dec7da58a7f"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 2, "src": "hdca"}]}
      • Step 4: Create a dataset from text:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • times=1; yes -- '1,1,1,2,3' 2>/dev/null | head -n $times >> '/tmp/tmpc2qy7ps4/job_working_directory/000/12/outputs/dataset_40aac13d-fc86-4d76-a463-7140bf08072f.dat';

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "5b9b1424a0ee11ef87454dec7da58a7f"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              token_set [{"__index__": 0, "line": "1,1,1,2,3", "repeat_select": {"__current_case__": 0, "repeat_select_opts": "user", "times": "1"}}]
      • Step 5: Replace comma by back to line:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/86755160afbf/text_processing/find_and_replace' -o '/tmp/tmpc2qy7ps4/job_working_directory/000/13/outputs/dataset_91e2d4ca-11b1-4e37-a2b0-59ea41282e88.dat' -g    -r ',' '\n' '/tmp/tmpc2qy7ps4/files/4/0/a/dataset_40aac13d-fc86-4d76-a463-7140bf08072f.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "5b9b1424a0ee11ef87454dec7da58a7f"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              find_and_replace [{"__index__": 0, "caseinsensitive": false, "find_pattern": ",", "global": true, "is_regex": true, "replace_pattern": "\\n", "searchwhere": {"__current_case__": 0, "searchwhere_select": "line"}, "skip_first_line": false, "wholewords": false}]
      • Step 6: Put side by side identifiers and groups:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/tmpc2qy7ps4/galaxy-dev/tools/filters/pasteWrapper.pl' '/tmp/tmpc2qy7ps4/files/4/a/2/dataset_4a22dd7f-c9da-4190-ae86-22d3a915943f.dat' '/tmp/tmpc2qy7ps4/files/9/1/e/dataset_91e2d4ca-11b1-4e37-a2b0-59ea41282e88.dat' T '/tmp/tmpc2qy7ps4/job_working_directory/000/14/outputs/dataset_afcb2b94-aa6a-4d26-a0eb-c8dc58d83bed.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "5b9b1424a0ee11ef87454dec7da58a7f"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              delimiter "T"
      • Step 7: Unlabelled step:

        • step_state: scheduled

        • Subworkflow Steps
          • Step 1: Input Dataset Collection:

            • step_state: scheduled
          • Step 2: identifier mapping:

            • step_state: scheduled
          • Step 3: get the first group value:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • env -i $(which awk) --sandbox -v FS='	' -v OFS='	' --re-interval -f '/tmp/tmpc2qy7ps4/job_working_directory/000/15/configs/tmpxdt19hu0' '/tmp/tmpc2qy7ps4/files/a/f/c/dataset_afcb2b94-aa6a-4d26-a0eb-c8dc58d83bed.dat' > '/tmp/tmpc2qy7ps4/job_working_directory/000/15/outputs/dataset_913be485-a966-4d50-8a44-cb37b96b8d16.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  code "NR==1{print $2}"
                  dbkey "?"
          • Step 4: convert to parameter:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  dbkey "?"
                  param_type "text"
                  remove_newlines true
          • Step 5: make filter condition:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  components [{"__index__": 0, "param_type": {"__current_case__": 0, "component_value": "c2 == \"", "select_param_type": "text"}}, {"__index__": 1, "param_type": {"__current_case__": 0, "component_value": "1", "select_param_type": "text"}}, {"__index__": 2, "param_type": {"__current_case__": 0, "component_value": "\"", "select_param_type": "text"}}]
                  dbkey "?"
          • Step 6: filter tabular to get only lines with first group:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • python '/tmp/tmpc2qy7ps4/galaxy-dev/tools/stats/filtering.py' '/tmp/tmpc2qy7ps4/files/a/f/c/dataset_afcb2b94-aa6a-4d26-a0eb-c8dc58d83bed.dat' '/tmp/tmpc2qy7ps4/job_working_directory/000/18/outputs/dataset_d4e0df87-dde7-4ea8-87a8-80974571e8f9.dat' '/tmp/tmpc2qy7ps4/job_working_directory/000/18/configs/tmptvjcspcb' 2 "str,int" 0

                Exit Code:

                • 0

                Standard Output:

                • Filtering with c2 == "1", 
                  kept 0.00% of 5 valid lines (5 total lines).
                  

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  cond "c2 == \"1\""
                  dbkey "?"
                  header_lines "0"
          • Step 7: keep only identifiers:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • perl '/tmp/tmpc2qy7ps4/galaxy-dev/tools/filters/cutWrapper.pl' '/tmp/tmpc2qy7ps4/files/d/4/e/dataset_d4e0df87-dde7-4ea8-87a8-80974571e8f9.dat' 'c1' T '/tmp/tmpc2qy7ps4/job_working_directory/000/19/outputs/dataset_138d7815-18cd-4af8-8daa-fe3afad05acf.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  columnList "c1"
                  dbkey "?"
                  delimiter "T"
          • Step 8: Split collection into 2:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __workflow_invocation_uuid__ "5b9b1425a0ee11ef87454dec7da58a7f"
                  how {"__current_case__": 0, "filter_source": {"values": [{"id": 19, "src": "hda"}]}, "how_filter": "remove_if_absent"}
                  input {"values": [{"id": 2, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • 8bc0613ac77c6934
      • history_state

        • ok
      • invocation_id

        • b3ccdf42b5a440eb
      • invocation_state

        • scheduled
      • workflow_id

        • 462bee55b4c85682
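
Read end to end, the steps of the comma-separated-list workflow above pair each identifier with a group, take the group of the first element, and keep the identifiers that share it. The sketch below restates that logic in plain Python as an aid to reading the report. It is not the workflow itself: `split_by_first_group` is a hypothetical name, the identifiers other than 'cat1_1' are made up, and groups are compared as text throughout.

```python
def split_by_first_group(identifiers, groups_csv):
    """Return (first_group_members, others) based on a comma-separated group list."""
    groups = groups_csv.split(",")          # "Replace comma by back to line"
    if len(groups) != len(identifiers):
        raise ValueError("need exactly one group per identifier")
    if not identifiers:
        return [], []
    mapping = list(zip(identifiers, groups))  # "Put side by side identifiers and groups"
    first_group = mapping[0][1]               # "get the first group value"
    first = [ident for ident, group in mapping if group == first_group]
    others = [ident for ident, group in mapping if group != first_group]
    return first, others


# Identifiers are illustrative; the group string is the one used in the test.
ids = ["cat1_1", "cat1_2", "cat1_3", "cat2_1", "cat3_1"]
first, others = split_by_first_group(ids, "1,1,1,2,3")
print(first, others)
```

With the test's group string, the first collection would be expected to contain 'cat1_1', which is the element the failed assertion could not find.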
Passed Tests
  • ✅ Split_collection_by_pattern_in_identifiers.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: pattern:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpc2qy7ps4/job_working_directory/000/26/configs/tmpy31z3yo2' '/tmp/tmpc2qy7ps4/job_working_directory/000/26/outputs/dataset_aa611102-3ccf-4021-9e5c-09345e3ce2bd.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "aaba94daa0ee11ef87454dec7da58a7f"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 5, "src": "hdca"}]}
      • Step 4: Select identifiers with pattern:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • grep -P -A 0 -B 0 --no-group-separator  -i -- 'cat1' '/tmp/tmpc2qy7ps4/files/a/a/6/dataset_aa611102-3ccf-4021-9e5c-09345e3ce2bd.dat' > '/tmp/tmpc2qy7ps4/job_working_directory/000/27/outputs/dataset_b19d144e-570e-4415-ae6e-237369ce851f.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "aaba94daa0ee11ef87454dec7da58a7f"
              case_sensitive "-i"
              chromInfo "/tmp/tmpc2qy7ps4/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              color "NOCOLOR"
              dbkey "?"
              invert ""
              lines_after "0"
              lines_before "0"
              regex_type "-P"
              url_paste "cat1"
      • Step 5: Split collection into 2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "aaba94daa0ee11ef87454dec7da58a7f"
              how {"__current_case__": 0, "filter_source": {"values": [{"id": 31, "src": "hda"}]}, "how_filter": "remove_if_absent"}
              input {"values": [{"id": 5, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • 462bee55b4c85682
      • history_state

        • ok
      • invocation_id

        • 462bee55b4c85682
      • invocation_state

        • scheduled
      • workflow_id

        • dd281a73856d463b


Test Results (powered by Planemo)

Test Summary

Test State    Count
Total         3
Passed        1
Error         1
Failure       1
Skipped       0
Errored Tests
  • ❌ Split-collection-using-tabular.ga_0

    Execution Problem:

    • File [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection/group_asignment.txt] does not exist - parent directory [/home/runner/work/iwc/iwc/workflows/data-manipulation/split-collection] does exist, cwd is [/home/runner/work/iwc/iwc]
      
Failed Tests
  • ❌ Split-collection-using-comma-separated-list.ga_0

    Problems:

    • Output collection 'collection_first_group': failed to find identifier 'cat1_1' in the tool generated elements []
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: Groups:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpe5mppa9j/job_working_directory/000/19/configs/tmppq_qfcfr' '/tmp/tmpe5mppa9j/job_working_directory/000/19/outputs/dataset_fc760c3e-b39f-4f21-8a26-ff8d446cb317.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "84232944a0ef11ef810f9d5a5423fa88"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 5, "src": "hdca"}]}
      • Step 4: Create a dataset from text:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • times=1; yes -- '1,1,1,2,3' 2>/dev/null | head -n $times >> '/tmp/tmpe5mppa9j/job_working_directory/000/20/outputs/dataset_8de8ae95-7020-4bb1-99c1-a617995a8804.dat';

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "84232944a0ef11ef810f9d5a5423fa88"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              token_set [{"__index__": 0, "line": "1,1,1,2,3", "repeat_select": {"__current_case__": 0, "repeat_select_opts": "user", "times": "1"}}]
      • Step 5: Replace comma by back to line:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/86755160afbf/text_processing/find_and_replace' -o '/tmp/tmpe5mppa9j/job_working_directory/000/21/outputs/dataset_1fa66a48-6c33-47f1-a742-d55a076787d0.dat' -g    -r ',' '\n' '/tmp/tmpe5mppa9j/files/8/d/e/dataset_8de8ae95-7020-4bb1-99c1-a617995a8804.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "84232944a0ef11ef810f9d5a5423fa88"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              find_and_replace [{"__index__": 0, "caseinsensitive": false, "find_pattern": ",", "global": true, "is_regex": true, "replace_pattern": "\\n", "searchwhere": {"__current_case__": 0, "searchwhere_select": "line"}, "skip_first_line": false, "wholewords": false}]
      • Step 6: Put side by side identifiers and groups:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • perl '/tmp/tmpe5mppa9j/galaxy-dev/tools/filters/pasteWrapper.pl' '/tmp/tmpe5mppa9j/files/f/c/7/dataset_fc760c3e-b39f-4f21-8a26-ff8d446cb317.dat' '/tmp/tmpe5mppa9j/files/1/f/a/dataset_1fa66a48-6c33-47f1-a742-d55a076787d0.dat' T '/tmp/tmpe5mppa9j/job_working_directory/000/22/outputs/dataset_c4f53797-0b6e-46df-8430-ca059d70b914.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "84232944a0ef11ef810f9d5a5423fa88"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              delimiter "T"
      • Step 7: Unlabelled step:

        • step_state: scheduled

        • Subworkflow Steps
          • Step 1: Input Dataset Collection:

            • step_state: scheduled
          • Step 2: identifier mapping:

            • step_state: scheduled
          • Step 3: get the first group value:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • env -i $(which awk) --sandbox -v FS='	' -v OFS='	' --re-interval -f '/tmp/tmpe5mppa9j/job_working_directory/000/23/configs/tmpcmrd7w92' '/tmp/tmpe5mppa9j/files/c/4/f/dataset_c4f53797-0b6e-46df-8430-ca059d70b914.dat' > '/tmp/tmpe5mppa9j/job_working_directory/000/23/outputs/dataset_0a945b2a-667c-46e5-bb3f-8a672f66cf76.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  code "NR==1{print $2}"
                  dbkey "?"
          • Step 4: convert to parameter:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  dbkey "?"
                  param_type "text"
                  remove_newlines true
          • Step 5: make filter condition:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • cd ../; python _evaluate_expression_.py

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "input"
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  components [{"__index__": 0, "param_type": {"__current_case__": 0, "component_value": "c2 == \"", "select_param_type": "text"}}, {"__index__": 1, "param_type": {"__current_case__": 0, "component_value": "1", "select_param_type": "text"}}, {"__index__": 2, "param_type": {"__current_case__": 0, "component_value": "\"", "select_param_type": "text"}}]
                  dbkey "?"
          • Step 6: filter tabular to get only lines with first group:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • python '/tmp/tmpe5mppa9j/galaxy-dev/tools/stats/filtering.py' '/tmp/tmpe5mppa9j/files/c/4/f/dataset_c4f53797-0b6e-46df-8430-ca059d70b914.dat' '/tmp/tmpe5mppa9j/job_working_directory/000/26/outputs/dataset_7785f065-1c09-4bf8-bddb-a62b78db29cf.dat' '/tmp/tmpe5mppa9j/job_working_directory/000/26/configs/tmp1lkiw7pn' 2 "str,int" 0

                Exit Code:

                • 0

                Standard Output:

                • Filtering with c2 == "1", 
                  kept 0.00% of 5 valid lines (5 total lines).
                  

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  cond "c2 == \"1\""
                  dbkey "?"
                  header_lines "0"
          • Step 7: keep only identifiers:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Command Line:

                • perl '/tmp/tmpe5mppa9j/galaxy-dev/tools/filters/cutWrapper.pl' '/tmp/tmpe5mppa9j/files/7/7/8/dataset_7785f065-1c09-4bf8-bddb-a62b78db29cf.dat' 'c1' T '/tmp/tmpe5mppa9j/job_working_directory/000/27/outputs/dataset_37758570-d1e9-49da-b2e6-3468e2143150.dat'

                Exit Code:

                • 0

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __input_ext "tabular"
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
                  columnList "c1"
                  dbkey "?"
                  delimiter "T"
          • Step 8: Split collection into 2:

            • step_state: scheduled

            • Jobs
              • Job 1:

                • Job state is ok

                Traceback:

                Job Parameters:

                • Job parameter Parameter value
                  __workflow_invocation_uuid__ "84232945a0ef11ef810f9d5a5423fa88"
                  how {"__current_case__": 0, "filter_source": {"values": [{"id": 31, "src": "hda"}]}, "how_filter": "remove_if_absent"}
                  input {"values": [{"id": 5, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • de2d236c7be5e2a8
      • history_state

        • ok
      • invocation_id

        • 3a3ddc1218b29ebe
      • invocation_state

        • scheduled
      • workflow_id

        • 9fc91d2c728bbe01
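
A possible reading of the failure above, offered as an assumption rather than a confirmed diagnosis: step 6 was called with column types `"str,int"` and the condition `c2 == "1"`, and it kept 0.00% of 5 lines. If the filter casts each column to its declared type before evaluating the condition, column 2 becomes an integer and never equals the string `"1"`. The sketch below mimics that behavior in plain Python. The `filter_rows` helper and every identifier except `cat1_1` are hypothetical, and this is not the actual `filtering.py` code.

```python
# Hypothetical table: identifier in column 1, group in column 2.
# The groups follow the "1,1,1,2,3" list from the test; only 'cat1_1' is from the log.
rows = [
    ("cat1_1", "1"),
    ("cat1_2", "1"),
    ("cat1_3", "1"),
    ("cat2_1", "2"),
    ("cat3_1", "3"),
]


def filter_rows(rows, column_types, predicate):
    """Cast each column to its declared type, then keep rows where predicate is true."""
    casts = [int if t.strip() == "int" else str for t in column_types.split(",")]
    kept = []
    for row in rows:
        c1, c2 = (cast(value) for cast, value in zip(casts, row))
        if predicate(c1, c2):
            kept.append(row)
    return kept


# Comparing the integer-typed column with a quoted string keeps nothing.
quoted = filter_rows(rows, "str,int", lambda c1, c2: c2 == "1")
# Comparing with an integer keeps the three rows of group 1.
unquoted = filter_rows(rows, "str,int", lambda c1, c2: c2 == 1)

print(len(quoted), len(unquoted))
```

Under that assumption, either building the condition without quotes for numeric groups or forcing the group column to be treated as text would let the filter keep the expected lines.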
Passed Tests
  • ✅ Split-collection-by-pattern-in-identifiers.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Input Dataset Collection:

        • step_state: scheduled
      • Step 2: pattern:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mv '/tmp/tmpe5mppa9j/job_working_directory/000/6/configs/tmppj63nh13' '/tmp/tmpe5mppa9j/job_working_directory/000/6/outputs/dataset_92339aae-c36c-4f94-9c56-496a7ebb0518.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "38c200d8a0ef11ef810f9d5a5423fa88"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input_collection {"values": [{"id": 1, "src": "hdca"}]}
      • Step 4: Select identifiers with pattern:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • grep -P -A 0 -B 0 --no-group-separator  -i -- 'cat1' '/tmp/tmpe5mppa9j/files/9/2/3/dataset_92339aae-c36c-4f94-9c56-496a7ebb0518.dat' > '/tmp/tmpe5mppa9j/job_working_directory/000/7/outputs/dataset_e45b0140-c7f3-480d-ac30-5dda3440014b.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "38c200d8a0ef11ef810f9d5a5423fa88"
              case_sensitive "-i"
              chromInfo "/tmp/tmpe5mppa9j/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              color "NOCOLOR"
              dbkey "?"
              invert ""
              lines_after "0"
              lines_before "0"
              regex_type "-P"
              url_paste "cat1"
      • Step 5: Split collection into 2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "38c200d8a0ef11ef810f9d5a5423fa88"
              how {"__current_case__": 0, "filter_source": {"values": [{"id": 7, "src": "hda"}]}, "how_filter": "remove_if_absent"}
              input {"values": [{"id": 1, "src": "hdca"}]}
    • Other invocation details
      • history_id

        • cdfac92d550372b3
      • history_state

        • ok
      • invocation_id

        • cdfac92d550372b3
      • invocation_state

        • scheduled
      • workflow_id

        • cdfac92d550372b3

@lldelisle lldelisle marked this pull request as ready for review November 12, 2024 14:30
@lldelisle
Contributor Author

@wm75 if you want to review this.

mvdbeek
mvdbeek previously approved these changes Nov 15, 2024

The way to split the collection differs with the workflow.

- In the workflow "Split collection by pattern in identifiers", you need to specify a "pattern". This is a word that is present in the identifiers of only one part of your samples. The workflow splits your collection into 2: one with the identifiers that contain the 'pattern' and one with the identifiers that do not.
Member

Is that a regex pattern or a literal word?

Contributor Author

It uses grep, so it is a regex. However, this workflow targets a "user" audience, so I was trying to make it understandable for non-bioinformaticians. I am happy to get suggestions.

Member

They might be surprised if they enter special characters, so it is best to mention this, I think?
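
A minimal sketch of the kind of surprise meant here, using Python's `re` module as a stand-in for the case-insensitive `grep -P -i` call shown in the test log. The identifiers and the `split_by_pattern` helper are made up for illustration and are not part of the workflow.

```python
import re

# Hypothetical element identifiers, for illustration only.
identifiers = ["cat1_1", "cat1.2", "cat12_3", "dog_1"]


def split_by_pattern(ids, pattern, literal=False):
    """Return (matching, non_matching) identifiers, ignoring case."""
    if literal:
        # Escape metacharacters so the pattern is matched as plain text.
        pattern = re.escape(pattern)
    rx = re.compile(pattern, re.IGNORECASE)
    matching = [i for i in ids if rx.search(i)]
    others = [i for i in ids if not rx.search(i)]
    return matching, others


# As a regex, "." matches any character, so "cat1." also selects cat1_1 and cat12_3.
regex_hits, regex_rest = split_by_pattern(identifiers, "cat1.")
# As a literal word, only the identifier containing "cat1." is selected.
literal_hits, literal_rest = split_by_pattern(identifiers, "cat1.", literal=True)

print(regex_hits)
print(literal_hits)
```

A user who types `cat1.` expecting a literal match would get three identifiers in the first collection instead of one.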

Member

Actually, do you expect users to run this workflow as a standalone workflow? I thought this was useful as a thing to embed in another workflow. Would it maybe make more sense to add what is not covered by the filter filter to the tool itself?

Contributor Author

I do use this workflow as a standalone, for example to select only part of my samples for deeper analysis or for a specific heatmap.

You are right about the word/regex; I should warn the users.

Co-authored-by: Marius van den Beek <m.vandenbeek@gmail.com>
@lldelisle
Contributor Author

@mvdbeek, in the workflow "Split-collection-using-tabular", I extract the name of the group I want to select while the workflow runs. Is there a way to use this value to rename the final output collections?
I tried to define the JSON parameter as a workflow output and reuse the name of this output as ${my_variable}, but the workflow just asked me for its value when I tried to run it, which is not what I wanted...
