Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oneDPL][ranges] support size limit for output for merge algorithm #1942

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MikeDvorskiy
Copy link
Contributor

@MikeDvorskiy MikeDvorskiy commented Nov 20, 2024

[oneDPL][ranges] support size limit for output for merge algorithm.
The change is according to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3179r2.html#range_as_output

  1. serial pattern
  2. parallel pattern (tbb)
  3. parallel pattern (openMP)
  4. parallel pattern (serial backend)
  5. parallel pattern (DPCPP backend)

Update: Changes to draft status, causing faced to design issue, connected with different return types from the merge patterns - __result_and_scratch_storage/__result_and_scratch_storage_base. As an option - to have one common type of __result_and_scratch_storage for the all needs (ate least for pattern dpcpp merge patterns).
Update 2: the issue mentioned above has been resolved.

@MikeDvorskiy MikeDvorskiy marked this pull request as draft November 20, 2024 14:30
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 9 times, most recently from 33cd332 to d443dbe Compare November 27, 2024 12:03
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 4 times, most recently from 9ebcfb6 to 0066210 Compare November 28, 2024 11:55
@MikeDvorskiy MikeDvorskiy marked this pull request as ready for review November 28, 2024 15:24
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 7 times, most recently from 3f648a7 to 5b078ad Compare November 29, 2024 17:24
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 98a7acb to c81b4c1 Compare December 23, 2024 13:50
@MikeDvorskiy MikeDvorskiy marked this pull request as draft December 29, 2024 10:00
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 76c3c16 to c0c8ba4 Compare January 14, 2025 13:49
@MikeDvorskiy MikeDvorskiy marked this pull request as ready for review January 14, 2025 13:49
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from c0c8ba4 to ffea24a Compare January 14, 2025 13:51
@MikeDvorskiy MikeDvorskiy added this to the 2022.8.0 milestone Jan 14, 2025
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 3 times, most recently from 90b640f to 02ed111 Compare January 14, 2025 17:37
@MikeDvorskiy MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 02ed111 to 45a9bef Compare January 15, 2025 10:04
@@ -2948,6 +2949,49 @@ __pattern_remove_if(__parallel_tag<_IsVector> __tag, _ExecutionPolicy&& __exec,
// merge
//------------------------------------------------------------------------

template<typename It1, typename It2, typename ItOut, typename _Comp>
std::pair<It1, It2>
__brick_merge_2(It1 __it_1, It1 __it_1_e, It2 __it_2, It2 __it_2_e, ItOut __it_out, ItOut __it_out_e, _Comp __comp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the existing implementation of __serial_merge is more faster then this.

auto __n = __n1 + __n2;
if (__n == 0)
return 0;
if (__rng3.size() == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (__rng3.size() == 0)
if (__rng3.empty())

if (__n == 0)
return 0;
if (__rng3.size() == 0)
return {0, 0};

//To consider the direct copying pattern call in case just one of sequences is empty.
if (__n1 == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make additional optimization here for the case when last(rng1) < first(rng2)

{
const _Index __rng1_size = std::min<_Index>(__n1 > __start1 ? __n1 - __start1 : _Index{0}, __chunk);
const _Index __rng2_size = std::min<_Index>(__n2 > __start2 ? __n2 - __start2 : _Index{0}, __chunk);
const _Index __rng3_size = std::min<_Index>(__rng1_size + __rng2_size, __chunk);

const _Index __rng1_idx_end = __start1 + __rng1_size;
const _Index __rng2_idx_end = __start2 + __rng2_size;
const _Index __rng3_idx_end = __start3 + __rng3_size;
const _Index __rng3_idx_end = std::min<_Index>(__n3, __start3 + __rng3_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a logical error, because __n3 is the size but __rng3_idx_end is the last index.

});
// We should return the same thing in the second param of __future for compatibility
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore this comment

@@ -320,8 +335,13 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName,
__start = __base_diagonals_sp_global_ptr[__diagonal_idx];
}

__serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem,
__nd_range_params.chunk, __n1, __n2, __comp);
auto __ends = __serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. const auto
  2. we know return type here, why you are using auto ?

@@ -391,7 +415,7 @@ __parallel_merge(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy

using __value_type = oneapi::dpl::__internal::__value_t<_Range3>;

const std::size_t __n = __rng1.size() + __rng2.size();
const std::uint64_t __n = std::min<std::uint64_t>(__rng1.size() + __rng2.size(), __rng3.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we can't use std::size_t here as were before?

@@ -522,6 +522,7 @@ struct __usm_or_buffer_accessor
struct __result_and_scratch_storage_base
{
virtual ~__result_and_scratch_storage_base() = default;
virtual std::size_t __get_data(sycl::event, std::size_t* __p_buf) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as __result_and_scratch_storage_base already has __ in their name, I believe additional __ isn't required in method name.

return 0;
}

virtual std::size_t __get_data(sycl::event __event, std::size_t* __p_buf) const override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this declaration is correct.
But for compatibility with the other code, as I seen, virtual aren't used together with override in our code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants