-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[oneDPL][ranges] support size limit for output for merge algorithm #1942
base: main
Are you sure you want to change the base?
Conversation
33cd332
to
d443dbe
Compare
9ebcfb6
to
0066210
Compare
3f648a7
to
5b078ad
Compare
98a7acb
to
c81b4c1
Compare
76c3c16
to
c0c8ba4
Compare
c0c8ba4
to
ffea24a
Compare
90b640f
to
02ed111
Compare
02ed111
to
45a9bef
Compare
@@ -2948,6 +2949,49 @@ __pattern_remove_if(__parallel_tag<_IsVector> __tag, _ExecutionPolicy&& __exec, | |||
// merge | |||
//------------------------------------------------------------------------ | |||
|
|||
template<typename It1, typename It2, typename ItOut, typename _Comp> | |||
std::pair<It1, It2> | |||
__brick_merge_2(It1 __it_1, It1 __it_1_e, It2 __it_2, It2 __it_2_e, ItOut __it_out, ItOut __it_out_e, _Comp __comp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the existing implementation of __serial_merge
is more faster then this.
auto __n = __n1 + __n2; | ||
if (__n == 0) | ||
return 0; | ||
if (__rng3.size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (__rng3.size() == 0) | |
if (__rng3.empty()) |
if (__n == 0) | ||
return 0; | ||
if (__rng3.size() == 0) | ||
return {0, 0}; | ||
|
||
//To consider the direct copying pattern call in case just one of sequences is empty. | ||
if (__n1 == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make additional optimization here for the case when last(rng1) < first(rng2)
{ | ||
const _Index __rng1_size = std::min<_Index>(__n1 > __start1 ? __n1 - __start1 : _Index{0}, __chunk); | ||
const _Index __rng2_size = std::min<_Index>(__n2 > __start2 ? __n2 - __start2 : _Index{0}, __chunk); | ||
const _Index __rng3_size = std::min<_Index>(__rng1_size + __rng2_size, __chunk); | ||
|
||
const _Index __rng1_idx_end = __start1 + __rng1_size; | ||
const _Index __rng2_idx_end = __start2 + __rng2_size; | ||
const _Index __rng3_idx_end = __start3 + __rng3_size; | ||
const _Index __rng3_idx_end = std::min<_Index>(__n3, __start3 + __rng3_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a logical error, because __n3
is the size but __rng3_idx_end
is the last index.
}); | ||
// We should return the same thing in the second param of __future for compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please restore this comment
@@ -320,8 +335,13 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName, | |||
__start = __base_diagonals_sp_global_ptr[__diagonal_idx]; | |||
} | |||
|
|||
__serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem, | |||
__nd_range_params.chunk, __n1, __n2, __comp); | |||
auto __ends = __serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- const auto
- we know return type here, why you are using
auto
?
@@ -391,7 +415,7 @@ __parallel_merge(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy | |||
|
|||
using __value_type = oneapi::dpl::__internal::__value_t<_Range3>; | |||
|
|||
const std::size_t __n = __rng1.size() + __rng2.size(); | |||
const std::uint64_t __n = std::min<std::uint64_t>(__rng1.size() + __rng2.size(), __rng3.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we can't use std::size_t
here as were before?
@@ -522,6 +522,7 @@ struct __usm_or_buffer_accessor | |||
struct __result_and_scratch_storage_base | |||
{ | |||
virtual ~__result_and_scratch_storage_base() = default; | |||
virtual std::size_t __get_data(sycl::event, std::size_t* __p_buf) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as __result_and_scratch_storage_base
already has __
in their name, I believe additional __
isn't required in method name.
return 0; | ||
} | ||
|
||
virtual std::size_t __get_data(sycl::event __event, std::size_t* __p_buf) const override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this declaration is correct.
But for compatibility with the other code, as I seen, virtual
aren't used together with override
in our code.
[oneDPL][ranges] support size limit for output for merge algorithm.
The change is according to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3179r2.html#range_as_output
Update: Changes to draft status, causing faced to design issue, connected with different return types from the merge patterns -
__result_and_scratch_storage/__result_and_scratch_storage_base
. As an option - to have one common type of __result_and_scratch_storage for the all needs (ate least for pattern dpcpp merge patterns).Update 2: the issue mentioned above has been resolved.