-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove local (in-group) atomic usage from __parallel_find_or (V1) #1668
Remove local (in-group) atomic usage from __parallel_find_or (V1) #1668
Conversation
… local atomic from __parallel_find_or
…view comment: remove duplicated comments Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com> (cherry picked from commit cbf9e7a)
(cherry picked from commit 7328925)
…FPGA_DEVICE macro Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…view comment: rename __save_state_to for works with atomic to __save_state_to_atomic Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…sycl::bit_or instead of __dpl_sycl::__plus Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…- using sycl::bit_or instead of __dpl_sycl::__plus" - rolled back due very small or absent perf profit Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
… FoundLocalReduceOp to _LocalResultsReduceOp Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
d6dd3da
to
32a9d32
Compare
… and use __found_state in the struct __parallel_or_tag Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…dpl_sycl::__bit_or in struct __parallel_or_tag Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
5b735ce
to
9201751
Compare
@@ -139,6 +141,9 @@ using __maximum = sycl::ONEAPI::maximum<_T>; | |||
|
|||
template <typename _T> | |||
using __minimum = sycl::ONEAPI::minimum<_T>; | |||
|
|||
template <typename _T = void> | |||
using __bit_or = sycl::ONEAPI::bit_or<_T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -130,6 +130,8 @@ using __maximum = sycl::maximum<_T>; | |||
template <typename _T = void> | |||
using __minimum = sycl::minimum<_T>; | |||
|
|||
template <typename _T = void> | |||
using __bit_or = sycl::bit_or<_T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test evidence for complier icx 2021.1.2
(whene group operations isn't present) ): https://godbolt.org/z/PcnYT4vqe
…view comment: remove unexpected comment Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
@danhoeflinger, @julianmi is something that mandatory required to change in this PR? |
Co-authored-by: Julian Miller <julian.miller@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please wait for @danhoeflinger and green CI.
…NEDPL_FPGA_EMU instead of _ONEDPL_FPGA_DEVICE Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the comment explaining the restriction of workgroup size, it is OK.
I believe we can just make that restriction for all targets, but it should be examined, and we can do that in a separate PR.
In this PR we remove local (in-group) atomic usage from
__parallel_find_or
implementation:__found_local
now is local variable (for each item);__found_local
within one group through__dpl_sycl::__reduce_over_group
operation :__found_local = __dpl_sycl::__reduce_over_group(__item_id.get_group(), __found_local, typename _BrickTag::_LocalResultsReduceOp{});
where
_LocalResultsReduceOp
is :__dpl_sycl::__minimum
(for the__parallel_find_forward_tag
);__dpl_sycl::__maximum
(for the__parallel_find_backward_tag
);__dpl_sycl::__bit_or
(for the__parallel_or_tag
).The new operation
__dpl_sycl::__bit_or
has been defined in the file sycl_defs.h :