Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve SYCL backend
__parallel_for
performance for large input sizes #1870Improve SYCL backend
__parallel_for
performance for large input sizes #1870Changes from all commits
e0e03bd
9f3384a
a244bbb
ad05086
bb83642
3dbdcec
f53ae6c
0bf77a7
7768aff
4316e07
d2cf632
4eeaf97
b717ac7
839f1ad
b347606
72a0941
9f82b11
b5f0021
5519dac
2d985a5
6ab46ad
e3e05a7
8f9c5bb
55623b2
39b572f
33337f8
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you include
utils.h
where__is_spirv_target_v
is defined?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these functions returns
std::size_t
. Could you please explain why you are usingstd::uint32_t
instead?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature for the sub-group member function is
uint32_t get_local_linear_range() const
and related functions also returnuint32_t
. Were you thinking about thegroup
class maybe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you include
<tuple>
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view, the usage of
std::make_tuple
for primitive types doesn't make sense at all.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just
std::tuple
is now used in the new PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just
std::tuple
is now used in the new PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we haven't any other cases where we capture
policy
intosubmit
call.May be better to eval
outside of
submit
and capture__work_group_size
by value?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, applied to new PR (this one will be closed soon).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more readable from my point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it is in the new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use
_ONEDPL_PRAGMA_UNROLL
for thisfor-loop
too?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this case (there is a similar case in the strided loop utility in the new PR). Because the loop end variable is computed at run-time, the loop cannot be unrolled.
This path is called for the last sub-group / work-group, so the performance impact is negligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine this
if constexpr
together with the nextif
?Will we have some real profit from these two conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining the two would make it a runtime if conditional. Even if
__iters_per_work_item
is known at compile-time and the compiler can optimize it out, there may still be a chance for the kernel to be unnecessarily compiled. I think it is best to keep theif constexpr
, so we can be sure to avoid compiling the large submitter if possible.