Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC for histogram CPU implementation #1930
base: main
Are you sure you want to change the base?
RFC for histogram CPU implementation #1930
Changes from 12 commits
a03577e
ce117f5
ccc001e
d518a14
6e03468
10c4e50
efa7c9b
1ac82fd
02523c4
ac7b654
506fb62
1c6cb47
ceee3e3
0711090
3c5ad12
06a734f
b858a0e
53f4643
d718e0e
bb9e6f9
17e0510
9614209
2964a9e
9287fd2
cdf5092
215c2b7
04d5127
fe1efa2
60ec0e5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me share my thoughts:
In my understanding RFC is not a book... So, I would preferer to have a short, concise and precise description of what is offered, without frills, like a mathematical theorem. For example:
"The oneDPL library added histogram APIs, currently implemented only for device policies with the DPC++ backend. These APIs are defined in the oneAPI Specification 1.4. Please see the
oneAPI Specification for the details. The host-side backends (serial, TBB, OpenMP) are not yet supported. This RFC proposes extending histogram support to these backends."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've accepted your language here. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to it is not a story telling, I would suggest omitting introductory expressions like "It may make more sense" or "It's natural for a user to expect"... Only short and exact information.
For example,
"There are many cases to use a host-side serial or a host-side implementation of histogram. Another motivation for adding the support is simply to be spec compliant with the oneAPI specification."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken suggestion. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking into account my shared thought above, I would propose to re-prahse it keeping the main point shorter:
"A histogram algorithm is a memory-bound algorithm. So, the implementation should care of reducing memory accesses and minimizing temporary memory traffic."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken mostly. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we wish to specify space / computational complexity of the proposed algorithm somewhere, or is this too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some sections on computation complexity and temporary memory requirements.
I don't think we should add something like this to the specification though, because I don't want to dictate the algorithm used from the specification, but its good to document it in the current proposal. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can this topic omit at all. It tells nothing about 'histogram', just general wording, which can be applied for any new feature in oneDPL...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed some of the general language and added something which is important for histogram in an attempt to answer feedback from @akukanov to clarify where the implementation of the algorithm will live.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"unseq Backend"
Basically, we don't have such back end officially. Yes, sometimes we used such term in the internal communication as for "name" for a set of functions with "pragma simd" implementation. But we did not specify and publish API for that. So, I suggest renaming this topic to "SIMD/openMP SIMD Implementation" f.e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this part is about developing (or not) an implementation for unsequenced policies.
I do not mind calling it `unseq backend" in the design docs, but Mikhail is correct that it's rather informal (while parallel backend is somewhat more formal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the proposed name for the section better anyway for what is discussed. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we intent to give some information about OneDPL parallel backend patterns on which histogram can based on, I would notify, there is not "count_if" pattern, there is "reduce"("transform_reduce") pattern.
When a man says "reduce", it becomes more or less obvious that histogram calculation based on reduce is not effective at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clarified the language a little here to make it more clear that copy_if uses reduce internally. I still think it deserves some text describing it as it may not be immediately obvious to everyone that reduce is not well matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I have a curiosity question. Which approach does NVidia use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVidia has a similar API within CUB but not within Thrust, and therefore does not have a CPU implementation that I am aware of, only one specifically for a GPU device.