-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Host Implementation of Histogram APIs #1974
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good to me. I had to go through it a few times to understand __construct_by_args
, but I think I get how it works now and why it's useful.
Just some minor comments from my side.
@@ -4289,6 +4289,86 @@ __pattern_shift_right(_Tag __tag, _ExecutionPolicy&& __exec, _BidirectionalItera | |||
return __res.base(); | |||
} | |||
|
|||
template <typename _ForwardIterator, typename _IdxHashFunc, typename _RandomAccessIterator, class _IsVector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why _IsVector
is a class when the rest are typename
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it seems the convention in this file is generally for everything to be class (though its not fully consistent). I'll adjust it to the norm.
include/oneapi/dpl/pstl/omp/util.h
Outdated
std::uint32_t __count = 0; | ||
std::uint32_t __j = 0; | ||
|
||
for (; __j < __thread_specific_storage.size() && __count <= __i; ++__j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::uint32_t __count = 0
could probably be moved here to the initialization expression of this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done.
} | ||
|
||
std::uint32_t | ||
size() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to use std::size_t
for size-related stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is associated with the number of threads, is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view, if we describe anything about the size, better to use std::size_t
type.
The exception - only one case when we save size in the memory and want to reduce the amount of memory.
But of course, I don't insist, it's not one place with another approach.
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
d2f99dc
to
455cf9c
Compare
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
_DiffType __local_n = __global_histogram_last - __global_histogram_first; | ||
std::uint32_t __num_temporary_copies = __tls.size(); | ||
_DiffType __range_begin_id = __global_histogram_first - __histogram_first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_DiffType __local_n = __global_histogram_last - __global_histogram_first; | |
std::uint32_t __num_temporary_copies = __tls.size(); | |
_DiffType __range_begin_id = __global_histogram_first - __histogram_first; | |
const _DiffType __local_n = __global_histogram_last - __global_histogram_first; | |
const std::uint32_t __num_temporary_copies = __tls.size(); | |
const _DiffType __range_begin_id = __global_histogram_first - __histogram_first; |
_PSTL_PRAGMA(omp single) { __thread_specific_storage.resize(omp_get_num_threads()); } | ||
} | ||
|
||
// Note: Size should not be used concurrantly with parallel loops which may instantiate storage objects, as it may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Note: Size should not be used concurrantly with parallel loops which may instantiate storage objects, as it may | |
// Note: Size should not be used concurrently with parallel loops which may instantiate storage objects, as it may |
__backend_tag{}, std::forward<_ExecutionPolicy>(__exec), __histogram_first, __histogram_first + __num_bins, | ||
[__histogram_first, &__tls](auto __global_histogram_first, auto __global_histogram_last) { | ||
_DiffType __local_n = __global_histogram_last - __global_histogram_first; | ||
std::uint32_t __num_temporary_copies = __tls.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the first parallel_for, __tls.size()
should no longer grow, correct? What about moving __num_temporary_copies
outside of this parallel_for and capturing it into the lambda? This way we only perform a single atomic load.
using _DiffType = typename std::iterator_traits<_RandomAccessIterator2>::difference_type; | ||
|
||
_DiffType __n = __last - __first; | ||
if (__n > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't insist... But there is a common good practice to process a shorter branch of if..else
operator first. It has better readability. Also, in case of immediate return from the shorter branch we have the "non indented" code of the bigger branch:
if(n ==0)
{
//shorter branch
return;
}
//else: bigger branch - no indents.
.....
Implementation of histogram APIs for host backends.
Implementations are provided for
serial
,tbb
, andopenMP
backends. We add a generic__thread_enumerable_storage
struct to add a generic thread local storage for our host backends. We use the new TLS (Thread local storage) withparallel_for
to implement histogram. Testing is also added, and some minor adjustments are made to cmake.Please see the RFC documentation / discussion here for more details.