Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Fixed issues encountered through oneMKL portBLAS backend #504

Merged
merged 12 commits into from
Apr 11, 2024

Conversation

OuadiElfarouki
Copy link
Collaborator

@OuadiElfarouki OuadiElfarouki commented Mar 7, 2024

This patch includes following changes :

  • Re-enabled half data type support for CPUs the same way as the other devices (half support no longer disabled when picking tuning_target=default_cpu)
  • Reverted changes (introduced in PR 493) to the BufferIterator<T>::get_range_accessor() that caused some accessors to be not initialized (sycl::property::no_init{}) and thus yield erroneous test results.
  • Modify WGAtomicReduction signature to comply with different usm memory allocation and atomic initialization. Adding usmManagedMem to flag if memory is allocated using managed memory which requires generic_space in atomic initialization, especially on AMD hardware. Flag added not to cause performance degradation to the operators that use this kernel.

@s-Nick s-Nick requested review from Rbiessy and hjabird March 21, 2024 11:03
@pgorlani
Copy link
Contributor

Thanks for your PR, @s-Nick.

Could you please update the PR comment including your modification to WGAtomicReduction?

Comment on lines -202 to -206
// Skip data initialization if not accessing in read mode only
return typename BufferIterator<element_t>::template accessor_t<acc_md_t>(
buffer_, cgh, cl::sycl::range<1>(size),
cl::sycl::id<1>(BufferIterator<element_t>::get_offset()),
cl::sycl::property::no_init{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have a performance impact for devices where this code was originally working?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjabird This was actually added to get rid of some warnings regarding buffers initialization when using AdaptiveCpp, but introducing it caused some tests to fail through the portBLAS backend API in oneMKL, that's why we're reverting. No performance aspects have been tested/verified but this change was tested with AdaptiveCpp (warnings are back but no errors/tests failures).

# * @filename CMakeLists.txt
# *
# **************************************************************************/
cmake_minimum_required(VERSION 3.4.3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the choice of 3.4.3 - has this been tested?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I just used the same version required by the main CMakeLists.txt, as I got some warnings/errors if I still remember when building the samples alone (header only with portBLAS)

src/interface/blas1/backend/amd_gpu.hpp Outdated Show resolved Hide resolved
src/interface/blas1/backend/amd_gpu.hpp Show resolved Hide resolved
src/interface/blas1/backend/amd_gpu.hpp Outdated Show resolved Hide resolved
src/interface/blas1_interface.hpp Outdated Show resolved Hide resolved
OuadiElfarouki and others added 12 commits April 10, 2024 10:35
AMD atomic operation implementation requires some specific hardware to
work properly with current reduction kernel. This patch adds a check for
AMD only to provides the correct result even if the specific hardware is
not available.

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
buffer only

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
Add documentation for `usmManagedMem` template parameter.

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
Co-authored-by: HJA Bird <hja.bird@gmail.com>
Co-authored-by: pgorlani <92453485+pgorlani@users.noreply.github.com>
comments

Addressing PR comments on memory allocation type function checker and updating
comment.
Fixing documentation.

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
@s-Nick s-Nick force-pushed the acpp_regression_fixes branch from f1b4ee3 to aca6ee0 Compare April 10, 2024 09:37
@s-Nick s-Nick merged commit 11e8b0b into codeplaysoftware:master Apr 11, 2024
3 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants