v5.2.2 release
Henry Jin committed Apr 16, 2024
1 parent 075683d commit 11f2efc
Showing 183 changed files with 5,545 additions and 3,897 deletions.
36 changes: 18 additions & 18 deletions Chap_SIMD.tex
Original file line number Diff line number Diff line change
@@ -8,34 +8,34 @@
Many processors have SIMD (vector) units that can perform simultaneously
2, 4, 8 or more executions of the same operation (by a single SIMD unit).

Loops without loop-carried backward dependency (or with dependency preserved using
ordered simd) are candidates for vectorization by the compiler for
Loops without loop-carried backward dependences (or with dependences preserved using
\kcode{ordered simd}) are candidates for vectorization by the compiler for
execution with SIMD units. In addition, with state-of-the-art vectorization
technology and \code{declare simd} directive extensions for function vectorization
technology and \kcode{declare simd} directive extensions for function vectorization
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
The basic idea is that a scalar function call in a loop can be replaced by a vector version
of the function, and the loop can be vectorized simultaneously by combining a loop
vectorization (\code{simd} directive on the loop) and a function
vectorization (\code{declare simd} directive on the function).
vectorization (\kcode{simd} directive on the loop) and a function
vectorization (\kcode{declare simd} directive on the function).
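
The combination described above can be sketched in C as follows. This is only an illustration: the names scale_add and saxpy_like are hypothetical and do not come from the examples document. The declare simd directive asks the compiler to also build a vector version of the scalar function, and the simd directive on the loop lets the call be vectorized together with the loop. Whether vectorization actually happens depends on the compiler and target.

```c
/* Hypothetical scalar function; "declare simd" requests that a vector
   version be generated in addition to the scalar one. */
#pragma omp declare simd
static float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* Hypothetical caller: the "simd" directive on the loop allows the loop,
   including the call to scale_add, to be executed with SIMD instructions. */
void saxpy_like(int n, const float *a, const float *b, float *out)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = scale_add(a[i], b[i]);
    }
}
```

If the code is compiled without OpenMP support the pragmas are ignored and the loop simply runs as scalar code, so the result is the same either way.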

A \code{simd} construct states that SIMD operations be performed on the
A \kcode{simd} construct states that SIMD operations be performed on the
data within the loop. A number of clauses are available to provide
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
\code{lastprivate}). Other clauses provide vector length preference/restrictions
(\code{simdlen} / \code{safelen}), loop fusion (\code{collapse}), and data
alignment (\code{aligned}).
data-sharing attributes (\kcode{private}, \kcode{linear}, \kcode{reduction} and
\kcode{lastprivate}). Other clauses provide vector length preference/restrictions
(\kcode{simdlen} / \kcode{safelen}), loop fusion (\kcode{collapse}), and data
alignment (\kcode{aligned}).

The \code{declare simd} directive designates
The \kcode{declare simd} directive designates
that a vector version of the function should also be constructed for
execution within loops that contain the function and have a \code{simd}
directive. Clauses provide argument specifications (\code{linear},
\code{uniform}, and \code{aligned}), a requested vector length
(\code{simdlen}), and designate whether the function is always/never
called conditionally in a loop (\code{notinbranch}/\code{inbranch}).
execution within loops that contain the function and have a \kcode{simd}
directive. Clauses provide argument specifications (\kcode{linear},
\kcode{uniform}, and \kcode{aligned}), a requested vector length
(\kcode{simdlen}), and designate whether the function is always/never
called conditionally in a loop (\kcode{notinbranch}/\kcode{inbranch}).
The latter is for optimizing performance.
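
A minimal sketch of the argument-specification clauses, assuming a hypothetical helper elem_sq and caller sum_of_squares (neither name is from the examples document): the pointer argument is the same for every SIMD lane (uniform), the index advances by one per iteration (linear), and the function is never called under a condition in the loop (notinbranch).

```c
/* Hypothetical helper: "base" is identical across lanes (uniform),
   "i" increases by 1 from one iteration to the next (linear), and the
   function is always called unconditionally (notinbranch). */
#pragma omp declare simd uniform(base) linear(i:1) notinbranch
static double elem_sq(const double *base, int i)
{
    return base[i] * base[i];
}

/* Hypothetical caller: a simd loop with a reduction on sum. */
double sum_of_squares(const double *v, int n)
{
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += elem_sq(v, i);   /* unconditional call, matches notinbranch */
    }
    return sum;
}
```

Had the call been placed inside an if statement in the loop body, inbranch (or neither clause) would be the appropriate choice instead.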

Also, the \code{simd} construct has been combined with the worksharing loop
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
Also, the \kcode{simd} construct has been combined with the worksharing loop
constructs (\kcode{for simd} and \kcode{do simd}) to enable simultaneous thread
execution in different SIMD units.
%Hence, the \code{simd} construct can be
%used alone on a loop to direct vectorization (SIMD execution), or in
32 changes: 16 additions & 16 deletions Chap_affinity.tex
@@ -1,7 +1,7 @@
\cchapter{OpenMP Affinity}{affinity}
\label{chap:openmp_affinity}

OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
OpenMP Affinity consists of a \kcode{proc_bind} policy (thread affinity policy) and a specification of
places (``location units'' or \plc{processors} that may be cores, hardware
threads, sockets, etc.).
OpenMP Affinity enables users to bind computations on specific places.
@@ -11,13 +11,13 @@
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.

Often the binding can be managed without resorting to explicitly setting places.
Without the specification of places in the \code{OMP\_PLACES} variable,
Without the specification of places in the \kcode{OMP_PLACES} variable,
the OpenMP runtime will distribute and bind threads using the entire range of processors for
the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
or the \code{proc\_bind} clause. When places are specified, the OMP runtime
the OpenMP program, according to the \kcode{OMP_PROC_BIND} environment variable
or the \kcode{proc_bind} clause. When places are specified, the OMP runtime
binds threads to the places according to a default distribution policy, or
those specified in the \code{OMP\_PROC\_BIND} environment variable or the
\code{proc\_bind} clause.
those specified in the \kcode{OMP_PROC_BIND} environment variable or the
\kcode{proc_bind} clause.

In the OpenMP Specifications document a processor refers to an execution unit that
is enabled for an OpenMP thread to use. A processor is a core when there is
@@ -31,7 +31,7 @@

The processors available to a process may be a subset of the system's
processors. This restriction may be the result of a
wrapper process controlling the execution (such as \code{numactl} on Linux systems),
wrapper process controlling the execution (such as \plc{numactl} on Linux systems),
compiler options, library-specific environment variables, or default
kernel settings. For instance, the execution of multiple MPI processes,
launched on a single compute node, will each have a subset of processors as
@@ -53,20 +53,20 @@

Threads of a team are positioned onto places in a compact manner, a
scattered distribution, or onto the primary thread's place, by setting the
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
\code{close}, \code{spread}, or \code{primary} (\code{master} has been deprecated), respectively. When
\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and
\kcode{OMP_PROC_BIND} environment variable or the \kcode{proc_bind} clause to
\kcode{close}, \kcode{spread}, or \kcode{primary} (\kcode{master} has been deprecated), respectively. When
\kcode{OMP_PROC_BIND} is set to FALSE no binding is enforced; and
when the value is TRUE, the binding is implementation defined to
a set of places in the \code{OMP\_PLACES} variable or to places
defined by the implementation if the \code{OMP\_PLACES} variable
a set of places in the \kcode{OMP_PLACES} variable or to places
defined by the implementation if the \kcode{OMP_PLACES} variable
is not set.

The \code{OMP\_PLACES} variable can also be set to an abstract name
(\code{threads}, \code{cores}, \code{sockets}) to specify that a place is
The \kcode{OMP_PLACES} variable can also be set to an abstract name
(\kcode{threads}, \kcode{cores}, \kcode{sockets}) to specify that a place is
either a single hardware thread, a core, or a socket, respectively.
This description of the \code{OMP\_PLACES} is most useful when the
This description of the \kcode{OMP_PLACES} is most useful when the
number of threads is equal to the number of hardware thread, cores
or sockets. It can also be used with a \code{close} or \code{spread}
or sockets. It can also be used with a \kcode{close} or \kcode{spread}
distribution policy when the equality doesn't hold.
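
As a rough illustration of requesting a distribution policy from within the code, the sketch below uses the proc_bind clause on a parallel loop. The function name spread_sum is hypothetical. The places themselves would typically come from the environment (for instance an abstract name such as cores in OMP_PLACES); the clause only selects how the threads of this team are positioned onto those places.

```c
/* Hypothetical example: sums 0..n-1 with a team whose threads are
   requested to be spread across the available places. */
long spread_sum(int n)
{
    long sum = 0;
    #pragma omp parallel for proc_bind(spread) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

The computed value does not depend on the binding; the policy affects only where the threads execute.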


35 changes: 18 additions & 17 deletions Chap_data_environment.tex
@@ -1,12 +1,12 @@
\cchapter{Data Environment}{data_environment}
\label{chap:data_environment}
The OpenMP \plc{data environment} contains data attributes of variables and
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
objects. Many constructs (such as \kcode{parallel}, \kcode{simd}, \kcode{task})
accept clauses to control \plc{data-sharing} attributes
of referenced variables in the construct, where \plc{data-sharing} applies to
whether the attribute of the variable is \plc{shared},
is \plc{private} storage, or has special operational characteristics
(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
(as found in the \kcode{firstprivate}, \kcode{lastprivate}, \kcode{linear}, or \kcode{reduction} clause).

The data environment for a device (distinguished as a \plc{device data environment})
is controlled on the host by \plc{data-mapping} attributes, which determine the
@@ -21,57 +21,57 @@

Certain variables and objects have predetermined attributes.
A commonly found case is the loop iteration variable in associated loops
of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
of a \kcode{for} or \kcode{do} construct. It has a private data-sharing attribute.
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
exceptions (mainly concerning loop iteration variables).

Variables with explicitly determined data-sharing attributes are those that are
referenced in a given construct and are listed in a data-sharing attribute
clause on the construct. Some of the common data-sharing clauses are:
\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
\code{linear}, and \code{reduction}. % Are these all of them?
\kcode{shared}, \kcode{private}, \kcode{firstprivate}, \kcode{lastprivate},
\kcode{linear}, and \kcode{reduction}. % Are these all of them?
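
A short sketch of several of these clauses on one construct, assuming a hypothetical function sharing_demo: offset is copied into each thread's private copy (firstprivate), tmp receives the value from the sequentially last iteration (lastprivate), and sum is combined across threads (reduction). The loop iteration variable i is private by the predetermined rule mentioned above.

```c
/* Hypothetical example: returns the sum of (offset + i) for i in [0, n)
   and stores the value from the sequentially last iteration in *last.
   Intended for n > 0. */
int sharing_demo(int n, int offset, int *last)
{
    int sum = 0;
    int tmp = -1;
    #pragma omp parallel for firstprivate(offset) lastprivate(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = offset + i;   /* private copy; last iteration's value survives */
        sum += tmp;         /* per-thread partial sums are combined at the end */
    }
    *last = tmp;
    return sum;
}
```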

Variables with implicitly determined data-sharing attributes are those
that are referenced in a given construct, do not have predetermined
data-sharing attributes, and are not listed in a data-sharing
attribute clause of an enclosing construct.
For a complete list of variables and objects with predetermined and
implicitly determined attributes, please refer to the
\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
\docref{Data-sharing Attribute Rules for Variables Referenced in a Construct}
subsection of the OpenMP Specifications document.

\bigskip
DATA-MAPPING ATTRIBUTES

The \code{map} clause on a device construct explicitly specifies how the list items in
The \kcode{map} clause on a device construct explicitly specifies how the list items in
the clause are mapped from the encountering task's data environment (on the host)
to the corresponding item in the device data environment (on the device).
The common \plc{list items} are arrays, array sections, scalars, pointers, and
structure elements (members).

Procedures and global variables have predetermined data mapping if they appear
within the list or block of a \code{declare}~\code{target} directive. Also, a C/C++ pointer
within the list or block of a \kcode{declare target} directive. Also, a C/C++ pointer
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
% Waiting for response from Eric on this.

Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
Without explicit mapping, scalar variables within the scope of the \code{target}
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \kcode{target}
construct are implicitly mapped with a \plc{map-type} of \kcode{tofrom}.
Without explicit mapping, scalar variables within the scope of the \kcode{target}
construct are not mapped, but have an implicit firstprivate data-sharing
attribute. (That is, the value of the original variable is given to a private
variable of the same name on the device.) This behavior can be changed with
the \code{defaultmap} clause.
the \kcode{defaultmap} clause.
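
The implicit rules above can be illustrated with a small sketch. The function name double_on_device is hypothetical. The array section is mapped explicitly, and the scalar count is mapped with tofrom so that its updated value is copied back to the host; without that clause the scalar would be firstprivate in the target region and the host variable would keep its original value.

```c
/* Hypothetical example: doubles every element of v in a target region
   and counts the elements processed. */
int double_on_device(int *v, int n)
{
    int count = 0;
    /* v[0:n] is an explicitly mapped array section; count is a scalar that
       is explicitly mapped tofrom so the update is visible on the host. */
    #pragma omp target map(tofrom: v[0:n]) map(tofrom: count)
    {
        for (int i = 0; i < n; i++) {
            v[i] *= 2;
            count++;
        }
    }
    return count;
}
```

When no device is available, or the code is built without OpenMP support, the region runs on the host and produces the same result.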

The \code{map} clause can appear on \code{target}, \code{target data} and
\code{target enter/exit data} constructs. The operations of creation and
The \kcode{map} clause can appear on \kcode{target}, \kcode{target data} and
\kcode{target enter/exit data} constructs. The operations of creation and
removal of device storage as well as assignment of the original list item
values to the corresponding list items may be complicated when the list
item appears on multiple constructs or when the host and device storage
is shared. In these cases the item's reference count, the number of times
it has been referenced (+1 on entry and -1 on exited) in nested (structured)
it has been referenced (increment by 1 on entry and decrement by 1 on exit) in nested (structured)
map regions and/or accumulative (unstructured) mappings, determines the operation.
Details of the \code{map} clause and reference count operation are specified
in the \plc{map Clause} subsection of the OpenMP Specifications document.
Details of the \kcode{map} clause and reference count operation are specified
in the \docref{\kcode{map} Clause} subsection of the OpenMP Specifications document.


%===== Examples Sections =====
@@ -81,6 +81,7 @@
\input{data_environment/fort_loopvar}
\input{data_environment/fort_sp_common}
\input{data_environment/fort_sa_private}
\input{data_environment/fort_shared_var}
\input{data_environment/carrays_fpriv}
\input{data_environment/lastprivate}
\input{data_environment/reduction}
35 changes: 18 additions & 17 deletions Chap_devices.tex
@@ -1,9 +1,9 @@
\cchapter{Devices}{devices}
\label{chap:devices}

The \code{target} construct consists of a \code{target} directive
and an execution region. The \code{target} region is executed on
the default device or the device specified in the \code{device}
The \kcode{target} construct consists of a \kcode{target} directive
and an execution region. The \kcode{target} region is executed on
the default device or the device specified in the \kcode{device}
clause.

In OpenMP version 4.0, by default, all variables within the lexical
@@ -16,39 +16,39 @@
The constructs that explicitly
create storage, transfer data, and free storage on the device
are categorized as structured and unstructured. The
\code{target} \code{data} construct is structured. It creates
a data region around \code{target} constructs, and is
\kcode{target data} construct is structured. It creates
a data region around \kcode{target} constructs, and is
convenient for providing persistent data throughout multiple
\code{target} regions. The \code{target} \code{enter} \code{data} and
\code{target} \code{exit} \code{data} constructs are unstructured, because
\kcode{target} regions. The \kcode{target enter data} and
\kcode{target exit data} constructs are unstructured, because
they can occur anywhere and do not support a ``structure''
(a region) for enclosing \code{target} constructs, as does the
\code{target} \code{data} construct.
(a region) for enclosing \kcode{target} constructs, as does the
\kcode{target data} construct.
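
The difference between the structured and unstructured forms might be sketched as follows. Both function names are hypothetical. In the first function the target data region encloses two target regions, so the array stays on the device between them. In the second, the enter and exit directives are stand-alone and could just as well be placed in different functions.

```c
/* Structured (hypothetical example): the data region encloses both
   target regions, so a[0:n] persists on the device across them. */
void scale_then_shift(double *a, int n, double s, double t)
{
    #pragma omp target data map(tofrom: a[0:n])
    {
        #pragma omp target
        for (int i = 0; i < n; i++) a[i] *= s;

        #pragma omp target
        for (int i = 0; i < n; i++) a[i] += t;
    }
}

/* Unstructured (hypothetical example): storage is created and released by
   stand-alone directives that do not enclose the target region. */
void increment_unstructured(double *a, int n)
{
    #pragma omp target enter data map(to: a[0:n])

    #pragma omp target
    for (int i = 0; i < n; i++) a[i] += 1.0;

    #pragma omp target exit data map(from: a[0:n])
}
```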

The \code{map} clause is used on \code{target}
The \kcode{map} clause is used on \kcode{target}
constructs and the data-type constructs to map host data. It
specifies the device storage and data movement \code{to} and \code{from}
specifies the device storage and data movement \plc{to} and \plc{from}
the device, and controls on the storage duration.

There is an important change in the OpenMP 4.5 specification
that alters the data model for scalar variables and C/C++ pointer variables.
The default behavior for scalar variables and C/C++ pointer variables
in a 4.5 compliant code is \code{firstprivate}. Example
in a 4.5 compliant code is \kcode{firstprivate}. Example
codes that have been updated to reflect this new behavior are
annotated with a description that describes changes required
for correct execution. Often it is a simple matter of mapping
the variable as \code{tofrom} to obtain the intended 4.0 behavior.
the variable as \kcode{tofrom} to obtain the intended 4.0 behavior.

In OpenMP version 4.5 the mechanism for target
execution is specified as occurring through a \plc{target task}.
When the \code{target} construct is encountered a new
\plc{target task} is generated. The \plc{target task}
completes after the \code{target} region has executed and all data
When the \kcode{target} construct is encountered a new
target task is generated. The target task
completes after the \kcode{target} region has executed and all data
transfers have finished.

This new specification does not affect the execution of
pre-4.5 code; it is a necessary element for asynchronous
execution of the \code{target} region when using the new \code{nowait}
execution of the \kcode{target} region when using the new \kcode{nowait}
clause introduced in OpenMP 4.5.
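
A minimal sketch of asynchronous target execution, assuming a hypothetical function async_square: with nowait the target task may be deferred, so the encountering thread can continue with other work, and a taskwait is used here to make sure the target region and its data transfers have completed before the results are read.

```c
/* Hypothetical example: squares each element of in[] into out[]
   using a target region that may execute asynchronously. */
void async_square(const int *in, int *out, int n)
{
    #pragma omp target map(to: in[0:n]) map(from: out[0:n]) nowait
    for (int i = 0; i < n; i++) out[i] = in[i] * in[i];

    /* Independent host work could be placed here. */

    /* Wait for the target task (region plus data transfers) to finish. */
    #pragma omp taskwait
}
```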


@@ -59,6 +59,7 @@
\input{devices/target_structure_mapping}
\input{devices/target_fort_allocatable_array_mapping}
\input{devices/array_sections}
\input{devices/usm}
\input{devices/C++_virtual_functions}
\input{devices/array_shaping}
\input{devices/target_mapper}
30 changes: 15 additions & 15 deletions Chap_directives.tex
@@ -2,7 +2,7 @@
\label{chap:directive_syntax}
\index{directive syntax}

OpenMP \emph{directives} use base-language mechanisms to specify OpenMP program behavior.
OpenMP \plc{directives} use base-language mechanisms to specify OpenMP program behavior.
In C code, the directives are formed exclusively with pragmas, whereas in C++
code, directives are formed from either pragmas or attributes.
Fortran directives are formed with comments in free form and fixed form sources (codes).
@@ -20,36 +20,36 @@

C/C++ pragmas
\begin{indentedcodelist}
\code{\#pragma omp} \plc{directive-specification}
\kcode{\#pragma omp} \plc{directive-specification}
\end{indentedcodelist}

C++ attributes
\begin{indentedcodelist}
\code{[[omp :: directive(} \plc{directive-specification} \code{)]]}
\code{[[using omp : directive(} \plc{directive-specification} \code{)]]}
\kcode{[[omp :: directive( \plc{directive-specification} )]]}
\kcode{[[using omp : directive( \plc{directive-specification} )]]}
\end{indentedcodelist}

Fortran comments
\begin{indentedcodelist}
\code{!\$omp} \plc{directive-specification}
\scode{!$omp} \plc{directive-specification}
\end{indentedcodelist}
where \code{c\$omp} and \code{*\$omp} may be used in Fortran fixed form sources.
where \scode{c$omp} and \scode{*$omp} may be used in Fortran fixed form sources.
Most OpenMP directives accept clauses that alter the semantics of the directive in some way,
and some directives also accept parenthesized arguments that follow the directive name.
A clause may just be a keyword (e.g., \scode{untied}) or it may also accept argument lists
(e.g., \scode{shared(x,y,z)}) and/or optional modifiers (e.g., \scode{tofrom} in
\scode{map(tofrom:}~\scode{x,y,z)}).
A clause may just be a keyword (e.g., \kcode{untied}) or it may also accept argument lists
(e.g., \kcode{shared(\ucode{x,y,z})}) and/or optional modifiers (e.g., \kcode{tofrom} in
\kcode{map(tofrom: \ucode{x,y,z})}).
Clause modifiers may be ``simple'' or ``complex'' -- a complex modifier consists of a
keyword followed by one or more parameters, bracketed by parentheses, while a simple
modifier does not. An example of a complex modifier is the \scode{iterator} modifier,
as in \scode{map(iterator(i=0:n),}~\scode{tofrom:}~\scode{p[i])}, or the \scode{step} modifier, as in
\scode{linear(x:}~\scode{ref,}~\scode{step(4))}.
In the preceding examples, \scode{tofrom} and \scode{ref} are simple modifiers.
modifier does not. An example of a complex modifier is the \kcode{iterator} modifier,
as in \kcode{map(iterator(\ucode{i=0:n}), tofrom: \ucode{p[i]})}, or the \kcode{step} modifier, as in
\kcode{linear(\ucode{x}: ref, step(\ucode{4}))}.
In the preceding examples, \kcode{tofrom} and \kcode{ref} are simple modifiers.
For Fortran, a declarative directive (such as \code{declare}~\code{reduction})
must appear after any \code{USE}, \code{IMPORT}, and \code{IMPLICIT} statements
For Fortran, a declarative directive (such as \kcode{declare reduction})
must appear after any \bcode{USE}, \bcode{IMPORT}, and \bcode{IMPLICIT} statements
in the specification part.