\cchapter{Parallel Execution}{parallel_execution}
\label{chap:parallel_execution}
A single thread, the \plc{initial thread}, begins sequential execution of
an OpenMP-enabled program, as if the whole program were in an implicit parallel
region consisting of an implicit task executed by the \plc{initial thread}.
A \kcode{parallel} construct encloses code,
forming a parallel region. An \plc{initial thread} encountering a \kcode{parallel}
region forks (creates) a team of threads at the beginning of the
\kcode{parallel} region, and joins them (removes from execution) at the
end of the region. The initial thread becomes the primary thread of the team in a
\kcode{parallel} region, with a \plc{thread} number equal to zero; the other
threads are numbered from 1 to the number of threads minus 1.
A team may consist of just a single thread.
Each \plc{thread} of a team is assigned an implicit task consisting of code within the
\kcode{parallel} region. The task that creates a \kcode{parallel} region is suspended while the
tasks of the team are executed. A thread is tied to its task; that is,
only the thread assigned to the task can execute that task. After completion
of the \kcode{parallel} region, the primary thread resumes execution of the generating task.
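
The fork, numbering, and join described above can be sketched as follows. This is a minimal illustration, not one of the chapter's examples: the function name \kcode{team_roll_call} and the cap \kcode{MAX_THREADS} are hypothetical, and the serial fallbacks are an assumption added so that the sketch also builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64   /* hypothetical cap, larger than the requested team */

/* Forks a team of up to 4 threads and records which thread numbers appear.
   Returns the team size, or -1 if the numbers are not exactly 0..size-1. */
int team_roll_call(void)
{
    int seen[MAX_THREADS] = {0};
    int team_size = 0;

    #pragma omp parallel num_threads(4) shared(seen, team_size)
    {
        int id = omp_get_thread_num();
        assert(id >= 0 && id < MAX_THREADS);
        if (id == 0)                      /* primary thread of the team */
            team_size = omp_get_num_threads();
        seen[id] = 1;                     /* each thread writes only its own slot */
    }   /* join: implied barrier, then the encountering thread continues alone */

    for (int i = 0; i < team_size; i++)
        if (!seen[i])
            return -1;
    return team_size;
}
```

Calling \kcode{team_roll_call()} from the initial thread returns a value between 1 and 4, since the runtime may provide fewer threads than requested.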
%After the \code{parallel} region the primary thread becomes the initial
%thread again, and continues to execute the \plc{sequential part}.
Any task within a \kcode{parallel} region is allowed to encounter another
\kcode{parallel} region to form a nested \kcode{parallel} region. The
parallelism of a nested \kcode{parallel} region (whether it forks additional
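threads, or is executed serially by the encountering task) can be controlled by the
\kcode{OMP_NESTED} environment variable or the \kcode{omp_set_nested()}
API routine with arguments indicating true or false. Both have been deprecated
since OpenMP 5.0 in favor of the \kcode{OMP_MAX_ACTIVE_LEVELS} environment variable
and the \kcode{omp_set_max_active_levels()} routine.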
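
A minimal sketch of controlling nested parallelism is shown below. It swaps in \kcode{omp_set_max_active_levels()}, which replaced \kcode{omp_set_nested()} as of OpenMP 5.0, rather than the routine named in the text. The function name \kcode{inner_team_size} is hypothetical, and the serial fallbacks are an assumption for builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption of this sketch) for builds without OpenMP. */
static void omp_set_max_active_levels(int n) { (void)n; }
static int  omp_get_num_threads(void)        { return 1; }
#endif

/* Reports the size of an inner team created inside an outer parallel region.
   allow_nesting == 0 limits the program to one active level, so the inner
   region is executed serially by each encountering thread. */
int inner_team_size(int allow_nesting)
{
    int inner = 0;

    omp_set_max_active_levels(allow_nesting ? 2 : 1);

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)   /* nested parallel region */
        {
            int n = omp_get_num_threads();    /* size of the innermost team */
            #pragma omp atomic write
            inner = n;
        }
    }
    assert(inner >= 1);
    return inner;
}
```

With nesting disabled the function returns 1. With nesting enabled it returns at most 2, depending on how many threads the runtime is able to provide.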
The number of threads of a \kcode{parallel} region can be set by the \kcode{OMP_NUM_THREADS}
environment variable, the \kcode{omp_set_num_threads()} routine, or on the \kcode{parallel}
directive with the \kcode{num_threads}
clause. The routine overrides the environment variable, and the clause overrides all.
Use the \kcode{OMP_DYNAMIC} environment variable
or the \kcode{omp_set_dynamic()} routine to allow the OpenMP
implementation to dynamically adjust the number of threads for
\kcode{parallel} regions. The default setting for dynamic adjustment is implementation
defined. When dynamic adjustment is on and the number of threads is specified,
the specified number becomes an upper limit on the number of threads
provided by the OpenMP runtime.
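
The precedence between the routine and the clause can be sketched as follows. The function name \kcode{team_size_with_clause} is hypothetical, and the serial fallbacks are an assumption for builds without OpenMP. Because the runtime may provide fewer threads than requested (for example, when dynamic adjustment is on), the sketch treats the clause value as an upper limit only.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption of this sketch) for builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Requests one team size through the routine and another through the clause,
   then returns the team size actually observed inside the region. */
int team_size_with_clause(int routine_request, int clause_request)
{
    assert(routine_request > 0 && clause_request > 0);
    int observed = 0;

    omp_set_num_threads(routine_request);     /* overrides OMP_NUM_THREADS */

    #pragma omp parallel num_threads(clause_request)  /* overrides the routine */
    {
        #pragma omp single
        observed = omp_get_num_threads();     /* written by exactly one thread */
    }
    return observed;
}
```

For instance, \kcode{team_size_with_clause(8, 2)} returns at most 2, because the clause value takes precedence over the value set by the routine.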
%\pagebreak
\bigskip
WORKSHARING CONSTRUCTS

A worksharing construct distributes the execution of the associated region
among the members of the team that encounter it. There is an
implied barrier at the end of the worksharing region
(there is no barrier at the beginning).
\newpage
The worksharing constructs are:
\begin{compactitem}
\item loop constructs: {\kcode{for} and \kcode{do} }
\item \kcode{sections}
\item \kcode{single}
\item \kcode{workshare}
\end{compactitem}
The \kcode{for} and \kcode{do} constructs (loop constructs) create a region
consisting of a loop. A loop controlled by a loop construct is called
an \plc{associated} loop. Nested loops can form a single region when the
\kcode{collapse} clause (with an integer argument) designates the number of
\plc{associated} loops to be executed in parallel, by forming a
``single iteration space'' for the specified number of nested loops.
The \kcode{ordered} clause can also control multiple associated loops.
An associated loop must adhere to a ``canonical form'' (specified in the
\docref{Canonical Loop Form} of the OpenMP Specifications document) which allows the
iteration count (of all associated loops) to be computed before the
(outermost) loop is executed. %[58:27-29].
Most common loops comply with the canonical form, including loops over C++ random access iterators.
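
A minimal sketch of two associated loops collapsed into a single iteration space is shown below. The function name \kcode{collapsed_sum} and the bounds \kcode{ROWS} and \kcode{COLS} are hypothetical. Both loops are in canonical form (integer index, loop-invariant bounds, constant increment), so all 20 iterations can be counted before the outermost loop starts.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical bounds for this sketch */
#define COLS 5

/* Sums the flattened index i*COLS + j over a ROWS x COLS iteration space. */
long collapsed_sum(void)
{
    long total = 0;

    /* collapse(2) associates both loops with the construct; the combined
       ROWS*COLS iterations are divided among the threads of the team. */
    #pragma omp parallel for collapse(2) reduction(+:total)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += (long)i * COLS + j;

    assert(total >= 0);
    return total;
}
```

The flattened indices are 0 through 19, so the function returns 190 regardless of the number of threads.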
A \kcode{single} construct forms a region in which only one thread (any one
of the team) executes the region.
The other threads wait at the implied
barrier at the end, unless the \kcode{nowait} clause is specified.
The \kcode{sections} construct forms a region that contains one or more
structured blocks. Each block of a \kcode{sections} directive is
constructed with a \kcode{section} construct, and executed once by
one of the threads (any one) in the team. (If only one block is
formed in the region, the \kcode{section} construct, which is used to
separate blocks, is not required.)
The other threads wait at the implied
barrier at the end, unless the \kcode{nowait} clause is specified.
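
The once-only execution of \kcode{section} blocks and of a \kcode{single} region can be sketched as follows. The structure \kcode{run_counts} and the function \kcode{count_executions} are hypothetical names introduced for this illustration.

```c
#include <assert.h>

/* Hypothetical record of how many times each block was executed. */
struct run_counts { int section_a; int section_b; int single_block; };

struct run_counts count_executions(void)
{
    struct run_counts c = {0, 0, 0};

    #pragma omp parallel num_threads(4) shared(c)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                /* atomic so that an unexpected repeat would be counted */
                #pragma omp atomic
                c.section_a++;
            }
            #pragma omp section
            {
                #pragma omp atomic
                c.section_b++;
            }
        }   /* implied barrier: no nowait clause */

        #pragma omp single
        {
            #pragma omp atomic
            c.single_block++;
        }   /* implied barrier: no nowait clause */
    }

    assert(c.single_block == 1);
    return c;
}
```

Each counter is 1 after the call, whatever the team size, because each block is executed by exactly one thread of the team.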
The \kcode{workshare} construct is a Fortran feature that consists of a
region with a single structured block (section of code). Statements in the
\kcode{workshare} region are divided into units of work, and each unit is
executed once by one of the threads of the team.
\bigskip
MASKED CONSTRUCT

The \kcode{masked} construct is not a worksharing construct. By default, the \kcode{masked} region is
executed only by the primary thread. There is no implicit barrier (and flush)
at the end of the \kcode{masked} region; hence the other threads of the team continue
execution with the statements that follow the \kcode{masked} region.
The \kcode{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
to the \kcode{masked} construct with no \kcode{filter} clause.
%===== Examples Sections =====
\input{parallel_execution/ploop}
\input{parallel_execution/parallel}
\input{parallel_execution/host_teams}
\input{parallel_execution/nthrs_nesting}
\input{parallel_execution/nthrs_dynamic}
\input{parallel_execution/fort_do}
\input{parallel_execution/nowait}
\input{parallel_execution/collapse}
\input{parallel_execution/linear_in_loop}
\input{parallel_execution/psections}
\input{parallel_execution/fpriv_sections}
\input{parallel_execution/single}
\input{parallel_execution/workshare}
\input{parallel_execution/masked}
\input{parallel_execution/loop}
\input{parallel_execution/pra_iterator}
\input{parallel_execution/set_dynamic_nthrs}
\input{parallel_execution/get_nthrs}