Minor improvements to the documentation #292

Open: devreal wants to merge 2 commits into master

@devreal devreal commented Jun 11, 2024

No description provided.

@devreal devreal requested a review from evaleev June 11, 2024 14:30
TTG C++ implementation is currently supported by 2 backends providing task scheduling, data transfer, and resource management.
While it is possible to use specific TTG backend explicitly, by using the appropriate namespaces, it is recommended to write backend-neutral programs that can be specialized to a particular backend as follows.
The TTG C++ implementation is currently supported by 2 backends providing task scheduling, data transfer, and resource management.
While it is possible to use a specific TTG backend explicitly, by using the appropriate namespaces, it is recommended to write backend-neutral programs that can be specialized to a particular backend in of two ways.

... in *one of* the two ways

README.md Outdated
@@ -154,8 +160,7 @@ To execute a TTG we must make it executable (this will declare the TTG program
tt->invoke();
```

`ttg::execute()` must occur before, not after, sending any messages. Note also that we must ensure that only one such message must be generated. Since TTG execution uses the Single Program Multiple Data (SPMD) model,
when launching the TTG program as multiple processes only the first process (rank) gets to send the message.
`ttg::execute()` must occur before, not after, sending any messages. Note also that we must ensure that only one such message is generated. Since TTG execution uses the Single Program Multiple Data (SPMD) model, when launching the TTG program as multiple processes only the first process (rank) gets to send the message. Otherwise,

Otherwise ?

README.md Outdated
@@ -422,22 +447,24 @@ Although the structure of the device-capable program is nearly identical to the

##### `TTValue`

For optimal performance low-level runtime that manages the data motion across the memory hierarchy (host-to-host (i.e., between MPI ranks), host-to-device, and device-to-device) must be able to _track_ each datum as it orchestrates the computation. For example, when a TTG task `send`'s a datum to an output terminal connected to multiple consumers the runtime may avoid unnecessary copies, e.g. by recognizing that all consumers will only need read-only access to the data, hence reference to the same datum can be passed to all consumers. This requires being able to map pointer to a C++ object to the control block that describes that object to the runtime. Deriving C++ type `T` from `TTValue<T>` makes it possible to track objects `T` by embedding the control block into each object. This is particularly important for the data that has to travel to the device.
For optimal performance, the low-level runtime that manages the data motion across the memory hierarchy (host-to-host (i.e., between MPI ranks), host-to-device, and device-to-device) and so it must be able to _track_ each datum as it orchestrates the computation. For example, when a TTG task `send`'s a datum to an output terminal connected to multiple consumers the runtime may avoid unnecessary copies, e.g., by recognizing that all consumers will only need read-only access to the data, hence reference to the same datum can be passed to all consumers. This requires the mapping of a pointer to a C++ object to the control block that describes that object to the runtime. Deriving C++ type `T` from `TTValue<T>` includes the control block in `T` and avoids creating a separate control block. This is particularly important for the data that has to travel to the device.
@nileshchat nileshchat commented Jun 11, 2024

... a TTG task "send's" -> "sends" a datum ...

README.md Outdated
ttg::execute();
// add a single task into the taskpool

Since we never introduce implementation details like taskpools, I prefer something like `// create task to kickstart computation`.

README.md Outdated
@@ -28,15 +28,21 @@ The development of TTG was motivated by _irregular_ scientific applications like
#include <ttg.h>

int main(int argc, char *argv[]) {
// initialization

can just skip comments that look like code

README.md Outdated
ttg::fence();

// finalization

can just skip comments that look like code

README.md Outdated
@@ -195,28 +200,40 @@ $F_N = F_{N-1} + F_{N-2}, F_0=0, F_1=1$.
int main(int argc, char *argv[]) {
ttg::initialize(argc, argv);

const int64_t N = 20;
const int64_t N = 20; // want to compute Fib(20)

Should match the TT name, so `Fib` -> `fib`.

README.md Outdated
ttg::Edge<int64_t, Fn> f2f;
ttg::Edge<void, Fn> f2p;
auto make_ttg_fib_lt(const int64_t F_n_max) {
ttg::Edge<int64_t, Fn> f2f; // Fib to Fib

fib -> fib

README.md Outdated
ttg::Edge<void, Fn> f2p;
auto make_ttg_fib_lt(const int64_t F_n_max) {
ttg::Edge<int64_t, Fn> f2f; // Fib to Fib
ttg::Edge<void, Fn> f2p; // Fib to print

fib -> print

README.md Outdated
`Buffer<T>` is a view of a contiguous sequence of objects of type `T` in the host memory that can be automatically moved by the runtime to/from the device memory. Here `Fn::b` is a view of the 2-element sequence pointed to by `Fn::F`; once it's constructed the content of `Fn::F` will be moved to/from the device by the runtime. The subsequent actions of `Fn::b` cause the automatic transfers of data to (`device::select(f_n.b)`) and from (`ttg::device::wait(f_n.b)`) the device.
`Buffer<T>` is a view of a contiguous sequence of objects of type `T` in the host memory that can be automatically moved by the runtime to/from the device memory. Here `Fn::b` is a view of the 2-element sequence pointed to by `Fn::F`; once it's constructed the content of `Fn::F` will be moved to/from the device by the runtime. The subsequent use of `Fn::b` cause the automatic transfers of data to (`device::select(f_n.b)`) and from (`ttg::device::wait(f_n.b)`) the device.
A `Buffer<T>` can be either owning or non-owning. In the example above, the memory is owned by the `unique_ptr`.
If no pointer is passed to the constructor of `Buffer<T>` the buffer becomes owning, i.e., it allocates the necessary host-side memory.

Do we need to get into the weeds of why `F` must be on the heap to give relocatability?

README.md Outdated
@@ -493,8 +518,7 @@ To simplify debugging of multirank TTG programs it is possible to automate the p
# TTG Performance

Competitive performance of TTG for several paradigmatic scientific applications on shared- and distributed-memory machines (CPU only)
will be discussed in [manuscript ``Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment''](https://www.ipdps.org/ipdps2022/2022-accepted-papers.html) to be presented at [IPDPS'22](https://www.ipdps.org/ipdps2022/).
Stay tuned!
will be discussed in [manuscript ``Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment''](https://www.ipdps.org/ipdps2022/2022-accepted-papers.html) and has been presented at [IPDPS'22](https://www.ipdps.org/ipdps2022/).

"will be" -> "is"

evaleev commented Jun 18, 2024

@devreal ping

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>