RFC for histogram CPU implementation #1930

Open · wants to merge 26 commits into main
Conversation

danhoeflinger
Contributor

Adds an RFC for histogram CPU implementation.

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>

* Is it worthwhile to have separate implementations for TBB and OpenMP, given that the best-performing implementation may differ between them? What is the best heuristic for selecting between algorithms (if one is not the clear winner)?

* How will vectorized bricks perform, and in what situations will it be advantageous to use or not use vector instructions?

Contributor Author

Thanks, these are interesting and mainly follow similar lines to the approaches outlined here. I will look more into the pcwar SIMD instruction to understand whether it is viable to use here.

Contributor Author

While it's interesting to explore, I believe we basically depend on OpenMP SIMD to provide our SIMD operations. We won't be able to use more complex intrinsic operations if we want to stick to that.

Further, I'm not actually finding anything like the pcwar instruction(s) they refer to in https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html. I do find some instructions for conflict detection across lanes, which could be useful, but again those won't be available through the OpenMP interface.

I think our options are basically to decline SIMD or to keep duplicates of the output for each SIMD lane. Even that may be more "hands on" with SIMD details than we have been thus far in oneDPL.
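(For illustration only, a minimal sketch of the lane-duplication option written against plain OpenMP SIMD; the lane count, bin mapping, and all names are assumptions, not anything the RFC commits to.)

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: per-lane histogram copies so that, within one vector iteration,
// no two SIMD lanes increment the same address. Assumes 8 lanes cover the
// hardware vector width and inputs already fall in [min, max).
void histogram_simd_lanes(const float* in, std::size_t n, std::uint32_t* hist,
                          std::size_t num_bins, float min, float scale)
{
    constexpr std::size_t lanes = 8; // assumption: >= the SIMD width
    std::vector<std::uint32_t> copies(lanes * num_bins, 0);

    #pragma omp simd safelen(8) // dependencies only recur at distance 'lanes'
    for (std::size_t i = 0; i < n; ++i)
    {
        std::size_t bin = std::min<std::size_t>(
            num_bins - 1, static_cast<std::size_t>((in[i] - min) * scale));
        ++copies[(i % lanes) * num_bins + bin]; // each lane writes its own row
    }

    // scalar merge of the lane-private copies into the output
    for (std::size_t l = 0; l < lanes; ++l)
        for (std::size_t b = 0; b < num_bins; ++b)
            hist[b] += copies[l * num_bins + b];
}
```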

more formatting fixes

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Contributor

akukanov commented Nov 6, 2024

Overall, this all sounds good enough for the "proposed" stage, where it's expected that some details are unknown and need to be determined. I am happy to approve it but will wait for a few days in case @danhoeflinger wants to update the document with some follow-up thoughts on the discussion.

SIMD + implementation

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
akukanov previously approved these changes Dec 18, 2024

@akukanov left a comment

I approve implementing it as supported functionality, with the following understanding of the proposed design:

  • For multithreaded execution with par and par_unseq, the implementation uses the existing parallel_for backend pattern.
  • Unsequenced policies do not enable vectorization when thread-local histograms are built (the 1st stage of the algorithm), but do enable it when those histograms are combined into the final result (the 2nd stage).

I believe the following questions will need to be answered:

  • Which extensions are needed in the parallel backend API to facilitate the use of enumerable storage for per-thread temporary buffers?
  • Will the serial backend also allocate a temporary buffer, or will it directly accumulate into the output sequence and run just the 1st stage? If the latter, how will this difference between backends be handled at the pattern level?
  • Should the 2nd stage always run according to the given execution policy, or should it, under certain conditions, run serially on the calling thread?

Specification changes are not required.

I would recommend @MikeDvorskiy as the second approver.

of values in each bin, writing to a user-provided output histogram sequence. Currently, `histogram` is not supported
with serial, tbb, or openmp backends in our oneDPL implementation. This RFC aims to propose the implementation of
`histogram` for these host-side backends.

Contributor

@MikeDvorskiy Dec 19, 2024

Let me share my thoughts:
In my understanding, an RFC is not a book... So I would prefer a short, concise, and precise description of what is offered, without frills, like a mathematical theorem. For example:

"The oneDPL library added histogram APIs, currently implemented only for device policies with the DPC++ backend. These APIs are defined in the oneAPI Specification 1.4. Please see the
oneAPI Specification for the details. The host-side backends (serial, TBB, OpenMP) are not yet supported. This RFC proposes extending histogram support to these backends."

Contributor Author

Yes, I've accepted your language here. Thanks.

to use a serial implementation or a host-side parallel implementation of `histogram`. It's natural for a user to expect
that oneDPL supports these other backends for all APIs. Another motivation for adding the support is simply to be spec
compliant with the oneAPI specification.

Contributor

Since this is not storytelling, I would suggest omitting introductory expressions like "It may make more sense" or "It's natural for a user to expect"... Only short and exact information.

For example,
"There are many cases to use a host-side serial or a host-side implementation of histogram. Another motivation for adding the support is simply to be spec compliant with the oneAPI specification."

Contributor Author

Suggestion taken. Thanks.

low computation algorithm which will likely be limited by memory bandwidth, especially for the evenly-divided case.
Minimizing and optimizing memory accesses, as well as limiting unnecessary memory traffic of temporaries, will likely
have a high impact on overall performance.

Contributor

Taking into account my thoughts shared above, I would propose rephrasing it to keep the main point shorter:

"A histogram algorithm is a memory-bound algorithm. So, the implementation should take care to reduce memory accesses and minimize temporary memory traffic."

Contributor Author

Mostly taken. Thanks.

Contributor Author

danhoeflinger commented Dec 19, 2024

I plan to do a cleanup pass today / tomorrow morning, also addressing some of what Alexey mentioned above. I will also try to cut down verbosity.

I also have a draft PR up for the implementation which is a work in progress. #1974
I'll add this link to the RFC with my next changes.

histogram is for the number of elements in the input sequence to be far greater than the number of output histogram
bins. We may be able to use that to our advantage.

### Code Reuse
Contributor

I guess we can omit this topic altogether. It tells nothing specific about `histogram`, just general wording which could apply to any new feature in oneDPL...

Contributor Author

I've removed some of the general language and added something that is important for histogram, in an attempt to address feedback from @akukanov to clarify where the implementation of the algorithm will live.

Our goal here is to make something maintainable and to reuse as much as we can of what already exists and has been
reviewed within oneDPL. As with everything else, this must be balanced against performance considerations.

### unseq Backend
Contributor

@MikeDvorskiy Dec 20, 2024

"unseq Backend"
Basically, we don't have such back end officially. Yes, sometimes we used such term in the internal communication as for "name" for a set of functions with "pragma simd" implementation. But we did not specify and publish API for that. So, I suggest renaming this topic to "SIMD/openMP SIMD Implementation" f.e.

Contributor

@akukanov Dec 20, 2024

I think this part is about developing (or not) an implementation for unsequenced policies.
I do not mind calling it "unseq backend" in the design docs, but Mikhail is correct that it's rather informal (while "parallel backend" is somewhat more formal).

Contributor Author

I like the proposed name for the section better anyway for what is discussed. Thanks.

Finally, for our below proposed implementation, there is the task of combining temporary histogram data into the global
output histogram. This is directly vectorizable via our existing brick_walk implementation.
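(A hedged illustration of that combine step: it is a plain bin-wise sum across the temporary histograms, which maps onto a straightforward OpenMP SIMD loop. The names and the row-major layout are assumptions, not the PR's actual brick.)

```cpp
#include <cstddef>
#include <cstdint>

// Combine num_copies temporary histograms (row-major, num_copies x num_bins)
// into the output histogram; the inner bin-wise sum is the vectorizable part.
void combine_histograms(const std::uint32_t* local_hists, std::size_t num_copies,
                        std::size_t num_bins, std::uint32_t* global_hist)
{
    for (std::size_t t = 0; t < num_copies; ++t)
    {
        #pragma omp simd
        for (std::size_t b = 0; b < num_bins; ++b)
            global_hist[b] += local_hists[t * num_bins + b];
    }
}
```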

### Serial Backend
Contributor

@MikeDvorskiy Dec 20, 2024

https://github.com/oneapi-src/oneDPL/pull/1930/files#diff-fb5f6394ad0d350d719b9f31b139fa60c347ec64795c78e56875c4f002aeb0e7R25
We already have the key requirements topic, where we enumerate all the backends we propose to support. That is good enough, I think, and we can also omit this "Serial Backend" topic.

Explaining what "Serial Backend" means, like the other backends, is a kind of general oneDPL description and not related to the RFC for the histogram feature, IMHO.

Contributor Author

With some recent changes, there are some specifics about the serial implementation I wanted to add here, so I've kept the section.

address feedback

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
policy types, but always provide a serial unvectorized implementation.

## Existing Patterns

Contributor

If we intend to give some information about the oneDPL parallel backend patterns on which histogram could be based, I would note that there is no "count_if" pattern; there is a "reduce" ("transform_reduce") pattern.
When one says "reduce", it becomes more or less obvious that a histogram calculation based on reduce is not efficient at all.

Contributor Author

I clarified the language a little here to make it clearer that `count_if` uses reduce internally. I still think it deserves some text describing it, as it may not be immediately obvious to everyone that reduce is not well matched.
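(To spell out the mismatch, a hypothetical sketch rather than oneDPL code: counting each bin with `count_if`, which is a reduce over the whole input, costs one full pass per bin, i.e. O(n * num_bins) reads instead of O(n). `get_bin` and the even-bin parameters are invented for illustration.)

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// hypothetical even-bin mapping, assuming x falls in [min, max)
inline std::size_t get_bin(float x, float min, float scale, std::size_t num_bins)
{
    return std::min<std::size_t>(num_bins - 1,
                                 static_cast<std::size_t>((x - min) * scale));
}

// reduce-based formulation: num_bins full passes over the input
void histogram_via_count_if(const std::vector<float>& in, std::size_t num_bins,
                            float min, float scale, std::vector<std::size_t>& out)
{
    for (std::size_t b = 0; b < num_bins; ++b)
        out[b] = static_cast<std::size_t>(
            std::count_if(in.begin(), in.end(), [&](float x)
                          { return get_bin(x, min, scale, num_bins) == b; }));
}
```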

time.

### Other Unexplored Approaches
* One could consider some sort of locking approach which locks mutexes for subsections of the output histogram prior to
Contributor

BTW, I have a curiosity question. Which approach does NVidia use?

Contributor Author

NVidia has a similar API within CUB but not within Thrust, and therefore does not have a CPU implementation that I am aware of, only one specifically for a GPU device.

cases which are important, and provides reasonable performance for most cases.

### Embarrassingly Parallel Via Temporary Histograms
This method uses temporary storage and a pair of embarrassingly parallel `parallel_for` loops to accomplish the
Contributor

What does the term "embarrassingly parallel" mean?

Contributor

@MikeDvorskiy Dec 27, 2024

  1. Update: got it... https://en.wikipedia.org/wiki/Embarrassingly_parallel

  2. Of course, if you are solving a concrete task and are allowed to use all of the machine's resources, and there are no other workloads on the node, the best way to calculate a histogram is to divide the work statically: each thread calculates a local histogram, and afterwards the local histograms are reduced into one.

  3. But, talking about parallelism in a general-purpose library, we have to keep in mind that a final user's application can run under "different circumstances", depending on the application type, the task, real-time data, other workloads on the same host, and many other things.
    When we were developing the TBB backend we kept these things in mind and preferred to use the TBB auto partitioner (instead of the static one, for example).
    Composability reasons also apply here.

  4. BTW, have you considered a "common parallel reduce" pattern (in general), and tbb::parallel_reduce in particular, for the histogram calculation? It seems the parallel histogram calculation maps onto the common reduce (with a certain "big" grainsize): each Body calculates a local histogram (bins), and the Combiner sums all the local bins into the final ones.
    Additionally, if the number of bins is "big", we can apply a second level of parallelism within the Combiner code: SIMD, or even "parallel_for" plus SIMD if the number of bins is "too big".

Contributor Author

  1. Yes, although I think there is no reason here to do a static division of work; rather, we can rely upon our existing parallel_for implementation to provide a good, composable implementation.

  2. Agree, which is why the intent is to use the existing parallel_for structure (including partitioners) to implement the parallelism. If we were to do it from scratch, we would do it in a similarly composable way, but it is better to rely upon existing infrastructure.

  3. Yes, I thought about this. For TBB, and even more so for OpenMP, the built-in reduction functionality is geared toward very simple, lightweight types as the reduction variable, whereas we may have an arbitrarily large array. Especially since we want a unified implementation, it does not seem like these backends are really set up to handle such large reduction variables. It seems we should take more control, to ensure no unnecessary copies are made and that the final combination is done performantly, based on the knowledge we have of the task. The implementation remains quite simple and unified. A sketch of the parallel_reduce alternative follows below.
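(For concreteness, a hedged sketch of the tbb::parallel_reduce formulation from point 4, under assumed even-bin parameters. It is functionally correct, but the reduction value is a whole num_bins-sized vector, so every split constructs one and every join traverses one, which is the copy and combination cost described above.)

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> histogram_reduce(const std::vector<float>& in,
                                          std::size_t num_bins, float min, float max)
{
    const float scale = num_bins / (max - min);
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, in.size()),
        std::vector<std::size_t>(num_bins, 0), // identity: an empty local histogram
        [&](const tbb::blocked_range<std::size_t>& r, std::vector<std::size_t> local)
        {
            // body: accumulate this subrange into a local histogram
            for (std::size_t i = r.begin(); i != r.end(); ++i)
            {
                std::size_t bin = std::min<std::size_t>(
                    num_bins - 1, static_cast<std::size_t>((in[i] - min) * scale));
                ++local[bin];
            }
            return local;
        },
        [](std::vector<std::size_t> a, const std::vector<std::size_t>& b)
        {
            // join: element-wise sum of two num_bins-sized histograms
            for (std::size_t bin = 0; bin != b.size(); ++bin)
                a[bin] += b[bin];
            return a;
        });
}
```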

This method uses temporary storage and a pair of embarrassingly parallel `parallel_for` loops to accomplish the
`histogram`.

For this algorithm, each parallel backend will add a `__thread_enumerable_storage<_StoredType>` struct which provides
Contributor

  1. Why is a TLS used, rather than global memory with the thread id as a key, e.g. bins[thread_id]?
  2. Does TBB guarantee that the same threads finalize the work? "The same threads" meaning the threads which started the work?

Contributor Author

  1. There are a number of reasons:
    a) My understanding is that in TBB, while it may be technically possible to get the local thread id within an arena, it is an undocumented API, generally discouraged, and against the TBB mindset. Using TLS seems to be the preferred method specifically with TBB.
    b) While what you suggest perhaps fits better within OpenMP, we want to create a single implementation rather than require a __parallel_histogram within every current and future backend; we would rather depend upon existing functionality within the backend as much as we can (in this case, __parallel_for).
    c) With smaller values of n or num_bins and a larger number of threads, not all threads should be used, because of the overhead associated with allocating and initializing each temporary bin copy. We can let the partitioner decide how many blocks to employ, but we want to avoid unnecessary allocation and initialization overheads wherever possible.

I will mention a downside for completeness, but in my opinion it is outweighed here: it requires implementing a thread-local storage class for each backend. This is only non-trivial for OpenMP. It has been written generically, though, to serve future patterns, so it is nice to have.

Contributor Author

  1. I'm not exactly sure what you mean here by "finalize the work". If you mean the second parallel_for, then no: we are explicitly parallelizing over a different dimension (num_bins) and accumulating across the temporary histograms which were used by different threads. TBB does guarantee that each thread will always use its own TLS for each grain of work, though, when retrieved through local().

Contributor

2. I'm not exactly sure what you mean here by "finalize the work"

I will try to explain with an example:
Using a general TBB pattern like tbb::parallel_for does not involve using system threads directly. There is only a "Body" which is called (with a part of the data, a TBB range) by an executing thread. Imagine the input range is split into 4 parts. Two threads process 2 parts simultaneously. The Body stores local bin results in the TLS associated with those threads.
Afterwards, to "finalize the work", TBB should call the Body two more times to process the final 2 parts of the input range. These final two calls may be made by other threads, which have other associated TLSs. So, it would be impossible to make a final reduction of the local bins located in the TLSs.

Contributor Author

There are 2 parallel_for calls, each of which is "embarrassingly parallel" in that no thread's body depends on any other's. The first parallel_for must complete before the second one starts, though. The first parallel_for uses the TLS as normal, and just accumulates sections of the input data into each thread's individual TLS.

The second parallel_for call does not use the TLS as normal; rather, every thread visits a section of every TLS which was created, one by one, processing a section of the histogram bins in parallel and combining the work of different threads from the first loop into the final global histogram.

The TLS we propose here (which is also implemented in the PR) supports this, and we obtain the correct result. We will not have perfect cache effects when accessing a TLS from a different thread than the one it was created on, but that is just something we have to deal with.

Contributor

@MikeDvorskiy Jan 14, 2025

I don't understand your answer, Dan...
It seems you didn't catch my question/concerns...

I will try to explain my concern again:
tbb::parallel_for produces several calls of the Body which is passed to it. You don't know how many calls there are, because the TBB auto-partitioner is applied by default.
Each call of this Body may be made by a different thread. Moreover, the first calls of the Body may be made by threads with "ids" 0-3, and the last calls by other threads, with "ids" 4-7 for example. Each TLS is associated with its own thread, and you don't know the ids of the threads... I don't understand how you can get the calculated local bins from all the TLSs...

Contributor Author

@danhoeflinger Jan 14, 2025

We are basically using TBB's enumerable_thread_specific as a model here, implementing a stripped-down version for OpenMP and a trivial version for the serial backend.

enumerable_thread_specific has two ways of accessing the data:
a) local(), which gets the TLS for the current thread;
b) begin() and end(), which provide iterators over the sequence of all local storage from all threads.

This allows us to use (a) in the first parallel loop and (b) in the second. The second parallel loop does not use the enumerable_thread_specific as "thread-local storage" but rather as a 2-D array space, which it iterates over, summing across columns (corresponding to individual histogram bins from different threads). This allows us to accumulate the data from all threads into the global histogram no matter which threads are used and when.

I'm not sure how else I can explain it. The code in the implementation is tested, working, and pretty concise; if you want to see the details, you can look at the PR. A rough sketch follows below.
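(To ground the explanation, a minimal sketch of the two-pass scheme using tbb::enumerable_thread_specific directly. The PR's __thread_enumerable_storage generalizes this idea across backends; the bin mapping and all names below are assumptions for illustration.)

```cpp
#include <tbb/blocked_range.h>
#include <tbb/enumerable_thread_specific.h>
#include <tbb/parallel_for.h>
#include <algorithm>
#include <cstddef>
#include <vector>

void histogram_two_pass(const std::vector<float>& in, std::size_t num_bins,
                        float min, float scale, std::vector<std::size_t>& out)
{
    tbb::enumerable_thread_specific<std::vector<std::size_t>> tls(
        [num_bins] { return std::vector<std::size_t>(num_bins, 0); });

    // 1st parallel_for: each body accumulates its grain into the calling
    // thread's local histogram, obtained via local()
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, in.size()),
        [&](const tbb::blocked_range<std::size_t>& r)
        {
            std::vector<std::size_t>& local = tls.local();
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                ++local[std::min<std::size_t>(num_bins - 1,
                     static_cast<std::size_t>((in[i] - min) * scale))];
        });

    // 2nd parallel_for: parallelize over bins; each body walks all the
    // thread-local copies via begin()/end() and sums its slice of bins,
    // so it works no matter which threads run it
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, num_bins),
        [&](const tbb::blocked_range<std::size_t>& r)
        {
            for (auto it = tls.begin(); it != tls.end(); ++it)
                for (std::size_t b = r.begin(); b != r.end(); ++b)
                    out[b] += (*it)[b];
        });
}
```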

Contributor

@MikeDvorskiy Jan 15, 2025

You mean "TBB TLS", not system TLS...
I clarified that question with Alexey.
TBB's TLS is a kind of container and allows iterating over all the local bins... I was not aware of that.
Now I got it.

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
@danhoeflinger
Contributor Author

One question I have for the group...
If we know that a serial implementation will provide better performance up to some threshold (perhaps dependent on num bins, num threads, num input elements), can / should we dispatch instead to a serial implementation?

From my reading, it seems the answer is probably no. Execution policies have semantic meaning, and par / par_unseq do not simply mean "provide the fastest version" even if that is what the users probably want.

@mmichel11
Contributor

mmichel11 commented Jan 13, 2025

One question I have for the group... If we know that a serial implementation will provide better performance up to some threshold (perhaps dependent on num bins, num threads, num input elements), can / should we dispatch instead to a serial implementation?

From my reading, it seems the answer is probably no. Execution policies have semantic meaning, and par / par_unseq do not simply mean "provide the fastest version" even if that is what the users probably want.

I agree that we should honor the user's request for a specific policy, as opposed to using the serial implementation until some empirically determined cutoff point. I also imagine that the exact cutoff where the parallel version performs better can vary greatly with a user's hardware setup; giving users the freedom to choose manually when to switch from the serial to the parallel version may result in better performance than any generic decision we could make.

Contributor

@mmichel11 left a comment

I've taken another pass through the document. A single question regarding how technical we want to get when explaining the algorithm.

The RFC looks ready to me.

With little computation, a histogram algorithm is likely memory-bound. So, the implementation prioritizes
reducing memory accesses and minimizing temporary memory traffic.

### Memory Footprint
Contributor

Do we wish to specify space / computational complexity of the proposed algorithm somewhere, or is this too much?
