This goes slightly against what I said in #37, but something that would be quite useful
would be bulk zero-initialisation. There are lots of zero-holes in memory maps,
whether because a data section is zero-initialised or because there is a long
run of zeros within a sparsely initialised data section.
An extra boot command that can zero initialise arbitrary segments of RAM using
a single packet would reduce the amount of message traffic needed at start-up,
particularly when we have MBs of data that is mostly zeros.
e.g. something like this:
    else if (cmd == StoreZeroCmd) {
      // Store zeros to data memory, starting at addrReg
      // Size ***in bytes*** to zero (saves an instruction);
      // assumed here to be a multiple of 4
      uint32_t n = msgIn->args[0];
      uint32_t addrEnd = addrReg + n;
      while (addrReg < addrEnd) {
        *(uint32_t*) addrReg = 0;
        addrReg += 4;
      }
    }
I estimate a total burden of 10-ish instructions added to the bootloader,
and it should be able to fill at about 1 word per 5-ish instructions - presumably
it would end up being DRAM bandwidth limited.
This is assuming that:
- DRAM is not already zero-initialised: I assume it isn't?
- Bandwidth from host to boards is much less than total bandwidth to DRAMs: we've
  got one PCI Express link at ~1GB/sec, but even with Aesop we have 6 DRAMs
  which offer 12GB/s * 6 = 72 GB/s.
So for a system which is loading multi-GB sections on to DRAM this could
reduce the serial cost quite a bit.
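
As a rough back-of-envelope check of the figures above, here is a minimal sketch comparing the two loading strategies. The constants are the estimates quoted in this issue, not measurements, and the function names are made up for illustration. It ignores the cost of the command packets themselves and assumes the zero fills really do reach DRAM bandwidth, which the 5-ish instructions per word may not allow.

```c
/* Assumed figures taken from the estimate in this issue, not measured. */
#define HOST_LINK_BYTES_PER_SEC 1e9   /* one PCI Express link at ~1GB/s */
#define DRAM_BYTES_PER_SEC      12e9  /* bandwidth of a single DRAM */
#define NUM_DRAMS               6

/* Seconds to load an image when every byte crosses the host link. */
double load_time_all_over_link(double total_bytes) {
    return total_bytes / HOST_LINK_BYTES_PER_SEC;
}

/* Seconds to load an image when the zero bytes are filled locally.
 * zero_fraction is the fraction of the image that is zeros (0.0 to 1.0).
 * The link transfer and the local fills are summed, which is pessimistic
 * if they can overlap. */
double load_time_with_zero_fill(double total_bytes, double zero_fraction) {
    double zero_bytes = total_bytes * zero_fraction;
    double data_bytes = total_bytes - zero_bytes;
    double link_time = data_bytes / HOST_LINK_BYTES_PER_SEC;
    double fill_time = zero_bytes / (DRAM_BYTES_PER_SEC * NUM_DRAMS);
    return link_time + fill_time;
}
```

Under these assumptions, a 4GB image that is 75% zeros would take about 4 seconds sent entirely over the link, versus a little over 1 second with local zero fills.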
Note that I'm aware that a lot can already be done to support faster loading,
e.g. using multiple threads per DRAM to load, and packing multiple words
into each packet. However, a memset-style boot command would be easy to integrate
into the existing hostlink loaders without adding much complexity, and would also
make more sophisticated loaders faster.
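
To illustrate the host side, here is a minimal sketch of how a loader might find the zero runs worth replacing with a single zero-fill command. Nothing here is existing hostlink code: `ZeroRun`, `find_zero_runs` and the `MIN_ZERO_RUN_WORDS` threshold are hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: shorter runs are assumed cheaper to send as data. */
#define MIN_ZERO_RUN_WORDS 4

typedef struct {
    size_t start;   /* index of the first zero word in the run */
    size_t length;  /* number of consecutive zero words */
} ZeroRun;

/* Scan an image of n_words 32-bit words for runs of zeros that are at
 * least MIN_ZERO_RUN_WORDS long. Stores up to max_runs runs in `runs`
 * and returns the total number of qualifying runs found, which may
 * exceed max_runs. */
size_t find_zero_runs(const uint32_t *image, size_t n_words,
                      ZeroRun *runs, size_t max_runs) {
    size_t count = 0;
    size_t i = 0;
    while (i < n_words) {
        if (image[i] != 0) {
            i++;
            continue;
        }
        /* Found a zero word: extend to the end of the run. */
        size_t start = i;
        while (i < n_words && image[i] == 0) {
            i++;
        }
        size_t length = i - start;
        if (length >= MIN_ZERO_RUN_WORDS) {
            if (count < max_runs) {
                runs[count].start = start;
                runs[count].length = length;
            }
            count++;
        }
    }
    return count;
}
```

A loader could then send one zero-fill command per run (with the size in bytes, i.e. `length * 4`) and ordinary store packets for everything in between.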
Flagrantly not using the PEP system I literally only just proposed because I don't
have time right now - this is more a reminder to turn this into one if it makes sense.