This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

TT scheduler data usage #4

Open
groobybugs opened this issue Nov 7, 2021 · 18 comments

Comments

@groobybugs

Hi @hamadmarri

I have been testing the TT scheduler in my daily work routine for one full day (nothing gaming related), and in general the experience was great: no lags, no hangs, and the system was responsive, with one exception that I will explain below. I tested 5.14.16-cacule and 5.14.16 with your TT patch. I will also test 5.15.1-tt and 5.15.1-cacule (I applied your 5.14 full patch with no problems and it is working so far).

My specs in general are:

8 x Intel Core i7-8665U @ 1.90 GHz
Ubuntu 20.04.3 LTS
32 GB RAM
KDE Plasma 5.18.7
zram 8 GB, algorithm lzo-rle

I've been running my laptop overclocked, and the temperature was always around 80 °C.

Something I noticed with TT was that my local builds sometimes took double the normal time under high process demand.
I opened 2 Android emulators (QEMU), an IDE, Chrome, Slack, etc. CacULE gives me the best results when I'm doing multiple tasks (debugging, building, jumping into meetings, etc.): compilation times are constant, e.g. 4-5 min per project, even while playing some video/music in the background. With CacULE and all of these tasks running, I only sometimes notice a lag in the emulators or in latte-dock, but the system stays responsive. Some KDE desktop animations become a little slow, but again there is only minimal lag when switching between windows.

With TT I've noticed longer lags in the same apps. Even in the Android emulator the app I am using stalls, the background build goes up to 8-10 minutes, some windows freeze, and the lag in latte-dock is more noticeable. When the background build stops, the system works like a charm again. If I have low CPU demand, everything works normally.

I have some log files captured with your TT script (every log was taken with a build running in the background, 2 emulators open and a video playing in the background). I opened Kate and it froze:
kate.txt

when the java builds lasted twice as long
java.txt

and the emulator slowed down
emulator.txt

I also ran some stress tests on TT:
stress-ng
stress-ng-tt.txt

sysbench (4 different runs; I took the one with the most events)
sysbench_tt.txt

Cacule
stress-ng
stress-ng-cacule.txt

sysbench
sysbench-cacule.txt

and finally your responsiveness python script

responsive_cacule.txt
responsive_tt.txt

To me CacULE gives the best results on 5.14.16 for multitasking and high CPU demand, while TT and CacULE give the same results for single tasks and low CPU demand. I'm now testing 5.15.1 with your 5.14 full patch applied; I know it is for 5.14 but I wanted to test it on 5.15.

responsive_cacule_15.txt

At the moment I'm doing the same "tests" on 5.15 and I see better performance than on 5.14.

If you need a very specific test or log, do not hesitate to ask; I will share it as soon as I can. As soon as I have results and comments for 5.15 TT and CacULE I will post them here.

Thanks!

@hamadmarri
Owner

hamadmarri commented Nov 8, 2021

Hi @groobybugs

Just to confirm, was the 5.14 TT r2 or the normal one? Note that r2 has some fixes over the older TT.

I am looking at the results, thank you so much for sharing.

@groobybugs
Author

groobybugs commented Nov 8, 2021

Hi @hamadmarri it was tt-xanmod-5.14-r2.patch

@groobybugs
Author

CacULE on 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?

@hamadmarri
Owner

hamadmarri commented Nov 9, 2021

Hi @groobybugs

From your feedback, I suspect that RT tasks get such high priority that they starve other tasks (hence the freezing). I have a proposed solution included in the TT future plan: https://github.com/hamadmarri/TT-CPU-Scheduler#future-plan

Regarding throughput tests: I just want to stress that both tested kernels should have the same HZ value. Please make sure that cacule and tt have the same HZ value and almost the same .config (most importantly the NO_HZ configuration).

The freezing issue could be related to:

  • RT taking over other tasks
  • Lack of UCLAMP_TASK feature
  • Lack of proper task accounting and stats

I will let you know when I update TT so you can test if the freezing issue is solved.

Thank you for your valuable feedback

@hamadmarri
Owner

hamadmarri commented Nov 9, 2021

Since TT failed the multitasking test against CacULE in your case, I would like to see TT vs CacULE in intensive single-task workloads like gaming or video/audio encoding, or anything that is latency bound.

Thank you

@groobybugs
Author

groobybugs commented Nov 9, 2021

Hi @hamadmarri

For CacULE I used the default xanmod config, and when I applied your TT r2 patch to 5.14 I also used the default config file from the xanmod repo, so TT and CacULE have the same config: in this case CONFIG_NO_HZ_IDLE=y, CONFIG_HZ=500 and autogroup enabled.

@groobybugs
Author

sure, let me see what I can do.

Thanks!

@hamadmarri
Owner

Hi @groobybugs

Could you please try this fix
#5 (comment)

It fixes the task accounting and stats, in case the issue is somehow related to CPU frequency scaling.

@groobybugs
Author

groobybugs commented Nov 17, 2021

Hi @hamadmarri

I applied your patch on the cacule 15-tt branch and set TT_ACCOUNTING_STATS to n because I use the performance governor. I will test with this option enabled later.

I tested the scheduler with the patch and it works great: the system was very responsive the whole time, under heavy load and multiprocessing. But as in the previous test, my local build times while multitasking were:

Cacule 7 minutes
TT 9 minutes
TT patch 11 minutes

In my experience, though, the schedulers rank from most to least responsive in this order:

1. TT with patch
2. CacULE
3. Normal TT

As you asked, I also ran some benchmarks using phoronix-test-suite for Blender and Xonotic.

These are the results.
Specs:
Processor: Intel Core i7-8665U @ 4.80GHz (4 Cores / 8 Threads), Motherboard: Dell 07WDVW (1.14.0 BIOS), Chipset: Intel Cannon Point-LP, Memory: 32GB, Disk: SK hynix PC601 NVMe 512GB, Graphics: Intel UHD 620 WHL GT2 3GB (1150MHz), Audio: Realtek ALC3254, Network: Intel Cannon Point-LP CNVi, cpu-scaling-governor: intel_pstate performance , cpu-microcode : 0xea

xonotic

12 runs

image

xonotic tt 12 sched
OS: Ubuntu 20.04, Kernel: 5.15.2-xanmod1-tt-tt-fix (x86_64), Desktop: KDE Plasma 5.18.7, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.0.3, Vulkan: 1.2.145, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080

FPS: 79.1633005: 78.3352968: 74.0823467: 72.2815212: 71.8177972: 71.6806862: 70.3664129: 70.7046619: 70.3465052: 70.1676165: 70.7141228: 70.286941

xonotic 12 runs tt scheduler
Kernel: 5.15.2-xanmod1-tt
FPS: 79.7947663: 79.208414: 75.4100481: 72.8958173: 72.649925: 72.0702237: 72.147732: 71.6166447: 71.6933495: 71.3670533: 71.1901418: 71.2663421

xonotic cacule
Kernel: 5.15.2-xanmod1-cacule-full (x86_64)

FPS: 79.9925776: 78.9416091: 75.7230745: 73.5371781: 73.2700679: 73.1542655: 73.2225431: 72.8344118: 72.7633519: 72.7872554: 72.9407657: 68.1741742

Blender

3 runs

image

Intel Core i7-8665U == Kernel: 5.15.2-xanmod1-tt-tt-fix

@hamadmarri
Owner

Hi @groobybugs

Thank you for the tests. Could you please attach the two .configs for cacule and tt?

I have updated the TT patch since the last one I sent here; which one have you tried? The last commit was yesterday, and it brings a significant improvement because it considers cache-hot tasks (ported from CFS).

@groobybugs
Author

groobybugs commented Nov 17, 2021

Sure, I will attach the config files, and the patch I tried was this one: #5 (comment)

@hamadmarri
Owner

Yes, this patch has no performance improvements; it only fixes the frequency scaling issues.

You might try the latest commit: https://github.com/hamadmarri/TT-CPU-Scheduler/blob/4fd4a9a29c8cb7c05e22df49514de304ea66afeb/patches/5.15/tt-5.15-r2.patch

For compile-time measurements: if a realtime task such as a YouTube video or audio is running, TT gives more preference to realtime tasks than to CPU/IO-bound tasks like compiling. So it is normal to see a higher build time, but more importantly, FPS or frame drops in the realtime task are almost 0%.

@groobybugs
Author

groobybugs commented Nov 17, 2021

OK @hamadmarri, tomorrow I will test with the latest commit. These are my config files. And yep, I did not notice any slowdowns in the emulators or the system at any time.

config (cacule).txt
config (tt).txt

and thanks!!!

@hamadmarri
Owner

hamadmarri commented Nov 17, 2021

Hi @groobybugs
Thank you so much for your efforts. Here are some notes related to the benchmarks:

  • HZ values must be the same. I see that tt has CONFIG_HZ_1000=y whereas cacule has CONFIG_HZ_500=y. This will certainly lead to better results for cacule in performance tests. A high HZ value is a trade-off between latency and throughput. That's why cacule shows better results in your phoronix tests. And the difference between 1000 Hz and 500 Hz is actually huge (2x).
  • I would recommend enabling CONFIG_TT_ACCOUNTING_STATS even though you are using the performance governor. The overhead seems to be tiny (a few nanoseconds per tick!), and I am not 100% sure that the performance governor doesn't need any util updates. To be safe I would recommend enabling it.

The rest of configs are identical 👍
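
One way to catch a mismatch like the HZ one above before benchmarking is to diff the two .config files option by option. The following is a minimal Python sketch, not part of TT or CacULE. The function names `parse_kconfig` and `diff_kconfig` are made up for illustration, and the two inline fragments are illustrative stand-ins for the attached configs, not their real contents. It treats an option that is missing or marked "is not set" as `n`, which is a simplification.

```python
import re


def parse_kconfig(text):
    """Map each CONFIG_* option in a kernel .config to its value.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.match(r"# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Assumption: an option absent from one file is treated as 'n'.
    """
    a, b = parse_kconfig(a_text), parse_kconfig(b_text)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(a.keys() | b.keys())
        if a.get(name, "n") != b.get(name, "n")
    }


# Illustrative fragments only; the real attachments are much longer.
cacule_cfg = """
CONFIG_HZ_500=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=500
CONFIG_NO_HZ_IDLE=y
"""

tt_cfg = """
# CONFIG_HZ_500 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_NO_HZ_IDLE=y
"""

for option, (in_cacule, in_tt) in diff_kconfig(cacule_cfg, tt_cfg).items():
    print(f"{option}: cacule={in_cacule} tt={in_tt}")
```

With real files, the two strings would come from reading each .config from disk; only the options that differ are printed, so an HZ or NO_HZ mismatch stands out immediately.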

Side note:
You have NUMA balancing enabled (CONFIG_NUMA_BALANCING=y) in both cacule and tt, but you may not need NUMA at all. Check with the numactl command whether you have only 1 node or more. With only 1 node, you don't need any NUMA configs enabled, and disabling NUMA can save some overhead.
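
As a rough alternative to reading numactl output by eye, the node count can also be taken from sysfs, where Linux exposes one `nodeN` directory per NUMA node under /sys/devices/system/node. This is a minimal Python sketch assuming that sysfs layout; the helper names are hypothetical.

```python
import os
import re

NODE_DIR = "/sys/devices/system/node"  # standard sysfs location on Linux


def count_node_entries(entries):
    """Count names of the form node0, node1, ... in a directory listing."""
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


def count_numa_nodes(node_dir=NODE_DIR):
    """Return the number of NUMA nodes, or 0 if the sysfs directory is absent
    (for example on a kernel built without NUMA support)."""
    try:
        return count_node_entries(os.listdir(node_dir))
    except FileNotFoundError:
        return 0


print(f"NUMA nodes found: {count_numa_nodes()}")
```

A result of 1 (or 0 on a kernel without NUMA support) suggests the NUMA options are not buying anything on that machine.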

Thank you

@groobybugs
Author

Hi @hamadmarri!! I'm really sorry, I did not notice the difference; I thought I was using 500 Hz for both, my bad. I will run the tests again and enable CONFIG_TT_ACCOUNTING_STATS. Thanks!!!

@groobybugs
Author

groobybugs commented Nov 24, 2021

Hi @hamadmarri

Now I'm completely sure that I used the same configuration. I also disabled NUMA and enabled CONFIG_TT_ACCOUNTING_STATS, and I used 500 Hz for all my tests.

image

cacule
78.889229:79.5157125:79.6588391:79.0820449:78.2657684:77.5217652:76.4722857:76.6737493:75.5381214:74.9619431

TT
80.0675823:79.7179842:79.9550305:79.3849837:79.6759446:79.0674221:79.0211555:78.9002892:78.4843618:78.1941778

As we can see, TT has better performance across the whole test.
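
A quick way to check this against the posted numbers is to compare the two series directly. The sketch below is plain Python using only the values listed above, presumably Xonotic FPS as in the earlier runs; the variable names are chosen here for illustration.

```python
from statistics import mean

# Per-run results copied from the two series above.
cacule_fps = [
    78.889229, 79.5157125, 79.6588391, 79.0820449, 78.2657684,
    77.5217652, 76.4722857, 76.6737493, 75.5381214, 74.9619431,
]
tt_fps = [
    80.0675823, 79.7179842, 79.9550305, 79.3849837, 79.6759446,
    79.0674221, 79.0211555, 78.9002892, 78.4843618, 78.1941778,
]

for name, samples in (("cacule", cacule_fps), ("tt", tt_fps)):
    print(f"{name}: mean={mean(samples):.2f} min={min(samples):.2f} max={max(samples):.2f}")

# Number of runs in which TT scored higher than cacule, paired by position.
tt_wins = sum(t > c for t, c in zip(tt_fps, cacule_fps))
print(f"TT ahead in {tt_wins} of {len(tt_fps)} runs")
```

On these numbers the means come out to about 79.25 for TT and 77.66 for cacule, and TT is ahead in each of the 10 runs.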

image

And the difference in Blender is actually small.

TT, as I said before, is very responsive when multitasking, and for single tasks it has great results. One thing I did to improve my compilation time was to increase the niceness of the task I'm more interested in: for example, if the build took 11 minutes, increasing the niceness reduced the time to 5 minutes, and the system was still responsive.
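
For anyone who wants to try the same kind of adjustment, here is a minimal Python sketch of launching a command with a changed nice value. It is an illustration, not the exact procedure used above: the comment does not say which direction or value was used, so the increment is a parameter, and `run_with_nice` is a hypothetical helper. In standard Linux terms a lower nice value means higher priority, and a negative increment normally requires root or CAP_SYS_NICE. The sketch is POSIX-only because it relies on `preexec_fn`.

```python
import os
import subprocess
import sys


def run_with_nice(cmd, increment):
    """Run cmd in a child process whose nice value is shifted by `increment`.

    A positive increment lowers the child's priority; a negative one raises
    it and usually needs elevated privileges.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # applied in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )


# Demo: the child simply reports its own nice value.
result = run_with_nice([sys.executable, "-c", "import os; print(os.nice(0))"], 5)
print("child niceness:", result.stdout.strip())
```

In practice the command would be the build itself rather than the demo child used here.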

In general I'm going to say that TT is a great scheduler.
Thanks man!
This speeds up my daily work.

@groobybugs
Author

I hope this information is useful to you, and if there is anything else I can help you with, just ask man

@hamadmarri
Owner

I am glad to see that TT is performing well in your cases 👍

Any feedback is welcome

Thank you so much @groobybugs
