Optimizations for scikit-tree to improve multi-core performance #245

sampan501 · 2024-03-12T17:43:14Z

adam2392 · 2024-03-12T19:40:30Z

I think, in terms of sequential experiments to run:

RandomForestClassifier in scikit-learn vs. RandomForestClassifier in scikit-tree: just n_samples vs. time to fit, with n_jobs=1 vs. n_jobs=-1.

If this doesn't look good, it means for sure that our compiler setup is messed up somehow, or that we introduced some serious issues in the fork that we're not aware of.

Run HonestForestClassifier wrapping the DTC from sklearn vs. the DTC from scikit-tree, to determine whether HonestForest introduces this issue somehow.

Within each of the above, we would have to investigate CPU/RAM usage in depth using valgrind or a similar tool.
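The first experiment above could be sketched roughly as follows. This is a minimal harness under stated assumptions, not the script used in this thread: the helper names `time_fit`, `benchmark` and `main` are hypothetical, and only the scikit-learn forest is wired in because the import path for the scikit-tree forest is not given here.

```python
import time
from statistics import mean


def time_fit(make_estimator, X, y, n_repeats=3):
    """Return wall-clock fit times in seconds, using a fresh estimator per repeat."""
    times = []
    for _ in range(n_repeats):
        est = make_estimator()
        start = time.perf_counter()
        est.fit(X, y)
        times.append(time.perf_counter() - start)
    return times


def benchmark(estimators, make_data, n_samples_grid, n_jobs_grid=(1, -1), n_repeats=3):
    """Time every (estimator, n_samples, n_jobs) combination.

    estimators: dict mapping a label to a class that accepts
                n_jobs and random_state keyword arguments.
    make_data:  callable taking n_samples and returning (X, y).
    Returns a list of (label, n_samples, n_jobs, mean_fit_seconds) rows.
    """
    rows = []
    for n_samples in n_samples_grid:
        X, y = make_data(n_samples)
        for label, cls in estimators.items():
            for n_jobs in n_jobs_grid:
                times = time_fit(
                    lambda: cls(n_jobs=n_jobs, random_state=0), X, y, n_repeats
                )
                rows.append((label, n_samples, n_jobs, mean(times)))
    return rows


def main():
    # Imported lazily so the harness itself has no third-party dependency.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Assumption: add the scikit-tree forest class to this dict yourself;
    # its import path is not stated in this thread, so it is left out.
    estimators = {"sklearn-RF": RandomForestClassifier}

    def make_data(n_samples):
        # Assumption: synthetic data with 20 features, chosen for illustration.
        return make_classification(n_samples=n_samples, n_features=20, random_state=0)

    for row in benchmark(estimators, make_data, [1_000, 5_000, 10_000]):
        print(row)
```

Calling `main()` prints one row per combination. If the n_jobs=-1 rows are not clearly faster than the n_jobs=1 rows for one library but are for the other, that points at the build or the fork rather than at the estimator wrapped on top.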

sampan501 · 2024-03-12T19:42:51Z

sampan501 · 2024-03-12T19:55:49Z

CoMIGHT before changes in #242

CoMIGHT after changes in #242

adam2392 · 2024-03-13T13:32:59Z

To confirm: this is not an issue isolated to CoMIGHT, right? Or so far it is?

sampan501 · 2024-03-13T13:37:38Z

it is not

SUKI-O · 2024-03-13T19:31:37Z

We ran some tests. After the fix Adam pushed, the fit times (in seconds) for RF and sktree-RF are:

Fit time for RandomForestClassifier: 3.522181987762451
Fit time for RandomForestClassifier: 3.4983439445495605
Fit time for RandomForestClassifier: 3.518531084060669
Fit time for RandomForestClassifier: 3.5076229572296143
Fit time for RandomForestClassifier: 3.5162460803985596
Fit time for sktreeRandomForestClassifier: 3.697654962539673
Fit time for sktreeRandomForestClassifier: 3.660207986831665
Fit time for sktreeRandomForestClassifier: 3.6615519523620605
Fit time for sktreeRandomForestClassifier: 3.6803948879241943
Fit time for sktreeRandomForestClassifier: 3.653079032897949

Note: the fit time for sktree-RF was 7+ seconds prior to this fix.

The script for this test can be found at: https://github.com/neurodata/might/blob/cmi/exps/new_submission/Figure6_comight_vs_nsamples_ndims/test_rf_parallel.py

The commit that we tested to get ~3.7 sec on sktree-RF was: 7c75677
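To put a number on the remaining gap, the fit times listed above can be averaged per library. This is a small sketch using only the ten values reported in this comment; the variable names are illustrative.

```python
from statistics import mean

# Fit times in seconds, copied from the runs reported above.
rf_times = [
    3.522181987762451,
    3.4983439445495605,
    3.518531084060669,
    3.5076229572296143,
    3.5162460803985596,
]
sktree_rf_times = [
    3.697654962539673,
    3.660207986831665,
    3.6615519523620605,
    3.6803948879241943,
    3.653079032897949,
]

rf_mean = mean(rf_times)
sktree_mean = mean(sktree_rf_times)
# Relative slowdown of sktree-RF with respect to sklearn's RF.
overhead = sktree_mean / rf_mean - 1

print(f"RF mean fit time:        {rf_mean:.3f} s")
print(f"sktree-RF mean fit time: {sktree_mean:.3f} s")
print(f"Relative overhead:       {overhead:.1%}")
```

On these five runs each, the means come out to roughly 3.51 s and 3.67 s, so sktree-RF is about 4.5% slower than RF after the fix.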

sampan501 · 2024-03-13T19:34:59Z

wooot!!

sampan501 added the labels "bug" (Something isn't working) and "research" (Requires experimentation, theory and research.) on Mar 12, 2024
