Update benchmark results after migrating to AWS. #127

scottcanoe · 2025-01-08T15:35:16Z

This PR updates the benchmark results tables after the migration to AWS. No changes have been made to tbp.monty's core code since the last complete set of benchmarks was run.

nielsleadholm · 2025-01-09T12:24:45Z

Thanks very much for running these, Scott. Are the results deterministic within AWS? I.e., if a new instance is spun up, do we at least get exactly the same results? If so, I would be less concerned about any differences between these results and Oracle (e.g. the accuracy changes look equivocal).

scottcanoe · 2025-01-09T15:33:20Z

@nielsleadholm I'm rerunning benchmarks now, and I'll post an update when they're done. It'd be really nice to have a complete answer about reproducibility (i.e., rerun all benchmarks, including laptop). I think I can probably get this done in 2-3 hours with AWS.

scottcanoe · 2025-01-09T18:42:48Z

Reproducibility Report

Since we are on new infrastructure and some results differed from those on OCI, I ran the benchmarks a second time. Below are tables for both batches of runs. I'm happy to report that everything is identical except for small variations in run times.

Shorter Experiments with 10 Objects

Run 1

Experiment	% Correct	% Used MLH	Num Matching Steps	Rotation Error (radians)	Run Time	Episode Run Time (s)
base_config_10distinctobj_dist_agent	99.29%	5.00%	34	0.27	6m	20s
base_config_10distinctobj_surf_agent	100.00%	0.00%	28	0.17	4m	19s
randrot_noise_10distinctobj_dist_agent	98.00%	6.00%	47	0.45	5m	31s
randrot_noise_10distinctobj_dist_on_distm	100.00%	2.00%	36	0.26	4m	28s
randrot_noise_10distinctobj_surf_agent	99.00%	0.00%	28	0.33	4m	27s
randrot_10distinctobj_surf_agent	100.00%	0.00%	29	0.40	3m	19s
randrot_noise_10distinctobj_5lms_dist_agent	100.00%	7.00%	52	0.86	18m	86s
base_10simobj_surf_agent	95.00%	7.86%	70	0.16	8m	41s
randrot_noise_10simobj_dist_agent	82.00%	40.00%	182	0.61	16m	116s
randrot_noise_10simobj_surf_agent	90.00%	34.00%	180	0.50	24m	203s
randomrot_rawnoise_10distinctobj_surf_agent	73.00%	78.00%	15	1.54	11m	12s
base_10multi_distinctobj_dist_agent	69.29%	47.14%	25	0.82	1h6m	2s

Run 2

Experiment	% Correct	% Used MLH	Num Matching Steps	Rotation Error (radians)	Run Time	Episode Run Time (s)
base_config_10distinctobj_dist_agent	99.29%	5.00%	34	0.27	6m	19s
base_config_10distinctobj_surf_agent	100.00%	0.00%	28	0.17	4m	17s
randrot_noise_10distinctobj_dist_agent	98.00%	6.00%	47	0.45	7m	38s
randrot_noise_10distinctobj_dist_on_distm	100.00%	2.00%	36	0.26	4m	29s
randrot_noise_10distinctobj_surf_agent	99.00%	0.00%	28	0.33	5m	36s
randrot_10distinctobj_surf_agent	100.00%	0.00%	29	0.40	3m	19s
randrot_noise_10distinctobj_5lms_dist_agent	100.00%	7.00%	52	0.86	16m	77s
base_10simobj_surf_agent	95.00%	7.86%	70	0.16	9m	48s
randrot_noise_10simobj_dist_agent	82.00%	40.00%	182	0.61	16m	117s
randrot_noise_10simobj_surf_agent	90.00%	34.00%	180	0.50	22m	189s
randomrot_rawnoise_10distinctobj_surf_agent	73.00%	78.00%	15	1.54	15m	16s
base_10multi_distinctobj_dist_agent	69.29%	47.14%	25	0.82	1h5m	2s

Longer Experiments with all 77 YCB Objects

Run 1

Experiment	% Correct	% Used MLH	Num Matching Steps	Rotation Error (radians)	Run Time	Episode Run Time (s)
base_77obj_dist_agent	93.07%	14.72%	86	0.33	1h4m	197s
base_77obj_surf_agent	98.27%	5.19%	57	0.21	31m	96s
randrot_noise_77obj_dist_agent	87.01%	29.87%	148	0.69	1h33m	314s
randrot_noise_77obj_surf_agent	94.81%	19.91%	107	0.61	55m	198s
randrot_noise_77obj_5lms_dist_agent	84.42%	9.09%	64	1.07	42m	800s

Run 2

Experiment	% Correct	% Used MLH	Num Matching Steps	Rotation Error (radians)	Run Time	Episode Run Time (s)
base_77obj_dist_agent	93.07%	14.72%	86	0.33	1h4m	196s
base_77obj_surf_agent	98.27%	5.19%	57	0.21	28m	88s
randrot_noise_77obj_dist_agent	87.01%	29.87%	148	0.69	1h36m	323s
randrot_noise_77obj_surf_agent	94.81%	19.91%	107	0.61	57m	205s
randrot_noise_77obj_5lms_dist_agent	84.42%	9.09%	64	1.07	47m	944s
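
To put a number on the run-time variation, the timing strings in these tables can be converted to seconds and compared. The following is a minimal sketch, not part of tbp.monty; the helper names `parse_duration` and `relative_change` are hypothetical, and it assumes timings always use the `1h6m` / `42m` / `800s` forms seen above.

```python
import re

# Seconds per unit for the suffixes used in the tables ("1h6m", "42m", "800s").
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a timing string such as '1h6m', '42m', or '800s' to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def relative_change(run1, run2):
    """Fractional change in duration from run 1 to run 2."""
    first, second = parse_duration(run1), parse_duration(run2)
    return (second - first) / first


# Episode run time of randrot_noise_77obj_5lms_dist_agent in runs 1 and 2.
print(f"{relative_change('800s', '944s'):.1%}")  # 18.0%
# Total run time of randrot_noise_77obj_dist_agent in runs 1 and 2.
print(f"{relative_change('1h33m', '1h36m'):.1%}")  # 3.2%
```

Note that the reported times are rounded, so the computed changes are approximate.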

Unsupervised Learning

Run 1

Experiment	% Correct - 1st Epoch	% Correct - >1st Epoch	Mean Objects per Graph	Mean Graphs per Object	Run Time	Episode Run Time (s)
surf_agent_unsupervised_10distinctobj	80.00%	86.67%	1.11	1.11	16m	10s
surf_agent_unsupervised_10distinctobj_noise	80.00%	67.78%	1.09	2.78	22m	13s
surf_agent_unsupervised_10simobj	50.00%	76.67%	2.75	2.20	25m	15s

Run 2

Experiment	% Correct - 1st Epoch	% Correct - >1st Epoch	Mean Objects per Graph	Mean Graphs per Object	Run Time	Episode Run Time (s)
surf_agent_unsupervised_10distinctobj	80.00%	86.67%	1.11	1.11	17m	10s
surf_agent_unsupervised_10distinctobj_noise	80.00%	67.78%	1.09	2.78	23m	14s
surf_agent_unsupervised_10simobj	50.00%	76.67%	2.75	2.20	26m	16s

Monty-Meets-World

Run 1

Experiment	% Correct	% Used MLH	Num Matching Steps	[Rotation Error (radians)]	Run Time	Episode Run Time (s)
randrot_noise_sim_on_scan_monty_world	80.00%	85.83%	437	0.94	54m	25s
world_image_on_scanned_model	66.67%	87.50%	453	2.05	16m	19s
dark_world_image_on_scanned_model	43.75%	77.08%	433	1.87	15m	18s
bright_world_image_on_scanned_model	47.92%	83.33%	457	2.16	22m	27s
hand_intrusion_world_image_on_scanned_model	54.17%	47.92%	333	1.79	11m	13s
multi_object_world_image_on_scanned_model	41.67%	39.58%	298	1.67	10m	12s

Run 2

Experiment	% Correct	% Used MLH	Num Matching Steps	Rotation Error (radians)	Run Time	Episode Run Time (s)
randrot_noise_sim_on_scan_monty_world	80.00%	85.83%	437	0.94	57m	27s
world_image_on_scanned_model	66.67%	87.50%	453	2.05	20m	24s
dark_world_image_on_scanned_model	43.75%	77.08%	433	1.87	17m	21s
bright_world_image_on_scanned_model	47.92%	83.33%	457	2.16	18m	21s
hand_intrusion_world_image_on_scanned_model	54.17%	47.92%	333	1.79	10m	12s
multi_object_world_image_on_scanned_model	41.67%	39.58%	298	1.67	9m	11s
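
The check behind "identical except for run times" can be sketched as a comparison that ignores the timing columns. This is an illustrative sketch only, not the procedure actually used for this report; `load_metrics` and `diff_runs` are hypothetical helpers, and it assumes both tables are tab-separated with identical header names.

```python
import csv
import io

# Columns that are expected to vary between runs (wall-clock timing).
TIMING_COLUMNS = {"Run Time", "Episode Run Time (s)"}


def load_metrics(tsv_text):
    """Map experiment name -> non-timing metrics from a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["Experiment"]: {
            col: val
            for col, val in row.items()
            if col != "Experiment" and col not in TIMING_COLUMNS
        }
        for row in reader
    }


def diff_runs(run1_text, run2_text):
    """Return (experiment, column, run1_value, run2_value) for each mismatch."""
    run1, run2 = load_metrics(run1_text), load_metrics(run2_text)
    mismatches = []
    for name in sorted(run1.keys() | run2.keys()):
        a, b = run1.get(name, {}), run2.get(name, {})
        for col in sorted(a.keys() | b.keys()):
            if a.get(col) != b.get(col):
                mismatches.append((name, col, a.get(col), b.get(col)))
    return mismatches


# Two rows copied from the Monty-Meets-World tables above.
HEADER = (
    "Experiment\t% Correct\t% Used MLH\tNum Matching Steps\t"
    "Rotation Error (radians)\tRun Time\tEpisode Run Time (s)\n"
)
RUN_1 = HEADER + (
    "world_image_on_scanned_model\t66.67%\t87.50%\t453\t2.05\t16m\t19s\n"
    "dark_world_image_on_scanned_model\t43.75%\t77.08%\t433\t1.87\t15m\t18s\n"
)
RUN_2 = HEADER + (
    "world_image_on_scanned_model\t66.67%\t87.50%\t453\t2.05\t20m\t24s\n"
    "dark_world_image_on_scanned_model\t43.75%\t77.08%\t433\t1.87\t17m\t21s\n"
)

print(diff_runs(RUN_1, RUN_2))  # [] (only the timing columns differ)
```

Comparing the values as strings keeps the check exact: any change in a reported digit shows up as a mismatch.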

nielsleadholm

Thanks for running those Scott.

Given that the results are identical across the two AWS runs, I'm not concerned about issues with our seed fixing etc., and I assume that some of the other potential elements we discussed might explain the Oracle vs. AWS differences. The accuracy changes between Oracle and AWS look equivocal to me, so I think we should merge this rather than spend a week or more chasing possible causes, given that they haven't had a negative effect.

@vkakerbeck, just tagging you to make sure you agree before we merge it.

vkakerbeck

Yes, that makes sense. Thanks for updating those, Scott, and for making sure the new infrastructure still produces consistent results!

Update results

4952a20

scottcanoe marked this pull request as ready for review January 8, 2025 15:46

scottcanoe requested review from nielsleadholm and vkakerbeck January 8, 2025 15:46

tristanls assigned vkakerbeck and nielsleadholm Jan 8, 2025

tristanls added the documentation and triaged labels Jan 8, 2025

Add back in brackets in table title

514fc5f

nielsleadholm approved these changes Jan 9, 2025

vkakerbeck approved these changes Jan 9, 2025

Merge branch 'main' into aws_benchmarks

79174e1

scottcanoe merged commit a6464bc into thousandbrainsproject:main Jan 9, 2025
13 checks passed

scottcanoe deleted the aws_benchmarks branch January 9, 2025 22:13
