Don't use the semantic sensor by default #116
Open · scottcanoe wants to merge 26 commits into thousandbrainsproject:main from scottcanoe:semantic_sensor
+327 −183
Conversation
Commits:
- Also fix spelling error: `get_perc_on_obj_semantic` had the parameter "sematic" [sic]. I changed it to "semantic" and updated uses.
- Change default "semantics" to False in Multi-LM mount configs.
- Update docstring
- …cation_to_look_at
- Explicitly use semantic sensors for habitat_transform_test.py. Drop more non-matching columns between parallel and non-parallel experiments for run_parallel_test.py, since not having a semantic sensor modifies reporting of stepwise_performance and stepwise_target_object.
- docstrings
- Don't use ycb data path for multi-object dataset args.
- Add return values to get_good_view and get_good_view_with_patch_refinement so we can raise an assertion error if we start an episode off-object.
- Prepare to pretrain new models.
- …onty into semantic_sensor
scottcanoe changed the title from "WIP: Don't use the semantic sensor by default" to "Don't use the semantic sensor by default" on Jan 10, 2025.
Problem Statement
The environment data loader returns observations for each sensor, such as an RGBA image or a depth image. By default, it also returns a semantic map for each sensor, indicating which object each pixel is "looking at"[1]. These semantic maps deliver a lot of information to Monty "for free", and we would like to remove this dependence on privileged information by having Monty compute its own semantic maps from sensor data as its default behavior.
Most of the groundwork for this has already been laid. The `DepthTo3DLocation` transform currently has a method for estimating semantic maps using depth data. When `DepthTo3DLocation.use_semantic_sensor=False`, the ground-truth semantic map is not used during the normal course of an episode. However, it is still used when initializing an episode and after executing hypothesis-driven jumps. The goal is to replace usage of the ground-truth semantic map with the estimated semantic map in these contexts. We would also like to prevent the ground-truth semantic map from being returned by the environment data loader, to guarantee that Monty doesn't use it when we don't want it to.

Here's a quick overview of the settings that affect the usage of ground-truth semantic maps[2]:
- The `MountConfig.semantics` value for a sensor determines whether the ground-truth semantic map is returned by the data loader. More specifically, it creates the item `observations[agent_id][sensor_module_id]["semantic"]`.
- `DepthTo3DLocation.use_semantic_sensor` determines whether the ground-truth semantic map is used by this transform.
- The `InformedPolicy`'s `move_close_enough`, `orient_to_object`, and `find_location_to_look_at` methods currently use (and require) the existence of `observations[agent_id][sensor_module_id]["semantic"]`, which is controlled by the mount config only. Setting `DepthTo3DLocation.use_semantic_sensor=False` has no effect here.

Important: the estimated semantic map created by `DepthTo3DLocation` is binary (off- or on-object; see the illustration below), whereas the ground-truth semantic maps are zero when off-object and a particular object ID otherwise (e.g., 3s where the pixel is on the `potted_meat_can`). The presence of object IDs in the semantic map is what is needed by the `InformedPolicy`'s movement/orientation methods. While we can currently work around binary semantic maps for single-object episodes (see below), there's currently no way to avoid using ground-truth semantic maps for multi-object experiments.
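To make the binary vs. ground-truth distinction concrete, here is a small standalone illustration (NumPy only; the array values are invented, with `3` standing in for the `potted_meat_can` ID mentioned above):

```python
import numpy as np

# Ground-truth semantic map: 0 off-object, a specific object ID on-object
# (here, 3s mark pixels that fall on the potted_meat_can).
ground_truth = np.array([[0, 0, 3],
                         [0, 3, 3],
                         [0, 0, 0]])

# A binary estimate, like the one DepthTo3DLocation derives from depth data,
# can only say on-object vs. off-object; the object's identity is lost.
# (Deriving it from the ground truth here is purely for illustration.)
estimated = (ground_truth > 0).astype(int)
print(estimated)  # 1 wherever the ground truth is nonzero
```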
Solutions

I've adapted the `InformedPolicy.move_close_enough`, `InformedPolicy.orient_to_object`, and `InformedPolicy.find_location_to_look_at` methods to use the semantic maps embedded into the `observations` dict by `DepthTo3DLocation`. These methods have been augmented with `primary_target_id` and `multiple_objects_present` parameters. If `multiple_objects_present` is `False`, the nonzero elements of the estimated semantic map are temporarily replaced with the ID of the target object (see the sketch below). The nice thing about this approach is that the semantic data contained in `"semantic_3d"` reflects the choice of `DepthTo3DLocation.use_semantic_sensor`, and the modifications to the code have no effect when the semantic sensor is in use. These changes are specific to the distant agent.
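To illustrate the single-object workaround, here is a minimal sketch (standalone; the helper function is hypothetical, though the `primary_target_id` and `multiple_objects_present` parameter names mirror this PR):

```python
import numpy as np

def relabel_semantic_map(semantic: np.ndarray,
                         primary_target_id: int,
                         multiple_objects_present: bool) -> np.ndarray:
    """Make a binary estimated semantic map usable by the policy methods.

    With a single object in the scene, every on-object (nonzero) pixel must
    belong to the primary target, so we can temporarily stamp it with the
    target's ID. With multiple objects this inference is invalid, so the
    map is returned unchanged.
    """
    if multiple_objects_present:
        return semantic
    relabeled = semantic.copy()
    relabeled[relabeled != 0] = primary_target_id
    return relabeled

# Binary estimate from depth: 1 on-object, 0 off-object.
estimated = np.array([[0, 1, 1],
                      [0, 1, 0]])
print(relabel_semantic_map(estimated, primary_target_id=3,
                           multiple_objects_present=False))
```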
Quite a few changes have also been made to `DepthTo3DLocation`. The basic reason is that the ground-truth semantic map was used to zero out off-object regions on an estimated surface; I introduced a simple, temporary semantic mask computed from depth data that performs the same function (sketched below). There was a decent amount of reorganization to get this to work, including changes to parameters and moving some behavior between functions. Side note: the surface estimation behavior of `DepthTo3DLocation` isn't super obvious to a code reader, and I'd be happy to contribute some documentation for it somewhere.
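A minimal sketch of this kind of depth-based mask (the threshold and function name are illustrative, not Monty's actual parameters):

```python
import numpy as np

def semantic_mask_from_depth(depth: np.ndarray, max_depth: float = 1.0) -> np.ndarray:
    """Estimate an on-object mask from a depth image.

    Pixels closer than `max_depth` are treated as on-object; far readings
    (background) are masked out. This stands in for the ground-truth
    semantic map when zeroing off-object regions of the estimated surface.
    """
    return (depth < max_depth).astype(int)

# A 3x3 depth patch where the center pixels fall on a nearby object.
depth = np.array([[2.0, 2.0, 2.0],
                  [2.0, 0.4, 0.5],
                  [2.0, 0.5, 2.0]])
print(semantic_mask_from_depth(depth))  # 1s mark the on-object pixels
```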
Beyond these changes, some tests were updated and some typos were fixed (`num_distractors` was often spelled `num_distactors`, and `semantic` was sometimes written as `sematic`). I have also added a pre-episode check (sketched below) that ensures we start an episode on the target object (with the exception of multi-object experiments).
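As a sketch of what that check amounts to (the `semantic_3d` layout of x, y, z, and semantic ID per point is an assumption here, and the helper is schematic, not the PR's actual code):

```python
import numpy as np

def assert_on_target_at_start(semantic_3d: np.ndarray,
                              primary_target_id: int) -> None:
    """Raise if no sensor pixel starts the episode on the primary target.

    Assumes each row of `semantic_3d` is (x, y, z, semantic_id), so the
    last column holds per-pixel semantic IDs after the transform has run.
    """
    if not np.any(semantic_3d[:, -1] == primary_target_id):
        raise AssertionError(
            "Episode initialized off-object; check the good-view setup."
        )
```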
Dataset configs were also changed to not use semantic sensors in `DepthTo3DLocation` nor to have ground-truth semantic data returned by Habitat. Multi-object dataset configs were also added and should be used when performing multi-object experiments. Benchmark pretraining and object recognition experiments reflect these modifications.

Finally, I re-pretrained all models (we're now up to `pretrained_ycb_v10`) and ran all benchmarks.

Footnotes

1. Clarification for future readers: these maps are not used for object recognition -- they're used to select a location in space toward which Monty should move and orient its sensors when initializing an episode.
2. The user should be aware that certain combinations of these two attributes are invalid. For example, setting the `MountConfig.semantics` value for a sensor to `False` while setting `DepthTo3DLocation.use_semantic_sensor=True` will trigger an error. This is something we can warn against when we implement config checking (see the sketch below).
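A sketch of the config check footnote 2 envisions (a hypothetical helper; no such validator exists in Monty yet):

```python
def validate_semantic_settings(mount_semantics: bool,
                               use_semantic_sensor: bool) -> None:
    """Fail fast on the invalid combination described in footnote 2.

    If the transform expects a ground-truth semantic map
    (use_semantic_sensor=True) but the mount config never produces one
    (semantics=False), raise before the experiment starts.
    """
    if use_semantic_sensor and not mount_semantics:
        raise ValueError(
            "DepthTo3DLocation.use_semantic_sensor=True requires "
            "MountConfig.semantics=True for this sensor."
        )
```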