python projects/habitat_objectnav/eval_episode.py, this does not seem to work? #470

Open
Ethan2415 opened this issue Jan 19, 2024 · 3 comments

@Ethan2415

🐛 Bug

After following the home-robot sim instructions and downloading the necessary datasets as instructed in Habitat-ObjNav, I ran this command. It then hit several problems that seem to come from the code in /home-robot/src/home_robot/home_robot/mapping/semantic/categorical_2d_semantic_map_module.py.

Immediately, it says

global_pose = np.array([camera_x.item(), camera_y.item(), roll.item()] )#if type(roll) is not int else roll])
AttributeError: 'int' object has no attribute 'item'

This is on line 573. It seems that roll is a plain integer here, so the datatype is not handled properly. As can be seen after the # above, I added a type check to the original line, and with it this line ran successfully. However, this was just the start. The run then failed with the following error.
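A minimal sketch of the kind of type guard described above, assuming the only problem is that some pose components arrive as built-in numbers while others arrive as one-element tensors. The helper name `to_scalar` and the stand-in class `OneElement` are hypothetical and are not part of home-robot:

```python
class OneElement:
    """Hypothetical stand-in for a one-element tensor or NumPy scalar."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def to_scalar(value):
    """Return a plain number whether or not `value` exposes .item()."""
    # Tensors and NumPy scalars have .item(); built-in int and float do not.
    return value.item() if hasattr(value, "item") else value


camera_x, camera_y = OneElement(1.5), OneElement(2.0)
roll = 0  # a plain int, as in the AttributeError above

# Every component goes through the same guard, so the mix of types is harmless.
global_pose = [to_scalar(camera_x), to_scalar(camera_y), to_scalar(roll)]
```

Applying the guard to every component, rather than only to roll, keeps the line working if the other values also change type later.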

File "/home/yqp/home-robot/src/home_robot/home_robot/mapping/semantic/categorical_2d_semantic_map_module.py", line 579, in _update_local_map_and_pose
   self.instance_memory.process_instances(
TypeError: process_instances() got multiple values for argument 'image'

This seems to be related to the function process_instances: image is passed as a keyword argument here, but the call does not match the function definition. The function is defined as:

    def process_instances(
        self,
        instance_channels: Tensor,
        point_cloud: Tensor,
        image: Tensor,
        cam_to_world: Optional[Tensor] = None,
        semantic_channels: Optional[Tensor] = None,
        pose: Optional[Tensor] = None,
    ):

But here is called with the arguments:

  if num_instance_channels > 0:
      self.instance_memory.process_instances(
          instance_channels,
          absolute_point_cloud,
          #image=obs[:, :3, :, :],
          #obs[:, :3, :, :],
          torch.concat([current_pose + origins, lmb], axis=1)
          .cpu()
          .float(),  # store the global pose
          image=obs[:, :3, :, :],
          semantic_channels=semantic_channels,
          background_class_labels=[0, semantic_max_val],
          #pose=[0, semantic_max_val]
      )

These arguments do not correspond one-to-one with the definition. I can't debug this any further because the call chain is too long and I don't know how all the data is processed. This looks like incorrectly written code that fails to handle the data processing. Can anyone help? Is the code itself wrong, or did I miss a necessary setup step?
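The TypeError can be reproduced in isolation. The sketch below uses a stand-in function with the same parameter order as the definition quoted above. The argument values are placeholder strings, and binding the pose tensor to `pose` in the second call is an assumption made only for illustration, not the maintainers' fix. The quoted call also passes background_class_labels, which the quoted signature does not accept, so the real fix probably involves more than reordering.

```python
def process_instances(instance_channels, point_cloud, image,
                      cam_to_world=None, semantic_channels=None, pose=None):
    """Stand-in with the parameter order of the quoted definition."""
    return {"image": image, "pose": pose}


try:
    # The third positional argument is bound to `image`, so the explicit
    # image= keyword supplies a second value for the same parameter.
    process_instances("channels", "cloud", "global_pose", image="rgb")
    error_message = None
except TypeError as err:
    error_message = str(err)

# Passing everything after the first two arguments by keyword avoids the clash.
result = process_instances("channels", "cloud", image="rgb", pose="global_pose")
```

Calling with keywords for every optional or easily confused parameter makes this class of mismatch fail loudly at the call site when a signature changes.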
Steps to reproduce the behavior:

@yvsriram
Contributor

This looks like incorrectly written code that fails to handle the data processing. Can anyone help?

Hi, this is because the habitat objectnav project is not up-to-date with other changes we made in the repository. I will fix this up today.

Meanwhile, you can use the previous tagged version for ObjectNav here, which is stable. Please remember to use the corresponding habitat-lab commit when you switch to this tagged version.

yvsriram self-assigned this Jan 31, 2024
@Ethan2415
Author

Ethan2415 commented Feb 4, 2024

Thanks for the answer and for recommending the tagged version, but I have another problem to ask about. To train an RL agent in this repo, I run home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/run.py with the configs provided in /home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config/objectnav/. After running for a long time, it eventually fails with a broken pipe error. The log shows:

    self._compute_actions_and_step_envs(buffer_index)
  File "/home/yqp/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py", line 508, in _compute_actions_and_step_envs
    range(env_slice.start, env_slice.stop), actions.cpu().unbind(0)
RuntimeError: CUDA error: device-side assert triggered

Tracing back through the output, I also see:

Exception ignored in: <function VectorEnv.__del__ at 0x7ff2341a1ee0>
Traceback (most recent call last):
  File "/home/yqp/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 589, in __del__
    self.close()
  File "/home/yqp/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 460, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/yqp/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 129, in __call__
    self.write_fn(data)
  File "/home/yqp/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 63, in send
    self.send_bytes(buf.getvalue())
  File "/home/yqp/miniforge3/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/yqp/miniforge3/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/yqp/miniforge3/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Why is this happening? Based on the traceback, it looks like the process tried to write a close command to the pipe but failed. This happened both before and after I switched to the tagged version you recommended. How can I solve this?
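One possible reading of the two tracebacks above, offered as an assumption and not a confirmed diagnosis: the BrokenPipeError is reported as "Exception ignored in" VectorEnv.__del__, which means it was raised during cleanup. It may therefore be a consequence of the earlier RuntimeError, not the cause of it. The sketch below uses only Python's standard multiprocessing.Pipe, not habitat's VectorEnv, to show how sending a close command fails once the other end of the connection is gone. The names `parent_end` and `worker_end` are hypothetical.

```python
import multiprocessing

# A duplex connection pair, standing in for the parent/worker link.
parent_end, worker_end = multiprocessing.Pipe()

# Simulate a worker that has already exited: its end of the pipe is closed.
worker_end.close()

try:
    # The parent still tries to deliver a close command during cleanup.
    parent_end.send(("close", None))
    cleanup_error = None
except (BrokenPipeError, ConnectionResetError) as err:
    # On Linux the failed write is expected to surface as BrokenPipeError.
    cleanup_error = err
finally:
    parent_end.close()
```

If this reading applies, the error worth investigating is the first one in the log (the CUDA device-side assert), since the broken pipe only appears once the environments are being torn down.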

@yvsriram
Contributor

yvsriram commented Feb 5, 2024

Hi, I have never trained "objectnav" agents in this branch. I did not face any errors initially, but I will try a longer training run.
For objectnav training, you should consider using this repository: https://github.com/3dlg-hcvc/hssd.
