Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for the OPi image #19

Merged
merged 38 commits into from
Jul 30, 2024

Conversation

crschardt
Copy link
Contributor

@crschardt crschardt commented Jul 6, 2024

This PR includes a significantly re-written install_opi5.sh that:

  • sets bash as the shell for the pi user
  • updates AllowedCPUs to use all 8 cores (in 24.04 it was only using the A76 cores)
  • reduces image size by 316 MiB by removing unnecessary packages
  • leverages cloud-init/netplan for initial network configuration
  • speeds startup by
    • blocking cloud-init from running after the first boot
    • telling NetworkManager not to wait for a carrier signal (this significantly speeds startup when not connected to ethernet, but may not be needed) <- moved to a future PR for more testing
    • disabling networkd-wait-online <- moved to a future PR for more testing
  • fixes the network not being available when photonvision starts by
    • setting photonvision.service to start after the network is available
    • manually configuring the ethernet with a static address so that it is recognized as an interface even if DHCP isn't available at boot <- not implemented as this is probably not the right approach

The benefits of these changes are pretty clear:

  • the first boot is <30 seconds, and subsequent boots take ~10 seconds
  • as long as the coprocessor is connected to an ethernet network on first boot, the network interface is avialable the first time photonvision loads and so it recognizes it and can manage it correctly with nmcli
  • this approach works with the new server images from Joshua Riek and should allow us to add support for the OPi5 Pro

Since these are such large changes, it would be good to have people test this on Orange Pis. I've done a lot of testing on an OPi5, but I don't have an OPi5 Plus and I'm concerned that the network configuration will need to be modified for the dual ethernet ports.

Copy link
Contributor Author

@crschardt crschardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system performance on 24.04 seems diffrent. I'm running two OV9281 cameras with AprilTag pipelines at 1280x720 (decimation=2) and getting 35 fps with ~23 ms latency. When I check the CPU usage, it's only about 65-70% distributed across all 8 cores. This makes me wonder if there is a bottleneck in the pipeline.

Copy link
Contributor

@stephenjust stephenjust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please find a way to re-enable networking config via PhotonVision UI.

install_opi5.sh Outdated
wget https://git.io/JJrEP -O install.sh
chmod +x install.sh

sed -i 's/# AllowedCPUs=4-7/AllowedCPUs=4-7/g' install.sh
sed -i 's/# AllowedCPUs=4-7/AllowedCPUs=0-7/g' install.sh

./install.sh -n -q
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you disabled networking support / NetworkManager via the -n in your previous commit, PhotonVision can no longer set the device hostname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary, and part of the reason this is still draft. I have some more commits that I'll push to improve networking on OPis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest pushes have re-enabled network management.

install_opi5.sh Outdated
# when the coprocessor isn't connected to the ethernet
cat > /etc/NetworkManager/conf.d/50-ignore-carrier.conf <<EOF
[main]
ignore-carrier=*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be dangerous on a robot network where ethernet cables can intermittently be unplugged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share more details of the failure mode that you're concerned about? I'm testing different network setting to understand how they impact startup time and network behavior to try to find a configuration that provides the most stability and predictability, so understanding what could go wrong from your perspective would be a big help.

@crschardt
Copy link
Contributor Author

The latest push includes a revised network configuration that defaults to DHCP + link-local fallback. It also waits for a network connection (or a timeout) before starting the photonvision service. When connected to a network during the first boot, this increases the chances that photonvision will get a valid network connection name. The downside is that it can cause a 1 to 2 minute delay in startup when the coprocessor isn't connected to any network. I'm not sure if this is a real problem, since there isn't much point in using photonvision if it isn't connected to a network, but that remains to be seen.

I think this is now ready for general testing and would like to get feedback on OPi5, OPi5 Plus, and OPi5 Pro if people are willing to do the testing.

@crschardt crschardt marked this pull request as ready for review July 21, 2024 23:48
Comment on lines 31 to 32
- name: ubuntu
password: photon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need to update our downstream documentation with the new login info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pi/raspberry still works. This is a small change to the default ubuntu/ubuntu username so I didn't have to change the password every time I ssh'd in to a new image. I can change it back to ubuntu/ubuntu.

Comment on lines -37 to +59
# cat > /etc/netplan/00-default-nm-renderer.yaml <<EOF
# network:
# renderer: NetworkManager
# tell NetworkManager not to wait for the carrier on ethernet, which can delay boot
# when the coprocessor isn't connected to the ethernet
# cat > /etc/NetworkManager/conf.d/50-ignore-carrier.conf <<EOF
# [main]
# ignore-carrier=*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why's this still commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the carrier may change current behavior on the robot. I'm not sure that it's a problem, but I'd like to leave it commented out until I have a chance to test on a robot network to make sure that nothing unexpected happens.

echo "Installing additional things"
sudo apt-get update
apt-get install -y network-manager net-tools libatomic1
# set NetworkManager as the renderer in cloud-init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got some reservations about the approach using cloud-init.

The upstream image exposes these on a separate device partition for editing on Windows. That's all fine, but it would be confusing for these files to be present yet not honored after first boot. IMO either keep cloud-init and always execute it (with a boot performance penalty) or just remove it entirely.

Copy link
Contributor Author

@crschardt crschardt Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of cloud-init is that it doesn't regenerate the network configuration on every boot, but I'll admit that it isn't entirely clear to me from the documentation. In addition, when photonvision changes the IP to static, NetworkManager adds it's own configuration file to /etc/netplan which will override any changes made in the cloud-init setup files.

@mcm001
Copy link
Contributor

mcm001 commented Jul 25, 2024

Looks like there are merge conflicts now?

@crschardt
Copy link
Contributor Author

Looks like there are merge conflicts now?

Easy fix. I've removed a few networking-related things from this PR because I'd like to do more testing. The image is working well and is ready for merging.

@crschardt crschardt merged commit fbaa802 into PhotonVision:master Jul 30, 2024
8 checks passed
@crschardt crschardt mentioned this pull request Jul 30, 2024
@crschardt crschardt deleted the opi-24-image-fixes branch August 4, 2024 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants