Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check for panic in logs on start #1258

Merged
merged 4 commits into from
Sep 16, 2024
Merged

Conversation

Reecepbcups
Copy link
Member

@Reecepbcups Reecepbcups commented Sep 16, 2024

closes #1238

Summary

This is a huge user experience improvement for ICT & local-ic upon a bad genesis param set (or some other failed startup option).

Before, the network would hang and just state it failed to query
:
failed to start chains: failed to start chain localosmosis-1: All attempts fail:
#1: post failed: Post "http://0.0.0.0:26659\": dial tcp 0.0.0.0:26659: connect: connection refused
...
#30 ...

Showcase

This chain uses a genesis modification of {"key": "app_state.gov.params.voting_period", "value": "bad"},. Both validators show this error in an easy to read format + the docker logs for the stacktrace.

image

Copy link

vercel bot commented Sep 16, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
interchaintest-docs ⬜️ Ignored (Inspect) Visit Preview Sep 16, 2024 3:36pm

@Reecepbcups Reecepbcups marked this pull request as ready for review September 16, 2024 15:38
@Reecepbcups Reecepbcups requested a review from a team as a code owner September 16, 2024 15:38
@joelsmith-2019
Copy link
Member

joelsmith-2019 commented Sep 16, 2024

Will this slow down tests that spin up numerous containers? I just want to double check we won't wait 1 second for each ephemeral container we spin up.

@Reecepbcups
Copy link
Member Author

Reecepbcups commented Sep 16, 2024

@joelsmith-2019

Will this slow down tests that spin up numerous containers?

Test are only slowed down by the 1 second per container, and containers are spun up in parallel. So if you have 1 or 100 containers, the test time only slows down by 1 second

just want to double check we won't wait 1 second for each ephemeral container we spin up

No, ephemeral containers use their own job.Run() method which is outside of the containerLifecycle.

func (tn *ChainNode) Exec(ctx context.Context, cmd []string, env []string) ([]byte, []byte, error) {
	job := dockerutil.NewImage(tn.logger(), tn.DockerClient, tn.NetworkID, tn.TestName, tn.Image.Repository, tn.Image.Version)
	opts := dockerutil.ContainerOptions{
		Env:   env,
		Binds: tn.Bind(),
	}
	res := job.Run(ctx, cmd, opts)
	return res.Stdout, res.Stderr, res.Err
}

containerLifecycle is only used for long running containers (nodes & sidecars).

I had previously tested this with

	if err := c.CheckForFailedStart(ctx, time.Second*10); err != nil {
		return err
	}

Only time it hangs is post "Starting container" once.

@Reecepbcups Reecepbcups merged commit 3ca97c4 into main Sep 16, 2024
18 checks passed
@Reecepbcups Reecepbcups deleted the reece/better-bad-input-errors branch September 16, 2024 16:20
@Reecepbcups
Copy link
Member Author

Reecepbcups commented Sep 16, 2024

I also have a WIP speed up branch which will reduce test time by like 40%. Theoretical max is ~90 - 92% with state cache. Which would make this 1 second start wait the slowest part of ICT in the future

#1208

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ICT should give more descriptive errors when chain genesis validation fails
2 participants