
feat: add command download to download public node snapshots #13598

Open: wants to merge 24 commits into base: main

@lean-apple (Contributor) commented Dec 30, 2024

Closes #13469.

Example usage:

cargo run --bin reth -- download --url https://downloads.merkle.io/reth-2024-10-23.tar.lz4

Starting snapshot download for chain: Chain::Named(Mainnet)
Target directory: "/Users/xxxx/Library/Application/reth/mainnet"
Source URL: https://downloads.merkle.io/reth-2024-10-23.tar.lz4
Downloading... 0.1%
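The "Downloading... 0.1%" line implies that the command tracks how many bytes have arrived relative to the total size. Below is a minimal std-only sketch of one way to do that while streaming. ProgressReader is a hypothetical name, and the sketch assumes the total size is known up front (for example from an HTTP Content-Length header, which a server may omit).

```rust
use std::io::{self, Read};

/// Hypothetical progress-tracking wrapper: counts bytes as they stream
/// through `inner`, so a percentage can be reported without buffering.
/// `total` is assumed to come from e.g. a Content-Length header.
pub struct ProgressReader<R> {
    inner: R,
    downloaded: u64,
    total: Option<u64>,
}

impl<R: Read> ProgressReader<R> {
    pub fn new(inner: R, total: Option<u64>) -> Self {
        Self { inner, downloaded: 0, total }
    }

    /// Percentage completed, or `None` when the total size is unknown or zero.
    pub fn percent(&self) -> Option<f64> {
        self.total
            .filter(|&t| t > 0)
            .map(|t| self.downloaded as f64 * 100.0 / t as f64)
    }
}

impl<R: Read> Read for ProgressReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Forward the read and record how many bytes actually arrived.
        let n = self.inner.read(buf)?;
        self.downloaded += n as u64;
        Ok(n)
    }
}
```

Because the wrapper is itself a Read, it can sit in front of any later stage, and the caller can print something like format!("Downloading... {:.1}%", p) at whatever interval it likes.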

codspeed-hq bot commented Dec 30, 2024

CodSpeed Performance Report

Merging #13598 will not alter performance

Comparing lean-apple:cli-download-public-node-snapshots (cba7ae2) with main (ac25fd8)

Summary

✅ 77 untouched benchmarks

@gakonst (Member) left a comment


Let's also maybe add a list of pre-set snapshot URLs so that the user doesn't need to find the URL themselves? And we can default to one of them, while the --help menu shows alternatives/fallbacks?

@lean-apple marked this pull request as ready for review January 6, 2025 16:37
@lean-apple (Contributor Author)

> Let's also maybe add a list of pre-set snapshot URLs so that the user doesn't need to find the URL themselves? And we can default to one of them, while the --help menu shows alternatives/fallbacks?

How would you suggest fetching the same block height that publicnode uses in its URLs? @matias-gonz @joshieDo
For example, the URL https://snapshots.publicnode.com/ethereum-holesky-reth-3087369.tar.lz4 contains block 3087369,
but if I use let latest_block = provider.get_block_number().await?; I get 3090694.

My aim is to build a default "dynamic" URL containing the corresponding block height, following this pattern:
https://snapshots.publicnode.com/ethereum-{potential network}-reth-{block height}.tar.lz4
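The two numbers presumably differ because a snapshot is taken at some past block, while get_block_number() returns the current chain head, so the snapshot height cannot be derived from the head alone and would have to come from publicnode itself. The sketch below only formats and parses the quoted pattern. Both function names are hypothetical, and whether a file actually exists for a given network and height is up to publicnode.

```rust
/// Hypothetical: build a publicnode.com snapshot URL from the pattern
/// ethereum-{network}-reth-{block}.tar.lz4 quoted above.
pub fn publicnode_snapshot_url(network: &str, block: u64) -> String {
    format!("https://snapshots.publicnode.com/ethereum-{network}-reth-{block}.tar.lz4")
}

/// Hypothetical: recover the block height from such a URL or file name,
/// e.g. to compare it with the chain head. Returns `None` if the name
/// does not follow the pattern.
pub fn block_from_snapshot_name(name: &str) -> Option<u64> {
    // Drop the archive extension, then take everything after the last "-reth-".
    let stem = name.strip_suffix(".tar.lz4")?;
    let (_, height) = stem.rsplit_once("-reth-")?;
    height.parse().ok()
}
```

For example, publicnode_snapshot_url("holesky", 3087369) reproduces the Holesky URL above, and block_from_snapshot_name on that URL gives back 3087369.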

@joshieDo (Collaborator) left a comment


Snapshots can be quite big (>2TB with an archival node).

The current PR would mean that we'd need double that space, since we're downloading everything first and then decompressing. Ideally we'd pipe what we download straight into the decompressing stage.

Basically replicating the following behaviour:
wget -O - https://downloads.merkle.io/reth-2025-01-06.tar.lz4 | tar -I lz4 -xvf -
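A minimal std-only sketch of that piping idea in Rust: each stage wraps the previous one as a std::io::Read, so bytes flow from the download through the decoder into the unpacker in small chunks and the compressed archive never touches disk. The function name is hypothetical and the stages are passed in as closures. In a real implementation the decoder could be, for example, lz4_flex::frame::FrameDecoder::new and the unpacker |r| tar::Archive::new(r).unpack(dir), but that wiring is an assumption, not what this PR does.

```rust
use std::io::{self, Read};

/// Hypothetical in-process equivalent of `wget -O - URL | tar -I lz4 -xf -`.
/// `body` is the download stream, `decode` wraps it in a decompressing
/// reader, and `unpack` consumes the decompressed bytes. No stage buffers
/// the whole archive: each one pulls from the previous stage on demand.
pub fn download_decode_unpack<R, D, F, U>(body: R, decode: F, unpack: U) -> io::Result<()>
where
    R: Read,
    D: Read,
    F: FnOnce(R) -> D,
    U: FnOnce(D) -> io::Result<()>,
{
    // Wrap the raw download in the decoder; nothing is read yet.
    let decoded = decode(body);
    // The unpacker drives the pipeline by reading from `decoded`,
    // which in turn reads from `body` only as needed.
    unpack(decoded)
}
```

Since the unpacker only ever sees a Read, the disk footprint is the extracted data alone rather than the archive plus the extracted data.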

@joshieDo (Collaborator) commented Jan 7, 2025

Regarding the merkle.io URL, the latest archive can be found via this link: https://downloads.merkle.io/latest.txt
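A small sketch of turning that file into a download URL. It assumes, without having checked the file, that latest.txt holds either a bare archive name such as reth-2025-01-06.tar.lz4 (the naming used in the URLs above) or an absolute URL, possibly followed by a newline. The function name is hypothetical, and fetching the file over HTTP is left out.

```rust
/// Base URL of the merkle.io snapshot host, as given in this thread.
const MERKLE_BASE: &str = "https://downloads.merkle.io";

/// Hypothetical helper: build a download URL from the contents of
/// latest.txt. Assumes the file holds either a bare archive name
/// (e.g. `reth-2025-01-06.tar.lz4`) or an absolute URL; surrounding
/// whitespace such as a trailing newline is ignored.
pub fn merkle_url_from_latest(contents: &str) -> Option<String> {
    let entry = contents.trim();
    if entry.is_empty() {
        // Nothing usable in the file.
        None
    } else if entry.starts_with("https://") || entry.starts_with("http://") {
        // Already a full URL: use it unchanged.
        Some(entry.to_string())
    } else {
        // Bare file name: join it onto the base URL.
        Some(format!("{}/{}", MERKLE_BASE, entry.trim_start_matches('/')))
    }
}
```

If the file turns out to contain something else (for instance extra metadata), only this parsing step would need to change.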

@lean-apple (Contributor Author) commented Jan 9, 2025

> Snapshots can be quite big (>2TB with an archival node).
>
> The current PR would mean that we'd need double that space, since we're downloading everything first and then decompressing. Ideally we'd pipe what we download straight into the decompressing stage.
>
> Basically replicating the following behaviour: wget -O - https://downloads.merkle.io/reth-2025-01-06.tar.lz4 | tar -I lz4 -xvf -

Thanks. The --decompress option I added does make the download inefficient, since it downloads everything before decompressing; I'll update the code.

@lean-apple changed the title from "feat: add cli download to download public node snapshots" to "feat: add command download to download public node snapshots" Jan 13, 2025
Comment on lines +35 to +47
#[arg(
long,
short,
help = "Custom URL to download the snapshot from",
long_help = "Specify a snapshot URL or let the command propose a default one.\n\
\n\
Available snapshot sources:\n\
- https://downloads.merkle.io (default, mainnet archive)\n\
- https://publicnode.com/snapshots (full nodes & testnets)\n\
\n\
If no URL is provided, the latest mainnet archive snapshot\n\
will be proposed for download from merkle.io"
)]
@lean-apple (Contributor Author) Jan 13, 2025


So right now, @joshieDo @gakonst, running

reth download

checks whether a URL is provided; if not, it proposes a default URL built from https://downloads.merkle.io/latest.txt

@joshieDo (Collaborator) Jan 14, 2025


No need to propose or ask for further user input. If no --url is provided, default to the merkle.io one.
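A sketch of that non-interactive fallback: an explicit --url wins, and otherwise the latest merkle.io snapshot is used without prompting. resolve_snapshot_url is a hypothetical name, and fetch_latest stands in for fetching and parsing https://downloads.merkle.io/latest.txt, which is not implemented here.

```rust
/// Hypothetical: choose the snapshot URL without prompting the user.
/// An explicit `--url` value is used as-is; otherwise `fetch_latest`
/// (a stand-in for reading https://downloads.merkle.io/latest.txt)
/// is called. The closure runs only when no URL was given.
pub fn resolve_snapshot_url<F, E>(cli_url: Option<String>, fetch_latest: F) -> Result<String, E>
where
    F: FnOnce() -> Result<String, E>,
{
    match cli_url {
        Some(url) => Ok(url),
        None => fetch_latest(),
    }
}
```

Passing the lookup as a closure keeps the network request lazy, so reth download --url ... never touches latest.txt, and a failed lookup surfaces as an error instead of a prompt.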

Comment on lines +108 to +117
fn spawn_tar_process(target_dir: &Path, lz4_stdout: Stdio) -> Result<Child> {
Ok(ProcessCommand::new("tar")
.arg("-xf")
.arg("-") // Read from stdin
.arg("-C")
.arg(target_dir)
.stdin(lz4_stdout)
.stderr(Stdio::inherit())
.spawn()?)
}
Collaborator

Could this use the lz4 and tar crates instead? We shouldn't depend on external binaries, imo.

);

stream_and_extract(&url, data_dir.data_dir()).await?;
info!("Snapshot downloaded and extracted successfully");
Collaborator

Suggested change:
- info!("Snapshot downloaded and extracted successfully");
+ info!(target: "reth::cli", "Snapshot downloaded and extracted successfully");

and in other places

Development

Successfully merging this pull request may close these issues.

Add new cli command to download public node snapshots
3 participants