Some image scraping silently fails #131

Open
maximecb opened this issue Nov 10, 2024 · 0 comments

First I'd like to say thank you for putting in the work to create this export script 🙏

I'd like to report an issue with the image export. The script scrapes most of the images on my blog just fine, but some are skipped, seemingly at random and without any error message, and I don't know why. Some of the images are wrapped in links while others are not.

Here is an example from the exported output. I expected this image to be scraped and its URL replaced by a relative one:

![](https://lh5.googleusercontent.com/t2kGCE-RAuoB1DvJl7oJxqEvAShWRjHb9_r1rgw8Q84ZuBivDJzbZUF-HLbxbIvVlN1gEHYQFVJmYdDpJbRmL167WvxhTbb0eUkquWsy0B2v85gi0IlT-kOCjPPO95iMXvdZRt1V)

I ran the export script with the following arguments:

node index.js \
--input=pointersgonewild.xml \
--output=exported \
--include-other-types=false \
--year-folders=false \
--month-folders=false \
--post-folders=true \
--prefix-date=true \
--save-attached-images=true \
--save-scraped-images=true

@lonekorean I would say that your WordPress export script is very nearly perfect. I can go and manually scrape the failed images myself, but it would be even better if the script could handle it.
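For anyone cleaning up by hand in the meantime, here is a minimal sketch of how the leftover images could be located in the exported Markdown. It is not part of the export script: `findRemoteImages` and the `IMAGE_EXTENSIONS` list are hypothetical names chosen for illustration. The `hasExtension` flag is included only because the example URL above has no file extension, which may or may not be related to the failure.

```javascript
// Hypothetical helper, not part of the export script: lists images in
// exported Markdown that still point at a remote URL instead of a relative path.

// Assumed list of common image extensions, used only to flag URLs without one.
const IMAGE_EXTENSIONS = /\.(gif|jpe?g|png|webp)$/i;

function findRemoteImages(markdown) {
  // Matches ![alt](url) and captures an absolute http(s) URL up to
  // the first whitespace character or closing parenthesis.
  const pattern = /!\[[^\]]*\]\((https?:\/\/[^\s)]+)/g;

  return [...markdown.matchAll(pattern)].map((match) => {
    const url = match[1];
    const { pathname } = new URL(url);
    return { url, hasExtension: IMAGE_EXTENSIONS.test(pathname) };
  });
}
```

Reading each exported `.md` file with `fs.readFileSync(path, 'utf8')` and passing the contents to `findRemoteImages` would give a list of the URLs that still need to be downloaded manually. Images wrapped in links are picked up as well, since only the inner `![...](...)` part is matched.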

@maximecb changed the title from "Some image scraping fails" to "Some image scraping silently fails" on Nov 10, 2024