
How to retrieve all the images from a website


A few weeks ago I posted a few rather bizarre screenshots on Twitter: compositions of all the submissions for #ScreenshotSaturday, loosely ordered by colour. In this series of posts I’ll briefly explain how I did that using Python.

You can download the original pictures (16 MB, 71 MB, 40 MB, 13 MB) here.

Step 1: How to retrieve all the screenshots

The first step is, of course, to download all the screenshots. After a few attempts, I decided to save them to my HDD so that I could try several visualisation and analysis techniques without querying ScreenshotSaturday every time. All the pages with previous screenshots are reachable at www.screenshotsaturday.com/week_x.html, where x is the week number, which makes them very easy to access. Then we just have to retrieve all the images linked in each page.

Line 9 uses BeautifulSoup to navigate the DOM of the retrieved HTML document. ScreenshotSaturday has no specific tag that identifies the screenshots. However, they are the only elements that use milkbox for the zoom-in effect. Line 12 queries all of these hyperlinks.
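As a rough illustration of this step, here is a minimal sketch. It is not the original script, so the line numbers mentioned above do not refer to it. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser. The names MilkboxLinkParser and extract_image_urls are hypothetical, and the way milkbox links are recognised (a rel value starting with "milkbox", or a data-milkbox attribute) is an assumption that should be checked against the real page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MilkboxLinkParser(HTMLParser):
    """Collects the href of every <a> tag that looks like a milkbox link."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # Assumption: milkbox links carry rel="milkbox..." or a data-milkbox
        # attribute. Inspect the real page and adjust this test if needed.
        rel = attributes.get("rel") or ""
        is_milkbox = rel.startswith("milkbox") or "data-milkbox" in attributes
        if href and is_milkbox:
            # Resolve relative links against the URL of the page.
            self.image_urls.append(urljoin(self.base_url, href))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all milkbox-style links found in `html`."""
    parser = MilkboxLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

Calling extract_image_urls(html, page_url) on the text of a weekly page should return the list of screenshot URLs, while ordinary navigation links are ignored.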

The code refers directly to ScreenshotSaturday, but it can be easily adapted to any other website which has a similar structure.

One problem I’ve been experiencing is that some pictures are decoded incorrectly. I believe this might be caused by line 21, which makes strong assumptions about the way pixels are encoded.

Step 2: Download from different pages

Now, to download all the screenshots:
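What follows is a minimal sketch of such a loop, not the original script. It assumes the week_x.html pattern described earlier. The names fetch_bytes and download_weeks, the output folder, and the two-second delay are all hypothetical choices. The find_images argument stands for whatever function extracts the screenshot URLs from a page: it receives the page HTML and the page URL and returns a list of image URLs.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse

# URL pattern taken from the post; "{}" is replaced by the week number.
PAGE_URL = "http://www.screenshotsaturday.com/week_{}.html"


def fetch_bytes(url):
    """Return the raw bytes served at `url`."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_weeks(weeks, find_images, out_dir="screenshots",
                   delay=2.0, fetch=fetch_bytes):
    """Save every image linked from each weekly page into `out_dir`.

    Returns the list of file paths written during this run.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for week in weeks:
        page_url = PAGE_URL.format(week)
        try:
            html = fetch(page_url).decode("utf-8", errors="replace")
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("Skipping", page_url, error)
            continue
        for image_url in find_images(html, page_url):
            name = os.path.basename(urlparse(image_url).path)
            path = os.path.join(out_dir, "week{}_{}".format(week, name))
            if os.path.exists(path):
                continue  # already downloaded on a previous run
            try:
                data = fetch(image_url)
            except OSError as error:
                print("Skipping", image_url, error)
                continue
            with open(path, "wb") as handle:
                handle.write(data)
            saved.append(path)
            time.sleep(delay)  # pause to spread the requests out
    return saved
```

A call such as download_weeks(range(1, 11), find_images) would then walk through the first ten weekly pages. Because files that already exist are skipped, the script can be stopped and restarted without downloading everything again.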

Leave it running overnight, and make sure you have 10 GB of free space on your HDD.

Conclusion

This series of posts explains how to use Python to download all the images from a website. In the next post I’ll discuss a more interesting (and less programming-oriented) topic: how to find the main colours in an image using clustering techniques.

Before everybody tries this, I want to remind you that the guys at ScreenshotSaturday might not be too happy if you overload their servers with too many requests. I grabbed the screenshots over a week, with some delay between requests to spread out the traffic. Download responsibly.
