Reddpics: Download all of the images from a subreddit

Update 10/4/2015 – This now uses Reddit::Client 1.0 and OAuth. If you have an older version, you’ll need to update it and use the OAuth client ID and secret from your user preferences page. A guide on how to do that will be up shortly.
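
For reference, a Reddit::Client 1.x OAuth “script” app login looks roughly like this. This is only a sketch: the user agent string and every credential value below are placeholders, not anything from the actual script.

use Reddit::Client;

# Rough sketch of an OAuth "script" app login with Reddit::Client 1.x.
# All values below are placeholders; use the client ID and secret from
# your own preferences page.
my $reddit = Reddit::Client->new(
    user_agent => 'reddpics/1.0 by my_bot_account',
    client_id  => 'YOUR_CLIENT_ID',
    secret     => 'YOUR_SECRET',
    username   => 'my_bot_account',
    password   => 'YOUR_PASSWORD',
);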

I recently needed to get all of the images from a subreddit, so I wrote this script to do it, then decided to release it for public consumption. If the syntax highlighting below is hard to read, you can get the source here. (I tried to make it look as much like vi as possible, since that’s the environment I write most of my Perl in. It came out looking all right, but with some slightly mangled indentation.)

It will download any direct image link as well as any single-page imgur link, meaning both the links that omit the file extension, like this one, and the ones that use a 13-character filename (those actually link the imgur file to a Reddit post, if you wondered). Galleries and albums will be ignored, as will self posts and any other kind of web page. Any file whose MIME type isn’t “image” with a subtype of jpeg, png or gif will also be ignored (you can add additional types if you want).
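
To illustrate the MIME check, here’s a rough sketch of how you could ask the server for the content type before downloading anything. The helper name is made up and this isn’t the script’s exact code.

use LWP::Simple qw(head);

my %types = ( jpeg => 'jpg', png => 'png', gif => 'gif' );

# Hypothetical helper: returns a file extension if the URL serves an
# accepted image type, or nothing if the post should be skipped.
sub image_extension {
    my ($url) = @_;
    my ($content_type) = head($url) or return;   # HEAD request; the content type is the first value returned
    my ($major, $minor) = lc($content_type) =~ m{^(\w+)/(\w+)};
    return unless defined $minor && $major eq 'image' && exists $types{$minor};
    return $types{$minor};
}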

It keeps an index of all the files it downloads in output_directory/index.txt, which maps each Reddit post ID to the local filename and the original URL. You can use this to skip files you’ve already downloaded if you want; say, for example, you were making a monthly archive. For that reason it appends to the index instead of overwriting the old one. (If you were to run it 3 times in a row for the same subreddit for some reason, your index would have 3 records for each file, so you’d want to either delete the old index before starting or change the script to overwrite instead of append. Or just have whatever reads the index ignore duplicates.)
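
Sketched out, the index handling looks something like this. The helper names and the tab-separated field order are just for illustration; the script’s actual format may differ.

# Append one record per downloaded file.
sub record_download {
    my ($output_directory, $post_id, $local_filename, $url) = @_;
    open my $index, '>>', "$output_directory/index.txt" or die "can't open index: $!";
    print {$index} join("\t", $post_id, $local_filename, $url), "\n";
    close $index;
}

# On a later run, load the index first and skip anything already fetched.
sub already_downloaded {
    my ($output_directory) = @_;
    my %seen;
    if (open my $in, '<', "$output_directory/index.txt") {
        while (my $line = <$in>) {
            my ($id) = split /\t/, $line;
            $seen{$id} = 1;
        }
        close $in;
    }
    return %seen;   # later: next if $seen{$post_id};
}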

It corrects any bad file extensions as it goes— for example, a gif that’s been named with a .jpg extension.
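
One way to do that kind of correction, sketched here with placeholder variables ($url, $filename and %types), is to trust the Content-Type the server reports over whatever extension the URL happens to have:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);

# If the server says image/gif but the URL ended in .jpg, fix the extension.
if (my ($subtype) = $response->content_type =~ m{^image/(\w+)}) {
    my $ext = $types{$subtype};
    $filename =~ s/\.\w+$/.$ext/ if $ext;
}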

It also has options to stop after filling a certain amount of disk space or after a certain number of failed requests, to keep it from getting away from you if you choose to download some really crazy number of images, like every image ever from /r/pics.
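
The failed-request cutoff is just a counter that resets on every success, roughly like this (download_image is a made-up stand-in for the real download step):

my $consecutive_badreqs = 0;

for my $post (@posts) {
    if (download_image($post)) {    # hypothetical helper: true on success
        $consecutive_badreqs = 0;
    }
    else {
        $consecutive_badreqs++;
        die "Giving up after $consecutive_badreqs consecutive failed requests\n"
            if $consecutive_badreqs >= $max_consecutive_badreqs;
    }
}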

The options should be fairly self-explanatory, but just in case:

$reddit_user – your (or your bot account’s) Reddit username
$reddit_pass – your (or your bot account’s) Reddit password
$subreddit – the subreddit to fetch images from
$max_images – the maximum number of images to fetch
$max_consecutive_badreqs – die after this many consecutive posts fail to return an image for any reason
$filenames – “numbered” will number the files sequentially, “original” will use the original file name, and “reddit_id” will use the Reddit post ID
%types – the types of image files accepted and the matching file extension. These are usually the same except in the case of jpeg=>jpg.
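
Put together, a configuration block using those variables might look like this (the values are examples only):

my $reddit_user = 'my_bot_account';
my $reddit_pass = 'hunter2';
my $subreddit   = 'pics';
my $max_images  = 500;
my $max_consecutive_badreqs = 10;
my $filenames   = 'numbered';    # or 'original' or 'reddit_id'
my %types       = ( jpeg => 'jpg', png => 'png', gif => 'gif' );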

Source

I’ve had a couple of people ask how to run this. I haven’t tried it on Windows, but on Linux, you’d just install the dependencies and run it:

cpan Reddit::Client
cpan Try::Tiny
cpan LWP::Simple
./reddpics.pl
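
If the script isn’t marked executable, run chmod +x reddpics.pl first, or just invoke it as perl reddpics.pl.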

Update 10/4/2015 – Updated to work with Reddit::Client 1.0 (it now uses OAuth)

Comments

  1. babidi

    Can you kindly provide some kind of instruction on how someone who doesn’t know how to use perl code could use your script?
