How tumblr-backup Works
Overview
By default, tumblr-backup backs up all posts in HTML format.
The generated directory structure looks like this:
./ - the current directory
    <outdir>/ - your blog backup
        index.html - table of contents with links to the monthly pages
        backup.css - the default backup style sheet
        custom.css - the user's style sheet (optional)
        override.css - the user's style sheet override (optional)
        archive/
            <yyyy-mm-pnn>.html - the monthly pages
            …
        posts/
            <id>.html - the single post pages
            …
        media/
            <image.ext> - image files
            <audio>.mp3 - audio files
            <video>.mp4 - video files
            …
        json/
            <id>.json - the original JSON posts
            …
        tags/
            index.html - the index of all tag indices
            <tag>/index.html - the index for <tag>
                archive/
                    <yyyy-mm-pnn>.html - the monthly pages for <tag>
        theme/
            avatar.<ext> - the blog's avatar
            style.css - the blog's style sheet
The default <outdir> is the blog name.
Directory Structure with -D Option
If option -D is used, one folder per post is generated, and the post's
images are saved in the same folder. The monthly archive is also stored in a
folder per month. This results in the same URL structure as on the Tumblr page.
The directories look like this:
./ - the current directory
    <outdir>/ - your blog backup
        index.html - table of contents with links to the monthly pages
        backup.css - the default backup style sheet
        custom.css - the user's style sheet (optional)
        override.css - the user's style sheet override (optional)
        archive/
            <yyyy-mm-pnn>/
                index.html - the monthly page
            …
        posts/
            <id>/
                index.html - the single post page
                <image.ext> - the image file(s) for this post
                <audio>.mp3 - audio files
                <video>.mp4 - video files
                …
            …
        json/
            <id>.json - the original JSON posts
            …
        theme/
            avatar.<ext> - the blog's avatar
            style.css - the blog's style sheet
Page Generation and Styling
The modification time of the single post pages is set to the post's timestamp. tumblr-backup applies a simple style to the saved pages. All generated pages are HTML5.
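As an illustration of the timestamp behavior, the following minimal sketch stamps a file with a post's Unix timestamp using the standard library. The function name `set_post_mtime`, the file name and the timestamp value are assumptions made for this example; this is not tumblr-backup's actual code.

```python
import os
import tempfile


def set_post_mtime(path, post_timestamp):
    """Set access and modification time of `path` to the post's Unix timestamp."""
    os.utime(path, (post_timestamp, post_timestamp))


# Usage: create a stand-in post page and stamp it (hypothetical id and timestamp)
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "12345.html")
    with open(page, "w", encoding="utf-8") as f:
        f.write("<!DOCTYPE html><title>post</title>")
    set_post_mtime(page, 1_300_000_000)
    stamped_mtime = int(os.path.getmtime(page))
```

With the modification time set this way, tools such as `ls -lt` sort the saved pages by posting date rather than by backup date.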
The index pages are recreated from scratch after every backup, based on the
existing single post pages. Normally, the index and monthly pages are in
reverse chronological order, i.e. more recent entries on top. The options -R
and -r can be used to reverse the order.
Option --tag-index creates a tag index for each tag used in the posts.
It can be reached through the "Tag index" link in the main index.
If you want to use a custom CSS file, call it custom.css, put it in the backup
folder and do a complete backup. Without a custom CSS file, tumblr-backup saves
a default style sheet in backup.css. The blog's style sheet itself is always
saved in theme/style.css.
If you want to override just a few default styles, create the file
override.css in the backup folder. This file is included automatically by the
default style sheet. You may have to mark your overriding styles with
!important to make them stick because override.css is imported first in the
style sheet.
Image Handling
Tumblr saves some image files without extension. This probably saves a few
billion bytes in their database. tumblr-backup restores the image extensions. If
an image is already backed up, it is not downloaded again. If an image is
re-uploaded/edited, the old image is kept in the backup, but no post links to
it. The format of the image file names can be selected with the -I option.
Note that saved inline images (from non-photo posts) keep their original names. This means that only the first image with a given name is saved; later images with the same name will point to the first one.
The download of images can be disabled with option -k. In this case, the
image URLs will point to the original location.
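The two ideas from this section, restoring a missing extension and skipping images that are already saved, can be sketched as follows. This is an illustrative approximation only: the helper names, the magic-number table and the `fetch` callable are assumptions for this example, and tumblr-backup may determine the extension differently.

```python
import os
import tempfile

# Magic-number prefixes for a few common formats (illustrative, not exhaustive)
MAGIC = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def guess_extension(data):
    """Return an extension based on the first bytes, or '' if unknown."""
    for prefix, ext in MAGIC:
        if data.startswith(prefix):
            return ext
    return ""


def save_image(media_dir, name, fetch):
    """Save an image unless it is already in media_dir.

    `fetch` is a hypothetical callable that returns the image bytes.
    Returns (file name, whether a download happened).
    """
    base, ext = os.path.splitext(name)
    for existing in os.listdir(media_dir):
        same_base = not ext and os.path.splitext(existing)[0] == base
        if existing == name or same_base:
            return existing, False  # already backed up, skip the download
    data = fetch()
    filename = name if ext else name + guess_extension(data)
    with open(os.path.join(media_dir, filename), "wb") as f:
        f.write(data)
    return filename, True


# Usage: the second call finds the saved file and does not fetch again
calls = []


def fake_fetch():
    calls.append(1)
    return b"\x89PNG\r\n\x1a\n" + b"fake image data"


with tempfile.TemporaryDirectory() as media:
    first = save_image(media, "tumblr_abc", fake_fetch)
    second = save_image(media, "tumblr_abc", fake_fetch)
```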
EXIF Metadata
With option -e, IPTC keyword tags can be added to image files. There are
three possibilities:
-e kw1,kw2 adds the post's tags plus kw1 and kw2 as keywords
-e '' adds just the post's tags
-e - removes all keywords from the image
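The three cases can be summarized by a small function that computes the keyword list for one image. This sketch only derives the list and does not write IPTC data; the function name `image_keywords` and the removal of duplicate keywords are assumptions made for this example.

```python
def image_keywords(option_value, post_tags):
    """Return the keywords to store for an image, given the -e argument."""
    if option_value == "-":
        return []  # "-e -": remove all keywords
    # "-e ''" yields no extra keywords; "-e kw1,kw2" yields two
    extra = [kw for kw in option_value.split(",") if kw]
    # Post tags first, then the extras; drop duplicates while keeping order
    return list(dict.fromkeys(list(post_tags) + extra))


with_extras = image_keywords("kw1,kw2", ["travel", "food"])
tags_only = image_keywords("", ["travel", "food"])
cleared = image_keywords("-", ["travel", "food"])
```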
Incremental Backups
In incremental backup mode, tumblr-backup saves only posts that have higher ids than the highest id saved locally. Note that posts that are edited after being backed up are not backed up again with this option.
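The selection rule can be sketched as below, assuming the default layout where each post is stored as posts/<id>.html. The function names and the shape of the post dictionaries are hypothetical; the sketch also makes the limitation visible, since an edited post keeps its old id and is therefore filtered out.

```python
import os
import re
import tempfile


def highest_saved_id(posts_dir):
    """Return the largest post id found among <id>.html files, or 0 if none."""
    highest = 0
    for name in os.listdir(posts_dir):
        match = re.fullmatch(r"(\d+)\.html", name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest


def new_posts(posts, posts_dir):
    """Keep only posts whose id is higher than the highest id saved locally."""
    last = highest_saved_id(posts_dir)
    return [post for post in posts if post["id"] > last]


# Usage with stand-in files and posts
with tempfile.TemporaryDirectory() as posts_dir:
    for saved in ("100.html", "250.html"):
        open(os.path.join(posts_dir, saved), "w").close()
    fetched = [{"id": 300}, {"id": 250}, {"id": 200}]
    to_save = new_posts(fetched, posts_dir)
```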
In JSON backup mode, the original JSON source returned by the Tumblr API is
saved under the json/ folder in addition to the HTML format.
Automatic Archive Mode
Automatic archive mode -a is designed to be used from an hourly cron script.
It normally makes an incremental backup except if the current hour is the one
given as argument. In this case, tumblr-backup will make a full backup. An
example invocation is tumblr-backup -qa4 to do a full backup at 4 in the
morning. This option obviates the need for shell script logic to determine what
options to pass. If you don't want cron to send a mail if no new posts have been
backed up, use this crontab entry:
0 * * * * tumblr-backup -qa4 <blog-name> || test $? -eq 1
This changes the exit code 1 to 0.
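The decision that the -a option takes over from the shell script can be sketched in a few lines. The function name `backup_mode` and its return values are assumptions for this illustration, not the tool's internals.

```python
from datetime import datetime


def backup_mode(full_backup_hour, now=None):
    """Return 'full' during the given hour of the day, else 'incremental'."""
    now = now or datetime.now()
    return "full" if now.hour == full_backup_hour else "incremental"


# With -a4, the 04:00 cron run is a full backup, every other run is incremental
at_four = backup_mode(4, datetime(2024, 1, 1, 4, 0))
at_five = backup_mode(4, datetime(2024, 1, 1, 5, 0))
```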
Blosxom Format
In Blosxom format mode, the posts generated are saved in a format suitable for
re-publishing in Blosxom with the Meta plugin. Images are not
downloaded; instead, the image links point back to the original image on
Tumblr. The posts are saved in the current folder with a .txt extension. The
index is not updated.
Limiting Backed Up Posts
In order to limit the set of backed up posts, use the -n and -s options. The
most recent post is always number 0, so the option -n 200 would select the 200
most recent posts. Calling tumblr-backup -n 100 -s 200 would skip the 200 most
recent posts and back up the next 100. -n 1 is the fastest way to rebuild the
index pages.
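Because posts are numbered from 0 starting with the most recent one, the effect of -n and -s corresponds to a list slice. A minimal sketch, assuming a list of posts ordered newest first and a hypothetical helper `select_posts`:

```python
def select_posts(posts, count=None, skip=0):
    """Skip the `skip` most recent posts, then take the next `count` posts."""
    end = None if count is None else skip + count
    return posts[skip:end]


posts = list(range(500))  # stand-in for posts, index 0 is the most recent

most_recent_200 = select_posts(posts, count=200)    # like -n 200
next_100 = select_posts(posts, count=100, skip=200)  # like -n 100 -s 200
```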
The option -T limits the backup to posts of the given type. -t saves only
posts with the given tags. -Q combines both: it accepts comma-separated
requests of the form TYPE:TAG1:TAG2:…, where the tags for each post type can
be different. Omitting the TAGs is allowed; this saves posts of this type with
any or no tags. Example: -Q any:personal,quote,photo:me:self saves all posts
tagged 'personal', all quotes, and photos tagged 'me' or 'self' or 'personal'
(because of the any request).
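The -Q request syntax can be modelled as a mapping from post type to a set of tags, where an empty set means "any or no tags". The sketch below follows the description above; the function names are hypothetical, and exact, case-sensitive tag comparison is an assumption.

```python
def parse_request(spec):
    """Parse 'TYPE:TAG1:TAG2,TYPE,...' into {type: set of tags}."""
    request = {}
    for part in spec.split(","):
        post_type, *tags = part.split(":")
        request[post_type] = set(tags)
    return request


def matches(post_type, post_tags, request):
    """True if the post satisfies the request for its own type or for 'any'."""
    post_tags = set(post_tags)
    for key in (post_type, "any"):
        if key in request:
            wanted = request[key]
            # No tags requested: every post of this type matches
            if not wanted or wanted & post_tags:
                return True
    return False


request = parse_request("any:personal,quote,photo:me:self")
```

With this request, a text post tagged 'personal' matches through the any entry, every quote matches, and a photo matches if it is tagged 'me', 'self' or 'personal'.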
The option --no-reblog suppresses the backup of reblogs of other blogs'
posts.
If you combine -n, -s, -i, -p, -t, -T, -Q and --no-reblog, only
posts matching all criteria will be backed up.