Channel: sleeplessbeastie's notes

How to extract a cover image from an e-book


I have successfully used Google Drive and Insync to organize all of the e-books that I have acquired over the last few years, but I now plan to upload them to a personal DokuWiki instance, since I use it more every day. Before I can start, I need to extract the cover images to ensure a decent result.

Requirements

Installing only the ImageMagick package is enough to perform PDF-to-image conversion.

$ sudo apt-get install imagemagick

Additionally, you can install the Poppler utilities to get PDF details.

$ sudo apt-get install poppler-utils

Extract single cover image

Use the convert utility to convert the first page to an image.

$ convert Linux-Voice-Issue-016.pdf[0] Linux-Voice-Issue-016.png

You can perform additional operations (such as the resize in this example) on the image during the conversion process.

$ convert Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png

Note that, from ImageMagick's point of view, page numbers start from 0.
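
Because the index is zero-based, it is easy to be off by one when you think in printed page numbers. Here is a minimal sketch of a helper that hides the conversion. The function names page_selector and extract_page are made up for this example and are not part of ImageMagick.

```shell
#!/bin/bash
# page_selector FILE PAGE
# Print FILE with ImageMagick's zero-based page selector appended,
# given a one-based (human) page number.
page_selector() {
  local file="$1" page="$2"
  printf '%s[%d]\n' "$file" "$((page - 1))"
}

# extract_page FILE PAGE OUTPUT
# Render a single page of FILE to OUTPUT using convert.
extract_page() {
  convert "$(page_selector "$1" "$2")" "$3"
}
```

With these functions loaded, extract_page Linux-Voice-Issue-016.pdf 1 cover.png should behave like the first convert command above.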

Extract multiple cover images

Use a simple Bash shell script to extract and store cover images from e-books found in sub-directories.

#!/bin/bash
# Create cover images from e-books in sub-directories
# This shell script is not recursive

# maximum width and height of the output image
maxsize="200x200"

for directory in */;do
  if [ -d "$directory" ]; then
    echo "Processing sub-directory: ${directory%/}"
    mkdir -p "${directory}covers"
    for ebook in "${directory}"*.pdf; do
      ebook="$(basename "$ebook")"
      if [ ! -f "${directory}covers/${ebook%%.pdf}.png" ] && [ -f "${directory}${ebook}" ]; then
        echo "  Processing e-book: $ebook"
        convert "${directory}${ebook}[0]" -resize "$maxsize" "${directory}covers/${ebook%%.pdf}.png" 2>/dev/null
      fi
    done
  fi
done

The output will look similar to the following.

Processing sub-directory: BSDmag
  Processing e-book: BSD_2008_01.pdf
  Processing e-book: BSD_2008_02.pdf
[...]
Processing sub-directory: LinuxFormat
  Processing e-book: LXF134.complete.pdf
  Processing e-book: LXF135.book.pdf
[...]
Processing sub-directory: LinuxVoice
  Processing e-book: Linux-Voice-Issue-001.pdf
  Processing e-book: Linux-Voice-Issue-002.pdf
[...]
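
The script above only looks one level deep. If your e-books are nested more deeply, one option is to let find walk the directory tree. The following is a rough sketch rather than a drop-in replacement. The function names cover_path and extract_all_covers are made up for this example, it keeps the same covers sub-directory layout, and it assumes that file names do not contain newline characters.

```shell
#!/bin/bash
# cover_path PDF
# Print the cover image path for PDF: a "covers" directory next to the
# PDF, with the .pdf extension replaced by .png.
cover_path() {
  local pdf="$1"
  local dir name
  dir="$(dirname "$pdf")"
  name="$(basename "$pdf" .pdf)"
  printf '%s/covers/%s.png\n' "$dir" "$name"
}

# extract_all_covers [ROOT] [MAXSIZE]
# Walk ROOT recursively and create a cover for every PDF that lacks one.
extract_all_covers() {
  local root="${1:-.}" maxsize="${2:-200x200}" pdf cover
  find "$root" -type f -name '*.pdf' | while IFS= read -r pdf; do
    cover="$(cover_path "$pdf")"
    # skip e-books that already have a cover image
    [ -f "$cover" ] && continue
    mkdir -p "$(dirname "$cover")"
    echo "Processing e-book: $pdf"
    # stdin is redirected so convert cannot consume the list of files
    convert "${pdf}[0]" -resize "$maxsize" "$cover" 2>/dev/null </dev/null
  done
}
```

Calling extract_all_covers without arguments starts in the current directory and uses the same 200x200 limit as the script above.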

Simple shell script to generate wiki content

It is just an ugly snippet, but it will help you quickly build a list of PDF files.

#!/bin/bash
# create DokuWiki content
# create list of PDF files in current directory

dir=$(basename "$(pwd)")

for pdf in *.pdf; do
cat << EOF
{{:bookshelf:$dir:covers:${pdf%%.pdf}.png?nolink |}}
**$(echo "${pdf%%.pdf}" | sed "s/[_-]/ /g")**\\\\
//$(pdfinfo "$pdf" | sed -ne "/Author:/ {s/^Author:\ *//;p}")//
{{:bookshelf:$dir:${pdf}|Download e-book}}
----

EOF
done

Sample output.

[...]
{{:bookshelf:pragprog:covers:the-viml-primer_p1_0.png?nolink |}}
**the viml primer p1 0**\\
//Benjamin Klein//
{{:bookshelf:pragprog:the-viml-primer_p1_0.pdf|Download e-book}}
----

{{:bookshelf:pragprog:covers:tmux_p3_0.png?nolink |}}
**tmux p3 0**\\
//Brian P. Hogan//
{{:bookshelf:pragprog:tmux_p3_0.pdf|Download e-book}}
----
[...]
Note that DokuWiki does not like mixed-case names; see the Page Names documentation.
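
If your PDF files use mixed-case names, you may want to rename them before generating the wiki content. Here is a small sketch, assuming that lowercasing alone is enough for your file names. The helper names to_lowercase and lowercase_pdfs are made up for this example, and DokuWiki may apply further name cleaning that this does not reproduce.

```shell
#!/bin/bash
# to_lowercase NAME
# Print NAME converted to lowercase.
to_lowercase() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# lowercase_pdfs
# Rename every mixed-case *.pdf file in the current directory to its
# lowercase form, skipping names that would overwrite an existing file.
lowercase_pdfs() {
  local pdf target
  for pdf in *.pdf; do
    [ -f "$pdf" ] || continue
    target="$(to_lowercase "$pdf")"
    # nothing to do when the name is already lowercase
    [ "$pdf" = "$target" ] && continue
    if [ -e "$target" ]; then
      echo "Skipping $pdf: $target already exists" >&2
    else
      mv -- "$pdf" "$target"
    fi
  done
}
```

Run lowercase_pdfs inside a sub-directory before extracting covers, so that the cover images and wiki links are generated from the final file names.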

Additional information

The most effective way to get the number of pages in a PDF e-book is to use the pdfinfo utility from the Poppler utilities package mentioned earlier.

$ pdfinfo Linux-Voice-Issue-016.pdf | awk '/^Pages:/ { print $2 }'
116
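
To apply the same idea to a whole directory, the awk filter can be wrapped in a function and called in a loop. This is only a sketch. The function names pages_from_pdfinfo and list_page_counts are made up for this example.

```shell
#!/bin/bash
# pages_from_pdfinfo
# Read pdfinfo output on standard input and print the page count.
pages_from_pdfinfo() {
  awk '/^Pages:/ { print $2 }'
}

# list_page_counts
# Print "COUNT FILE" for every PDF file in the current directory.
list_page_counts() {
  local pdf
  for pdf in *.pdf; do
    [ -f "$pdf" ] || continue
    printf '%s %s\n' "$(pdfinfo "$pdf" 2>/dev/null | pages_from_pdfinfo)" "$pdf"
  done
}
```

The count comes first on each line, so the result of list_page_counts can be piped to sort -n to order e-books by length.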

You can use ImageMagick's identify command to get the same information, but it is very slow, as it renders every page as an image.

$ identify -format "%n\n" Linux-Voice-Issue-016.pdf | head -1
116

You can analyze the first eleven pages (indices 0 to 10) and print the index of the one with the most unique colors using the following command.

$ identify -format "%s %k\n" Linux-Voice-Issue-016.pdf[0-10] | sort -nrk2 | awk 'NR==1 {print $1}'
3

This command can be very useful if you need to search for the cover image.
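
The two steps can be combined: find the most colorful page, then extract that page instead of page 0. A minimal sketch follows, assuming that the most colorful of the first pages really is the cover, which will not hold for every e-book. The function names most_colorful_scene and extract_colorful_cover are made up for this example.

```shell
#!/bin/bash
# most_colorful_scene
# Read "INDEX COLORS" lines on standard input and print the index
# of the line with the highest color count.
most_colorful_scene() {
  sort -k2,2nr | awk 'NR==1 { print $1 }'
}

# extract_colorful_cover PDF OUTPUT [LAST]
# Inspect pages 0..LAST (default 10) of PDF and render the one with
# the most unique colors to OUTPUT.
extract_colorful_cover() {
  local pdf="$1" output="$2" last="${3:-10}" scene
  scene="$(identify -format "%s %k\n" "${pdf}[0-${last}]" 2>/dev/null | most_colorful_scene)"
  # give up if identify produced no usable output
  [ -n "$scene" ] || return 1
  convert "${pdf}[${scene}]" "$output"
}
```

Keep in mind that identify has to render every inspected page, so a small LAST value keeps the run time reasonable.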

