How to extract cover image from an e-book

I have successfully used Google Drive and Insync to organize all of the e-books that I have acquired over the last few years, but I now plan to upload them to my personal DokuWiki instance, since I use it more every day. Before I can start, I need to extract the cover images so that the result looks decent.

Requirements

Installing the ImageMagick package alone is enough to perform PDF to image conversion.

$ sudo apt-get install imagemagick

Additionally, you can install the Poppler utilities to get PDF details.

$ sudo apt-get install poppler-utils

Extract single cover image

Use the convert utility to convert the first page to an image.

$ convert Linux-Voice-Issue-016.pdf[0] Linux-Voice-Issue-016.png

You can perform additional operations on the image during the conversion process. In this example the image is resized to fit within 200x300 pixels while keeping its aspect ratio.

$ convert Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png

Note that from ImageMagick's point of view page numbers start from 0, so the first page is [0].
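Because of the zero-based numbering it is easy to be off by one when you think in printed page numbers. The following is a minimal sketch of a helper that turns a 1-based page number into the argument ImageMagick expects. The function name page_spec is my own invention for illustration and is not part of ImageMagick.

```shell
#!/bin/bash
# page_spec is a hypothetical helper, not an ImageMagick command.
# It builds the "file.pdf[index]" argument from a 1-based page number.
page_spec() {
  local file="$1" page="$2"
  printf '%s[%d]\n' "$file" "$((page - 1))"
}

# Page 1 of the PDF corresponds to index 0
page_spec Linux-Voice-Issue-016.pdf 1
```

The result can be passed straight to convert, for example convert "$(page_spec Linux-Voice-Issue-016.pdf 1)" cover.png.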

Extract multiple cover images

Use a simple Bash shell script to extract and store cover images from the e-books found in sub-directories.

#!/bin/bash
# Create cover images from e-books in sub-directories
# This shell script is not recursive

# maximum width and height of the output image
maxsize="200x200"

for directory in */; do
  if [ -d "$directory" ]; then
    echo "Processing sub-directory: ${directory%%/}"
    mkdir -p "${directory}covers"
    for ebook in "${directory}"*.pdf; do
      ebook="$(basename "$ebook")"
      # skip covers that already exist and the unexpanded pattern when no PDF matches
      if [ ! -f "${directory}covers/${ebook%%.pdf}.png" ] && [ -f "${directory}${ebook}" ]; then
        echo "  Processing e-book: $ebook"
        convert "${directory}${ebook}[0]" -resize "$maxsize" "${directory}covers/${ebook%%.pdf}.png" 2>/dev/null
      fi
    done
  fi
done

The output will look similar to the following.

Processing sub-directory: BSDmag
  Processing e-book: BSD_2008_01.pdf
  Processing e-book: BSD_2008_02.pdf
[...]
Processing sub-directory: LinuxFormat
  Processing e-book: LXF134.complete.pdf
  Processing e-book: LXF135.book.pdf
[...]
Processing sub-directory: LinuxVoice
  Processing e-book: Linux-Voice-Issue-001.pdf
  Processing e-book: Linux-Voice-Issue-002.pdf
[...]
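The script above only looks one level deep. If your e-books are nested more deeply, a recursive variant could look like the sketch below. It assumes the same layout (a covers directory next to each PDF file) and the same 200x200 limit. The function names cover_path and extract_covers are hypothetical, chosen only for this example.

```shell
#!/bin/bash
# cover_path is a hypothetical helper: it maps "dir/book.pdf"
# to "dir/covers/book.png", the layout used by the script above.
cover_path() {
  local pdf="$1"
  local dir base
  dir="$(dirname "$pdf")"
  base="$(basename "$pdf" .pdf)"
  printf '%s/covers/%s.png\n' "$dir" "$base"
}

# extract_covers walks the whole tree below the given directory
# (default: current directory) and creates missing cover images.
extract_covers() {
  local top="${1:-.}" pdf cover
  find "$top" -type f -name '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
      cover="$(cover_path "$pdf")"
      [ -f "$cover" ] && continue   # keep existing covers
      mkdir -p "$(dirname "$cover")"
      echo "Processing e-book: $pdf"
      convert "${pdf}[0]" -resize 200x200 "$cover" 2>/dev/null
    done
}

# Usage (requires ImageMagick):
#   extract_covers .
```

Using find with -print0 keeps file names with spaces intact, which the simple glob in the original script also handles, but only for a single directory level.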

Simple shell script to generate wiki content

It is just an ugly snippet, but it will help you quickly build a list of PDF files.

#!/bin/bash
# create DokuWiki content
# create list of PDF files in current directory

dir=$(basename "$(pwd)")

for pdf in *.pdf; do
cat << EOF
{{:bookshelf:$dir:covers:${pdf%%.pdf}.png?nolink |}}
**$(echo "$pdf" | sed -e 's/[.]pdf$//' -e 's/[_-]/ /g')**\\\\
//$(pdfinfo "$pdf" | sed -ne "/Author:/ {s/^Author:\ *//;p}")//{{:bookshelf:$dir:${pdf}|Download e-book}}
----

EOF
done

Sample output.

[...]
{{:bookshelf:pragprog:covers:the-viml-primer_p1_0.png?nolink |}}
**the viml primer p1 0**\\
//Benjamin Klein//{{:bookshelf:pragprog:the-viml-primer_p1_0.pdf|Download e-book}}
----

{{:bookshelf:pragprog:covers:tmux_p3_0.png?nolink |}}
**tmux p3 0**\\
//Brian P. Hogan//{{:bookshelf:pragprog:tmux_p3_0.pdf|Download e-book}}
----
[...]

Note that DokuWiki does not like mixed-case names; see the Page Names documentation.
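One way to sidestep the problem is to lowercase the file names before uploading them. The sketch below assumes that lowercasing is all you need. DokuWiki's own name cleaning covers more than letter case, and this example does not try to reproduce it. The function name wiki_name is hypothetical.

```shell
#!/bin/bash
# wiki_name is a hypothetical helper that lowercases a file name.
# It only changes letter case; other characters are left untouched.
wiki_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

wiki_name "Linux-Voice-Issue-016.pdf"
# prints: linux-voice-issue-016.pdf
```

You could rename a file with mv "$pdf" "$(wiki_name "$pdf")" before generating the wiki content.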

Additional information

The most effective way to get the number of pages in a PDF e-book is to use the pdfinfo utility from the Poppler utilities package mentioned earlier.

$ pdfinfo Linux-Voice-Issue-016.pdf | awk '/^Pages:/ { print $2 }'
116
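The same approach works for other fields that pdfinfo prints, such as Author or Title. A minimal sketch, assuming the usual "Name: value" layout of pdfinfo output; the function name pdfinfo_field is hypothetical.

```shell
#!/bin/bash
# pdfinfo_field is a hypothetical helper: it reads pdfinfo output on
# standard input and prints the value of the named field.
pdfinfo_field() {
  local field="$1"
  sed -n "s/^${field}:[[:space:]]*//p"
}

# Usage (requires poppler-utils and a real PDF file):
#   pdfinfo Linux-Voice-Issue-016.pdf | pdfinfo_field Pages
#   pdfinfo Linux-Voice-Issue-016.pdf | pdfinfo_field Author
```

Reading from standard input keeps the helper independent of pdfinfo itself, so you can also feed it saved output.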

You can use ImageMagick's identify command to get the same information, but it is very slow, as it renders every page as an image.

$ identify -format "%n\n" Linux-Voice-Issue-016.pdf | head -1
116

You can analyze the first eleven pages (indexes 0 to 10) and print the zero-based index of the one with the most colors using the following command.

$ identify -format "%s %k\n" Linux-Voice-Issue-016.pdf[0-10] | sort -nrk2 | awk 'NR==1 {print $1}'
3

This command can be very useful when the cover is not the first page and you need to search for it.
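The two steps, finding the most colorful page and converting it, can be combined. The following is a sketch under the assumption that the most colorful of the first pages really is the cover, which will not hold for every e-book. The function name most_colorful is hypothetical.

```shell
#!/bin/bash
# most_colorful is a hypothetical helper: it reads "index colors" pairs
# (one per line, as printed by identify -format "%s %k\n") and prints
# the index found on the line with the highest color count.
most_colorful() {
  sort -nrk2 | awk 'NR==1 {print $1}'
}

# Usage (requires ImageMagick and a real PDF file):
#   pdf=Linux-Voice-Issue-016.pdf
#   page=$(identify -format "%s %k\n" "${pdf}[0-10]" | most_colorful)
#   convert "${pdf}[${page}]" -resize 200x300 "${pdf%.pdf}.png"
```

The index printed by identify is already zero-based, so it can be used in the convert argument without adjustment.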
