Including PDFs in RST output -- Linux edition
The code below still needs to be examined to determine which portions, and with what tweaks, are appropriate for the Linux systems, and which should be overhauled entirely. Metrics for comparing the various utilities include readability, accuracy, and consistency across a large number of submissions. So far, this investigation has not been done.
Note that RT #43799 suggests using pdftops for conversion:
pdftops -paper match [file].pdf temp.ps
This will need to be compared against acroread on the Linux systems.
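One way to run such a comparison is to try each candidate converter on the same PDFs and record which ones leave usable output behind. The following is only a minimal sketch: try_converter is a hypothetical helper, not part of RST, and it only checks that a non-empty output file was produced, not that the output is readable or accurate.

```shell
# Hypothetical helper (not part of RST): run one converter command and report
# whether it left a non-empty output file behind.
# Usage: try_converter LABEL OUTPUT_FILE COMMAND [ARGS...]
try_converter () {
    local label="$1" out="$2"
    shift 2
    rm -f "$out"
    if "$@" >/dev/null 2>&1 && [ -s "$out" ]; then
        echo "$label: ok"
    else
        echo "$label: FAILED"
        return 1
    fi
}

# Example (requires pdftops to be installed; sample.pdf is a placeholder):
#   try_converter pdftops temp.ps pdftops -paper match sample.pdf temp.ps
```

Running the helper once per tool over a batch of real submissions would give a rough success count for each converter, which covers only the consistency metric; readability and accuracy still need a human look at the output.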
Including PDFs in RST output -- Solaris edition
RST can keep EPS files if you pass the -E flag (or -e) to keepFile/printFile. Likely, you will want a simple full-page rendition of each page of the submitted file, so the simpler -E flag will be appropriate (see the man page for details).
NB: The following discussion is based on experience on Solaris 8. Similar problems likely exist on other platforms, but the magic command incantations to solve the problems may be different. Some care will likely be needed to tailor these scripts to the target platform on which you're running RST.
Conversion to PostScript
The first problem is converting arbitrary PDFs to PostScript files. Several commands in the Solaris environment can attempt this.
pdf2ps
seems like a good candidate, but it has been known to be unable to convert some PDFs that are otherwise convertible.
convert
has also shown lacklustre results. The command that appears able to convert the largest number of arbitrary submitted PDFs is
acroread
.
There are two versions of
acroread
, 5.0 and 7.0. The default version is 7.0 as of Winter 2010:
[tavaskor@cpu18]:Desktop$ ls /software/*/bin/acroread
/software/acrobat-5.0/bin/acroread* /software/acrobat/bin/acroread*
/software/acrobat-7.0/bin/acroread*
[tavaskor@cpu18]:Desktop$ diff /software/acrobat-7.0/bin/acroread /software/acrobat/bin/acroread
[tavaskor@cpu18]:Desktop$
Unfortunately, version 7.0 exits without doing any work if no X forwarding is present, even though converting a file on the command line should not need an X connection. Because of this, only version 5.0 is usable for this task.
The command to convert the PDF file named by the variable $file and write the resulting PostScript file to the temporary directory $tmpdir is:
/software/acrobat-5.0/bin/acroread -toPostScript -size letter "$file" "$tmpdir"
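The conversion function later on this page looks for the converted file at $tmpdir/NAME.ps, where NAME is the PDF's file name without its .pdf suffix. A small sketch of that naming follows; expected_ps_path is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: print the path where the converted PostScript file is
# expected to appear, following the naming used by the conversion function on
# this page ($tmpdir/<PDF name without .pdf>.ps).
expected_ps_path () {
    local file="$1" tmpdir="$2"
    echo "$tmpdir/$(basename "$file" .pdf).ps"
}

# Placeholder paths, for illustration only.
expected_ps_path /submissions/a1/report.pdf /tmp/rst_work
```

Checking for this file after running acroread is a more direct success test than relying on the exit status alone.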
Page selection
RST is not designed to handle images that span multiple pages gracefully. Because of this, any multi-page images should be broken into chunks that fit on single pages nicely.
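First, you will need to know how many pages to grab. This relies on a detail of the PostScript format: in files that follow the usual document structuring conventions, each page starts with a %%Page: comment, so counting those lines gives the number of pages:
page_total=$(grep -c '%%Page:' "$base.ps")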
Then loop over that many pages, extracting each page from the PostScript file in turn:
psselect -p$current_page "$(basename "$file" .pdf).ps" "$(basename "$file" .pdf)_$current_page.ps"
Note that you will likely want to limit
page_total
to a particular hardcoded maximum (that the students are informed of) to ensure that you do not print a very large number of pages.
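The counting and limiting steps can be tried out without any real submission. The sketch below builds a small stand-in file containing only the %%Page: comments that grep looks for; the max_pages value of 3 is a hypothetical limit chosen for the example.

```shell
# Build a tiny stand-in for a converted file. Only the "%%Page:" comment at
# the start of each page matters for counting; this is not valid PostScript.
ps_file=$(mktemp)
for n in 1 2 3 4 5; do
    printf '%%%%Page: %s %s\nshowpage\n' "$n" "$n"
done > "$ps_file"

max_pages=3   # hypothetical limit; use whatever the students were told

# Count the pages, then clamp the count to the limit.
page_total=$(grep -c '%%Page:' "$ps_file")
if [ "$page_total" -gt "$max_pages" ]; then
    page_total=$max_pages
fi

echo "Will process $page_total page(s)"
rm -f "$ps_file"
```

Clamping the count, rather than rejecting the submission outright, means the first few pages of an over-long file still appear in the output.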
Conversion of individual pages to EPS
This is another case where there are several options on Solaris 8, not all of which actually work. A natural choice is
ps2eps
, which appears to work. However, its output is encoded in a format that does not allow LaTeX to resize it properly. As a result, any EPS files generated this way will be displayed on top of the header that precedes them in the output file. The command that seems to work reliably instead is ImageMagick's
convert
:
convert "$basename".ps "$basename".eps
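Putting the page selection and EPS conversion steps together, the per-page file naming can be checked with a dry run before either tool is involved. This is a sketch only: plan_page_conversions is a hypothetical helper that prints the commands rather than running them, and the path in the example call is a placeholder.

```shell
# Hypothetical helper: print (do not run) the psselect and convert commands
# for pages 1..PAGE_TOTAL of a PDF, so the file naming can be inspected.
plan_page_conversions () {
    local file="$1" page_total="$2"
    local base page
    base=$(basename "$file" .pdf)
    page=1
    while [ "$page" -le "$page_total" ]; do
        echo "psselect -p$page $base.ps ${base}_$page.ps"
        echo "convert ${base}_$page.ps ${base}_$page.eps"
        page=$((page + 1))
    done
}

# Placeholder path, for illustration only.
plan_page_conversions /path/to/a1.pdf 2
```

Each page ends up with its own NAME_N.ps and NAME_N.eps pair, which is what allows every page to be kept as a separate single-page image.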
Combining into a conversion function
These pieces can be combined in a fashion similar to the following bash function:
readonly inspacer=' '
readonly MAX_PAGES=3
process_pdf () {
    local file="$1"
    echo "Keeping $file"
    # Check that the file exists; don't do anything more if it doesn't.
    if [ ! -f "$file" ]; then
        echo "$(basename "$file") does not exist!" | tee -a "$marksheet"
        return 14
    fi
    # Convert to PostScript, saving acroread's own output for error reporting.
    local acroreport_file="$tmpdir/$(basename "$file").$$.acro.out"
    /software/acrobat-5.0/bin/acroread -toPostScript -size letter "$file" "$tmpdir" > "$acroreport_file" 2>&1
    local base=$(basename "$file" .pdf)
    if [ ! -f "$tmpdir/$base.ps" ]; then
        echo "$base.pdf could not be converted properly" | tee -a "$marksheet"
        perl -ne "print \"$inspacer$inspacer$inspacer\$_\"" < "$acroreport_file"
        keepFile "$acroreport_file" 50 -n -f -h "$base.pdf conversion errors"
        return 15
    fi
    # Count the pages, limiting the total to MAX_PAGES.
    local page_total=$(grep -c '%%Page:' "$tmpdir/$base.ps")
    if [ "$page_total" -gt "$MAX_PAGES" ]; then
        page_total="$MAX_PAGES"
    fi
    # Process the first $page_total pages of $base.pdf's PS.
    local current_page
    for (( current_page=1; current_page<=page_total; current_page++ )); do
        # Make sure all output is indented for readability.
        # Accomplish this for output from other commands by piping
        # to another process that will add spaces in front of
        # input lines.
        echo "${inspacer}Working on page $current_page..."
        (
            readonly basename="${base}_$current_page"
            psselect -p"$current_page" "$tmpdir/$base.ps" "$basename".ps
            convert "$basename".ps "$basename".eps
            keepFile "$basename".eps -E -h "$base.pdf (page $current_page)"
        ) 2>&1 | perl -ne "print \"$inspacer$inspacer$inspacer\$_\""
    done
}
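The indentation trick used in the function, piping the combined stdout and stderr of a group of commands through a filter that prefixes each line, can be isolated into a small helper. The sketch below uses sed instead of perl; indent is a hypothetical name, and the prefix must not contain characters that are special in a sed replacement, such as / or &.

```shell
# Hypothetical helper: prefix every input line with the given string
# (three spaces by default).
indent () {
    local prefix="${1:-   }"
    sed "s/^/$prefix/"
}

# Both stdout and stderr of the grouped commands pass through the filter.
( echo "first line"; echo "second line" >&2 ) 2>&1 | indent "   "
```

Because the filter sits after 2>&1, error messages from the converters are indented along with ordinary output, which keeps each page's messages visually grouped under its "Working on page" line.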
Function use
This function can be called from a loop like the following, which simply loops over every single submitted PDF file:
for file in "$submitdir"/*.pdf; do
    if [ -e "$file" ]; then
        process_pdf "$file"
    fi
done
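The -e test in that loop matters when a student has submitted no PDF files at all. The sketch below shows the behaviour on an empty scratch directory; the directory and the count variable exist only for this demonstration.

```shell
# Demonstrate on an empty scratch directory why the -e guard is needed.
submitdir=$(mktemp -d)
count=0
for file in "$submitdir"/*.pdf; do
    # With no matches, the shell normally leaves the pattern unexpanded, so
    # $file is the literal string ".../*.pdf". That file does not exist, and
    # the guard skips it.
    if [ -e "$file" ]; then
        count=$((count + 1))
    fi
done
echo "PDFs found: $count"
rmdir "$submitdir"
```

Without the guard, process_pdf would be called once with the unexpanded pattern and would report that a file named *.pdf does not exist.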