RSTIncludingPDFs (2018-09-05, YiLee)

Including PDFs in RST output -- Linux edition

The code below still needs to be examined to determine which portions can be used on the Linux systems with minor tweaks and which should be overhauled entirely. Criteria for comparing the conversion utilities include readability of the output, accuracy, and consistency across a large number of submissions. This investigation has not yet been done.

Note that RT #43799 suggests using pdftops for the conversion: pdftops -paper match [file].pdf temp.ps. This needs to be compared against acroread on the Linux systems.
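A minimal sketch of the pdftops route from RT #43799, for comparison against acroread. The helper name pdf_to_ps_pdftops is hypothetical. Writing the output as <outdir>/<name>.ps is an assumption chosen to mirror how acroread is used below. In pdftops, -paper match sets each PostScript page to the size of the corresponding PDF page.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RST): convert one PDF to PostScript with
# pdftops, as suggested on RT #43799. The output is written as
# <outdir>/<name>.ps, which is an assumed naming convention.
pdf_to_ps_pdftops () {
   local file="$1" outdir="$2"
   local out
   out="$outdir/$(basename "$file" .pdf).ps"

   if ! command -v pdftops > /dev/null 2>&1; then
      echo "pdftops is not installed" >&2
      return 1
   fi

   # -paper match: size each PostScript page to match the PDF page.
   pdftops -paper match "$file" "$out" || return 1

   # Report success (and print the path) only if the file was produced.
   [ -f "$out" ] && printf '%s\n' "$out"
}
```

For example, ps=$(pdf_to_ps_pdftops "$file" "$tmpdir") || echo "conversion failed" would make it a drop-in candidate for the comparison.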

Including PDFs in RST output -- Solaris edition

RST can keep EPS files if you pass the -E (or -e) flag to keepFile or printFile. Since you will most likely want a simple full-page rendition of each page of the submitted file, the simpler -E flag is appropriate (see the man page for details).

NB: The following discussion is based on experience on Solaris 8. Similar problems likely exist on other platforms, but the magic command incantations to solve the problems may be different. Some care will likely be needed to tailor these scripts to the target platform on which you're running RST.

Conversion to PostScript

The first problem is converting arbitrary PDFs to PostScript files. Several commands in the Solaris environment can attempt this. pdf2ps seems like a good candidate, but it has been known to fail on some PDFs that other tools can convert. ImageMagick's convert has also shown lacklustre results. The command that appears able to convert the largest number of arbitrary submitted PDFs is acroread.

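There are two versions of acroread, 5.0 and 7.0. As of Winter 2010 the default version (/software/acrobat) is 7.0. In the transcript below, diff reports no differences between it and the 7.0 binary: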

[tavaskor@cpu18]:Desktop$ ls /software/*/bin/acroread
/software/acrobat-5.0/bin/acroread*  /software/acrobat/bin/acroread*
/software/acrobat-7.0/bin/acroread*
[tavaskor@cpu18]:Desktop$ diff /software/acrobat-7.0/bin/acroread /software/acrobat/bin/acroread
[tavaskor@cpu18]:Desktop$

Unfortunately, version 7.0 exits without doing any work if no X connection (for example, X forwarding) is available, even though converting a file on the command line should not need one. Because of this, only version 5.0 is usable for this task.

The command to convert the PDF file named in $file to PostScript, writing the result to the temporary directory $tmpdir, is:

/software/acrobat-5.0/bin/acroread -toPostScript -size letter "$file" "$tmpdir"
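The function further down judges success by whether the expected .ps file appeared in $tmpdir. This is a standalone sketch of that same check. The function name pdf_to_ps_acroread and the ACROREAD override variable are hypothetical; the override is there so the binary path can be changed on another system. As the function below also assumes, acroread is expected to name its output after the input (foo.pdf becomes foo.ps).

```bash
#!/usr/bin/env bash
# Hypothetical override: point ACROREAD at a different binary if needed.
ACROREAD="${ACROREAD:-/software/acrobat-5.0/bin/acroread}"

# Hypothetical helper: convert $1 to PostScript in directory $2 and print the
# path of the result. Relies on acroread naming its output <name>.ps.
pdf_to_ps_acroread () {
   local file="$1" tmpdir="$2"
   local ps_file
   ps_file="$tmpdir/$(basename "$file" .pdf).ps"

   "$ACROREAD" -toPostScript -size letter "$file" "$tmpdir"

   # Judge success by whether the expected .ps file now exists.
   if [ ! -f "$ps_file" ]; then
      echo "$(basename "$file") could not be converted" >&2
      return 1
   fi
   printf '%s\n' "$ps_file"
}
```

Usage: ps_file=$(pdf_to_ps_acroread "$file" "$tmpdir") || return 15, for instance.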

Page selection

RST is not designed to handle images that span multiple pages gracefully. Because of this, multi-page documents should be split so that each page becomes its own single-page image.

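First, you need to know how many pages to extract. PostScript that follows Adobe's Document Structuring Conventions (DSC) marks the start of each page with a %%Page: comment, so counting those comments gives the page count. Here $base is the PDF's file name without the .pdf extension, so acroread's output is $tmpdir/$base.ps: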

page_total=$(grep -c '^%%Page:' "$tmpdir/$base.ps")
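The same count can be wrapped as a small function (the name count_ps_pages is hypothetical). Anchoring the pattern at the start of the line is a choice made here because DSC comments begin a line. The %%Pages: header does not match because of its extra "s". The count is only as good as the DSC comments the converter emits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of pages in a DSC-conforming
# PostScript file by counting the "%%Page:" comments that open each page.
# "%%Pages:" (the page-count header) does not match because of the "s".
count_ps_pages () {
   local n
   n=$(grep -c '^%%Page:' "$1")
   # grep exits 1 when nothing matches (n is then 0) and 2 on errors
   # such as a missing or unreadable file.
   if [ $? -gt 1 ]; then
      return 1
   fi
   printf '%s\n' "$n"
}
```

Usage: page_total=$(count_ps_pages "$tmpdir/$base.ps") || return 15.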

Then loop over that many pages, extracting each one from the PostScript file into its own file with psselect (from psutils):

psselect -p"$current_page" "$tmpdir/$base.ps" "$tmpdir/${base}_$current_page.ps"

Note that you will likely want to cap page_total at a hardcoded maximum (which students should be told about) so that you do not print a very large number of pages.
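Putting the count, the cap, and the psselect call together, here is a dry-run sketch. Instead of running psselect, it prints the command it would run for each page, which makes the loop bounds easy to check. The function name plan_page_extraction is hypothetical. The printed commands are not shell-quoted, so the output is for inspection only.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the psselect command for each page that
# would be extracted from $1.ps, given $2 total pages and a cap of $3 pages.
plan_page_extraction () {
   local base="$1" page_total="$2" max_pages="$3"

   # Never extract more than the advertised maximum.
   if (( page_total > max_pages )); then
      page_total=$max_pages
   fi

   local page
   for (( page = 1; page <= page_total; page++ )); do
      printf 'psselect -p%d %s.ps %s_%d.ps\n' "$page" "$base" "$base" "$page"
   done
}
```

For a five-page hw.ps with a cap of three, plan_page_extraction hw 5 3 lists commands for pages 1 to 3 only.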

Conversion of individual pages to EPS

This is another step with several options on Solaris 8, not all of which actually work. A natural choice is ps2eps, which appears to work; however, its output is encoded in a way that does not let LaTeX resize it properly. As a result, EPS files generated this way are drawn over the top of the header that precedes them in the output file. The command that seems to work reliably instead is ImageMagick's convert:

convert "$basename".ps "$basename".eps
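The ps2eps problem shows up as an EPS file that LaTeX cannot scale, so a quick sanity check on each generated file may be worthwhile. LaTeX's graphics packages take an EPS file's size from its %%BoundingBox comment. This sketch (the name looks_like_eps is hypothetical) checks for an EPSF header line and a %%BoundingBox comment. It is only a heuristic and is not claimed to detect the specific ps2eps defect described above.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: succeed if $1 starts with an EPSF header
# (e.g. "%!PS-Adobe-3.0 EPSF-3.0") and contains a %%BoundingBox comment,
# which LaTeX uses to work out the size of the image.
looks_like_eps () {
   local file="$1" first_line=""
   [ -f "$file" ] || return 1

   # Accept a first line even if the file lacks a trailing newline.
   IFS= read -r first_line < "$file" || [ -n "$first_line" ] || return 1

   case "$first_line" in
      '%!PS-Adobe-'*' EPSF-'*) ;;
      *) return 1 ;;
   esac

   grep -q '^%%BoundingBox:' "$file"
}
```

Usage: looks_like_eps "$basename".eps || echo "warning: $basename.eps may not scale in LaTeX".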

Combining into a conversion function

These pieces can be combined in a fashion similar to the following bash function:

readonly inspacer='  '
readonly MAX_PAGES=3

process_pdf () {
   local file="$1"
   echo "Keeping $file"

   # Check that the file exists; don't do anything more if it doesn't.
   if [ ! -f "$file" ]; then
      echo "$(basename "$file") does not exist!" | tee --append "$marksheet"
      return 14
   fi

   local acroreport_file="$tmpdir/$(basename "$file").$$.acro.out"

   # acroread writes $tmpdir/<name>.ps; keep its messages for the report.
   /software/acrobat-5.0/bin/acroread -toPostScript -size letter "$file" "$tmpdir" > "$acroreport_file" 2>&1

   local base
   base=$(basename "$file" .pdf)
   local ps_file="$tmpdir/$base.ps"
   if [ ! -f "$ps_file" ]; then
      echo "$base.pdf could not be converted properly" | tee --append "$marksheet"
      perl -ne "print \"$inspacer$inspacer$inspacer\$_\"" < "$acroreport_file"
      keepFile "$acroreport_file" 50 -n -f -h "$base.pdf conversion errors"
      return 15
   fi

   # Count the pages (one %%Page: comment per page) and cap at MAX_PAGES.
   local page_total
   page_total=$(grep -c '^%%Page:' "$ps_file")
   if [ "$page_total" -gt "$MAX_PAGES" ]; then
      page_total="$MAX_PAGES"
   fi

   # Process the first $page_total pages of $base.pdf's PostScript.
   local current_page
   for (( current_page=1; current_page<=page_total; current_page++ )); do
      # Make sure all output is indented for readability.
      # Output from other commands is indented by piping it
      # to a process that adds spaces in front of each input line.
      echo "${inspacer}Working on page $current_page..."

      (
      readonly page_base="$tmpdir/${base}_$current_page"
      psselect -p"$current_page" "$ps_file" "$page_base.ps"
      convert "$page_base.ps" "$page_base.eps"

      keepFile "$page_base.eps" -E -h "$base.pdf (page $current_page)"
      ) 2>&1 | perl -ne "print \"$inspacer$inspacer$inspacer\$_\""
   done
}
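The perl one-liner in process_pdf only prepends a fixed string to every line of its input. Where perl is not available, sed can do the same job. The indent filter below is a hypothetical replacement. It assumes the prefix is plain whitespace; characters such as / or & would need escaping in the sed expression.

```bash
#!/usr/bin/env bash
# Hypothetical drop-in for the perl indentation one-liner: copy stdin to
# stdout with $1 prepended to every line. Assumes $1 is plain whitespace.
indent () {
   sed "s/^/$1/"
}
```

Usage inside the page loop: ( ... ) 2>&1 | indent "$inspacer$inspacer$inspacer".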

Function Use

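This function can be called from a loop like the following, which processes every submitted PDF file. The -e test is needed because, when no PDFs were submitted, the shell leaves the unmatched pattern "$submitdir"/*.pdf in place as a literal file name: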

for file in "$submitdir"/*.pdf; do
   if [ -e "$file" ]; then
      process_pdf "$file"
   fi
done
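An alternative to the -e test is bash's nullglob option, which makes an unmatched pattern expand to nothing. This sketch (the name for_each_pdf is hypothetical) runs a given command on every PDF in a directory and then restores the caller's nullglob setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the command "$2 ..." on every PDF in directory $1.
# nullglob makes "$dir"/*.pdf expand to nothing when there are no PDFs,
# so no existence test is needed inside the loop.
for_each_pdf () {
   local dir="$1"
   shift

   # Remember the current setting, e.g. "shopt -u nullglob".
   local restore_nullglob
   restore_nullglob=$(shopt -p nullglob) || true

   shopt -s nullglob
   local pdfs=( "$dir"/*.pdf )
   $restore_nullglob

   local file
   for file in "${pdfs[@]}"; do
      "$@" "$file"
   done
}
```

Usage: for_each_pdf "$submitdir" process_pdf.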

Topic revision: r4 - 2018-09-05 - YiLee

Instructional Support Group, David R. Cheriton School of Computer Science, University of Waterloo