Using find and rsync to extract files from a directory and move them

The context

I use for annotating PDFs (and websites). This works best, however, if the PDFs are online somewhere.

I use Zotero and Paperpile for citation management. Zotero in particular stores all the PDFs I collect for my bibliography locally, in a very fragmented directory structure: each entry in the bibliography manager gets its own directory, which in my case means the PDFs are spread over 7000 sub-directories.
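To get a feel for how scattered the files are, a couple of find pipelines can count the PDFs and the directories that hold them. This is a minimal sketch: `count_pdfs` and `count_pdf_dirs` are hypothetical helper names, and `~/Zotero/storage` is assumed to be the attachment folder of a default Zotero install (check the data directory in Zotero's preferences if yours differs).

```shell
#!/bin/sh
# Hypothetical helpers for gauging how fragmented a PDF collection is.

# Print the number of PDF files anywhere below directory $1.
count_pdfs() {
  find "$1" -type f -name '*.pdf' | wc -l | tr -d ' '
}

# Print the number of distinct directories that contain at least one PDF:
# strip the file name from each path, then de-duplicate the remaining paths.
count_pdf_dirs() {
  find "$1" -type f -name '*.pdf' | sed 's|/[^/]*$||' | sort -u | wc -l | tr -d ' '
}
```

For example, `count_pdfs ~/Zotero/storage` and `count_pdf_dirs ~/Zotero/storage`. As with the find command further down, `-name '*.pdf'` is case-sensitive, so files ending in `.PDF` are not counted.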

The problem

So what I want to do is the following:

  1. find and extract all downloaded PDFs in my Zotero folders
  2. upload them to a (private) bibliographic server, where I can annotate them
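The two steps above could also be run separately: first copy every PDF into one flat local folder, then push that folder to the server with a single rsync call, so that only one ssh connection is opened and later runs skip files that have not changed. This is a sketch under stated assumptions rather than a tested recipe; `collect_pdfs`, `upload_pdfs`, `~/zotero-pdfs` and `user@remote-server:/path/to/pdfs/` are hypothetical names and placeholders.

```shell
#!/bin/sh
# Hypothetical two-step version: (1) collect PDFs locally, (2) upload them.

# Step 1: copy every PDF below $1 into the flat directory $2.
# cp -p keeps timestamps; PDFs with identical names overwrite each other.
collect_pdfs() {
  mkdir -p "$2"
  find "$1" -type f -name '*.pdf' -exec cp -p {} "$2"/ \;
}

# Step 2: push the staging directory $1 to destination $2 in one rsync run.
# The trailing slash on "$1"/ copies the directory's contents, not the directory itself.
upload_pdfs() {
  rsync -avz -e ssh "$1"/ "$2"
}
```

For example, `collect_pdfs ~/Zotero/storage ~/zotero-pdfs` followed by `upload_pdfs ~/zotero-pdfs user@remote-server:/path/to/pdfs/`. The staging folder must live outside the source tree; otherwise find would pick up the copies it has just made.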

The solution

There are a couple of sites showing how to use find (to locate and extract the files) and rsync (to sync them with the remote directory), e.g. here and here.

The trouble is that, following them, I kept getting errors about files not being found. What I did find, however, was a post showing how to run the ls (list directory contents) utility from within find using the -exec option. With some minor modifications, the same approach let me call rsync without any problems.

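# "\;" ends the -exec command; user@remote-server:/path/to/pdfs/ is a placeholder destination
find /source/directory/ -name "*.pdf" -type f -exec rsync -avvz -e ssh {} user@remote-server:/path/to/pdfs/ \;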

This seemed to do the trick.
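Because this command copies every PDF into a single remote directory, two PDFs with the same file name in different Zotero sub-directories would overwrite each other. With author-year-truncated-title names like the one shown below, such clashes seem possible, so it may be worth checking beforehand. A minimal sketch; `duplicate_pdf_names` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print each PDF file name that occurs
# in more than one place below directory $1.
duplicate_pdf_names() {
  # Keep only the file name of each path, sort so that equal names are adjacent,
  # then let `uniq -d` print one copy of every repeated name.
  find "$1" -type f -name '*.pdf' | sed 's|.*/||' | sort | uniq -d
}
```

Running `duplicate_pdf_names /source/directory/` prints nothing if every file name is unique.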

Further improvements

One problem is that Zotero's PDF renaming rules are a little opaque. For example, the article

Fruin, Christine, and Fred Rascoe. “Funding Open Access Journal Publishing Article Processing Charges.” College & Research Libraries News 75, no. 5 (May 1, 2014): 240–43.

has its PDF saved by Zotero as:

Fruin and Rascoe - 2014 - Funding open access journal publishing Article pro.pdf

This is very difficult to predict: capitalisation varies, the name contains spaces, and the title is truncated.
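Capitalisation and spaces, at least, can be normalised after the files have been collected; the truncated title cannot be recovered this way. The sketch below lower-cases each name and replaces spaces with underscores. `normalise_name` and `normalise_dir` are hypothetical helpers, and `mv -n` (do not overwrite an existing file) is assumed to be available, as it is in GNU and BSD mv.

```shell
#!/bin/sh
# Hypothetical helper: lower-case NAME ($1) and replace spaces with underscores.
normalise_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Hypothetical helper: rename every PDF directly inside directory $1
# to its normalised name, without overwriting existing files.
normalise_dir() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue            # no PDFs: the glob stays unexpanded
    new="$1/$(normalise_name "$(basename "$f")")"
    [ "$f" = "$new" ] || mv -n "$f" "$new"
  done
}
```

For example, `normalise_name "Fruin and Rascoe - 2014 - Funding open access journal publishing Article pro.pdf"` prints `fruin_and_rascoe_-_2014_-_funding_open_access_journal_publishing_article_pro.pdf`. `normalise_dir` only looks at the top level of a folder, so it suits a flat folder of collected PDFs rather than Zotero's own storage directory, where renaming files behind Zotero's back would likely break its attachment links.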

I have found a potential solution: Zotfile, a Zotero plugin that includes some renaming utilities. I have two concerns, though. When I did some testing, it looked like it might have trouble with my author-less files. Also, I might have to keep re-running it to rename new files, which could cause additional trouble.
