Operating System - OpenVMS

Robert Atkinson
Respected Contributor

Looking For PDF Search Program

Does anyone know of a search engine/program that can find text in PDF files, and will run on VMS?
12 replies
Antoniov.
Honored Contributor

Re: Looking For PDF Search Program

Search here
http://decwarch.free.fr/pspdf.html

Antonio Maria Vigliotti
Robert Atkinson
Respected Contributor

Re: Looking For PDF Search Program

Antoniov, I hunted all through the Ghostscript sites but couldn't find anything relating to a search facility.

I also hunted for an email address to pose the question, but still came up blank.

Could you point me in the right direction for either? I don't want to install such a large package if it doesn't suit my needs.

Cheers, Robert.
Martin Vorlaender
Honored Contributor

Re: Looking For PDF Search Program

Sorry for answering so late.

My port of ht://Dig to VMS can search PDFs. It uses pdftotext and pdfinfo from the xpdf package.
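For illustration, here is a minimal Python sketch of that extract-then-search step. It assumes Python 3.7+ and a pdftotext executable (from xpdf) on the path; the function names are hypothetical and are not part of ht://Dig. Passing "-" as the output file makes pdftotext write the text to standard output, so no intermediate .txt file is kept.

```python
import subprocess


def pdf_to_text(path):
    """Return the text of one PDF by running pdftotext (assumed to be on the PATH)."""
    # "-enc UTF-8" selects the output encoding; "-" as the output file sends the
    # text to standard output instead of writing a .txt file next to the PDF.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def search_pdfs(paths, term, extract=pdf_to_text):
    """Return the paths whose extracted text contains term (case-insensitive)."""
    needle = term.lower()
    return [path for path in paths if needle in extract(path).lower()]
```

The `extract` parameter lets the search loop be tested, or pointed at a different extractor, without changing it. Note that this rescans every PDF on every query, which is exactly the cost an indexer like ht://Dig avoids.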

See http://www.pdv-systeme.de/users/martinv/htdig/
Robert Atkinson
Respected Contributor

Re: Looking For PDF Search Program

Martin, thanks for the links, all very useful.

One question, though: can HTDig search the PDF files without extracting the data?

If not, then it would be too cumbersome to use in my application.

Any other suggestions you can give to allow me to search a bunch of PDF files quickly would be very much appreciated.

Cheers, Robert.
Martin Vorlaender
Honored Contributor

Re: Looking For PDF Search Program

Robert,

Sorry, but no. I know of no utility that searches PDF files "natively" on VMS, i.e. without extracting the text.
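The closest thing to searching "without extracting" is to inflate each content stream in memory and look for the string there. That avoids intermediate text files, but it is still extraction under the hood. The Python sketch below is naive and hypothetical: it assumes the search term appears as a literal string inside a stream (e.g. `(Hello) Tj`), handles only FlateDecode compression, and finds streams by keyword rather than by their /Length entry. Text split across TJ kerning arrays, hex strings, or custom font encodings will be missed.

```python
import re
import zlib

# Find "stream" ... "endstream" bodies. The lookbehind stops the "stream" inside
# "endstream" from starting a new match. A real parser would use /Length instead.
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_contains(data, term):
    """Naively check whether a PDF's content streams contain the bytes `term`.

    Each stream body is searched as-is, then again after zlib (FlateDecode)
    inflation in memory. No text file is written.
    """
    for match in _STREAM_RE.finditer(data):
        raw = match.group(1)
        if term in raw:  # uncompressed stream
            return True
        try:
            inflated = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue  # not Flate data (another filter, or binary): skip it
        if term in inflated:
            return True
    return False


def search_files(paths, term):
    """Return the paths of the PDF files whose streams contain `term` (bytes)."""
    hits = []
    for path in paths:
        with open(path, "rb") as f:
            if pdf_contains(f.read(), term):
                hits.append(path)
    return hits
```

This is one reason a proper extractor such as pdftotext is normally used: it deals with font encodings and text layout that a raw byte search cannot.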
labadie_1
Honored Contributor

Re: Looking For PDF Search Program

Swish-E does it.

But as you are a subscriber to the WASD mailing list, you already know that :-)



Martin Vorlaender
Honored Contributor

Re: Looking For PDF Search Program

No, it doesn't. Quoting from http://swish-e.org/current/docs/searchdoc.html :

"Swish-e can internally only parse HTML, XML and TXT (text) files by default"

Same philosophy as ht://Dig.
Martin Vorlaender
Honored Contributor

Re: Looking For PDF Search Program

labadie_1
Honored Contributor

Re: Looking For PDF Search Program

Sorry, Martin, but I would tend to disagree: a post today on the WASD mailing list explains how to do it.

And when you search the VMS documentation at
http://pi-net.dyndns.org/cgiplus-bin/search

you sometimes find PDF documents.

So it works.

:-)
Martin Vorlaender
Honored Contributor

Re: Looking For PDF Search Program

I never said it couldn't.

But it does it the same way as ht://Dig does: by extracting the text portions inside the PDFs, and indexing those.

And as Robert indicated, that is not what he is looking for.
Robert Atkinson
Respected Contributor

Re: Looking For PDF Search Program

I'm afraid I'd have to agree with Martin on this one.

I've spent the morning reading the documentation on Swish, and although it can be largely automated, it does rely on first extracting the PDF to a text file (using PDFTOTEXT) and then running the index.
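The indexing half of that two-step pipeline can be sketched in Python as a toy inverted index, assuming the text has already been extracted (for example by pdftotext). Each word maps to the set of files containing it, so a query becomes a set lookup instead of a rescan of every PDF. This is not how Swish-e or ht://Dig actually store their indexes, only the general idea; the names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(texts):
    """Build an inverted index: lowercase word -> set of document names.

    `texts` maps a document name to its already-extracted text.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, query):
    """Return the set of documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    # Start from the first word's documents and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The extraction cost is paid once, when the index is built; after that, queries never touch the PDFs.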

Rob.
Martin P.J. Zinser
Honored Contributor

Re: Looking For PDF Search Program

Hello Robert,

Could you please give us some background on why extracting the text and then searching that is not an acceptable solution, and what exactly you expect this program to do?

Greetings, Martin

(You've already got the htdig Martin, now you have the Xpdf Martin too ;-)