Operating System - OpenVMS

Paul Beaudoin
Regular Advisor

HTDIG - A couple of post-implementation questions...

Gents (Especially Martin),

I have managed to get HTDIG installed and started, with initial indexing and queries working, on AXP VMS 8.2 (unpatched), but I now need to finish the job and can't find the advice I need. How do I update the indexes regularly? It looks like I just run HTDIG.EXE with no options, but do I then need to run the merge program(s) or other 'post-dig' routines to integrate the new results? Also, where can I find (assuming I need them) the 'helper' plugins to index PDF, DOC and other common files? The link in the documentation gives a 404 and an error in German (which I can't read).
Any pointers most welcome.
Regards

Paul
Willem Grooters
Honored Contributor

Re: HTDIG - A couple of post-implementation questions...

It's been some time since I installed it and had it running at a customer site. IIRC, you'll need to run the update in batch at regular intervals; you can use a switch to either update the database or create a completely new one. Which is more feasible depends on the size and number of updates. I also tried indexing real .PDF and .DOC (Office?) files, but IIRC the tool was designed to be used with HTML (or plain text) files only.
I'll check my archives to see if I can find an example.
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: HTDIG - A couple of post-implementation questions...

Paul,

I regularly run a DCL procedure in batch that includes the command:

$ @HTDIG_ROOT:[BIN]RUNDIG -vv -s

The .LOG file is huge, but you can probably get away with just -v, or even with no -v option at all.

Volker.
Martin Vorlaender
Honored Contributor
Solution

Re: HTDIG - A couple of post-implementation questions...

Paul,

You need to schedule a batch job to run HTDIG_ROOT:[BIN]RUNDIG.COM -a , see http://www.htdig.org/running.html and http://www.htdig.org/rundig.html . Be aware that this generates DB files owned by the account running the batch job, which may be inaccessible to HTSEARCH, so a subsequent SET SECURITY may be needed.

ht://Dig only indexes text and HTML, so everything else needs to be converted to one of these formats. Using utilities from XPDF and antidoc I built an extension package to index DOC and PDF files; it is located at http://vms.pdv-systeme.de/users/martinv/htdig/ . For "other common files" you need to find a converter to text or HTML, and use that in a similar way.

HTH,
Martin
Steven Schweda
Honored Contributor

Re: HTDIG - A couple of post-implementation questions...

> [...] XPDF and antidoc [...]

XPDF and antiword?
Martin Vorlaender
Honored Contributor

Re: HTDIG - A couple of post-implementation questions...

> XPDF and antiword?

Of course (also corrected on my ht://Dig webpage).

Antiword for VMS, ported by Sepp Huber, http://wwwvms.mppmu.mpg.de/~huber/pds/

cu,
Martin
Paul Beaudoin
Regular Advisor

Re: HTDIG - A couple of post-implementation questions...

Thanks one and all - after experimenting last night, I found a few other interesting aspects. The dig follows links, so even docs within the same directory won't be indexed unless they are explicitly linked to by some other 'dug' doc. This was probably obvious to everyone but me. I am looking for a mechanism by which I can just copy docs into an area (semi-organised by directory) and have them become automatically available to web browsers in some sort of presentable way. I've worked this out previously but now need to do it on a much larger scale (1000's). While htdig will find most and the presentation is excellent, I'll need to work out something to 'leave the trail of breadcrumbs' that it can follow.
As it stands, the questions were answered so I'll close the thread. Thanks again for the help.

Paul
Paul Beaudoin
Regular Advisor

Re: HTDIG - A couple of post-implementation questions...

Thanks
Martin Vorlaender
Honored Contributor

Re: HTDIG - A couple of post-implementation questions...

Paul,

>>>
The dig follows links such that even docs within the same directory, if not explicitly linked to by some other 'dug' doc won't be indexed.
<<<

This is due to the fact that ht://Dig works strictly by HTTP (you can tell it that some web directory corresponds to some device directory, and it will then access the files directly, but that is purely a performance optimization).

>>>
I am looking for mechanism by which I can just copy docs into an area (semi-organised by directory) and these become automatically available to web browsers in some sort of presentable way.
<<<

If a directory is accessible to the web server, doesn't have an index document, and the web server is allowed to browse it, any web browser will show a directory listing, and any HTTP-based search engine will index all files there.
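
Where server-generated directory listings are not an option, one possible way to leave the 'trail of breadcrumbs' is to generate a static page that links to every file in the document area, so that a link-following indexer has something to start from. The following is only a rough sketch in Python, not part of ht://Dig: the function name build_index and the URL prefix /docs/ are made up for illustration, and it assumes the directory tree is served under that prefix.

```python
import html
import os
import sys
from urllib.parse import quote


def build_index(root_dir, base_url="/docs/"):
    """Return an HTML page with one link per file found under root_dir.

    base_url is a hypothetical URL prefix under which root_dir is
    assumed to be published by the web server.
    """
    items = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            # Path relative to the document area, with URL-style separators.
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            rel_url = rel.replace(os.sep, "/")
            href = base_url + quote(rel_url)  # percent-encode spaces etc.
            items.append(
                '<li><a href="%s">%s</a></li>'
                % (html.escape(href), html.escape(rel_url))
            )
    return (
        "<html><head><title>Document index</title></head>\n"
        "<body><ul>\n%s\n</ul></body></html>\n" % "\n".join(items)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python make_index.py <document-directory> > index.html
    print(build_index(sys.argv[1]))
```

The generated page would then be placed somewhere the dig already reaches (or listed as a start URL), and regenerated whenever documents are copied in, for example from the same batch job that runs the dig.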

cu,
Martin