Re: HTDIG - A couple of post-implementation questions...

Paul Beaudoin · ‎06-01-2009

Gents (Especially Martin),

I have managed to get HTDIG installed and started, with initial indexing and queries working on AXP VMS 8.2 (unpatched), but now I need to finish the job and can't find the advice I need. How do I update the indexes regularly? It looks like I just run HTDIG.EXE with no options, but do I then need to run the merge program(s) or other 'post-dig' routines to integrate the new results? Also, where can I find (assuming I need them) the 'helper' plugins to index PDF, DOC and other common files? The link in the documentation gives a 404 and an error in German (which I can't read).
Any pointers most welcome
Regards

Paul

Willem Grooters · ‎06-01-2009

It's been some time since I installed it and had it running at a customer site. IIRC, you'll need to run the update in batch at regular intervals; you can use a switch to either update the database or create a completely new one. Which one is more feasible depends on the size and number of updates. I also tried indexing real .PDF and .DOC (Office?) files, but IIRC, the tool has been designed to be used with HTML (or plain text) files only.
I'll check my archives to see if I can find an example.

Willem Grooters
OpenVMS Developer & System Manager

Volker Halle · ‎06-01-2009

Paul,

I regularly run a DCL procedure in batch that includes the command:

$ @HTDIG_ROOT:[BIN]RUNDIG -vv -s

The .LOG file is huge, but you can probably get away with just -v, or with no verbosity option at all.

Volker.

Martin Vorlaender · ‎06-01-2009

Paul,

you need to schedule a batch job to run HTDIG_ROOT:[BIN]RUNDIG.COM -a, see http://www.htdig.org/running.html and http://www.htdig.org/rundig.html . Be aware that this generates DB files owned by the account running the batch job, which may be inaccessible to HTSEARCH, so a subsequent SET SECURITY may be needed.

ht://Dig only indexes text and HTML, so everything else needs to be converted to one of these formats. Using utilities from XPDF and antidoc I built an extension package to index DOC and PDF files; it is located at http://vms.pdv-systeme.de/users/martinv/htdig/ . For "other common files" you need to find a converter to text or HTML, and use that in a similar way.
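To illustrate the conversion step only, here is a minimal Python sketch that picks a converter by file extension and captures its text output. It assumes `pdftotext` (from XPDF) and `antiword` are on the path and callable as shown; on VMS they would typically be defined as foreign commands instead. The names `CONVERTERS`, `converter_command` and `convert_to_text` are hypothetical, and this is not how Martin's package hooks into ht://Dig.

```python
import os
import subprocess
from typing import List, Optional

# Hypothetical extension -> command-line mapping (assumption, not ht://Dig config).
# pdftotext writes to standard output when the output file name is "-";
# antiword writes plain text to standard output by default.
CONVERTERS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],
    ".doc": lambda path: ["antiword", path],
}


def converter_command(path: str) -> Optional[List[str]]:
    """Return the converter command for path, or None if no converter is known."""
    ext = os.path.splitext(path)[1].lower()
    build = CONVERTERS.get(ext)
    return build(path) if build else None


def convert_to_text(path: str) -> Optional[str]:
    """Run the matching converter and return its text output (None if unsupported)."""
    cmd = converter_command(path)
    if cmd is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For "other common files", the same pattern applies: add an entry that maps the extension to whatever text or HTML converter is available.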

HTH,
Martin

Steven Schweda · ‎06-01-2009

> [...] XPDF and antidoc [...]

XPDF and antiword?

Martin Vorlaender · ‎06-01-2009

> XPDF and antiword?

Of course (also corrected on my ht://Dig webpage).

Antiword for VMS, ported by Sepp Huber, http://wwwvms.mppmu.mpg.de/~huber/pds/

cu,
Martin

Paul Beaudoin · ‎06-02-2009

Thanks one and all. After experimenting last night, I found a few other interesting aspects. The dig follows links, so even docs within the same directory won't be indexed unless some other 'dug' doc explicitly links to them. This was probably obvious to everyone but me. I am looking for a mechanism by which I can just copy docs into an area (semi-organised by directory) and have them become automatically available to web browsers in some sort of presentable way. I've worked this out previously, but now need to do it on a much larger scale (thousands of documents). While htdig will find most of them and the presentation is excellent, I'll need to work out something to 'leave the trail of breadcrumbs' that it can follow.
As it stands, the questions were answered so I'll close the thread. Thanks again for the help

Paul

Paul Beaudoin · ‎06-02-2009

Thanks

Martin Vorlaender · ‎06-02-2009

Paul,

>>>
The dig follows links such that even docs within the same directory, if not explicitly linked to by some other 'dug' doc won't be indexed.
<<<

This is due to the fact that ht://Dig works strictly over HTTP (you can tell it that some web directory corresponds to some device directory, and it will then access the files directly, but that is purely a performance optimization).

>>>
I am looking for mechanism by which I can just copy docs into an area (semi-organised by directory) and these become automatically available to web browsers in some sort of presentable way.
<<<

If a directory is accessible to the web server, doesn't have an index document, and the web server is allowed to browse it, any web browser will show a directory listing, and any HTTP-based search engine will index all files there.
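Where the web server is not allowed to browse a directory, a generated index page can serve as the 'trail of breadcrumbs' instead. Below is a minimal Python sketch, assuming Python is available on the system, that the server serves a file named `index.html` as the directory index, and that ht://Dig's start URL points at the top-level page. `INDEX_NAME`, `build_index_page` and `write_index_pages` are hypothetical names, and VMS file-name details (versions, case) are not handled.

```python
import html
import os
from typing import List
from urllib.parse import quote

# Assumption: the web server serves this file as the directory's index document.
INDEX_NAME = "index.html"


def build_index_page(title: str, dirnames: List[str], filenames: List[str]) -> str:
    """Return an HTML page linking every subdirectory and file in one directory."""
    items = []
    for name in sorted(dirnames):
        # Trailing slash so the server serves the subdirectory's own index page.
        items.append(f'<li><a href="{quote(name)}/">{html.escape(name)}/</a></li>')
    for name in sorted(filenames):
        if name == INDEX_NAME:
            continue  # don't link the generated page to itself
        items.append(f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>')
    t = html.escape(title)
    body = "\n".join(items)
    return (f"<html><head><title>{t}</title></head><body>\n"
            f"<h1>{t}</h1>\n<ul>\n{body}\n</ul>\n</body></html>\n")


def write_index_pages(root: str) -> List[str]:
    """Write an index page into root and every directory below it; return the paths."""
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        title = "Index of " + os.path.relpath(dirpath, root)
        page = build_index_page(title, dirnames, filenames)
        target = os.path.join(dirpath, INDEX_NAME)
        with open(target, "w", encoding="utf-8") as f:
            f.write(page)
        written.append(target)
    return written
```

Re-running the script before each scheduled dig (for example, from the same batch job that runs RUNDIG) would keep the pages in step with newly copied documents.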

cu,
Martin
