Date: Mon, 11 Nov 2002 21:07:50 -0600 (CST)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: downeys@internet-ireland.ie
Cc: "ht://Dig mailing list" <htdig-general@lists.sourceforge.net>
Subject: Re: [htdig] PATCH - collections for 3.1.6 (was: HTMerge /mifluz)

According to Sean Downey:
> Thanks a million Gilles
> 
> if there was a patch for this - I think & hope - all the problems I've had
> getting it going would be solved
...
> -----Original Message-----
> From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
...
> comes close to what Sean is looking for.  Back in 1999, when Rajendra
> Inamdar first offered us a patch for this feature, it was for 3.1.3.
> See http://www.htdig.org/htdig-dev/1999/11/0181.html
> 
> However, at the time this was too radical a change to incorporate into
> the 3.1.x tree, so we suggested putting it into 3.2 instead.  In the end,
> Rajendra ended up porting the patch to 3.2, and that was the one that
> made it into the CVS tree (for 3.2) and into the mail archives.
> 
> However, 3.1.x users continued to request this feature, so at some point
> it made it into the patch archive...
> 
>   ftp://ftp.ccsf.org/htdig-patches/3.1.3/htdig-3.1.3-nntp-mdb.tar.gz
> 
> The patch also adds a couple other features, and it may take some work to
> adapt it to 3.1.6, but it's a start.  If you don't want the nntp support
> in htdig, I think you can remove everything from the patch other than
> the changes to htsearch/* source files.  There may be some complications
> due to interactions with the addition of max_excerpts handling in 3.1.6,
> as well as the changes involving coded vs unencoded URLs.  There may also
> be a few other tricky bits in main() because of recent changes there.
> I'll see if I can clean up this patch a little.

OK, I cleaned it up a lot!  There were lots of changes between 3.1.3
and 3.1.6, so it took quite a bit longer than I thought it would to
get this adapted and working on 3.1.6.  But, I did it.  I also removed
some of the superfluous stuff like the "search policy" additions and the
whole collection_names attribute which seemed to cause too much confusion
(build_select_lists does a better job of generating the followup config
file list than collection_names did).  I also put in a hook to get rid
of duplicate URLs, which occur if the same URL winds up in two or
more collections.  (I'll have to port this to 3.2 as well!)
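
To illustrate the duplicate-URL idea only (this is not the patch code), here
is a minimal standalone sketch.  It assumes the matches are already sorted
best-first, so the first hit seen for a URL is the one to keep.  The Hit
struct and dropDuplicateUrls() are hypothetical names, and the sketch compares
raw URL strings, whereas the patch decodes and rewrites each URL first.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a search hit; not an ht://Dig type.
struct Hit {
    std::string url;
    double score;
};

// Expects hits sorted best-first.  Keeps the first (best) hit for each URL
// and drops any later hit with the same URL, preserving the sort order.
std::vector<Hit> dropDuplicateUrls(const std::vector<Hit>& sorted) {
    std::unordered_set<std::string> seen;
    std::vector<Hit> unique;
    for (const Hit& h : sorted) {
        // insert() reports whether the URL was new to the set.
        if (seen.insert(h.url).second)
            unique.push_back(h);
    }
    return unique;
}
```

Filtering after the sort is what makes "keep the first one" mean "keep the
best one"; filtering before the sort would keep an arbitrary copy.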

So, here is the patch for 3.1.6 below.  Apply it in your main htdig-3.1.6
source directory using "patch -p0 < this-message".  The patch is designed
to be applied after you've already run "./configure" in the source.

To enable it, #define COLLECTIONS in include/htconfig.h (this patch does
that by default, but it gets undone after another run of ./configure).
To use it, set up a number of config files in your CONFIG_DIR, and have
them all begin with "include: htdig.conf" (or some other common config
file, in which you'll have all your attribute settings).  In each of
these config files, after the include above, override the settings of
database_dir (or database_base) and start_url, so that each config file
defines a separate database for a different collection of indexed URLs.
htsearch ends up using all of the display customization attributes from
the last selected config file, so for the sake of consistency, the config
files should share a common set of htsearch attributes.
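
As a rough sketch of how a "config" value ends up naming one of these files,
the function below mirrors the logic in the patched main() as I read it: an
empty name, or one containing "./", falls back to the default config file,
and anything else becomes <config dir>/<name>.conf.  The function name and
its parameters are hypothetical; configDir and defaultConfig stand in for
CONFIG_DIR and DEFAULT_CONFIG_FILE.

```cpp
#include <string>

// Hypothetical helper: map a collection name from the "config" CGI
// parameter to a config file path.  configDir and defaultConfig stand in
// for ht://Dig's CONFIG_DIR and DEFAULT_CONFIG_FILE.
std::string configPathFor(const std::string& name,
                          const std::string& configDir,
                          const std::string& defaultConfig) {
    // Empty names and names containing "./" are not looked up in configDir.
    if (name.empty() || name.find("./") != std::string::npos)
        return defaultConfig;
    return configDir + "/" + name + ".conf";
}
```

So with a config directory of /etc/htdig (an example path, not a default),
the checkbox value "scrc" would select /etc/htdig/scrc.conf.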

To set up your initial search form, replace the "hidden" input parameter
definition for "config" with something like:

  Collections:
  <input type="checkbox" name="config" value="scrc" checked>SCRC Pages
  <input type="checkbox" name="config" value="physiology">Physiology Pages
  <input type="checkbox" name="config" value="wcsn">Winnipeg Chapter SFN Pages
  <br>

and in the followup search forms (in header.html, wrapper.html, nomatch.html
and syntax.html), replace it with...

  Collections:
  $(CONFIG_LIST)<br>

which would be generated by these attribute settings in htdig.conf:

  build_select_lists:	CONFIG_LIST,checkbox config config_names 2 1 2 "" ""
  config_names:		scrc "SCRC Pages" physiology "Physiology Pages" \
			wcsn "Winnipeg Chapter SFN Pages"
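
When several of these checkboxes are ticked, htsearch sees several "config"
values.  The patch splits them with collectionList.Create(input["config"],
"\001") and falls back to a single empty name when none is given.  The sketch
below shows that splitting step in isolation; splitConfigValues() is a
hypothetical name, and skipping empty pieces is my assumption rather than
confirmed StringList behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split the joined "config" values on the \001
// separator used in the patch.  An empty result is replaced by one empty
// name, which stands for the default config file.
std::vector<std::string> splitConfigValues(const std::string& joined) {
    std::vector<std::string> names;
    std::string current;
    for (char c : joined) {
        if (c == '\001') {
            if (!current.empty())      // assumption: skip empty pieces
                names.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty())
        names.push_back(current);
    if (names.empty())
        names.push_back("");           // no selection: use the default
    return names;
}
```

Each name in the returned list then gets its own pass through the search
loop, one collection per config file.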

Here's the patch...

--- acconfig.h.nocoll	Thu Jan 31 17:47:18 2002
+++ acconfig.h	Mon Nov 11 16:44:14 2002
@@ -40,6 +40,10 @@
 /*  regardless of the security problems with this. */
 #undef ALLOW_INSECURE_CGI_CONFIG
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
 /* Define to remove the word count in db and WordRef struct. */
 #undef NO_WORD_COUNT
 
--- htsearch/Collection.cc.nocoll	Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.cc	Mon Nov 11 16:37:26 2002
@@ -0,0 +1,88 @@
+//
+// Collection.cc
+//
+//
+#if RELEASE
+static char RCSid[] = "$Id: Collection.cc,v 1.0 2000/03/17 18:34:23 inamdar Exp $";
+#endif
+
+#include "htsearch.h"
+#include "Collection.h"
+#include "ResultMatch.h"
+#include "WeightWord.h"
+#include "StringMatch.h"
+#include "QuotedStringList.h"
+#include "URL.h"
+#include <fstream.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <syslog.h>
+#include <locale.h>
+#include "HtURLCodec.h"
+#include "HtWordType.h"
+
+#ifdef COLLECTIONS
+//*****************************************************************************
+//
+Collection::Collection(char *name, char *word_file, char *index_file, 
+    char *doc_file)
+{
+    isopen = 0;
+    collectionName = name;
+    wordFile = word_file;
+    indexFile = index_file;
+    docFile = doc_file;
+    docIndex = NULL;
+    matches = NULL;
+    searchWords = NULL;
+    searchWordsPattern = NULL;
+}
+
+Collection::~Collection()
+{
+    Close();
+}
+
+void
+Collection::Open()
+{
+    if (!isopen)
+    {
+        docIndex = Database::getDatabaseInstance();
+        docIndex->OpenRead(indexFile);
+        docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
+        docDB.Read(docFile);
+    }
+    isopen = 1;
+}
+
+void
+Collection::Close()
+{
+    if (isopen)
+    {
+        docDB.Close();
+        docIndex->Close();
+        docIndex = NULL;
+    }
+    isopen = 0;
+}
+
+// Collection::operator [] (char *u) 
+
+DocumentRef *
+Collection::getDocumentRef(char *u)
+{
+    Open();
+    return docDB.FindCoded(u);
+    // return docDB[u];
+}
+
+int
+Collection::Get(char *key, String &data)
+{
+    Open();
+    return docIndex->Get(key, data);
+}
+
+#endif
--- htsearch/Collection.h.nocoll	Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.h	Mon Nov 11 16:36:26 2002
@@ -0,0 +1,68 @@
+//
+// Collection.h
+//
+// $Id: Collection.h,v 1.0 2000/03/17 18:34:23 inamdar Exp $
+//
+#ifndef _Collection_h_
+#define _Collection_h_
+
+#include "Object.h"
+#include "ResultList.h"
+#include "ResultMatch.h"
+#include "TemplateList.h"
+#include "cgi.h"
+#include "StringMatch.h"
+#include "List.h"
+#include "DocumentDB.h"
+#include "Database.h"
+#include "Dictionary.h"
+
+#ifdef COLLECTIONS
+class Collection : public Object
+{
+public:
+    //
+    // Construction/Destruction
+    //
+    Collection(char *name, char *wordFile, char *indexFile, char *docFile);
+    ~Collection();
+
+    void Open();
+
+    void Close();
+
+    char *getWordFile() { return wordFile.get(); }
+
+    // DocumentRef         *operator [] (char *url);
+    DocumentRef         *getDocumentRef(char *url);
+    int 		Get(char *key, String &data);
+
+    ResultList		*getResultList() { return matches; }
+    void		setResultList(ResultList *list) { matches = list; }
+
+    List                *getSearchWords() { return searchWords; }
+    void                setSearchWords(List *list) { searchWords = list; }
+
+    StringMatch         *getSearchWordsPattern() { return searchWordsPattern;}
+    void                setSearchWordsPattern(StringMatch *smatch)
+                            { searchWordsPattern = smatch; }
+
+protected:
+    String              collectionName;
+    String              wordFile;
+    String              indexFile;
+    String              docFile;
+    ResultList		*matches;
+    List                *searchWords;
+    StringMatch         *searchWordsPattern;
+    
+
+    DocumentDB          docDB;
+    Database            *docIndex;     
+
+    int                 isopen;
+};
+
+#endif // COLLECTIONS
+#endif // _Collection_h_
+
--- htsearch/Display.cc.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/Display.cc	Mon Nov 11 20:01:28 2002
@@ -10,6 +10,9 @@ static char RCSid[] = "$Id: Display.cc,v
 #endif
 
 #include "htsearch.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
 #include "Display.h"
 #include "ResultMatch.h"
 #include "WeightWord.h"
@@ -29,6 +32,11 @@ extern int		debug;
 
 //*****************************************************************************
 //
+#ifdef COLLECTIONS
+Display::Display(Dictionary *collections)
+{
+    active_collections = collections;
+#else
 Display::Display(char *indexFile, char *docFile)
 {
     docIndex = Database::getDatabaseInstance();
@@ -39,6 +47,7 @@ Display::Display(char *indexFile, char *
     docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
 
     docDB.Read(docFile);
+#endif
 
     limitTo = 0;
     excludeFrom = 0;
@@ -87,7 +96,9 @@ Display::Display(char *indexFile, char *
 //*****************************************************************************
 Display::~Display()
 {
+#ifndef COLLECTIONS
     delete docIndex;
+#endif
 }
 
 //*****************************************************************************
@@ -179,7 +190,12 @@ Display::display(int pageNumber)
     {
 	if (currentMatch >= startAt)
 	{
+#ifdef COLLECTIONS
+	    Collection *collection = match->getCollection();
+	    match->setRef(collection->getDocumentRef(match->getURL()));
+#else
 	    match->setRef(docDB.FindCoded(match->getURL()));
+#endif
 	    DocumentRef	*ref = match->getRef();
 	    if (!ref)
 		continue;	// The document isn't present for some reason
@@ -266,7 +282,11 @@ Display::displayMatch(ResultMatch *match
     String urlanchor(url);
     if (anchor)
       urlanchor << anchor;
+#ifdef COLLECTIONS
+    vars.Add("EXCERPT", excerpt(match, urlanchor, fanchor, first));
+#else
     vars.Add("EXCERPT", excerpt(ref, urlanchor, fanchor, first));
+#endif
     //
     // anchor only relevant if an excerpt was found, i.e.,
     // the search expression matches the body of the document
@@ -699,8 +719,35 @@ Display::createURL(String &url, int page
 	url << "restrict=" << encodeInput("restrict") << ';';
     if (input->exists("exclude"))
 	url << "exclude=" << encodeInput("exclude") << ';';
+
+#ifdef COLLECTIONS
+    // RMI
+    // Put out all specified collections. If none selected, resort to
+    // default behaviour
+    char *config_name = collectionList[0];
+    if (config_name && config_name[0] == '\0')
+	config_name = NULL;
+
+    if (config_name)
+    {
+	for (int i=0; i<collectionList.Count(); i++)
+	{
+	    config_name = collectionList[i];
+	    s = config_name;
+	    encodeURL(s);
+	    url << "config=" << s.get() << ';';
+	}
+    }
+    else
+    {
+	if (input->exists("config"))
+	    url << "config=" << encodeInput("config") << ';';
+    }
+#else
     if (input->exists("config"))
 	url << "config=" << encodeInput("config") << ';';
+#endif
+
     if (input->exists("method"))
 	url << "method=" << encodeInput("method") << ';';
     if (input->exists("format"))
@@ -1358,13 +1405,32 @@ Display::buildMatchList()
 
     // ... MG
 
+#ifdef COLLECTIONS
+  // RMI: deal with all collections
+  active_collections->Start_Get();
+  Collection *collection;
+  while ((collection=(Collection *)active_collections->Get_NextElement()) != 0)
+  {
+    ResultList *results = collection->getResultList();
+    if (results == NULL)
+	continue;
+#endif
+
     results->Start_Get();
     while ((id = results->Get_Next()))
     {
 	//
 	// Convert the ID to a URL
 	//
+#ifdef COLLECTIONS
+	DocMatch *dm = results->find(id);
+	Collection *collection = NULL;
+	if (dm)
+	    collection = dm->collection;
+	if (collection == NULL || collection->Get(id, coded_url) == NOTOK)
+#else
 	if (docIndex->Get(id, coded_url) == NOTOK)
+#endif
 	{
 	    continue;
 	}
@@ -1382,6 +1448,9 @@ Display::buildMatchList()
 	thisMatch = new ResultMatch();
 	thisMatch->setURL(coded_url);
 	thisMatch->setRef(NULL);
+#ifdef COLLECTIONS
+	thisMatch->setCollection(collection);
+#endif
 
 	//
 	// Get the actual document record into the current ResultMatch
@@ -1394,7 +1463,9 @@ Display::buildMatchList()
 	// known at that time, or info about the document itself, 
 	// so this still needs to be done.
 	//
+#ifndef COLLECTIONS
 	DocMatch	*dm = results->find(id);
+#endif
 	double           score = dm->score;
 
 	// We need to scale based on date relevance and backlinks
@@ -1409,7 +1480,13 @@ Display::buildMatchList()
 	if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore
 	    || timet_startdate > 0 || enddate.tm_year < endoftime->tm_year)
 	  {
+#ifdef COLLECTIONS
+	    Collection *collection = thisMatch->getCollection();
+	    DocumentRef *thisRef = collection->getDocumentRef(
+					thisMatch->getURL());
+#else
 	    DocumentRef *thisRef = docDB.FindCoded(thisMatch->getURL());
+#endif
 	    if (thisRef)   // We better hope it's not null!
 	      {
 		// code added by Mike Grommet for date search ranges
@@ -1452,6 +1529,9 @@ Display::buildMatchList()
 	//
 	matches->Add(thisMatch);
     }
+#ifdef COLLECTIONS
+  }
+#endif
 
     //
     // The matches need to be ordered by relevance level.
@@ -1464,8 +1544,14 @@ Display::buildMatchList()
 
 //*****************************************************************************
 String *
+#ifdef COLLECTIONS
+Display::excerpt(ResultMatch *match, String urlanchor, int fanchor, int &first)
+{
+    DocumentRef	*ref = match->getRef();
+#else
 Display::excerpt(DocumentRef *ref, String urlanchor, int fanchor, int &first)
 {
+#endif
     char	*head;
     int		use_meta_description = 0;
 
@@ -1486,6 +1572,15 @@ Display::excerpt(DocumentRef *ref, Strin
     String	part;
     String	*text = new String();
 
+#ifdef COLLECTIONS
+    Collection *collection = match->getCollection();
+    StringMatch *allWordsPattern = NULL;
+    if (collection)
+	allWordsPattern = collection->getSearchWordsPattern();
+    if (!allWordsPattern)
+	return text;
+#endif
+
     // htsearch displays the description when:
     // 1) a description has been found
     // 2) the option "use_meta_description" is set to true
@@ -1544,20 +1639,32 @@ Display::excerpt(DocumentRef *ref, Strin
 	if (end > temp + headLength)
 	{
 	    end = temp + headLength;
+#ifdef COLLECTIONS
+	    *text << hilight(match, start, urlanchor, fanchor);
+#else
 	    *text << hilight(start, urlanchor, fanchor);
+#endif
 	}
 	else
 	{
 	    while (*end && HtIsStrictWordChar(*end))
 		end++;
 	    *end = '\0';
+#ifdef COLLECTIONS
+	    *text << hilight(match, start, urlanchor, fanchor);
+#else
 	    *text << hilight(start, urlanchor, fanchor);
+#endif
 	    *text << config["end_ellipses"];
 	}
     }
     else
     {
+#ifdef COLLECTIONS
+      *text = buildExcerpts( match, allWordsPattern, head, urlanchor, fanchor );
+#else
       *text = buildExcerpts( head, urlanchor, fanchor );
+#endif
     }
 
     return text;
@@ -1567,7 +1674,11 @@ Display::excerpt(DocumentRef *ref, Strin
 // Handle cases where multiple document excerpts are requested.
 //
 const String
+#ifdef COLLECTIONS
+Display::buildExcerpts( ResultMatch *match, StringMatch *allWordsPattern, char *head, String urlanchor, int fanchor )
+#else
 Display::buildExcerpts( char *head, String urlanchor, int fanchor )
+#endif
 {
   if ( !config.Boolean( "add_anchors_to_excerpt" ) )
   {
@@ -1630,7 +1741,11 @@ Display::buildExcerpts( char *head, Stri
     {
       end = head + headLength;
 
+#ifdef COLLECTIONS
+      text << hilight(match, start, urlanchor, fanchor);
+#else
       text << hilight( start, urlanchor, fanchor );
+#endif
     }
     else
     {
@@ -1644,7 +1759,11 @@ Display::buildExcerpts( char *head, Stri
 
       *end = '\0';
 
+#ifdef COLLECTIONS
+      text << hilight(match, start, urlanchor, fanchor);
+#else
       text << hilight(start, urlanchor, fanchor);
+#endif
       text << config["end_ellipses"];
 
       *end = endChar;
@@ -1660,7 +1779,11 @@ Display::buildExcerpts( char *head, Stri
 
 //*****************************************************************************
 char *
+#ifdef COLLECTIONS
+Display::hilight(ResultMatch *match, char *str, String urlanchor, int fanchor)
+#else
 Display::hilight(char *str, String urlanchor, int fanchor)
+#endif
 {
     static char		*start_highlight = config["start_highlight"];
     static char		*end_highlight = config["end_highlight"];
@@ -1672,6 +1795,19 @@ Display::hilight(char *str, String urlan
     int			first = 1;
 
     result = 0;
+#ifdef COLLECTIONS
+    Collection *collection = match->getCollection();
+    StringMatch *allWordsPattern = NULL;
+    List *searchWords = NULL;
+    if (collection)
+    {
+	allWordsPattern = collection->getSearchWordsPattern();
+	searchWords = collection->getSearchWords();
+    }
+    if (!allWordsPattern || !searchWords)
+	return result;
+#endif
+
     while (allWordsPattern->hasPattern() &&
 	   (pos = allWordsPattern->FindFirstWord(str, which, length)) >= 0)
     {
@@ -1718,6 +1854,28 @@ Display::sort(List *matches)
 	  (typ == SortByTime) ? Display::compareTime :
 	  Display::compare);
 
+#ifdef COLLECTIONS
+    // In case there are duplicate URLs across collections, keep "best" ones
+    // after sorting them.
+    Dictionary	goturl;
+    String	url;
+    char	*coded_url;
+    int		j = 0;
+    for (i = 0; i < numberOfMatches; i++)
+    {
+	coded_url = array[i]->getURL();
+	String url = HtURLCodec::instance()->decode(coded_url);
+	HtURLRewriter::instance()->Replace(url);
+	if (goturl.Exists(url))
+	    delete array[i];
+	else
+	{
+	    array[j++] = array[i];
+	    goturl.Add(url, 0);
+	}
+    }
+    numberOfMatches = j;
+#endif
     char	*st = config["sort"];
     if (st && *st && mystrncasecmp("rev", st, 3) == 0)
     {
--- htsearch/Display.h.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/Display.h	Mon Nov 11 17:24:46 2002
@@ -25,18 +25,26 @@ public:
     //
     // Construction/Destruction
     //
+#ifdef COLLECTIONS
+    Display(Dictionary *active_collections);
+#else
     Display(char *indexFile, char *docFile);
+#endif
     ~Display();
 
     void		setStartTemplate(char *templateName);
     void		setMatchTemplate(char *templateName);
     void		setEndTemplate(char *templateName);
 	
+#ifndef COLLECTIONS
     void		setResults(ResultList *results);
     void		setSearchWords(List *searchWords);
+#endif
     void		setLimit(StringMatch *);
     void		setExclude(StringMatch *);
+#ifndef COLLECTIONS
     void		setAllWordsPattern(StringMatch *);
+#endif
     void		setLogicalWords(char *);
     void		setOriginalWords(char *);
     void		setCGI(cgi *);
@@ -59,6 +67,12 @@ public:
     SortType		sortType();
 
 protected:
+#ifdef COLLECTIONS
+    //
+    // The list of search result collections.
+    //
+    Dictionary		*active_collections;
+#else
     //
     // The list of search results.
     //
@@ -78,6 +92,7 @@ protected:
     // A list of words that we are searching for
     //
     List		*searchWords;
+#endif
 
     //
     // Pattern that all result URLs must match or exclude
@@ -88,7 +103,9 @@ protected:
     //
     // Pattern of all the words
     //
+#ifndef COLLECTIONS
     StringMatch		*allWordsPattern;
+#endif
 	
     //
     // Variables for substitution into text are stored in a dictionary
@@ -159,9 +176,15 @@ protected:
     String		*readFile(char *);
     void		expandVariables(char *);
     void		outputVariable(char *);
+#ifdef COLLECTIONS
+    String		*excerpt(ResultMatch *match, String urlanchor, int fanchor, int &first);
+    const String        buildExcerpts( ResultMatch *match, StringMatch *allWordsPattern, char *head, String urlanchor, int fanchor );
+    char		*hilight(ResultMatch *match, char *str, String urlanchor, int fanchor);
+#else
     String		*excerpt(DocumentRef *ref, String urlanchor, int fanchor, int &first);
     const String        buildExcerpts( char *head, String urlanchor, int fanchor );
     char		*hilight(char *str, String urlanchor, int fanchor);
+#endif
     void		setupTemplates();
     void		setupImages();
     String		*generateStars(DocumentRef *, int);
@@ -184,23 +207,29 @@ Display::setExclude(StringMatch *exclude
     excludeFrom = exclude;
 }
 
+#ifndef COLLECTIONS
 inline void
 Display::setAllWordsPattern(StringMatch *pattern)
 {
     allWordsPattern = pattern;
 }
+#endif
 
+#ifndef COLLECTIONS
 inline void
 Display::setResults(ResultList *results)
 {
     this->results = results;
 }
+#endif
 
+#ifndef COLLECTIONS
 inline void
 Display::setSearchWords(List *searchWords)
 {
     this->searchWords = searchWords;
 }
+#endif
 
 inline void
 Display::setLogicalWords(char *s)
--- htsearch/DocMatch.cc.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.cc	Mon Nov 11 15:44:41 2002
@@ -20,6 +20,9 @@ static char RCSid[] = "$Id: DocMatch.cc,
 //
 DocMatch::DocMatch()
 {
+#ifdef COLLECTIONS
+    collection = NULL;
+#endif
 }
 
 
--- htsearch/DocMatch.h.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.h	Mon Nov 11 15:46:10 2002
@@ -13,6 +13,10 @@
 
 #include <Object.h>
 
+#ifdef COLLECTIONS
+class Collection;
+#endif
+
 class DocMatch : public Object
 {
 public:
@@ -22,6 +26,9 @@ public:
 	float			score;
 	int				id;
 	int				anchor;
+#ifdef COLLECTIONS
+	Collection		*collection;
+#endif
 };
 
 #endif
--- htsearch/htsearch.cc.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.cc	Mon Nov 11 19:18:31 2002
@@ -15,6 +15,9 @@ static char RCSid[] = "$Id: htsearch.cc,
 #include "WeightWord.h"
 #include "parser.h"
 #include "Display.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
 #include "../htfuzzy/Fuzzy.h"
 #include "cgi.h"
 #include "WordRecord.h"
@@ -35,7 +38,11 @@ static char RCSid[] = "$Id: htsearch.cc,
 
 typedef void (*SIGNAL_HANDLER) (...);
 
+#ifdef COLLECTIONS
+void htsearch(Collection *, List &, Parser *);
+#else
 ResultList *htsearch(char *, List &, Parser *);
+#endif
 
 void setupWords(char *, List &, int, Parser *, String &);
 void createLogicalWords(List &, String &, String &);
@@ -49,6 +56,9 @@ int			debug = 0;
 int			minimum_word_length = 3;
 StringList		boolean_keywords;
 
+#ifdef COLLECTIONS
+StringList collectionList;
+#endif
 
 //*****************************************************************************
 // int main()
@@ -59,7 +69,11 @@ main(int ac, char **av)
     int			c;
     extern char		*optarg;
     int		        override_config=0;
+#ifdef COLLECTIONS
+    List		*searchWords = NULL;
+#else
     List		searchWords;
+#endif
     String		configFile = DEFAULT_CONFIG_FILE;
     int			pageNumber = 1;
     StringMatch		limit_to;
@@ -67,7 +81,12 @@ main(int ac, char **av)
     String		logicalWords;
     String              origPattern;
     String              logicalPattern;
+#ifdef COLLECTIONS
+    Dictionary		active_collections;
+    StringMatch		*searchWordsPattern = NULL;
+#else
     StringMatch		searchWordsPattern;
+#endif
     StringList		requiredWords;
     int                 i;
 
@@ -116,6 +135,36 @@ main(int ac, char **av)
     int		filenameok = (debug && getenv("REQUEST_METHOD") == 0);
     String	filenamemsg;
 
+#ifdef COLLECTIONS
+    if (input.exists("config"))
+	collectionList.Create(input["config"], "\001");
+
+    if (collectionList.Count() == 0)
+	collectionList.Add("");
+
+    char *errorMessage = NULL;
+    String       originalWords;
+
+  for (int cInd=0; errorMessage == NULL && cInd < collectionList.Count(); cInd++) 
+  { // RMI
+    // Each collection is handled in an iteration of this loop.
+    // Reset the following, so that each iteration starts with a
+    // clean state.
+    logicalWords = 0;
+    origPattern = 0;
+    logicalPattern = 0;
+    requiredWords.Release();
+    // searchWords.Release();
+    searchWords = new List;
+    // if (searchWordsPattern)
+    //     delete searchWordsPattern;
+    searchWordsPattern = new StringMatch;
+
+    char *config_name = collectionList[cInd];
+    if (config_name && config_name[0] == '\0')
+	config_name = NULL;
+#endif
+
     //
     // Setup the configuration database.  First we read the compiled defaults.
     // Then we override those with defaults read in from the configuration
@@ -125,8 +174,13 @@ main(int ac, char **av)
     config.Defaults(&defaults[0]);
     // To allow . in filename while still being 'secure',
     // e.g. htdig-f.q.d.n.conf
+#ifdef COLLECTIONS
+    if (!override_config && config_name 
+	&& (strstr(config_name, "./") == NULL))
+#else
     if (!override_config && input.exists("config") 
 	&& (strstr(input["config"], "./") == NULL))
+#endif
     {
 	char	*configDir = getenv("CONFIG_DIR");
 	if (configDir)
@@ -137,10 +191,17 @@ main(int ac, char **av)
 	{
 	    configFile = CONFIG_DIR;
 	}
+#ifdef COLLECTIONS
+	if (config_name == NULL || strlen(config_name) == 0)
+	  configFile = DEFAULT_CONFIG_FILE;
+	else
+	  configFile << '/' << config_name << ".conf";
+#else
 	if (strlen(input["config"]) == 0)
 	  configFile = DEFAULT_CONFIG_FILE;
 	else
 	  configFile << '/' << input["config"] << ".conf";
+#endif
     }
     if (access(configFile, R_OK) < 0)
     {
@@ -268,6 +329,28 @@ main(int ac, char **av)
     // Parse the words to search for from the argument list.
     // This will produce a list of WeightWord objects.
     //
+#ifdef COLLECTIONS
+    originalWords = input["words"];
+    originalWords.chop(" \t\r\n");
+    setupWords(originalWords, *searchWords,
+	       strcmp(config["match_method"], "boolean") == 0,
+	       parser, origPattern);
+
+    //
+    // Convert the list of WeightWord objects to a pattern string
+    // that we can compile.
+    //
+    createLogicalWords(*searchWords, logicalWords, logicalPattern);
+
+    // 
+    // Assemble the full pattern for excerpt matching and highlighting
+    //
+    origPattern += logicalPattern;
+    searchWordsPattern->IgnoreCase();
+    searchWordsPattern->IgnorePunct();
+    searchWordsPattern->Pattern(logicalPattern);	// this should now be enough
+    //searchWordsPattern->Pattern(origPattern);
+#else
     String	 originalWords = input["words"];
     originalWords.chop(" \t\r\n");
     setupWords(originalWords, searchWords,
@@ -288,6 +371,7 @@ main(int ac, char **av)
     searchWordsPattern.IgnorePunct();
     searchWordsPattern.Pattern(logicalPattern);	// this should now be enough
     //searchWordsPattern.Pattern(origPattern);
+#endif
     //if (debug > 2)
     //  cout << "Excerpt pattern: " << origPattern << "\n";
 
@@ -298,7 +382,11 @@ main(int ac, char **av)
     //
     if (requiredWords.Count() > 0)
     {
+#ifdef COLLECTIONS
+	addRequiredWords(*searchWords, requiredWords);
+#else
 	addRequiredWords(searchWords, requiredWords);
+#endif
     }
     
     //
@@ -313,7 +401,9 @@ main(int ac, char **av)
 	reportError(form("Unable to read word database file%s\nDid you run htmerge?",
 			 filenamemsg.get()));
     }
+#ifndef COLLECTIONS
     ResultList	*results = htsearch(word_db, searchWords, parser);
+#endif
 
     String	index = config["doc_index"];
     if (access(index, R_OK) < 0)
@@ -330,7 +420,27 @@ main(int ac, char **av)
 			 filenamemsg.get()));
     }
 
+#ifdef COLLECTIONS
+    Collection *collection = new Collection(configFile, word_db, index, doc_db);
+    htsearch(collection, *searchWords, parser);
+    collection->setSearchWords(searchWords);
+    collection->setSearchWordsPattern(searchWordsPattern);
+    active_collections.Add(configFile, collection);
+
+    if (parser->hadError())
+    {
+	errorMessage = parser->getErrorMessage();
+	errorMessage = strdup(errorMessage);
+    }
+
+    delete parser;
+    boolean_keywords.Destroy();
+  } // RMI
+
+    Display	display(&active_collections);
+#else
     Display	display(index, doc_db);
+#endif
     if (display.hasTemplateError())
       {
 	if (filenameok) filenamemsg << " '" << config["template_name"] << "'";
@@ -339,13 +449,23 @@ main(int ac, char **av)
 	return 0;
       }
     display.setOriginalWords(originalWords);
+#ifndef COLLECTIONS
     display.setResults(results);
     display.setSearchWords(&searchWords);
+#endif
     display.setLimit(&limit_to);
     display.setExclude(&exclude_these);
+#ifndef COLLECTIONS
     display.setAllWordsPattern(&searchWordsPattern);
+#endif
     display.setCGI(&input);
     display.setLogicalWords(logicalWords);
+#ifdef COLLECTIONS
+    if (errorMessage)
+	display.displaySyntaxError(errorMessage);
+    else
+	display.display(pageNumber);
+#else
     if (parser->hadError())
 	display.displaySyntaxError(parser->getErrorMessage());
     else
@@ -353,6 +473,7 @@ main(int ac, char **av)
 
     delete results;
     delete parser;
+#endif
     return 0;
 }
 
@@ -702,14 +823,27 @@ convertToBoolean(List &words)
 //   This returns a dictionary indexed by document ID and containing a
 //   List of WordReference objects.
 //
+#ifdef COLLECTIONS
+void
+htsearch(Collection *collection, List &searchWords, Parser *parser)
+#else
 ResultList *
 htsearch(char *wordfile, List &searchWords, Parser *parser)
+#endif
 {
     //
     // Pick the database type we are going to use
     //
     ResultList	*matches = new ResultList;
     if (searchWords.Count() > 0)
+#ifdef COLLECTIONS
+    {
+	parser->setCollection(collection);
+	parser->parse(&searchWords, *matches);
+	parser->setCollection(NULL);
+    }
+    collection->setResultList(matches);
+#else
     {
 	Database	*dbf = Database::getDatabaseInstance();
 
@@ -722,6 +856,7 @@ htsearch(char *wordfile, List &searchWor
     }
 	
     return matches;
+#endif
 }
 
 
--- htsearch/htsearch.h.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.h	Mon Nov 11 16:16:17 2002
@@ -38,6 +38,9 @@ extern Database		*dbf;
 extern String		logicalWords;
 extern String		originalWords;
 
+#ifdef COLLECTIONS
+extern StringList 	collectionList;
+#endif
 
 #endif
 
--- htsearch/Makefile.in.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/Makefile.in	Mon Nov 11 14:09:38 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
 
 OBJS=		Display.o DocMatch.o ResultList.o ResultMatch.o \
 		Template.o TemplateList.o WeightWord.o htsearch.o \
-		parser.o
+		parser.o Collection.o
 
 FOBJS=		$(top_builddir)/htfuzzy/libfuzzy.a
 TARGET=		htsearch
--- htsearch/Makefile.nocoll	Fri Feb  1 16:58:46 2002
+++ htsearch/Makefile	Mon Nov 11 17:11:23 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
 
 OBJS=		Display.o DocMatch.o ResultList.o ResultMatch.o \
 		Template.o TemplateList.o WeightWord.o htsearch.o \
-		parser.o
+		parser.o Collection.o
 
 FOBJS=		$(top_builddir)/htfuzzy/libfuzzy.a
 TARGET=		htsearch
--- htsearch/parser.cc.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/parser.cc	Mon Nov 11 15:42:37 2002
@@ -11,6 +11,11 @@ static char RCSid[] = "$Id: parser.cc,v 
 #include "parser.h"
 #include "QuotedStringList.h"
 
+#ifdef COLLECTIONS
+#include "Collection.h"
+#include "htsearch.h"
+#endif
+
 #define	WORD	1000
 #define	DONE	1001
 
@@ -462,6 +467,9 @@ Parser::parse(List *tokenList, ResultLis
     for (int i = 0; i < elements->Count(); i++)
     {
 	dm = (DocMatch *) (*elements)[i];
+#ifdef COLLECTIONS
+	dm->collection = collection;
+#endif
 	resultMatches.add(dm);
     }
     elements->Release();
@@ -469,3 +477,25 @@ Parser::parse(List *tokenList, ResultLis
     delete elements;
     delete result;
 }
+
+#ifdef COLLECTIONS
+void
+Parser::setCollection(Collection *coll)
+{
+    if (coll)
+    {
+	dbf = Database::getDatabaseInstance();
+	dbf->OpenRead(coll->getWordFile());
+    }
+    else
+    {
+	if (dbf)
+	{
+	    dbf->Close();
+	    delete dbf;
+	    dbf = NULL;
+	}
+    }
+    collection = coll;
+}
+#endif
--- htsearch/parser.h.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/parser.h	Mon Nov 11 17:27:08 2002
@@ -23,7 +23,11 @@ public:
     int			checkSyntax(List *);
     void		parse(List *, ResultList &);
 
+#ifdef COLLECTIONS
+    void		setCollection(Collection *collection);
+#else
     void		setDatabase(Database *db)	{dbf = db;}
+#endif
     char		*getErrorMessage()		{return error.get();}
     int			hadError()			{return valid == 0;}
 	
@@ -46,6 +50,9 @@ protected:
     int			valid;
     Stack		stack;
     Database		*dbf;
+#ifdef COLLECTIONS
+    Collection		*collection;
+#endif
     String		error;
 };
 
--- htsearch/ResultMatch.h.nocoll	Thu Jan 31 17:47:18 2002
+++ htsearch/ResultMatch.h	Mon Nov 11 16:19:30 2002
@@ -21,6 +21,9 @@
 #include <htString.h>
 
 class DocumentRef;
+#ifdef COLLECTIONS
+class Collection;
+#endif
 
 class ResultMatch : public Object
 {
@@ -44,12 +47,20 @@ public:
 	char			*getURL()					{return url;}
 	DocumentRef		*getRef()					{return ref;}
 
+#ifdef COLLECTIONS
+	void		setCollection(Collection *coll) { collection = coll; }
+	Collection	*getCollection() { return collection; }
+#endif
+
 private:
 	float			score;
 	int				incomplete;
 	int				anchor;
 	String			url;
 	DocumentRef		*ref;
+#ifdef COLLECTIONS
+	Collection		*collection;
+#endif
 };
 
 #endif
--- include/htconfig.h.in.nocoll	Thu Jan 31 17:47:18 2002
+++ include/htconfig.h.in	Mon Nov 11 16:46:17 2002
@@ -132,6 +132,10 @@
 /*  regardless of the security problems with this. */
 #undef ALLOW_INSECURE_CGI_CONFIG
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
 /* Define to remove the word count in db and WordRef struct. */
 #undef NO_WORD_COUNT
 
--- include/htconfig.h.nocoll	Fri Feb  1 16:58:46 2002
+++ include/htconfig.h	Mon Nov 11 16:46:28 2002
@@ -133,6 +133,10 @@
 /*  regardless of the security problems with this. */
 /* #undef ALLOW_INSECURE_CGI_CONFIG */
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#define COLLECTIONS 1
+
 /* Define to remove the word count in db and WordRef struct. */
 /* #undef NO_WORD_COUNT */
 

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)



