Date: Mon, 16 Jul 2001 18:03:32 -0500 (CDT)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: Joe R. Jah <jjah@cloud.ccsf.cc.ca.us>
Cc: Geoff Hutchison <ghutchis@wso.williams.edu>,
     Dante Picca <dpicca@hochfeiler.it>, htdig-general@lists.sourceforge.net
Subject: Re: [htdig] PATCH for 3.1.5's url_part_aliases bug (was: htdig and
    relative urls)

According to Joe R. Jah:
> To get a pure relative URL page in order to have the browser prepend the
> rest of the URL you can do:
> 
> indexing.conf:
> url_part_aliases: http://users.mysite.it/ *2
> 
> searching.conf:
> url_part_aliases: /./ *2
> 
> First I tried it with just "/", but if you do it like:
> 
> searching.conf:
> url_part_aliases: / *2
> 
> You would get a premature end of script error in your server error log.  
> There may be a way to patch htlib/HtWordCodec.cc to allow that too.  I
> think it expects more than one character;-/

OK, I finally took some time to get to the bottom of this one, which was
a bit of a pain to track down.  The problem isn't the HtWordCodec class
itself.  It handles single characters just fine.  The problem is the way
htsearch 3.1.5 uses the class.  3.2 doesn't suffer from this problem.

The problem is that the URL gets decoded just fine, but then when htsearch
needs to look up the db.docdb record for a given URL, it re-encodes the
URL.  This is because in 3.1, the db.docdb is keyed by URL, not by docID
as in 3.2.  But, when you re-encode / to *2, all slashes get hit, so it
can't find the matching docdb record.  The fix is to keep the encoded
URL for docdb lookups.

Here's the patch for 3.1.5, to fix the problem.  NOTE: THIS PATCH IS
INCOMPATIBLE WITH THE dateRange.1 PATCH ON ftp.ccsf.org!  If you've already
applied the dateRange.1 patch to 3.1.5, use url_part_aliases4dateRange.0
instead.  Apply this patch by  "patch -p0 < url_part_aliases.0".

As always, feedback is welcome.  Please let me know if there are any problems
with this code.

-------- 8< -------- (cut here) -------- 8< --------
--- htcommon/DocumentDB.h.refbug	Thu Feb 24 20:29:10 2000
+++ htcommon/DocumentDB.h	Mon Jul 16 17:12:05 2001
@@ -58,6 +58,7 @@ public:
 
     int			Add(DocumentRef &);
     DocumentRef		*operator [] (char *url);
+    DocumentRef		*FindCoded(char *url);
     int			Exists(char *url);
     int			Delete(char *url);
 
--- htcommon/DocumentDB.cc.refbug	Thu Feb 24 20:29:10 2000
+++ htcommon/DocumentDB.cc	Mon Jul 16 17:12:05 2001
@@ -152,6 +152,24 @@ DocumentRef *DocumentDB::operator [] (ch
 
 
 //*****************************************************************************
+// DocumentRef *DocumentDB::FindCoded(char *u)
+//
+DocumentRef *DocumentDB::FindCoded(char *u)
+{
+    String			data;
+    String			url = u;
+
+    if (dbf->Get(url, data) == NOTOK
+        && (! myTryUncoded || dbf->Get(HtURLCodec::instance()->decode(url), data) == NOTOK))
+	return 0;
+
+    DocumentRef		*ref = new DocumentRef;
+    ref->Deserialize(data);
+    return ref;
+}
+
+
+//*****************************************************************************
 // int DocumentDB::Exists(char *u)
 //
 int DocumentDB::Exists(char *u)
--- htsearch/Display.cc.refbug	Thu Feb 24 20:29:11 2000
+++ htsearch/Display.cc	Mon Jul 16 17:14:28 2001
@@ -179,7 +179,7 @@ Display::display(int pageNumber)
     {
 	if (currentMatch >= startAt)
 	{
-	    match->setRef(docDB[match->getURL()]);
+	    match->setRef(docDB.FindCoded(match->getURL()));
 	    DocumentRef	*ref = match->getRef();
 	    if (!ref)
 		continue;	// The document isn't present for some reason
@@ -235,8 +235,9 @@ Display::displayMatch(ResultMatch *match
 	
     DocumentRef	*ref = match->getRef();
 
-    char    *url = match->getURL();
-    vars.Add("URL", new String(url));
+    char    *coded_url = match->getURL();
+    String url = HtURLCodec::instance()->decode(coded_url);
+    vars.Add("URL", new String(url.get()));
     
     int     iA = ref->DocAnchor();
     
@@ -1029,7 +1030,7 @@ Display::buildMatchList()
 	
 
 	thisMatch = new ResultMatch();
-	thisMatch->setURL(url);
+	thisMatch->setURL(coded_url);
 	thisMatch->setRef(NULL);
 
 	//
@@ -1056,7 +1057,7 @@ Display::buildMatchList()
 
 	if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore)
 	  {
-	    DocumentRef *thisRef = docDB[thisMatch->getURL()];
+	    DocumentRef *thisRef = docDB.FindCoded(thisMatch->getURL());
 	    if (thisRef)   // We better hope it's not null!
 	      {
 		score += date_factor * 
-------- 8< -------- (cut here) -------- 8< --------


