
From grdetil@scrc.umanitoba.ca Fri Mar 17 11:39:29 2000
Date: Fri, 17 Mar 2000 11:13:54 -0600 (CST)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: htdig@htdig.org
Subject: [htdig] PATCH: case insensitive META robots tag parsing

[ Reposted from htdig3-bugs@htdig.org... ]

From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
Subject: Re: htdig ignores noindex META-Tag (PR#810)
To: pruem@machno.hbi-stuttgart.de
Date: Fri, 17 Mar 2000 11:08:31 -0600 (CST)
Cc: ht3bugs@htdig.org, htdig3-bugs@htdig.org

According to David Pruem (pruem@machno.hbi-stuttgart.de):
> ht://Dig ignored the following directives in a bunch of pages and indexed
> them.
> 
> <HTML>
> <HEAD>
> <TITLE>7</TITLE>
> <META NAME="robots" CONTENT="NOINDEX,FOLLOW">
> </HEAD>
> 
> Have you any idea what could cause this behaviour?

Oops!  The standard clearly says that the name and contents of such tags
should be case-insensitive, but when htdig looked at the content parameter,
it searched for the directive words in lower case only, so NOINDEX in upper
case never matched!  Clearly a bug.  I've fixed it in 3.2, but here is the
fix for 3.1.5, which lowercases the content value before the checks...

--- htdig/HTML.cc.robotsbug	Tue Feb 15 14:08:41 2000
+++ htdig/HTML.cc	Fri Mar 17 10:59:38 2000
@@ -911,7 +911,7 @@ HTML::do_tag(Retriever &retriever, Strin
 			 && strlen(conf["content"]) !=0)
 		  {
 		    String   content_cache = conf["content"];
-
+		    content_cache.lowercase();
 		    if (content_cache.indexOf("noindex") != -1)
 		      {
 			doindex = 0;
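
For anyone who wants to see the idea outside of htdig, here is a minimal
standalone sketch of the same technique: lowercase a copy of the CONTENT
value, then do plain substring checks.  It uses std::string rather than
htdig's own String class, and the names RobotsDirectives and
parseRobotsContent are made up for this illustration; they are not part
of htdig.  The nofollow check is also my assumption, since the hunk above
only shows the noindex test.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical result type for this sketch (the patch above only shows
// htdig's doindex flag).
struct RobotsDirectives {
    bool index = true;   // page may be indexed
    bool follow = true;  // links on the page may be followed
};

// Hypothetical helper, not htdig code: takes the CONTENT value of a
// <META NAME="robots"> tag by value, lowercases that copy, and then
// looks for the directive words as substrings.
RobotsDirectives parseRobotsContent(std::string content)
{
    // Cast through unsigned char so std::tolower is well defined for
    // bytes outside the ASCII range.
    std::transform(content.begin(), content.end(), content.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    RobotsDirectives result;
    if (content.find("noindex") != std::string::npos)
        result.index = false;
    if (content.find("nofollow") != std::string::npos)
        result.follow = false;
    return result;
}
```

With the tag from the report, parseRobotsContent("NOINDEX,FOLLOW") comes
back with index set to false and follow left at true.  Without the
lowercasing step, the find("noindex") call would miss the upper-case
value, which is the behaviour the report describes.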

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


