
From grdetil@scrc.umanitoba.ca Tue Apr 11 11:43:27 2000
Date: Tue, 11 Apr 2000 13:00:47 -0500 (CDT)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: Geoff Hutchison <ghutchis@wso.williams.edu>
Cc: Gilles Detillieux <grdetil@scrc.umanitoba.ca>,
    Michael Pfennich <michael.pfennich@parlinkom.gv.at>, htdig@htdig.org
Subject: Re: [htdig] Sort by Date from Meta Tags [patch]

According to Geoff Hutchison:
> On Tue, 11 Apr 2000, Gilles Detillieux wrote:
> > Adding these capabilities to the 3.1.5 release would take a fair bit more
> > effort, so if you want to try the bleeding edge, and don't mind hacking the
> > code a bit, just wait a little longer and 3.2.0b2 should be released.
> 
> Funny this should come up. I actually wrote it for the 3.1.x code first
> because someone asked me to. :-) At the time, I didn't feel it was worth
> putting into the tree for future 3.1.x releases (in part because at the
> time, I didn't think there would be any). I put the code into the 3.2 tree
> because it seemed like a useful feature.
> 
> I no longer have the patch to 3.1.x, but I could backport it from the 3.2
> tree pretty easily. IIRC, it was about a 30-line patch. I'll post this to
> the list sometime tomorrow, probably.

Good thing I'm such a pack rat.  I found your patch, and some of our
discussions with Mike Grommet.  Here's an updated version of your patch,
with Mike's correction added in, as well as a test for the use_doc_date
attribute.  I imagine you might need to tweak the default tm_hour setting
in got_time() (the patch uses 0, i.e. midnight UTC) to keep the date shown
in the search results from falling on the previous day because of timezone
offsets:  htsearch displays times in the server's local timezone, while
htdig parses all dates in UTC.

This should free you to continue on 3.2.0b2.  :-)

On Tue, 6 Apr 1999, Geoff Hutchison wrote:
> On Tue, 6 Apr 1999, mike grommet wrote:
> > My thoughts are to take a meta tag, named something like "Document-date"
> > and a value just like the standard GMT time returned by a web server for
> > a Last Modification
> 
> There is already a standard for this, specified by the Dublin Core.
> The tag is named "DATE" and uses the ISO-8601 format YYYY-MM-DD.
> 
> > Would you happen to have this code handy?  It would be useful to me at least
> 
> Here you go... I should probably make this an option with something like
> 'use_doc_date' when I commit it.

*** htdig/HTML.cc.orig	Thu Feb 24 20:29:10 2000
--- htdig/HTML.cc	Tue Apr 11 12:58:15 2000
*************** HTML::do_tag(Retriever &retriever, Strin
*** 893,898 ****
--- 893,903 ----
  		{
  		    retriever.got_meta_email(transSGML(conf["content"]));
  		}
+ 		else if (mystrcasecmp(cache, "date") == 0 &&
+ 				config.Boolean("use_doc_date",0))
+ 		{
+ 		    retriever.got_time(transSGML(conf["content"]));
+ 		}
  		else if (mystrcasecmp(cache, "htdig-notification-date") == 0)
  		{
  		    retriever.got_meta_notification(transSGML(conf["content"]));
*** htdig/Retriever.cc.orig	Thu Feb 24 20:29:10 2000
--- htdig/Retriever.cc	Tue Apr 11 12:59:53 2000
*************** Retriever::RetrievedDocument(Document &d
*** 561,566 ****
--- 561,567 ----
      current_ref = ref;
      current_anchor_number = 0;
      current_title = 0;
+     current_time = 0;
      current_head = 0;
      current_meta_dsc = 0;
  
*************** Retriever::RetrievedDocument(Document &d
*** 583,589 ****
      //
      ref->DocHead(current_head);
      ref->DocMetaDsc(current_meta_dsc);
!     ref->DocTime(doc.ModTime());
      ref->DocTitle(current_title);
      ref->DocSize(doc.Length());
      ref->DocAccessed(time(0));
--- 584,593 ----
      //
      ref->DocHead(current_head);
      ref->DocMetaDsc(current_meta_dsc);
!     if (current_time == 0)
!       ref->DocTime(doc.ModTime());
!     else
!       ref->DocTime(current_time);
      ref->DocTitle(current_title);
      ref->DocSize(doc.Length());
      ref->DocAccessed(time(0));
*************** Retriever::got_title(char *title)
*** 1098,1103 ****
--- 1102,1142 ----
      current_title = title;
  }
  
+ //*****************************************************************************
+ // void Retriever::got_time(char *time)
+ //
+ void
+ Retriever::got_time(char *time)
+ {
+     time_t   new_time;
+     struct tm   tm;
+ 
+     tm.tm_hour = 0;
+     tm.tm_min = 0;
+     tm.tm_sec = 0;
+     tm.tm_mon = 0;
+     tm.tm_mday = 1;
+     tm.tm_year = 0;
+ 
+     if (debug > 1)
+       cout << "\ntime: " << time << endl;
+     //
+     // As defined by the Dublin Core, this should be YYYY-MM-DD
+     // In the future, we'll need to deal with the scheme portion
+     //  in case someone picks a different format.
+     //
+     if (mystrptime(time, "%Y-%m-%d", &tm))
+       {
+ #if HAVE_TIMEGM
+         new_time = timegm(&tm);
+ #else
+         new_time = mytimegm(&tm);
+ #endif
+ 	current_time = new_time;
+       }
+     // If we can't convert it, current_time stays the same and we get
+     // the default--the date returned by the server...
+ }
  
  //*****************************************************************************
  // void Retriever::got_anchor(char *anchor)
*** htdig/Retriever.h.orig	Thu Feb 24 20:29:10 2000
--- htdig/Retriever.h	Tue Apr 11 12:34:26 2000
*************** public:
*** 51,56 ****
--- 51,57 ----
      void		got_word(char *word, int location, int heading);
      void		got_href(URL &url, char *description);
      void		got_title(char *title);
+     void		got_time(char *time);
      void		got_head(char *head);
      void		got_meta_dsc(char *md);
      void		got_anchor(char *anchor);
*************** private:
*** 83,88 ****
--- 84,90 ----
      String		current_title;
      String		current_head;
      String		current_meta_dsc;
+     time_t		current_time;
      int			current_id;
      DocumentRef		*current_ref;
      int			current_anchor_number;

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


