Changes between release v1.4.pl2 (February 1, 1996) and v1.4.pl1:

- Changes to the Gatherer
    - Added "robots.txt" support to the gatherer enumeration.
    - Added HARVEST_NOT_VISITED_LOG environment variable to httpenum.
    - Added --body-text option to HTML-lax.sum and in the comments of HTML.sum.
    - Added 'Access-Delay' to gatherer.cf. Now adds delay for LeafNode URLs.
    - Changed Gopher timeout to 120 seconds.
    - Changed HTTP-Query byurl pattern to be any URL with a question mark.
    - Changed SGML.sum to not do "word wrap" on very large strings.
    - Fixed HTTP authentication to work with Netscape server, and to support encoding spaces as RFC 1738 escapes.
    - Changed handling of depth between enum programs.
    - Changed Essence to print [L] for URLs where local mapping succeeds.
    - Fixed gatherd timeout bug (caused by eliminating the DNS mismatch warning).
    - Fixed Carriage-Return substitution in RTF.sum.
    - Fixed HTML-lax.sum to turn carriage returns into spaces.
    - Fixed SGML.sum to NOT rewrite correct DOCTYPE declarations.
    - Fixed using single quotes in Gatherer-Name.
    - Fixed 'gather' bug when reading binary data in non-compressed mode.
    - Updated html-mcom.dtd.
- Changes to the Broker
    - Added support for the Glimpse-based Broker to limit the number of matched lines per object.
    - Changed BrokerQuery.pl to not use a tmpfile, and to sort the results by number of matched lines.
    - Changed broker connect timeout to happen only after some data has been read.
    - Fixed BrokerQuery.pl.cgi to protect special characters in a Broker name.
- Miscellaneous Changes
    - Added "prefix =" to components/*/Makefile.
    - Changed our log() function to be called Log().
    - Upgraded configure script to autoconf v2.7.
    - Updated the User's Manual.
##############################################################################
Changes between release v1.4.pl1 (November 17, 1995) and v1.4:

- Changes to the Gatherer
    - Fixed NULL BASE URL coredump bug in HTMLurls.
    - Fixed Gatherer to make Top-Directory set the Lib-Directory value also (as the manual says it does in section 4.6.1).
    - Fixed essence and SGML.sum to look in multiple lib dirs: first in Lib-Directory if set, otherwise in $HARVEST_HOME/lib/gatherer.
- Changes to the Broker
    - Added for compiling on AIX.
    - Changed BrokerQuery.pl.cgi to send the query to the broker before opening the tmpfile; if there is a delay in opening the tmpfile, the broker query could time out.
    - Fixed potential coredump in Log_rotate() due to large local array.
##############################################################################
Changes between release v1.4 (November 10, 1995) and v1.3:

- Changes to the Gatherer:
    - Added symbolic link loop detection to httpenum.
    - Added a GIF image summarizer (GIFImage.sum); requires netpbm. The GIFImage type is still in the Essence stoplist by default.
    - Added 'C' version of ftpget.
    - Added ability to rewrite the SOIF template URL with Essence post-processing. Could be used to gather file:// URLs and have them exported as http:// URLs.
    - Added the ability to specify a program to generate root/leaf URLs.
    - Fixed select() timeouts to POSIX semantics.
    - Fixed SGML summarizer to give an error if input is empty.
    - Fixed a Makefile to actually build and install HTML-lax.sum.
    - Fixed liburl problem with AFS. Must *copy* files into the cache-liburl directory.
    - Fixed News gathering: if 'newsget.pl' exits non-zero, close the NNTP server socket.
    - Fixed newsget.pl with a major rewrite.
    - Fixed 'fileenum' to use URLs and not always return file://hostname/.
    - Fixed gatherd bug where child process would remove parent's gatherd.pid file.
    - Changed NewsArticle.sum TTL to 7 days by default.
    - Changed Essence unnesting to occur in individual directories.
    - Removed confusing gatherd DNS mismatch warning message.
- Changes to the Broker:
    - Added #Restart-Index-Server command to broker admin command set.
    - Added error logging and debugging in Glimpse inline query code.
    - Fixed select() timeouts to POSIX semantics.
    - Fixed Glimpse minor malloc problems.
    - Fixed the broker on Linux; needs unbuffered input from gather process.
    - Fixed broker query language bug for high-bit (international) characters.
    - Changed Broker to allow specifically setting GlimpseServer_Port again; if not set, port is chosen randomly.
    - Changed BrokerAdmin.cgi to use unbuffered output.
    - Changed Glimpse macros CLEANUP and RETURN to be functions.
    - Changed broker admin/LOG to log FQDN instead of IP address.
    - Removed glimpse version ambiguities in Glimpse/index.c.
    - Removed getpeername() call in the broker; get address from accept().
- Changes to the Cache:
    - The cache has been moved to a separate distribution.
- Miscellaneous Changes
    - Don't link with -lmalloc on Solaris.
    - Fixed User Manual and FAQ inconsistencies.
##############################################################################
Changes between release v1.3 (September 7, 1995) and v1.3.beta:

- Changes to the Broker:
    - Added support for auto-validation from the HSR, which includes a description.html file and a RunUpdate program for each new Broker.
- Changes to the Cache:
    - Added support to dynamically toggle debug level via USR1 and USR2.
    - Fixed dnsserver parsing of numeric addresses.
    - Added patches for FreeBSD.
    - Changed source_ping to off by default.
    - Added optional code for 'local_ip' line in cached.conf. Addresses given as 'local_ip' will be retrieved directly, without sending any probe packets.
    - Added 'TIMEOUT_DIRECT' as a new kind of entry in cache_hierarchy.log.
- Changes to the Gatherer:
    - Added LMT.gdbm to liburl to keep last-modified timestamps.
    - Added support for using BASE element in HTML enumeration.
    - Added support for HTML-3.0 DTD.
    - Added support for Netscape DTD.
    - Added support for HotJava DTD.
    - Added old HTML.sum as HTML-lax.sum.
    - Added MacBinHex as a supported nested type in essence.
    - Changed gatherd to die when its data directory gets removed.
    - Fixed bug: repeated HTTP redirected URLs (with help from glenn@rockie.nsc.com).
- Miscellaneous Changes
    - Incorporated fixes for FreeBSD port from ted@oz.plymouth.edu.
    - Incorporated fixes for Ultrix port from dsr@lns598.lns.cornell.edu.
##############################################################################
Changes between release v1.3.beta (August 7, 1995) and v1.2:

- Changes to the Broker:
    - Upgraded to Glimpse 3.0.
    - Improved and updated WAIS, Inc. support to use version 2.1.1.
    - Added support for Verity VDK as backend indexer/searcher.
    - Added support for GRASS GIS as spatial database.
    - Added support for PLS, Inc. PLWeb as backend indexer/searcher.
    - Added IP numbers for incoming requests to log information.
    - Added support for displaying individual SOIF attributes via WWW.
    - Added 'Uniqify' command to Broker; keeps most current object of duplicate URLs.
    - Added security and name lookup to BrokerQuery.pl.
    - Added support for Glimpse inline queries.
    - Added error message to report incorrect WWW installation.
    - Added some support for internationalization in the Broker.
    - Added support for automatic validation by HSR.
    - Removed need for 'gzip' in the Broker.
    - Changed BrokerQuery.pl to try multiple entries from Brokers.cf.
    - Changed broker to read queries with a timeout. Very long queries can get segmented by TCP.
    - Fixed bug with matching Description attributes.
    - Fixed bug with Glimpse regular expression detection.
    - Fixed bug in CreateBroker -- wrong default Gatherer port number.
- Changes to the Cache:
    - Added persistent disk storage across cached reboots.
    - Added IP-based access control.
    - Added setting of the TTL based on URL regular expressions.
    - Added more sophisticated setting of the TTL based on HTTP headers.
    - Added more statistics information.
    - Added support for logging using the common httpd logfile format.
    - Added support for HEAD HTTP request method.
    - Added support for user-configurable periodic garbage collection.
    - Added support for user-configurable stoplist.
    - Added support for WAIS proxying (from Edward Moy, Xerox PARC).
    - Added support for quick aborting: when the client drops the connection, cached stops immediately. Useful for slow network links.
    - Added high/low water marks for disk storage.
    - Added 'source_ping' to cached.conf.
    - Added 'dns_children' to cached.conf.
    - Added -z to force cached to discard (zap) its disk storage.
    - Added logging of ftpget.pl failures (exit codes and signals).
    - Added Expires timestamp to cache log.
    - Improved error messages for DNS name lookup failures.
    - Improved performance of LRU replacement policy.
    - Improved performance for generating statistics.
    - Increased listen(2) socket queue size to 50 or max of OS.
    - Removed all Tcl code.
    - Cleaned up memory allocation and management.
    - Cleaned up and updated cached.conf.
    - Cleaned up debugging output.
    - Changed default low watermark to 60%.
    - Changed trace mail into a cached.conf option.
    - Changed algorithm for time estimations using echo ports.
    - Changed dnsserver to try gethostbyname(3) again sometimes.
    - Fixed bugs with URL interpretation.
    - Fixed bugs with internal IPcache memory management.
    - Fixed bug with DNS lookups on IP numbers.
    - Fixed bug with not finding 'dnsserver'.
    - Fixed bug with hard timeouts in select loop.
    - Fixed bug with some platforms needing strdup().
    - Fixed bug with ftpget.pl not including MIME content-type for unknown filename extensions.
    - Fixed bug with ftpget.pl not parsing ls output correctly (wasn't matching dashes in user/group names).
    - Fixed copyright messages in source code.
    - Fixed realloc() bug for concurrent object access.
    - Fixed bug when neighbors specified and dns_servers != 3.
    - Fixed bug with new hash tables when deleting from table as it is being traversed.
    - Fixed various minor bugs.
- Changes to the Gatherer:
    - Added ability to pass enumerated URLs through an external filter program. Allows very specific selection of URLs to further enumerate.
    - Added -background flag to the Gatherer; does export work in the background.
    - Added IP-based filtering (regular expressions) in host-filter.
    - Added post-processing of summaries to Essence.
    - Added 'gather' check for 'gzip' before setting compression option.
    - Added username/password support for HTTP retrievals.
    - Changed gatherer to remove cache-liburl directory after a successful gather session.
    - Fixed bug: infinite loops in 'enum' on invalid URLs.
    - Fixed bug: HTTP headers not parsed from slow servers.
    - Improved URL parsing; support for username/password in FTP URLs.
- Miscellaneous Changes
    - Upgraded autoconf 'configure' scripts to v2.4.
    - liburl: better handling of relative URLs.
    - liburl retrieval programs abort very large transfers (at 10 Mbytes).
    - Fixed bug with subscribing to harvest-users mailing list.
##############################################################################
Changes between release v1.2 (April 3, 1995) and v1.1:

- Changes to the Broker:
    - Major performance improvements to the collector interface.
    - Added fast, efficient internal Gatherer ID management.
    - Added support for clients requesting attributes with #attribute.
    - Added support for log file rotation, and terse logging.
    - Added support for #operation in query manager interface.
    - Cleaned up the log file format.
    - Cleaned up the administrative interface.
    - Cleaned up the UNIX file system-based storage manager.
    - Fixed major bug with WAIS support.
    - Fixed file descriptor leaks in glimpseserver when the index contained files that had since been deleted.
    - Fixed bug with overflowing lines from glimpse.
    - Fixed bug with hostname initialization.
    - Fixed memory leak with the Description-Tag attribute matching.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Added httpd accelerator support.
    - Added IP number logging.
    - Added setuid() to a user when cached is run as root.
    - Added support for HTTP servers that die abruptly.
    - Added client_timeout, which places a hard limit on the life of incoming connections on the ascii port, or on outgoing HTTP or Gopher clients.
    - Cleaner implementation for retrieving FTP URLs via ftpget.pl.
    - Tries to write cached.pid file in same directory as cached.conf.
    - Changed FTP support to sacrifice correct HTTP headers for dramatically decreased latency for large FTP objects.
    - Fixed ftpget.pl -htmlify to determine directory vs. file correctly and send HTTP header as soon as possible.
    - Fixed rare core dump during HTTP xfers.
    - Fixed how the error messages are printed.
    - Better support for larger file descriptor tables.
    - Debug levels 0 and 1 now have timestamps logged.
    - Cleaned and updated defaults for cached.conf.
    - When run as root and switching to another user with setuid(), cached changes its current directory to its swap directory, which is almost certainly writable by cached, so that if it crashes it can write a core file there.
    - Minor modification of store error message.
    - Remote client connection resets are handled as soft errors.
    - Strip an extra \r\n from MIME.
    - Hierarchy log (yet another log, but it's optional).
    - Periodically hunts for zombie processes.
    - Added more information to the stat interface.
    - Cleaned up info data for improved parsability/readability.
- Changes to the Gatherer:
    - Added support to follow HTTP redirection pointers.
    - Added support for $http_proxy environment variable in liburl.
    - Added support for summarizing SGML data.
    - Added better support for summarizing TeX data.
    - Added support for summarizing RTF and MIF data, using Rainbow software provided by EBT, which we make available in our new components distribution.
    - Added support for summarizing WordPerfect 5.1 data.
    - Changed HTML summarizing to use the SGML summarizer, providing more easily customizable results.
    - Added support for local filesystem gathering for NNTP.
    - Improved incremental gathering support, and integrated the support into the Essence program (removed dbcheck program).
    - Added support for "fake" MD5 generation per SOIF object on external presentation unnesting streams (exploders) -- permits incremental gathering on data generated by an Exploder.
    - Added --memory-efficient to Essence to trade time for memory efficiency; this helps users who have limited memory resources but are dealing with large SOIF objects.
    - Added --confirm-host to Essence for explicit host DNS validation.
    - Added --max-refresh to Essence to limit refreshing activity.
    - RootNode enumerators generate RFC 1738 escaped URLs.
    - Improved performance of SOIF parsing.
    - Fixed bug in locating gzip in gatherd.
    - Fixed bug in the unnesting commands in Essence.
    - Fixed bug with HTTP/1.0 requests; now sends encoded URIs for GETs.
    - Fixed ftp.pl for Solaris. Wasn't setting PF_INET correctly.
- Changes to the Replicator:
    - Updated with USC's version from 3/15/95.
- Changes to the User's Manual:
    - Added sections for new plug'n'play components: standard, SGML, HTML, MIF, RTF, WordPerfect 5.1.
    - Updated support policy.
    - Added clarification in Local Gathering section.
    - Added clarification in RootNode enumeration section.
    - Added clarification on Gatherer/Broker information flow.
    - Added clarification for some cached internals.
    - Added section on upgrading from v1.1 to v1.2.
    - Added discussion about httpd_accel for cached.
    - Updated info about software for the replicator section.
    - Updated numerous facts to v1.2.
    - Reorganized essence/content extraction customization section.
    - Added description of SGML summarizing and components distribution (including Rainbow software for MIF and RTF formats).
    - Added more troubleshooting comments to all sections.
    - Added more detail to cache and replication sections, including discussions of httpd-accelerator, CreateReplica, and some of the performance and failure-mode characteristics of the cache.
    - Cleared up inaccuracies and unclear passages in Gatherer RootNode specification section.
    - Added notes about user-contributed software.
    - Updated support policy.
    - Added index entries for all programs in appendices.
    - Other minor changes.
- Miscellaneous changes:
    - Reorganized the source tree to support plug'n'play components.
##############################################################################
Changes between release v1.1 (February 17, 1995) and v1.1.beta.v2:

- Changes to the Broker:
    - Added a leading protocol version header for the result set.
    - Added support for query flags during Broker-to-Broker collections.
    - Added support for limiting the lifetime of glimpse queries.
    - Fixed major bugs in Broker-to-Broker collections.
    - Fixed major bugs with deleting Registry entries during initial build.
    - Fixed memory leaks and file descriptor mgmt bugs in glimpseserver.
    - Fixed bug with -L in glimpseserver.
    - Fixed bug that increased the size of structured glimpse indexes.
    - Fixed bugs in the administrative interface and WAIS support.
    - Fixed core dump when searching the Registry during collections.
    - Fixed display SOIF links flag in BrokerQuery.pl.
    - Fixed .cgi pgms, so that httpd kills them cleanly after user abort.
    - Changed glimpseserver and broker so that they will not block longer than 15 seconds while waiting for an incoming connection. This prevents SunOS from blindly swapping out the process.
    - Optimized so that a full glimpseindex will only happen if more than 10% of the objects have changed.
    - Added some more logging output.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Added Gopher->HTML support.
      For Mosaic proxies, you'll need to
          set gopher_proxy http://cache.server:3128/
      instead of
          set gopher_proxy gopher://cache.server:3128/
    - Fixed bug with HTML-ify FTP directories using ftpget.pl.
    - Fixed bug with hierarchical problem for refreshing.
    - Fixed bogus client error message.
    - Improved cached error messages.
- Changes to the Gatherer:
    - Generates the 'Description' attribute whenever possible.
    - Fixed bug in the expiring of objects from the PRODUCTION database.
    - Fixed bug in httpenum that wasn't cleaning up correctly.
    - Fixed newsenum to obey URL-Max limit.
    - Improved the Mail summarizer.
    - Improved the USENET support; added NewsArticle and NewsGroup.
    - Improved gatherd to speed up SEND-UPDATE timestamp computation.
    - Improved preparation for the Gatherer's database to be exported.
    - Purify'd Essence to remove memory leaks.
- Changes to the User's Manual:
    - Updated the section on the Broker's Collection.conf file.
    - Updated many minor points.
    - Improved HTML version of the manual, by upgrading latex2html pgm.
- Miscellaneous changes:
    - Fixed problems with Solaris' socket.ph for Perl programs.
##############################################################################
Changes between release v1.1.beta.v2 (February 3, 1995) and v1.1.beta:

- Changes to the Broker:
    - Major performance improvements while doing collections.
    - Uses the customizable BrokerQuery.pl for the WWW interface.
    - Fixed major bugs in Broker-to-Broker transfers.
    - Fixed minor bug in collections that caused necessary indexing.
    - Cleaned and improved the information that is logged to broker.out.
    - Changed broker to run cleanly as a daemon by disconnecting from the controlling terminal.
    - glimpseserver now prints its error messages correctly.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Fixed core dump bug when cached is heavily loaded.
    - Improved error messages.
- Changes to the Gatherer:
    - Site enumeration filter is based on host:port; better argv processing for 'Gatherer' - fixes by "Albert Dvornik".
    - Major performance improvements while preparing databases.
    - Fixed Gatherer to change to Top-Directory before running.
    - Fixed Gatherer to write dummy index.html files in data/ and tmp/.
    - Fixed bug in HTTP enumeration to only extract links from HTML.
    - Fixed various minor bugs.
- Changes to the User's Manual:
    - Added detailed appendix on Harvest software layout and programs.
    - HTML version of the manual now contains the local copy of the icons.
    - Added section on customizing BrokerQuery.pl.
    - Fixed example for Filters during RootNode enumeration.
    - Added a search interface to the User's Manual using a Broker.
    - Updated index.
- Miscellaneous changes:
    - Improved log output format to be more readable.
    - Added HP-UX port/fixes from Chris Dalton (crd@hplb.hpl.hp.com).
##############################################################################
Changes between release v1.1.beta (January 26, 1995) and v1.0:

- Changes to the Broker:
    - Upgraded to Glimpse 2.1, which includes glimpseserver.
    - Added faster, more memory-efficient internal Registry lookups.
    - Added support for switching the indexing subsystem at run-time.
    - Added a statistics generator for the Broker.
    - Fixed BrokerQuery.cgi so that the rejection message from the Broker while it's doing indexing works all of the time.
    - Fixed Broker bug that would cause the Broker to hang sometimes on a pclose() after doing a collection with the gather command.
    - Immediately denies outside connections during a collection, indexing, or other administrative operations.
    - Improved the HTML result set generated by BrokerQuery.
    - Pointers to content summaries in the result set are now optional.
    - Changed /brokers to /Harvest/brokers, etc.
    - Limit the time that the Glimpse search engine runs for a query.
    - Added Query.cgi, which can be used to support Broker replicas.
    - Added support for minimal bookkeeping from Gatherer.
    - Fixed problems with the Broker's cleaning; added compress Registry.
    - Fixed problems with the Broker's updating of objects.
    - Fixed BrokerQuery syntax error message to point to queryhelp.html.
    - Fixed BrokerRestart for Replicator interface.
    - Fixed WWW interface to work with any document root.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Fixed serious hierarchical cache bug.
    - New error messages. HTTP/1.0 compliant.
    - Nuke If-Modified-Since to work with Netscape.
    - Non-blocking DNS lookup using dnsserver program.
    - New config parameter, cache_dns_program.
    - Removed Tcl library binaries - have a precompiled version of Harvest.
    - Fixed stat for outgoing message.
    - Use multiple directories for on-disk swap storage.
- Changes to the Gatherer:
    - Added flexible support for specifying a Gatherer's workload.
    - Added support for gathering through the local file system.
    - Added support for USENET URLs.
    - Added INFO command to Gatherer for statistics.
    - Added support for generating minimal bookkeeping attributes.
    - Improved HTTP/1.0 support for MIME headers and Last-Modified headers.
    - Fixed bug with 'gather' that caused 'gunzip' decompression to fail.
    - Made automatic keyword generation and local disk cache maximum size run-time flags.
    - Added a SOIF parser in Perl.
    - Changed HTML URL extractor from HTML.sum to separate program.
    - Fixed Gopher support to have longer read timeout.
    - Consolidated GDBM utilities into the 'gdbmutil' program.
    - Fixed bug with gatherd leaving zombie children.
    - Fixed various minor bugs.
- Changes to the Replicator:
    - Replaced with USC's Replicator distribution.
- Changes to the User's Manual:
    - Added a new subsection on Extended RootNode Specifications.
    - Added discussion about new Local-Mapping support.
    - Fixed various typos and clarified wording in various places.
    - Fixed some URLs, and added others.
    - Fixed the discussion on using Glimpse with the Broker.
    - Added a new subsection on the Perl SOIF library.
    - Added more descriptions about various system components (e.g., HSR).
    - Added more index entries, and clarified some of the existing entries.
    - Added a note about realtime Gatherer updates.
    - Added mention of cache RAM requirements.
    - Added section on Support Policy and Harvest Team Contact Information.
    - Updated copyright/licensing discussion.
    - Added a section about the binary-only distribution.
    - Changed section names and content at beginning to make it clearer and to make more sense with the new installation.
    - Reorganized manual by subsystem.
    - Added troubleshooting sections to each subsystem, and moved into them some material that had been in other places.
    - Expanded section on supported platforms and software needed for running/building Harvest.
    - Clarified some parts of the ``Querying a Broker'' section.
    - Added appendix on Directory layout of installed Harvest software.
    - Updated to reflect new httpd reorg.
    - Updated default summarizer action list.
    - Noted that glimpseserver is now part of the system.
    - Added more discussion to replicator section, including a figure.
- Miscellaneous changes:
    - Reorganized Harvest's installed directory structure.
    - Integrated port to AIX 3.2 and AIX cc by greving@dv.go.dlr.de.
    - Integrated port to HP-UX A.09.03 by steff@csc.liv.ac.uk.
    - Integrated port to IRIX 5.3 by leclerc@ai.sri.com.
    - Integrated port to Linux 1.1.59 by hardy@cs.colorado.edu.
    - Integrated port/fixes to HP-UX 09.03 and HP ANSI C compiler A.09.69 by crd@hplb.hpl.hp.com.
    - Changed all Perl scripts to work under Perl 4.x or 5.0.
    - Try to use vfork rather than fork to save memory when possible.
    - Updated Copyright.
##############################################################################
Changes between release v1.0 (November 7, 1994) and v1.0-beta-1.5:

- Changes to the Broker:
    - Upgraded Glimpse from version 1.1 to 2.0.
    - Added support for Glimpse 2.0, which allows byte-level indexing, limiting result set sizes, arbitrary Boolean queries, and more.
    - Made case-insensitive and word matching the default for Glimpse.
    - Improved and updated queryhelp.html and adminhelp.html.
    - Added soifhelp.html to the help suite.
    - Added a reboot-broker tag to the default broker Makefile.
    - Fixed various minor bugs.
- Changes to the Gatherer:
    - Better HTTP/1.0 support; sends User-Agent and From fields.
    - Fixed a problem with cross-site Gopher RootNode enumeration.
    - Fixed bug in HTTP RootNode enumeration.
    - Generation of unique, sorted keyword list is optional in config.h.
    - Changed Gatherer program to work around Solaris 2.3 Perl 4.036 bug.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Added support for the Netscape browser.
    - No longer caches /cgi-bin/ URLs.
    - Updated the Tcl/Tk/dpwish pointers for the Cache manager.
- Changes to the User's Manual:
    - Added an index with over 300 entries.
    - Added a new section about Querying a Broker.
    - Added a new section about common SOIF attribute names.
    - Added a new section on periodic gathering.
    - Added a new section on tuning Glimpse.
    - Added a new section on the WWW interface to the Broker.
    - Added a new section on integrating new search/indexing subsystems into the Broker, with a detailed interface description.
    - Added more detail to SOIF appendix.
    - Improved and updated the Administrating a Broker subsection.
    - Added more explanation about manual annotations.
    - Folded in content from FAQ.
    - Noted particular usefulness of the Essence-Options variable, e.g., for setting --full-text.
    - Added a note to the Customizing the candidate selection step subsection that it's particularly useful to do selection based on file and URL naming heuristics when gathering remote data, because it can avoid retrieving lots of data.
    - Added a note in the subsection on Running a Gatherer that you can set MAX_ENUM in src/common/include/config.h, and that in a future release of Harvest we will make it possible to set this limit more flexibly. Also noted the robot guidelines.
    - Added an overview about the lib and bin directories for the Gatherer, including the defaults and descriptions of each file.
    - Showed RunGatherer and RunGatherd scripts and added discussion of how to use them from cron and /etc/rc.local.
    - Added pointer to FAQ on setting up HTTPD in the Broker section.
    - Put the logo on the cover page.
- Miscellaneous changes:
    - Updated the COPYRIGHT and added it to all appropriate source files.
    - Updated the FAQ, and converted it to HTML.
    - Fixed BSD compatibility bug in src/install.sh.
##############################################################################
Changes between release v1.0-beta-1.5 (October 14, 1994) and v1.0-beta-1.4:

- Added a user manual that is intended to help both novice and advanced Harvest users better use the system. It covers the following topics:
    - Introduction to Harvest (1 page)
    - Subsystem Overview (2 pages)
    - Getting and Installing the Harvest software (1 page)
    - Making Basic Use of Harvest (3 pages)
    - Advanced Features of Harvest (5 pages)
    - References (1 page)
    - Appendix on The Summary Object Interchange Format (SOIF) (3 pages)
    - Appendix on Essence Summarizer Actions (1 page)
    - Appendix on Gatherer Examples (6 pages)
    - Appendix on Broker's Query Manager and Collector Interface (2 pages)
- Changes to the Broker:
    - Improved Broker installation, and added the CreateBroker program that automatically creates and configures a Harvest Broker based on a brief Question & Answer session with the user.
    - Improved the Mosaic interface to be more user-friendly.
    - Added support for duplicate removal based on MD5 values.
    - Made Query Manager and Administrative interface more extensible.
    - Rewrote the Broker registry to improve performance and readability.
    - Added the dumpregistry command to view the Broker's registry.
    - Added the test-broker command for simple testing of a Broker.
    - Added support for wais-8-b5, freeWAIS, commercial WAIS, and Nebula.
    - Cleaned up the admin.html and query.html files.
    - Cleaned up much of the code to make it more extensible.
    - Fixed bug in the registry garbage collection.
    - Fixed major memory leak bugs.
    - Fixed various minor bugs.
- Changes to the Cache:
    - Started using icp version_id 2 of the protocol.
    - Improved support for OSF/1 v2.0 on 64-bit DEC Alphas.
    - Added password support for administrative interface.
    - Fixed bug with FTP "Parent Directory", and cleaned up HTML for dirs.
    - Fixed various major bugs with hierarchical caching.
    - Fixed various minor bugs.
- Changes to the Gatherer:
    - Added support for generating a sorted, unique keyword attribute, based on the Description, Partial-Text, or Keywords attribute.
    - Added an "allow only these types" option in the Candidate Selection step.
    - Added stub Exploder type to help users use the unnesting step.
    - Gatherer automatically creates a gatherd.cf file if needed.
    - Fixed major gatherd bug that caused.
    - Fixed various minor bugs and memory leaks.
- Changes to the Replicator:
    - Working on instrumenting the code to measure performance.
    - Fixed various bugs.
##############################################################################
$Id: ChangeLog,v 1.215 1996/02/01 22:45:26 duane Exp $