HardenedBSD/gnu/usr.bin/cvs/PROJECTS

This is a list of projects for CVS.  In general, unlike the things in
the TODO file, these need more analysis to determine if and how
worthwhile each task is.

I haven't gone through TODO, but it's likely that it has entries that
are actually more appropriate for this list.

0. Improved Efficency

* CVS uses a single doubly linked list/hash table data structure for
  all of its lists.  Since the back links are only used for deleting
  list nodes it might be beneficial to use singly linked lists or a
  tree structure.  Most likely, a single list implementation will not
  be appropriate for all uses.

  One easy change would be to remove the "type" field out of the list
  and node structures.  I have found it to be of very little use when
  debugging, and each instance eats up a word of memory.  This can add
  up and be a problem on memory-starved machines.

  Profiles have shown that on fast machines like the Alpha, fsortcmp()
  is one of the hot spots.

* Dynamically allocated character strings are created, copied, and
  destroyed throughout CVS.  The overhead of malloc()/strcpy()/free()
  needs to be measured.  If significant, it could be minimized by using a
  reference counted string "class".

* File modification time is stored as a character string.  It might be
  worthwile to use a time_t internally if the time to convert a time_t
  (from struct stat) to a string is greater that the time to convert a
  ctime style string (from the entries file) to a time_t.  time_t is
  an machine-dependant type (although it's pretty standard on UN*X
  systems), so we would have to have different conversion routines.
  Profiles show that both operations are called about the same number
  of times.

* stat() is one of the largest performance bottlenecks on systems
  without the 4.4BSD filesystem.  By spliting information out of
  the filesystem (perhaps the "rename database") we should be 
  able to improve performance.

* Parsing RCS files is very expensive.  This might be unnecessary if
  RCS files are only used as containers for revisions, and tag,
  revision, and date information was available in easy to read 
  (and modify) indexes.  This becomes very apparent with files
  with several hundred revisions.

* A RCS "library", so CVS could operate on RCS files directly.  

  CVS parses RCS files in order to determine if work needs to be done,
  and then RCS parses the files again when it is performing the work.
  This would be much faster if CVS could do whatever is necessary 
  by itself.

1. Improved testsuite/sanity check script

* Need to use a code coverage tool to determine how much the sanity
  script tests, and fill in the holes.
Import CVS-1.6.3-951211.. Basically, this is the cvs-1.6.2 release plus a couple of minor changes.. Some highlights of the new stuff that was not in the old version: - remote access support.. full checkout/commit/log/etc.. - much improved dead file support.. - speed improvements - better $CVSROOT handling - $Name$ support - support for a "cvsadmin" group to cut down rampant use of "cvs admin -o" - safer setuid/setgid support - many bugs fixed.. :-) - probably some new ones.. :-( - more that I cannot remember offhand.. 1995-12-10 23:31:43 +01:00			`This is a list of projects for CVS. In general, unlike the things in`
			`the TODO file, these need more analysis to determine if and how`
			`worthwhile each task is.`

			`I haven't gone through TODO, but it's likely that it has entries that`
			`are actually more appropriate for this list.`

			`0. Improved Efficency`

			`* CVS uses a single doubly linked list/hash table data structure for`
			`all of its lists. Since the back links are only used for deleting`
			`list nodes it might be beneficial to use singly linked lists or a`
			`tree structure. Most likely, a single list implementation will not`
			`be appropriate for all uses.`

			`One easy change would be to remove the "type" field out of the list`
			`and node structures. I have found it to be of very little use when`
			`debugging, and each instance eats up a word of memory. This can add`
			`up and be a problem on memory-starved machines.`

			`Profiles have shown that on fast machines like the Alpha, fsortcmp()`
			`is one of the hot spots.`

			`* Dynamically allocated character strings are created, copied, and`
			`destroyed throughout CVS. The overhead of malloc()/strcpy()/free()`
			`needs to be measured. If significant, it could be minimized by using a`
			`reference counted string "class".`

			`* File modification time is stored as a character string. It might be`
			`worthwile to use a time_t internally if the time to convert a time_t`
			`(from struct stat) to a string is greater that the time to convert a`
			`ctime style string (from the entries file) to a time_t. time_t is`
			`an machine-dependant type (although it's pretty standard on UN*X`
			`systems), so we would have to have different conversion routines.`
			`Profiles show that both operations are called about the same number`
			`of times.`

			`* stat() is one of the largest performance bottlenecks on systems`
			`without the 4.4BSD filesystem. By spliting information out of`
			`the filesystem (perhaps the "rename database") we should be`
			`able to improve performance.`

			`* Parsing RCS files is very expensive. This might be unnecessary if`
			`RCS files are only used as containers for revisions, and tag,`
			`revision, and date information was available in easy to read`
			`(and modify) indexes. This becomes very apparent with files`
			`with several hundred revisions.`

			`* A RCS "library", so CVS could operate on RCS files directly.`

			`CVS parses RCS files in order to determine if work needs to be done,`
			`and then RCS parses the files again when it is performing the work.`
			`This would be much faster if CVS could do whatever is necessary`
			`by itself.`

			`1. Improved testsuite/sanity check script`

			`* Need to use a code coverage tool to determine how much the sanity`
			`script tests, and fill in the holes.`