	       The development of mm 0.91 and ccmd 2.1
			    [28-Jun-1994]

			  Nelson H. F. Beebe
		   Center for Scientific Computing
		      Department of Mathematics
			  University of Utah
		       Salt Lake City, UT 84112
				 USA
		Email: beebe@math.utah.edu (Internet)


========================================================================
		       COPYRIGHTS, AND ALL THAT

The copyright on all of the work for mm 0.91 and ccmd 2.1 is hereby
assigned by the developer, Nelson H. F. Beebe <beebe@math.utah.edu>,
to The Trustees of Columbia University in the City of New York, with
grateful thanks to

* Mark Crispin and his many Arpanet collaborators who produced the
original mm in DEC-20 assembly code,

* the TOPS-20 developers and users who originally designed and
implemented the COMND% JSYS parser in TOPS-20,

* Andrew Lowry and Howie Kaye who wrote ccmd, the C reimplementation
of the wonderful DEC TOPS-20 COMND% JSYS command parser,

* Frank da Cruz, Christine M. Gianone, and all of the other Kermit
contributors,

for the past 16 years of daily use of mm and kermit.  They have both
handled umpteen gigabytes of data for me.
========================================================================

Version 2.1 of ccmd and 0.91 of mm represent a major step forward for
these packages, including the conversion of the code for compilation
under strict ANSI/ISO Standard C compilers.

The changes from mm 0.90 and ccmd 2.0 are too numerous to include
usefully as rcsdiff patches against those versions; the new versions
will therefore get a brand-new source distribution once they have been
tested on numerous machines.

The work began after the initial port of mm 0.90 and ccmd 2.0 to the
DEC Alpha running OSF/1 version 2.0.  That work uncovered a dozen or
more bugs, mostly due to the lack of interprocedural type checking in
the compilers available when these packages were written in the mid
1980s, but also due to a coding style that can be described as
`vintage K&R', notably, the omission of function and function argument
types when they were intended to be int.  In several instances this
practice led to passing along, as implicit int, arguments that were
really pointers: while this works on many machines, it fails on those
for which int and void* do not have the same size (e.g. DEC Alpha, SGI
Indigo MIPS R4000 (IRIX 5.0 or later), and Intel 80xxx).

The README.OSF1 file records fixes for these bugs, some of which
occurred many times:

	(1) FILE*, instead of char*, argument to unlink().
	(2) missing initial FILE* argument in fprintf().
	(3) non-comment non-blank text following #endif and #else.
	(4) int, instead of char*, argument to fopen().
	(5) use of functions before definition or declaration.
	(6) function declaration as "type functionname()" and later
	    definition as "static type functionname()".
	(7) missing argument in signal handler.
	(8) wrong argument type to time().
	(9) extra */ on preprocessor #endif comment.
	(10) duplication of library routine (basename()) with
	     different argument types.
	(11) wrong argument type in lookupzone() in dt.c; on the DEC
	     Alpha, int values are 32 bits, while pointers are 64 bits.
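
For reference, here is a small C sketch of the correct forms of several
of the calls in that list: items (1), (2), (4), (7), and (8).  It is not
code from mm or ccmd; the names on_signal() and write_and_remove() are
hypothetical, and unlink() is assumed to come from the POSIX header
<unistd.h>.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>     /* unlink() is POSIX, declared here */

static volatile sig_atomic_t got_signal = 0;

/* (7) A signal handler receives the signal number as an int argument. */
static void on_signal(int sig)
{
    got_signal = sig;
}

/* Write a timestamp to a scratch file, then remove the file.
   Returns 0 on success, -1 on failure. */
static int write_and_remove(const char *path)
{
    FILE *fp;
    time_t now;

    assert(path != NULL);
    (void)time(&now);                 /* (8) time() takes a time_t *, not an int * */
    fp = fopen(path, "w");            /* (4) fopen() takes a char * file name */
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long)now);  /* (2) fprintf() needs its FILE * first */
    if (fclose(fp) != 0)
        return -1;
    return unlink(path);              /* (1) unlink() takes the name, not the FILE * */
}
```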

The further development of mm 0.91 and ccmd 2.1 repaired a great many
more bugs, typically wrong numbers or types of arguments, and non-void
functions that returned without supplying a function value.

Although I have been running mm under Solaris 2.x for 18 months, I
have not found it as reliable as the version under SunOS 4.1.x.  In
particular, there is a nasty bug that results in mm going into an
infinite CPU-hogging loop with the curious side effect that the
process cannot be killed.  The only solution is to reboot the
computer.  Inasmuch as our department plans to upgrade 50 SunOS 4.1.3
systems to Sun Solaris 2.3 in the summer of 1994, the existence of
this bug is particularly troublesome; we would probably choose to run
the SunOS 4.1.3 version instead just to avoid it.  This bug has been
seen within a few minutes of starting mm, and also after 3 weeks or
more of successful use, which has made it frustrating to track down.
I ran mm under dbx for weeks, but each time the loop happened, either
dbx could not regain control, or when it did, the stack trace and
local variable dumps were uninformative.  Given the large number of
errors that the new development has uncovered and fixed, I have some
hope now that that particular bug has either been eliminated, or will
be catchable, and I expect to run mm under dbx on several
architectures during a suitable test period.

In order to root out further instances of such bugs, over the course
of 4 days, I made dozens of passes over the source code of ccmd and
mm, using Sun Solaris 2.3 lint and cc 3.0, and gcc 2.5.8, turning on
all warnings supported by those compilers.  It rapidly became evident
that the only way to do this properly was to provide the compilers
with full information about function arguments and types, through
Standard C and C++ function prototype declarations.

Fortunately, the public-domain mkptypes utility makes it easy to
generate these prototypes automatically from C source code, and I
followed the modern programming practice of placing function
prototypes for global functions in header files to eliminate, or at
least minimize, prototype duplication.  lint also provides the
information needed to distinguish global functions from local ones;
the latter have been declared static, reducing global namespace
pollution.

Sun's Solaris 2.x cscope utility for building a rapid-access database
for C and C++ code was enormously helpful in tracking down function,
variable, preprocessor symbol, and header file uses.

------------------------------

All extern declarations have been moved from function bodies to the
file preamble, enhancing their visibility, and reducing duplication.

------------------------------

gcc identified all of the places where function types and function
arguments were omitted, and the omissions have been rectified.
Functions that don't return a value have been properly declared of
type void, rather than int.

------------------------------

In order to enhance compiler optimization possibilities, and conform
to modern coding practice, I also introduced the const modifier on
char* and char** variables where possible.  This took numerous passes
over the source code, but with the help of gcc's warning messages, was
completed reliably.

I had to rewrite the body of one routine (cmxerr() in ccmd/ccmdio.c)
to remove the need for temporary modification of a const string.
Although most compilers will just issue a warning, Sun Solaris 2.x
lint 3.0 views it as a fatal error, and the replacement code is
marginally faster because it removes one level of function call.  The
old code has been preserved in the false branch of a preprocessor
conditional.
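
The general technique can be sketched as follows.  This is not the
actual cmxerr() code; format_prefix() is a hypothetical helper.  Instead
of temporarily storing a NUL into the string to truncate it, it uses
the precision field of the %s conversion, "%.*s", which prints at most
the given number of characters and never writes to the string.
snprintf() is a later (C99) addition; sprintf() accepts the same
conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "prefix: <first len characters of msg>" into buf without
   modifying msg, which may be a const or read-only string.
   Returns the snprintf() result (characters that would be written). */
static int format_prefix(char *buf, size_t bufsize, const char *prefix,
                         const char *msg, size_t len)
{
    assert(len <= strlen(msg));
    return snprintf(buf, bufsize, "%s: %.*s", prefix, (int)len, msg);
}
```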

------------------------------

gcc will produce several warnings of the type

filename.c:27: warning: cast discards `const' from pointer target type

These arise from functions that return char* pointers into const char*
argument strings, strchr() being the best-known example; there is no
way to be strict about this in Standard C, and the C++ designers are
wrestling with a syntax to deal with it cleanly.  They also arise from
casts of constant strings in the many fdb structures.  Both of these
situations are expected, and the warnings are harmless.
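
A small sketch of the first situation, using a hypothetical
skip_blanks() function that behaves like strchr(): it accepts a
const char* so that it can be called on constant strings, but returns
a plain char* so that callers holding a modifiable buffer can write
through the result.  The cast in its return statement is the kind gcc
reports when asked to (e.g. with -Wcast-qual).

```c
#include <assert.h>

/* Skip leading blanks and tabs; return a pointer into the argument. */
static char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        ++s;
    return (char *)s;   /* cast discards `const' from pointer target type */
}
```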

------------------------------

gcc will raise numerous complaints about local variables shadowing
global variables; I simply ignore these, since the set of globals
varies from machine to machine.  I did draw the line at locals
shadowing locals, e.g.

void bar(int);

void foo(void)
{
  int n = 10;
  bar(n);
  {
     int n;		/* shadows the outer n */
     for (n = 0; n < 3; ++n)
	bar(n);
  }
}

Every such case was dealt with by renaming the nested local variable.
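
For comparison, here is the example above with the nested variable
renamed, made self-contained by a hypothetical bar() that simply
accumulates its arguments.

```c
#include <assert.h>

/* Hypothetical stand-in for bar(): records the sum of its arguments. */
static int total = 0;
static void bar(int k) { total += k; }

void foo(void)
{
    int n = 10;
    bar(n);
    {
        int i;          /* was a second n, shadowing the outer one */
        for (i = 0; i < 3; ++i)
            bar(i);
    }
}
```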

------------------------------

gcc and lint also raise warnings about arguments that are unused;
these are either artifacts of older versions of ccmd and mm, or they
arise because the function is a member of a family of functions that
are stored in tables, and must be called with identical arguments.
There is one aberration that I have not attempted to repair: function
indiract() in ccmd/stdact.c takes 4 arguments, instead of the 3 that
all of the other parsing action functions take.  This is a design flaw
which could be easily remedied through introduction of a global
variable to hold the fourth argument, or more cleanly, but also more
laboriously, by passing a struct as a single argument.  I have done
neither; instead, I simply introduced a special type cast,
HANDLER_CAST4, to override the default function interpretation at the
point of the two calls in ccmd/ccmd.c.
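
The struct-argument remedy might look like the following sketch.  All
names (struct action_args, action_fn, count_action(),
indirect_action()) are hypothetical, and the fields are illustrative
rather than ccmd's real parser state.  A handler that needs a fourth
value reads it from the struct, so every table entry has the same
signature and no special cast is needed at the call site.

```c
#include <assert.h>
#include <stddef.h>

struct action_args {
    const char *text;   /* text being parsed */
    int len;            /* its length */
    int flags;          /* parser flags */
    void *extra;        /* optional fourth datum; NULL if unused */
};

typedef int (*action_fn)(const struct action_args *);

static int count_action(const struct action_args *a)
{
    return a->len;
}

/* Reads what used to be a fourth positional argument from a->extra. */
static int indirect_action(const struct action_args *a)
{
    const int *depth = (const int *)a->extra;
    return depth != NULL ? *depth : -1;
}

/* One table, one signature. */
static const action_fn actions[] = { count_action, indirect_action };
```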

------------------------------

The compilers and lint detected many instances of dead code, such as
local variables declared, or declared and assigned values, but then
never used.  I've eliminated them all, although in the case of larger
code sections, such as unused functions, I've left them in the code
bracketed by preprocessor lines like this:

#if 0
...dead code..
#endif

------------------------------

In ccmd, lint raised a great many warnings like these:

	bitwise operation on signed value possibly nonportable

Most have been eliminated by the introduction of a new unsigned data
type in ccmd/ccmd.h, flag_t, which is used for the command parser's
bit flags.  It is currently typedef'ed as "unsigned int".
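
A minimal sketch of bit flags built on flag_t.  The typedef matches the
description above; the flag names and helper functions are
hypothetical.  With an unsigned type, operations such as ~f are fully
defined, whereas with a signed int the resulting bit pattern depends on
the machine's integer representation, which is what lint warns about.

```c
#include <assert.h>

typedef unsigned int flag_t;    /* as described for ccmd/ccmd.h */

/* Hypothetical flag names, for illustration only. */
#define F_NOINDIRECT  ((flag_t)0x0001u)
#define F_WILD        ((flag_t)0x0002u)
#define F_DEFAULT     ((flag_t)0x0004u)

static flag_t set_flag(flag_t flags, flag_t f)   { return flags | f; }
static flag_t clear_flag(flag_t flags, flag_t f) { return flags & ~f; }
static int    has_flag(flag_t flags, flag_t f)   { return (flags & f) != 0; }
```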

------------------------------

In mm, there are 9 lint warnings

	function falls off bottom without returning value

which arise because a function ends with a while(1) loop that contains
a return statement; a smarter lint implementation would detect that
this is legitimate, and be silent.

------------------------------

lint raises warnings about free() calls where the argument type does
not match the library type (often char*, but strictly, void* in
Standard C).  These could be eliminated by explicit casts, but a
cleaner solution would be to redefine free to include an appropriate
cast:

	#define free(p) (free)((char*)(p))

In Standard C, the parentheses around (free) prevent recursive macro
expansion; unfortunately, older preprocessors, and even the odd
supposedly Standard C preprocessor, will go into an infinite loop with
this form, so I have refrained from using it, and instead, just ignore
lint and compiler warnings about argument types to free().

The same problem arises with the first argument of realloc(), and as
with free(), it has been left unresolved.

------------------------------

In Standard C, memory lengths passed to library routines are of type
size_t, which is required to be an unsigned type.  However, older
implementations have defined it as a signed type, and still older ones
variously use int or long.  This affects many library calls in ccmd
and mm.  Rather than use explicit casts, I recommend using a good
Standard C compiler which will supply the appropriate casts at compile
time, based on its having seen prototypes for all library functions
used in ccmd and mm.

------------------------------

lint warns

	constant in conditional context

about loops coded as "while (1) {...}".  A better choice would be
"for(;;) {...}", which eliminates the warning, and has the same
effect; it might even produce tighter code.

------------------------------

A common practice in mm 0.90 and ccmd 2.0 was to write conditionals of the form

	if (a = b) {...}

The problem with this is that it requires further study to determine
whether the programmer intended assignment, and subsequent test for
non-zero, or whether the code is wrong, and should have been an
equality test:

	if (a == b) {...}

gcc and lint warn about these, and I fixed several errors where == was
meant, but = was entered.  I even fixed one assignment written as
"a == b;", changing it to the correct "a = b;".

gcc recommends additional parentheses when the first form is wanted,
but there is a better solution: C's comma operator.  I've therefore
rewritten almost all of the first form as

	if ((a = b, a)) {...}

which makes it clear that assignment, then a test for non-zero, is
wanted.  The only places that I refrained from this were ones with
side effects, such as

	if (*a++ = *b++) {...}
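
A self-contained instance of the comma-operator form, using a
hypothetical domain_of() helper: the pointer is assigned, then tested,
and no bare = appears where == might have been meant.

```c
#include <assert.h>
#include <string.h>

/* Return the part of "user@host" after the '@', or NULL if none. */
static const char *domain_of(const char *addr)
{
    const char *p;

    if ((p = strchr(addr, '@'), p))
        return p + 1;
    return NULL;
}
```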

------------------------------

lint warnings of the form

	pointer cast may result in improper alignment

can be safely ignored; they are a generic problem with lint
implementations.

------------------------------

mm and ccmd have 5 functions that take variable numbers of arguments.
The current implementation uses the Berkeley <varargs.h> interface,
which poses problems for type checking.  Standard C instead provides
<stdarg.h>, with the restriction that the first argument must always
be explicitly typed, and ... marking the remaining arguments, as in:

extern int printf(const char *, ...);

lint doesn't know about the varargs.h style, and warns

	argument does not match remembered type: arg #1

I have not updated the code to use the new style, which would
eliminate the lint warnings; that is work for the next version.  All
of the routines that take a variable number of arguments are easily
identified in ccmd/ccmd.h and mm/extern.h by declarations like this:

extern int sorry ARGS(()); /* NB: cannot represent variable number of arguments
			with old-style varargs.h interface */

The (()) string is distinctive, and occurs in only 5 places in the
entire source code, in those two header files.
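
When that conversion is done, each routine would follow the usual
<stdarg.h> pattern sketched below.  report() and last_message are
hypothetical stand-ins, not the real sorry() or its relatives; the
message is formatted into a buffer with vsnprintf() (C99; vsprintf() is
the older equivalent) so that the result can be inspected.  Because the
prototype names the fixed format argument and ends in ..., compilers
and lint can check every call against it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_message[256];

/* printf-like: format the message into last_message, return its length. */
int report(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(last_message, sizeof last_message, fmt, ap);
    va_end(ap);
    return n;
}
```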

------------------------------

lint detects more than 350 external names that collide with others in
their first 6 characters.  I have not attempted to produce
preprocessor symbols to map them to names of 6 or fewer characters, on
the grounds that such severe limits have effectively disappeared,
spurred on perhaps by C++'s name mangling that can require hundreds of
unique characters in external names.

------------------------------

In mm/message.h, getmsg is #define'd to Getmsg, to avoid a conflict
with a Sun library routine.  getmsg() is defined in mm/send.c, and
used there and in mm/sendcmds.c.

------------------------------

The return values of malloc() and realloc() are now always properly
type cast; Standard C types them as void*, where the void* type is
guaranteed to be as large as the pointer to any non-function data
type.  The private memory management functions dcalloc(), dfree(),
dmalloc(), drealloc(), safe_free(), and safe_realloc() now deal with
void*, rather than char*, data.  In a pre-Standard C environment where
void* is not recognized, the code is simply compiled with void
redefined to char at compile time.
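
The convention can be illustrated with a hypothetical wrapper,
checked_malloc(), that deals in void* like dmalloc() and the other
private allocators, and a caller that casts the result explicitly.  The
error message and exit policy are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n bytes or exit; never returns NULL. */
static void *checked_malloc(size_t n)
{
    void *p = malloc(n != 0 ? n : 1);   /* sidestep malloc(0) ambiguity */
    if (p == NULL) {
        fprintf(stderr, "?Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Return a freshly allocated copy of s. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *t = (char *)checked_malloc(len);   /* explicit cast of the void * */
    memcpy(t, s, len);
    return t;
}
```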

------------------------------

I ran mm under the Sun dbx 3.0 debugger with its "check -leaks" option
to check for allocated, but unfreed, memory: this resulted in the
addition of several free() calls to eliminate some of the memory leaks.  I also
used dbx's "check -access" option to check for references to
uninitialized variables, and as a consequence, made a source change in
ccmd.c to initialize h->enabled in one place where it was missing.

------------------------------

The Makefiles have been updated to have machine-specific target names,
avoiding the need to make source code changes to configuration files.
ccmd has only a modest number of machine dependencies, which are
easily handled by compile-time preprocessor symbol definitions.  mm has
many more; they are encapsulated in the s-xxx.h files, and one of those
files is included in config.h by this code:

#ifdef S_FILE
#include S_FILE
#else
#include "s-sun50.h"
#endif

This form permits defining the s-xxx.h file as the value of S_FILE at
compile time, so that config.h should not need editing.  I did find,
however, that Sun's lint 3.0 does not handle the

#include S_FILE

form: it always included the file named in the #else branch instead.
For development purposes, I made that default s-sun50.h, to match the
system where most of the mm 0.91 and ccmd 2.1 development was carried out.

------------------------------


========================================================================

FUTURE WORK

(1) Extend ccmd to support 8-bit characters; the current version is
based on 7-bit characters, with break table sizes and macros in
ccmd/cmfncs.h set accordingly.  In mm/send.c, characters are masked
against 0x7f to strip the high order bit, and break tables in
mm/address.c, mm/alias.c, mm/keywords.c, mm/parse.c, mm/parsemsg.c,
mm/seq.c, and mm/sys-prof.c are initialized assuming 128 characters.

(2) Convert to use stdarg.h instead of varargs.h.

(3) Extend function headers to support compilation under K&R C,
Standard C, and C++, possibly using the style

#if STDC
int
foo(int a, const char *b)
#else
int
foo(a,b)
int a;
const char *b;
#endif
{
...
}

where STDC is defined like this:

#if defined(__STDC__) || defined(__cplusplus) || defined(c_plusplus)
#define STDC 1
#else
#define STDC 0
#endif

or possibly using argument-count specific macros to allow code like
this:

int
foo ARGS_2((a,b), int a, const char *b)
{
...
}

with definitions like this:

#if STDC
#define ARGS_2(list,a,b) (a, b)
#else
#define ARGS_2(list,a,b) list a; b;
#endif

I made some exploratory attempts at C++ with copies of a couple of mm
source files modified to use Standard C/C++ style function headers.
C++ has some extra reserved words, including new, this, and delete,
which are used in the mm and ccmd source code as variable names; this
could be easily worked around with preprocessor definitions.  More
troublesome is the reuse of typedef names as variable names, which is
illegal in C++, e.g.

typedef struct mail_msg {
    headers *headers;			/* list of unordered headers */
...
}

Fixing this will require manual editing of the source code.

Nevertheless, the strict type checking of C++ compilers has proved so
useful in my other C code development that I routinely use them in
preference to C compilers.  It is much easier to fix bugs caught at
compile time than at core dump time.

(4) Redesign ccmd and rewrite from scratch in C++, with much more
information hiding and visibility control, and a much simpler
interface to the ccmd parsing facilities, which were largely copied
intact from the DEC-20 assembly code implementation.

(5) Convert ccmd/ccmd.guide to (La)TeXinfo for convenient typeset and
online documentation.
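
For item (1), one possible shape for wider break tables is sketched
below.  The brk_table type and the brk_set() and brk_test() functions
are hypothetical and do not reproduce ccmd's actual break-table layout;
the idea is simply to size the bit table from the number of character
values, so that 8-bit characters such as 0xE9 get their own bits
instead of being folded onto 7-bit ones by masking with 0x7f.

```c
#include <assert.h>
#include <limits.h>

/* One bit per unsigned char value (256 with 8-bit bytes); a 7-bit
   table would use 128 here.  A set bit marks a break character. */
#define BRK_NCHARS (UCHAR_MAX + 1)
#define BRK_NBYTES ((BRK_NCHARS + CHAR_BIT - 1) / CHAR_BIT)

typedef struct {
    unsigned char bits[BRK_NBYTES];
} brk_table;

/* Mark character c as a break character. */
static void brk_set(brk_table *t, unsigned char c)
{
    t->bits[c / CHAR_BIT] |= (unsigned char)(1u << (c % CHAR_BIT));
}

/* Return 1 if c is a break character, else 0. */
static int brk_test(const brk_table *t, unsigned char c)
{
    return (t->bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1;
}
```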
