


                    WORDS (LATIN) DOCUMENTATION 1.93                    


WORDS 1.93 - 
LATIN-to-ENGLISH DICTIONARY PROGRAM
--------------------------------------------------------------------------


                        WORDS (Latin) Version 1.93                        

INSTALLATION
SUMMARY
INTRODUCTION
OPERATIONAL DESCRIPTION
 Program Operation
 Examples
 Signs and Abbreviations in Meaning
BRIEF PROGRAM DESCRIPTION
 Special Cases
 Uniques
 Tricks
 Codes in Inflection Line
 Help for Parameters
GUIDING PHILOSOPHY
 Purpose 
 Method
 Word Meanings
 Proper Names
 Letter Conventions
 Dictionary Codes
 Evolution of the Dictionary
--------------------------------------------------------------------------


                               INSTALLATION                               
 


The WORDS program, with its accompanying data files, should run on a PC in 
DOS/Windows 95, any monitor.  Simply download the self-extracting EXE 
file: WORD193D.EXE for DOS, WORD193W.EXE for Windows 95 or NT, 
and execute it in your chosen subdirectory to UNZIP the files into a 
subdirectory of a hard disk.  Then call WORDS.  


                                 SUMMARY                                  
 

This program, WORDS.EXE, for the PC (DOS and Windows 95 console versions),
takes keyboard input or a file of Latin text lines and provides an 
analysis of each word individually.  It uses an INFLECT.SEC, UNIQUES., 
ADDONS., STEMFILE.GEN, INDXFILE.GEN, and DICTFILE.GEN, and possibly .SPE 
and DICT.LOC.  

The dictionary contains over 17000 entries, as would be counted in an 
ordinary dictionary.  This expands to almost twice that number of 
individual stems (the count that the program displays at startup), and may
generate many hundreds of thousands of 'words' that one can construct over
all the declensions and conjugations.  This is still a modest dictionary 
in absolute size.  Kidd's Collins Latin Gem, a breast-pocket (8 by 11 cm.)
edition (which even has English_to_Latin) contains about 17,000 Latin 
entries.  The ultimate 2100 page Oxford Latin Dictionary has about 34,000 
entries, excluding proper names (and it has lots of those).  This version 
of WORDS provides a tool to help in translations for the Latin student, 
but the dictionary is slowly growing, and A through B can match any 
dictionary.  

A few hundred prefixes and suffixes further enlarge the range.  These can 
generate additional words - some of which are recognized Latin words, some
are perfectly reasonable words which may never have been used by Cicero or
Caesar but might have been used by Augustine or a monk of Jarrow, and some
are nonsense.  


                               INTRODUCTION                               


I am no expert in Latin, indeed my training is limited to a couple of 
years in high school almost 50 years ago.  But I always felt that Latin, 
as presented after two millennia, was a scientific language.  It had the 
interesting property of inflection, words were constructed in a logical 
manner.  I admired this feature, but could never remember the vocabulary 
well enough when it came time to exercise it on tests.  

I decided to automate an elementary-level Latin vocabulary list.  As a 
first stage, I produced a computer program that will analyze a Latin word 
and give the various possible interpretations (case, person, gender, 
tense, mood, etc.), within the limitations of its dictionary.  This might 
be the first step to a full parsing system, but, although just a 
development tool, it is useful by itself.  

While developing this initial implementation, based on different sources, 
I learned (or re-learned) something that I had overlooked at the 
beginning.  Latin courses, and even very large Latin dictionaries, are put
together under very strict ground rules.  Some dictionary might be based 
exclusively on 'Classical' (200 BC - 200 AD) texts; it might have every 
word that appears in every surviving writing of Cicero, but nothing much 
before or since.  Such a dictionary will be inadequate for translating 
medieval theological or scientific texts.  In another example, one 
textbook might use Caesar as its main source of readings (my high school
texts did), while another might avoid Caesar and all military writings 
(either for pacifist reasons, or just because the author had taught Caesar
for 30 years and had grown bored with going over the same material, year 
after year).  One can imagine that the selection of words in such 
different texts would differ considerably; moreover, even with the same 
words, the meanings attached would be different.  This presents a problem 
in the development of a dictionary for general use.  

One could produce a separate dictionary for each era and application or a 
universal dictionary with tags to indicate the appropriate application and
meaning for each word.  With such a tag arrangement one would not be 
offered inappropriate or improbable interpretations.  The present system 
has such a mechanism, but it is not yet exploited.  

The Version 1.93 dictionary may be found to be of fairly general use for 
the student; it has the easy words that every text uses.  It also has a 
goodly number of adverbs, prepositions, and conjunctions, which are not as
sensitive to application as are the nouns and verbs.  The system also 
tests a number of prefixes and suffixes, if the raw word cannot be found.  
This allows an interpretation of many of the words otherwise unknown.  The
result of this analysis is fairly straightforward in most cases, is 
accurate but esoteric in some, and for about 1 in 10 constructed words it 
gives an answer that has no relation to the normal dictionary meaning.  

With this facility, and a 17000 word dictionary, trials on some tested 
classical texts have given hit rates of 97-99%, excluding proper names.  
(There are very few proper names in the dictionary.) (I am an old soldier 
and seem to have in the dictionary every possible word for attack or 
destroy.  The system is near perfect for Caesar.) The question arises, 
what hit rate can be expected for a general dictionary?  Classical Latin 
dictionaries have no references to the terminology of Christian theology.  
The legal documents and deeds of the Middle Ages are a challenge of jargon
and abbreviations.  These areas require special knowledge and vocabulary, 
but even there the ability to handle the non-specialized words is a large 
part of the effort.  

The system allows the inclusion of specialized vocabulary (for instance a 
SPEcial dictionary for specialized words not in most dictionaries), and 
the opportunity for the user to add additional words 'on the fly' to a 
DICT.LOC.  

The program is probably much larger than is necessary for the present 
application.  It is still in development but some effort has now been put 
into optimization.  

This is a Shareware program, which means it is proper to copy it and pass 
it on to your friends.  Consider it a developmental item for which there 
is at this time no charge.  However, it is Copyrighted (c), so please 
don't sell it as your own without at least telling me.  

This version is distributed without obligation, but the developer would 
appreciate comments and suggestions.  


William A Whitaker 
PO Box 3036 
McLean VA 22103-3036 
USA 
whitaker@erols.com


                         OPERATIONAL DESCRIPTION                          
 

This write up is rudimentary and assumes that the user is experienced with
computers.  

The WORDS program, Version 1.93, with its accompanying data files, should 
run on a PC in DOS/Windows 95, any monitor.  Simply download the 
self-extracting EXE file and execute it in your chosen subdirectory to 
UNZIP the files into a subdirectory of a hard disk.  Then call WORDS.  

There are a number of files associated with the program.  These must be in
the subdirectory of the program, and the program must be run from that 
subdirectory.  

    * WORDS.EXE is the executable program.  

    * INFLECT.SEC holds the encoded inflection records.  

    * STEMFILE.GEN contains the stems of the GENERAL dictionary.  

    * MEANFILE.GEN contains the meanings of the GENERAL dictionary 
    entries.  

    * INDXFILE.GEN contains a set of indexes into the DICTFILE.  

    * There may also be a set of files for a SPECIAL (.SPE) dictionary
    of the same structure as the GENERAL dictionary, but there is no 
    SPECIAL dictionary in the present distribution.  

    * A LOCAL dictionary may also be used.  This is a limited 
    dictionary of a different form, which is human readable and 
    writeable.  The knowledgeable user can augment and modify it 
    on-line.  It would consist of the file DICT.LOC.  

    * UNIQUES.  contains certain words which regular processing does 
    not get.  

    * ADDONS.  contains the set of prefixes, suffixes and enclitics 
    (-que, -ve) and the like.  

    * Other files may be generated by the program, so run it in a 
    configuration that allows the creation of files.  

All these files are necessary to run the program (except the optional 
dictionaries SPE and LOC).  This excess of files is a consequence of 
the present developmental nature of the program.  The files are very 
simple, almost human-readable.  Presumably, a later version could 
condense and encode them.  Nevertheless, beyond the original COPY, the
user need not worry about them.  

Additionally, there are files that the program may produce on request.
All of these share the name WORD, with various extensions, and they 
are all ASCII text files which can be viewed and processed with an 
ordinary editor.  The casual user probably does not want to get 
involved with these.  WORD.OUT will record the whole output, WORD.UNK 
will list only words the program is unable to interpret.  These 
outputs are turned on through the PARAMETERS mechanism.  

PARAMETERS may be set while running the program by inputting a line 
containing a '#' mark as the only (or first) character.  
Alternatively, WORD.MOD contains the MODES that can be set by 
CHANGE_PARAMETERS.  If this file does not exist, default modes will be
used.  The file may be produced or changed when changing parameters.  
It can also be modified, if the user is sufficiently confident, with 
an editor, or deleted, thereby reverting to defaults.  

(There is another set of developer's parameters which may be set in 
some versions with the input of '!'.  These are not normal user 
facilities and no one but the developer would be interested.  They are
just mentioned here in case they ever come up accidentally, and to 
point out that there are other capabilities, actual and possible, 
which may be invoked if there is a special need.) 

WORD.OUT is the file produced if the user requests, in 
CHANGE_PARAMETERS, output to a file.  This output can be used for 
later manipulation with a text editor, especially when the input was a
text file of some length.  If the parameter UNKNOWNS_ONLY is set, the 
output serves as a sort of a Latin spell checker.  Those words it 
cannot match may just not be in the dictionary, but alternatively they
may be typos.  A WORD.UNK file of unknowns can be generated.  


Program Operation 


To start the program, in the subdirectory that contains all the files,
type WORDS.  A setup procedure will execute, processing files.  Then 
the program will ask for a word to be keyed in.  Input the word and 
give a return (ENTER).  Information about the word will be displayed.  

One can input a whole line at a time, but only one line since the 
return at the end of line will start the processing.  If the results 
would fill more than a computer screen, the output is halted until 
the user responds to the 'MORE' message with a return.  A file 
containing a text, a series of lines, can be input by keying in the 
character '@', followed (with no spaces) by the DOS name of the file 
of text.  This input file need not be in the program subdirectory; 
just use the full DOS path and name of the file.  This is usually
accompanied by setting the parameter switches to create and
write to an output file, WORD.OUT.

One can have a comment in the file, a terminal portion of a line
that is not parsed.  This could be an English meaning, a source 
where the word was found, an indication that it may have been
miscopied, etc.  A comment begins with a double dash [--] and
continues to the end of the line.  The '--' and everything after 
on that line is ignored by the program.  
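
The comment rule can be sketched in a few lines of Python.  This is 
only an illustration of the behavior described above, not the 
program's own code (which is in Ada); the function names are 
hypothetical.  

```python
def strip_comment(line):
    """Return the part of an input line that precedes a '--' comment."""
    # Everything from the first '--' to the end of the line is dropped.
    text, _, _ = line.partition("--")
    return text.rstrip()


def words_in_line(line):
    """Split the non-comment part of a line into the words to analyze."""
    return strip_comment(line).split()


print(words_in_line("arma virumque cano -- Aeneid, first line"))
```

A line that consists only of a comment yields no words at all.  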

A '#' character input will permit the user to set modes to prevent the
process from trying prefixes and suffixes to get a match on an item 
unknown to the dictionary, put output to a file, etc.  Going into the 
CHANGE_PARAMETERS, the '?' character calls help for each entry.  

Two successive returns with no text will terminate the program 
(except in text being read from an @ disk file).  


Examples


Following are annotated examples of output.  Examination of these will 
give a good idea of the system.  The present version may not match 
these examples exactly - things are changing - but the principle is 
there.  A recent modification is the output of dictionary forms or 
'principal parts' (shown below for some examples).  

=>agricolarum
agricol.arum       N      1  1 GEN  P M P
agricola, agricolae
farmer

This is a simple first declension noun, and a unique interpretation.  
The '1 1' means it is first declension, with variant 1. This is an 
internal coding of the program, and may not correspond exactly with 
the grammatical numbering.  The 'N' means it is a noun.  It is the 
form for genitive (GEN), plural (1st 'P').  The stem is masculine (M) 
and represents a person (2nd 'P').  The stem is given as 'agricol' and
the ending is 'arum'.  The stem is normal in this case, but is a 
product of the program, and may not always correspond to conventional 
usage.  

=>feminae
femin.ae           N      1  1 GEN  S F P
femin.ae           N      1  1 DAT  S F P
femin.ae           N      1  1 NOM  P F P
femin.ae           N      1  1 VOC  P F P
femina, feminae
woman

This word has several possible interpretations in case and number 
(Singular and Plural).  The gender is Feminine.  Presumably, the user 
can examine the adjoining words and reduce the set of possibilities.  
Maybe the program will take care of this in some future version.  

=>cornu
corn.u             N      4  2 NOM  S N T
corn.u             N      4  2 DAT  S N T
corn.u             N      4  2 ACC  S N T
corn.u             N      4  2 ABL  S N T
cornu, cornus
horn (of an animal); horn, trumpet; wing of an attacking army

Here is an example of another declension and a second variant.  The 
Masculine (-us) nouns of the declension (fructus) are '4 1' and the 
Neuter (-u) nouns are coded as '4 2'.  This word is neuter (2nd N) and
represents a thing (T).  

=>ego
ego                PRON   5  1 NOM  S C PERS     
I, me; myself

A pronoun is much like a noun.  The gender is common (C), that is, it 
may be masculine or feminine.  It is a personal (PERS) pronoun.  

=>illud
ill.ud             PRON    6  1 NOM S N ADJECT                
ill.ud             PRON    6  1 ACC S N ADJECT                
that; those (pl.); also DEMONST

Here we have an adjectival (ADJECT) and demonstrative (DEMONST) 
pronoun.  

=>hic
hic                ADV    POS                                 
here, in this place                                                      
h.ic               PRON    3  1 NOM S M ADJECT                
this; these (pl.); also DEMONST

In this case there is an adjectival/demonstrative pronoun, or it may be
an adverb.  The POS means that the comparison of the adverb is 
positive.  

=>bonum
bon.um             N      2  2 NOM  S N T
bon.um             N      2  2 ACC  S N T
good thing, profit, advantage; goods (pl.), possessions                  
bon.um             ADJ    1  1 NOM  S N POS   
bon.um             ADJ    1  1 ACC  S M POS   
bon.um             ADJ    1  1 ACC  S N POS   
bon.um             ADJ    1  1 VOC  S N POS   
good, honest, brave, noble; better; best

Here we have an adjective, but it might also be a noun.  The 
interpretation of the adjective says that it is POSitive, but note 
that there are meanings for COMParative and SUPERlative also on the 
line.  Check the comparison value before deciding.  

=>facile
facile             ADV    POS   
easily, readily                                                          
facil.e            ADJ    3  2 NOM  S N POS   
facil.e            ADJ    3  2 ACC  S N POS   
facil.e            ADJ    3  2 VOC  S N POS   
easy, easy to do, without difficulty, ready, quick, good natured, courteous

Here is an adjective or an adverb.  Although they are related in 
meaning, they are different words.  

=>acerrimus
acerrim.us         ADJ    3  2 NOM  S M SUPER 
sharp, bitter, pointed, piercing, shrill; sagacious, keen; severe, vigoro

Here we have an adjective in the SUPERlative.  The meanings are all 
POSitive and the user must add the -est by himself.  

=>optime
optim.e            ADJ    1  1 VOC  S M SUPER 
good, honest, brave, noble; better; best                                 
optime             ADV    SUPER 
well, very, quite, rightly, agreeably, cheaply, in good, style; better; best

Here is an adjective or an adverb; both are SUPERlative.  

=>monuissemus
monu.issemus       V       2  1 PLUP ACTIVE  SUB  1 P X       
remind, advise, warn; teach; admonish; foretell

Here is a verb for which the form is PLUPerfect, ACTIVE, SUBjunctive, 
1st person, Plural.  It is 2nd conjugation, variant 1. 

=>amat
am.at              V       1  1 PRES ACTIVE  IND  3 S X       
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to

Another regular verb, PRESent, ACTIVE, INDicative.  

=>amatus
amat.us            VPAR    1  1 NOM S M PERF PASSIVE PPL X    
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to

Here we have the PERFect, PASSIVE ParticiPLe, in the NOMinative, 
Singular, Masculine.  

=>amatu
amat.u             SUPINE  1  1 ABL S X                       
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to

Here is the SUPINE of the verb in the ABLative Singular.  

=>orietur
ori.etur           V       3  4 FUT  PASSIVE IND  3 S DEP     
rise, arise; spring from, appear; be descended; begin, proceed, originate

For DEPondent verbs the passive form is to be translated as if it 
were active voice.  

=>ab
ab                 PREP   ABL 
by, from, away from

Here is a PREPosition that takes an ABLative object.  

=>sine
sin.e              V       3  1 PRES ACTIVE  IMP  2 S X       
allow, permit                                                            
sine               PREP   ABL 
without

Here is a PREPosition that might also be a Verb.  

=>contra
contra             PREP   ACC 
against, opposite; facing; contrary to, in reply to                      
contra             ADV    POS   
in opposition, in turn; opposite, on the contrary

Here is a PREPosition that might also be an ADVerb.  This is a very 
common situation, with the meanings being much the same.  

=>et
et                 CONJ   
and, and even; also, even;  (et ... et = both ... and)

Here is a straight CONJunction.  

=>vae
vae                INTERJ 
alas, woe, ah; oh dear;  (Vae, puto deus fio.)

Here is a straight INTERJection.  

=>septem
septem             NUM     2  0 X   X X CARD       7          
seven

An additional provision is the attempt to recognize and display the 
value of Roman numerals, even combinations of appropriate letters that
do not parse conventionally to a value but may be ill-formed Roman 
numerals.  

=>VII
vii                NUM     2  0 X   X X CARD       7          
   7  as a ROMAN NUMERAL
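
A lenient Roman numeral reader might look like the following Python 
sketch.  How WORDS actually treats ill-formed numerals is not 
documented here, so the rule used (a letter counts negative when a 
larger letter follows it, with no further validity checks) is an 
assumption, and the names are hypothetical.  

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50,
                "c": 100, "d": 500, "m": 1000}


def roman_value(word):
    """Return a numeric value for a string of Roman numeral letters.

    Returns None if any character is not a numeral letter.  No check
    is made that the numeral is well formed, so 'iiii' is accepted.
    """
    letters = word.lower()
    if not letters or any(ch not in ROMAN_VALUES for ch in letters):
        return None
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        # A smaller letter before a larger one is subtracted (iv = 4).
        total += -value if value < following else value
    return total


print(roman_value("VII"))   # 7
print(roman_value("dic"))   # 599, although 'dic' is also a Latin word
```

The second call shows why a word made up entirely of numeral letters 
may be reported both as a word and as a number.  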

Generally, the meaning is given for the base word, as is usual for 
dictionaries.  For the verb, it will be a present meaning, even when 
the tense given is perfect.  For an adjective, the positive meaning is
given, even if a comparative or superlative form is shown.  This is 
also so when a word is constructed with a suffix, thus an adverb 
constructed from its adjective will show the base adjective meaning 
and an indication of how to make the adverb in English.  
 

Signs and Abbreviations in Meaning


, [comma] is used to separate meanings that are similar.  The 
philosophy has been to list a number of synonyms just to key the 
reader in making his translation.  There is no rigor in this.  

; [semicolon] is used to separate sets of meanings that differ in 
intent.  This is just a general tendency and is not rigorously 
enforced.  

/ [solidus] means 'or' or gives an alternative word.  It sometimes 
replaces the comma and is often used to compress the meaning into a 
short line.  

?  [question mark] in a meaning implies a doubt about the 
interpretation, or even about the existence of the word at all.  For 
the purposes of this program, it does not matter much.  If the word 
does not exist, no one will ask for it.  If it appears in some text, 
the reader is warned that the interpretation is questionable, but it 
is the best that is available.  

~ [tilde] stands for the stem or word in question.  It is just a space
saving shorthand or abbreviation.  

=> in meaning this indicates a translation.  

abb.  abbreviation 

(pl.) means that the Latin word is believed by scholars to be used 
always in the plural.  If it appears in the beginning of the meaning, 
before the first comma, it applies to all the meanings.  If it appears
later, it applies only to that and later meanings.  For the purpose of
this program, this is only advisory.  While it is used by some tools 
to find the expected dictionary entry, the program does not exclude a 
singular form in the output.  While it may be true that in good, 
classical Latin it is never used in the singular, this does not mean 
that some text somewhere might not use the singular, nor that it is 
uncommon in later Latin.  

(usu.) usually is weakly advisory (usu.  pl.  is even weaker than pl.  
and may imply that the pl.  tendency occurred only during certain 
periods).  

(esp.) especially indicates a significant association, but also 
advisory.  

L&S [Lewis and Short] is used to indicate that the meaning starting 
from the previous semicolon is information from Lewis and Short 'A 
Latin Dictionary' that differs from, or significantly expands on, the 
meaning in the 'Oxford Latin Dictionary' (OLD) which is the baseline 
for this program.  This is not to imply that the meaning listed is 
taken directly from the OLD, just that it is not inconsistent with 
OLD, but the L&S information is.  The program is not taking a position
on which is better, just warning the reader that there is some 
difference.  Often this difference is a meaning that is appropriate 
for late Latin, which is within the scope of L&S but not of OLD.  
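
For anyone post-processing the output, the comma and semicolon 
conventions suggest a simple way to break a meaning line into groups 
of near-synonyms.  The Python sketch below is an assumption about how 
one might do this; since the conventions are not rigorously enforced 
in the dictionary, the result is only a rough guide.  

```python
def split_meaning(meaning):
    """Split a meaning line into groups (';') of similar senses (',')."""
    groups = []
    for group in meaning.split(";"):
        senses = [s.strip() for s in group.split(",") if s.strip()]
        if senses:
            groups.append(senses)
    return groups


line = "horn (of an animal); horn, trumpet; wing of an attacking army"
for senses in split_meaning(line):
    print(senses)
```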


                      BRIEF PROGRAM DESCRIPTION                       
 

An effect of the program is to derive the structure and meaning of 
individual Latin words.  A procedure was devised to: 

    * examine the ending of a word, 

    * compare it with the standard endings, 

    * derive the possible stems that could be consistent, 

    * compare those stems with a dictionary of stems, 

    * eliminate those for which the ending is inconsistent with the 
    dictionary stem (e.g., a verb ending with a noun dictionary item),
    

    * if unsuccessful, it tries with a large set of prefixes and 
    suffixes, and various tackons (e.g., -que), 

    * finally it tries various 'tricks' (e.g., 'ae' may be replaced by
    'e', 'inp' by 'imp', syncope, etc.), 

    * and it reports any resulting matches as possible 
    interpretations.  
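
The core of this procedure (strip each ending that fits, then keep 
only the dictionary stems whose part of speech and declension or 
conjugation agree with the ending) can be sketched in Python.  The 
two small tables are hypothetical stand-ins for the inflection and 
stem files, and the tuple layout is an assumption made for the 
illustration.  

```python
# (ending, part, which, variant, form) - a tiny stand-in inflection table
ENDINGS = [
    ("as",   "N", 1, 1, "ACC P"),
    ("as",   "V", 1, 1, "PRES ACTIVE IND 2 S"),
    ("um",   "N", 4, 1, "ACC S"),
    ("o",    "V", 1, 1, "PRES ACTIVE IND 1 S"),
    ("arum", "N", 1, 1, "GEN P"),
]

# (stem, part, which, variant, meaning) - a tiny stand-in stem dictionary
STEMS = [
    ("port",    "N", 1, 1, "gate, entrance"),
    ("port",    "V", 1, 1, "carry, bring"),
    ("port",    "N", 4, 1, "port, harbor"),
    ("am",      "V", 1, 1, "love, like"),
    ("agricol", "N", 1, 1, "farmer"),
]


def analyze(word):
    """Return every (stem.ending, part, which, var, form, meaning) match."""
    word = word.lower()
    results = []
    for ending, part, which, var, form in ENDINGS:
        if not word.endswith(ending):
            continue
        stem = word[: len(word) - len(ending)]
        for d_stem, d_part, d_which, d_var, meaning in STEMS:
            # The ending and the dictionary stem must agree in kind.
            if (d_stem, d_part, d_which, d_var) == (stem, part, which, var):
                results.append((stem + "." + ending, part, which, var,
                                form, meaning))
    return results


for hit in analyze("portas"):
    print(hit)
```

Prefixes, suffixes, tackons and 'tricks' are left out of this sketch; 
they would be tried only when this first pass finds nothing.  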

With the input of a word, or several words in a line, the program 
returns information about the possible accidence, if it can find an 
agreeable stem in its dictionary.  

=>amo
am.o               V       1  1 PRES ACTIVE  IND  1 S X       
love, like; fall in love with; be fond of; have a tendency to

To support this method, an INFLECT.SEC data file was constructed 
containing possible Latin endings encoded by a structure that 
identifies the part of speech, declension, conjugation, gender, 
person, number, etc.  This is a pure computer encoding for a 'brute 
force' search.  No sophisticated knowledge of Latin is used at this 
point.  Rules of thumb (e.g., the fact, always noted early in any 
Latin course, that a neuter noun has the same ending in the nominative
and accusative, with a final -a in the plural) are not used in the 
search.  However, it is convenient to combine several identical 
endings with a general encoding (e.g., the endings of the perfect 
tenses are the same for all verbs, and are so encoded, not replicated 
for every conjugation and variant).  

Many of the distinguishing differences identifying conjugations come 
from the voiced length of stem vowels (e.g., between the present, 
imperfect and future tenses of a third conjugation I-stem verb and a 
fourth conjugation verb).  These aural differences, the features that 
make Latin 'sound right' to one who speaks it, are lost entirely in 
the analysis of written endings.  

The endings for the verb conjugations are the result of trying to 
minimize the number of individual endings records, while yet keeping 
the structure of the inflections data file fairly readable.  There is 
no claim that the resulting arrangement is consonant with any 
grammarian's view of Latin, nor should it be examined from that 
viewpoint.  While it started from the conjugations in text books, it 
can only be viewed as some fuzzy intermediate step along a path to a 
mathematically minimal number of encoded verb endings.  Later versions
of the program might improve the system.  

There are some egregious liberties taken in the encoding.  With the 
inclusion of two present stems, the third conjugation I-stem verbs may
share the endings of the regular third conjugation.  The fourth 
conjugation has disappeared altogether, and is represented as a 
somewhat modified variant of the third conjugation (3, 4)!  There is 
an artificial fifth conjugation for esse and others, and a sixth for 
eo.  

As an example, a verb ending record has the structure: 

    * PART the part code for a verb = V 

    * CONjugation consisting of two parts: 

        * WHICH a conjugation identifier - range 0..9 

        * VAR a variant identifier, on WHICH - range 0..9 

    * TENSE an enumeration type - range PRES..FUTP X X 

    * VOICE an enumeration type - range ACTIVE..PASSIVE X X 

    * MOOD an enumeration type - range IND..PPL X X 

    * PERSON person, first to third - range 1..3 0 0 

    * NUMBER an enumeration type - range S..P X X 

    * KIND enumeration type of verb - range TO_BE..PERFDEF X X 

    * KEY which stem to be used - range 1..4 

    * SIZE number of characters - range 0..9 

    * ENDING the ending as a string of SIZE characters 

Thus, the entry for the ending appropriate to 'amo' is: 

V 1 1 PRES IND ACTIVE 1 S X 1 o

KIND is not often used with the verb endings, but is part of the 
record for convenience elsewhere.  For verbs, the KIND has not yet 
been exploited significantly, except for DEP and IMPERS.  

The rest of the elements are straightforward and generally use the 
abbreviations that are common in any Latin text.  An X or 0 represents
the 'don't know' or 'don't care' for enumeration or numeric types.  
Details are documented below in the CODES section.  

A verb dictionary record has the structure: 

    * STEMS for a verb there are 4 stems 

    * PART the part code for a verb = V 

    * WHICH a conjugation identifier - range 0..6 (actually 1..6 & 0) 

    * VAR a variant identifier - range 0..9 (actually 1..5 & 0) 

    * KIND enumeration type of verb - range TO_BE..PERFDEF & X 

    * MEANING text for English translations (up to 80 characters) 

Thus, an entry corresponding to 'amo amare amavi amatus' is: 

am am amav amat 
V 1 1 X            X X X X X 
like, love

(The dangling X X X X X are used to encode information about the time 
in which this word is found and the subject area.  There is not yet 
enough detail in the dictionary to allow much exploitation of this 
information.) 
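
To make the record layout concrete, here is a Python sketch that 
reads the three-line entry shown above into named fields.  It assumes 
the entry is stored exactly as printed (a line of stems, a line of 
codes, a line of meaning); the actual file layout may differ, and the 
class and function names are hypothetical.  

```python
from dataclasses import dataclass


@dataclass
class VerbEntry:
    stems: list      # the 4 verb stems
    part: str        # part code, 'V' for a verb
    which: int       # conjugation identifier
    var: int         # variant identifier
    kind: str        # kind of verb, 'X' when not specified
    flags: list      # the trailing codes for period and subject area
    meaning: str     # English meaning text


def parse_verb_entry(text):
    """Parse a three-line verb entry: stems, codes, meaning."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stems_line, code_line, meaning_line = lines[0], lines[1], lines[2]
    codes = code_line.split()
    return VerbEntry(
        stems=stems_line.split(),
        part=codes[0],
        which=int(codes[1]),
        var=int(codes[2]),
        kind=codes[3],
        flags=codes[4:],
        meaning=meaning_line.strip(),
    )


entry = parse_verb_entry("""am am amav amat
V 1 1 X            X X X X X
like, love""")
print(entry)
```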

Endings may not uniquely determine which stem, and therefore the right
meaning.  'portas' could be the accusative plural of 'gate', or the 
second person, singular, present indicative active of 'carry'.  In 
both cases the stem is 'port'.  All possibilities are reported.  

portas 
port.as V 1 1 PRES IND ACTIVE 2 S X 
carry, bring 

port.as N 1 1 ACC P F T 
gate, entrance; city gates; door; avenue;

And note that the same stem (port) has other uses, for 'portus', 
'harbor'.  

portum 
port.um N 4 1 ACC S M T 
port, harbor; refuge, haven, place of refuge

PLEASE NOTE: It is certainly possible for the program to find a valid 
Latin construction that fits the input word and to have that 
interpretation be entirely wrong in the context.  It is even possible 
to interpret a number, in Roman numerals, as a word!  (But the number 
would be reported also.) 

For the case of defective verbs, the process does not necessarily have
to be precise.  Since the purpose is only to translate from Latin, 
even if there are unused forms included in the algorithm, these will 
not come up in any real Latin text.  The endings for the verb 
conjugations are the result of trying to minimize the number of 
individual endings records, while keeping the structure of the base 
INFLECTIONS data file fairly readable.  

In general the program will try to construct a match with the 
inflections and the dictionaries.  There are a number of specific 
checks to reject certain mathematically correct combinations that do 
not appear in the language, but these checks are relatively few.  The 
philosophy has been to allow a generous interpretation.  A remark in a
text or dictionary that a particular form does not exist must be 
tempered with the realization that the author probably means that it 
has not been observed in the surviving classical literature.  This 
body of reference is minuscule compared to the total use of Latin, 
even limited to the classical period.  Who is to say that further 
texts would not turn up such an example, even if it might not have 
been approved of by Cicero?  It is also possible that such reasonable, 
if 'improper', constructs might occur in later writings by less 
educated, or just different, authors.  Certainly English shows this 
sort of variation over time.  

If the exact stem is not found in the dictionary, there are rules for 
the construction of words which any student would try.  The simplest 
situation is a known stem to which a prefix or suffix has been 
attached.  The method used by the program (if DO_FIXES is on) is to 
try any fixes that fit, to see if their removal results in an 
identifiable remainder.  Then the meaning is mechanically constructed 
from the meaning of the fix and the stem.  The user may need to 
interpret with a more conventional English usage.  This technique 
improves the performance significantly.  However, in about 40% of the 
instances in which there is a hit, the derivation is correct but the 
interpretation takes some imagination.  In something less than 10% of 
the cases, the inferred fix is just wrong, so the user must take some 
care to see if the interpretation makes any sense.  
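
The prefix part of this method can be sketched in Python as follows.  
The prefix table and word list are hypothetical, and for brevity the 
sketch works on whole dictionary headwords rather than on stems and 
endings as the program does.  

```python
# Hypothetical prefix table: prefix -> rough English sense
PREFIXES = {
    "re":  "again, back",
    "con": "together, completely",
    "in":  "in, on; not",
}

# Hypothetical stand-in for the dictionary
KNOWN = {
    "porto": "carry, bring",
    "venio": "come",
}


def try_prefixes(word):
    """Strip each prefix that fits and look up the remainder.

    Returns (prefix, remainder, constructed meaning) for every hit.
    The meaning is built mechanically and may need interpretation.
    """
    hits = []
    for prefix, sense in PREFIXES.items():
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        if rest in KNOWN:
            hits.append((prefix, rest, sense + " + " + KNOWN[rest]))
    return hits


print(try_prefixes("reporto"))
print(try_prefixes("invenio"))
```

The second call illustrates the caution given above: the mechanical 
result for 'invenio' is 'in, on; not + come', while the usual meaning 
is 'come upon, find'.  The derivation is right, but the reader has to 
supply the idiom.  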

This method is complicated by the tendency for prefixes to be modified
upon attachment (ab+fero => aufero, sub+fero => suffero).  The 
program's 'tricks' take many such instances into account.  Ideally, 
one should look inside the stem for identifiable fragments.  One would
like to start with the smallest possible stem, and that is most 
frequently the correct one.  While it is mathematically possible that 
the stem of 'actorum' is 'actor' with the common inflection 'um', no 
intuitive first semester Latin student would fail to opt for the 
genitive plural 'orum', and probably be right.  To first order, the 
procedure ignores such hints and reports this word in both forms, as 
well as a verb participle.  However, it can use certain generally 
applicable rules, like the superlative characteristic 'issim', to 
further guess.  
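
The 'actorum' ambiguity can be reproduced with a toy ending list.  The 
Python sketch below (not the program's Ada) simply reports every split 
whose tail is a known ending, shortest stem first; ENDINGS and 
candidate_splits are hypothetical names, and the ending set is a tiny 
illustrative subset, not the program's inflection table.  

```python
# A few endings for illustration; the real inflection table is far larger.
ENDINGS = {"um", "orum", "is", "i"}

def candidate_splits(word, endings):
    """List every (stem, ending) split whose ending is known.
    Scanning left to right yields the shortest stem first."""
    return [(word[:i], word[i:])
            for i in range(1, len(word))
            if word[i:] in endings]

print(candidate_splits("actorum", ENDINGS))
```

Both ('act', 'orum') and ('actor', 'um') come back; nothing in this 
mechanical step prefers one over the other.  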

In addition, there is the capability to examine the word for such 
common techniques as syncope, the omission of the 've' or 'vi' in 
certain verb perfect forms (audivissem => audissem).  These techniques
('tricks') are primitive in the present version, and might be replaced
by more powerful procedures in later versions.  
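
As a rough illustration of the syncope trick, the Python sketch below 
(not the program's Ada) restores a dropped 'vi' or 've' at each 
position and keeps the candidates that are recognized.  The names 
KNOWN_FORMS and restore_syncope are hypothetical, and a plain set of 
full forms stands in for the real stem and inflection lookup.  

```python
# Stand-in for the real recognizer: a set of fully inflected forms.
KNOWN_FORMS = {"audivissem", "amaverunt"}

def restore_syncope(word, known):
    """Try re-inserting 'vi' or 've' at every interior position and
    return the candidates found in `known`, without duplicates."""
    found = []
    for i in range(1, len(word)):
        for infix in ("vi", "ve"):
            candidate = word[:i] + infix + word[i:]
            if candidate in known and candidate not in found:
                found.append(candidate)
    return found

print(restore_syncope("audissem", KNOWN_FORMS))
```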

If the dictionary cannot identify a matching stem, it may be possible
to derive a stem from 'nearby' stems (an adverb from an adjective is 
the most common example) and infer a meaning.  If all else fails, a 
portion of the possible dictionary stem can be listed, from which the 
user can draw in making a guess.  

The program is written in Ada, and is machine independent.  Source is 
available for compiling onto other machines.  


Special Cases


Some adjectives have no conventional positive forms (either missing or
undeclined), or the POS forms have more than one COMP/SUPER.  In these
few cases, the individual COMP or SUPER form is entered separately.  
Since it is not directly connected with a POS form, and only the POS 
forms have different numbered declensions, the special form is given 
a declension of (0, 0).  An additional consequence is that the 
dictionary form in output is only for the COMP/SUPER, and does not
reflect all comparisons.  


Uniques


There are some irregular situations which are not convenient to handle
through the general algorithms.  For these a UNIQUES file and 
procedure was established.  The number of these special cases is less 
than one hundred, but may increase as new situations arise, and 
decrease as algorithms provide better coverage.  The user will not see
much difference, except in that no dictionary forms are available for 
these unique words.  


Tricks


There are a number of situations in Latin writing where certain 
modifications or conventions regularly are found.  While often found, 
these are not the normal classical forms.  If a conventional match is 
not found, the program may be instructed to TRY_TRICKS.  Below is a 
partial list of current tricks.  

    * The syncopated form of the perfect often drops the 'v' and loses
    the vowel.  

    * An initial 'a' followed by a double letter often is used for an 
    'ad' prefix, likewise an initial 'ad' prefix is often replaced by 
    an 'a' followed by a double letter.  

    * An initial 'i' followed by a double letter often is used for an 
    'in' prefix, likewise an initial 'in' prefix is often replaced by 
    an 'i' followed by a double letter.  

    * A leading 'inp' could be an 'imp'.  

    * A leading 'obt' could be an 'opt'.  

    * An initial 'har...' or 'hal...' may be rendered by an 'ar' or 
    'al', likewise the dictionary entry may have 'ar'/'al' and the 
    trial word begin with 'ha...'.  

    * An initial 'c' could be a 'k', or the dictionary entry uses 'c' 
    for 'k'.  

    * A nonterminal 'ae' is often rendered by an 'e'.  

    * An initial 'E' can replace an 'Ae'.  

    * An 'iis...' beginning some forms of 'eo' may be contracted to 
    'is...'.  

    * A nonterminal 'ii' is often replaced by just 'i'; including 
    'ji', since in this program and dictionary all 'j' are made 'i'.  

    * A 'cl' could be a 'cul'.  

    * A 'vul' could be a 'vol'.  
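
A few of the letter-replacement tricks listed above can be written as 
a small rule table.  The Python sketch below (not the program's Ada) 
applies one rule at a time to the input word and returns the 
respellings to be tried against the dictionary.  LEADING, ANYWHERE and 
single_trick_candidates are hypothetical names, only four rules are 
shown, and the direction of each rule (input spelling on the left) is 
an assumption drawn from the wording of the list.  

```python
# (input spelling, dictionary spelling); a small subset of the list.
LEADING = [("inp", "imp"), ("obt", "opt")]   # only at the start of a word
ANYWHERE = [("cl", "cul"), ("vul", "vol")]   # at any position

def single_trick_candidates(word):
    """Return every respelling obtained by applying exactly one rule
    at exactly one position."""
    out = []
    for old, new in LEADING:
        if word.startswith(old):
            out.append(new + word[len(old):])
    for old, new in ANYWHERE:
        start = word.find(old)
        while start != -1:
            out.append(word[:start] + new + word[start + len(old):])
            start = word.find(old, start + 1)
    return out

print(single_trick_candidates("periclum"))
```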

Various manipulations of 'u' and 'v' are possible: 'v' could be 
replaced by 'u', as in the new Oxford Latin Dictionary; a leading 'U' 
could be replaced by 'V', checking capitalization; or all 'U's could 
have been replaced by 'V', as in stone cutting.  Previous versions had 
various kludges attempting to calculate the correct interpretation.  
They were surprisingly good, but philosophically baseless, and 
certainly failed in a number of cases.  The present version simply 
considers 'u' and 'v' as the same letter in parsing the word.  However, the 
dictionary entries make the distinction and this is reflected in the 
output.  
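
Treating 'u' and 'v' as one letter for matching, while still reporting 
the dictionary's own spelling, might look like the Python sketch below 
(not the program's Ada).  The names uv_key and lookup are hypothetical, 
and folding to lower case is an extra simplifying assumption.  

```python
def uv_key(word):
    """Comparison key in which 'u' and 'v' are the same letter."""
    return word.lower().replace("v", "u")

def lookup(word, entries):
    """Return the dictionary spelling that matches `word` when u and v
    are not distinguished, or None if nothing matches."""
    index = {uv_key(entry): entry for entry in entries}
    return index.get(uv_key(word))

print(lookup("uinum", ["vinum", "servus"]))
```

The input 'uinum' is matched, and the answer comes back as 'vinum', the 
form in which the entry is stored.  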

Various combinations of these tricks are attempted, and each try that 
results in a possible hit is run against the full dictionary, which 
can make these efforts time consuming.  That is a good reason to make 
the dictionary as large as possible, rather than counting on a smaller
number of roots and doing the maximum word formation.  

Finally, while the program can succeed on a word that requires two or 
three of these tricks to work in combination, there are limits.  Some 
words for which all the modifications are supported will fail, if 
there are just too many.  In fact, it is probably better that that be 
the case, otherwise one will generate too many false positives.  
Testing so far does not seem to show excessive zeal on the part of the
program, but the user should examine the results, especially when 
several tricks are involved.  

At the state of the 1.93 dictionary there are so few words that both 
fail the main program and are caught by tricks that this option is 
defaulted to No.  


Codes in Inflection Line


For completeness, the enumeration codes used in the output are listed 
here as Ada statements.  Simple numbers are used for person, 
declension, conjugations, and their variants.  Not all the facilities 
implied by these values are developed or used in the program or the 
dictionary.  This list is only for Version 1.93.  Later versions may 
be somewhat different.  This may make their dictionaries incompatible 
with the present program.  


  type PART_OF_SPEECH_TYPE is (
          X,         --  all, none, or unknown
          N,         --  Noun
          PRON,      --  PRONoun
          PACK,      --  PACKON -- artificial for code
          ADJ,       --  ADJective
          NUM,       --  NUMeral
          ADV,       --  ADVerb
          V,         --  Verb
          VPAR,      --  Verb PARticiple
          SUPINE,    --  SUPINE
          PREP,      --  PREPosition
          CONJ,      --  CONJunction
          INTERJ,    --  INTERJection
          TACKON,    --  TACKON -- artificial for code
          PREFIX,    --  PREFIX --  here artificial for code
          SUFFIX     --  SUFFIX --  here artificial for code
                                );                                   

  type GENDER_TYPE is (
          X,         --  all, none, or unknown
          M,         --  Masculine
          F,         --  Feminine
          N,         --  Neuter
          C          --  Common (masculine and/or feminine)
                       );

  type CASE_TYPE is (
          X,         --  all, none, or unknown
          NOM,       --  NOMinative
          VOC,       --  VOCative
          GEN,       --  GENitive
          LOC,       --  LOCative
          DAT,       --  DATive
          ABL,       --  ABLative
          ACC        --  ACCusative
                     );
  
  type NUMBER_TYPE is (
          X,         --  all, none, or unknown
          S,         --  Singular
          P          --  Plural
                       );

  type COMPARISON_TYPE is (
          X,         --  all, none, or unknown
          POS,       --  POSitive
          COMP,      --  COMParative
          SUPER      --  SUPERlative
                           );   

  type TENSE_TYPE is (
          X,         --  all, none, or unknown
          PRES,      --  PRESent
          IMPF,      --  IMPerFect
          FUT,       --  FUTure
          PERF,      --  PERFect
          PLUP,      --  PLUPerfect
          FUTP       --  FUTure Perfect
                      );                        
  
  type VOICE_TYPE is (
          X,         --  all, none, or unknown
          ACTIVE,    --  ACTIVE
          PASSIVE    --  PASSIVE
                      );       
  
  type MOOD_TYPE is (
          X,         --  all, none, or unknown
          IND,       --  INDicative
          SUB,       --  SUBjunctive
          IMP,       --  IMPerative
          INF,       --  INFinitive
          PPL        --  ParticiPLe
                     );                    

  type NOUN_KIND_TYPE is (
          X,            --  unknown, nondescript
          S,            --  Singular 'only'
          M,            --  plural or Multiple 'only'
          A,            --  Abstract idea
          N,            --  proper Name
          L,            --  Locale, name of country/city
          P,            --  a Person
          T,            --  a Thing
          W             --  a place Where
                           ); 

  type PRONOUN_KIND_TYPE is (
          X,            --  unknown, nondescript
          PERS,         --  PERSonal
          REL,          --  RELative
          REFLEX,       --  REFLEXive
          DEMONS,       --  DEMONStrative
          INTERR,       --  INTERRogative
          INDEF,        --  INDEFinite
          ADJECT        --  ADJECTival
                             ); 

  type VERB_KIND_TYPE is (
          X,         --  all, none, or unknown
          TO_BE,     --  only the verb TO BE (esse)
          TO_BEING,  --  compounds of the verb to be (esse)
          GEN,       --  verb taking the GENitive
          DAT,       --  verb taking the DATive  
          ABL,       --  verb taking the ABLative
          TRANS,     --  TRANSitive verb
          INTRANS,   --  INTRANSitive verb
          IMPERS,    --  IMPERSonal verb (implied subject 'it', 'they', 'God')
                     --  agent implied in action, subject in predicate
          DEP,       --  DEPonent verb
                     --  only passive form but with active meaning 
          SEMIDEP,   --  SEMIDEPonent verb (forms perfect as deponent) 
                     --  (perfect passive has active force)
          PERFDEF    --  PERFect DEFinite verb  
                     --  having only perfect stem, but with present force
                          );             

 type NUMERAL_KIND_TYPE is (
         X,          --  all, none, or unknown
         CARD,       --  CARDinal
         ORD,        --  ORDinal
         DIST,       --  DISTributive
         ADVERB      --  numeral ADVERB
                            );


Help for Parameters


One can CHANGE_PARAMETERS by inputting a '#' [number sign] character 
(ANSI 35) as the input word, followed by a return.  (Note that this 
has changed from previous versions in which '?' was used.) Each 
parameter is listed and the user is offered the opportunity to change 
it from the current value by answering Y or N (any case).  For each 
parameter there is some explanation or help.  This is displayed by 
inputting a '?' [question mark], followed by a return.  

The various help displays are listed here: 


HAVE_OUTPUT_FILE  
   This option instructs the program to create a file which can hold the 
   output for later study, otherwise the results are just displayed on   
   the screen.  The output file is named WORD.OUT.                       
   This means that one run will necessarily overwrite a previous run,    
   unless the previous results are renamed or copied to a file of another
   name.  Using this output file slows the program, especially if it is  
   being executed from a floppy; just having it will not matter much.    
   The default is N(o), since this prevents the program from overwriting 
   previous work unintentionally.  Y(es) creates the output file.        

WRITE_OUTPUT_TO_FILE  
   This option instructs the program, when HAVE_OUTPUT_FILE is on, to    
   write results to the file WORD.OUT.                                   
   This option may be turned on and off during running of the program,   
   thereby capturing only certain desired results.                       
   If the option HAVE_OUTPUT_FILE is off, the user will not be given a   
   chance to turn this one on.                 Default is N(o).          

DO_UNKNOWNS_ONLY  
   This option instructs the program to only output those words that it  
   cannot resolve.  Of course, it has to do processing on all words, but 
   those that are found (with prefix/suffix, if that option is on) will  
   be ignored.  The purpose of this option is to allow a quick look to   
   determine if the dictionary and process is going to do an acceptable  
   job on the current text.  It also allows the user to assemble a list  
   of unknown words to look up manually, and perhaps augment the system  
   dictionary.  For those purposes, the system is usually run with the   
   MINIMIZE_OUTPUT option, just producing a list.  Another use is to run 
   without MINIMIZE to an output file.  This gives a list of the input   
   text with the unknown words, by line.  This functions as a spelling     
   checker for Latin.  The default is N(o).                              

WRITE_UNKNOWNS_TO_FILE  
   This option instructs the program to write all unresolved words to an 
   UNKNOWNS file named WORD.UNK.                                         
   With this option on, the file of unknowns is written, even though     
   the main output contains both known and unknown (unresolved) words.   
   One may wish to save the unknowns for later analysis, testing, or to  
   form the basis for dictionary additions.  When this option is turned  
   on, the UNKNOWNS file is written, destroying any file from a previous 
   run.  However, the write may be turned on and off during a single run 
   without destroying the information written in that run.               
   This option is for specialized use, so its default is N(o).           


IGNORE_UNKNOWN_NAMES  
   This option instructs the program to assume that any capitalized word 
   longer than three letters is a proper name.  As no dictionary can be  
   expected to account for many proper names, many such occur that would 
   be called UNKNOWN.  This contaminates the output in most cases, and   
   it is often convenient to ignore these spurious UNKNOWN hits.  This   
   option implements that mode, and calls such words proper names.  Of    
   course, any proper names that are in the dictionary are handled in the 
   normal way.                            The default is Y(es).          

DO_COMPOUNDS  
   This option instructs the program to look ahead for the verb TO_BE (or
   iri) when it finds a verb participle, with the expectation of finding 
   a compound perfect tense or periphrastic.  The default choice is Y(es).
   This processing is turned off with the choice of N(o).                

DO_FIXES  
   This option instructs the program, when it is unable to find a proper 
   match in the dictionary, to attach various prefixes and suffixes and  
   try again.  This effort is successful in about a quarter of the cases 
   which would otherwise give UNKNOWN results, or so it seems in limited 
   tests.  For those cases in which a result is produced, about half give
   easily interpreted output; many of the rest are etymologically true,  
   but not necessarily obvious; about a tenth give entirely spurious     
   derivations.  The user must proceed with caution.                     
   The default choice is Y(es), since the results are generally useful.  
   This processing can be turned off with the choice of N(o).            

DO_TRICKS  
   This option instructs the program, when it is unable to find a proper 
   match in the dictionary, and after various prefixes and suffixes, to  
   try every dirty Latin trick it can think of, mainly common letter     
   replacements like cl -> cul, vul -> vol, ads -> ass, inp -> imp, etc. 
   Together these tricks are useful, but may give false positives (>10%).
   They provide for recognized variants in classical spelling.  Most of  
   the texts with which this program will be used have been well edited    
   and standardized in spelling.  Now, moreover,  the dictionary is being  
   populated to such a state that the hit rate on tricks has fallen to a 
   low level.  It is very seldom productive, and it is always expensive. 
   It may be turned on for trying individual words, but default is N(o). 

DO_DICTIONARY_FORMS  
   This option instructs the program to output a line with the forms     
   normally associated with a dictionary entry (NOM and GEN of a noun,   
   the four principal parts of a verb, M-F-N NOM of an adjective, ...).  
   This occurs when there is other output (i.e., not with UNKNOWNS_ONLY).
   The default choice is N(o), but it can be turned on with a Y(es).     

DO_EXAMPLES  
   This option instructs the program to provide examples of usage of the 
   cases/tenses/etc. that were constructed.  The default choice is N(o). 
   This produces lengthy output and is turned on with the choice Y(es).  

SHOW_AGE  
   This option causes a flag, like 'Late>' to be put before the meaning  
   in the output.  The AGE is an indication when this word/meaning came  
   into use, at least from indications in dictionary citations.  It is   
   just an indication, not controlling, useful when there are choices.   
   The default choice is N(o), but it can be turned on with a Y(es).     

SHOW_FREQUENCY  
   This option causes a flag, like 'rare>' to be put before the meaning  
   in the output.  The FREQ is an indication of the relative usage of the
   word, at least from indications in dictionary citations.  It is       
   just an indication, not controlling, useful when there are choices.   
   The default choice is N(o), but it can be turned on with a Y(es).     

DO_ONLY_MEANINGS  
   This option instructs the program to only output the MEANING for a    
   word, and omit the inflection details.  This is primarily used in     
   analyzing new dictionary material, comparing with the existing.       
   However it may be of use for the translator who knows most all of     
   the words and just needs a little reminder for a few.                 
   The default choice is N(o), but it can be turned on with a Y(es).     

DO_STEMS_FOR_UNKNOWN  
   This option instructs the program, when it is unable to find a proper 
   match in the dictionary, and after various prefixes and suffixes, to  
   try even dirtier tricks, specifically to try all the dictionary stems 
   that it finds that fit the letters, independent of whether the endings
   match the parts of speech to which the stems are assigned.  This will 
   catch a substantive for which only the ADJ stem appears in dictionary,
   an ADJ for which there is only a N stem, etc.  It will also list the  
   various endings that match the end of the input word.  A certain      
   amount of weeding has been done, so only reasonably common endings    
   are quoted, and these are lumped together masking declension, etc.    
   Only N, ADJ, and V endings are given, LOC and VOC omitted, etc.       
   The user can then make his own judgement.      This option should     
   probably only be used with individual UNKNOWN words, and off-line     
   from full translations, therefore the default choice is N(o).         
   This processing can be turned on with the choice of Y(es).            

TRIM_OUTPUT  
   This option instructs the program to remove from the output list of   
   possible constructs those which are least likely.  At the present     
   stage, there is not much trimming, however, if the program grows more 
   powerful this may be a very useful option.  Nevertheless, there is no 
   absolute assurance that the items removed are not correct, just that  
   they are statistically less likely (e.g., vocatives or locatives in   
   certain situations).  Since little is now done, the default is Y(es). 


                          GUIDING PHILOSOPHY                          


Purpose 


The dictionary is intended as a help to someone who knows roughly 
enough Latin for the document under study.  It gives the accidence and 
meanings possible for an input word.  


Method


The program searches a list of stems and tries to make a match.  If no
exact match is possible, it tries various modifications, beginning with
prefixes and suffixes, and eventually involving various regular 
spelling variations (or tricks) common in classical and medieval 
Latin.  

A choice was made that the base would be classical Latin as defined 
by the OLD, arbitrarily and roughly 200 BC to 100 AD (-200 +100).  

The classical form of words is taken as the base.  All modifications 
are made so as to correct to this base.  Further additions to local
dictionaries should keep this in mind.  Modifications are made to the 
input words, not to the dictionary stems.  It could be done the other 
way, but the present situation seems to be much easier.  There are 
some consequences of this approach.  For instance, it is easy to 
remove an 'h' from an input word to match with a stem.  It is 
prohibitively difficult (but not impossible) to add 'h' in all 
possible positions to check against stems.  
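
The 'h' example shows why modifying the input is the cheaper 
direction.  A minimal Python sketch (not the program's Ada; the name 
drop_one_h is hypothetical) generates one candidate per 'h' actually 
present in the input word, each of which can then be checked against 
the stems.  

```python
def drop_one_h(word):
    """Return the respellings obtained by deleting a single 'h'."""
    return [word[:i] + word[i + 1:]
            for i, ch in enumerate(word)
            if ch == "h"]

print(drop_one_h("harena"))
```

A word with no 'h' produces no candidates at all, whereas going the 
other way would mean considering an inserted 'h' at every position of 
every stem.  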

It would be possible to match most words with a relatively smaller 
list of stems (or roots) and generous application of word 
construction.  This approach is not followed.  One difficulty is that 
while words may be constructed correctly, and the underlying meaning 
to be found from this construction, the common usage may be obscured 
by a formal interpretation of the parts.  In practice this occurs in 
20-40% of the cases.  This method is still very useful in approaching 
a word for which there has been no dictionary interpretation, but it 
puts a considerable burden on the normal user.  Further, in about 10% 
of constructions, the result is just wrong.  

If, for instance, there is a noun that matches, a corresponding 
equally valid adjective will not be reported unless it is explicitly 
found in the dictionary.  A Latin expert would not be put off by this,
but for the novice a complete report is very valuable.  

There is also the problem that, in normal usage, if the program finds 
a simple match, it does not go further and consider what constructed 
words might also be valid.  (One can override and force prefix/suffix 
construction with a switch, but one would not want to force all 
possible tricks.) 

Therefore, the philosophy is to populate the stem list as densely as 
possible.  Even easily resolved differences are included redundantly 
(adligo as well as alligo; ad- forms are most of the duplicates).  The 
advantage is that while regular single-letter modifications are fairly 
easy, and two-letter differences are possible (but more expensive), 
further deviations are problematical.  The better populated the stem 
list, the better the chance of a result.  

The stem list is also overpopulated with variants suggested by 
different sources.  The problem is that what we have of classical 
Latin has gone through many monks along the way.  These copyists may 
have made simple mistakes (typos!), or have made what they thought 
were proper corrections (spell checkers!).  And twenty centuries later
scholars work hard to reassemble the best Latin to present in the 
dictionary.  But a particular document in the form presented to the 
reader may have a variety of spellings for exactly the same word 
in the same referenced passage (Pliny's Natural History is often 
subject to this problem).  (It may even be that modern texts and 
dictionaries have misprints!) So all forms found in various 
dictionaries can be included, with the exception of those explicitly 
labeled 'misread' (and the argument probably should mandate their 
inclusion also).  

An argument against a large stem list is that it increases the storage
required (but this is extremely modest by current standards) and 
increases processing time for search of the stems (this is far 
overshadowed by the processing required to construct or analyze words
working from a smaller stem list).  

Additional parts of verbs are included (first conjugation is easily 
filled out, even eccentric verbs if they are compounds of known 
parts), although they may not have been found in any known documents.  
Cases can be logically constructed that are 'missing' in classical 
Latin.  That a form has not been found in surviving (copies of) 
classical documents does not mean that it was not on the lips of every
centurion and his girl friend, or that it might not find its way into 
medieval texts.  

Tricks are expensive in processing time.  Each possible modification 
is made, then the resulting word goes through the full recognition 
process.  If it passes, that is reported as the answer.  If it fails, 
another trick is tried.  This is effective if very few words get this 
far.  It is expected that application of single tricks will solve most
of the resolvable difficulties.  It would be impractical to 
mechanically apply several tricks in series to a word.  If the 
dictionary is heavily and redundantly populated, tricks are rarely 
necessary (and therefore not an overall processing burden) and largely
successful (if the input word is a valid, but unusual, 
variant/construction).  
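
The try-one-trick-at-a-time procedure described above can be sketched 
as a short loop in Python (not the program's Ada).  All names here 
(resolve, inp_to_imp, cl_to_cul, DICTIONARY, TRICKS) are hypothetical, 
and a plain lookup table stands in for the full recognition process.  

```python
# Stand-in for full recognition: a toy table of words and meanings.
DICTIONARY = {"impono": "put upon", "periculum": "danger"}

def inp_to_imp(word):
    return "imp" + word[3:] if word.startswith("inp") else word

def cl_to_cul(word):
    return word.replace("cl", "cul")

TRICKS = [inp_to_imp, cl_to_cul]

def resolve(word, recognize, tricks):
    """Try the word as given; failing that, apply each trick singly and
    report the first modified word that passes recognition."""
    result = recognize(word)
    if result is not None:
        return word, result
    for trick in tricks:
        candidate = trick(word)
        if candidate != word:              # the trick actually applied
            result = recognize(candidate)  # full recognition once more
            if result is not None:
                return candidate, result
    return None

print(resolve("inpono", DICTIONARY.get, TRICKS))
```

Each applicable trick costs one more pass through recognition, which is 
why the loop is only affordable when few words reach it.  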

Even in easy cases the overpopulation is helpful.  Antebasis is easily
parsed as ante-basis (pedestal before, which is reasonable), but 
inclusion as a separate word allows the additional information that it
is the hindmost pillar of the pedestal of a ballista.  

Further, a conventional dictionary, especially one that wishes to set 
a standard for proper language, excludes words that may not meet 
criteria of propriety, slang, misspellings, etc.  This may place the 
onus on the reader to convert words.  A computer dictionary ought to 
relieve the reader as much as possible.  The present program may be a 
long way from complete, but its goal is to strive for that.  


Word Meanings


The meanings listed are generally those in the 
literature/dictionaries.  In the case of common words, there is 
general agreement among authors.  Some uncommon words display 
convoluted interpretations.  

Generally, the meaning is given for the base word, as is usual for 
dictionaries.  For the verb, it will be a present meaning, even when 
the tense given is perfect.  For an adjective, the positive meaning is
given, even if a comparative or superlative form is shown.  This is 
also so when a word is constructed with a suffix, thus an adverb 
constructed from its adjective will show the base adjective meaning 
and an indication of how to make the adverb in English.  

I have taken it upon myself to add my interpretations and synonyms, 
and propose common usage for otherwise complex descriptive 
definitions.  The idea is to prompt the reader, expecting that the 
text is not that from which some dictionary copied the meaning (from 
some 18th century translator!).  

The spelling of the English meanings is US (plow not plough, color not
colour), in spite of the fact that most of the Latin dictionaries that
I have are British and use British spelling.  

The reason for this is (besides uniformity in the program) that there 
is much computer processing and checking of the dictionary data, 
including spell-checking of the English.  (This is not to say that 
everything is correct, but it is much better than it would be without 
the computer checking.) All my programs speak US English, so I can 
count on it.  Only some are available in UK English, and I do not have
all of those versions.  

In addition, I have given US meanings to some terms that seem to be 
literally translated from the Latin (or German!) (a person who 
steals/drives off cattle is a rustler).  


Proper Names


Only a very few proper names are included, these just for test 
purposes.  The number of proper names is almost limitless but very few
are applicable to a particular document, and if it is an obscure 
document it is unlikely that the names would be found in any 
dictionary.  

There is a switch (defaulted to Yes) that allows the program to assume
that any capitalized unknown word is a proper name, and to ignore it.  
Also, one can make up a local dictionary of names for one's particular
application.  
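
The rule behind this switch, as stated in the help text for 
IGNORE_UNKNOWN_NAMES (a capitalized word longer than three letters), 
can be sketched in Python (not the program's Ada).  The function name 
assume_proper_name and the lower-cased lookup set are hypothetical 
simplifications.  

```python
def assume_proper_name(word, known):
    """True if an unknown word should be passed over as a proper name:
    capitalized, longer than three letters, and not in the dictionary."""
    return (word.lower() not in known
            and len(word) > 3
            and word[0].isupper())

print(assume_proper_name("Vercingetorix", {"roma"}))
```

A name that is in the dictionary (here 'roma') is not caught by the 
rule and is handled in the normal way.  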


Letter Conventions


U and/or V

Strictly speaking, Latin did not have a V, just a consonant U, or a U 
character that was easier in capitals (the way Latin was written by 
the Romans) to write or chisel in stone as V. However, most modern 
texts and dictionaries (with the important exception of the OLD) make 
the distinction with two characters (u and v).  It appeared most 
appropriate in a computer context (never destroy information) to make 
the distinction and follow the common practice.  So all dictionary 
entries maintain the V. However, an input word following the U 
convention will be found.  There is a routine that converts the 
appropriate U's to V's so that the dictionary search can be made.  
(This is just a kludged routine, but seems to work for the examples 
against which I have tested.) For best results, this procedure can be 
applied on input to all words by setting a parameter, when it is known
that the text uses the all-U convention.  Otherwise, or for individual
words, the program will fail to find a match and be forced to TRICKS, 
one of which is to make the U-V conversion.  But waiting this long in 
the process will exclude the possibility of other TRICKS.  If the only
problem with the word is U-V, then there is no loss; if there are other
difficulties they will be missed.  So it is best to set the U-V 
parameter if appropriate.  

I and/or J

A similar situation arises with I, and its consonant form, J. In this 
instance, the common practice is to use only I, but there are many 
counter-examples, both texts and dictionaries.  (Lewis and Short uses 
J, but OLD does not.)  Because of common practice, the program started 
out pure-I and has remained that way, in spite of the logical 
inconsistency with U-V.  This may be corrected in the future, but for 
the moment that is the way it stands.  All dictionary entries are 
pure-I.  All J characters are converted to I on input, and the output 
is given in pure-I.  
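
The input conversion amounts to a one-line substitution.  A Python 
sketch (not the program's Ada; the name to_pure_i is hypothetical) that 
keeps the case of the letter:  

```python
def to_pure_i(word):
    """Convert every J to I, preserving upper and lower case."""
    return word.replace("j", "i").replace("J", "I")

print(to_pure_i("Julius"))
```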

W

There are some medieval examples of W in Latin.  I have not yet faced 
this, and have no words in the dictionary with W. This is yet to be 
resolved.  


Dictionary Codes


Several codes are associated with each dictionary entry (presently 
AGE, AREA, GEO, FREQ, SOURCE).  These were provided against the 
possibility of the program using them to make a better interpretation.
For the most part, this information is of little additional help to 
the reader, but it is carried in codes because it is not available to 
the program in any other way.  

The program covers a combination of time periods and application 
areas.  This is certainly not the way in which dictionaries are 
usually prepared.  Usually there is a clear limit to the time or area 
of coverage, and with good reason.  A computer dictionary may have 
capabilities that mitigate those reasons.  Time or area can be coded 
into each entry, so that one could return only classical words, even 
though matching medieval entries existed.  (The program has that 
capability now, but it is not yet clear how to apply it.) 

There is some measure of period and frequency that can be used to 
discriminate between identical forms, but if there is only one 
possible match to an input word, it will be displayed no matter its 
era or rarity.  The user can choose to display age and frequency 
warnings associated with stems and meanings, but the default is not 
to.  

Rare and age specific inflection forms are also outputted, but there 
is a warning associated with each such.  

AGE

The designation of time period is very rough.  It is based on the 
dictionary information.  If the quote(s) cited are in the 4th century,
and none earlier, then the word is assumed to be late Latin, and one 
might conclude that it was not current earlier.  One flaw in this 
argument could be that the citation given was just the best 
illustration from a large number covering a wide period.  

If there is a classical citation, then the word may be designated as 
classical, but unless there is some reason to conclude otherwise, it 
is expected that classical words are valid for use in all periods and 
are universal for well-considered (published) Latin.  

Much which is designated late or medieval may be vulgar Latin, in 
common use in classical times but not thought suitable for literary 
works.  

In all periods the target is Latin.  Archaic Latin, for purposes of 
the program, is still Latin, not Etruscan or Greek.  Medieval Latin is
that which was written by scholars as the universal Latin, not early 
versions of French or Italian.  

  type AGE_TYPE is (
   X,   --              --  In use throughout the ages -- the default
   A,   --  archaic     --  Very early forms gone by classical times
   B,   --  early       --  Early Latin, pre-classical, used for effect/poetry
   C,   --  classical   --  Limited to classical (200 BC - 200 AD)
   D,   --  late        --  Late, post-classical, including Christian (3-6)
   E,   --  later       --  Latin not in use in Classical times (7-10)
   F,   --  medieval    --  Spanning E and G, including late medieval (11-15)
   G,   --  modern      --  Latin not in use before 16th century (16-18)
   H,   --  neo         --  Coined recently, words for new things (19-20)
   M,   --  graffiti    --  Presently not much used
   N    --  inscription --  Presently not much used
                      );

AREA

While the reader can make his own interpretation of the area of 
application from the given meaning, there may be some cases in which 
the program can also use that information (which it can only get from 
a direct coding).  This has not yet been used in the program, but the 
possibility exists.  If the reader were doing a medical text, then 
higher priority should be given to words coded B, if a farming book, 
then A coded words should be given preference.  

  type AREA_TYPE is (
          X,      --  All or none
          A,      --  Agriculture, Flora, Fauna, Land, Equipment, Rural
          B,      --  Biological, Medical, Body Parts  
          D,      --  Drama, Music, Theater
          E,      --  Ecclesiastic, Biblical, Religious
          G,      --  Grammar, Rhetoric, Schools
          L,      --  Legal, Government, Political, Titles
          N,      --  Things that may appear only in Pliny
          P,      --  Poetic
          S,      --  Science, Philosophy, Logic, Mathematics
          T,      --  Technical, Architecture, Topography, Surveying
          W,      --  War, Military, Naval, Armor
          Y       --  Mythology
                      );
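
Since the program does not yet use the AREA code, the following is 
only a sketch of how the preference described above might work.  It 
is written in Python for illustration; the function prefer_area and 
the sample candidates are hypothetical.

```python
def prefer_area(entries, preferred):
    """Stable sort: entries coded `preferred` first, the rest in order."""
    # The key is False (sorts first) for the preferred area, True otherwise.
    return sorted(entries, key=lambda entry: entry[1] != preferred)


# Each candidate is (description, AREA code); the data are placeholders.
candidates = [("general sense", "X"),
              ("farming sense", "A"),
              ("medical sense", "B")]

print(prefer_area(candidates, "B")[0])  # ('medical sense', 'B')
```

Because the sort is stable, candidates outside the preferred area keep 
their original relative order.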

GEO

This code was included to enable the program to distinguish between 
different usages of a word depending on where it was used or what 
country was the subject of the text.  This is a dual usage, origin or 
subject.  

  type GEO_TYPE is (
          X,      --  All or none
          A,      --  Africa      
          B,      --  Britain     
          C,      --  China       
          D,      --  Scandinavia 
          E,      --  Egypt       
          F,      --  France, Gaul
          G,      --  Germany     
          H,      --  Greece      
          I,      --  Italy, Rome
          J,      --  India       
          K,      --  Balkans     
          N,      --  Netherlands
          P,      --  Persia      
          Q,      --  Near East   
          R,      --  Russia              
          S,      --  Spain, Iberia       
          U,      --  Eastern Europe      
          Y       --  Mythology
                     );

FREQUENCY

There is an indication of relative frequency for each entry.  These 
codes also apply to inflections.  If there were several matches to an
input word, this key may be used to sort the output, or to exclude 
rare interpretations.  The first problem is to provide this score.  The
initial method is to grade each word by how much column space is 
allocated to it in the Oxford Latin Dictionary, on the assumption that
many citations mean a word is common.  One has a check against the 
frequency list of Diederich for the most common, and those are 
probably the only ones that matter.  But the frequency depends on the 
application, and it should be possible to run a new set of frequencies
if one had a reasonable volume of applicable text.  

  type FREQUENCY_TYPE is (
    X,    --              --  Unknown or unspecified
    A,    --  very freq   --  Very frequent, in most Elementary Latin books
    B,    --  frequent    --  Frequent           
    C,    --  common      --  For Dictionary, in top 10,000 words
    D,    --  uncommon    --  Spanning C and E
    E,    --  rare        --  Only one reference in OLD
    M,    --  graffiti    --  Presently not much used
    N     --  inscription --  Presently not much used
                      );
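
Running a new set of frequencies from a body of text, as suggested 
above, might be sketched like this.  The sketch is in Python and is 
not part of the program; the function names are hypothetical.  It 
counts surface word forms, whereas a real run would first have to map 
each form to its dictionary entry.  Only the 10,000 cutoff for C comes 
from the table above; the cutoffs for A and B and the default grade 
are assumptions.

```python
from collections import Counter
import re


def count_forms(text):
    """Count each word form in `text`, lower-cased, letters a-z only."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def grade_by_rank(counts,
                  cutoffs=((100, "A"), (1000, "B"), (10000, "C")),
                  default="D"):
    """Assign a FREQUENCY code to each form by its rank in `counts`."""
    grades = {}
    for rank, (form, _) in enumerate(counts.most_common(), start=1):
        # Take the first cutoff that this rank falls within.
        grades[form] = next(
            (code for limit, code in cutoffs if rank <= limit), default)
    return grades


counts = count_forms("Et in Arcadia ego; et in et.")
print(grade_by_rank(counts, cutoffs=((1, "A"), (2, "B"))))
```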

SOURCE

Source is the dictionary or grammar which is the source of the 
information, not the Cicero or Caesar text in which it is found.  

For a large number of entries, X is given as Source.  This is 
primarily for the vocabulary (about 13000 words) which was in place 
before the Source parameter was put in.  In fact, they are from no 
particular Source, just general vocabulary picked up in various texts 
and readings.  Although, when the dictionary was expanded in 1998, all
entries were checked against sources, it seemed improper to credit 
(blame?) a Source when that was not the origin of the entry, 
remembering that the actual entries are of my generation entirely and 
may not correspond exactly to any other view.  However, in the second 
pass (as far as it has progressed) all classical entries have been 
verified with the Oxford Latin Dictionary (OLD).  (By that I mean that
I have checked, not to imply that I have not made errors.) This does 
not mean that the entry necessarily agrees with the OLD, but that I 
read the OLD entry with great respect and put down what I did anyway.  
Newer entries, added in this process, if found in the OLD, have the O 
code.  Words added from Lewis and Short, but not in OLD, have the S 
code, etc.  

There should be no expectation, nor is there any claim, that the 
result of the program is exactly that from the cited Source.  Each 
entry is my responsibility alone, and there are significant 
differences and elaborations.  However, in each case where there is a 
Source, the reader can find the basis from which the program data was 
derived.  If I have done a proper job, he will not often be surprised.

The list of sources goes far beyond what has been directly used so 
far.  I have sought and received permission for those which have been 
extensively used.  Others have only been used for an occasional check 
(fair use) or have denied me permission (Niermeyer).  

  type SOURCE_TYPE is (
       X,      --  General or unknown or too common to say
       A,      --  Allen & Greenough, New Latin Grammar, 1888
       B,      --  J.T.Bretzke, Consecrated Phrases: Theolog Dict
       C,      --  Cassell's Latin Dictionary 1968        
       D,      --  J.N.Adams, Latin Sexual Vocabulary, 1982
       E,      --  L.F.Stelten, Dictionary of Eccles. Latin, 1995
       G,      --  Gildersleeve & Lodge, Latin Grammar 1895
       H,      --  Harrington/Pucci/Elliott, Medieval Latin 2nd Ed 1997 
       L,      --  Lewis, C.T., Elementary Latin Dictionary 1891
       M,      --  Latham, Revised Medieval Word List, 1980
       N,      --  Lynn Nelson, Wordlist
       O,      --  Oxford Latin Dictionary, 1982
       S,      --  Lewis and Short, A Latin Dictionary, 1879
       U,      --  Du Cange            
       V,      --  Vademecum in opus Saxonis - Franz Blatt
       W,      --  My personal guess   
       Y       --  Niermeyer, Mediae Latinitatis Lexicon Minus
                       );


Evolution of the Dictionary


The stem list was originally put together from what might be called 
'common knowledge', those words that most Latin texts have.  The first
version had about 5000 dictionary entries, giving up to 95% coverage 
of simple classical texts.  This grew to about 13000 entries with 
specific additions when gaps were found.  With this number it was 
possible to get better than a 99% hit rate on Caesar (an area from 
which the dictionary was built).  Parse of other works fell to 95-97%,
which may be mathematically attractive but leaves a lot to be desired 
in a dictionary, since a translator is usually familiar with the vast 
bulk of the language and just needs help on the obscure words.  Having
just the common words is not enough, indeed not much help at all.  So 
an attempt is made to make the dictionary as complete as possible.  
All possible spellings found in dictionaries are included.  
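
The hit rates quoted above are simply the fraction of words in a text 
that the program can parse.  A minimal sketch of that measurement, in 
Python, assuming a hypothetical predicate parses(word) that stands in 
for a lookup by the program:

```python
import re


def hit_rate(text, parses):
    """Fraction of the words in `text` for which parses(word) is true."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if parses(w)) / len(words)


# Toy stand-in for the dictionary: a small set of known forms.
known = {"gallia", "est", "omnis"}
print(hit_rate("Gallia est omnis divisa", known.__contains__))  # 0.75
```

The toy set says nothing about what the real dictionary contains; it 
only shows the arithmetic.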

Starting with the 13000, the expansion project beginning in 1998 sought
to verify the existing words and supplement with any new found ones.  
Thus all classical Latin words are consistent with the OLD (not to say
taken from, because most were not, but checked against).  Any 
significant deviation is indicated, either as from another source, or 
in the definition itself.  

Lewis and Short (L&S) is used for later Latin and to check OLD work.  
This started with the thought that if a word was in L&S but not in OLD 
it must be later Latin, beyond the range of OLD.  I was surprised at 
how many words with classical citations were in L&S but not in OLD.  

The refinement is proceeding one letter at a time, as is the tradition
for all great dictionaries.  First stage refinement has proceeded 
through B. 

The hardest test is against another dictionary.  While getting a 97%+ 
hit rate on long classical texts, a run against a large dictionary 
might fall to 85%.  This is to be expected, since we both have the 
10000 most common words and have made somewhat different additions 
beyond that.  So large electronic wordlists are a check on the 
program, and are reserved for that purpose, not simply incorporated as
such.  

The Latin Word List of Lynn Nelson is an excellent benchmark, more so 
because of its medieval content.  


----------------------------------------------------------------------


Feedback is invited.  If there is a problem in installing or 
operating, in the results or their display, or if your favorite word 
is omitted from the dictionary, let me know.  

PLEASE comment and check back for new version releases.  

Contact whitaker@erols.com,
or William Whitaker, PO Box 3036, McLean VA 22103 USA.  
