





    


                 Recursive Make Considered Harmful

                            _P_e_t_e_r _M_i_l_l_e_r
                      millerp@canb.auug.org.au



                              AABBSSTTRRAACCTT
         For large UNIX projects, the traditional method of
         building the project is to use recursive _m_a_k_e_.  On
         some  projects,  this results in build times which
         are unacceptably large, when all you want to do is
         change one file.    In examining the source of the
         overly long build times, it became evident that  a
         number of apparently unrelated problems combine to
         produce the delay, but on analysis  all  have  the
         same root cause.
         This paper explores a number of problems regarding
         the use of recursive _m_a_k_e_, and shows that they are
         all  symptoms  of the same problem.  Symptoms that
         the UNIX community have long accepted as a fact of
         life,  but  which  need not be endured any longer.
         These problems include recursive _m_a_k_es which  take
         "forever"  to  work out that they need to do noth-
         ing, recursive _m_a_k_es which do  too  much,  or  too
         little, recursive _m_a_k_es which are overly sensitive
         to changes in the source code and require constant
         Makefile intervention to keep them working.
         The  resolution  of these problems can be found by
         looking at what _m_a_k_e does, from first  principles,
         and  then  analyzing  the  effects  of introducing
         recursive _m_a_k_e to  this  activity.   The  analysis
         shows  that  the problem stems from the artificial
         partitioning of the build into  separate  subsets.
         This,  in  turn,  leads to the symptoms described.
         To avoid the symptoms, it  is  only  necessary  to
         avoid the separation; to use a single _m_a_k_e session
         to build the whole project, which is not quite the
         same as a single Makefile.
         This  conclusion  runs counter to much accumulated
         folk wisdom in building large  projects  on  UNIX.
         Some  of  the  main objections raised by this folk
         wisdom are examined and  shown  to  be  unfounded.
         The  results  of actual use are far more encourag-
         ing, with routine development performance improve-
         ments  significantly  faster  than  intuition  may
         indicate, and without the intuitvely expected com-
         promise of modularity.  The use of a whole project
         _m_a_k_e is not as difficult to put into  practice  as
         it may at first appear.




    Peter Miller           17 August 2024                 Page 1





    AUUGN'97                   Recursive Make Considered Harmful


               +------------------------------------+
               |Miller, P.A. (1998), Recursive Make |
               |Considered Harmful,                 |
               |AUUGN Journal of AUUG Inc.,  19(1), |
               |pp. 14-25.                          |
               +------------------------------------+



    11..  IInnttrroodduuccttiioonn

    For  large  UNIX  software  development projects, the tradi-
    tional methods of building the project use what has come  to
    be  known  as "recursive _m_a_k_e."  This refers to the use of a
    hierarchy of directories containing  source  files  for  the
    modules  which  make  up the project, where each of the sub-
    directories contains a _M_a_k_e_f_i_l_e which  describes  the  rules
    and instructions for the _m_a_k_e program.  The complete project
    build is done by arranging for  the  top-level  Makefile  to
    change directory into each of the sub-directories and recur-
    sively invoke _m_a_k_e_.

    This paper explores some  significant  problems  encountered
    when  developing  software projects using the recursive _m_a_k_e
    technique.  A simple solution is offered, and  some  of  the
    implications of that solution are explored.

    Recursive _m_a_k_e results in a directory tree which looks some-
    thing like this:
                          +++
                          ++_P+_r|_o_j_e_c_t
                           ++++Mmaokdeufliel1e
                           |++ +|Makefile
                           | + +|source1.c
                           | + +|_e_t_c_._._.
                           ++++m+odule2
                             + +|Makefile
                             + +|source2.c
                             + +|_e_t_c_._._.

    This hierarchy of modules can be  nested  arbitrarily  deep.
    Real-world  projects  often  use two- and three-level struc-
    tures.

    11..11..  AAssssuummeedd KKnnoowwlleeddggee

    This paper assumes that the reader is familiar with
    developing software on UNIX, with the make program, and with
    the issues of C programming and include file dependencies.

    This paper assumes that you have installed GNU Make on your
    system and are moderately familiar with its features.  Some
    features of make described below may not be available if you
    are using the limited version supplied by your vendor.

    -----------
    Copyright (C) 1997 Peter Miller

    22..  TThhee PPrroobblleemm

    There are numerous problems with recursive make, and they
    are usually observed daily in practice.  Some of these
    problems include:

    o It is very hard to get the order of the recursion into the
      sub-directories correct.  This order is very unstable and
      frequently needs to be manually "tweaked."  Increasing the
      number of directories, or increasing the depth in the
      directory tree, causes this order to be increasingly
      unstable.

    o It is often necessary to do more than one pass over the
      sub-directories to build the whole system.  This,
      naturally, leads to extended build times.

    o Because the builds take so long, some dependency
      information is omitted, since otherwise development builds
      would take unreasonable lengths of time and the developers
      would be unproductive.  This usually leads to things not
      being updated when they need to be, requiring frequent
      "clean" builds from scratch, to ensure everything has
      actually been built.

    o Because inter-directory dependencies are either omitted or
      too hard to express, the Makefiles are often written to
      build too much to ensure that nothing is left out.

    o The inaccuracy of the dependencies, or the simple lack of
      dependencies, can result in a product which is incapable
      of building cleanly, requiring the build process to be
      carefully watched by a human.

    o Related to the above, some projects are incapable of
      taking advantage of various "parallel make"
      implementations, because the build does patently silly
      things.

    Not all projects experience all of  these  problems.   Those
    that  do  experience  the problems may do so intermittently,
    and dismiss the problems as unexplained  "one  off"  quirks.
    This  paper  attempts  to bring together a range of symptoms
    observed over long practice, and presents a systematic anal-
    ysis and solution.

    It must be emphasized that this paper does not suggest that
    make itself is the problem.  This paper is working from the
    premise that make does not have a bug, that make does not
    have a design flaw.  The problem is not in make at all, but
    rather in the input given to make - the way make is being
    used.

    33..  AAnnaallyyssiiss

    Before it is possible to address these  seemingly  unrelated
    problems, it is first necessary to understand what _m_a_k_e does
    and how it does it.  It is then  possible  to  look  at  the
    effects recursive _m_a_k_e has on how _m_a_k_e behaves.

    33..11..  WWhhoollee PPrroojjeecctt MMaakkee

    _M_a_k_e  is  an  expert system.  You give it a set of rules for
    how to construct things, and a  target  to  be  constructed.
    The rules can be decomposed into pair-wise ordered dependen-
    cies between files.  _M_a_k_e takes the rules and determines how
    to  build  the  given target.  Once it has determined how to
    construct the target, it proceeds to do so.

    _M_a_k_e determines how to build the target  by  constructing  a
    _d_i_r_e_c_t_e_d  _a_c_y_c_l_i_c  _g_r_a_p_h_,  the DAG familiar to many Computer
    Science students.  The vertices of this graph are the  files
    in  the  system,  the edges of this graph are the inter-file
    dependencies.  The edges of the graph are  directed  because
    the  pair-wise  dependencies  are  ordered;  resulting in an
    _a_c_y_c_l_i_c graph - things which look like loops are resolved by
    the direction of the edges.

    This paper will use a small example project for its
    analysis.  While the number of files in this example is
    small, there is sufficient complexity to demonstrate all of
    the above recursive make problems.  First, however, the
    project is presented in a non-recursive form.

                           Project
                            +-- Makefile
                            +-- main.c
                            +-- parse.c
                            +-- parse.h

    The Makefile in this small project looks like this:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    |prog: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |main.o: main.c parse.h    |
                    |  $(CC) -c main.c         |
                    |parse.o: parse.c parse.h  |
                    |  $(CC) -c parse.c        |
                    +--------------------------+
    Some of the implicit rules of make are presented here
    explicitly, to assist the reader in converting the Makefile
    into its equivalent DAG.

    The above Makefile can be drawn as a DAG in the following
    form:


                              prog
                             /    \
                        main.o     parse.o
                        /    \     /     \
                  main.c      parse.h     parse.c

    This is an _a_c_y_c_l_i_c graph because of the arrows which express
    the  ordering  of  the  relationship  between the files.  If
    there _w_a_s a circular dependency according to the arrows,  it
    would be an error.
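
    A cyclic graph has to be rejected before anything is built.
    As a minimal sketch, assuming a hypothetical adjacency-matrix
    model of the example's six files (the names edge, has_cycle
    and load_example are illustrative, not taken from any make
    implementation), the usual three-color depth-first search
    detects a cycle by finding an edge back to a file that is
    still on the current search path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node numbering for the files of the example Makefile. */
enum { PROG, MAIN_O, PARSE_O, MAIN_C, PARSE_H, PARSE_C, NODES };

/* edge[t][d] is true when target t depends on file d. */
static bool edge[NODES][NODES];

/* WHITE: not yet visited; GREY: on the current search path; BLACK: done. */
enum color { WHITE, GREY, BLACK };

static bool visit(int n, enum color state[])
{
    assert(n >= 0 && n < NODES);
    state[n] = GREY;
    for (int d = 0; d < NODES; d++) {
        if (!edge[n][d])
            continue;
        if (state[d] == GREY)              /* back edge: circular dependency */
            return true;
        if (state[d] == WHITE && visit(d, state))
            return true;
    }
    state[n] = BLACK;
    return false;
}

/* Returns true if the dependency graph contains a cycle. */
bool has_cycle(void)
{
    enum color state[NODES] = { WHITE };
    for (int n = 0; n < NODES; n++)
        if (state[n] == WHITE && visit(n, state))
            return true;
    return false;
}

/* Loads the six edges of the example Makefile. */
void load_example(void)
{
    edge[PROG][MAIN_O] = edge[PROG][PARSE_O] = true;
    edge[MAIN_O][MAIN_C] = edge[MAIN_O][PARSE_H] = true;
    edge[PARSE_O][PARSE_C] = edge[PARSE_O][PARSE_H] = true;
}
```

    With the example's edges loaded, has_cycle() returns false;
    adding a hypothetical edge from parse.h back to prog makes it
    return true.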

    Note that the object files (.o) are dependent on the include
    files (.h) even though it is the source files (.c) which  do
    the  including.  This is because if an include file changes,
    it is the object files which are out-of-date, not the source
    files.
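
    A tool that generates dependencies therefore has to attach
    each header it finds to the object file's rule, not to the
    source file.  A deliberately simplified sketch, assuming only
    double-quoted includes at the start of a line matter and
    ignoring nested includes, comments and conditional
    compilation; depend_line is a hypothetical helper, and real
    compilers can emit such rules themselves (for example GCC's
    -MM option):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a make rule such as "main.o: main.c parse.h" into out.
 * src is the source file name (must end in ".c"); text is its contents.
 * Only #include "..." lines are considered; <...> headers are skipped.
 */
void depend_line(const char *src, const char *text, char *out, size_t outsz)
{
    size_t len = strlen(src);
    assert(len > 2 && strcmp(src + len - 2, ".c") == 0);

    /* Target is the object file: same stem, ".o" suffix. */
    int n = snprintf(out, outsz, "%.*s.o: %s", (int)(len - 2), src, src);
    assert(n > 0 && (size_t)n < outsz);

    const char *line = text;
    while (line && *line) {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, "#include", 8) == 0) {
            p += 8;
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p == '"') {
                const char *end = strchr(p + 1, '"');
                const char *nl = strchr(p + 1, '\n');
                if (end && (!nl || end < nl)) {
                    /* The header becomes a prerequisite of the .o file. */
                    size_t used = strlen(out);
                    n = snprintf(out + used, outsz - used, " %.*s",
                                 (int)(end - p - 1), p + 1);
                    assert(n > 0 && (size_t)n < outsz - used);
                }
            }
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
}
```

    For a main.c that includes <stdio.h> and "parse.h", the
    generated rule is "main.o: main.c parse.h", matching the
    hand-written rule in the example Makefile.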

    The second part of what make does is to perform a postorder
    traversal of the DAG.  That is, the dependencies are visited
    first.  The actual order of traversal is undefined, but most
    make implementations work down the graph from left to right
    for edges below the same vertex, and most projects
    implicitly rely on this behavior.  The last-time-modified of
    each file is examined, and higher files are determined to be
    out-of-date if any of the lower files on which they depend
    are younger.  Where a file is determined to be out-of-date,
    the action associated with the relevant graph edge is
    performed (in the above example, a compile or a link).
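
    The traversal just described can be sketched in C.  This is
    a simplified model, not how any particular make is
    implemented: timestamps are integers in a hypothetical mtime
    array, dependencies are visited left to right, and
    "rebuilding" a target only stamps it with a newer time and
    records its name (build, reset and touch are illustrative
    names):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory model of the example project. */
enum { PROG, MAIN_O, PARSE_O, MAIN_C, PARSE_H, PARSE_C, NODES };

static const char *const name[NODES] = {
    "prog", "main.o", "parse.o", "main.c", "parse.h", "parse.c"
};

/* Dependencies of each file, left to right, terminated by -1. */
static const int deps[NODES][3] = {
    [PROG]    = { MAIN_O, PARSE_O, -1 },
    [MAIN_O]  = { MAIN_C, PARSE_H, -1 },
    [PARSE_O] = { PARSE_H, PARSE_C, -1 },
    [MAIN_C]  = { -1 },
    [PARSE_H] = { -1 },
    [PARSE_C] = { -1 },
};

long mtime[NODES];          /* stands in for last-modified times */
static long now;            /* hypothetical monotonic clock */
static bool done[NODES];    /* already visited in this traversal */
char rebuilt[128];          /* names of rebuilt targets, in order */

/* Postorder: bring every dependency up to date first, then compare. */
static void update(int t)
{
    if (done[t])
        return;
    bool stale = false;
    for (int i = 0; deps[t][i] != -1; i++) {
        int d = deps[t][i];
        update(d);
        if (mtime[d] > mtime[t])        /* dependency is younger */
            stale = true;
    }
    if (stale) {
        mtime[t] = ++now;               /* stands in for compile or link */
        assert(strlen(rebuilt) + strlen(name[t]) + 2 <= sizeof rebuilt);
        if (rebuilt[0] != '\0')
            strcat(rebuilt, " ");
        strcat(rebuilt, name[t]);
    }
    done[t] = true;
}

/* One make run for the given target. */
void build(int target)
{
    memset(done, 0, sizeof done);
    rebuilt[0] = '\0';
    update(target);
}

/* A self-consistent state: every target younger than its inputs. */
void reset(void)
{
    mtime[MAIN_C] = 1; mtime[PARSE_H] = 2; mtime[PARSE_C] = 3;
    mtime[MAIN_O] = 4; mtime[PARSE_O] = 5; mtime[PROG] = 6;
    now = 6;
}

/* Simulates editing a file. */
void touch(int f)
{
    mtime[f] = ++now;
}
```

    After reset() and touch(PARSE_H), build(PROG) records
    "main.o parse.o prog": both object files are rebuilt before
    the link, which is exactly the postorder described above.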

    The use of recursive make affects both phases of the
    operation of make: it causes make to construct an inaccurate
    DAG, and it forces make to traverse the DAG in an
    inappropriate order.

    33..22..  RReeccuurrssiivvee MMaakkee

    To examine the effects of recursive _m_a_k_es, the above example
    will  be  artificially segmented into two modules, each with
    its own Makefile, and a top-level Makefile  used  to  invoke
    each of the module Makefiles.

    This example is intentionally artificial, and thoroughly so.
    However, all "modularity" of all projects is artificial,  to
    some  extent.  Consider: for many projects, the linker flat-
    tens it all out again, right at the end.

    The directory structure is as follows:

                          Project
                           +-- Makefile
                           +-- ant
                           |    +-- Makefile
                           |    +-- main.c
                           +-- bee
                                +-- Makefile
                                +-- parse.c
                                +-- parse.h

    The top-level Makefile  often  looks  a  lot  like  a  shell
    script:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+
    The ant/Makefile looks like this:

                  +------------------------------+
                  |all: main.o                   |
                  |main.o: main.c ../bee/parse.h |
                  |  $(CC) -I../bee -c main.c    |
                  +------------------------------+
    and the equivalent DAG looks like this:

                               main.o
                              /      \
                        main.c        parse.h

    The bee/Makefile looks like this:

                   +----------------------------+
                   |OBJ = ../ant/main.o parse.o |
                   |all: prog                   |
                   |prog: $(OBJ)                |
                   |  $(CC) -o $@ $(OBJ)        |
                   |parse.o: parse.c parse.h    |
                   |  $(CC) -c parse.c          |
                   +----------------------------+
    and the equivalent DAG looks like this:


                              prog
                             /    \
                        main.o     parse.o
                                   /     \
                              parse.h     parse.c



    Take  a  close look at the DAGs.  Notice how neither is com-
    plete - there are vertices and edges  (files  and  dependen-
    cies) missing from both DAGs.  When the entire build is done
    from the top level, everything will work.

    But what happens when small  changes  occur?   For  example,
    what would happen if the parse.c and parse.h files were gen-
    erated from a parse.y yacc grammar?  This would add the fol-
    lowing lines to the bee/Makefile:

                    +--------------------------+
                    |parse.c parse.h: parse.y  |
                    |  $(YACC) -d parse.y      |
                    |  mv y.tab.c parse.c      |
                    |  mv y.tab.h parse.h      |
                    +--------------------------+
    And the equivalent DAG changes to look like this:

                              prog
                             /    \
                        main.o     parse.o
                                   /     \
                              parse.h     parse.c
                                   \       /
                                    parse.y



    This change has a simple effect: if parse.y is edited,
    main.o will not be constructed correctly.  This is because
    the DAG for ant knows about only some of the dependencies of
    main.o, and the DAG for bee knows none of them.

    To understand why this happens, it is necessary to look at
    the actions make will take from the top level.  Assume that
    the project is in a self-consistent state.  Now edit parse.y
    in such a way that the generated parse.h file will have
    non-trivial differences.  When the top-level make is
    invoked, first ant and then bee is visited.  But ant/main.o
    is not recompiled, because bee/parse.h has not yet been
    regenerated and thus does not yet indicate that main.o is
    out-of-date.  It is not until bee is visited by the
    recursive make that parse.c and parse.h are reconstructed,
    followed by parse.o.  When the program is linked, main.o and
    parse.o are non-trivially incompatible.  That is, the
    program is wrong.

    33..33..  TTrraaddiittiioonnaall SSoolluuttiioonnss

    There are three traditional fixes for the above "glitch."

    33..33..11..  RReesshhuuffffllee

    The first is to manually tweak the order of the  modules  in
    the  top-level  Makefile.  But why is this tweak required at
    all?  Isn't _m_a_k_e supposed to be an expert system?   Is  _m_a_k_e
    somehow flawed, or did something else go wrong?

    To answer this question, it is necessary to look, not at the
    graphs, but the _o_r_d_e_r _o_f _t_r_a_v_e_r_s_a_l of the graphs.  In  order
    to operate correctly, _m_a_k_e needs to perform a _p_o_s_t_o_r_d_e_r tra-
    versal, but in separating the DAG into two pieces, _m_a_k_e  has
    not  been  _a_l_l_o_w_e_d  to  traverse  the graph in the necessary
    order - instead the project has dictated an order of traver-
    sal.   An order which, when you consider the original graph,
    is plain _w_r_o_n_g_.  Tweaking the  top-level  Makefile  corrects
    the order to one similar to that which _m_a_k_e could have used.
    Until the next dependency is added...

    Note that "make -j" (parallel build) invalidates many of the
    ordering assumptions implicit in the reshuffle solution,
    making it useless.  And then there are all of the sub-makes,
    each doing its own build in parallel, too.

    33..33..22..  RReeppeettiittiioonn

    The  second  traditional  solution  is to make more than one
    pass in the top-level Makefile, something like this:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+

    This doubles the length of time it takes to perform the
    build.  But that is not all: there is no guarantee that two
    passes are enough!  The upper bound on the number of passes
    is not even proportional to the number of modules; it is
    instead proportional to the number of graph edges which
    cross module boundaries.
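
    To see why the pass count tracks cross-module edges rather
    than modules, here is a minimal simulation under
    hypothetical assumptions: a chain of files f[0] <- f[1] <-
    ... <- f[n] in which f[i] is built from f[i+1], with
    even-numbered files in ant and odd-numbered files in bee, so
    every edge crosses the module boundary.  Each top-level pass
    visits ant and then bee; passes_needed is an illustrative
    name and timestamps are plain counters:

```c
#include <assert.h>

#define MAX_EDGES 64

/*
 * Hypothetical worst case for the "repetition" fix: f[i] is built from
 * f[i+1]; even i lives in ant, odd i in bee, so all n edges cross the
 * module boundary.  Returns how many top-level passes (ant, then bee)
 * do any work after the source file f[n] is edited.
 */
int passes_needed(int n)
{
    long mtime[MAX_EDGES + 1];
    long now = 0;
    assert(n > 0 && n <= MAX_EDGES);

    for (int i = n; i >= 0; i--)        /* consistent start: each target */
        mtime[i] = ++now;               /* is newer than its input       */
    mtime[n] = ++now;                   /* edit the source file f[n]     */

    int busy = 0;
    for (;;) {
        int rebuilt = 0;
        for (int module = 0; module < 2; module++)   /* cd ant; cd bee */
            for (int i = module; i < n; i += 2)      /* that module's files */
                if (mtime[i + 1] > mtime[i]) {
                    mtime[i] = ++now;                /* "rebuild" f[i] */
                    rebuilt++;
                }
        if (rebuilt == 0)
            return busy;
        busy++;
    }
}
```

    With the module count fixed at two, passes_needed(4) is 3
    and passes_needed(40) is 21: the number of passes grows with
    the number of crossing edges, as the text says.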

    33..33..33..  OOvveerrkkiillll

    We  have  already  seen an example of how recursive _m_a_k_e can
    build too little, but another common problem is to build too
    much.  The third traditional solution to the above glitch is
    to add even _m_o_r_e lines to ant/Makefile:

                    +--------------------------+
                    |.PHONY: ../bee/parse.h    |
                    |../bee/parse.h:           |
                    |    cd ../bee; \          |
                    |    make clean; \         |
                    |    make all              |
                    +--------------------------+
    This means that whenever main.o is made, parse.h will always
    be considered to be out-of-date.  All of bee will always be
    rebuilt including parse.h, and so main.o will always be
    rebuilt, even if everything was self-consistent.

    Note that "make -j" (parallel build) invalidates many of the
    ordering assumptions implicit in the overkill solution,
    making it useless, because all of the sub-makes are doing
    their builds ("clean" then "all") in parallel, constantly
    interfering with each other in non-deterministic ways.

    44..  PPrreevveennttiioonn

    The  above  analysis  is based on one simple action: the DAG
    was artificially separated  into  incomplete  pieces.   This
    separation  resulted  in  all  of  the  problems familiar to
    recursive _m_a_k_e builds.

    Did _m_a_k_e get it wrong?  No.  This is a case of  the  ancient
    GIGO  principle:  _G_a_r_b_a_g_e _I_n_, _G_a_r_b_a_g_e _O_u_t_.  Incomplete Make-
    files are _w_r_o_n_g Makefiles.

    To avoid these problems, don't break the  DAG  into  pieces;
    instead, use one Makefile for the entire project.  It is not
    the recursion itself which is harmful, it  is  the  crippled
    Makefiles  which  are used in the recursion which are _w_r_o_n_g.
    It is not a deficiency of _m_a_k_e itself that recursive _m_a_k_e is
    broken,  it does the best it can with the flawed input it is
    given.

         "But, but, but...  You can't do that!" I hear you
         cry.  "A single Makefile is too big, it's
         unmaintainable, it's too hard to write the rules,
         you'll run out of memory, I only want to build my
         little bit, the build will take too long.  It's just
         not practical."

    These are valid concerns, and they frequently lead make
    users to the conclusion that re-working their build process
    does not have any short- or long-term benefits.  This
    conclusion is based on ancient, enduring, false assumptions.

    The following sections will address each of  these  concerns
    in turn.

    44..11..  AA SSiinnggllee Makefile IIss TToooo BBiigg

    If  the  entire project build description were placed into a
    single Makefile this would certainly be true, however modern
    _m_a_k_e  implementations have _i_n_c_l_u_d_e statements.  By including
    a relevant fragment from each module, the total size of  the
    Makefile  and  its  include files need be no larger than the
    total size of the Makefiles in the recursive case.

    44..22..  AA SSiinnggllee Makefile IIss UUnnmmaaiinnttaaiinnaabbllee

    The complexity of using a single  top-level  Makefile  which
    includes a fragment from each module is no more complex than
    in the recursive case.  Because the DAG  is  not  segmented,
    this  form  of  Makefile becomes less complex, and thus _m_o_r_e
    maintainable, simply because fewer "tweaks" are required  to
    keep it working.

    Recursive  Makefiles  have a great deal of repetition.  Many
    projects solve this by using include files.  By using a sin-
    gle  Makefile  for  the  project,  the need for the "common"
    include files disappears - the single Makefile is the common
    part.

    44..33..  IItt''ss TToooo HHaarrdd TToo WWrriittee TThhee RRuulleess

    The only change required is to include the directory part in
    filenames in a number of places.  This is because  the  _m_a_k_e
    is  performed  from  the  top-level  directory;  the current
    directory is not the one in which the file  appears.   Where
    the  output file is explicitly stated in a rule, this is not
    a problem.

    GCC allows a -o option in conjunction with  the  -c  option,
    and  GNU Make knows this.  This results in the implicit com-
    pilation rule placing  the  output  in  the  correct  place.
    Older  and dumber C compilers, however, may not allow the -o
    option with the -c option, and will leave the object file in
    the  top-level  directory (_i_._e_. the wrong directory).  There
    are three ways for you to fix this: get GNU  Make  and  GCC,
    override  the  built-in  rule  with one which does the right
    thing, or complain to your vendor.

    Also, K&R C compilers will start the double-quote include
    path (#include "filename.h") from the current directory.
    This will not do what you want.  ANSI C compliant C
    compilers, however, start the double-quote include path from
    the directory in which the source file appears; thus, no
    source changes are required.  If you don't have an ANSI C
    compliant C compiler, you should consider installing GCC on
    your system as soon as possible.
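
    The difference lies only in where the search for a
    double-quoted include begins.  A minimal sketch, assuming the
    convention described above (which GCC follows) of starting
    in the directory of the including source file;
    quote_include_candidate is a hypothetical helper that
    computes only that first candidate and ignores any later -I
    directories:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes into out the first path searched for #include "name" when the
 * including file is source (a path relative to the top-level directory).
 * Assumes the "directory of the including file" convention; a K&R-style
 * compiler would instead look for name in the current directory.
 */
void quote_include_candidate(const char *source, const char *name,
                             char *out, size_t outsz)
{
    const char *slash = strrchr(source, '/');
    int n;
    if (slash == NULL)                      /* source is in the current dir */
        n = snprintf(out, outsz, "%s", name);
    else                                    /* keep source's directory part */
        n = snprintf(out, outsz, "%.*s/%s", (int)(slash - source), source,
                     name);
    assert(n >= 0 && (size_t)n < outsz);
}
```

    When a whole-project make compiles bee/parse.c from the
    top-level directory, the first candidate for "parse.h" is
    bee/parse.h, so the source needs no change.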

    44..44..  II OOnnllyy WWaanntt TToo BBuuiilldd MMyy LLiittttllee BBiitt

    Most  of  the  time,  developers are deep within the project
    tree and they edit one or two files and  then  run  _m_a_k_e  to
    compile  their  changes  and try them out.  They may do this
    dozens or hundreds of times a day.  Being  forced  to  do  a
    full project build every time would be absurd.

    Developers  always have the option of giving _m_a_k_e a specific
    target.  This is always the case, it's just that we  usually
    rely  on  the  default target in the Makefile in the current
    directory to shorten the command line for us.  Building  "my
    little bit" can still be done with a whole project Makefile,
    simply by using a specific target, and an alias if the  com-
    mand line is too long.

    Is doing a full project build every time so absurd?  If a
    change made in a module has repercussions in other modules,
    because there is a dependency the developer is unaware of
    (but the Makefile is aware of), isn't it better that the
    developer find out as early as possible?  Dependencies like
    this will be found, because the DAG is more complete than in
    the recursive case.

    The  developer is rarely a seasoned old salt who knows every
    one of the million lines  of  code  in  the  product.   More
    likely the developer is a short-term contractor or a junior.
    You don't want implications like these to blow up after  the
    changes are integrated with the master source, you want them
    to blow up on the developer in some nice safe  sand-box  far
    away from the master source.

    If you want to make "just your little bit" because you are
    concerned that performing a full project build will corrupt
    the project master source, due to the directory structure
    used in your project, see the "Projects versus Sand-Boxes"
    section below.

    44..55..  TThhee BBuuiilldd WWiillll TTaakkee TToooo LLoonngg

    This  statement  can  be  made from one of two perspectives.
    First, that a whole project _m_a_k_e, even  when  everything  is
    up-to-date,  inevitably  takes a long time to perform.  Sec-
    ondly, that these inevitable delays are unacceptable when  a
    developer  wants  to  quickly  compile and link the one file



    Peter Miller           17 August 2024                Page 11





    AUUGN'97                   Recursive Make Considered Harmful


    that they have changed.

    44..55..11..  PPrroojjeecctt BBuuiillddss

    Consider a hypothetical project with 1000 source (.c) files,
    each  of which has its calling interface defined in a corre-
    sponding include (.h) file with defines,  type  declarations
    and  function  prototypes.   These 1000 source files include
    their own interface definition, plus the  interface  defini-
    tions  of any other module they may call.  These 1000 source
    files are compiled into 1000 object  files  which  are  then
    linked  into  an  executable  program.  This system has some
    3000 files which _m_a_k_e must be told about, and be told  about
    the  include  dependencies, and also explore the possibility
    that implicit rules (.y -> .c for example) may be necessary.

    In order to build the DAG, make must "stat" 3000 files, plus
    an additional 2000 files or so, depending on which implicit
    rules your make knows about and your Makefile has left
    enabled.  On the author's humble 66MHz i486 this takes about
    10 seconds; on native disk on faster platforms it goes even
    faster.  With NFS over 10Mb/s Ethernet it takes about 10
    seconds, no matter what the platform.
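
    To get a feel for this cost on your own machine, here is a
    rough sketch, assuming a POSIX system (mkdtemp, stat,
    clock_gettime).  stat_probe is a hypothetical helper that
    stats n existing files and n missing ".y" candidates in a
    scratch directory; the timings depend heavily on the file
    system and its caches, so it will not reproduce the figures
    above:

```c
#define _POSIX_C_SOURCE 200809L   /* for mkdtemp() and clock_gettime() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

struct probe_result {
    int found;       /* stat() succeeded: the file exists */
    int missing;     /* stat() failed: implicit-rule candidate absent */
    double seconds;  /* wall-clock time spent in the stat() loop */
};

/*
 * Creates n empty files fNNNN.c in a scratch directory, then stats each
 * one plus a nonexistent fNNNN.y, roughly as make does when checking
 * whether a yacc implicit rule could apply.  Removes the files afterwards.
 */
struct probe_result stat_probe(int n)
{
    struct probe_result r = { 0, 0, 0.0 };
    char dir[] = "/tmp/statprobeXXXXXX";
    char path[64];
    struct stat st;
    struct timespec t0, t1;

    char *made = mkdtemp(dir);
    assert(made != NULL);
    (void)made;

    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/f%04d.c", dir, i);
        FILE *fp = fopen(path, "w");
        assert(fp != NULL);
        if (fp)
            fclose(fp);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/f%04d.c", dir, i);
        if (stat(path, &st) == 0)
            r.found++;
        snprintf(path, sizeof path, "%s/f%04d.y", dir, i);
        if (stat(path, &st) != 0)
            r.missing++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    r.seconds = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    for (int i = 0; i < n; i++) {           /* clean up the scratch files */
        snprintf(path, sizeof path, "%s/f%04d.c", dir, i);
        unlink(path);
    }
    rmdir(dir);
    return r;
}
```

    Calling stat_probe(1000) performs 2000 stat() calls, in the
    same spirit as the 3000 files plus implicit-rule probes
    described above, and reports how long they took.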

    This  is an astonishing statistic!  Imagine being able to do
    a single file compile, out of 1000 source files, in only  10
    seconds, plus the time for the compilation itself.

    Breaking the set of files up into 100 modules, and running
    it as a recursive make takes about 25 seconds.  The repeated
    process creation for the subordinate make invocations takes
    quite a long time.

    Hang on a minute!  On real-world projects with fewer than
    1000 files, it takes an awful lot longer than 25 seconds for
    make to work out that it has nothing to do.  For some
    projects, doing it in only 25 minutes would be an
    improvement!  The above result tells us that it is not the
    number of files which is slowing us down (that only takes 10
    seconds), and it is not the repeated process creation for
    the subordinate make invocations (that only takes another 15
    seconds).  So just what is taking so long?

    The traditional solutions to the problems introduced by
    recursive make often increase the number of subordinate make
    invocations beyond the minimum described here; e.g. to
    perform multiple repetitions (3.3.2), or to overkill
    cross-module dependencies (3.3.3).  These can take a long
    time, particularly when combined, but do not account for
    some of the more spectacular build times; what else is
    taking so long?

    Complexity of the Makefile is what is taking so long.  This
    is covered, below, in the Efficient Makefiles section.




    Peter Miller           17 August 2024                Page 12





    AUUGN'97                   Recursive Make Considered Harmful


    44..55..22..  DDeevveellooppmmeenntt BBuuiillddss

    If, as in the 1000 file example, it only takes 10 seconds to
    figure out which one of the files needs to be recompiled,
    there is no serious threat to the productivity of developers
    if they do a whole-project make as opposed to a
    module-specific make.  The advantage for the project is that
    the module-centric developer is reminded at relevant times
    (and only relevant times) that their work has wider
    ramifications.

    Consistently using C include files which contain accurate
    interface definitions (including function prototypes) will
    produce compilation errors in many of the cases which would
    otherwise result in a defective product.  By doing
    whole-project builds, developers discover such errors very
    early in the development process, and can fix the problems
    when they are least expensive.

    44..66..  YYoouu''llll RRuunn OOuutt OOff MMeemmoorryy

    This is the most interesting response.  Once long ago, on a
    CPU far, far away, it may even have been true.  When Feldman
    [feld78] first wrote make it was 1978 and he was using a
    PDP-11.  Unix processes were limited to 64KB of data.

    On such a computer, the above project with its 3000 files
    detailed in the whole-project Makefile would probably not
    allow the DAG and rule actions to fit in memory.

    But we are not using PDP-11s any more.  The physical memory
    of modern computers exceeds 10MB for small computers, and
    virtual memory often exceeds 100MB.  It is going to take a
    project with hundreds of thousands of source files to
    exhaust virtual memory on a small modern computer.  As the
    1000 source file example takes less than 100KB of memory
    (try it, I did), it is unlikely that any project manageable
    in a single directory tree on a single disk will exhaust
    your computer's memory.

    44..77..  WWhhyy NNoott FFiixx TThhee DDAAGG IInn TThhee MMoodduulleess??

    It was shown in the above discussion that the problem with
    recursive make is that the DAGs are incomplete.  It follows
    that by adding the missing portions, the problems would be
    resolved without abandoning the existing recursive make
    investment.  In practice, however, this approach has
    problems of its own:

    +o The developer needs to remember to do this.  The problems
      will not affect the developer of the module; they will
      affect the developers of other modules.  There is no
      trigger to remind the developer to do this, other than the
      ire of fellow developers.

    +o It  is  difficult to work out where the changes need to be
      made.  Potentially every Makefile in  the  entire  project
      needs  to  be  examined  for  possible  modifications.  Of
      course, you can wait for your fellow  developers  to  find
      them for you.

    +o The include dependencies will be recomputed unnecessarily,
      or will be interpreted incorrectly.  This is because make
      is string based, and thus "." and "../ant" are two
      different places, even when you are in the ant directory.
      This is of concern when include dependencies are
      automatically generated, as they are for all large
      projects.

    By making sure that each Makefile is complete, you arrive at
    the  point  where  the Makefile for at least one module con-
    tains the equivalent of  a  whole-project  Makefile  (recall
    that these modules form a single project and are thus inter-
    connected), and there is no need for the recursion any more.

    55..  EEffffiicciieenntt MMaakkeeffiilleess

    The central theme of this paper is the semantic side-effects
    of artificially separating a Makefile into the pieces
    necessary to perform a recursive make.  However, once you
    have a large number of Makefiles, the speed at which make
    can interpret this multitude of files also becomes an issue.

    Builds can take "forever" for both these reasons: the
    traditional fixes for the separated DAG may be building too
    much and your Makefile may be inefficient.

    55..11..  DDeeffeerrrreedd EEvvaalluuaattiioonn

    The text in a Makefile must somehow be read from a text file
    and understood by make so that the DAG can be constructed,
    and the specified actions attached to the edges.  This is
    all kept in memory.

    The input language for Makefiles is deceptively simple.  A
    crucial distinction that often escapes both novices and
    experts alike is that make's input language is text based,
    as opposed to token based, as is the case for C or AWK.
    Make does the very least possible to process input lines and
    stash them away in memory.

    As an example of this, consider the following assignment:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    +--------------------------+
    Humans read this as the variable OBJ being assigned two
    filenames "main.o" and "parse.o".  But make does not see it
    that way.  Instead OBJ is assigned the single string
    "main.o parse.o".  It gets worse:

                    +--------------------------+
                    |SRC = main.c parse.c      |
                    |OBJ = $(SRC:.c=.o)        |
                    +--------------------------+
    In this case humans expect make to assign two filenames to
    OBJ, but make actually assigns the string "$(SRC:.c=.o)".
    This is because it is a macro language with deferred
    evaluation, as opposed to one with variables and immediate
    evaluation.

    If this does not seem too problematic, consider the  follow-
    ing Makefile:

                   +-----------------------------+
                   |SRC = $(shell echo 'Ouch!' \ |
                   |  1>&2 ; echo *.[cy])        |
                   |OBJ = \                      |
                   |  $(patsubst %.c,%.o,\       |
                   |    $(filter %.c,$(SRC))) \  |
                   |  $(patsubst %.y,%.o,\       |
                   |    $(filter %.y,$(SRC)))    |
                   |test: $(OBJ)                 |
                   |  $(CC) -o $@ $(OBJ)         |
                   +-----------------------------+
    How many times will the shell command be executed?  Ouch!
    It will be executed twice just to construct the DAG, and a
    further two times if the rule needs to be executed.

    If this shell command does anything complex or time
    consuming (and it usually does) it will take four times
    longer than you thought.

    But  it  is  worth looking at the other portions of that OBJ
    macro.  Each time it is named, a huge amount  of  processing
    is performed:

    +o The argument to shell is a single string (all built-in
      functions take a single string argument).  The string is
      executed in a sub-shell, and the standard output of this
      command is read back in, translating newlines into spaces.
      The result is a single string.

    +o The argument to filter is a single string.  This argument
      is broken into two strings at the first comma.  These two
      strings are then each broken into sub-strings separated by
      spaces.  The first set are the patterns, the second set
      are the filenames.  Then, for each of the pattern
      sub-strings, if a filename sub-string matches it, that
      filename is included in the output.  Once all of the
      output has been found, it is re-assembled into a single
      space-separated string.

    +o The argument to patsubst is a single string.  This
      argument is broken into three strings at the first and
      second commas.  The third string is then broken into
      sub-strings separated by spaces; these are the filenames.
      Then each filename which matches the first string is
      substituted according to the second string.  If a filename
      does not match, it is passed through unchanged.  Once all
      of the output has been generated, it is re-assembled into
      a single space-separated string.
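
    To make the amount of string handling concrete, here is a
    rough Python model of filter and patsubst that follows the
    description above: one string in, split, match each word,
    rejoin.  It assumes a single % per pattern, no escaping and
    no whitespace trimming, so it is a sketch rather than GNU
    Make's implementation.  The function names are hypothetical.

```python
def match_stem(pattern, word):
    """Return the part matched by '%' if word fits pattern, else None."""
    if "%" not in pattern:
        return "" if word == pattern else None
    prefix, suffix = pattern.split("%", 1)
    if (len(word) >= len(prefix) + len(suffix)
            and word.startswith(prefix) and word.endswith(suffix)):
        return word[len(prefix):len(word) - len(suffix)]
    return None


def make_filter(arg):
    # One string in; split at the first comma into patterns and names.
    patterns, text = arg.split(",", 1)
    pats = patterns.split()
    kept = [w for w in text.split()
            if any(match_stem(p, w) is not None for p in pats)]
    return " ".join(kept)  # re-assembled into one string


def make_patsubst(arg):
    # One string in; split at the first and second commas.
    pattern, replacement, text = arg.split(",", 2)
    out = []
    for w in text.split():
        stem = match_stem(pattern, w)
        if stem is None:
            out.append(w)  # no match: passed through unchanged
        else:
            out.append(replacement.replace("%", stem, 1))
    return " ".join(out)  # re-assembled into one string


src = "main.c parse.y"
obj = (make_patsubst("%.c,%.o," + make_filter("%.c," + src)) + " " +
       make_patsubst("%.y,%.o," + make_filter("%.y," + src)))
print(obj)  # main.o parse.o
```

    Every call rebuilds its word lists from scratch.  A deferred
    macro repeats all of that work on each reference.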

    Notice how many times those strings are disassembled and
    re-assembled.  Notice how many ways that happens.  This is
    slow.  The example here names just two files but consider
    how inefficient this would be for 1000 files.  Doing it four
    times becomes decidedly inefficient.

    If you are using a dumb make that has no substitutions and
    no built-in functions, this cannot bite you.  But a modern
    make has lots of built-in functions and can even invoke
    shell commands on-the-fly.  The semantics of make's text
    manipulation are such that string manipulation in make is
    very CPU intensive, compared to performing the same string
    manipulations in C or AWK.

    55..22..  IImmmmeeddiiaattee EEvvaalluuaattiioonn

    Modern make implementations have an immediate evaluation
    ":=" assignment operator.  The above example can be
    rewritten as

                  +------------------------------+
                  |SRC := $(shell echo 'Ouch!' \ |
                  |  1>&2 ; echo *.[cy])         |
                  |OBJ := \                      |
                  |  $(patsubst %.c,%.o,\        |
                  |    $(filter %.c,$(SRC))) \   |
                  |  $(patsubst %.y,%.o,\        |
                  |    $(filter %.y,$(SRC)))     |
                  |test: $(OBJ)                  |
                  |  $(CC) -o $@ $(OBJ)          |
                  +------------------------------+
    Note that both assignments are immediate evaluation
    assignments.  If the first were not, the shell command would
    always be executed twice.  If the second were not, the
    expensive substitutions would be performed at least twice
    and possibly four times.

    As  a rule of thumb: always use immediate evaluation assign-
    ment unless you knowingly want deferred evaluation.
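
    In a Python model (a sketch, not make itself), ":=" amounts
    to computing each right-hand side once, when the assignment
    is read, and storing a plain string.  The names fake_shell
    and to_objects and the file list are hypothetical.  The
    point is only how often the stand-in for the shell command
    runs.

```python
# Counts how often the stand-in for $(shell ...) runs.
calls = {"shell": 0}


def fake_shell():
    # Stands in for $(shell echo 'Ouch!' 1>&2 ; echo *.[cy]).
    calls["shell"] += 1
    return "main.c parse.y"  # hypothetical directory contents


def to_objects(src):
    # The filter/patsubst work, done once on an already-expanded string.
    words = src.split()
    return " ".join([w[:-2] + ".o" for w in words if w.endswith(".c")] +
                    [w[:-2] + ".o" for w in words if w.endswith(".y")])


# ':=' evaluates the right-hand side at assignment time...
SRC = fake_shell()
OBJ = to_objects(SRC)

# ...so later references just reuse the stored strings.
prereqs = OBJ                 # test: $(OBJ)
recipe = "cc -o test " + OBJ  # $(CC) -o $@ $(OBJ)

print(OBJ, calls["shell"])  # main.o parse.o 1
```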

    55..33..  IInncclluuddee FFiilleess

    Many Makefiles perform the same text processing (the filters
    above, for example) for every single make run, but the
    results of the processing rarely change.  Wherever
    practical, it is more efficient to record the results of the
    text processing into a file, and have the Makefile include
    this file.
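
    One way to apply this is a small generator that writes the
    processed result as a make fragment for the Makefile to
    include.  The generator runs only when the source list
    changes.  The Python sketch below is an assumption-laden
    illustration, not something this paper prescribes.  The
    file name objects.mk and both function names are
    hypothetical.  Writing only when the content differs is an
    added refinement, so that an unchanged result keeps its old
    timestamp.

```python
import os
import tempfile


def objects_fragment(sources):
    """Render the filter/patsubst result once, as a make fragment."""
    objs = [s.rsplit(".", 1)[0] + ".o" for s in sources
            if s.endswith((".c", ".y"))]
    return "OBJ := " + " ".join(objs) + "\n"


def write_if_changed(path, text):
    """Write text to path unless it already holds exactly that text."""
    try:
        with open(path) as f:
            if f.read() == text:
                return False  # unchanged: keep the old timestamp
    except FileNotFoundError:
        pass
    with open(path, "w") as f:
        f.write(text)
    return True


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "objects.mk")  # hypothetical include file
    first = write_if_changed(path, objects_fragment(["main.c", "parse.y"]))
    second = write_if_changed(path, objects_fragment(["main.c", "parse.y"]))
    with open(path) as f:
        content = f.read()

print(first, second, content, end="")  # True False OBJ := main.o parse.o
```

    The Makefile would then include that file in place of the
    filter expressions.  When to re-run the generator is left
    open here.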

    55..44..  DDeeppeennddeenncciieess

    Don't  be  miserly  with include files.  They are relatively
    inexpensive to read, compared to $(shell),  so  more  rather
    than less doesn't greatly affect efficiency.

    As an example of this, it is first necessary to describe a
    useful feature of GNU Make: once a Makefile has been read
    in, if any of its included files were out-of-date (or do not
    yet exist), they are re-built, and then make starts again,
    which has the result that make is now working with
    up-to-date include files.  This feature can be exploited to
    obtain automatic include file dependency tracking for C
    sources.  The obvious way to implement it, however, has a
    subtle flaw.

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    The depend.sh script prints lines of the form

         _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h ...

    The simplest implementation of this is to use GCC, but you
    will need an equivalent awk script or C program if you have
    a different compiler:

                    +--------------------------+
                    |#!/bin/sh                 |
                    |gcc -MM -MG "$@"          |
                    +--------------------------+
    This  implementation  of tracking C include dependencies has
    several serious flaws, but the one most commonly  discovered
    is  that  the  dependencies file does not, itself, depend on
    the C include files.  That is, it is not re-built if one  of
    the  include  files  changes.   There  is no edge in the DAG
    joining the dependencies vertex to any of the  include  file
    vertices.   If  an  include  file changes to include another
    file (a nested include), the dependencies will not be recal-
    culated,  and potentially the C file will not be recompiled,
    and thus the program will not be re-built correctly.

    A classic build-too-little problem, caused by giving make
    inadequate information, and thus causing it to build an
    inadequate DAG and reach the wrong conclusion.

    The traditional solution is to build too much:

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |.PHONY: dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    Now, even if the project is completely up-to-date, the
    dependencies will be re-built.  For a large project, this is
    very wasteful, and can be a major contributor to make taking
    "forever" to work out that nothing needs to be done.

    There is a second problem, and that is that if any one of
    the C files changes, all of the C files will be re-scanned
    for include dependencies.  This is as inefficient as having
    a Makefile which reads

                    +--------------------------+
                    |prog: $(SRC)              |
                    |  $(CC) -o $@ $(SRC)      |
                    +--------------------------+
    What is needed, in exact analogy to the C case, is  to  have
    an  intermediate form.  This is usually given a ".d" suffix.
    By exploiting the fact that more than one file may be  named
    in  an  include  line, there is no need to "link" all of the
    ".d" files together:

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include $(OBJ:.o=.d)      |
                    |%.d: %.c                  |
                    |  depend.sh $(CFLAGS) \   |
                    |    $*.c > $@             |
                    +--------------------------+

    This has one more thing to fix:  just  as  the  object  (.o)
    files  depend  on the source files and the include files, so
    do the dependency (.d) files.

         _f_i_l_e.d _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h

    This means tinkering with the depend.sh script again:

                +-----------------------------------+
                |#!/bin/sh                          |
                |gcc -MM -MG "$@" |                 |
                |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@' |
                +-----------------------------------+
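
    For readers less at home with sed, the same per-line rewrite
    can be written in Python.  This only illustrates what the
    sed expression does; depend.sh itself is what the Makefile
    runs.  The function name and sample input are hypothetical.

```python
import re

# Python equivalent of:  sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'
# re.MULTILINE makes ^ match at the start of every line, as sed does.
TARGET = re.compile(r"^(.*)\.o:", re.MULTILINE)


def add_d_target(gcc_mm_output):
    """Turn each 'file.o:' rule into 'file.d file.o:'."""
    return TARGET.sub(r"\1.d \1.o:", gcc_mm_output)


# Hypothetical gcc -MM output with a continuation line.
sample = "main.o: main.c parse.h \\\n  defs.h\n"
print(add_d_target(sample), end="")
# main.d main.o: main.c parse.h \
#   defs.h
```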

    This method of determining include file dependencies results
    in the Makefile including more files than the original
    method, but opening files is less expensive than rebuilding
    all of the dependencies every time.  Typically a developer
    will edit one or two files before re-building; this method
    will rebuild the exact dependency file affected (or more
    than one, if you edited an include file).  On balance, this
    will use less CPU, and less time.

    In the case of a build where nothing needs to be done, make
    will actually do nothing, and will work this out very
    quickly.

    However, the above technique assumes your project fits
    entirely within the one directory.  For large projects, this
    usually isn't the case.  This means tinkering with the
    depend.sh script again:

                +-----------------------------------+
                |#!/bin/sh                          |
                |DIR="$1"                           |
                |shift 1                            |
                |case "$DIR" in                     |
                |"" | ".")                          |
                |gcc -MM -MG "$@" |                 |
                |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@' |
                |;;                                 |
                |*)                                 |
                |gcc -MM -MG "$@" |                 |
                |sed -e "s@^\(.*\)\.o:@$DIR/\1.d \  |
                |$DIR/\1.o:@"                       |
                |;;                                 |
                |esac                               |
                +-----------------------------------+
    And the rule needs to change, too, to pass the directory  as
    the first argument, as the script expects.

                    +---------------------------+
                    |%.d: %.c                   |
                    |  depend.sh `dirname $*` \ |
                    |    $(CFLAGS) $*.c > $@    |
                    +---------------------------+
    Note  that  the  .d  files will be relative to the top level
    directory.  Writing them so that they can be used  from  any
    level is possible, but beyond the scope of this paper.
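
    The directory argument is needed because gcc -MM names the
    rule's target after the source file's base name, without
    its directory.  So ant/main.c yields a target of main.o
    rather than ant/main.o.  The Python sketch below mirrors the
    two branches of the script under that assumption.  The name
    rewrite_targets and the sample output are hypothetical.  The
    shell's dirname prints "." for a name with no directory part
    (Python's posixpath.dirname returns ""), which is why both
    are treated alike.

```python
import posixpath
import re

TARGET = re.compile(r"^(.*)\.o:", re.MULTILINE)


def rewrite_targets(directory, gcc_mm_output):
    """Mirror the two case branches of the directory-aware depend.sh."""
    prefix = "" if directory in ("", ".") else directory + "/"

    def repl(m):
        stem = m.group(1)
        return f"{prefix}{stem}.d {prefix}{stem}.o:"

    return TARGET.sub(repl, gcc_mm_output)


# The rule passes `dirname $*`; for the stem "ant/main" that is "ant".
directory = posixpath.dirname("ant/main")

# Hypothetical gcc -MM output: the target has lost its directory.
sample = "main.o: ant/main.c ant/defs.h\n"
print(rewrite_targets(directory, sample), end="")
# ant/main.d ant/main.o: ant/main.c ant/defs.h
```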

    55..55..  MMuullttiipplliieerr

    All of the inefficiencies described in this section compound
    together.  If you do 100 Makefile interpretations, once for
    each module, checking 1000 source files can take a very long
    time if the interpretation requires complex processing,
    performs unnecessary work, or both.  A whole-project make,
    on the other hand, only needs to interpret a single
    Makefile.

    66..  PPrroojjeeccttss _v_e_r_s_u_s SSaanndd--bbooxxeess

    The  above discussion assumes that a project resides under a
    single directory tree, and this is often  the  ideal.   How-
    ever,  the realities of working with large software projects
    often lead to weird and wonderful  directory  structures  in
    order  to  have  developers working on different sections of
    the project without taking complete copies and thereby wast-
    ing precious disk space.

    It is possible to see the whole-project make proposed here
    as impractical, because it does not match the evolved
    methods of your development process.

    The whole-project make proposed here does have an effect on
    development methods: it can give you cleaner and simpler
    build environments for your developers.  By using make's
    VPATH feature, it is possible to copy only those files you
    need to edit into your private work area, often called a
    sand-box.

    The simplest explanation of what VPATH does is to make an
    analogy with the include file search path specified using
    -Ipath options to the C compiler.  This set of options
    describes where to look for files, just as VPATH tells make
    where to look for files.

    By using VPATH, it is possible to "stack" the sand-box on
    top of the project master source, so that files in the
    sand-box take precedence, but it is the union of all the
    files which make uses to perform the build.
        Master Source       Sand-Box         Combined View
          main.c              main.c           main.c
          parse.y             variable.c       parse.y
                                               variable.c
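
    The stacking can be modelled as an ordered search, much like
    -I.  Look in the sand-box first, then in the master source,
    and take the first copy found.  The Python sketch below
    builds a throw-away copy of the two trees in a temporary
    directory.  The name vpath_lookup, the directory names and
    the empty files are hypothetical.  The model ignores the
    finer points of make's VPATH semantics.

```python
import os
import tempfile


def vpath_lookup(name, search_dirs):
    """Return the first existing copy of name, searching dirs in order."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    master = os.path.join(root, "master")    # project master source
    sandbox = os.path.join(root, "sandbox")  # developer's work area
    for d, files in ((master, ["main.c", "parse.y"]),
                     (sandbox, ["main.c", "variable.c"])):
        os.makedirs(d)
        for f in files:
            open(os.path.join(d, f), "w").close()  # empty placeholder

    search = [sandbox, master]  # sand-box first, so its copies win
    names = sorted(set(os.listdir(sandbox)) | set(os.listdir(master)))
    combined = {n: os.path.basename(os.path.dirname(vpath_lookup(n, search)))
                for n in names}

print(combined)
# {'main.c': 'sandbox', 'parse.y': 'master', 'variable.c': 'sandbox'}
```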


    In this environment, the sand-box has the same tree
    structure as the project master source.  This allows
    developers to safely change things across separate modules,
    e.g. if they are changing a module interface.  It also
    allows the sand-box to be physically separate, perhaps on a
    different disk, or under their home directory.  It also
    allows the project master source to be read-only, if you
    have (or would like) a rigorous check-in procedure.

    Note: in addition to adding a VPATH line to your development
    Makefile, you will also need to add -I options to the CFLAGS
    macro, so that the C compiler uses the same path as make
    does.  This is simply done with a 3-line Makefile in your
    work area: set a macro, set the VPATH, and then include the
    Makefile from the project master source.

    66..11..  VVPPAATTHH SSeemmaannttiiccss

    For the above discussion to apply, you need to use GNU Make
    3.76 or later.  For versions of GNU Make earlier than 3.76,
    you will need Paul Smith's VPATH+ patch.  This may be
    obtained from ftp://ftp.wellfleet.com/netman/psmith/gmake/.

    The POSIX semantics of VPATH are slightly brain-dead, so
    many other make implementations are too limited.  You may
    want to consider installing GNU Make.

    77..  TThhee BBiigg PPiiccttuurree

    This  section  brings  together all of the preceding discus-
    sion, and presents the example  project  with  its  separate
    modules,  but  with a whole-project Makefile.  The directory
    structure is changed little from the recursive case,  except
    that  the  deeper  Makefiles are replaced by module specific
    include files:
        Project
        |-- Makefile
        |-- ant
        |   |-- module.mk
        |   `-- main.c
        |-- bee
        |   |-- module.mk
        |   `-- parse.y
        `-- depend.sh


    The Makefile looks like this:

                  +-------------------------------+
                  |MODULES := ant bee             |
                  |# look for include files in    |
                  |#   each of the modules        |
                  |CFLAGS += $(patsubst %,-I%,\   |
                  |  $(MODULES))                  |
                  |# extra libraries if required  |
                  |LIBS :=                        |
                  |# each module will add to this |
                  |SRC :=                         |
                  |# include the description for  |
                  |#   each module                |
                  |include $(patsubst %,\         |
                  |    %/module.mk,$(MODULES))    |
                  |# determine the object files   |
                  |OBJ :=                    \    |
                  |  $(patsubst %.c,%.o,     \    |
                  |    $(filter %.c,$(SRC))) \    |
                  |  $(patsubst %.y,%.o,     \    |
                  |    $(filter %.y,$(SRC)))      |
                  |# link the program             |
                  |prog: $(OBJ)                   |
                  |  $(CC) -o $@ $(OBJ) $(LIBS)   |
                  |# include the C include        |
                  |#   dependencies               |
                  |include $(OBJ:.o=.d)           |
                  |# calculate C include          |
                  |#   dependencies               |
                  |%.d: %.c                       |
                  |  depend.sh `dirname $*.c` \   |
                  |    $(CFLAGS) $*.c > $@        |
                  +-------------------------------+
    This looks absurdly large, but it has all of the common
    elements in the one place, so that each of the modules' make
    includes may be small.

    The ant/module.mk file looks like:

                    +--------------------------+
                    |SRC += ant/main.c         |
                    +--------------------------+
    The bee/module.mk file looks like:

                    +--------------------------+
                    |SRC += bee/parse.y        |
                    |LIBS += -ly               |
                    |%.c %.h: %.y              |
                    |  $(YACC) -d $*.y         |
                    |  mv y.tab.c $*.c         |
                    |  mv y.tab.h $*.h         |
                    +--------------------------+

    Notice that the built-in rules are used for the C files, but
    we  need  special  yacc  processing  to get the generated .h
    file.
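
    The variable values the top-level Makefile ends up with,
    once every module.mk has been included, can be traced by
    hand.  The Python sketch below does that bookkeeping.  It
    only models the variable arithmetic (no rules, no
    timestamps).  MODULE_MK is a hypothetical stand-in for the
    two module.mk files, and the key d_files is an invented name
    for the list passed to the final include line.

```python
# Hypothetical stand-ins for ant/module.mk and bee/module.mk:
# each module only appends to SRC (and, for bee, to LIBS).
MODULE_MK = {
    "ant": {"SRC": ["ant/main.c"], "LIBS": []},
    "bee": {"SRC": ["bee/parse.y"], "LIBS": ["-ly"]},
}


def whole_project(modules):
    """Trace the variables the top-level Makefile ends up with."""
    cflags = ["-I" + m for m in modules]   # $(patsubst %,-I%,$(MODULES))
    src, libs = [], []                     # SRC :=  and  LIBS :=
    for m in modules:                      # include .../module.mk
        src += MODULE_MK[m]["SRC"]         # SRC += ...
        libs += MODULE_MK[m]["LIBS"]       # LIBS += ...
    obj = ([s[:-2] + ".o" for s in src if s.endswith(".c")] +
           [s[:-2] + ".o" for s in src if s.endswith(".y")])
    d_files = [o[:-2] + ".d" for o in obj]  # include $(OBJ:.o=.d)
    return {"CFLAGS": cflags, "SRC": src, "LIBS": libs,
            "OBJ": obj, "d_files": d_files}


for name, value in whole_project(["ant", "bee"]).items():
    print(name, " ".join(value))
# CFLAGS -Iant -Ibee
# SRC ant/main.c bee/parse.y
# LIBS -ly
# OBJ ant/main.o bee/parse.o
# d_files ant/main.d bee/parse.d
```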

    The savings in this example look irrelevant, because the
    top-level Makefile is so large.  But consider if there were
    100 modules, each with only a few non-comment lines, and
    those specifically relevant to the module.  The savings soon
    add up to a total size often less than the recursive case,
    without loss of modularity.

    The equivalent DAG of the Makefile after all of the includes
    looks like this:
                              prog

                      main.o         parse.o
                      main.d         parse.d

               main.c        parse.h        parse.c

                             parse.y


    The vertices and edges for the include file dependency files
    are also present, as these are important for make to
    function correctly.

    77..11..  SSiiddee EEffffeeccttss

    There are a couple of desirable side-effects of using a sin-
    gle Makefile.

    +o  The  GNU  Make -j option, for parallel builds, works even
    better than before.  It can find even more unrelated  things
    to do at once, and no longer has some subtle problems.

    +o The general make -k option, to continue as far as possible
    even in the face of errors, works even better  than  before.
    It can find even more things to continue with.

    88..  LLiitteerraattuurree SSuurrvveeyy

    How can it be possible that we have been misusing make for
    20 years?  How can it be possible that behavior previously
    ascribed to make's limitations is in fact a result of
    misusing it?

    The author only started thinking about the ideas presented
    in this paper when faced with a number of ugly build
    problems on utterly different projects, but with common
    symptoms.  By stepping back from the individual projects,
    and closely examining the thing they had in common, make, it
    became possible to see the larger pattern.  Most of us are
    so caught up in the minutiae of just getting the rotten
    build to work that we don't have time to spare for the big
    picture.  Especially when the item in question "obviously"
    works, and has done so continuously for the last 20 years.

    It is interesting that the problems of recursive make are
    rarely mentioned in the very books Unix programmers rely on
    for accurate, practical advice.

    88..11..  TThhee OOrriiggiinnaall PPaappeerr

    The original make paper [feld78] contains no reference to
    recursive make, let alone any discussion as to the relative
    merits of whole-project make over recursive make.

    It is hardly surprising that the original paper did not
    discuss recursive make; Unix projects at the time usually
    did fit into a single directory.

    It  may  be this which set the "one Makefile in every direc-
    tory" concept so firmly in the collective  Unix  development
    mind-set.

    88..22..  GGNNUU MMaakkee

    The GNU Make manual [stal93] contains several pages of
    material concerning recursive make; however, its discussion
    of the merits or otherwise of the technique is limited to
    the brief statement that

         "This technique is useful when you want  to  sepa-
         rate makefiles for various subsystems that compose
         a larger system."

    No mention is made of the problems you may encounter.

    88..33..  MMaannaaggiinngg PPrroojjeeccttss wwiitthh MMaakkee

    The Nutshell Make book [talb91] specifically promotes
    recursive make over whole-project make because

         "The cleanest way to build is to put a separate
         description file in each directory, and tie them
         together through a master description file that
         invokes make recursively.  While cumbersome, the
         technique is easier to maintain than a single,
         enormous file that covers multiple directories."
         (p. 65)

    This is despite the book's advice only two paragraphs
    earlier that

         "make is happiest when you keep all your files in
         a single directory." (p. 64)

    Yet the book fails to discuss the contradiction in these two
    statements, and goes on to describe one of the traditional
    ways of treating the symptoms of incomplete DAGs caused by
    recursive make.



    The book may give us a clue as to why recursive make has
    been  used  in  this  way for so many years.  Notice how the
    above quotes confuse the concept of  a  directory  with  the
    concept of a Makefile.

    This  paper suggests a simple change to the mind-set: direc-
    tory trees, however deep, are places to store  files;  Make-
    files are places to describe the relationships between those
    files, however many.

    88..44..  BBSSDD MMaakkee

    The tutorial for BSD Make [debo88] says nothing at all about
    recursive make, but it is one of the few texts that actually
    describe, however briefly, the relationship between a
    Makefile and a DAG (p. 30).  There is also a wonderful quote

         "If make doesn't do what you expect it to, it's a
         good chance the makefile is wrong." (p. 10)

    This is a pithy summary of the thesis of this paper.

    99..  SSuummmmaarryy

    This paper presents a number of related problems, and
    demonstrates that they are not inherent limitations of make,
    as is commonly believed, but are the result of presenting
    incorrect information to make.  This is the ancient Garbage
    In, Garbage Out principle at work.  Because make can only
    operate correctly with a complete DAG, the error is in
    segmenting the Makefile into incomplete pieces.

    This requires a shift in thinking: directory trees are
    simply a place to hold files; Makefiles are a place to
    remember relationships between files.  Do not confuse the
    two, because it is as important to accurately represent the
    relationships between files in different directories as it
    is to represent the relationships between files in the same
    directory.  This implies that there should be exactly one
    Makefile for a project, but the magnitude of the description
    can be managed by using a make include file in each
    directory to describe the subset of the project files in
    that directory.  This is just as modular as having a
    Makefile in each directory.
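
    As an illustration only, here is a minimal sketch of such a
    single top-level Makefile, written in GNU make syntax.  The
    module directories ant and bee, the program name prog and
    the include-file name module.mk are hypothetical.  Each
    module.mk describes only the files in its own directory,
    yet make reads one DAG covering the whole project (header
    dependencies are omitted here for brevity).

```make
# Top-level Makefile: the only Makefile in the project.
# ant and bee are hypothetical module directories.
MODULES := ant bee

# Each module's include file appends its own sources to SRC.
SRC :=
include $(patsubst %,%/module.mk,$(MODULES))

# Object files are built beside their sources (e.g. ant/main.o)
# by make's built-in %.o: %.c rule.
OBJ := $(patsubst %.c,%.o,$(filter %.c,$(SRC)))

# Let sources in any module include headers from any module.
CFLAGS += $(patsubst %,-I%,$(MODULES))

prog: $(OBJ)
	$(CC) -o $@ $(OBJ)

# ant/module.mk (hypothetical) describes only the files in ant/:
#     SRC += ant/main.c
# bee/module.mk (hypothetical) describes only the files in bee/:
#     SRC += bee/parse.c bee/codegen.c
```

    Running make once from the top-level directory checks every
    object against its source, whichever directory it lives in,
    before deciding whether prog must be re-linked.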

    This paper has shown how a project build and a development
    build can be equally brief for a whole-project make.  Given
    this parity of time, the gains provided by accurate
    dependencies mean that this process will, in fact, be faster
    than the recursive make case, and more accurate.







    99..11..  IInntteerr--ddeeppeennddeenntt PPrroojjeeccttss

    In organizations with a strong culture of re-use,
    implementing whole-project make can present challenges.
    Rising to these challenges, however, may require looking at
    the bigger picture.

    * A module may be shared between two programs because the
      programs are closely related.  Clearly, the two programs
      plus the shared module belong to the same project (the
      module may be self-contained, but the programs are not).
      The dependencies must be explicitly stated, and changes to
      the module must result in both programs being recompiled
      and re-linked as appropriate.  Combining them all into a
      single project means that whole-project make can
      accomplish this.

    * A module may be shared between two projects because they
      must inter-operate.  Possibly your project is bigger than
      your current directory structure implies.  The
      dependencies must be explicitly stated, and changes to the
      module must result in both projects being recompiled and
      re-linked as appropriate.  Combining them all into a single
      project means that whole-project make can accomplish this.

    * It is normal to omit the edges between your project and
      the operating system or other installed third-party tools.
      This is so normal that these edges are ignored in the
      Makefiles in this paper, and in the built-in rules of make
      programs.
      Modules shared between your projects may fall into a
      similar category: if they change, you will deliberately
      rebuild to include their changes, or quietly include their
      changes whenever the next build happens.  In either case,
      you do not explicitly state the dependencies, and
      whole-project make does not apply.

    * Re-use may be better served if the module is used as a
      template, and divergence between the two projects is seen
      as normal.  Duplicating the module in each project allows
      the dependencies to be explicitly stated, but requires
      additional effort if the common portion needs maintenance.
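
    For the first of the cases above, a minimal sketch in GNU
    make syntax might look like the following.  The file names
    (client/main.c, server/main.c, shared/util.c, shared/util.h)
    are hypothetical.  Because both programs list the shared
    object as a prerequisite in the same Makefile, the edges
    from the shared module to each program are explicit.

```make
# One Makefile for two closely related programs and the module
# they share.  All file names here are hypothetical.
SHARED_OBJ := shared/util.o

# Let every source file #include "util.h" from shared/.
CFLAGS += -Ishared

.PHONY: all
all: client/client server/server

# Both programs list the shared object as a prerequisite, so a
# change to the shared module re-links both of them.
client/client: client/main.o $(SHARED_OBJ)
	$(CC) -o $@ $^

server/server: server/main.o $(SHARED_OBJ)
	$(CC) -o $@ $^

# Each object whose source includes the shared header depends on
# it, so editing util.h recompiles all three objects.
client/main.o server/main.o shared/util.o: shared/util.h
```

    With this description, touching shared/util.c should cause
    the next make to recompile only shared/util.o and then
    re-link both client/client and server/server.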

    How to structure dependencies in a strong re-use environment
    thus becomes an exercise in risk management.  What is the
    danger that omitting chunks of the DAG will harm your
    projects?  How vital is it to rebuild if a module changes?
    What are the consequences of not rebuilding automatically?
    How can you tell when a rebuild is necessary if the
    dependencies are not explicitly stated?  What are the
    consequences of forgetting to rebuild?





    99..22..  RReettuurrnn OOnn IInnvveessttmmeenntt

    Some of the techniques presented in this paper will improve
    the speed of your builds, even if you continue to use
    recursive make.  These are not the focus of this paper,
    merely a useful detour.

    The focus of this paper is that you will get more accurate
    builds of your project if you use whole-project make rather
    than recursive make.

    * The time for make to work out that nothing needs to be
      done will not be more, and will often be less.

    * The size and complexity of the total Makefile input will
      not be more, and will often be less.

    * The total Makefile input is no less modular than in the
      recursive case.

    * The difficulty of maintaining the total Makefile input
      will not be more, and will often be less.

    The disadvantages of using recursive make instead of
    whole-project make are often unmeasured.  How much time is
    spent figuring out why make did something unexpected?  How
    much time is spent discovering that make did something
    unexpected in the first place?  How much time is spent
    tinkering with the build process?  These activities are
    often thought of as "normal" development overheads.

    Building  your  project is a fundamental activity.  If it is
    performing poorly, so are development, debugging  and  test-
    ing.  Building your project needs to be so simple the newest
    recruit can do it immediately with only  a  single  page  of
    instructions.   Building  your project needs to be so simple
    that it rarely needs any development effort at all.  Is your
    build process this simple?



















    1100..  RReeffeerreenncceess


         ddeebboo8888:: Adam de Boor (1988).  _P_M_a_k_e _- _A _T_u_t_o_r_i_a_l.  Uni-
    versity of California, Berkeley

         ffeelldd7788:: Stuart I. Feldman (1978).  _M_a_k_e _- _A _P_r_o_g_r_a_m _f_o_r
    _M_a_i_n_t_a_i_n_i_n_g  _C_o_m_p_u_t_e_r _P_r_o_g_r_a_m_s.  Bell Laboratories Computing
    Science Technical Report 57

         ssttaall9933:: Richard M. Stallman and Roland McGrath  (1993).
    _G_N_U _M_a_k_e_: _A _P_r_o_g_r_a_m _f_o_r _D_i_r_e_c_t_i_n_g _R_e_c_o_m_p_i_l_a_t_i_o_n.  Free Soft-
    ware Foundation, Inc.

         ttaallbb9911:: Steve Talbott (1991).  _M_a_n_a_g_i_n_g  _P_r_o_j_e_c_t_s  _w_i_t_h
    _M_a_k_e_, _2_n_d _E_d.  O'Reilly & Associates, Inc.

    1111..  AAbboouutt tthhee AAuutthhoorr

    Peter  Miller  has worked for many years in the software R&D
    industry, principally on UNIX systems. In that time  he  has
    written tools such as Aegis (a software configuration
    management system) and Cook (yet another make-oid), both of
    which  are freely available on the Internet.  Supporting the
    use of these tools  at  many  Internet  sites  provided  the
    insights which led to this paper.

    Please  visit  http://www.canb.auug.org.au/~millerp/  if you
    would like to look at some of the author's free software.



























