
                   Getting Started with Linux-HA (heartbeat)

Intro

   Let me preface this document by saying most of this is _not_ original
   work.  My purpose in writing this document is simply to contribute in
   some way and possibly help those who REALLY get things done.  The
   "work" I am contributing is mostly compiling bits and pieces from
   other HA documents (such as Volker Wiegand's Hardware Installation
   Guide) into a document that can help novices get started on HA
   without pestering Alan (like I did!) and cut down on repeat questions
   on the mailing list.


Getting Started

   The  first  thing  you'll  need  is  two computers.  You need not have
   identical  hardware  in both machines (or amount of memory, etc.), but
   if  you did, it would make your life that much easier when a component
   fails.

   Now you have to decide on some of your implementation.  Your "cluster"
   is  established  via  a  "heartbeat" between the two computers (nodes)
   generated  by  the  software  package of the same name.  However, this
   heartbeat  needs  one  or  more  media  paths (serial via a null modem
   cable, ethernet via a crossover cable, etc.) between the nodes.

   At this point, you're actually ready to begin hardware-wise.  Of
   course, since you're looking into HA, you'll most likely want to
   avoid having a single point of failure.  In this case, that would be
   your null modem cable/serial port or network interface card
   (NIC)/crossover cable.  So, you need to decide whether you wish to
   add a second serial/null modem connection or a second NIC/crossover
   connection to each node.  See Appendix A for instructions on how to
   build a Cat-5 crossover cable.  My heartbeat path setup uses one
   serial port and one extra NIC because I only had one null modem
   cable, had an extra NIC on hand, and thought it was good to have two
   medium types for the heartbeats.

   Once your hardware is in order, you must install your OS and configure
   your  networking  (I  used  Red  Hat).   Assuming you have 2 NICs, one
   should  be  configured  for  your  "normal" network and the other as a
   private  network  between  your  clustered  nodes  (via  the crossover
   cable).  For an example, we will assume that our cluster will have the
   following addresses:

   Node 1 (linuxha1):   192.168.85.1  (normal 192x net)
                        10.0.0.1 (private 10x net for heartbeat)
   Node 2 (linuxha2):   192.168.85.2  (192x)
                        10.0.0.2  (10x)
   Note:   None of these addresses should be your "cluster address" - the
   address handled by heartbeat and failed over between nodes!

   Most *nix distributions make this easy during installation; however,
   if you are having any problems, refer to either the Ethernet HOWTO or
   the documentation for your distribution.  To check your
   configuration, type:

            ifconfig

   This  will  show your network interfaces and their configuration.  You
   can obtain your network routing information from "netstat -nr".

   If  it  looks  good,  make sure you can ping between both nodes on all
   interfaces.
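
   The ping check can be scripted.  The following is a minimal sketch,
   assuming the example addresses above; the check_peers function and
   the PING_CMD override are illustrative names and not part of
   heartbeat.

```shell
#!/bin/sh
# Ping each address once and report the result.
# PING_CMD is a hypothetical override hook so the probe can be swapped out.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

check_peers() {
    failed=0
    for addr in "$@"; do
        if $PING_CMD "$addr" >/dev/null 2>&1; then
            echo "ok   $addr"
        else
            echo "FAIL $addr"
            failed=1
        fi
    done
    return $failed
}

# On linuxha1 you would run (addresses from the example above):
#   check_peers 192.168.85.2 10.0.0.2
# On linuxha2:
#   check_peers 192.168.85.1 10.0.0.1
```

   The function returns non-zero if any address fails to answer, so it
   can also be used inside a larger pre-flight script.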

   Next,   if   you're  using  one,  you'll  need  to  test  your  serial
   connection.  On one node, which will be the receiver, type:
              cat </dev/ttyS0

   On the other node, type:
              echo hello >/dev/ttyS0

   You  should  see  the  text on the receiver node.  If it works, change
   their  roles  and  try  again.   If it doesn't, it may be as simple as
   having  the  wrong  device  file.   Volker's HA Hardware Guide and the
   Serial  HOWTO  are  two good resources for troubleshooting your serial
   connection.

Installing Heartbeat

   You  can  now  install the heartbeat package.  If you're reading this,
   you already have it, but in any case it's available at:

          [1]http://linux-ha.org/download

   There are binary RPMs at the website, or you can build heartbeat from
   source.  Grab the tarball (or install the source RPM) and untar it
   into your favorite source directory.  From the top of the source
   tree, type "./ConfigureMe configure", followed by "make" and
   "make install".  If you have problems installing the RPMs found at
   the website and want a way to make your own, there may be help in the
   [2]FAQ.

Configuring Heartbeat

   Configuring ha.cf
   There are three files you will need to configure before starting up
   heartbeat.  The first is ha.cf.  This will be placed in the /etc/ha.d
   directory that is created after installation.  It tells heartbeat what
   types of media paths to use and how to configure them.  The ha.cf in
   the source directory contains all the various options you can use;
   I'll go through it line by line...

   serial /dev/ttyS0
          Use  a  serial heartbeat - if you don't use a serial heartbeat,
          you  must  use  another  medium,  such  as  a  bcast (ethernet)
          heartbeat.  Replace /dev/ttyS0 with the appropriate device file
          for your required serial heartbeat.

   watchdog /dev/watchdog
          Optional.   The  watchdog  function  provides  a  way to have a
          system that is still minimally functioning, but not providing a
          heartbeat,  reboot  itself  after a minute of being sick.  This
          could  help  to avoid a scenario where the machine recovers its
          heartbeat  after being pronounced dead.  If that happened and a
          disk  mount  failed  over,  you could have two nodes mounting a
          disk  simultaneously.  If you wish to use this feature, then in
          addition  to  this  line,  you  will need to load the "softdog"
          kernel  module  and create the actual device file.  To do this,
          first  type  "insmod  softdog"  to  load the module. Then, type
          "grep  misc  /proc/devices"  and  note  the  number  it reports
          (should  be  10).   Next, type "cat /proc/misc | grep watchdog"
          and  note  that number (should be 130).  Now you can create the
          device  file  with  that info typing, "mknod /dev/watchdog c 10
          130".
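
          The two lookups can be combined so the numbers are not typed
          by hand.  This is a sketch only: watchdog_mknod_cmd is a
          hypothetical helper name, and it prints the mknod command
          rather than running it, so you can review it first.

```shell
#!/bin/sh
# Print the mknod command for /dev/watchdog, reading the major number
# from a /proc/devices-style file and the minor from a /proc/misc-style file.
watchdog_mknod_cmd() {
    devices="${1:-/proc/devices}"
    misc="${2:-/proc/misc}"
    major=$(awk '$2 == "misc" { print $1; exit }' "$devices")
    minor=$(awk '$2 == "watchdog" { print $1; exit }' "$misc")
    if [ -z "$major" ] || [ -z "$minor" ]; then
        echo "watchdog not registered; is the softdog module loaded?" >&2
        return 1
    fi
    echo "mknod /dev/watchdog c $major $minor"
}

# Typical use, as root, after "insmod softdog":
#   watchdog_mknod_cmd
```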

   bcast eth1
          Specifies  to use a broadcast heartbeat over the eth1 interface
          (replace with eth0, eth2, or whatever you use).

   keepalive 2
          Sets the time between heartbeats to 2 seconds.

   warntime 10
          Time  in  seconds  before issuing a "late heartbeat" warning in
          the logs.

   deadtime 30
          Node is pronounced dead after 30 seconds.

   initdead 120
          With  some configurations, the network takes some time to start
          working  after  a  reboot.     This is a separate "deadtime" to
          handle  that  case.   It  should  be  at least twice the normal
          deadtime.

   hopfudge 1
          Optional.   For  ring  topologies,  number  of  hops allowed in
          addition to the number of nodes in the cluster.

   baud 19200
          Speed at which to run the serial line (bps).

   udpport 694
          Use  port  number 694 for bcast or ucast communication. This is
          the default, and the official IANA registered port number.

   auto_failback on
          Required.  For those familiar with Tru64 Unix, heartbeat acts as
          if in "favored member" mode.  The master listed in the
          haresources file holds all the resources until a failover, at
          which time the slave takes over.  When auto_failback is set to
          on, the master takes everything back from the slave once it
          comes back online.  When set to off, the master node does not
          re-acquire cluster resources after a failover.  This option is
          similar to the obsolete nice_failback option.  If you want to
          upgrade from a cluster which had nice_failback set off to this
          or a later version, special considerations apply if you want
          to avoid requiring a flash cut.  Please see the [3]FAQ for
          details on how to deal with this situation.

   node linuxha1.linux-ha.org
          Mandatory.   Hostname  of  machine  in  cluster as described by
          `uname -n`.

   node linuxha2.linux-ha.org
          Mandatory.   Hostname  of  machine  in  cluster as described by
          `uname -n`.

   respawn  userid  cmd
          Optional:  Lists a command to be spawned and monitored.  E.g.,
          to spawn ccm daemons the following line has to be added:
                  respawn hacluster /usr/lib/heartbeat/ccm
          This tells heartbeat to spawn the command with the credentials
          of userid (hacluster, in this example) and to monitor the
          health of the process, respawning it if it dies.  For ipfail,
          the line would be:
                    respawn hacluster /usr/lib/heartbeat/ipfail
          NOTE: If the process dies with exit code 100, the process is
          not respawned.

   ping    ping1.linux-ha.org  ping2.linux-ha.org ....
          Optional:  Specify  ping nodes.  These nodes are not considered
          as cluster nodes.  They are used to check  network connectivity
          for modules like ipfail.

   ping_group    name  ping1.linux-ha.org  ping2.linux-ha.org ....
          Optional: Specify a group of ping nodes.  These are similar to
          ping nodes, but if any node in a group is available then the
          group is considered available.  The group name can be any
          string and is used to uniquely identify the group.  Each group
          must appear on a separate line.  Like ping nodes, a group is
          not considered to be a cluster node; it is used to check
          network connectivity for modules like ipfail.
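
   Putting the core directives together, a minimal ha.cf for the
   two-node example might look like the file written below.  This is a
   sketch assembled only from the values shown above; the output path
   ha.cf.sample is an assumption for illustration, and the optional
   watchdog, respawn and ping lines are left out.

```shell
#!/bin/sh
# Write a sample ha.cf built from the directives described above.
# OUT is an illustrative path; review the file before copying it
# to /etc/ha.d/ha.cf on both nodes.
OUT="${OUT:-./ha.cf.sample}"

cat > "$OUT" <<'EOF'
baud 19200
serial /dev/ttyS0
udpport 694
bcast eth1
keepalive 2
warntime 10
deadtime 30
initdead 120
auto_failback on
node linuxha1.linux-ha.org
node linuxha2.linux-ha.org
EOF

echo "wrote $OUT"
```

   Remember that the node names must match what "uname -n" prints on
   each machine.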

   Configuring haresources
   Once  you've got your ha.cf set up, you need to configure haresources.
   This  file  specifies the services for the cluster and who the default
   owner is.
   Note:  This file must be the same on both nodes!

   For our example, we'll assume the high availability services are
   Apache and Samba.  The IP for the cluster is mandatory, and you must
   not configure the cluster IP outside of the haresources file!  The
   haresources file will need one line:
                  linuxha1.linux-ha.org 192.168.85.3 httpd smb

   This line dictates that on startup linuxha1 serves the IP
   192.168.85.3 and starts Apache and Samba as well.
   On shutdown, heartbeat will first stop smb, then Apache, then give up
   the IP.  This assumes that the command "uname -n" prints
   "linuxha1.linux-ha.org"; yours may well produce "linuxha1", and if it
   does, use that instead!

   Note:  httpd and smb are the names of the startup scripts for Apache
   and Samba, respectively.  Heartbeat will look for startup scripts of
   the same name in the following paths:
       /etc/ha.d/resource.d
       /etc/rc.d/init.d

   These scripts must start services via "scriptname start" and stop them
   via "scriptname stop".
   So  you  can  use  any  services  as long as they conform to the above
   standard.

   Should you need to pass arguments to a custom script, the format would
   be:
                scriptname::argument

   So, if we added a service "maid" which needed the argument "vacuum",
   our haresources line would change to the following:
                linuxha1 192.168.85.3 httpd smb maid::vacuum
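
   To make the start/stop convention concrete, here is a sketch of what
   a "maid" resource script could look like.  Everything in it is
   hypothetical (the state file, the status action, the messages); the
   only behaviour taken from this document is that the argument comes
   first and the action ("start" or "stop") comes last.

```shell
#!/bin/sh
# Hypothetical resource script "maid".  For the haresources entry
# maid::vacuum it would be invoked as "maid vacuum start" and
# "maid vacuum stop".
STATE_FILE="${MAID_STATE_FILE:-/tmp/maid.state}"

maid() {
    task="$1"      # argument from haresources, e.g. "vacuum"
    action="$2"    # start | stop | status
    case "$action" in
        start)  echo "$task" > "$STATE_FILE" ;;
        stop)   rm -f "$STATE_FILE" ;;
        status) if [ -f "$STATE_FILE" ]; then echo running; else echo stopped; fi ;;
        *)      echo "usage: maid <task> {start|stop|status}" >&2; return 1 ;;
    esac
}

# When saved as /etc/ha.d/resource.d/maid, end the file with:
#   maid "$@"
```

   A real script would start and stop an actual service where this one
   only records a state file.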

   This  brings us to some added flexibility with the service IP address.
   We  are  actually  using  a shorthand notation above.  The actual line
   could have read (we've canned the maid):
                linuxha1 IPaddr::192.168.85.3 httpd smb

   Where  IPaddr  is  the name of our service script, taking the argument
   192.168.85.3.    Sure   enough,   if   you   look   in  the  directory
   /etc/ha.d/resource.d,  you  will  find  a  script called IPaddr.  This
   script  will  also  allow  you  to  manipulate  the netmask, broadcast
   address  and  base  interface of this IP service.  To specify a subnet
   with  32  addresses,  you could define the service as (leaving off the
   IPaddr because we can!):
                linuxha1 192.168.85.3/27 httpd smb

   This  sets  the  IP  service  address  to 192.168.85.3, the netmask to
   255.255.255.224   and   the   broadcast   address   would  default  to
   192.168.85.31  (which is the highest address on the subnet).  The last
   parameter  you  can  set  is  the  broadcast address.  To override the
   default  and set it to 192.168.85.16, your entry would read:
                linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
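
   If you want to double-check the netmask and default broadcast that
   a prefix length implies, the arithmetic can be done in the shell.
   This is an illustrative sketch (the function names are made up and
   it is not how the IPaddr script itself is implemented); it assumes
   a shell with 64-bit arithmetic, which current sh and bash have.

```shell
#!/bin/sh
# Convert a dotted quad to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back to a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the netmask and the highest address (default broadcast)
# for an ADDRESS/PREFIX pair.
subnet_info() {
    addr=${1%/*}
    prefix=${1#*/}
    ip=$(ip_to_int "$addr")
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    bcast=$(( ip | (~mask & 0xFFFFFFFF) ))
    echo "netmask $(int_to_ip "$mask") broadcast $(int_to_ip "$bcast")"
}

subnet_info 192.168.85.3/27
# prints: netmask 255.255.255.224 broadcast 192.168.85.31
```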

   You  may  be  wondering whether any of the above is necessary for you.
   It  depends.   If you've properly established a net route (independent
   of  heartbeat)  for the service's IP address, with the correct netmask
   and  broadcast address, then no, it's not necessary for you.  However,
   this  case  won't fit everybody and that's why the option's there!  In
   addition,  you may have more than one possible interface that could be
   used for the service IP.  Read on to see how heartbeat treats this...

   Once  you  straighten  out  your  haresources  file,  copy  ha.cf  and
   haresources to /etc/ha.d and you're ready to start!

   Configuring ipfail
   The ipfail plugin attempts to detect network failures and react
   intelligently, directing the cluster to fail over resources as
   necessary.  To accomplish this, it uses ping nodes or ping groups,
   which work as "dumb" third parties in the cluster.  Provided both HA
   nodes can communicate with each other, ipfail can reliably detect
   when one of their network links has become unusable, and compensate.
   To configure ipfail, the following steps must be performed.
    1. Select good ping node candidates.
       It  is  essential  that good strategic ping nodes be selected. The
       better  your  choices,  the  stronger  your  HA  cluster  becomes.
       Choosing solid network devices like switches and routers is a good
       idea.  Do  not choose either of the members of the HA cluster. Nor
       should  you  select someone's workstation. It is also important to
       select  ping nodes that reflect the connectivity of your HA nodes.
       If  you  wish to monitor the connectivity of two interfaces, it is
       wise  to  select a ping node for each interface, that is reachable
       exclusively from said interface. Consult [4]ipfail-diagram.pdf for
       a graphical representation of this idea.
    2. Set auto_failback to on or off.
       ipfail will only operate if heartbeat's auto_failback option has
       been set to something other than legacy.  In ha.cf, set the
       auto_failback option to "on" or "off" like so:

     auto_failback on
       or

     auto_failback off
    3. Configure your ha.cf to start ipfail.
       Add  a  line  like  the  following to ha.cf (assuming your compile
       PREFIX is /usr)

     respawn hacluster /usr/lib/heartbeat/ipfail
    4. Add the ping nodes to ha.cf.
       The  ping  nodes  can be added to the cluster by using a line like
       the following:

     ping pnode1 pnode2 pnodeN
       Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of
       your ping nodes.

   Ensure  that the above configuration directives are added to the ha.cf
   on both members of the cluster, and that they are identical.

     NOTE:  You will want to check on the availability of the ping nodes
     prior  to  using  them. If you cannot ping them from both of the HA
     nodes, they are useless.
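
   A quick way to confirm that steps 2 through 4 made it into ha.cf is
   to grep for the three directives.  The helper below is a sketch
   under the assumptions stated in its comments; check_ipfail_conf is
   a hypothetical name, and it only checks that the lines exist, not
   that the ping nodes are good choices.

```shell
#!/bin/sh
# Report which ipfail-related directives are missing from an ha.cf file.
# Returns 0 when all three are present, 1 otherwise.
check_ipfail_conf() {
    conf="${1:-/etc/ha.d/ha.cf}"
    missing=0
    grep -Eq '^[[:space:]]*auto_failback[[:space:]]+(on|off)' "$conf" \
        || { echo "missing: auto_failback on|off"; missing=1; }
    grep -Eq '^[[:space:]]*respawn[[:space:]]+[^[:space:]]+[[:space:]]+.*/ipfail' "$conf" \
        || { echo "missing: respawn line for ipfail"; missing=1; }
    grep -Eq '^[[:space:]]*(ping|ping_group)[[:space:]]+[^[:space:]]' "$conf" \
        || { echo "missing: ping or ping_group line"; missing=1; }
    return $missing
}

# Run on both nodes:
#   check_ipfail_conf /etc/ha.d/ha.cf
```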

Selecting an Interface

   One important aspect of configuring the haresources file for a machine
   which  has  multiple  ethernet  interfaces  is  to  know how heartbeat
   selects  which interface will wind up supporting the service addresses
   that  are  configured  in  haresources.   After  all, no interface was
   specified in the haresources file.

   Heartbeat  decides  which  interface  will  be  used by looking at the
   routing  table.   It  tries  to select the lowest cost route to the IP
   address  to be taken over.  In the case of a tie, it chooses the first
   route  found.   For  most  configurations this means the default route
   will be least preferred.

   If you don't specify a netmask for the IP address in the haresources
   file, the netmask associated with the selected route will be used.
   Similarly, if an interface is not specified, then the virtual IP
   address will be added to the interface associated with the selected
   route.  If the broadcast address is omitted, then the highest address
   in the subnet is used.

   Configuring Authkeys

   The  third  file  to  configure  determines  your authentication keys.
   There  are three types of authentication methods available:  crc, md5,
   and  sha1.  "Well, which should I use?", you ask.  Since this document
   is called "Getting Started", we'll keep it simple......

   If  your  heartbeat  runs over a secure network, such as the crossover
   cable  in  our  example, you'll want to use crc.  This is the cheapest
   method  from a resources perspective.  If the network is insecure, but
   you're  either  not  very  paranoid  or concerned about minimizing CPU
   resources,  use  md5.   Finally,  if  you want the best authentication
   without  regard  for  CPU  resources,  use  sha1.  It's the hardest to
   crack.

   The format of the file is as follows:
   auth <number>
   <number> <authmethod> [<authkey>]

   So, for sha1, a sample /etc/ha.d/authkeys could be:
   auth 1
   1 sha1 key-for-sha1-any-text-you-want

   For  md5, you could use the same as the above, but replace "sha1" with
   "md5".

   Finally, for crc, a sample might be:
   auth 2
   2 crc

   Whatever index you put after the keyword auth must be found below in
   the keys listed in the file.  If you put "auth 4", then there must be
   a "4 signaturetype" line in the list below.

   Make sure the file's permissions are safe, such as 600.  And "any
   text you want" is not quite right: there's a limit to the number of
   characters you can use.
   That's it!
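
   If you would rather not invent a key by hand, something like the
   following sketch can generate one and set the permissions in one
   go.  The make_authkeys name and the 32-character hex key are
   assumptions made for illustration; the key length is assumed to be
   within the limit mentioned above.

```shell
#!/bin/sh
# Write an authkeys file with a random sha1 key and owner-only permissions.
make_authkeys() {
    out="${1:-./authkeys}"
    # 16 random bytes rendered as 32 hex characters
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Create the file with a restrictive umask, then pin the mode to 600.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}

# Example (as root), then copy the same file to the other node:
#   make_authkeys /etc/ha.d/authkeys
```

   The same file must be present on both nodes, so generate it once
   and copy it rather than running this on each machine.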

Starting and testing heartbeat

   From  Red  Hat,  or  other distributions which use /etc/init.d startup
   files, simply type /etc/init.d/heartbeat start on both nodes.  I would
   recommend  starting  on  the  system  master (in our example linuxha1)
   first.

   If you want heartbeat to run on startup, what to do depends on your
   distribution.  You may need to place links to the startup script in
   the appropriate init level directories, but the RPM versions will do
   this for you.  I have heartbeat start at its default sequential
   priority (75, which means it starts after services 74 and lower and
   before services with priority 76-99) and stop at its default
   sequential priority (05), and I only care about the 0 (halt),
   6 (reboot), 3 (text-only), and 5 (X) run levels.

   So,  if  I had to do it by hand, I'd need to type in the following (as
   root, of course):

       cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
       cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
       cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
       cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat

   The last time I ran Slackware, there was no /etc/rc.d/init.d directory
   (this may have changed by now), and to do the same thing I would have
   placed in /etc/rc.d/rc.local:
       /etc/ha.d/heartbeat start
   This assumes you copy the file ha.rc to /etc/ha.d/heartbeat.  If you
   can't find /etc/rc.d/init.d with your distribution and you're unsure
   of how processes start, you can use the rc.local method.  But you're
   on your own for shutdown; I just don't remember...

   Note:   If  you  use  the  watchdog  function, you'll need to load its
   module  at  bootup  as well.  You can put the following command at the
   bottom of the /etc/rc.d/rc.sysinit file:
       /sbin/insmod softdog
   For  the rc.local method, just put the same line right above where you
   start heartbeat.

   Once  you've  started heartbeat, take a peek at your log file (default
   is  /var/log/ha-log) before testing it.  If all is peachy, the service
   owner's log (linuxha1 in our example) should look something like this:

   heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
   heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
   heartbeat: 2003/02/10_13:52:22 info: **************************
   heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
   heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
   heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
   heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
   heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
   heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
   heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
   heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
   heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
   heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
   heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
   heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
   heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
   heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
   heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
   heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
   heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
   heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
   heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
   heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
   heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0  broadcast 192.168.85.255
   heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
   heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
   heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
   heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror  start
   heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
   heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
   heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
   heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
   heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
   heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
   heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
   heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.

   NOTE:  Your log may differ depending on when you started heartbeat on
   linuxha2!  I started heartbeat on linuxha2 at 13:56:30.
                   _____________________________________

   OK, now try to ping your cluster's IP (192.168.85.3 in the example).
   If this works, ssh to it and verify you're on linuxha1.
   Next, make sure your services are tied to the .3 address.  Bring up
   Netscape and type in 192.168.85.3 for the URL.  For Samba, try to map
   the drive "\\192.168.85.3\test", assuming you set up a share called
   "test".  See the Samba docs to get that going.  As an aside, however,
   you'll want to use the "netbios name" parameter to have your Samba
   share listed under the cluster name and not the hostname of your
   cluster member!

   NOTE:  If you can't bring up the service IP address and you get ha-log
   entries similar to this:

             SIOCSIFADDR: No such device
             SIOCSIFFLAGS: No such device
             SIOCSIFNETMASK: No such device
             SIOCSIFBRDADDR: No such device
             SIOCSIFFLAGS: No such device
             SIOCADDRT: No such device

     It may mean that you need to enable IP aliasing in your kernel
     build.  Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y"; if
     you don't have it, you'll have the line "CONFIG_IP_ALIAS is not
     set".  Rebuild your kernel with IP aliasing enabled.

   If  this all works, you've got availability.  Now let's see if we have
   High Availability :-)

   Take down linuxha1.  Kill power, kill heartbeat, whatever you have the
   stomach  for,  but  don't just yank both the serial and eth1 heartbeat
   cables.   If  you  do that, you'll have services running on both nodes
   and when you re-connect the heartbeat, a bit of chaos....
   Now ping the cluster IP.  Approximately 5-10 seconds later it should
   start responding again.  Ssh to it again and verify you're on
   linuxha2.
   If it happens but takes more like 30 seconds, something is wrong.
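
   To put a rough number on the takeover time, you can poll the
   cluster IP from a third machine, starting right after you take
   linuxha1 down.  The sketch below is only one way to do it:
   time_failover and PROBE_CMD are hypothetical names, and the result
   is accurate to about a second at best.

```shell
#!/bin/sh
# Poll an address until it answers or a time limit (in seconds) passes.
# PROBE_CMD is an illustrative override hook for the probe command.
PROBE_CMD="${PROBE_CMD:-ping -c 1 -W 1}"

time_failover() {
    ip="$1"
    limit="${2:-60}"
    start=$(date +%s)
    while :; do
        if $PROBE_CMD "$ip" >/dev/null 2>&1; then
            echo "$ip answered after about $(( $(date +%s) - start ))s"
            return 0
        fi
        if [ $(( $(date +%s) - start )) -ge "$limit" ]; then
            echo "$ip still silent after ${limit}s"
            return 1
        fi
        sleep 1
    done
}

# Example, using the cluster IP from this document:
#   time_failover 192.168.85.3 60
```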

   If  you  get  this far, it's probably working, but you should probably
   check all your heartbeats, too.
   First,  check  your serial heartbeat.  Unplug the crossover cable from
   your  eth1 NIC that you're using for your bcast heartbeat.  Wait about
   10 seconds.
   Now, look at /var/log/ha-log on linuxha2 and make sure there's no line
   like this:
       1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
   If  you  get that, your serial heartbeat isn't working and your second
   node  is  taking  over.  To avoid any problems, shut down heartbeat on
   the first node, then test your null modem cable.  Run the above serial
   tests again.
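
   Rather than reading the whole log by eye, you can pull out just the
   nodes that were declared dead.  This is a small sketch that assumes
   the "node <name>: is dead" wording shown above; dead_nodes is a
   hypothetical helper name.

```shell
#!/bin/sh
# List every node that the given heartbeat log reports as dead.
dead_nodes() {
    log="${1:-/var/log/ha-log}"
    # Matches lines such as "... node linuxha1.linux-ha.org: is dead"
    sed -n 's/.*node \([^ :]*\): is dead.*/\1/p' "$log" | sort -u
}

# On linuxha2, after unplugging the crossover cable and waiting:
#   dead_nodes /var/log/ha-log
```

   No output means no node was pronounced dead, which is what you want
   to see during the single-cable tests.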

   If  your  log  is clean, great.  Re-connect the crossover cable.  Once
   that's  done,  disconnect  the serial cable, wait 10 seconds and check
   the linuxha2 log again.
   If  it's  clean,  congrats!  If not, you can check /var/log/ha-log and
   /var/log/ha-debug for more clues.

   Appendix A - Ethernet Crossover Cable Construction

   Your cable diagram should be as follows:

       Connector A     Connector B
          Pin #           Pin #
            1               3
            2               6
            3               1
            6               2
            4               7
            5               8
            7               4
            8               5

   Rev 1.2.0
   (c) 2003 Rudy Pawul
   [5]rpawul@iso-ne.com

References

   1. http://linux-ha.org/download
   2. file://localhost/tmp/heartbeat-2.1.4-1/doc/faqntips.html
   3. http://linux-ha.org/download/faqnstuff.html
   4. file://localhost/tmp/heartbeat-2.1.4-1/doc/ipfail-diagram.pdf
   5. mailto:rpawul@iso-ne.com
