Troubleshooting



The Log

If you're reading this, your install failed at some point. The most important piece of debugging information you have is the console log from the install. Look at /tftpboot/lim/log/(clientname).log, where (clientname) is the name of the failing node. Each node has its own set of progress messages there. LUI writes a message just *before* it attempts each operation, so the last message in the log names the step that failed. For example, if the last message you see is "about to install rpms for an RPM type installation", it's a sure bet that rpm installation failed.
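
Since the last message in the log is the one that matters, a small helper can pull it out for you. This is only a sketch: the function name last_step and the LOG_DIR override are made up for illustration and are not part of LUI; the default path is the one given above.

```shell
#!/bin/sh
# Print the last non-empty line of a client's install log.
# last_step and LOG_DIR are hypothetical names, not part of LUI.
# LOG_DIR defaults to the log directory named in the text.
last_step() {
    client=$1
    log="${LOG_DIR:-/tftpboot/lim/log}/${client}.log"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Drop blank lines, then keep only the final message.
    grep -v '^[[:space:]]*$' "$log" | tail -n 1
}
```

On the server you would call it as "last_step node1", where node1 stands in for your client's name.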

Serious Debugging

OK, installation failed, you looked at the log, and you still have no idea why it did not complete. What to do? Fortunately, a LUI install is quite easy to debug. First, get onto the node's console. You should see a message there; hit Enter for a login prompt. Once logged in, make sure all the remote file systems were mounted OK. You should see 5 NFS-mounted file systems: root (/), /usr, /lim, /tar and /rpm. If any are missing, there is an NFS problem between server and client. If they are all there, run the clone script in the foreground and watch what happens: cd to the LUI install directory and enter "./clone". If you're a Perl kind of programmer, cp ./clone to /tmp/clone, add -d to the first line of the script (the #! line) so that it runs under the Perl debugger, and then run /tmp/clone.
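
The mount check can be scripted rather than done by eye. The sketch below assumes the node's install environment provides /proc/mounts in the usual "device mountpoint fstype ..." layout and has awk available; neither is guaranteed by LUI, and the function name missing_nfs_mounts is made up for illustration.

```shell
#!/bin/sh
# Print each of the five expected mount points that is NOT currently
# NFS-mounted. Reads /proc/mounts by default (an assumption about the
# install environment); pass another file to check a saved mount table.
missing_nfs_mounts() {
    table=${1:-/proc/mounts}
    for mp in / /usr /lim /tar /rpm; do
        # Fields in the mount table: device, mount point, fs type, ...
        if ! awk -v mp="$mp" \
            '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
            "$table"
        then
            echo "$mp"
        fi
    done
}
```

Running "missing_nfs_mounts" with no arguments prints nothing when all five file systems are mounted over NFS; any path it prints points at an NFS problem between server and client.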

Partitions and file systems

To see what partitions have been created, run fdisk /dev/sda (/dev/hda if you're using IDE drives), then enter "p" to print the partition table and "q" to quit. Does the partition table look like the one you specified in your disk partition table resource? If not, perhaps there is a problem with that resource (or, in this case, it could be the LUI code itself that failed). If the partitions were created correctly, did the file systems get created? You can mount each partition on /mnt and examine its contents (e.g. mount /dev/sda8 /mnt).
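
If you would rather not step through fdisk interactively, "fdisk -l /dev/sda" prints the same table and exits. The sketch below filters that listing down to the partition device names so they are easy to compare against your resource. The function name partition_names is made up for illustration, and it assumes the usual listing layout in which each partition row begins with its /dev/ name.

```shell
#!/bin/sh
# Read `fdisk -l` output on stdin and print one partition device name
# per line. partition_names is a hypothetical helper, not part of LUI.
# Header lines such as "Disk /dev/sda: ..." are skipped because their
# first field is not a /dev/ path.
partition_names() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Example, run as root on the node:
#   fdisk -l /dev/sda | partition_names
```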

Tarball and RPM failures

If you're doing a tarball install, mount each file system as above and check that it was populated correctly from the tar files. An empty file system means that untarring the archive failed. If you're doing an rpm install, mount all the partitions and run rpm -qa -r /mnt (assuming you mounted the root file system as /mnt); it lists the rpms that got installed, so you can see how far the install went. When you run clone in the foreground, you will see the error message from the rpm command if the rpm install fails.
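
The "is it empty?" check is easy to get wrong because a freshly made ext2 file system already contains a lost+found directory. The sketch below treats a directory holding only lost+found as empty. The function name fs_is_empty is made up for illustration, and the ext2 assumption may not match your file system resource.

```shell
#!/bin/sh
# Succeed (exit 0) if the directory contains nothing besides lost+found.
# fs_is_empty is a hypothetical helper, not part of LUI.
fs_is_empty() {
    dir=$1
    # List all entries, dotfiles included, ignoring lost+found.
    # "|| true" keeps grep's no-match status from aborting set -e scripts.
    entries=$(ls -A "$dir" 2>/dev/null | grep -v '^lost+found$' || true)
    [ -z "$entries" ]
}

# Example, after mounting a partition on /mnt:
#   fs_is_empty /mnt && echo "untar probably failed"
# For an rpm install, count the installed packages instead:
#   rpm -qa --root /mnt | wc -l
```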

lilo errors

If everything else went OK and lilo itself failed, you can run lilo by hand from the shell prompt. Again, mount all the file systems by hand, then run lilo -C /mnt/etc/lilo.conf (again assuming that root is mounted as /mnt). You should see any error messages that lilo issues.
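
Before rerunning lilo, it can help to rule out one common cause of failure: an image= line in lilo.conf that names a kernel which is not on the disk. The sketch below is not something LUI does itself. The function name missing_lilo_images is made up for illustration, and it assumes simple image=/path lines with no trailing comments.

```shell
#!/bin/sh
# Print every kernel image named in lilo.conf that does not exist under
# the mounted root. missing_lilo_images is a hypothetical helper.
# The root defaults to /mnt, matching the text.
missing_lilo_images() {
    root=${1:-/mnt}
    conf="$root/etc/lilo.conf"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # Extract the value of each image= line, tolerating spaces around "=".
    sed -n 's/^[[:space:]]*image[[:space:]]*=[[:space:]]*//p' "$conf" |
    while read -r img; do
        [ -e "$root$img" ] || echo "$img"
    done
}
```

If "missing_lilo_images /mnt" prints nothing, every image named in the configuration is present, and the error messages from lilo itself are the next thing to read.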

If all else fails

Use the power of The Force, er, mailing list. Post the problem to the mailing list and let the community figure it out. To quote Eric Raymond's statement of Linus's Law, "Given enough eyeballs, all bugs are shallow".