
Installing the XEON test system with LCFG old-style

As part of the NCF "Grid fabric research cluster" project we evaluate two different systems as candidates for a future 50-node research farm: a dual-AMD based system and a dual-XEON system. The "XEON" system has the following hardware components:
  • 1U rack-mountable chassis
  • 400W PS
  • Tyan S2720 motherboard
  • 60 GByte Maxtor 6L060J3
  • 82557 Ethernet Pro 100 MAC 00:E0:81:22:A0:5A, PXE capable
  • Intel 82544GC 10/100/1000 MAC 00:E0:81:22:A0:5B, PXE capable
  • 1.44 MByte FDD
Our target was to make this system self-configure using the LCFG system as shipped with EDG testbed release 1.2. This LCFG install system is based on stock RedHat 6.2, with "recommended patches". Unfortunately, the world has evolved since RedHat 6.2 and thus the following problems were encountered:
  1. RH 6.2 is shipped with kernel 2.2.19-6.2.12 and LCFG uses a boot kernel based on 2.2.16. Both kernels have a serious bug in their eepro100 support that makes the NIC malfunction at boot time.
    Characteristic responses: "card reports no resources" and "card reports no RX buffers".
  2. The LCFG install kernel 2.2.16 does not recognise the XEON as a Pentium-class processor and therefore cannot report the system architecture (uname -m returns "i?86"). As a result, the RPM install fails, since no RPMs for the "i?86" architecture can be found.
  3. The 2.2 kernels do not support the E1000 NIC.
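
The architecture problem in item 2 can be spotted before any RPM is installed. The following is a minimal sketch, not part of LCFG: the function name classify_machine is hypothetical, and it only inspects the string that uname -m prints.

```shell
# Classify a machine string as printed by `uname -m`.
# Prints "resolved" for a concrete x86 level (i386 .. i686),
# "unresolved" for the literal placeholder "i?86",
# and "other" for anything else.
classify_machine() {
    case "$1" in
        i[3-6]86) echo "resolved" ;;
        'i?86')   echo "unresolved" ;;
        *)        echo "other" ;;
    esac
}
```

On a running node it could be called as classify_machine "$(uname -m)"; an "unresolved" answer indicates that the kernel did not identify the processor.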

Getting a working install disk

It is essential that the system is booted from the eepro100 interface, and not from the GigE. No statically linked driver for the GigE card is available, and any boot kernel must be able to drive the booting NIC to get at the NFS root filesystem ("/ir62").
For a floppy-disk-assisted boot this requirement remains, since the space on the boot disk is too limited to include the relevant drivers. You might have some luck trying to make an "initrd" boot disk that includes the e1000.o driver in /lib/modules*, but I did not try since a recognised interface was available.
See the S2720 BIOS farm config page for details on boot sequence and chipset settings.

Next is to build a working kernel based on the 2.4 series that can use the eepro100 NIC correctly and do an NFS root boot. We already had a 2.4.18 NFSroot kernel from previous exercises with the AMD node and the rescue disk. You should not use devfs with this kernel since that will break the LCFG "update" object. The nfsroot kernel works fine without devfs. The kernel config below creates a kernel that will just fit on a single floppy (if the kernel file is not fragmented). Do not bother with modules: all the stuff you really need is built into the kernel.
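
Whether a kernel image can fit on the floppy at all can be checked from its size. The sketch below is an illustration under assumptions: it compares against the raw capacity of a 1.44 MByte disk (2880 sectors of 512 bytes), which is only an upper bound, since a filesystem or boot loader on the disk leaves less room. The function name fits_on_floppy is hypothetical.

```shell
# Raw capacity of a 1.44 MByte floppy: 2880 sectors of 512 bytes.
FLOPPY_BYTES=$((2880 * 512))

# Succeeds (exit status 0) if a file of the given size in bytes
# does not exceed the raw floppy capacity.
fits_on_floppy() {
    [ "$1" -le "$FLOPPY_BYTES" ]
}

# Possible usage with the kernel image named on this page:
#   fits_on_floppy "$(($(wc -c < vmlinuz-2.4.18-nfsroot)))" && echo "fits"
```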

This kernel can also be used for PXE booting. This is our PXE config for the test box:

DEFAULT lcfg24
LABEL lcfg24
  KERNEL vmlinuz-2.4.18-nfsroot
  APPEND root=/dev/nfs ip=both init=/etc/dcsrc

With this kernel you can start installing the system using LCFG.
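
If the PXE configuration above is served by PXELINUX (an assumption; the page does not name the boot loader, although the syntax matches), a per-host config file can be named after the MAC address of the booting NIC: "01-" followed by the address in lower-case hex with dashes. A small sketch that derives this name, with the hypothetical function name pxe_cfg_name:

```shell
# Derive the PXELINUX per-host config file name from a MAC address:
# "01-" (the ARP hardware type for Ethernet) followed by the MAC in
# lower-case hex, with dashes instead of colons.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# The eepro100 interface of the test box:
pxe_cfg_name 00:E0:81:22:A0:5A   # prints 01-00-e0-81-22-a0-5a
```

Using the MAC of the eepro100 interface here matches the requirement that the node boots from that NIC and not from the GigE.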

Building the production kernel

At install time you must make sure that a new, 2.4-based kernel gets installed on the node. For this, we have to build a new kernel RPM (both single-processor and SMP) that can co-exist with RedHat 6.2 and will recognise the XEON processor and the eepro100 and E1000 NICs.
Based on the 2.4.9 kernel-RPM by RedHat as used in RedHat 7.2, we start building two new kernel RPMs. The specfile used is available here. Main modifications:
  • Removed "Requires", "BuildReq" and "Conflicts" lines everywhere
  • Put "buildsmp 1" everywhere
  • Added the section that symlinks "/boot/vmlinuz" to the proper kversion to the smp package as well
  • Built using rpm -ba --target=i686 SPECS/kernel-2.4.spec, since otherwise no SMP kernel will be built (SMP is now incompatible with i386 for this kernel RPM!)
This will yield the single-processor and SMP kernel RPMs.

Now, update the LCFG "rpmcfg" file for the new system. From the file "rpmlist-xeontest" in /var/obj/conf/rpmcfg/:
#include "edg-1.2/WN-rpm"
#include "nikheflocal-rpm"

-kernel-2.2.19-6.2.12
-kernel-smp-2.2.19-6.2.12
+kernel-2.4.9-34/i686
+kernel-smp-2.4.9-34/i686
-kernel-pcmcia-cs
The PCMCIA package is removed from the install to avoid confusion at probe time. The new kernel does not support PCMCIA (PCcards).

Modifications to the profile

Some more modifications to the "standard" profile are needed:
  • Set the proper ethernet drivers in the conf.modules file (configured in inc/nikhef-nodetype-oneUxeon0.h):
    /* -- entire /etc/modules.conf or /etc/conf.modules */
    +update.modlist                 ethe gige
    +update.mod_ethe                alias eth0 eepro100 /* NIC type */
    +update.mod_gige                alias eth1 e1000 /* requires RH 2.4.9 kernel */
    
  • Set UDMA mode to UDMA5:
    +hdparm.options_hda                      -X69 -d1 -u1 -m16 -c3
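
The value 69 in the -X option follows from the hdparm convention that UDMA mode n is selected with transfer code 64 + n. As a small illustration (the function name udma_xfer_code is hypothetical):

```shell
# hdparm -X encodes UDMA mode n as transfer code 64 + n.
udma_xfer_code() {
    echo $((64 + $1))
}

udma_xfer_code 5   # prints 69, i.e. -X69 selects UDMA5
```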
    
After publishing the new profile the node should install just fine. Note that the GigE adapter is NOT configured at boot time by default; add the hostname to the appropriate update object configuration if you want that.

Metainfo

Author: David Groep
Date: 2002.08.20

Comments to David Groep