Things we definitely need to do:
-- MikePatterson - 20 Dec 2004
First, backup:
Simultaneously with this, shut down unneeded machines in the mudge region (gooch, quadra). First do "dpkg --get-selections > quadrapkgs" on quadra, just to get the latest package selections. That file will be over on tumbo thanks to the backup, so we'll be able to get at it when we restore files.
We got:
    2>(root)@mudge[118]% rsync -a --progress u/ tumbo:/fsys2/u
             138 100%    0.00kB/s    0:00:00
            1920 100%    0.00kB/s    0:00:00
            1953 100%    0.00kB/s    0:00:00
            7724 100%    0.00kB/s    0:00:00
    2>(root)@mudge[119]%
That's a reasonable number of files to have changed, given that I was faffing about as my own user on several of the machines during the first rsync.
Now, I'll leave tumbo alive while mudge is reinstalled. That way I can refer back if necessary. It may be safest to leave tumbo's network off while I'm in the machine room and able to physically access the console.
Just to be sure I also put a copy of mudge's dpkg --get-selections on mpatters@torres.
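One thing the saved selection lists are handy for, once the new system is up, is checking which packages the old install had that the fresh one lacks. A minimal sketch, assuming both files came from "dpkg --get-selections"; the function names here are made up for illustration:

```shell
# Print the packages marked "install" in a saved selections file, sorted.
installed_in() {
    awk '$2 == "install" { print $1 }' "$1" | LC_ALL=C sort
}

# Print packages selected on the old machine but absent from the new one.
# $1 = old selections file, $2 = new selections file (hypothetical names).
missing_packages() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || return 1
    installed_in "$1" > "$old_list"
    installed_in "$2" > "$new_list"
    # comm -23 keeps lines that appear only in the first file.
    LC_ALL=C comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

To restore the whole list rather than just inspect it, the usual route is "dpkg --set-selections < file" followed by "apt-get dselect-upgrade".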
As with GoochSargeUpgrade, I needed to start out with a 2.4 kernel so the installer would see the onboard SCSI controller.
It would appear we were misled: there's no hardware RAID controller in mudge, just two SCSI controllers (one of which isn't even being used, nice). So I'll have to cheat and use software RAID for the /u partition. 5 x 36GB(ish) disks.
This leaves 427.7 MB of free space, according to the sarge partitioner. I'll just leave that as-is for now. Next, let's use those other disks. I told the partitioner to use each as a RAID volume. It did some stuff and then just started installing the base system! That's ok, I didn't want to actually USE those partitions or anything. I'm not clear on what the differences are between what the sarge installer calls "Software RAID" and "LVM" anyway - two ways of accomplishing the same thing? Well, worry about that later.
Copy old ssh host keys back from tumbo. Also install rsync, rsh-client, and rsh-server; I know I'm going to need those. Also copy over the kernel-image I built on quadra. Go into dselect, install module-init-tools and everything else it wants.
mdadm package asked about starting RAID volumes automatically (I told it no). I told it to email alerts to root. Install the kernel-image I made, then reboot.
Guess what I forgot - /fsys1 partition. sigh. OK, we'll just make that a directory off of / - I was thinking about xhier when I made the partition that big, honest. So, create the rest of the xhier tree. Won't use it right away. Now I need to figure out software RAID. I want to RAID-0 sd{b,c,d,e}.
Looks like this is what I want to do:
    mudge:~# mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
(note that ought to be all one line)
Indeed:
    mudge:~# cat /proc/mdstat
    Personalities : [raid0]
    md0 : active raid0 sde1[3] sdd1[2] sdc1[1] sdb1[0]
          142238976 blocks 64k chunks

    unused devices: <none>
    mudge:~#
So now I can make a filesystem on it: "mke2fs -j /dev/md0". Hopefully the defaults aren't too slow.
    mudge:~# df -h /u
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md0              134G   33M  127G   1% /u
    mudge:~#
and that's about what I expect to see. Now I ought to be able to copy the user data back from tumbo. "rsync -a --progress root@tumbo.cs:/fsys2/u /u". Whups, that wasn't quite what I wanted, it's putting everything under /u/u (the source path needed a trailing slash, i.e. /fsys2/u/, to copy the directory's contents rather than the directory itself). Oh well, I just mv'ed the contents of /u/u up one level and rmdir'ed /u/u.
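The /u/u surprise comes from how rsync treats a trailing slash on the source path. The sketch below reproduces it in a scratch directory; the directory and file names are invented for the demonstration and nothing here touches the real /u:

```shell
# Build a tiny stand-in for the backed-up tree in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/src/u/alice" "$demo/without_slash" "$demo/with_slash"
echo hello > "$demo/src/u/alice/notes.txt"

# No trailing slash on the source: the directory "u" itself is copied,
# so everything lands one level too deep (the /u/u problem).
rsync -a "$demo/src/u" "$demo/without_slash"

# Trailing slash on the source: only the contents of "u" are copied.
rsync -a "$demo/src/u/" "$demo/with_slash"

ls "$demo/without_slash"   # lists: u
ls "$demo/with_slash"      # lists: alice
```

Applied to the restore above, that would have meant "rsync -a --progress root@tumbo.cs:/fsys2/u/ /u".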
Hrm, one thing I noticed: no oddities with xh-first-time barfing a few times and then mysteriously working. It Just Worked the first time.
Now accounts software is busted. Bill and I did a bunch of stuff on cscf.cs and now everything that's broken is broken on mudge.
Hrm, ssh doesn't seem to honour my client key any more. I wonder if sshd_config is up to snuff. Ah, turns out it was permissions on my home directory. Fixed some sshd_config options too though.
Post-mortem (25 April 2005):
The above procedure did not create an /etc/mdadm/mdadm.conf, nor did it create an entry in /etc/fstab for /u. As a result, when the machine was rebooted it did not mount user home directories (nor did it know how to configure the RAID). I created mdadm.conf by hand and used mdadm to re-assemble the RAID. See ST#48324.
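One way to close both gaps right after creating the array is sketched below. This is not a record of what was actually typed: the function names are made up, the filesystem type is assumed to be ext3 (because of "mke2fs -j"), and the "defaults 0 2" mount options are an assumption. Depending on the mdadm version, mdadm.conf may also need a DEVICE line.

```shell
# Append ARRAY lines for the currently running arrays to mdadm.conf.
# Must be run as root on the machine that owns the array.
record_array() {
    conf=${1:-/etc/mdadm/mdadm.conf}
    mdadm --detail --scan >> "$conf"
}

# Add an fstab entry mounting /dev/md0 on /u, unless /u is already listed.
# The fstab path is a parameter so this can be tried on a copy first.
add_u_to_fstab() {
    fstab=${1:-/etc/fstab}
    if ! awk '$1 !~ /^#/ && $2 == "/u" { found = 1 } END { exit !found }' "$fstab"; then
        printf '/dev/md0\t/u\text3\tdefaults\t0\t2\n' >> "$fstab"
    fi
}
```

After a reboot without these in place, "mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1" followed by "mount /dev/md0 /u" should bring the home directories back by hand.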