Archive for August, 2007

26 Aug 2007

Sunday, August 26th, 2007

I need to move ldap and the bnc. I think I may not move some of the old irc stuff (trafficOP and slIRC). There is not that much traffic to count anymore, and I think I have a better IRC web client now. I certainly need to archive them and find a place to keep them (where I will forget them no doubt).

I am considering playing with making a box with shell accounts on it. I will outline what I need to make them work, but I will do that in its own tiddler.

Another one bites the dust

Saturday, August 25th, 2007

I had another UPS battery die on Friday. This time it had a larger impact: it was one of the cheaper workstation UPSes and was powering the core switch, router and bridge (internet). So everything went black for a short time. What was worse is that several batteries that were working on the rack appear to have been at the end of their life, so I had to try three or four before finding a working UPS. And it still needs new batteries too. :P

Moving the gallery
I moved the gallery at http://www.capturinglife.org from the standalone box back to the web server now that the web server has a lot of storage. That let me power down another box. Less power, less heat, less noise…

Speaking of noise, I put the french door back on my office as well so the 1U is not filling the house with the buzz of fans.

All I needed to do to migrate gallery was to back up the database, copy the files over and then restore the database. I used phpMyAdmin to back up and restore, and upgraded phpMyAdmin while I was at it. One minor path change in config.inc.php and a DNS update and it was up and going.

Xen block process
I ran into a problem with a vm last week where processes named “block” spawned many processes, the host machine was non-responsive and the vms had odd problems. I restarted and it cleared up, but I will need to monitor for that kind of behavior.

Software RAID – Replacing a failed hard drive

Thursday, August 16th, 2007

I will make the partition table on sdb the same as sda. I will duplicate sda1 (/boot) as well so that if sda fails I can get it booting more quickly.

note: md0 doesn’t have a partition table. That is really best suited to a separate article discussing raid and lvm, so I am not going to delve into it at this moment.

# fdisk -l
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       38913   312464250   fd  Linux raid autodetect

Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md0: 319.9 GB, 319963267072 bytes
2 heads, 4 sectors/track, 78116032 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table
# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun Apr  8 00:22:19 2007
     Raid Level : raid1
     Array Size : 312464128 (297.99 GiB 319.96 GB)
    Device Size : 312464128 (297.99 GiB 319.96 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Aug 16 12:18:10 2007
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
UUID : 7ffa6982:50ea5134:11c17882:91cfa617
         Events : 0.960047
Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed

I replaced the failed hard drive with an identical one. If the drives match, create the partitions using the same layout.

# fdisk /dev/sdb
n - new
p - primary
1 - partition number
Start 1
End 13
n - new
p - primary
2 - partition number
Start 14
End 38913
t - type
fd (Linux raid autodetect)
w - write and quit
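
If the replacement drive really is identical, the partition table can also be copied rather than retyped. This is only a sketch of a different technique from the fdisk session above, using sfdisk's dump mode. The names clone_partition_table and DRY_RUN are made up for illustration, and by default the helper only prints the command, so nothing is written until you ask for it.

```shell
# Copy the partition table from one disk to another with sfdisk.
# Hypothetical helper: DRY_RUN defaults to 1, so the command is only printed.
clone_partition_table() {
    src=$1
    dst=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        # sfdisk -d dumps the table in a format sfdisk can read back on stdin
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}
```

Calling clone_partition_table /dev/sda /dev/sdb shows what would run. Setting DRY_RUN=0 runs it for real (as root), so double check which disk is the source first.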

Added the new raid partition to md0 (/dev/md0 is the mirrored raid array device)

# mdadm /dev/md0 -a /dev/sdb2
mdadm: added /dev/sdb2
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[2] sda2[0]
      312464128 blocks [2/1] [U_]
      [>....................]  recovery =  0.3% (1072128/312464128) finish=275.0min speed=18869K/sec
unused devices: <none>
# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun Apr  8 00:22:19 2007
     Raid Level : raid1
     Array Size : 312464128 (297.99 GiB 319.96 GB)
    Device Size : 312464128 (297.99 GiB 319.96 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Aug 16 12:23:25 2007
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
Rebuild Status : 0% complete
UUID : 7ffa6982:50ea5134:11c17882:91cfa617
         Events : 0.960629
Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       2       8       18        1      spare rebuilding   /dev/sdb2
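
With a rebuild estimated at several hours, it can help to pull just the progress figure out of /proc/mdstat. A minimal sketch, assuming the recovery line looks like the one shown above; the function names are my own and not part of mdadm.

```shell
# Print the recovery percentage from mdstat-formatted text on stdin.
mdstat_recovery_percent() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Succeed while the given mdstat file (default /proc/mdstat) shows a rebuild.
rebuild_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}
```

A loop such as "while rebuild_in_progress; do mdstat_recovery_percent < /proc/mdstat; sleep 60; done" would then print the percentage once a minute until the mirror is back in sync.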

ssh using public keys

Thursday, August 16th, 2007

To generate keys for the client, use this command. Since you want unattended login, press enter when it asks for a passphrase.

$ ssh-keygen -t dsa

The destination machine requires a .ssh directory in the home of the user you want to log in as, and that directory should be chmod 700. Here are the commands to create it in the event it doesn’t exist:

$ mkdir ~/.ssh
$ chmod 700 ~/.ssh

If it already exists, you can place the public key from the client without needing to log in to the remote machine.

$ cat ~/.ssh/id_dsa.pub | ssh SERVERB 'sh -c "cat - >>~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"'

Notice the >> to APPEND to authorized_keys. If you do not append, you will lose the ability to log in with any other public keys you have added. Most tutorials scp the file directly, which overwrites authorized_keys.
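
Appending has one small drawback: run the command twice and the same key is listed twice. A minimal sketch of an append that skips keys already present; add_authorized_key is a hypothetical helper name, and it is meant to run on the destination machine.

```shell
# Append a public key line to an authorized_keys file unless that exact
# line is already there. Creates the directory and file if needed.
add_authorized_key() {
    key=$1
    file=${2:-$HOME/.ssh/authorized_keys}
    dir=$(dirname "$file")
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$file" && chmod 600 "$file" || return 1
    # -x: whole line must match, -F: treat the key as a fixed string
    grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}
```

On the server, something like add_authorized_key "$(cat id_dsa.pub)" would add the key once and leave any existing keys alone.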

Test smtp from telnet

Thursday, August 16th, 2007

Something I was doing required changing SMTP. You can test SMTP from telnet. In this case I was local and used telnet localhost 25. Make sure you send the final . on a line by itself or the message is never finished.

 

ehlo localhost
mail from:me@mail.com
rcpt to:you@mail.com
data
subject: test 8-9
this is test 1
.
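
The same conversation can be generated by a small function instead of typed by hand. This is a sketch, not what I actually ran: smtp_dialogue is a made-up name, it adds the blank line that separates headers from the body, and it ends with QUIT.

```shell
# Print a minimal SMTP session: envelope, one header, blank line, body,
# the terminating "." and QUIT. Lines end in CRLF as SMTP expects.
smtp_dialogue() {
    from=$1
    to=$2
    subject=$3
    body=$4
    printf '%s\r\n' \
        "EHLO localhost" \
        "MAIL FROM:<$from>" \
        "RCPT TO:<$to>" \
        "DATA" \
        "Subject: $subject" \
        "" \
        "$body" \
        "." \
        "QUIT"
}
```

Assuming nc is installed, the output could be piped to it, for example smtp_dialogue me@mail.com you@mail.com 'test 8-9' 'this is test 1' | nc localhost 25. That sends every command without waiting for the replies, which some servers are configured to reject, so telnet by hand remains the more reliable test.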

A new hard drive for owl

Thursday, August 16th, 2007

I ordered a new hard drive from ZipZoomFly.com last week and it arrived Monday evening. I powered the server down and put the drive in that night about midnight since I doubt anyone was using it at that time. I should find out if it is possible to hot swap the SATA drives in that Tyan chassis. I was not sure enough of it to try.

Migrated the web server to xen

Sunday, August 12th, 2007

Early in the morning…
I started migrating tigger tonight (see tigger vm). It is moving about 8G of data, so I am going to sleep and will pick up tomorrow. This has definitely been smoother with all the things I have learned from the first two. It will also be the last full server migration, as everything remaining needs to be moved as services rather than servers.

And later…
tigger, the web server, went the smoothest of all the servers. I am really pleased with the process. The only hitch I had was that the /tmp permissions changed while rotating the filesystem. It caused an error but was easy to fix.

tigger vm

Sunday, August 12th, 2007

Tigger is a web server with php, perl, mysql, etc.

I unpacked my model…

# cd /xen
# mkdir tigger
# cd tigger/
# tar xjvf ../debian-4.0-20070801.tar.bz2
debian-4.0.img
debian-4.0.xen3.cfg
debian.swap

The original partition layout:

/dev/hda2 on / type ext3 (rw,errors=remount-ro)
/dev/hda1 on /boot type ext3 (rw)
/dev/hdb1 on /home type ext3 (rw,errors=remount-ro)

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2             2.4G  1.2G  1.2G  51% /
/dev/hda1              45M  9.4M   33M  23% /boot
/dev/hdb1              29G  8.1G   19G  31% /home

I created LVM logical volumes and formatted them. This will be the new partition layout.

lvcreate -L1G -n tigger-usr VolGroup00 && \
lvcreate -L10G -n tigger-home VolGroup00 && \
lvcreate -L512M -n tigger-tmp VolGroup00 && \
lvcreate -L2G -n tigger-var VolGroup00 && \
lvcreate -L20G -n tigger-var-www VolGroup00
mkfs -t ext3 /dev/VolGroup00/tigger-usr
mkfs -t ext3 /dev/VolGroup00/tigger-home
mkfs -t ext3 /dev/VolGroup00/tigger-tmp
mkfs -t ext3 /dev/VolGroup00/tigger-var
mkfs -t ext3 /dev/VolGroup00/tigger-var-www
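
Since every volume gets the same lvcreate and mkfs pair, the list can be generated from a table of names and sizes. A minimal sketch, assuming the vm-name naming used above; plan_volumes is a hypothetical helper that only prints the commands so they can be checked first.

```shell
# Read "name size" pairs on stdin and print the lvcreate and mkfs
# commands for each. Nothing is executed here.
plan_volumes() {
    vm=$1
    vg=$2
    while read -r name size; do
        [ -n "$name" ] || continue
        echo "lvcreate -L$size -n $vm-$name $vg"
        echo "mkfs -t ext3 /dev/$vg/$vm-$name"
    done
}
```

For example, printf 'usr 1G\nhome 10G\n' | plan_volumes tigger VolGroup00 prints four commands. Once they look right, the same output can be piped to sh.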

Although right now I have http in /home, I want to move it to /var/www, so I will create a partition there large enough to accommodate it. First I will migrate the whole system. After everything works, I will move files around and change configuration. That way I only have one variable at a time.

This is the config file I will use. I just renamed the model and filled in the fields. I used the same command as before to generate the last 3 bytes of a random MAC, leaving the first 3 bytes as the Xen vendor prefix.

dd if=/dev/urandom bs=1 count=3 2>/dev/null | od -tx1 | head -1 | cut -d' ' -f2- | tr -d ' ' | tr '[a-f]' '[A-F]'

/xen/tigger/tigger.cfg

kernel = "/boot/vmlinuz-2.6-xenU"
memory = 512
name = "tigger"
vif = [ 'bridge=xenbr0,mac=00:16:3e:51:9A:81' ]
dhcp = "dhcp"
disk = ['file:/xen/tigger/debian-4.0.img,sda1,w'
, 'file:/xen/tigger/debian.swap,sda2,w'
, 'phy:VolGroup00/tigger-usr,sda3,w'
, 'phy:VolGroup00/tigger-home,sda4,w'
, 'phy:VolGroup00/tigger-tmp,sda5,w'
, 'phy:VolGroup00/tigger-var,sda6,w'
, 'phy:VolGroup00/tigger-var-www,sda7,w'
]
root = "/dev/sda1 ro"
ramdisk = "/boot/initrd-2.6-xenU.img"
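
The MAC address in the vif line is the 00:16:3e prefix followed by three random bytes. A minimal sketch that produces the whole address in one step; random_xen_mac is a name I made up, and it prints lower case hex rather than the upper case used above.

```shell
# Print a MAC address with the Xen prefix 00:16:3e and three random bytes.
random_xen_mac() {
    # od prints 3 random bytes as hex; strip the spaces, then put a
    # colon in front of each pair of digits
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../:&/g')
    echo "00:16:3e$suffix"
}
```

The result can be pasted straight into the mac= field of the vif line.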

I linked the config so it will autostart when owl boots.

ln -s /xen/tigger/tigger.cfg /etc/xen/auto/

And I started the vm

xm create -c tigger.cfg

The VM came up at 192.168.0.211.

I used the flip script to move the files to the partitions.

./flip sda3 usr
./flip sda6 var

and mounted the empty directories by hand…

mount /dev/sda4 /home
mount /dev/sda5 /tmp
mount /dev/sda7 /var/www
chown www-data:www-data /var/www

Set the IP variable, imported /etc.

export IP='192.168.0.1'
# mkdir ~/etc
cp /etc/mtab ~/etc/ && \
cp /etc/init.d/makedev ~/etc/init.d/ && \
rsync -e ssh -avz root@$IP:/etc/* /etc/ && \
cp -R ~/etc/* /etc/

I commented out mkdir because I have already created ~/etc in the model vm.

nano /etc/network/interfaces && mkdir -p ~/etc/network && cp /etc/network/interfaces ~/etc/network/

mkdir -p /var/www
rsync -e ssh -avz root@$IP:/var/www/* /var/www/
rsync -e ssh -avz root@$IP:/home/* /home/

To migrate MySQL, I stopped MySQL on the live server and ran

rsync -e ssh -avz $IP:/var/lib/mysql /var/lib/

I am sure I will refer back to my notes here: kanga: Migrating MySQL

Next we are getting the package list from the running server and applying it to this one. In general, take the defaults to NOT change settings.

ssh $IP 'dpkg --get-selections' >~/selections.dpkg && \
apt-get update && \
dpkg --set-selections  < ~/selections.dpkg && \
apt-get dselect-upgrade
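
Before letting dselect-upgrade loose, it can be useful to see which packages the old server has that the new VM does not. A minimal sketch, assuming both lists were saved with dpkg --get-selections; missing_packages and the file names are hypothetical.

```shell
# List packages marked "install" in the first selections file that are
# not marked "install" in the second.
missing_packages() {
    old=$1
    new=$2
    tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
    awk '$2 == "install" { print $1 }' "$old" | sort > "$tmp_old"
    awk '$2 == "install" { print $1 }' "$new" | sort > "$tmp_new"
    # comm -23 keeps lines that appear only in the first sorted file
    comm -23 "$tmp_old" "$tmp_new"
    rm -f "$tmp_old" "$tmp_new"
}
```

With the old list in ~/selections.dpkg and a fresh dpkg --get-selections > ~/selections.new on the VM, missing_packages ~/selections.dpkg ~/selections.new gives a rough idea of how much is about to be installed.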

I started screen, then ran apt-get dselect-upgrade to start the process. Then I used Ctrl+a d to detach the screen. I can do other things, and disconnect from ssh. When I return I will screen -r to reattach and continue.

If you get “4gb seg fixup” errors, this will probably fix it.

apt-get install libc6-xen
echo "hwcap 0 nosegneg" > /etc/ld.so.conf.d/nosegneg.conf
ldconfig -v -p 2>&1 | grep libc.so
ldconfig

Everything works…
Except Drupal on www.guildplace.org
Ahh, ha! An error about /tmp. I thought I fixed this in the model, but it would appear I did not, or during the filesystem rotation it got broken.

chmod 1777 /tmp
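
Since this has bitten me more than once, a quick check after rotating filesystems would catch it earlier. A minimal sketch; has_tmp_permissions is a made-up name and it simply looks at the mode string that ls prints.

```shell
# Succeed if the directory is world-writable with the sticky bit set,
# which is what mode 1777 looks like in ls output.
has_tmp_permissions() {
    perms=$(ls -ld "$1" | cut -c1-10)
    [ "$perms" = "drwxrwxrwt" ]
}
```

Something like has_tmp_permissions /tmp || echo "fix /tmp" could go at the end of the migration steps.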

A bad egg

Thursday, August 9th, 2007

I made a lot of progress tonight. The problems with amavisd-new were /tmp permissions that were wrong, and missing nodes in /dev. The missing /dev nodes led me to the root cause of the missing urandom and so I fixed a deeper problem. In the process I believe I have completed the migration of the mail server, gopher.

I am more than fed-up with NewEgg.com. When I setup owl, the xen server, I bought 9 hard drives over a period of a month.
I bought 2 and I bought another 4 shortly after. Because of a smashed AMD Opteron CPU I did not immediately unpack the first 2.
First 2, 1 bad.
Next 4, 2 bad.
NewEgg had me ship them back separately on 2 RMA numbers so it cost about $16 in shipping.
Because they do not cross ship and I was now 2 weeks behind, I returned those 3 for a refund and ordered 3 more. This makes the total 9 now.
When the 3 replacements arrived, one of them was DOA. That brings the score to 4 out of 9 dead on arrival. I should also mention the first batch looked like they were dropped before leaving NewEgg – the case was flanged.

NewEgg has no way to file a complaint about product handling. Even the employee I spoke with did not know how to spell Seagate (C gate as he put it). Seagate took care of the last one; they replaced it with a 750G SATA drive.

Avoid NewEgg hard drives. They do not handle them properly.

gopher vm

Thursday, August 9th, 2007

This server provides smtp, pop3 and imap. It also provides the ssl versions of these protocols. The old server provided squirrelmail on apache, but I’m not going to implement that on this server.

The physical server is still running sarge. The vm is running etch. With gopher that jump worked, but this is a much more complex host. I tried deploying gopher across versions and ran into problems, so I upgraded the physical server and made sure everything works first. I will do it this way in the future as well.

After a PAINFUL experience because a courier directive changed…
LDAP_SERVER changed to LDAP_URL
This is a great URL for troubleshooting: http://www.courier-mta.org/authlib/README.authdebug.html

# cd /xen
# mkdir gopher
# cd gopher/
# tar xjvf ../debian-4.0-20070801.tar.bz2
debian-4.0.img
debian-4.0.xen3.cfg
debian.swap

The sizes were 500M, 20M and 200M, and lvcreate rounded them up (to 512M, 32M and 224M below).

lvcreate -L512M -n gopher-usr VolGroup00 && \
lvcreate -L32M -n gopher-home VolGroup00 && \
lvcreate -L224M -n gopher-tmp VolGroup00 && \
lvcreate -L5G -n gopher-var VolGroup00
mkfs -t ext3 /dev/VolGroup00/gopher-usr
mkfs -t ext3 /dev/VolGroup00/gopher-var
mkfs -t ext3 /dev/VolGroup00/gopher-tmp
mkfs -t ext3 /dev/VolGroup00/gopher-home
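
The rounding is easy to predict once you know the extent size. A minimal sketch of the arithmetic, assuming 32M physical extents, which is what the rounded sizes above suggest but which I have not confirmed here; round_up_to_extent is a hypothetical helper.

```shell
# Round a size in megabytes up to the next multiple of the extent size.
# The extent size defaults to 32 (an assumption, see above).
round_up_to_extent() {
    size_mb=$1
    extent_mb=${2:-32}
    echo $(( (size_mb + extent_mb - 1) / extent_mb * extent_mb ))
}
```

With that default, 500 becomes 512, 20 becomes 32 and 200 becomes 224, matching the lvcreate sizes used for gopher.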

Make the last 3 bytes of the MAC:

dd if=/dev/urandom bs=1 count=3 2>/dev/null | od -tx1 | head -1 | cut -d' ' -f2- | tr -d ' ' | tr '[a-f]' '[A-F]'

/xen/gopher/gopher.cfg

kernel = "/boot/vmlinuz-2.6-xenU"
memory = 384
name = "gopher"
vif = [ 'bridge=xenbr0,mac=00:16:3e:CF:AA:21' ]
dhcp = "dhcp"
disk = ['file:/xen/gopher/debian-4.0.img,sda1,w'
, 'file:/xen/gopher/debian.swap,sda2,w'
, 'phy:VolGroup00/gopher-usr,sda5,w'
, 'phy:VolGroup00/gopher-home,sda6,w'
, 'phy:VolGroup00/gopher-tmp,sda7,w'
, 'phy:VolGroup00/gopher-var,sda8,w'
]
root = "/dev/sda1 ro"
ramdisk = "/boot/initrd-2.6-xenU.img"

I linked the config and restarted the server to test.

ln -s /xen/gopher/gopher.cfg /etc/xen/auto/gopher.cfg

xm create -c gopher.cfg
The VM came up at 192.168.0.218.
I ssh’d to the VM; the default password is “password”.

Created the flip script (see vm flip filesystems).

~/flip sda5 usr
mount /dev/sda6 /home
# ~/flip sda6 home
# home is empty so it just gives an error. just umount and then mount it to /home
# maybe I will fix this... or not.
~/flip sda7 tmp
~/flip sda8 var

Set the IP variable, imported /etc.

export IP='192.168.0.1'
mkdir -p ~/etc/init.d
cp /etc/mtab ~/etc/ && \
cp /etc/init.d/makedev ~/etc/init.d/ && \
rsync -e ssh -avz root@$IP:/etc/* /etc/ && \
cp -R ~/etc/* /etc/

Change hd? to sd?. I used nano and commented out cd and floppy. Made note of swap, sda9.
I started a ~/etc/ so that if we need to resync etc we just copy it back to /etc.

mv /etc/fstab{,~} && sed 's#/hd#/sd#' /etc/fstab~ >/etc/fstab
nano /etc/fstab && cp /etc/fstab ~/etc/
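
The substitution can be previewed before touching the real file. A minimal sketch of a slightly stricter variant than the sed above: it only rewrites device names at the start of a line, so commented-out entries are left alone. hd_to_sd is a name I made up.

```shell
# Rewrite /dev/hdX device names to /dev/sdX on stdin, but only where
# the device is the first field of the line.
hd_to_sd() {
    sed 's#^/dev/hd#/dev/sd#'
}
```

Running hd_to_sd < /etc/fstab | diff /etc/fstab - shows exactly which lines would change.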

I changed the IP to a temp address during the initial reboot. This can cause a little trouble, but less than having 2 machines with the same IP.

nano /etc/network/interfaces && mkdir -p ~/etc/network && cp /etc/network/interfaces ~/etc/network/

grep -i documentroot /etc/apache-ssl/httpd.conf
DocumentRoot /var/www/ssl

mkdir -p /var/spool
rsync -e ssh -avz root@$IP:/var/spool/* /var/spool/
mkdir -p /var/www
rsync -e ssh -avz root@$IP:/var/www/* /var/www/
mkdir -p /var/mail
rsync -e ssh -avz root@$IP:/var/mail/* /var/mail/

Fixed in the model, this is no longer needed.


As you may remember, I had to create a script to create /dev/urandom and start ssh. Debian 4.0 added rc.local and broke the link I made before.
There is an exit 0 that is required, but it needs to be at the bottom of the file. I started nano to remove exit 0 from the middle of the file and put it at the end.

cat /etc/init.d/local >>/etc/rc.local && nano /etc/rc.local


Next we are getting the package list from the running server and applying it to this one. In general, take the defaults to NOT change settings.

ssh $IP 'dpkg --get-selections' >~/selections.dpkg && \
dpkg --set-selections  < ~/selections.dpkg && \
apt-get update && \
apt-get dselect-upgrade

If you get “4gb seg fixup” errors, this will probably fix it.

apt-get install libc6-xen
echo "hwcap 0 nosegneg" > /etc/ld.so.conf.d/nosegneg.conf
ldconfig -v -p 2>&1 | grep libc.so
ldconfig

I stopped postfix and ran this until there was nothing left to update.

rsync -e ssh -avz --delete root@$IP:/var/mail/* /var/mail/

restarted

restarting: shutdown -r now