Original in English by Atif Ghaffar
Atif is a chameleon. He changes his roles, from System
Administrator, to programmer, to teacher, to project manager, to
whatever is required to get the job done.
Atif thinks that he owes a lot to Linux and the open-source community
and its projects for being his teachers.
More about him can be found at his homepage
Aren't fileservers supposed to make data available to clients?
Yes they are.
If we use a file server that shares files over NFS, SMB, etc., then we have a bottleneck and a Single Point Of Failure.
If we share data over GFS with shared storage (SAN or multi-channel SCSI), the storage box becomes the Single Point Of Failure, and it's very expensive to set up a system with that configuration.
We can use NBD (Network Block Devices) to set up a network mirror, but I am not very comfortable with that. NBDs have their limitations, are difficult to set up and manage, and are just too much bother when all you need is to replicate some web data across a few webservers.
OK, let's try replicating.
Here is one scenario
You have 2 webservers, one main server and the other as the backup.
You make all changes on the master machine and rsync the changes to the second machine.
Simple?
But how do you automate it? Your users will FTP to the master machine multiple times a day. What will happen if the master server fails and the backup server takes over?
Easy. I have the answer to that. They will not see the changes they made, and will be pretty pissed off. :)
Well, you could run "rsync -av --delete source destination" in a loop every 5 seconds (cron itself cannot schedule anything more often than once a minute), but then your machine would not really be useful for anything else, would it?
Here is another scenario
You have one FTP server to upload the data and
six webservers that respond in a round-robin fashion.
So the data on each machine should be the same. You can get away with NFS for some time if you are lucky, but you won't for long.
Now, what should be done?
I think the answer is "copy the data to the webservers only if there is a
change to the files", and if there is no change to the data, don't do anything.
This is exactly what we will do using "fam".
So how do we know there is a change on the files?
Here is one answer that I would expect from an M$ Windows developer.
We can scan the directory we are monitoring every few seconds and compare each file's timestamp and size with the version we had in cache.
Yeah, right.
Polling (checking every file's timestamp and size and comparing them with the cached version) is expensive.
Imagine if your box is running "ls -lR /somedirectory" every 5 seconds on your webserver :)
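To make the cost concrete, here is a minimal sketch of that polling approach. The helper names snapshot and tree_changed are made up for this example, and it assumes GNU find and GNU stat as found on a typical Linux box. Note that every call has to walk the whole tree, even when nothing has changed.

```shell
#!/bin/sh
# Hypothetical helper: print "path size mtime" for every file under $1.
# Assumes GNU stat; the whole tree is walked on every call.
snapshot() {
    find "$1" -type f -exec stat -c '%n %s %Y' {} + | sort
}

# Hypothetical helper: succeed (exit 0) if the tree under $1 differs from
# the snapshot cached in file $2, then refresh the cached snapshot.
tree_changed() {
    dir=$1
    cache=$2
    new=$(snapshot "$dir")
    old=$(cat "$cache" 2>/dev/null || true)
    printf '%s\n' "$new" > "$cache"
    [ "$new" != "$old" ]
}
```

A polling mirror would call tree_changed in a loop, sleep a few seconds, and run rsync whenever it succeeds, which is exactly the constant disk activity we want to avoid.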
The elegant way would be for the file to tell us when it has changed, so we can take an action upon it.
This is exactly what "IMON" will do for us.
source: http://oss.sgi.com/projects/fam/faq.html
fam, the File Alteration Monitor, provides an API which applications can use
to be notified when specific files or directories are changed.
FAM comes in two parts: fam, the daemon which listens for requests
and delivers notification, and libfam, a library which client
applications can use to communicate with FAM.
If the monitored files are mounted from a remote host, the local fam will
attempt to contact fam on the remote host, and will pass the requests on to
the remote fam.
fam can also notify its clients when a file starts and stops execution.
(The IRIX Interactive Desktop uses this to change a program's icon while it's
running, for example.)
fam was originally written for IRIX in 1989 by Bruce Karsh, and was
rewritten in 1995 by Bob Miller. This open-source release of fam builds and
runs on both Linux and IRIX, and is the same fam that will be included with
IRIX 6.5.8.
source: http://oss.sgi.com/projects/fam/faq.html
imon, the Inode Monitor, is the part of the kernel that tells fam when
files have changed. When applications tell fam they're interested in files or
directories, fam passes that interest on to imon. When file operations are
performed on files monitored by imon, the kernel tells imon; imon tells fam,
and fam notifies the applications which are interested in the files.
imon was originally written for the IRIX kernel in 1989 by Wiltse Carpenter;
the Linux port was done by Roger Chickering. The Linux implementation in the
imon kernel patch is similar to the IRIX implementation in most ways, but it
hooks into the kernel filesystem code differently.
FAM and IMON are both available from SGI's website. See Resources below.
IMON is a patch that you apply to your kernel. It adds the ability for your kernel to monitor inodes.
To patch the kernel, cd to your kernel sources directory
and apply the patch:
cd /usr/src/linux
patch -p1 < patchfile
Then run make config or make menuconfig and enable
Inode Monitor (imon) support (EXPERIMENTAL)
in the FileSystems section.
Compile the kernel as usual and reboot (sorry).
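If you are nervous about patching your kernel tree, you can test the patch first. This is only a sketch: apply_kernel_patch is a name I made up, and it assumes GNU patch, whose --dry-run option checks every hunk without touching any file. Pass the patch file as an absolute path, because the helper changes directory before reading it.

```shell
#!/bin/sh
# Hypothetical helper: apply a kernel patch only if a dry run succeeds.
# $1 = kernel source directory, $2 = absolute path to the patch file.
apply_kernel_patch() {
    (
        cd "$1" || exit 1
        # First pass: verify that all hunks apply, change nothing.
        patch -p1 --dry-run < "$2" > /dev/null || exit 1
        # Second pass: really apply the patch.
        patch -p1 < "$2"
    )
}
```

For the setup described here, you would call it as apply_kernel_patch /usr/src/linux followed by the full path to the imon patch file.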
Compiling FAM itself is pretty simple.
cd to the fam sources directory and run
./configure && make all install
Voilà, it's installed.
Next we will install a Perl module called SGI::FAM, so we can write our event handler in perl.
You didn't really think I would ask you to code in C/C++, did you?
Well, I don't know about you, but I am too lazy and impatient, so I will write my replication handler in Perl.
Download and install SGI::FAM by Jesse N. Glick
To install it, simply start the CPAN shell and run the install command:
perl -MCPAN -e shell
install SGI::FAM
This should install SGI::FAM and all prerequisite modules.
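If you prefer to script the installation, something like the following could work. The helper names have_perl_module and ensure_sgi_fam are my own invention; the sketch only relies on perl -M to test whether a module loads and on the install function exported by the CPAN module. It assumes CPAN is already configured on the machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if Perl can load the module named in $1.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Hypothetical helper: install SGI::FAM through CPAN only when it is
# missing, then check again that it really loads.
ensure_sgi_fam() {
    have_perl_module SGI::FAM && return 0
    perl -MCPAN -e 'install("SGI::FAM")' && have_perl_module SGI::FAM
}
```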
fam_mirror is a script that I wrote to automate the replication.
You can view
or download it here.
You can edit it and
change @replicaHosts to match your hosts,
change $rsh to whatever command you use to run commands on another machine (rsh or ssh),
and do the same with $rsync.
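To show how these three settings work together, here is a rough shell sketch of what has to happen for every change notification. This is not the code of fam_mirror (which is written in Perl); REPLICA_HOSTS, RSH, RSYNC and push_change are hypothetical shell stand-ins, and the sketch assumes that paths contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical shell analogues of the fam_mirror settings.
REPLICA_HOSTS="web2"
RSH="ssh"
RSYNC="rsync"

# push_change EVENT PATH
# EVENT is "changed" (file created or modified) or "deleted".
push_change() {
    event=$1
    path=$2
    for host in $REPLICA_HOSTS; do
        case $event in
            # Copy the one file that changed, using $RSH as transport.
            changed) $RSYNC -a -e "$RSH" "$path" "$host:$path" ;;
            # Remove the file on the replica as well.
            deleted) $RSH "$host" rm -f -- "$path" ;;
            *) echo "unknown event: $event" >&2; return 1 ;;
        esac
    done
}
```

The point is that work happens only when an event arrives, and only for the file named in that event.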
So back to scenario 1
2 machines run as webservers (web1, web2): one as master (web1) and the other as slave (web2).
The primary FTP server is web1.
web2 does not run an FTP service at all (otherwise users might try to write to files even when the system is in backup mode).
The web document root on both machines is /var/www.
Set up rsh or ssh on both machines. web2 should allow web1 to run remote commands without a password. I usually add my ssh key to the authorized_keys of the replica hosts.
Rsync all data from web1 to web2:
rsync -avz /var/www/ web2:/var/www/
Edit fam_mirror and change @replicaHosts to
@replicaHosts=qw(web2);
Run fam_mirror on web1:
fam_mirror /var/www &
Then make changes to files on web1. All changes will also be written to web2.
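Before starting fam_mirror it is worth checking that the passwordless login really works, because a password prompt would stall the replication. The following is a small sketch of such a check combined with the initial rsync; can_run_remote and seed_replica are hypothetical names, and the sketch assumes OpenSSH, where BatchMode=yes makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if host $1 accepts a login without
# any prompt. "true" is simply a remote command that does nothing.
can_run_remote() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Hypothetical helper: copy the document root $2 to host $1, but only
# after the login check has passed.
seed_replica() {
    host=$1
    docroot=$2
    can_run_remote "$host" || {
        echo "no passwordless login to $host" >&2
        return 1
    }
    rsync -avz "$docroot/" "$host:$docroot/"
}
```

On web1 you would run seed_replica web2 /var/www once, and then start fam_mirror.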
Now to scenario 2 (A farm of webservers)
Hosts "linuxweb1", "linuxweb2", "linuxweb3" and "linuxweb4" run as webservers.
Host "linuxftp1" runs as the FTP server (main fileserver).
The web hosts do not allow users to FTP.
Install fam, imon, SGI::FAM and fam_mirror on host "linuxftp1".
Set up rsh or ssh between the machines.
Hosts linuxweb[1-4] should allow linuxftp1 to run remote commands without prompting for a password.
Edit fam_mirror and set @replicaHosts to
@replicaHosts=qw(linuxweb1 linuxweb2 linuxweb3 linuxweb4);
Change $rsh and $rsync if necessary.
Assuming that the web document root is /var/www on all machines,
run on linuxftp1:
INIT_MIRROR=1 fam_mirror /var/www &
Now all changes on linuxftp1 should be visible on linuxweb[1-4].
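If you want to convince yourself that the farm really is in sync, an rsync dry run can tell you. This is a sketch, not part of fam_mirror: out_of_sync_hosts and REPLICA_HOSTS are hypothetical names. It uses rsync's -n (dry run) together with -i (itemize changes), so rsync prints one line per item it would transfer or delete and prints nothing when the replica matches.

```shell
#!/bin/sh
# Hypothetical list of replicas, mirroring @replicaHosts above.
REPLICA_HOSTS="linuxweb1 linuxweb2 linuxweb3 linuxweb4"

# Hypothetical helper: print every host whose copy of the document
# root $1 differs from the local one. Nothing is copied or deleted.
out_of_sync_hosts() {
    docroot=$1
    for host in $REPLICA_HOSTS; do
        diff_list=$(rsync -a -n -i --delete "$docroot/" "$host:$docroot/") || {
            echo "$host (unreachable)"
            continue
        }
        [ -z "$diff_list" ] || echo "$host"
    done
}
```

Running out_of_sync_hosts /var/www on linuxftp1 should print nothing once all the replicas have caught up.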