I've recently been fixing up a few things at work and setting up mirrors for security updates, regular updates and so on. The distro we currently use is Scientific Linux (http://www.scientificlinux.org), since it's just a RHEL recompile with some extra features added.
It's not a bad choice for building HPC machines, since it gives you binary compatibility with a lot of things, e.g. clusterfs and GPFS, and other nice utilities like pdsh just work. The OS is well defined and supported.
In short, since we run a few hundred SL4-based machines at work, we mirror our distros of choice on a local system. To do this, I chose to just rsync from a local mirror, ftp.heanet.ie.
Here's my shell script for running rsync:
#!/bin/sh
# ftp.heanet.ie::mirrors/rsync.scientificlinux.org/
SERVER=ftp.heanet.ie
SOURCE=mirrors/rsync.scientificlinux.org
# Print a banner. The text must be quoted: an unquoted # starts a shell comment.
echo "################"
echo "##"
echo "## $SERVER - $SOURCE"
echo "##"
# Pass 1: fetch new and changed files; nothing is deleted locally.
rsync -arvHP --stats \
-4 -8 \
--exclude '3*/' \
--exclude '44/' \
--exclude '50/' \
--exclude 'sites/Fermi' \
--exclude 'errata/debuginfo' \
--exclude 'errata/obsolete' \
--exclude '4x/archives/debuginfo' \
--exclude '5x/archives/debuginfo' \
"${SERVER}::${SOURCE}/." ~/public_html/rsync.scientificlinux.org/.
#rsync.mirrorservice.org::sites/ftp.scientificlinux.org/linux/scientific/. ~/public_html/rsync.scientificlinux.org/.
# Pass 2: same transfer plus --delete, removing local files that are gone from the source.
rsync -arvHP --stats \
-4 -8 \
--delete \
--exclude '3*/' \
--exclude '44/' \
--exclude '50/' \
--exclude 'sites/Fermi' \
--exclude 'errata/debuginfo' \
--exclude 'errata/obsolete' \
--exclude '4x/archives/debuginfo' \
--exclude '5x/archives/debuginfo' \
"${SERVER}::${SOURCE}/." ~/public_html/rsync.scientificlinux.org/.
#rsync.mirrorservice.org::sites/ftp.scientificlinux.org/linux/scientific/. ~/public_html/rsync.scientificlinux.org/.
The above script runs as a normal user. I then created an alias in our web server so that the mirror appears as http://myservername.org/mirrors/. The script is quick and dirty and probably needs cleaning up. It is run as a batch job from a crontab with the command:
batch -f myscript.sh
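For illustration, a crontab entry along these lines would queue the script every night. The 02:00 schedule and the script path are assumptions made for the example, not the values actually in use. cron only queues the job; atd then starts it once the system load is low enough.

```
# m h dom mon dow  command
0 2 * * * batch -f "$HOME/myscript.sh"
```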
As you can see, I've excluded a few things. We don't care much for the 3x series of SL; I intend to mirror just the current material plus the releases one revision behind, for compatibility and migration reasons. The above script produces a mirror of approximately 195 GB. The last time I mirrored the entire SL site, it came to approximately 500 GB.
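Since both rsync passes repeat the same exclude list, one possible cleanup is to keep the patterns in a single file and pass it with rsync's --exclude-from option. This is only a sketch: the function names, the MIRROR_RUN switch and the temporary file are made up for the example, and unlike the original script it skips the --delete pass if the first pass fails.

```shell
#!/bin/sh
# Sketch: share one exclude list between both passes via --exclude-from.
SERVER=ftp.heanet.ie
SOURCE=mirrors/rsync.scientificlinux.org
DEST="$HOME/public_html/rsync.scientificlinux.org"

# Write the exclude patterns from the original script to file $1, one per line.
write_excludes() {
    cat > "$1" <<'EOF'
3*/
44/
50/
sites/Fermi
errata/debuginfo
errata/obsolete
4x/archives/debuginfo
5x/archives/debuginfo
EOF
}

# Run one pass. Extra arguments (e.g. --delete) are handed through to rsync.
# Expects EXCLUDES to name the pattern file.
mirror_pass() {
    rsync -aHvP --stats -4 -8 --exclude-from="$EXCLUDES" "$@" \
        "${SERVER}::${SOURCE}/." "$DEST/."
}

# Hypothetical switch: nothing is transferred unless MIRROR_RUN=1 is set.
if [ "${MIRROR_RUN:-0}" = 1 ]; then
    EXCLUDES=$(mktemp)
    write_excludes "$EXCLUDES"
    # Only prune local files if the first pass succeeded.
    mirror_pass && mirror_pass --delete
    rm -f "$EXCLUDES"
fi
```

Running it as `MIRROR_RUN=1 sh myscript.sh` would perform both passes; without the variable the file only defines the functions, which makes it easy to inspect the exclude list first.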
I also mirror the freshrpms site, which is around 90 GB in size.
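To check figures like these against what is actually on disk, a small helper along the following lines could be used. The function name is invented for the example, and the result is rounded down to whole gigabytes (counting 1 GB as 1024 x 1024 KB).

```shell
#!/bin/sh
# Print the disk usage of directory $1 in whole gigabytes (rounded down).
# du -sk reports the total in 1024-byte units.
mirror_size_gb() {
    kb=$(du -sk "$1" | awk '{print $1}')
    echo $(( kb / 1048576 ))
}
```

For example, `mirror_size_gb ~/public_html/rsync.scientificlinux.org` would report the size of the SL mirror directory used in the script above.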