May 14, 2007
Franklin Annex Power Outage May 19th 8am - 12pm
POWER OUTAGE on Saturday May 19, 8AM-12NOON. There is no confirmation that
this will or will not affect FBA P114 and FBA P121 machine rooms. Please
plan accordingly; CETS' plan is to shut down our clusters, and be on hand
(and in communication with facilities) during the outage to confirm whether
or not this particular riser affects those machine rooms. This will prevent
future guesswork and confusion.
Regards,
Dan Widyono
----- Forwarded message from Mike Ferraiolo <mikeferr@pobox.upenn.edu> -----
From: Mike Ferraiolo <mikeferr@pobox.upenn.edu>
Subject: RE: Franklin Building electrical riser shut down (MAY 19TH)
Reply-To: mikeferr@pobox.upenn.edu
Please be advised that the electrical riser shut down in the Franklin
Building will occur on Saturday May 19, 2007 from 8:00AM to 12:00NOON.
This is a 480V buss duct riser that feeds the 277v lighting and 120-208
outlets. The work will entail the installation of a 60A isolation breaker
that will be for the Development Office 5th fl Server.
Please assume that all power will be affected in both Franklin Building and
Franklin Annex for the maximum duration of 4 hours.
----- End forwarded message -----
Posted by ssc_upenn at 12:29 PM | Comments (0) | TrackBack
November 6, 2006
Power Outage Scheduled for Clusters 11/11 2006
This just in...
-------------------------------
Dear FBA Cluster Owners:
Please take note of the scheduled power outage for both
FBA P121 and FBA P114. You should assume that it will
happen unless I post otherwise.
It is the responsibility of each cluster owner to
protect their own equipment in the manner that they
wish. For a two hour outage, this probably means
shutting down the cluster over this time period.
Thanks, John
The Details:
All,
I would like to propose *Saturday, November 11th from 7AM-9AM* as the rescheduled date and time for the electrical shutdown. Details below:
Buildings: Franklin Building and Franklin Building Annex
Utility: Electric
Date: Saturday, November 11, 2006
Time/Duration: 7:00AM – 9:00AM
Areas Affected: All areas of both buildings. All lights and power will be out for 2 hours.
Reason for Shutdown: Landair Wireless and Liberty Electric will install a new circuit breaker into existing switchgear in the 8th floor mechanical room, which is needed to provide power to the Nextel / Sprint wireless antenna installation on the roof.
This means both the old cluster room (jove and momax) and the new cluster room (nemeth) will be affected. I will shut down the clusters at close of business on Friday the 10th, and start them up at the beginning of the day on Monday the 13th.
Posted by ssc_upenn at 3:32 PM | Comments (0) | TrackBack
November 15, 2005
intel compilers on jove.pop.upenn.edu
I have installed the Intel version 9 compilers on jove.pop.upenn.edu. They are installed in /opt/intel, so the next step is to build mpich against them. I will post updates as they become available.
Remarkably, there were no dependency problems with the installation of the compilers themselves.
Update: mpich has been compiled using the Intel compilers. The mpich build is in /usr/local/mpich/mpich_intel, so those who want to test the Intel-built MPI can do so. I'll work on porting the pgi environment scripts to Intel, so that this testing will be easier.
Posted by ssc_upenn at 1:47 PM | Comments (0) | TrackBack
November 9, 2005
What is the difference between a processor and a node?
When discussing the Beowulf clusters, the distinction between a processor and a node is a fundamental one.
A cluster can be thought of as simply a stack of computers. A node is simply a different name for a computer. For example, jove.pop.upenn.edu is a cluster made up of 13 nodes, or computers. In mpich, which we use on the Econ clusters, you identify a node in the context of the machine file.
The processor is the part of the computer that does the computation work; you may also hear it referred to as the CPU. A node can have more than one CPU in it; for example, jove.pop.upenn.edu and momax.pop.upenn.edu have two CPUs in each node, while nemeth has four CPUs in each node. mpich allows you to specify exactly how many processors you want to use for a particular job with the -np option.
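To make the arithmetic concrete, the most processors a job can use on a cluster is the number of nodes times the CPUs per node. The sketch below works this out for jove, using the figures given above; the function name total_cpus is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: total processors = nodes * CPUs per node.
total_cpus() {
    nodes=$1
    cpus_per_node=$2
    echo $(( nodes * cpus_per_node ))
}

# jove.pop.upenn.edu: 13 nodes with 2 CPUs each (figures from this post).
total_cpus 13 2
```

For jove this prints 26, which is the largest processor count it makes sense to request there.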
Posted by ssc_upenn at 11:42 AM | Comments (1) | TrackBack
How do I tell the cluster to only use certain machines when running my code?
My professor has told me that I can only use the first five nodes of the cluster max.econ.upenn.edu. How do I make my code obey this edict?
On the ssc.upenn.edu-managed Beowulf clusters, we use the mpich package to run jobs across multiple nodes. The way to solve your problem is to create a machine file. An example machine file which would work in this instance would look like this:
maxsl1-d 2
maxsl2-d 2
maxsl3-d 2
maxsl4-d 2
maxsl5-d 2
You could save this file to your home directory on max.econ.upenn.edu, calling it machines.max5. Then when you wanted to start running your program, you would use the following command:
# mpirun -machinefile ~/machines.max5 -np 10 myjob
This command says "Run myjob on 10 processors, drawn from the machines listed in the file machines.max5 in the top level of my home directory."
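If you would rather not type the machine file by hand, a short shell function can print one for the first N nodes. This is only a sketch: it assumes the node names follow the maxsl<N>-d pattern from the example above and that each line holds a hostname followed by a CPU count, as in that example. The function name make_machinefile is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a machine file for the first N nodes.
# Assumes node names follow the maxsl<N>-d pattern and that each line is
# "<hostname> <cpus>", mirroring the example machine file in this post.
make_machinefile() {
    count=$1
    cpus=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "maxsl${i}-d ${cpus}"
        i=$(( i + 1 ))
    done
}

# Print a machine file for the first five nodes, two CPUs each.
make_machinefile 5 2
```

Redirecting the output (for example, make_machinefile 5 2 > ~/machines.max5) saves it as a file that you can pass to mpirun.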
Posted by ssc_upenn at 11:31 AM | Comments (0) | TrackBack
November 8, 2005
jove.pop.upenn.edu node 6 returns
Node 6 for the cluster jove.pop.upenn.edu has been returned to us from Aspen.
Node 6 will be reinstalled (and restored to the machines.LINUX file) tomorrow morning by 11am.
This message also posted to the ssc-cluster-contacts list.
Update: node 6 was replaced before 11am, and has been made available in the machines_ files for use.
Posted by ssc_upenn at 3:22 PM | Comments (0) | TrackBack
October 25, 2005
FBA Cluster Outage, Part II
The electrical service which was supposed to have been performed in Franklin Annex this morning did not occur. I have been informed that it *will* occur tomorrow morning. This requires the shutdown of the cluster prior to 6am. The cluster will be restarted at 9am tomorrow morning. I will send a note when the clusters are available again.
The affected clusters are nemeth, jove, and momax.
This message was also posted to the ssc-cluster-contacts list.
Update: The affected clusters were brought online at 9am this morning.
Posted by ssc_upenn at 2:30 PM | Comments (0) | TrackBack
October 24, 2005
Cluster Outages 10/25 FBA
The clusters housed in the Franklin Building Annex:
jove.pop.upenn.edu
momax.pop.upenn.edu
nemeth.pop.upenn.edu
will be unavailable between 6am and 8am tomorrow morning while electrical service is performed.
Please make sure that you have stopped your jobs before 6am.
This notice has also gone to the ssc-cluster-contacts list.
Update: The FBA clusters were restarted and operational by 9 am.
Posted by ssc_upenn at 8:28 PM | Comments (0) | TrackBack
April 1, 2005
Cluster Backups
Does SSC back up the information in my home directory on the Beowulf clusters? If not, how can I do it myself?
SSC does not back up the home directories on the Beowulf clusters. Cluster users should take steps to make sure that their data are backed up to another machine in case of a cluster failure.
One way to do this is to use a UNIX utility called 'rsync' in concert with SSH to copy your home directory to another UNIX location (your home directory on lambic, say) and then to synchronize the changes nightly. Below are instructions on how to set this up. Substitute your userid for $username and the name of your cluster for $cluster. Lines beginning with # are shell commands you type.
Step 1
Log into lambic.ssc.upenn.edu (or another UNIX server, if you have access to one) using SecureCRT or another SSH client. This is the server you will be backing your cluster files to.
# cd /home/$username
Step 2
Check to see if you already have a public/private key pair on this server.
# cd .ssh
# ls *.pub
Look for a file called id_rsa.pub or id_dsa.pub; these are public key files. If you find a public key, skip to step 3, below. If not, follow step 2a.
Step 2a
Generate a public key on the UNIX server.
# ssh-keygen -t dsa -N "" -b 1024
This generates a public/private keypair for your userid, and places it in the expected location, above.
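The check in Steps 2 and 2a can also be scripted. The sketch below only reports whether one of the two public key files named in Step 2 exists; it does not generate or copy anything. The function name find_pubkey is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first public key found in the
# given directory, checking the two file names mentioned in Step 2.
# Returns non-zero if neither file exists.
find_pubkey() {
    dir=$1
    for name in id_rsa.pub id_dsa.pub; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Report which step to follow next, based on what is in ~/.ssh.
if key=$(find_pubkey "$HOME/.ssh"); then
    echo "Found existing public key: $key (skip to Step 3)"
else
    echo "No public key found: generate one as in Step 2a"
fi
```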
Step 3
Move your public key to the cluster you wish to back up ($cluster).
# scp ~/.ssh/id_dsa.pub $username@$cluster.upenn.edu:/home/$username/.ssh/authorized_keys2
(This copies your public key from the UNIX server and places it on the cluster, which allows you to ssh in to the cluster without having to type a password. This will prove useful in the next step. Note that it overwrites any authorized_keys2 file already present on the cluster.)
Step 4
Back up your home directory on the cluster to the UNIX server you logged into in Step 1, above.
# mkdir /home/$username/$cluster_backup
This step creates a directory on the local UNIX server where your backup files will be stored.
# rsync -avze ssh $username@$cluster.upenn.edu:/home/$username/ /home/$username/$cluster_backup
Once you run the second command, you should see the message 'generating file list...' and then the names of the files that are being backed up whiz by. Because you've made your public key from the local UNIX server one of the cluster's authorized keys for your username, you should not have to provide your password.
Once you're satisfied that this has worked properly, you can add an entry to cron along these lines (using the command 'crontab -e'):
18 3 * * * /path/to/rsync -avze ssh $username@$cluster.upenn.edu:/home/$username/ /home/$username/$cluster_backup
which would run the backup at 3:18am every morning.
(Remember to substitute your username and the name of the cluster you wish to back up in the above examples.)
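To check the substitution before putting anything into cron, it can help to print the finished command first. The sketch below only builds and prints the rsync command line; it does not contact the cluster. The function name backup_command and the values jdoe and max are placeholders for illustration, and the layout assumes the remote home directory is the source and a local backup directory is the destination.

```shell
#!/bin/sh
# Hypothetical helper: print the rsync command for a given user, cluster,
# and local destination directory. Nothing is executed or copied here.
backup_command() {
    user=$1
    cluster=$2
    dest=$3
    echo "rsync -avze ssh ${user}@${cluster}.upenn.edu:/home/${user}/ ${dest}"
}

# Placeholder values: user "jdoe" backing up the cluster "max".
backup_command jdoe max /home/jdoe/max_backup
```

Once the printed line looks right, it can be run by hand and then pasted into the crontab entry.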
If there are problems or questions, please don't hesitate to contact SSC-help.
Posted by ssc_upenn at 10:47 AM | Comments (0)