Princeton University has purchased licenses for MATLAB as well as a substantial number of MATLAB Toolboxes. This software can be used on the CS public-use servers (cycles), including the ionic cluster, as well as on individual users' machines. Installation instructions and detailed licensing information can be found on OIT's Princeton Software MATLAB page.
MATLAB Distributed Computing Server
MATLAB Distributed Computing Server (MDCS) is available on the ionic cluster. Using the Parallel Computing Toolbox (PCT), it is possible to run up to 32 MATLAB worker processes on one or more of the cluster's compute nodes. In order to run your job across multiple nodes, you will need to encapsulate your MATLAB job in one or more functions. Included below are examples of running simple parfor loops, as well as a simple example that can be expanded for non-parfor parallel jobs.
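For reference, here is a minimal sketch of what such a function might look like. The body is only a placeholder (the loop sums an arbitrary per-iteration quantity); the examples below assume the function is named myParforLab and saved in myParforLab.m:

function total = myParforLab()
% Placeholder parfor-based computation: the loop iterations are
% independent, so the pool's workers can execute them in parallel.
total = 0;
parfor i = 1:1000
    total = total + sin(i)^2;   % stand-in for your real per-iteration work
end
end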
Running an MDCS job on the cluster can be done in one of two ways, depending on how many processor cores you wish to use. If you need 8 cores or fewer, you can create a standard "qsub" file that you then submit to the batch system. If, however, you wish to use more than 8 cores, you must use MATLAB PCT code to build your job and have MATLAB submit it to the batch queue for you. This requires that you run MATLAB interactively on the head node of the cluster. Normally this is discouraged, as running computationally intensive jobs on the head node impairs the ability of the cluster to function. However, if you are simply running MATLAB on the head node to submit jobs, there should be no impact on other users.
Example: A 4-core parfor
Assuming that you have a parfor-based lab encapsulated in a function named myParforLab, contained in a file myParforLab.m in the same directory as the "qsub" file, and that the function takes no inputs and returns a single output, you can use a "qsub" file similar to the one below to submit your job on the cluster. Note: be sure to replace YOUR_USERNAME_HERE with your CS unix username.
#!/bin/csh
#
#***
#*** "#PBS" lines must come before any non-blank, non-comment lines ***
#***
#
# 1 node, 4 CPUs, wall clock time of 2 hours, 4 GB of memory
#
#PBS -l walltime=2:00:00,nodes=1:ppn=4,mem=4gb
#
# merge STDERR into STDOUT file
#PBS -j oe
#
# send mail if the process aborts, when it begins, and
# when it ends (abe)
#PBS -m abe
#PBS -M YOUR_USERNAME_HERE@CS.Princeton.EDU
#
if ($?PBS_JOBID) then
    echo "Starting" $PBS_JOBID "at" `date` "on" `hostname`
    echo ""
else
    echo "This script must be run as a PBS job"
    exit 1
endif

cd $PBS_O_WORKDIR

matlab -nosplash -nodisplay <<EOF
% the size of the pool should equal the PBS ppn
matlabpool open local 4
% call the function
myParforLab
% close the pool
matlabpool close
EOF

echo ""
echo "Done at " `date`
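Assuming the script above is saved in a file named, say, runParforLab.cmd (the name is arbitrary), you would submit it to the batch system from the head node with the standard Torque command:

qsub runParforLab.cmd

Because the script uses -j oe, STDOUT and STDERR are merged into a single output file, which Torque names after the script and job ID (e.g. runParforLab.cmd.o12345) and writes once the job completes.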
Example: A 16-core parfor
Assuming, again, that you have a parfor-based lab encapsulated in a function named myParforLab, contained in a file myParforLab.m in your current working directory, and that the function takes no inputs and returns a single output, you can use MATLAB code similar to the following to submit your job on the cluster. Actually submitting the job to the cluster takes two steps.
First: Create a file, let's call it parforJob.m, that defines the job using MATLAB code rather than "qsub" semantics. The ResourceTemplate specifies the PBS job parameters for the number of nodes and cores and the amount of memory required. Be sure to adjust these values for your own job.
sched = findResource('scheduler','type','torque');
sched.ClusterSize = 16;
sched.HasSharedFilesystem = true;
sched.ClusterMatlabRoot = '/usr/local/matlab';
sched.ResourceTemplate = '-l nodes=4:ppn=4,mem=6gb';
sched.RcpCommand = 'scp';
sched.RshCommand = 'ssh';

% Construct a MATLAB pool job object and set file dependencies
job = createMatlabPoolJob(sched);
set(job,'FileDependencies',{'myParforLab.m'});

% Set the number of workers required for parallel execution.
job.MinimumNumberOfWorkers = 16;
job.MaximumNumberOfWorkers = 16;

% Add the task to the job.
task = createTask(job, @myParforLab, 1, {});

% Run the job.
submit(job);

% Wait until the job is finished.
waitForState(job, 'finished');

% Retrieve the job results.
out = getAllOutputArguments(job);

% Display the output.
celldisp(out);

% Destroy the job.
destroy(job);
Second: On the c2 head node, run the following command from the directory containing both myParforLab.m and parforJob.m:
matlab -nosplash -nodisplay < parforJob.m > parforJob.out
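MATLAB blocks at the waitForState call until the job finishes, so this command will not return immediately. While it is waiting, you can check the state of the underlying Torque job from another shell on the head node with:

qstat -u YOUR_USERNAME_HERE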
The output of your job will be in the file parforJob.out. If your job does not run properly, comment out the destroy(job) line and run your job again. MATLAB will create a "JobN" directory and a number of files named "JobN.*", where N is a small integer. Look in the file "JobN/JobN.log" for error information.
Example: Simple Parallel Job on 10 Cores
This example is derived from the createParallelJob documentation page. The original example has been expanded to use Torque/PBS, as in the example above, in order to use more cores than are available on a single machine. It could be scaled back to something closer to the original, and wrapped in a "qsub" script, if you only need the CPU resources of a single 4- or 8-core machine.
sched = findResource('scheduler','type','torque');
sched.ClusterSize = 10;
sched.HasSharedFilesystem = true;
sched.ClusterMatlabRoot = '/usr/local/matlab';
sched.ResourceTemplate = '-l nodes=5:ppn=2,mem=1gb';
sched.RcpCommand = 'scp';
sched.RshCommand = 'ssh';

% Construct a parallel job object
job = createParallelJob(sched);

% Set the number of workers required for parallel execution.
job.MinimumNumberOfWorkers = 10;
job.MaximumNumberOfWorkers = 10;

% Add the task to the job.
task = createTask(job, 'rand', 1, {3});

% Run the job.
submit(job);

% Wait until the job is finished.
waitForState(job, 'finished');

% Retrieve the job results.
out = getAllOutputArguments(job);

% Display the output.
celldisp(out);

% Destroy the job.
destroy(job);
Assuming that the above was saved in a file named myJob.m, you would submit it to the cluster as in the above example, from the head node:
matlab -nosplash -nodisplay < myJob.m > myJob.out
If your job fails to run properly, you can follow the same debugging steps as above.
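To expand this example into a real non-parfor parallel job, replace the built-in rand task with a function of your own. In a parallel job every worker runs the same task function, and the PCT functions labindex (this worker's index) and numlabs (the total number of workers) let each worker pick out its own share of the work. The sketch below is only illustrative; the function name myParallelTask and the striped partitioning are assumptions, not part of the original example:

function out = myParallelTask(data)
% Runs on every worker of the parallel job. labindex ranges from
% 1 to numlabs, so striding by numlabs gives each worker a
% disjoint slice of the input vector.
chunk = data(labindex:numlabs:end);
out = sum(chunk);   % stand-in for your real per-worker computation
end

You would then save this as myParallelTask.m, add it to the job's FileDependencies as in the 16-core example, and create the task with createTask(job, @myParallelTask, 1, {yourData}). getAllOutputArguments returns one cell per worker, holding each worker's partial result.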