High Performance Computing
High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
The facility is a Rocks cluster (Rocks version 6.1 with CentOS 6.3, 64-bit), which is an implementation of a "Beowulf" cluster, running the Sun Grid Engine scheduler for job submission. It has 10 nodes: a master node with 24 GB of RAM and nine compute nodes with 8 GB of RAM each. Each node is a dual six-core Intel® Xeon E5645 series 2.40 GHz rack server.
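As a quick check of the aggregate resources these figures imply, the arithmetic can be sketched in Python. The variable names are illustrative only; the counts are taken from the description above, and only physical cores are counted (hyper-threading is ignored).

```python
# Aggregate resources implied by the node description above.
# Names are illustrative; physical cores only (hyper-threading ignored).
MASTER_NODES = 1
COMPUTE_NODES = 9
SOCKETS_PER_NODE = 2     # "dual" processor servers
CORES_PER_SOCKET = 6     # six-core Xeon E5645
MASTER_RAM_GB = 24
COMPUTE_RAM_GB = 8

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = (MASTER_NODES + COMPUTE_NODES) * cores_per_node
compute_cores = COMPUTE_NODES * cores_per_node
total_ram_gb = MASTER_NODES * MASTER_RAM_GB + COMPUTE_NODES * COMPUTE_RAM_GB

print(f"cores per node:      {cores_per_node}")   # 12
print(f"cores, all nodes:    {total_cores}")      # 120
print(f"cores, compute only: {compute_cores}")    # 108
print(f"total RAM (GB):      {total_ram_gb}")     # 96
```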
The cluster has a front-end node and compute nodes. The front-end node is where users log in, submit jobs, compile code, and so on. This node can also act as a router for the other cluster nodes by using network address translation (NAT). Compute nodes are the workhorse nodes. The Rocks management scheme allows the complete OS to be reinstalled on every compute node in a short amount of time (about 10 minutes). The compute nodes are not visible on the public Internet.
Computational work is submitted by users from the login/master node to the compute nodes via a batch system. The cluster is accessed remotely via SSH. Users authenticate (i.e., log in) using an SSH client; after successful authentication, a command-line interface is presented. This interface can be used to submit computational jobs to the batch system queues.
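A minimal sketch of submitting a job from the login node, assuming the batch system is Grid Engine and that its `qsub` command prints its usual confirmation line. The helper names `parse_job_id` and `submit`, the script name `hello.sh`, and the job number are hypothetical.

```python
import re
import subprocess

# Confirmation line that Grid Engine's qsub normally prints, for example:
#   Your job 4711 ("hello.sh") has been submitted
# Array jobs print "Your job-array 4711.1-10:1 (...)" instead.
# This format is an assumption about the installed scheduler version.
_SUBMITTED = re.compile(r"Your job(?:-array)? (\d+)")


def parse_job_id(qsub_output: str) -> int:
    """Extract the numeric job ID from qsub's confirmation message."""
    match = _SUBMITTED.search(qsub_output)
    if match is None:
        raise ValueError(f"unrecognised qsub output: {qsub_output!r}")
    return int(match.group(1))


def submit(script_path: str) -> int:
    """Submit a job script with qsub (only works on the login node)."""
    result = subprocess.run(
        ["qsub", script_path], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


# Parsing can be tried anywhere, without a scheduler installed:
print(parse_job_id('Your job 4711 ("hello.sh") has been submitted'))
```

On the login node itself, `submit("hello.sh")` would hand the script to the scheduler and return the job ID for later monitoring.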
The cluster uses a 10-gigabit Ethernet network for MPI traffic. This network is dedicated to fast message passing, with a theoretical peak bandwidth of 10 gigabits per second (about 1.25 gigabytes per second). The cluster also uses shared data storage, which significantly reduces the management overhead of a large cluster because there is no need to copy data to every node before running jobs.
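For a sense of scale, the theoretical peak of a single 10-gigabit Ethernet link can be converted to bytes per second. This is a back-of-the-envelope sketch that ignores protocol overhead and contention, so real MPI throughput will be lower; the function names and the 100 GB dataset size are illustrative.

```python
# Theoretical peak of one 10-gigabit Ethernet link, ignoring protocol overhead.
LINK_GBIT_PER_S = 10
BITS_PER_BYTE = 8


def peak_gbyte_per_s(link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Convert a link speed in gigabits/s to gigabytes/s."""
    return link_gbit_per_s / BITS_PER_BYTE


def min_transfer_seconds(size_gbyte: float,
                         link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Lower bound on the time needed to move size_gbyte over the link."""
    return size_gbyte / peak_gbyte_per_s(link_gbit_per_s)


print(peak_gbyte_per_s())         # 1.25
print(min_transfer_seconds(100))  # 80.0
```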
The cluster can also be viewed from the website, which shows the configuration of the nodes and their current status.
PROTECTING DATA AND INFORMATION
The software installed on the HPC cluster is designed to help multiple users efficiently share a large pool of compute nodes, ensuring that resources are fairly available to everyone within certain configured parameters. Security between users and groups is strictly maintained, providing you with mechanisms to control which data is shared with collaborators or kept private.
When using the HPC cluster, you have a responsibility to adhere to the information security policy for your site, which outlines acceptable behaviour and provides guidelines designed to permit maximum flexibility for users while maintaining a good level of service for all. Users are encouraged to:
Authentication and Security
For the cluster, you will have a username and password to enable you to authenticate and run computational jobs. For security reasons, as soon as you have received your credentials for a system, you should log in and change your password.
Secure Shell (SSH) is used to connect to remote computers, i.e., to authenticate (login) and interact with the remote system.
It is likely that you will wish to upload files to the HPC system, or download them to your desktop/laptop. Linux users can do this with the OpenSSH utilities suite (for example, scp and sftp); Windows users can use a client such as WinSCP.
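A minimal sketch of building OpenSSH `scp` command lines for upload and download. The user name `alice`, the host name `hpc.example.org`, the file paths, and the helper names are all hypothetical placeholders; substitute the details supplied with your account.

```python
import shlex


def scp_upload_cmd(local_path, user, host, remote_path):
    """Command line that copies a local file to the cluster."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


def scp_download_cmd(user, host, remote_path, local_path):
    """Command line that copies a file from the cluster to this machine."""
    return ["scp", f"{user}@{host}:{remote_path}", local_path]


# Hypothetical account and host, for illustration only.
upload = scp_upload_cmd("input.dat", "alice", "hpc.example.org", "work/input.dat")
download = scp_download_cmd("alice", "hpc.example.org", "work/result.out", ".")

print(shlex.join(upload))
print(shlex.join(download))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to perform the transfer; `scp` will then prompt for the password or use an SSH key.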
Running Computational Jobs
The HPC system is a shared computational resource. To ensure everyone gets a fair share and to allow the system to function correctly:
Users can run a number of different types of job via the cluster scheduler, including Batch jobs, Array jobs, SMP jobs, Parallel jobs and Interactive jobs.
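A minimal sketch of how several of these job types might be written as Grid Engine job scripts, assuming the scheduler accepts the standard `#$` directives. The helper `sge_script`, the job names, the programs being run, and the parallel environment name `mpi` are hypothetical; parallel environment names are site-specific, so check which ones are configured on the cluster.

```python
def sge_script(command, name, array_tasks=None, pe=None, slots=None):
    """Return the text of a Grid Engine job script (illustrative helper)."""
    if (pe is None) != (slots is None):
        raise ValueError("pe and slots must be given together")
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",  # shell used to run the job
        f"#$ -N {name}",    # job name
        "#$ -cwd",          # run in the submission directory
        "#$ -j y",          # merge stderr into stdout
    ]
    if array_tasks is not None:
        # Array job: tasks 1..N, each task sees its index in $SGE_TASK_ID.
        lines.append(f"#$ -t 1-{array_tasks}")
    if pe is not None:
        # SMP or parallel job: request slots in a parallel environment;
        # the number granted is available to the job as $NSLOTS.
        lines.append(f"#$ -pe {pe} {slots}")
    lines.append(command)
    return "\n".join(lines) + "\n"


batch = sge_script("./simulate input.dat", name="batch_demo")
array = sge_script("./simulate input.$SGE_TASK_ID.dat",
                   name="array_demo", array_tasks=10)
parallel = sge_script("mpirun -np $NSLOTS ./simulate_mpi",
                      name="mpi_demo", pe="mpi", slots=24)

print(array)
```

Each script would be saved to a file and submitted with `qsub`. Interactive jobs are usually started with `qlogin` or `qrsh` rather than a script.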
Monitoring and Managing Jobs
When submitting jobs to the batch system, one may wish to know: Are my jobs running? What other jobs are running? Which queues are busy? When will my job run?
It may also be necessary to remove a waiting job from the queue, or to stop a running job. Both can be done through the batch system.
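A minimal sketch of the monitoring and removal commands, assuming the Grid Engine `qstat` and `qdel` tools are available on the login node. The helper names, the user `alice`, the job numbers, and the sample listing are hypothetical, and the parser assumes the default `qstat` column layout. Note that `qstat -j` reports scheduling details for a waiting job but may not give a start time.

```python
def status_commands(user, job_id):
    """Map the questions above to Grid Engine command lines."""
    return {
        "my jobs": ["qstat", "-u", user],
        "all jobs": ["qstat", "-u", "*"],        # jobs of every user
        "queue summary": ["qstat", "-g", "c"],   # load per cluster queue
        "job details": ["qstat", "-j", str(job_id)],
        "delete job": ["qdel", str(job_id)],     # waiting or running job
    }


def parse_qstat(text):
    """Parse default qstat output into a list of job dictionaries."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and blank lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        jobs.append({"id": int(fields[0]), "name": fields[2],
                     "user": fields[3], "state": fields[4]})
    return jobs


# Hypothetical listing in the default layout ("r" running, "qw" waiting).
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                   slots
-----------------------------------------------------------------------------------------------
   4711 0.55500 hello.sh   alice        r     01/15/2024 10:12:03 all.q@compute-0-1.local     1
   4712 0.00000 sim.sh     alice        qw    01/15/2024 10:15:40                             4
"""

for job in parse_qstat(SAMPLE):
    print(job["id"], job["name"], job["state"])
```

The command lists can be passed to `subprocess.run` on the login node; passing `"*"` as a list element avoids shell expansion of the wildcard.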