CénitS - COMPUTAEX
Published on CénitS - COMPUTAEX (https://web.computaex.es)


User questions

This section answers common user questions.

How do I stop a job in the queue manager?

To stop a job, whether because of an error or for any other reason, run the following command:

  • $ bkill id_job
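If you submit jobs from a script, it can help to record the job ID at submission time so you can pass it to bkill later. The following is a minimal sketch. It assumes bsub prints its usual confirmation line of the form `Job <ID> is submitted to ...`. The function name job_id_from_bsub and the script name job.sh are hypothetical.

```shell
# Print the numeric job ID found in bsub's confirmation line, for example:
#   Job <1234> is submitted to queue <normal>.
# Prints nothing if the input does not have that form.
job_id_from_bsub() {
  sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Usage inside a submission script (job.sh is a placeholder name):
#   id=$(bsub < job.sh | job_id_from_bsub)
#   echo "submitted job $id"
#   bkill "$id"    # stop it later if something goes wrong
```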

How can I view the output of a job that is running?

The output of a running job can only be viewed with the following queue manager command:

  • $ bpeek [id_job]

If id_job is omitted, bpeek displays the output of your most recently submitted job that has not yet finished.
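bpeek also accepts -f, which keeps displaying the output as the job writes it, in the same way as `tail -f`, and exits when the job finishes. The wrapper below is a minimal sketch. The name follow_job and the BPEEK override are hypothetical. The override exists only so you can try the function without LSF by setting BPEEK=echo.

```shell
# Follow a running job's output as it is written (bpeek -f behaves like
# `tail -f` and stops when the job finishes). With no job ID, bpeek uses
# your most recent unfinished job. BPEEK can be overridden for testing.
follow_job() {
  "${BPEEK:-bpeek}" -f "$@"
}

# Usage:
#   follow_job 1234    # stop following with Ctrl-C
```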

Set environment variables for Intel Fortran compiler 10.1

To work with a particular compiler version, such as Intel Fortran 10.1, you must set its environment variables by sourcing the following script in your current shell:

  • $ source /opt/intel/fc/10.1.025/bin/ifortvars.sh
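The script only has an effect when it is sourced with `source` or `.`. Sourcing runs it in your current shell, so the variables it exports stay set there. Running the script as a normal command executes it in a child process, and its settings are lost when that process ends. The self-contained demonstration below shows the difference using a throwaway script. DEMO_VAR is a made-up variable. The final comment lines apply the same idea to ifortvars.sh, assuming that script adds ifort to your PATH.

```shell
# Demonstrates why the Intel script must be sourced: a script executed as
# a separate process cannot change the environment of the calling shell.
demo_dir=$(mktemp -d)
printf 'export DEMO_VAR=set\n' > "$demo_dir/vars.sh"

unset DEMO_VAR
sh "$demo_dir/vars.sh"                      # runs in a child process
echo "after executing: ${DEMO_VAR:-unset}"  # -> after executing: unset

. "$demo_dir/vars.sh"                       # runs in the current shell
echo "after sourcing:  ${DEMO_VAR:-unset}"  # -> after sourcing:  set
rm -r "$demo_dir"

# The same applies to the compiler script (path taken from this FAQ):
#   . /opt/intel/fc/10.1.025/bin/ifortvars.sh
#   command -v ifort    # should now print the path to ifort
```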

My software has defined storage requirements. How can I request this storage?

The resource request form [1] includes a section where users can specify the storage their applications require.

My jobs perform many I/O operations. Is there a type of storage suited to this?

Both compute nodes have a scratch partition mounted on /scratch to handle the heavy I/O load that user jobs generate at run time.
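A common pattern is to copy input files to /scratch at the start of a job, run the job there, and copy the results back at the end. The following is a minimal sketch. It assumes you are allowed to create your own subdirectory under /scratch, so check the actual usage policy with the CénitS team. make_scratch_dir and SCRATCH_BASE are hypothetical names, and the directory layout is only a suggestion. LSB_JOBID and LS_SUBCWD are variables that LSF sets inside a job.

```shell
# Create (if needed) and print a per-job working directory on the scratch
# partition, laid out as <base>/<user>/job_<id> (a suggestion only).
# SCRATCH_BASE defaults to /scratch; LSB_JOBID is set by LSF inside a job,
# and the shell's PID is used as a fallback when testing outside a job.
make_scratch_dir() {
  local dir="${SCRATCH_BASE:-/scratch}/${USER:-$(id -un)}/job_${LSB_JOBID:-$$}"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Usage inside a job script (input.dat and my_program are placeholders;
# LS_SUBCWD is the directory the job was submitted from):
#   work=$(make_scratch_dir) || exit 1
#   cp "$LS_SUBCWD/input.dat" "$work/" && cd "$work" || exit 1
#   "$LS_SUBCWD/my_program" input.dat > output.dat
#   cp output.dat "$LS_SUBCWD/" && rm -rf "$work"
```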

Why doesn't my job run?

The resources your job needs may not be available yet. Check the status of the queues with the following command:

  • $ bqueues

For each queue, the output shows the total number of jobs submitted and how many of them are running, pending, or suspended.
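If you only care about one queue, you can extract its counts from the bqueues table. The following is a minimal sketch. It assumes the default bqueues column headers NJOBS, PEND, RUN and SUSP. queue_counts is a hypothetical helper, and normal is a placeholder queue name.

```shell
# Print the job counts for one queue from `bqueues` output on stdin.
# Columns are located by their header names (NJOBS, PEND, RUN, SUSP),
# so the function does not depend on their exact position.
queue_counts() {
  awk -v q="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $1 == q {
      printf "total=%s running=%s pending=%s suspended=%s\n",
             $(col["NJOBS"]), $(col["RUN"]), $(col["PEND"]), $(col["SUSP"])
    }'
}

# Usage (normal is a placeholder queue name):
#   bqueues | queue_counts normal
```

To see why a particular job is still pending, `bjobs -p <id_job>` lists its pending reasons.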

How can I find out which resources my job is using?

To see the resources a job has used, check its status with the following command:

  • $ bjobs -a -W <id_job>

The output includes the project name, CPU time used, memory, swap, the job's process IDs, and the job's start and finish times.
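bjobs -W prints every field of a job on a single, very wide line. It can be easier to read with the header and the values paired up, one field per line. The following is a minimal sketch. bjobs_transpose is a hypothetical helper. It assumes whitespace-separated fields with no embedded spaces, so a job name containing spaces would shift the columns.

```shell
# Print `bjobs -W` output as one "HEADER: value" line per column, which is
# easier to read than the very wide default layout. Assumes no field
# contains spaces; a blank line separates jobs.
bjobs_transpose() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i; next }
       { for (i = 1; i <= NF; i++) printf "%s: %s\n", h[i], $i; print "" }'
}

# Usage:
#   bjobs -a -W 1234 | bjobs_transpose
```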

I need to upload my source code to the LUSITANIA supercomputer. How should I proceed?

See section 2.4 of the user manual, which describes how to upload files.

When connecting to LUSITANIA, a server authenticity message is shown. What should I do?

Server authenticity message:

The authenticity of host 'ssh.cenits.es (193.144.255.13)' can't be established.
RSA key fingerprint is fa:83:85:6c:88:2a:6b:31:74:f7:8f:39:98:a3:75:f0.
Are you sure you want to continue connecting (yes/no)?

This message is displayed the first time you connect to an SSH server. It is shown again if you delete the known_hosts file on your computer (by default ~/.ssh/known_hosts).

The message means that your computer does not yet know the server's public key, so you are asked whether to trust the server. Answer yes to log in. The key is then saved in known_hosts, and the message will not appear on later connections.
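Before answering yes, you can fetch the server's key independently and compare its fingerprint with the one in the message. Recent OpenSSH versions display SHA256 fingerprints by default, while the message above uses the older MD5 form, so the sketch asks ssh-keygen for MD5 explicitly. md5_fingerprint is a hypothetical helper. The server's key may have changed since this FAQ was written, so if the values differ, confirm the expected fingerprint with the CénitS team.

```shell
# Extract the fingerprint from one line of `ssh-keygen -l -E md5` output,
# e.g. "2048 MD5:fa:83:... ssh.cenits.es (RSA)", without the "MD5:" prefix,
# so it can be compared with the colon-separated form shown in this FAQ.
md5_fingerprint() {
  awk '{ sub(/^MD5:/, "", $2); print $2 }'
}

# Usage (requires network access and OpenSSH 6.8 or later for -E):
#   ssh-keyscan -t rsa ssh.cenits.es > host_key.pub
#   ssh-keygen -l -E md5 -f host_key.pub | md5_fingerprint
```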

My job is performing poorly. What could be the problem?

Performance issues can have several causes:

  • Misuse of the queue manager. When you submit a job, you must specify the number of processes it will use. If this number is wrong, it can degrade the performance of your job and of other users' jobs.
  • Poor implementation or placement of processes on the nodes. For example, you may be using message passing within a single node where shared memory would be more efficient.
  • Inappropriate use of storage. If you do not use the high-performance storage at /scratch, I/O performance may suffer.
  • Inappropriate use of the network. If processes run on both nodes and need to communicate, make sure you have not specified the nodes as cn001 and cn002. Jobs use the computing network by default, so you do not need to name the nodes to run on. If you must name them, use cncp001 and cncp002.
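Normally you do not need to name the nodes. If your MPI launcher does require an explicit host list, it should use the compute-network names, and the number of entries should match the number of processes you requested with bsub -n. The following is a minimal sketch. It assumes each cnNNN node is reachable as cncpNNN, a pattern inferred from the two node names given above. compute_net_hosts and my_app are hypothetical names. LSB_HOSTS is set by LSF inside a job. The exact mpirun option for a host file depends on your MPI implementation.

```shell
# Print the hosts LSF allocated to the job (LSB_HOSTS lists one entry per
# slot, e.g. "cn001 cn001 cn002") using their compute-network names.
# Assumes each cnNNN node is reachable as cncpNNN, as with the two node
# names given in this FAQ; names already in cncp form are kept unchanged.
compute_net_hosts() {
  local h
  if [ -z "${LSB_HOSTS:-}" ]; then
    echo "LSB_HOSTS is not set; run this inside an LSF job" >&2
    return 1
  fi
  for h in $LSB_HOSTS; do
    case "$h" in
      cncp*) printf '%s\n' "$h" ;;
      cn*)   printf '%s\n' "cncp${h#cn}" ;;
      *)     printf '%s\n' "$h" ;;
    esac
  done
}

# Usage inside a job script, only if your launcher needs a host list:
#   compute_net_hosts > hosts.txt
#   mpirun -np "$(wc -l < hosts.txt)" -machinefile hosts.txt ./my_app
```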

After checking the above, if problems persist, please contact the CénitS technical team.


Source URL: https://web.computaex.es/en/faq/user-questions

Links
[1] https://web.computaex.es/en/formularios/resource-request-form