Configuration Instructions to Run Desmond on GPUs
Running Desmond on GPUs achieves exceptional throughput on commodity Linux clusters with both typical and high-end networks, and runs on the order of 100x faster on a general-purpose GPU (GPGPU) than on a single CPU. See the Desmond performance data for supported cards for more information.
To run Desmond on GPUs, complete the following configuration steps:
1. Disable ECC memory
Run the following command to disable ECC memory on your GPUs:
nvidia-smi -e 0
Reboot the machines, then check the output of nvidia-smi to confirm that ECC memory is disabled on all GPUs.
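To confirm the change, you can query the current ECC mode of each card (a quick check using standard nvidia-smi query fields; not part of the original procedure):
nvidia-smi --query-gpu=index,name,ecc.mode.current --format=csv
Every GPU should report its current ECC mode as Disabled.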
2. Set Compute Mode
Set the compute mode of the GPU to "Exclusive Process":
nvidia-smi -c 3
On some systems, you may need to set the compute mode for each card individually:
nvidia-smi -c 3 -i 0
nvidia-smi -c 3 -i 1
...
To apply these settings automatically at boot, add the above lines to /etc/rc.local.
3. Enable Persistence Mode
Enable persistence mode for faster startup of the CUDA runtime:
nvidia-smi -pm 1
This line can also be added to /etc/rc.local to ensure it is set on boot.
Ensure that /dev/nvidiactl is created and the kernel module is loaded.
Add the line below to /etc/rc.local as well.
modprobe nvidia
Ensure that users can read and write to the /dev/nvidia* devices:
chmod 666 /dev/nvidia*
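Taken together, the boot-time settings from steps 2 and 3 can be collected into a single /etc/rc.local fragment, for example (a minimal sketch; adjust it to your site's policy):
#!/bin/sh
# Load the NVIDIA kernel module so that /dev/nvidiactl and /dev/nvidia* are created
modprobe nvidia
# Enable persistence mode and set the "Exclusive Process" compute mode on all GPUs
nvidia-smi -pm 1
nvidia-smi -c 3
# Allow all users to read and write the GPU device files
chmod 666 /dev/nvidia*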
4. Set up and configure the software
After installing the Schrödinger software, ensure that the GPU cards are recognized by the software by running the following command:
$SCHRODINGER/utilities/query_gpgpu -a
Note: Running FEP jobs on GPUs requires a batch queueing system; for other job types a queueing system is highly recommended.
5. Configure schrodinger.hosts file
Single Host Configuration
To make use of GPU cards for Desmond calculations, the schrodinger.hosts file must be configured for each host.

# Local workstation
name: myhostname
host: localhost
processors: 4
tmpdir: /usr/tmp
gpgpu: 0, Tesla V100
This indicates that the local workstation has a single Tesla V100 card available at device index 0.

# Remote host
name: otherhost1
host: remote1.mycore.com
processors: 8
schrodinger: /path/to/schrodinger/
tmpdir: /usr/tmp
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
Here, the remote workstation has two Tesla V100 cards at device indices 0 and 1.
Queueing System Host Configuration
For each queueing system, there are numerous ways to configure the queue and make resource requests.

# SGE
name: sge-gpu
host: remote1.mycore.com
queue: SGE
qargs: -q myqueue.q -pe smp %NPROC% -l gpus=1
tmpdir: /usr/local/tmp
schrodinger: /path/to/schrodinger
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
gpgpu: 2, Tesla V100
gpgpu: 3, Tesla V100
Here, the queue host has four Tesla V100 cards available at device indices 0 through 3.
Here, we set gpus=1, not gpus=%NPROC%. The reason for this is that SGE interprets the resource request on a per-slot basis: "for every slot requested, provide -l gpus=N GPU resources". By setting -l gpus=1 you are therefore effectively requesting %NPROC% GPUs in total.
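For instance, if the job driver substitutes %NPROC% with 2, the resulting resource request is equivalent to the following (a hypothetical expansion shown only for illustration; the actual submission is assembled by the job control system):
qsub -q myqueue.q -pe smp 2 -l gpus=1 ...
Because SGE multiplies the -l gpus value by the number of slots, this grants 2 slots and 2 GPUs in total.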
If you are using a queuing system with the ability to specify individual GPUs, such as Univa Grid Engine, configured with an RSMAP GPU resource, you will also need to set CUDA_VISIBLE_DEVICES via an environment script or a prolog script. Schrödinger GPU-enabled products respect this setting and only use the GPUs designated by the CUDA_VISIBLE_DEVICES environment variable.
An example of an environment script is as follows:
#!/bin/sh
if [ -n "$SGE_HGR_gpus" ]; then
    CUDA_VISIBLE_DEVICES=`echo "$SGE_HGR_gpus" | sed -e 's/gpus//g' -e 's/ /,/g'`
    export CUDA_VISIBLE_DEVICES
fi
where gpus is the name of the consumable resource (and should be changed if you called it something different). The script should be copied to /etc/profile.d/cuda_env.sh on every GPU compute node.
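To sanity-check the script, you can source it with a test value of SGE_HGR_gpus (the value shown here is only an assumed example of the RSMAP format; use whatever your scheduler actually sets):
# Hypothetical test: pretend the scheduler granted GPUs 0 and 2
SGE_HGR_gpus="gpus0 gpus2" sh -c '. /etc/profile.d/cuda_env.sh; echo "$CUDA_VISIBLE_DEVICES"'
With the script above, this prints 0,2.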

# Torque
name: torque-gpu
host: headnode.mycluster.com
queue: Torque
qargs: -q myqueue.q -l nodes=1:ppn=%NPROC%:gpus=%NPROC%
tmpdir: /usr/local/tmp
schrodinger: /path/to/schrodinger
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
gpgpu: 2, Tesla V100
gpgpu: 3, Tesla V100
Torque does not interpret the GPU resource on a per-CPU basis, so we specify %NPROC% for both the ppn and the gpus resources.
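For example, with %NPROC% substituted as 4, the request submitted to Torque is equivalent to the following (a hypothetical expansion for illustration):
qsub -q myqueue.q -l nodes=1:ppn=4:gpus=4 ...
That is, one node with 4 processor cores and 4 GPUs.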

# LSF
name: lsf-gpu
host: headnode.mycluster.com
queue: LSF
qargs: -q myqueue.q -n %NPROC% -R \"span[hosts=1]\" -R \"rusage[ngpus_excl_p=1]\"
tmpdir: /usr/local/tmp
schrodinger: /path/to/schrodinger
gpgpu: 0, Tesla P100
gpgpu: 1, Tesla P100
gpgpu: 2, Tesla P100
gpgpu: 3, Tesla P100
Here, we set ngpus_excl_p=1, not ngpus_excl_p=%NPROC%. The reason for this is that we assume you have configured this resource on a per-task basis (it can also be set as a per-host resource, in which case you would instead request %NPROC%).
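For example, with %NPROC% substituted as 4 and a per-task ngpus_excl_p resource, the submission is equivalent to something like the following (a hypothetical expansion for illustration):
bsub -q myqueue.q -n 4 -R "span[hosts=1]" -R "rusage[ngpus_excl_p=1]" ...
LSF then multiplies the per-task rusage request by the 4 tasks, reserving 4 GPUs in total.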

# Slurm
name: slurm-gpu
host: headnode.mycluster.com
queue: SLURM2.1
qargs: --partition=all --nodes=1 --ntasks-per-node=%NPROC% --gres=gpu:Tesla:%NPROC%
tmpdir: /usr/local/tmp
schrodinger: /path/to/schrodinger
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
gpgpu: 2, Tesla V100
gpgpu: 3, Tesla V100
Here, the --gres=gpu:Tesla:%NPROC% option requests %NPROC% GPUs of type Tesla on the allocated node, one for each task requested with --ntasks-per-node.
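For example, with %NPROC% substituted as 4, the submission is equivalent to something like the following (a hypothetical expansion for illustration):
sbatch --partition=all --nodes=1 --ntasks-per-node=4 --gres=gpu:Tesla:4 ...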