Platform: All Platforms
Versions: All versions

Problem Description

This solution describes how COMSOL takes advantage of multicore computers.

Solution

COMSOL supports two modes of parallel operation that can be combined: shared-memory parallelism and distributed-memory parallelism (cluster support). Shared-memory parallelism is supported with all COMSOL license types, while distributed-memory parallelism requires a floating network license. Shared-memory parallelism can utilize all CPU sockets on a computer, but on computers with multiple sockets it can sometimes be advantageous to also use distributed-memory parallelism, which requires a floating network license, to utilize the computer's full capacity; for further information, please see Hybrid Computing: Advantages of Shared and Distributed Memory Combined. For more general information on selecting hardware, see What hardware do you recommend for COMSOL Multiphysics?

This solution is dedicated to shared-memory parallel operations. For distributed-memory parallel operations, see Solution 1001. Shared-memory processing, or multithreading, is important for the performance of COMSOL computations. Some terms that are frequently used when describing multithreading are:

  • Core: A physical processor core used in shared-memory parallelism by a computational node with multiple processors.
  • Speedup: How many times faster a job runs on N cores than on 1 core, on a specific compute node. The speedup depends on the problem type, the hardware used, and the hardware drivers used.
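
As a minimal sketch of the speedup definition above, the following Python snippet computes the speedup from two measured wall-clock times. The function names and the timings are illustrative assumptions, not COMSOL output, and the parallel efficiency (speedup divided by core count) is a common companion metric that this solution does not itself define.

```python
def speedup(time_one_core: float, time_n_cores: float) -> float:
    """Speedup: how many times faster the job runs on N cores than on 1 core."""
    if time_one_core <= 0 or time_n_cores <= 0:
        raise ValueError("wall-clock times must be positive")
    return time_one_core / time_n_cores


def parallel_efficiency(time_one_core: float, time_n_cores: float, n_cores: int) -> float:
    """Speedup divided by the number of cores; 1.0 would be ideal scaling."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(time_one_core, time_n_cores) / n_cores


# Hypothetical wall-clock times in seconds for the same model on the same node.
t_1_core = 1200.0
t_8_cores = 200.0

print(speedup(t_1_core, t_8_cores))                 # 6.0
print(parallel_efficiency(t_1_core, t_8_cores, 8))  # 0.75
```

Both runs should use the same model and the same compute node, since the definition above is tied to a specific node.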

Windows

On Windows platforms, the default number of processor cores used by COMSOL is the total number of available physical cores. For example, if you have a 2 x dual core machine, 4 cores will be used in parallel by a COMSOL Multiphysics process by default.

If you want COMSOL to leave one or more processor cores unused, you can manually set the number of cores used for a job. To change the default behavior, start the COMSOL Desktop and set the Number of processors option in the Multicore and Cluster Computing section of the Preferences window.

Alternatively, create a new shortcut on your Desktop to the COMSOL executable and modify it to set the desired number of threads.

  1. Create a new shortcut on the Desktop.
  2. Right-click the shortcut and select Properties.
  3. Change the Target field to "C:\Program Files\COMSOL\COMSOL61\Multiphysics\bin\win64\comsol.exe" -np 2 if you want COMSOL to use only 2 cores.

macOS

On macOS, controlling the number of processor cores used by COMSOL is only possible when launching COMSOL from the Terminal application. The default behavior is to use all available physical processor cores for the COMSOL Multiphysics application. You can find how many processor cores you have in the System Profiler application, or by using the command sysctl hw.ncpu. Note that hw.ncpu counts logical processors; on systems with hyperthreading, sysctl hw.physicalcpu reports the physical core count. You can override the default by using the command line switches. For example, start COMSOL with 2 threads using the command /Applications/COMSOL61/Multiphysics/bin/comsol -np 2.

Linux

On some systems, the processors available to a COMSOL process can be listed with the command more /proc/cpuinfo | grep proc, which prints one entry per logical processor.

Note that if hyperthreading is activated, you need to divide the core count reported by the above command by the hyperthreading factor (2) to get the physical core count. COMSOL does not benefit from hyperthreading; if COMSOL is started with more threads than there are physical CPU cores, performance will decrease.
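
Instead of dividing by a fixed hyperthreading factor, the physical core count can often be read from /proc/cpuinfo directly. The sketch below is one possible approach, not a COMSOL tool: it assumes the x86-style fields physical id and core id are present (they may be missing on other architectures, in which case it falls back to the logical count). The function name and the sample text are hypothetical.

```python
def count_cores(cpuinfo_text: str) -> tuple[int, int]:
    """Return (logical, physical) core counts parsed from /proc/cpuinfo text."""
    logical = 0
    physical = set()
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyperthreads of one core share the same (socket, core) pair.
        if "physical id" in fields and "core id" in fields:
            physical.add((fields["physical id"], fields["core id"]))
    return logical, (len(physical) if physical else logical)


# Hypothetical excerpt: one socket, two cores, hyperthreading active.
SAMPLE_CPUINFO = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_cores(SAMPLE_CPUINFO))  # (4, 2)
```

On a Linux machine, passing the contents of /proc/cpuinfo to count_cores gives the counts for that system.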

On Linux, the default behavior is to use all available physical cores for the COMSOL Multiphysics application. You can override the default behavior by using the command line switches. For example, start COMSOL with 2 threads using the command comsol -np 2.
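
When COMSOL is started from a script rather than by hand, the -np examples in the platform sections above can be assembled programmatically. The following is a minimal sketch: the helper names are hypothetical, and only the -np switch and the executable names shown in this solution are used.

```python
import subprocess


def comsol_command(executable: str, num_cores: int) -> list[str]:
    """Build the argument list for starting COMSOL with a fixed core count."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return [executable, "-np", str(num_cores)]


def launch_comsol(executable: str, num_cores: int) -> subprocess.CompletedProcess:
    """Start COMSOL and wait for it to exit (requires a COMSOL installation)."""
    return subprocess.run(comsol_command(executable, num_cores), check=True)


# Only print the command here; call launch_comsol() to actually start COMSOL.
print(comsol_command("comsol", 2))  # ['comsol', '-np', '2']
```

Passing the arguments as a list avoids quoting problems with installation paths that contain spaces, such as the Windows path shown earlier.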

Hyperthreading

COMSOL currently does not benefit from hyperthreading. By default, COMSOL uses only as many threads as there are physical CPU cores on the system. As a result, if hyperthreading is active, the Windows Task Manager will show at most 50% CPU utilization for the COMSOL process. This is expected and not an indication that CPU utilization is too low. It is recommended to keep hyperthreading enabled so that other applications running simultaneously can take advantage of it.

The -mpmode option

The values turnaround and throughput for -mpmode correspond directly to the OpenMP runtime settings for the KMP_LIBRARY environment variable. The -mpmode option overrides the system settings (if KMP_LIBRARY is not set).

All options use KMP_BLOCKTIME = 200 by default. The turnaround value is also the default when -mpmode is not set at all. The serial mode is not used by COMSOL. The third value that COMSOL lists for -mpmode is owner. The owner option is similar to turnaround; the difference is that owner also specifies a thread affinity that is optimized for the number of sockets on the computer, so owner is more aggressive than turnaround.
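
Because -mpmode interacts with the KMP_LIBRARY system setting, it can help to check which of these OpenMP variables are already defined in the environment before starting COMSOL. The snippet below is a small sketch for that check only; the function name is hypothetical, and it does not reproduce how COMSOL itself applies the settings.

```python
import os

# OpenMP runtime variables mentioned in this section.
OPENMP_VARIABLES = ("KMP_LIBRARY", "KMP_BLOCKTIME")


def openmp_settings(environ=None) -> dict:
    """Return the current value of each variable, or None where it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in OPENMP_VARIABLES}


for name, value in openmp_settings().items():
    print(f"{name} = {value if value is not None else '(not set)'}")
```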

NUMA awareness

COMSOL is aware of NUMA (Non-Uniform Memory Access) systems. NUMA systems can be systems with several CPU sockets or systems with CPUs based on multiple tiles. These systems are characterized by memory modules that are reached with different access times. COMSOL automatically detects the number of available sockets. Manual changes can be applied by specifying the Number of sockets in Preferences / Multicore and Cluster Computing. From the command line, the number of NUMA sets (sockets) can be set with the flag -numasets <no. of sets>. A larger number of NUMA sets means a lower probability that the OS moves threads from one core to another while the simulation is running.
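
To sanity-check the detected socket count on Linux, the NUMA nodes reported by the kernel can be counted from sysfs. This is a sketch under the assumption that the system exposes /sys/devices/system/node; the function name is hypothetical, and the number reported by the OS is only a point of comparison, not necessarily the value COMSOL detects.

```python
import os
import re


def count_numa_nodes(node_dir: str = "/sys/devices/system/node") -> int:
    """Count the nodeN entries in the sysfs NUMA directory (0 if unavailable)."""
    try:
        entries = os.listdir(node_dir)
    except OSError:
        # Directory missing, for example on non-Linux systems.
        return 0
    return sum(1 for entry in entries if re.fullmatch(r"node\d+", entry))


print(count_numa_nodes())
```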

Verifying Which Math Library is Used

The default option is to use the Intel Math Kernel Library (MKL) for Intel and AMD processors. For ARM processors, the default library is Arm Performance Libraries. The Basic Linear Algebra Subprograms (BLAS) are part of these libraries and contain low-level routines for common linear algebra operations. You can change the BLAS library used (see below) and also check which one is in use. To see which BLAS library is currently used, set the environment variable COMSOL_BLAS_DEBUG to 1. This writes a file in the start directory on Windows and writes to standard output on Linux. The output is intended for debugging in case you are running into issues and want more information.

Alternatively, you can use the Process Explorer from Microsoft to determine which DLLs have been loaded into a process.

For Windows, the possible DLL names are:

  • csmklblas.dll: Intel MKL
  • csaoclblas.dll: AMD AOCL
  • csblasblas.dll: Netlib BLAS

For Linux on Intel and AMD, you can use lsof instead of the Process Explorer. The possible library names are:

  • libcsmklblas.so: Intel MKL
  • libcsaoclblas.so: AMD AOCL
  • libcsblasblas.so: Netlib BLAS
  • libcsblisblas.so: BLIS

The AOCL library is available as of COMSOL 6.0 update 2. The BLIS library on Linux is available up to COMSOL 6.0 and is no longer available as of COMSOL 6.1.

For Linux on ARM:

  • libcsarmplblas.so: Arm Performance Libraries
  • libcsopenblas.so: OpenBLAS
  • libcsblasblas.so: Netlib BLAS
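
The library names listed above can be matched automatically against a list of loaded modules, for example text saved from lsof or copied from Process Explorer. The following sketch assumes only that the file names appear somewhere in that text; the function name and the sample paths are hypothetical.

```python
# File-name stems from the lists above, mapped to the BLAS library they indicate.
BLAS_LIBRARIES = {
    "csmklblas": "Intel MKL",
    "csaoclblas": "AMD AOCL",
    "csblasblas": "Netlib BLAS",
    "csblisblas": "BLIS",
    "csarmplblas": "Arm Performance Libraries",
    "csopenblas": "OpenBLAS",
}


def detect_blas(module_listing: str) -> list[str]:
    """Return the sorted BLAS library names whose files appear in the listing."""
    text = module_listing.lower()
    return sorted(
        name
        for stem, name in BLAS_LIBRARIES.items()
        if f"{stem}.dll" in text or f"lib{stem}.so" in text
    )


# Hypothetical one-line excerpt in the style of lsof output.
sample = "comsol 4242 user mem REG 8,1 1024 99 /path/to/comsol/lib/libcsmklblas.so"
print(detect_blas(sample))  # ['Intel MKL']
```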

Troubleshooting

My new server has 48 cores; however, speedup is poor when increasing the number of threads beyond a certain number. What gives?

  1. Problem size matters for speedup. Speedup is better for very large models (for example, several million degrees of freedom). For very small models, speedup will be limited when using many cores. In addition, the maximum possible speedup is limited by the non-parallel fraction of the algorithms. This limit is described by Amdahl's law.

  2. If you are using the MUMPS direct solver, switch to the PARDISO direct solver in COMSOL. It provides better shared-memory speedup for a high number of cores than MUMPS.

  3. By default, the Intel MKL library is used. For AMD processors you could also try the AOCL library by specifying -blas aocl.
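
To illustrate the Amdahl's law limit mentioned in item 1, the sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / N), where p is the parallel fraction of the runtime and N is the number of cores. The value p = 0.95 is a hypothetical figure chosen for illustration, not a measured property of any COMSOL solver.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical job in which 95% of the runtime can be parallelized.
for n in (1, 8, 16, 48):
    print(f"{n:2d} cores: at most {amdahl_speedup(0.95, n):.2f}x")
```

With these assumed numbers the bound approaches 1 / (1 - p) = 20 no matter how many cores are added, which is why the gain from each additional core shrinks on a 48-core server.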

See Also

See also Selecting hardware (solution 866).
See also Running COMSOL on clusters (solution 1001).