What affects Cluster efficiency and how to detect the number of cores


What affects Cluster efficiency

It is important to check the characteristics of the machine on which you are running the model against the aspects described below.

<1> The first factor is the number of physical cores. An “unlimited” number of Cluster nodes can be run in parallel, but the total processing power is ultimately determined by the number of physical cores in the machine (see “Note on Hyperthreading” below).

It is important to avoid “overloading” the processors, i.e., running more fully CPU-bound nodes than there are physical cores.

<2> The second factor is RAM. Together with processor usage, RAM matters most for Analyst and for other memory-intensive programs or tasks, e.g., MATRIX with ARRAYs, AUTOMDARRAYs, etc. There is no exact rule for memory usage, but note the following:

[Note: with intrastep distribution, each node requires the same amount of memory as the equivalent non-Cluster step; with multistep distribution, add up the memory requirements of the steps running on each node.]
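
The note above can be turned into a rough estimate of the total memory a Cluster run needs, using per-step figures measured in a non-Cluster run. The Python sketch below is not part of Cube. The function names are hypothetical, the figures are assumed examples, and the estimate ignores the memory used by the master process, the operating system and other applications.

```python
from typing import Sequence


def intrastep_memory_gb(step_memory_gb: float, n_nodes: int) -> float:
    """Intrastep: each node needs as much memory as the non-Cluster step."""
    return step_memory_gb * n_nodes


def multistep_memory_gb(node_step_memory_gb: Sequence[float]) -> float:
    """Multistep: add up the memory of the step running on each node."""
    return sum(node_step_memory_gb)


def fits_in_ram(required_gb: float, available_gb: float) -> bool:
    """True if the estimated requirement does not exceed the available RAM."""
    return required_gb <= available_gb


if __name__ == "__main__":
    # Assumed example: a 3 GB step distributed intrastep over 8 nodes
    need = intrastep_memory_gb(3.0, 8)
    print(need, fits_in_ram(need, 32.0))  # 24.0 True
    # Assumed example: three different steps running on three nodes
    need = multistep_memory_gb([2.0, 5.0, 1.5])
    print(need, fits_in_ram(need, 8.0))  # 8.5 False
```

If the estimate comes close to the installed RAM, reducing the number of nodes is likely to be safer than adding more.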

<3> The third factor is disk usage. It can be checked by running the model and monitoring disk usage in Task Manager, looking for values close to 100%. This is particularly relevant for the PT program and other programs with high I/O requirements:

If disk usage is very high, there may be no benefit in using more Cluster nodes. This is related to IOPS (I/O operations per second): modern SSDs can sustain far more IOPS than hard disks and can therefore withstand a greater workload.

<4> The fourth factor is that Cluster itself adds some overhead for the communication between the master and the slave nodes. However, this overhead is only significant when using a large number of slave nodes (e.g., more than 30).

Note on Hyperthreading

Hyperthreaded (logical) cores share the execution units of a physical core, i.e., they help keep the processor's pipeline busy, whereas the physical cores are what can effectively execute work in parallel.

Turning hyperthreading “off” would not improve parallel performance. Hyperthreading itself does not cause any issue or conflict for Cluster; however, it is recommended not to use more nodes than the number of physical cores.

For example, if your machine has 16 physical cores, problems might arise if you over-provision the CPU by running more than 16 fully CPU-bound processes. Running 32 Cluster nodes on such a machine is therefore likely to degrade performance.

Note: occasionally you can run slightly more processes/threads than you have cores if your computation involves a lot of I/O with long waiting times. For instance, you might benefit from running 19-20 Cluster nodes on 16 cores. However, this is an optimization that depends heavily on the use case (Voyager/Cluster). If the program performs a lot of I/O, then the more threads perform I/O, the more work the I/O subsystem has to do, and performance may suffer as a result. In particular, very high disk usage can cause runtimes to increase when more cores are used. This is the “third factor” described above.
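
The rule of thumb above can be expressed as a small check. This is a hypothetical Python sketch, not a Cube feature. `recommended_max_nodes` caps the node count at the number of physical cores. The optional `extra_io_nodes` allowance stands in for the I/O-bound case described in the note, and you should only use it after testing on the specific model.

```python
def recommended_max_nodes(physical_cores: int, extra_io_nodes: int = 0) -> int:
    """One CPU-bound node per physical core, plus an optional allowance for
    I/O-heavy runs (a use-case-specific optimization that needs testing)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores + max(0, extra_io_nodes)


def oversubscription(n_nodes: int, physical_cores: int) -> int:
    """How many nodes exceed the number of physical cores (0 if none)."""
    return max(0, n_nodes - physical_cores)


if __name__ == "__main__":
    print(recommended_max_nodes(16))                    # 16
    print(recommended_max_nodes(16, extra_io_nodes=4))  # 20
    print(oversubscription(32, 16))                     # 16
```

With 16 physical cores, `oversubscription(32, 16)` returns 16: there would be 16 more CPU-bound processes than physical cores.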

How to detect the number of cores

The following command-line variables and tools can report the number of processors from within Pilot in Cube.

  1. The %NUMBER_OF_PROCESSORS% variable gives the number of logical cores when hyperthreading is ON. If hyperthreading is turned OFF, this variable is more likely to report the number of physical cores, but it may not be the most reliable figure.

  2. The number of physical cores can be obtained with the WMIC command below. WMIC writes redirected output as Unicode, so the myCores.dat file then needs to be converted to ANSI (ASCII), as done in the second line:

    *WMIC CPU Get NumberOfCores > "{CATALOG_DIR}\Model\myCores.dat"
    *cmd /a /c type "{CATALOG_DIR}\Model\myCores.dat" > "{CATALOG_DIR}\Model\myCores1.dat"

  3. Other programs can also determine the number of cores accurately. One example is CPU-Z, available from the links below:
    https://www.cpuid.com/
    https://www.cpuid.com/softwares/cpu-z.html
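
For checking purposes outside Cube, the first two options can be reproduced in a few lines. The Python sketch below is an assumption-based illustration, not part of Pilot. It reads `NUMBER_OF_PROCESSORS`, falling back to `os.cpu_count()` (which also counts logical cores). It then sums the per-socket values in a `WMIC CPU Get NumberOfCores` output such as myCores.dat, and flags hyperthreading when the logical count exceeds the physical one. The function names and the sample WMIC output are hypothetical.

```python
import os
from typing import Optional


def logical_cores() -> Optional[int]:
    """Logical core count: %NUMBER_OF_PROCESSORS% if set, else os.cpu_count()."""
    value = os.environ.get("NUMBER_OF_PROCESSORS", "").strip()
    if value.isdecimal():
        return int(value)
    return os.cpu_count()  # also counts logical (hyperthreaded) cores


def physical_cores_from_wmic(raw: bytes) -> int:
    """Sum the values in `WMIC CPU Get NumberOfCores` output (one per socket).

    Redirected WMIC output is normally UTF-16 with a byte-order mark, which is
    why the Pilot example converts it; here it is simply decoded.
    """
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("ascii", errors="ignore")
    return sum(int(token) for token in text.split() if token.isdecimal())


def hyperthreading_likely(logical: int, physical: int) -> bool:
    """More logical than physical cores suggests hyperthreading is ON."""
    return logical > physical


if __name__ == "__main__":
    # Hypothetical WMIC output for a two-socket machine with 8 cores per socket
    sample = "NumberOfCores  \r\n8              \r\n8              \r\n".encode("utf-16")
    physical = physical_cores_from_wmic(sample)
    print(physical)  # 16
    logical = logical_cores()
    if logical is not None:
        print(logical, hyperthreading_likely(logical, physical))
```

To check a real file, pass the bytes of myCores.dat, e.g. `open(path, "rb").read()`, to `physical_cores_from_wmic`.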

CPU-Z can also be run from the command line. The example below runs CPU-Z from a Voyager (Pilot) script to write a text report, and then uses MATRIX to extract the number of cores from that report:

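;*********************************************************
; PILOT Script
; Run CPU-Z from the command line and write a text report
; (the MATRIX step below reads it back as CPU-Z_report.TXT)
*"C:\Program Files\CPUID\CPU-Z\cpuz.exe" -txt={Scenario_dir}\CPU-Z_report
; End of PILOT Script
; Script for program MATRIX
; Read the CPU-Z report record by record and extract the value that
; follows "Number of cores" (the characters before the opening bracket)
RUN PGM=MATRIX PRNFILE="{Scenario_dir}\EAMAT0A0.PRN" MSG='Reading the TXT file and creating a PRN file with the variable n_CPUs'
FILEO PRINTO[1] = "{Scenario_dir}\CPU-Z_nCPUs.txt"
FILEI RECI = "{Scenario_dir}\CPU-Z_report.TXT"
; Position of the label in the current record (0 if not found)
_nCores=strpos('Number of cores',reci)
if (_nCores>0)
; Start of the value: first character after the label
_length1=_nCores+strlen('Number of cores')
; Position of the opening bracket that follows the value
_postbr =strpos('(',reci)
; Number of characters between the end of the label and the bracket
_length2=_postbr-_length1
n_CPUs=substr(reci,_length1,_length2)
; Write the value, converted to a number, to the output file
PRINT PRINTO=1 LIST="n_CPUs=", val(n_CPUs)(L10.0)
endif
ENDRUN
;*********************************************************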
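
To test the extraction logic outside Cube, the same steps as the MATRIX script can be sketched in Python: find the “Number of cores” label, then take the characters between the end of the label and the next bracket. The report line format used in the example (`Number of cores  8 (max 8)`) is an assumption inferred from the script, not taken from CPU-Z documentation. The function names are hypothetical, and a report for a machine with several sockets may contain more than one matching line.

```python
from typing import List

LABEL = "Number of cores"


def cores_from_cpuz_report(text: str) -> List[int]:
    """Return the value on every line containing LABEL in a CPU-Z text report.

    Mirrors the MATRIX script: take the characters between the end of the
    label and the next '(' (or the end of the line if there is no bracket).
    """
    counts = []
    for line in text.splitlines():
        pos = line.find(LABEL)
        if pos < 0:
            continue  # like _nCores=0 in the MATRIX script
        start = pos + len(LABEL)
        end = line.find("(", start)
        value = (line[start:end] if end >= 0 else line[start:]).strip()
        if value.isdecimal():
            counts.append(int(value))
    return counts


def read_report(path: str) -> str:
    """Read the report written by cpuz.exe (the file encoding is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


if __name__ == "__main__":
    # Assumed report excerpt; the real layout may differ between CPU-Z versions
    sample = "\tNumber of cores\t\t8 (max 8)\n\tNumber of threads\t16 (max 16)\n"
    print(cores_from_cpuz_report(sample))  # [8]
```

If the list holds one entry per processor, its `sum()` gives the total number of physical cores.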


Note on opening/closing Cluster Nodes

Cluster nodes should be opened at the beginning of the model and closed at the end. Opening and closing Cluster nodes more than once (i.e., every time they are needed, instead of only at the beginning and at the end) should be avoided for the following reasons: