Simple Tips to Speed up NX Nastran Simulations

I/O is King when Optimizing NX Nastran Performance


Fast I/O to the SCRATCH and SCR300 files is needed for fast performance.

If you’ve always believed that processor speed is the key to fast NX Nastran simulations, you’re only partly right. In fact, the biggest factor affecting the speed of your simulation is I/O. That was the message of Dr. Paul Blelloch during his presentation at the Siemens CAE & Test Symposium.

The largest amount of I/O in NX Nastran is between the SCRATCH and SCR300 files, not the database (DBALL). Therefore, the best recommendation for speeding up your solution is to locate the SCRATCH and SCR300 files on the fastest storage available: either RAM or a fast local solid-state drive (SSD).

“Nastran allows the user to specify a certain location for the high I/O files,” said Dr. Blelloch, Director at the engineering consulting firm ATA Engineering. “If a computer has a very large amount of RAM (typically 128 GB or larger), optimal performance can be achieved by using the RAM for the SCRATCH and SCR300 files. Beyond that, typically a local solid state drive (SSD) is better than a local spinning drive, which is better than a network drive. Perhaps the cheapest, fastest way to speed up I/O is with a 1 TB SSD.”
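
As a rough illustration, the scratch location is typically pointed at fast storage with the sdirectory keyword on the nastran submission command. The paths below are placeholders, and the keyword spelling should be checked against your own installation’s documentation.

    # Hypothetical run with SCRATCH and SCR300 on a fast local SSD
    nastran model.dat sdirectory=/ssd/scratch

    # On a machine with very large RAM, a RAM disk mount can be used instead
    nastran model.dat sdirectory=/mnt/ramdisk/scratch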

Controlling NX Nastran’s SCRATCH and I/O


Parameters used to control Nastran I/O at the command line, options window, or globally through the RCF file.

Several parameters control NX Nastran’s I/O. These parameters can be set in the RCF file, on the command line, or in the options window. The RCF file is created at installation on the cluster or workstation and holds the global default settings; the command line and options window then act as overrides for specific simulations.

RCF files on the cluster will tend to define the organization’s standard settings for NX Nastran. However, more experienced users might be more comfortable setting up their own local or project-specific RCF files.
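
As a sketch of what such a file might contain (keyword names and values here are illustrative assumptions; verify them against your site’s nastran.rcf and the installation documentation), a local or project RCF simply lists default keyword assignments, one per line:

    $ Illustrative local RCF entries (values are placeholders, not recommendations)
    memory=16gb
    sdirectory=/ssd/scratch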

“A subtle parameter that can affect the speed of NX Nastran is the SCR parameter,” explained Blelloch. “This controls how much data is written to the SCRATCH and SCR300 files versus the DBALL database. When SCR = YES, everything will be written to SCRATCH. Though this is the fastest option, it comes at a price, as it will not allow for a restart. Setting SCR = NO, however, will write the maximum amount of data to the DBALL, slowing things down but assuring efficient restart solutions. A good compromise is to set SCR = MINI, which writes the minimal amount of data to the database. This will allow for restarts when the only difference is a new output request.”
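
As a hedged sketch, the three settings might appear on the submission command as follows; the keyword spelling scratch= (for the SCR parameter) is an assumption to verify against your NX Nastran documentation.

    nastran model.dat scratch=yes    # fastest, but no restart possible
    nastran model.dat scratch=no     # maximum data written to DBALL; restart-friendly
    nastran model.dat scratch=mini   # compromise: restarts limited to new output requests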

Blelloch added, “Another important parameter is smem. This variable determines how much of the RAM is dedicated to the SCRATCH files. The more RAM dedicated to the SCRATCH, the faster NX Nastran becomes. If a lot of RAM is available, this can be a very effective way of speeding up a Nastran job.”
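
A hedged example of combining these keywords on a large-memory machine (the values are purely illustrative, and the exact spelling and units of smem should be checked against your documentation):

    # Hypothetical job on a 128 GB node: 24 GB of working memory,
    # 64 GB of RAM dedicated to the SCRATCH files, SSD for the rest
    nastran model.dat memory=24gb smem=64gb sdirectory=/ssd/scratch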

Advantages of Running NX Nastran on Clusters


How to make a cluster work to speed up NX Nastran.

It is worth noting that the CPUs in cluster nodes are typically not much faster than the CPUs in a workstation. In fact, using a cluster to run an NX Nastran simulation can slow down your computation if you are not careful, notes Blelloch:

“Just like on a workstation, the key is to run your SCRATCH files on fast local disks (like RAM or local SSD) as opposed to the network drive. Using the network drive for the scratch will typically run a lot slower than on a workstation. Cluster nodes do offer advantages, however, such as a large amount of RAM. If this RAM is used for SCRATCH memory then excellent performance can be achieved.”


DMP vs. SMP configuration.

Using a cluster will also allow users to take advantage of Distributed Memory Processing (DMP) as opposed to the default Shared Memory Processing (SMP). SMP distributes tasks (matrix multiplies, decompositions, substitutions, etc.) across multiple processors that share the same memory. Blelloch states, “SMP works best on workstations. However, due to the sharing of memory, adding CPUs beyond 4 processors typically yields a minimal speed boost.”

DMP, however, breaks a job into pieces, and each piece is sent to a different processor with its own dedicated memory. This method scales with more processors and memory, which makes large servers and clusters ideal for DMP.
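
As an illustrative sketch (the parallel, dmp, and hosts keywords are assumptions to verify against your installation’s parallel processing documentation), the two modes are typically requested at submission time:

    # SMP: 4 threads sharing one memory space on a single workstation
    nastran model.dat parallel=4

    # DMP: 4 independent processes, each with its own memory and scratch,
    # spread across four cluster nodes (node names are placeholders)
    nastran model.dat dmp=4 hosts=node1:node2:node3:node4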

Dr. Leonard Hoffnung, NX Nastran numerical methods manager, said, “It’s ideal for NX Nastran if every node has a dedicated local scratch file system.”

DMP is also well suited to work in conjunction with the RDMODES calculation algorithm. RDMODES calculates the system’s modes by automatically breaking the model into small pieces. The process is similar to that of a third-party Nastran add-on named AMLS.

“RDMODES isn’t necessarily associated with DMP, but it is particularly effective in a DMP environment as each piece can be solved in an independent processor. For models with many modes, RDMODES is effective at reducing run times,” said Blelloch.

Hoffnung added, “There is a misconception that RDMODES requires DMP. This is not the case. It is possible to use RDMODES with SMP, DMP, or both simultaneously. For instance, you can utilize a cluster of machines with multiple cores on each node.”
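
As a sketch only (the nrec keyword for the number of RDMODES partitions is an assumption; consult the NX Nastran documentation for the exact activation keywords), the combinations Hoffnung describes might look like this:

    # RDMODES with the model split into 8 partitions, solved with 4 DMP processes
    nastran model.dat nrec=8 dmp=4

    # RDMODES on a single machine using SMP threads only
    nastran model.dat nrec=8 parallel=4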

Performance Comparison of NX Nastran Versions


NX Nastran Linux LP (top) and ILP (second); NX Nastran Windows ILP (third) and LP (bottom).

As with any software, legacy is a large factor in purchasing decisions. However, if you are not tied to legacy, there are a few NX Nastran version options that can help reduce your runtime.

First is the difference between NX Nastran ILP and LP. The biggest advantages of NX Nastran ILP are its long (64-bit) integers and unlimited RAM access. LP, on the other hand, is limited to 32-bit integers and 8 GB of RAM.

Since RAM can play an important role in speeding up runtime performance, ILP can offer a substantial advantage when lots of memory is available for smem. Additionally, the wider 64-bit integers could improve accuracy in certain situations without significantly affecting speed.

Note that, due to legacy and compatibility issues, the decision between LP and ILP is relatively final: transferring data between ILP and LP can result in truncation errors. As such, it is recommended to standardize on one version of Nastran.

Hoffnung agreed, “Results often differ slightly between LP and ILP because reading the bulk data into 32-bit or 64-bit precision creates a perturbation. Additionally, since integers are 64-bit in ILP, it will require additional disk space. But in most cases, except matrix reordering, this larger memory storage will not affect the runtime, though it could increase accuracy.”

Blelloch explained, “Nastran handles memory in ‘words.’ One of the implications of the ILP version is that one word is 8 bytes, while in LP one word is 4 bytes. This is an important distinction when assessing memory in the .f04 file. And learning to read the .f04 file will help the user to better assess their runtimes.”
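
As a simple worked example of that distinction: 8 GB of RAM is 8,589,934,592 bytes, which the .f04 file would report as roughly 2,147,483,648 words under LP (4 bytes per word) but only 1,073,741,824 words under ILP (8 bytes per word), even though the physical memory is identical.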

Another version-based decision for NX Nastran is the choice between the Windows and Linux operating systems. Of the two, Blelloch notes that Linux will likely be the faster option. He said, “Because of the way Linux handles memory, our experience shows that NX Nastran seems to perform better when working on large simulation jobs.”

DOF’s Effect on NX Nastran’s Performance Isn’t as Important as You Think


Summary of tips to speed up NX Nastran.

Often you will hear a simulation analyst cite the number of degrees of freedom (DOF) in a system and how long it took to solve. However, Front Size may be the variable to pay more attention to. Front Size is the maximum length of a nonzero column in a matrix after it has been reordered, so it has a strong effect on the number of nonzero terms in the factored matrix.

“The DOF size really doesn’t matter much,” said Blelloch. “A smaller model with a large Front Size might take much longer to solve. Models with a lot of elements in 3 dimensions tend to have larger Front Sizes than models with primarily 1D and 2D elements.”

Hoffnung agreed, noting that “the cost of the factorization of a matrix grows approximately as the cube of the max front size. Therefore, runtimes can increase very quickly with larger maximum Front Sizes.”
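
As a rough back-of-the-envelope illustration of that cubic growth: if one model has a maximum Front Size of 5,000 and another of 10,000, the factorization of the second can be expected to cost on the order of (10,000 / 5,000)^3 = 8 times as much, even if both models contain the same number of DOF.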

To minimize the Front Size, NX Nastran offers reordering algorithms that run before the decomposition. The algorithm choice is controlled by the DCMPSEQ Nastran System Cell.

By default, Nastran will choose between the BEND (aka EXTREME) algorithm and METIS. “Based on the number of 3D elements, NX Nastran will choose between the two algorithms,” said Hoffnung. “However, the heuristic and reordering algorithms are not perfect. BEND will typically consume more memory than METIS which can lead to poor performance for very large models.”

Therefore, if your simulation is sluggish due to a large Front Size, Blelloch and Hoffnung recommend switching the algorithm from METIS to BEND, or vice versa.
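
As a hedged sketch (it is assumed here that DCMPSEQ can be set by name on the NASTRAN statement, and the placeholder n stands for whichever integer selects METIS or BEND in your release; take the actual values from the NX Nastran documentation):

    $ At the top of the input file, before the executive control section
    NASTRAN DCMPSEQ=n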

Other tips to speed up your NX Nastran simulation include:

  • Use an iterative solver when there are a small number of load cases and a large Front Size
  • Restart from a previous solution when the only change from the previous run is a new output request
  • Learn to read .f04 files so you can understand performance