PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL
PDL has support (currently experimental) for splitting up numerical processing
between multiple parallel processor threads (or pthreads) using the
set_autopthread_targ and set_autopthread_size functions. This
can improve processing performance (by 2-4X or more in most cases) by
taking advantage of multi-core and/or multi-processor machines.
use PDL;
# Set target of 4 parallel pthreads to create, with a lower limit of
# 5Meg elements for splitting processing into parallel pthreads.
set_autopthread_targ(4);
set_autopthread_size(5);
$a = zeroes(5000,5000); # Create 25Meg element array
$b = $a + 5; # Processing will be split up into multiple pthreads
# Get the actual number of pthreads for the last
# processing operation.
$actualPthreads = get_autopthread_actual();
The use of the term threading can be confusing with PDL, because it can
refer to PDL threading, as defined in the PDL::Threading docs, or to
processor multi-threading.
To reduce confusion with the existing PDL threading terminology, this document
uses pthreading to refer to processor multi-threading, which is
the use of multiple processor threads to split up numerical processing into
parallel operations.
This is a brief listing and description of the PDL pthreading functions; see the
PDL::Core docs for detailed information.
set_autopthread_targ - Set the target number of processor threads (pthreads) for
multi-threaded processing. Setting the target to 0 means that no
pthreading will occur.
See PDL::Core for details.
set_autopthread_size - Set the minimum size (in Meg-elements, i.e. units of
2**20 elements) that the largest PDL involved in a function must have for
auto-pthreading to be performed. For small PDLs, it probably isn't worth
starting multiple pthreads, so this function defines a threshold below which
auto-pthreading won't be attempted.
See PDL::Core for details.
get_autopthread_actual - Get the actual number of pthreads executed for the
last PDL processing operation.
See PDL::Core for details.
PDL pthreading can be turned on globally, without modifying existing code, by
setting the environment variables PDL_AUTOPTHREAD_TARG and
PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment
variables are checked when PDL starts up, and calls to set_autopthread_targ
and set_autopthread_size are made with the environment variables' values.
For example, if the environment variable PDL_AUTOPTHREAD_TARG is set to 3, and
PDL_AUTOPTHREAD_SIZE is set to 10, then any PDL script will run as if
the following lines were at the top of the file:
set_autopthread_targ(3);
set_autopthread_size(10);
The auto-pthreading process works by analyzing threaded array dimensions in PDL
operations and splitting up processing based on the thread dimension sizes and
desired number of pthreads (i.e. the pthread target or pthread_targ). The
offsets and increments that PDL uses to step through the data in memory are
modified for each pthread so each one sees a different set of data when
performing the processing.
$a = sequence(20,4,3); # Small 3-D Array, size 20,4,3
# Setup auto-pthreading:
set_autopthread_targ(2); # Target of 2 pthreads
set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded
# This will be split up into 2 pthreads
$c = maximum($a);
For the above example, the maximum function has a signature of
"(a(n); [o]c())", which means that the first dimension of $a (size
20) is a core dimension of the maximum function. The other
dimensions of $a (size 4,3) are threaded dimensions (i.e. they will be
threaded over in the maximum function).
The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks
the size-4 dimension, since it is evenly divisible by the autopthread_targ of 2.
The processing of the maximum function is then split into two pthreads on the
size-4 dimension, with dim indexes 0,2 processed by one pthread
and dim indexes 1,3 processed by the other pthread.
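The interleaved split described above can be modeled in plain Perl. This is only
an illustrative sketch, not PDL's internal code: the helper pthread_indexes is
hypothetical, and it assumes each pthread starts at its own offset and steps by
the number of pthreads.

```perl
use strict;
use warnings;

# Illustrative model only (hypothetical helper, not part of PDL).
# Each pthread gets its own starting offset and a common increment,
# so the pthreads visit disjoint, interleaved indexes of the split dim.
sub pthread_indexes {
    my ($dim_size, $n_pthreads) = @_;
    my @per_thread;
    for my $t (0 .. $n_pthreads - 1) {
        my @idx;
        # Offset is the pthread number; increment is the pthread count.
        for (my $i = $t; $i < $dim_size; $i += $n_pthreads) {
            push @idx, $i;
        }
        push @per_thread, \@idx;
    }
    return @per_thread;
}

# The size-4 threaded dimension from the example, split into 2 pthreads.
my @split = pthread_indexes(4, 2);
print "pthread $_: @{ $split[$_] }\n" for 0 .. $#split;
```

For the size-4 dimension and 2 pthreads, this prints indexes 0 2 for the first
pthread and 1 3 for the second, matching the split described above.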
Auto-pthreading only works if your PDL installation was compiled with POSIX
threads enabled. This is normally the case if you are running on Linux or
other Unix variants.
Not all the libraries that PDL interfaces to are thread-safe, i.e. they aren't
written to operate in a multi-threaded environment without crashing or causing
side effects. Some examples in the PDL core are the fft function and the
pnmout functions.
To operate properly with these types of functions, the PPCode flag
NoPthread has been introduced to mark a function as not
being pthread-safe. See the PDL::PP docs for details.
Due to the way a PDL is split up for operation using multiple pthreads, the size
of a dimension must be evenly divisible by the pthread target. For example, if
a PDL has threaded dimension sizes of (4,3,3) and the autopthread_targ
has been set to 2, then the first threaded dimension (size 4) will be picked
to be split up into two pthreads of size 2 and 2. However, if the threaded
dimension sizes are (3,3,3) and the autopthread_targ is still 2, then
pthreading won't occur, because no threaded dimensions are divisible by 2.
The algorithm that picks the actual number of pthreads has some smarts (but
could probably be improved) to adjust down from the autopthread_targ
to get a number of pthreads that can evenly divide one of the threaded
dimensions. For example, if a PDL has threaded dimension sizes of (9,2,2) and
the autopthread_targ is 4, the algorithm will see that no dimension is
divisible by 4, then adjust the target down to 3, resulting in splitting up
the first threaded dimension (size 9) into 3 pthreads.
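The adjust-down behaviour can be sketched in plain Perl. This is a simplified
model under stated assumptions, not PDL's actual implementation: the helper
pick_pthreads is hypothetical, it tries the target first and then each smaller
count, and it takes the first threaded dimension that divides evenly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL: returns (pthread count, index of the
# threaded dim to split), or an empty list when no split is possible.
sub pick_pthreads {
    my ($target, @thread_dims) = @_;
    # Try the target first, then step down; a count of 1 would mean no split.
    for (my $n = $target; $n >= 2; $n--) {
        for my $d (0 .. $#thread_dims) {
            return ($n, $d) if $thread_dims[$d] % $n == 0;
        }
    }
    return;
}

# Each case is [target, threaded dim sizes...], taken from the examples above.
for my $case ([2, 4, 3, 3], [2, 3, 3, 3], [4, 9, 2, 2]) {
    my ($target, @dims) = @$case;
    my ($n, $d) = pick_pthreads($target, @dims);
    my $result = defined $n
        ? "$n pthreads on dim $d (size $dims[$d])"
        : "no pthreading";
    print "target $target, dims (@dims): $result\n";
}
```

With these inputs the sketch reports 2 pthreads on the size-4 dimension, no
pthreading for (3,3,3) with a target of 2, and 3 pthreads on the size-9
dimension for a target of 4.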
If you have an 8-core machine and call set_autopthread_targ with 8 to
generate 8 parallel pthreads, you probably won't get an 8X improvement in
speed, due to memory bandwidth issues. Even though you have 8 separate CPUs
crunching away on data, you will have (for most common machine architectures)
common RAM that now becomes your bottleneck. For simple calculations (e.g.
simple additions) you can run into a performance limit at about
4 pthreads. For more complex calculations the limit will be higher.
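One way to see where this limit falls on a particular machine is to time the
same operation with different pthread targets. The following is a rough sketch
only: the helper time_addition is hypothetical, the array size and targets are
arbitrary choices, and a single timed run is noisy, so treat the numbers as
indicative. It assumes a PDL build with POSIX threads enabled.

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes ();

# Hypothetical helper: time a simple addition for a given pthread target.
# Returns (elapsed seconds, pthreads actually used for the last operation).
sub time_addition {
    my ($targ) = @_;
    set_autopthread_targ($targ);
    set_autopthread_size(1);            # threshold of 1 Meg-elements
    my $x = zeroes(2000, 2000);         # 4 million elements
    my $start = Time::HiRes::time();
    my $y = $x + 5;                     # the operation being timed
    my $elapsed = Time::HiRes::time() - $start;
    return ($elapsed, get_autopthread_actual());
}

for my $targ (0, 2, 4, 8) {
    my ($elapsed, $actual) = time_addition($targ);
    printf "target %d: actual pthreads %d, %.4f s\n", $targ, $actual, $elapsed;
}
```

If the timings stop improving as the target grows, memory bandwidth is the
likely bottleneck for that operation.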
Copyright 2011 John Cerney. You can distribute and/or modify this document under
the same terms as the current Perl license.