Setup
All OpenCL actions occur via contexts, which are containers for related devices, and command queues, which are attached to specific devices. Each context is associated with a specific platform, and a platform generally corresponds to a particular vendor.
1a. Quick setup
We can use the fclInit function to quickly select a device based on criteria from all devices available on the system.
If a device matching the specified criteria is found, then this device is returned as an object and the default context is set using the platform containing the matching device.
Interface:
device = fclInit([vendor],[type],[nameLike],[extensions],[sortBy])
- device (type(fclDevice)): the chosen device returned by the function.
- vendor (character(*), optional): if specified, look for this (sub)string in the device vendor to filter available devices.
- type (one of 'cpu' or 'gpu', optional): if specified, filter available devices based on device type.
- nameLike (character(*), optional): if specified, look for this (sub)string in the device name to filter available devices.
- extensions (character(*), optional): if specified, look for these OpenCL extensions (comma-separated) to filter available devices; any device that does not support all extensions specified will be filtered out.
- sortBy (one of 'cores', 'memory', 'clock', optional): from the filtered list, choose the device with the most compute units, the largest total memory, or the highest clock speed, respectively.
Example:
type(fclDevice) :: device
...
device = fclInit(vendor='nvidia,amd',type='gpu',sortBy='cores')
In this example we ask for any gpu device belonging to vendor nvidia OR amd and, from the matching devices, choose the one with the most compute units (cores).
Note
fclInit automatically creates an OpenCL context for the chosen device and uses it to set the default context.
By setting the default context, subsequent Focal API calls can omit the context ('ctx') argument.
To add more devices from the same vendor, see fclFindDevices below.
Jump down to command queues to create a command queue on your chosen device.
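Putting the quick setup together, a minimal sketch might look like the following. It assumes the library is made available through a module named Focal (the module name is not stated above), and it filters on the cl_khr_fp64 (double precision) extension purely for illustration. The call to fclSetDefaultCommandQ is described in the command queue section below.

```fortran
program quick_setup
  use Focal   ! assumed module name for the Focal library
  implicit none

  type(fclDevice) :: device

  ! Select the GPU with the most total memory that supports the
  ! (illustrative) cl_khr_fp64 extension. fclInit also sets the
  ! default context, so 'ctx' can be omitted in later calls.
  device = fclInit(type='gpu', extensions='cl_khr_fp64', sortBy='memory')

  ! Create a command queue on the chosen device and make it the default.
  call fclSetDefaultCommandQ( fclCreateCommandQ(device) )

end program quick_setup
```

Because both the default context and the default command queue are set here, later Focal calls in the same program can omit both arguments.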
1b. Advanced setup
Querying platforms
OpenCL is able to support multiple different implementations on the same host using a platform model. An OpenCL platform is a specific OpenCL implementation; in general, platforms coincide with different hardware vendors. For example, if your machine has an Intel CPU and an NVIDIA graphics card both with drivers supporting OpenCL, then you will have two platforms available: one each for Intel and NVIDIA.
We can get a list of platforms using the Focal function fclGetPlatforms().
This returns a list of Focal fclPlatform objects:
type(fclPlatform), pointer :: platforms(:)
platforms => fclGetPlatforms()
The Focal platform object contains fields such as name, vendor, version, and numDevice which we can use to select a particular platform.
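As a small illustration of using these fields, the sketch below lists every platform with its vendor and device count. It assumes that name and vendor are character components, that numDevice is an integer component, and that the library is accessed through a module named Focal; none of these details are stated above, so treat them as assumptions.

```fortran
program list_platforms
  use Focal   ! assumed module name for the Focal library
  implicit none

  type(fclPlatform), pointer :: platforms(:)
  integer :: i

  platforms => fclGetPlatforms()

  ! Print one line per platform: name (vendor): N device(s)
  ! Assumes name/vendor are character and numDevice is integer.
  do i = 1, size(platforms)
    print '(4A,I0,A)', trim(platforms(i)%name), ' (', &
          trim(platforms(i)%vendor), '): ', &
          platforms(i)%numDevice, ' device(s)'
  end do

end program list_platforms
```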
API ref: fclPlatform, fclGetPlatforms, fclGetPlatformInfo
Create a context
We can explicitly create an OpenCL context with a particular platform or vendor using fclCreateContext.
There are two ways of calling this function: either with a platform object (see fclGetPlatforms() above to query platforms) or with a vendor string to specify the desired vendor.
Interface:
ctx = fclCreateContext(platform)
ctx = fclCreateContext(vendor)
- ctx (type(fclContext)): context object returned.
- platform (type(fclPlatform)): a Focal platform object on which to create the context.
- vendor (character(*)): string or comma-separated list of strings to select a particular device vendor. If multiple vendors are specified, then the first vendor in the list that is found on the system is chosen, i.e. specify vendors in order of preference.
Example:
type(fclContext) :: ctx
...
ctx = fclCreateContext('nvidia,intel')
In this example we create a context with first preference 'nvidia' and second preference 'intel' as device vendors.
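The platform-object form of fclCreateContext can be sketched in a similar way. The example below picks the first platform that reports at least one device; this selection rule, the integer type of numDevice, and the module name Focal are assumptions made for illustration.

```fortran
program context_from_platform
  use Focal   ! assumed module name for the Focal library
  implicit none

  type(fclPlatform), pointer :: platforms(:)
  type(fclContext) :: ctx
  logical :: found
  integer :: i

  platforms => fclGetPlatforms()

  ! Illustrative rule: take the first platform that has any devices.
  found = .false.
  do i = 1, size(platforms)
    if (platforms(i)%numDevice > 0) then
      ctx = fclCreateContext(platforms(i))
      found = .true.
      exit
    end if
  end do

  if (.not. found) error stop 'No OpenCL platform with devices was found'

end program context_from_platform
```

Selecting by platform object is useful when the choice depends on more than the vendor string, for example on the platform version or device count.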
The default context
Once created, the resulting context object (called ctx above) is used in subsequent Focal calls to indicate which context to use.
If you are only using one context throughout your program, then your code can be simplified by setting the default context.
This is a global variable which allows Focal calls to omit the context variable.
In the following examples, we query devices in a context for GPUs; the first example uses an explicit context variable ctx whereas the second example calls fclSetDefaultContext and is able to omit the context variable in the call to fclFindDevices.
Example: with explicit context variable
type(fclContext) :: ctx
type(fclDevice), allocatable :: devices(:)
...
ctx = fclCreateContext(vendor='nvidia')
devices = fclFindDevices(ctx,type='gpu')
Example: with default context
type(fclDevice), allocatable :: devices(:)
...
call fclSetDefaultContext( fclCreateContext(vendor='nvidia') )
devices = fclFindDevices(type='gpu')
API ref: fclPlatform, fclSetDefaultContext, fclDefaultCtx
Query devices on the context
A useful way of querying available devices in a context is provided by the fclFindDevices function, which enables us to filter the device list by device type, device name and supported extensions, and to sort the list according to device properties.
Interface:
devices = fclFindDevices(ctx,[type],[nameLike],[extensions],[sortBy])
devices = fclFindDevices([type],[nameLike],[extensions],[sortBy])
where arguments type, nameLike, extensions, and sortBy have the same definitions as defined for fclInit above.
- devices (type(fclDevice), allocatable): an array of devices allocated on assignment.
- ctx (type(fclContext), optional): the context from which to query available devices. The default context is used if this argument is omitted.
Example:
List CPUs in context ctx sorted (descending) by number of cores:
type(fclContext) :: ctx
type(fclDevice), allocatable :: devices(:)
...
devices = fclFindDevices(ctx,type='cpu',sortBy='cores')
Example: List GPUs in the default context where the name contains 'tesla' (case insensitive):
type(fclDevice), allocatable :: devices(:)
...
devices = fclFindDevices(type='gpu',nameLike='tesla')
From this list we can choose the first one or more devices as required.
We can further query device properties using fclGetDeviceInfo (this requires inclusion of the clfortran module which defines values for the property key argument).
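When choosing a device from the returned list, it is worth checking that the list is not empty before indexing into it. The sketch below does this defensively; how fclFindDevices behaves when nothing matches is not described above, so both checks are assumptions about possible outcomes rather than documented behaviour. The module name Focal is likewise assumed.

```fortran
program choose_device
  use Focal   ! assumed module name for the Focal library
  implicit none

  type(fclDevice), allocatable :: devices(:)
  type(fclDevice) :: device

  call fclSetDefaultContext( fclCreateContext(vendor='nvidia') )

  ! GPUs in the default context, largest total memory first.
  devices = fclFindDevices(type='gpu', sortBy='memory')

  ! Defensive checks (assumed, not documented, failure modes).
  if (.not. allocated(devices)) error stop 'Device query returned no list'
  if (size(devices) < 1) error stop 'No matching GPU in the default context'

  ! The list is sorted, so the first entry is the preferred device.
  device = devices(1)

end program choose_device
```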
API ref: fclFindDevices, fclGetDeviceInfo, fclDevice
2. Creating a command queue
Once a context has been created and a device selected (see above), we can create an OpenCL command queue; all device actions are submitted via command queues.
Command queues are associated with individual devices: one device can have multiple command queues, but each command queue belongs to exactly one device.
Command queues are created using the fclCreateCommandQ function.
Interface
cmdq = fclCreateCommandQ(ctx,device,[enableProfiling],[outOfOrderExec], &
[blockingRead], [blockingWrite])
cmdq = fclCreateCommandQ(device,[enableProfiling],[outOfOrderExec], &
[blockingRead], [blockingWrite])
- cmdq (type(fclCommandQ)): the created command queue object.
- ctx (type(fclContext), optional): the context associated with device. If no context is specified, then the default context is used.
- device (type(fclDevice)): the device on which to create the command queue.
- enableProfiling (logical, optional): whether to enable event profiling on this command queue, default .false.
- outOfOrderExec (logical, optional): whether to enable out-of-order execution on this command queue, default .false.
- blockingRead (logical, optional): whether memory read operations are host-blocking on this command queue, default .true.
- blockingWrite (logical, optional): whether memory write operations are host-blocking on this command queue, default .true.
Example:
Create command queue on first device in devices list with profiling enabled.
type(fclCommandQ) :: cmdq
...
cmdq = fclCreateCommandQ(devices(1),enableProfiling=.true.)
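The form that takes an explicit context can be sketched as follows, here with non-default queue options. The vendor preference list and the chosen options are illustrative only, the module name Focal is an assumption, and the code assumes that at least one matching device exists.

```fortran
program explicit_queue
  use Focal   ! assumed module name for the Focal library
  implicit none

  type(fclContext) :: ctx
  type(fclDevice), allocatable :: devices(:)
  type(fclCommandQ) :: cmdq

  ! Explicit context: prefer nvidia, fall back to intel.
  ctx = fclCreateContext('nvidia,intel')

  ! GPUs in that context, most compute units first.
  devices = fclFindDevices(ctx, type='gpu', sortBy='cores')

  ! Assumes devices contains at least one entry.
  ! Out-of-order execution on; host-blocking writes off.
  cmdq = fclCreateCommandQ(ctx, devices(1), outOfOrderExec=.true., &
                           blockingWrite=.false.)

end program explicit_queue
```

With blockingWrite disabled, a write call returns before the transfer has completed, so the host must synchronise with the queue before reusing or freeing the host buffer.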
API ref: fclCreateCommandQ, fclCommandQ
2.1 The default command queue
Once created, the resulting command queue object (called cmdq above) is used in subsequent Focal calls to indicate which device to use.
In a similar way to the default context, a default command queue is provided to allow subsequent Focal calls to omit the command queue object.
This is useful if you are only using one command queue throughout your program.
Example
Set the default command queue on first device in devices list.
call fclSetDefaultCommandQ( fclCreateCommandQ(devices(1)) )
API ref: fclSetDefaultCommandQ, fclDefaultCommandQ
2.2 Command queue pools
Many devices support multiple hardware queues which allow different kernel and memory operations to be processed concurrently; this is particularly important when we want to overlap memory transfers with compute operations, or when individual compute kernels do not fully utilise the device compute units.
OpenCL allows multiple command queues to be created for a single device; if the device supports multiple hardware queues, then the OpenCL command queues will map in some way to the available hardware queues. If this is not supported, then the command queue work will be serialised on the device.
Note
You can use the tracing functionality to visually check for kernel/memory runtime concurrency.
Focal provides a fclCommandQPool type which performs simple round-robin scheduling across multiple command queues.
To create a command queue pool we can use the fclCreateCommandQPool function, which accepts the same arguments as fclCreateCommandQ in addition to an argument N specifying the number of command queues to create.
Interface
qPool = fclCreateCommandQPool(ctx,N,device,[enableProfiling],[outOfOrderExec], &
[blockingRead], [blockingWrite])
qPool = fclCreateCommandQPool(N,device,[enableProfiling],[outOfOrderExec], &
[blockingRead], [blockingWrite])
- qPool (type(fclCommandQPool)): the created command queue pool object.
- N (integer): number of command queues to create within command queue pool.
Once created we can use the methods next() and current() to return the next scheduled or currently selected command queues respectively.
Example:
type(fclCommandQPool) :: qPool
...
qPool = fclCreateCommandQPool(3,device)
do i=1,3
call kernel1%launch(qPool%next(),data(i))
call kernel2%launch(qPool%current(),data(i))
end do
In this example we launch two sequential kernels three times on three different command queues for three different sets of data. Note that the second kernel launches on the same queue as the first kernel and will hence be launched in sequence, but each iteration of the do loop increments the current queue using the next() method.
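To make the next()/current() pattern concrete, here is a toy model of round-robin selection written in plain Fortran. It does not use Focal and is not Focal's implementation: the slotPool type, its components and its starting position are all hypothetical, and integer slot numbers stand in for command queues.

```fortran
module round_robin_sketch
  implicit none
  private
  public :: slotPool

  ! Hypothetical stand-in for a command queue pool.
  type :: slotPool
    integer :: n = 1     ! number of slots (queues) in the pool
    integer :: idx = 0   ! currently selected slot; 0 means none yet
  contains
    procedure :: next => pool_next
    procedure :: current => pool_current
  end type slotPool

contains

  ! Advance to the next slot, wrapping around after the last one.
  function pool_next(self) result(slot)
    class(slotPool), intent(inout) :: self
    integer :: slot
    self%idx = mod(self%idx, self%n) + 1
    slot = self%idx
  end function pool_next

  ! Return the currently selected slot without advancing.
  function pool_current(self) result(slot)
    class(slotPool), intent(in) :: self
    integer :: slot
    slot = self%idx
  end function pool_current

end module round_robin_sketch

program round_robin_demo
  use round_robin_sketch
  implicit none

  type(slotPool) :: pool
  integer :: i, first, second

  pool = slotPool(n=3)

  ! Same pattern as the kernel example: the first operation advances
  ! the pool, the second reuses whichever slot is now current.
  do i = 1, 5
    first = pool%next()
    second = pool%current()
    print '(3(A,I0))', 'iteration ', i, ': first op on slot ', first, &
          ', second op on slot ', second
  end do

end program round_robin_demo
```

In this model both operations in an iteration always share a slot, and successive iterations visit slots 1, 2, 3, 1, 2. Which queue a real pool hands out first is not specified above and may differ.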
API ref: fclCreateCommandQPool, fclCommandQPool