Getting started with pfocal

library(pfocal)

Introduction

Moving window calculations are common in many fields, from image processing to raster data manipulation. The process of ingesting the values in the neighborhood of each cell in a grid, passing them through a function, and returning a reduced value is “embarrassingly parallel”. This package implements this very process with parallel C code. In doing so, it provides a much faster option than other similarly aimed packages and functions.

A simple example

Let’s start with a simple 1000 by 1000 square matrix.

size <- 1000
data <- matrix(nrow = size, ncol = size, 
               data = runif(n = size*size, min = 0, max = 10))
image(data, asp = 1)

The package contains two types of functions: kernel generation functions and kernel application functions. The kernel generation functions make it easy to create distance, exponential, circular, and Gaussian kernels, all returned as matrices. Here we generate a Chebyshev distance kernel and show its shape.

kernel <- chebyshev_distance_kernel(10)
image(kernel, asp = 1)
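For intuition, the Chebyshev (chessboard) distance from the focal cell is the maximum of the absolute row and column offsets. A base-R reconstruction of such a kernel might look like the following (purely illustrative; the exact size and scaling produced by chebyshev_distance_kernel may differ):

```r
radius <- 10
offsets <- -radius:radius
# Each cell holds max(|row offset|, |column offset|) from the focal cell
manual_kernel <- outer(offsets, offsets, function(i, j) pmax(abs(i), abs(j)))
dim(manual_kernel)                      # 21 x 21
manual_kernel[radius + 1, radius + 1]   # the focal cell is at distance 0
```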

The second type of function in the package, kernel application functions, can apply a kernel to a matrix, a raster object, or a terra object (although the most efficient approach is to pass a matrix). The main application function is pfocal.

convoluted <- pfocal(data = data, kernel = kernel, edge_value = 0)
image(convoluted, asp = 1)

Convolution functions

The pfocal function is not as flexible as other focal calculation functions, which can accommodate any functional treatment of the values covered by the kernel (for example, raster::focal can take any R function). Instead, it is optimized for a few common combinations of “transform” and “reduce” functions. The transform function dictates how each data value d covered by the kernel is combined with the corresponding kernel value k. To get the list of possible transform functions:

pfocal_info_transform()
#>      [,1]       [,2]
#> [1,] "MULTIPLY" "0" 
#> [2,] "ADD"      "1" 
#> [3,] "R_EXP"    "2" 
#> [4,] "L_EXP"    "3" 
#>      [,3]                                                                   
#> [1,] "For data value 'd' and kernal value 'k', intermediate_value = (d * k)"
#> [2,] "For data value 'd' and kernal value 'k', intermediate_value = (d + k)"
#> [3,] "For data value 'd' and kernal value 'k', intermediate_value = (d ^ k)"
#> [4,] "For data value 'd' and kernal value 'k', intermediate_value = (k ^ d)"

The reduce function, on the other hand, dictates how the transformed values are collapsed into a single result, the most common options being sum, product, minimum, and maximum. See the list of reduce functions:

pfocal_info_reduce()
#>      [,1]          [,2]
#> [1,] "SUM"         "0" 
#> [2,] "ABS_SUM"     "1" 
#> [3,] "PRODUCT"     "2" 
#> [4,] "ABS_PRODUCT" "3" 
#> [5,] "MIN"         "4" 
#> [6,] "MAX"         "5" 
#>      [,3]                                                                                                                          
#> [1,] "Accumulator starts at 0. For each intermediate value, in no particular order, acc = ( acc + iv )"                            
#> [2,] "Accumulator starts at 0. For each intermediate value, in no particular order, acc = ( acc + abs(iv) )"                       
#> [3,] "Accumulator starts at 1. For each intermediate value, in no particular order, acc = ( acc * iv )"                            
#> [4,] "Accumulator starts at 1. For each intermediate value, in no particular order, acc = ( acc * abs(iv) )"                       
#> [5,] "Accumulator starts at the highest possible value. For each intermediate value, in no particular order, acc = min( acc , iv )"
#> [6,] "Accumulator starts at the lowest possible value. For each intermediate value, in no particular order, acc = max( acc , iv )"
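In base-R terms (again illustrative, not pfocal internals), each reduce function collapses the vector of intermediate values like this:

```r
iv <- c(2, -3, 4)  # intermediate values produced by the transform step
c(SUM         = sum(iv),         #  3
  ABS_SUM     = sum(abs(iv)),    #  9
  PRODUCT     = prod(iv),        # -24
  ABS_PRODUCT = prod(abs(iv)),   #  24
  MIN         = min(iv),         # -3
  MAX         = max(iv))         #  4
```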

Finally, there is the option of dividing the final result by a scaling factor. The default is no division. The different options can be seen with:

pfocal_info_mean_divisor()
#>       [,1]                    [,2]
#>  [1,] "ONE"                   "0" 
#>  [2,] "KERNEL_SIZE"           "1" 
#>  [3,] "KERNEL_COUNT"          "2" 
#>  [4,] "KERNEL_SUM"            "3" 
#>  [5,] "KERNEL_ABS_SUM"        "4" 
#>  [6,] "KERNEL_PROD"           "5" 
#>  [7,] "KERNEL_ABS_PROD"       "6" 
#>  [8,] "DYNAMIC_COUNT"         "7" 
#>  [9,] "DYNAMIC_SUM"           "8" 
#> [10,] "DYNAMIC_ABS_SUM"       "9" 
#> [11,] "DYNAMIC_PROD"          "10"
#> [12,] "DYNAMIC_ABS_PROD"      "11"
#> [13,] "DYNAMIC_DATA_SUM"      "12"
#> [14,] "DYNAMIC_DATA_ABS_SUM"  "13"
#> [15,] "DYNAMIC_DATA_PROD"     "14"
#> [16,] "DYNAMIC_DATA_ABS_PROD" "15"
#>       [,3]                                                                                                                            
#>  [1,] "Does not divide the final value by anything"                                                                                   
#>  [2,] "Divide the final value at each point by nrow(k)*ncol(k)"                                                                       
#>  [3,] "Divide the final value at each point by sum(+!is.na(k))"                                                                       
#>  [4,] "Divide the final value at each point by sum(k[!is.na(k)])"                                                                     
#>  [5,] "Divide the final value at each point by sum(abs(k[!is.na(k)]))"                                                                
#>  [6,] "Divide the final value at each point by prod(k[!is.na(k)])"                                                                    
#>  [7,] "Divide the final value at each point by prod(abs(k[!is.na(k)]))"                                                               
#>  [8,] "Divide the final value at each point by sum(!is.na( intermediate_data )), recalculated at every point"                         
#>  [9,] "Divide the final value at each point by sum(intermediate_data[!is.na( intermediate_data )]), recalculated at every point"      
#> [10,] "Divide the final value at each point by sum(abs(intermediate_data[!is.na( intermediate_data )])), recalculated at every point" 
#> [11,] "Divide the final value at each point by prod(intermediate_data[!is.na( intermediate_data )]), recalculated at every point"     
#> [12,] "Divide the final value at each point by prod(abs(intermediate_data[!is.na( intermediate_data )])), recalculated at every point"
#> [13,] "Divide the final value at each point by sum(local_data[!is.na( intermediate_data )]), recalculated at every point"             
#> [14,] "Divide the final value at each point by sum(abs(local_data[!is.na( intermediate_data )])), recalculated at every point"        
#> [15,] "Divide the final value at each point by prod(local_data[!is.na( intermediate_data )]), recalculated at every point"            
#> [16,] "Divide the final value at each point by prod(abs(local_data[!is.na( intermediate_data )])), recalculated at every point"
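For example, the MULTIPLY transform with the SUM reduce and the KERNEL_COUNT divisor computes a neighborhood mean. A base-R sketch for a single neighborhood (illustrative only, not pfocal internals):

```r
d <- matrix(1:9, nrow = 3)           # a 3 x 3 neighborhood of data values
k <- matrix(1, nrow = 3, ncol = 3)   # a uniform 3 x 3 kernel
iv  <- d * k                         # MULTIPLY transform
acc <- sum(iv)                       # SUM reduce
acc / sum(!is.na(k))                 # KERNEL_COUNT divisor: 45 / 9 = 5, the mean
```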

Shortcut functions

Other kernel application functions are available in the package. They bear the pfocal_fast prefix and wrap pfocal as shortcuts with preset default arguments.

fast_convoluted <- pfocal_fast_abs_rectangle(data = data, height = 10, width = 10)
image(fast_convoluted, asp = 1)

Comparison with raster::focal

To compare with raster::focal, we will make use of the raster and microbenchmark packages:

library(raster)
#> Loading required package: sp
library(microbenchmark)
#> Warning: package 'microbenchmark' was built under R version 4.1.2

We will run each function 100 times on the same data grid as above. We show here how raster::focal with the mean() function can be replicated in pfocal.

data_raster <- raster::raster(data)

mbm <- microbenchmark(
  "pfocal" = {
    pfocal_results <- pfocal::pfocal(data = data_raster, kernel = kernel, 
                                     transform_function = "MULTIPLY",
                                     reduce_function = "SUM", 
                                     mean_divider = "KERNEL_COUNT")
    gc()
  },
  "raster_focal" = {
    raster_focal_results <- raster::focal(x = data_raster, w = kernel, 
                                          fun = mean, pad = TRUE, padValue = 0)
    gc()
  },
  times = 100,
  unit = "s")
mbm
#> Unit: seconds
#>          expr       min        lq      mean    median        uq       max neval
#>        pfocal 0.2178424 0.2228699 0.2279975 0.2253432 0.2300356 0.2677352   100
#>  raster_focal 0.7477754 0.7576559 0.7686383 0.7624928 0.7725864 0.9350483   100
plot(mbm)