Playing with Utilization Distribution (UD) scaling

Start by generating some data
Playing around with the grid
- Grid 1
- Grid 2
- Grid 3
Comparing UD pixel values
Zoom in on UD comparisons
Abbreviated Conclusion

Start by generating some data

Generate SpatialPoints data for 4 different UDs.

set.seed(100)
sp <- list( 
  #small sample, small spread
  SMLn_SMLsd = SpatialPoints(cbind(rnorm(n = 10, mean = 4, sd=2), rnorm(n = 10, mean = 4, sd=2))),
  
  #large sample, small spread
  LGn_SMLsd = SpatialPoints(cbind(rnorm(n = 30, mean = 7, sd=2), rnorm(n = 30, mean = 7, sd=2))),
  
  #small sample, large spread
  SMLn_LGsd = SpatialPoints(cbind(rnorm(n = 10, mean = -2, sd=10), rnorm(n = 10, mean = -2, sd=10))),
  
  #large sample, large spread
  LGn_LGsd = SpatialPoints(cbind(rnorm(n = 30, mean = 10, sd=10), rnorm(n = 30, mean = 10, sd=10)))
)

Take a look at the points.

Playing around with the grid

Grid 1

#make the grid
x <- seq(-40,40,by=1.5) 
y <- seq(-40,40,by=1.5) 
xy <- expand.grid(x=x,y=y) 
xy.sp <- SpatialPoints(xy)
gridded(xy.sp) <- TRUE 

#calculate the ranges
ud <- lapply(sp, kernelUD, grid=xy.sp)

#sum the @data$ud slots
for (i in 1:4) { 
  print(sum(ud[[i]]@data$ud))
}

## [1] 0.444307
## [1] 0.4443922
## [1] 0.4434003
## [1] 0.4443624

All UDs sum to the almost exactly the same value, regardless of sample size (n) and spread (sd).

Grid 2

Now change the grid extent and see what happens…

x <- seq(-40,60,by=1) #grid x min and x max
y <- seq(-60,60,by=1) #grid y min and y max
xy <- expand.grid(x=x,y=y) 
xy.sp <- SpatialPoints(xy)
gridded(xy.sp) <- TRUE 

#calculate the ranges
ud <- lapply(sp, kernelUD, grid=xy.sp)

#sum the @data$ud slots
for (i in 1:4) { 
  print(sum(ud[[i]]@data$ud))
}

## [1] 0.9998706
## [1] 0.9998943
## [1] 0.9990162
## [1] 0.9998803

Again, all UDs sum to the almost exactly the same value, regardless of spread and sample size. But, changing the grid has changed what value the UDs sum to (bigger grid, higher value, it would seem).

Grid 3

Now change the grid resolution to be finer…

x <- seq(-40,60,by=0.1) #grid resolution set in the 'by=' arg (was by=1)
y <- seq(-60,60,by=0.1) #grid resolution set in the 'by=' arg (was by=1)
xy <- expand.grid(x=x,y=y) 
xy.sp <- SpatialPoints(xy)
gridded(xy.sp) <- TRUE 

#calculate the ranges
ud <- lapply(sp, kernelUD, grid=xy.sp)

#sum the @data$ud slots
for (i in 1:4) { 
  print(sum(ud[[i]]@data$ud))
}

## [1] 99.98728
## [1] 99.98737
## [1] 99.88627
## [1] 99.98732

Using the previous grid extents, the sums were all around 1, now they are all approaching 100. i.e. making the grid resolution finer by a factor of 100 (/10 on each axis), increased the sum of @data$ud by a factor of 100. So now plot these ranges to look at them.

for (i in 1:4) { 
  plot(ud[[i]], xlim=c(-20,25), ylim=c(-35,50),
       main = names(ud)[i])
}

Note how the scale bar is diff for each UD! Thus, from range 1 to range 4, the high density areas are “worth” less and less (shallower peaks) in order to maintain the equal sum of @data$ud across wider home ranges. Therefore, this is NOT a good value to use for UD comparisons, because larger UDs will end up having smaller values (as an artifact of the math, not the actual density).

To illustrate this even more clearly, force the colour scales to be the same (using the zlim arg), then more wider-spread (LGsd) ranges almost dissappear…

for (i in 1:4) { 
  plot(ud[[i]], xlim=c(-20,25), ylim=c(-35,50), zlim=c(0,.08),
       main = names(ud)[i])
}

Comparing UD pixel values

Generate random points in the grid that cluster around where the ranges overlap the most (for the sake of having nice plots, without too many (0,0) points).

pts <- SpatialPoints(cbind(rnorm(n = 100, mean = 5, sd=5), rnorm(n = 100, mean = 5, sd=5)))

Try some different methods for extracting and comparing pixel values…

Using raw UD data

Look at some UD pixel value comparisons - eventhough this method is prob not the right one, but just to see what happens…

#plot the points
par(resetPar(), bg="white")
plot(sp[[1]], col=1, xlim=c(-5,20), ylim=c(-20, 30), asp=1, pch=16, axes=T)
for (i in 2:4) { 
  plot(sp[[i]], col = i, pch=16, add=T)
}
plot(pts, col="orange", pch=4, add=T)
legend("topright", c("SMLn_SMLsd", "LGn_SMLsd", "SMLn_LGsd", "LGn_LGsd", "100 random points"), 
       col=c(1:4, "orange"), pch=c(rep(16, times=4), 4))

Now extract the value of each UD at the random pts…

rst <- lapply(ud, raster)
vals <- lapply(rst, extract, pts)
df.raw <- as.data.frame(matrix(unlist(vals), 100, 4, byrow=FALSE))
names(df.raw) <- c("SMLn_SMLsd", "LGn_SMLsd", "SMLn_LGsd", "LGn_LGsd")
plot(df.raw, pch=16, cex=0.7, asp=1)

Super teeny tiny numbers on the axes, if set asp=1, can barely see any variation in relationships between ranges with different degrees of spread (SMLsd vs LGsd). Ok, try something else….

Using inverted volume UD data

Continue with the UDs from the last section, but now use the volume function before extracting values…

vud <- lapply(ud, getvolumeUD)

#take a look at it

for (i in 1:4) { 
  plot(vud[[i]], xlim=c(-20,25), ylim=c(-35,50),
       main = names(vud)[i])
}

for (i in 1:4) { 
  print(sum(vud[[i]][[1]]))
}

## [1] 120062506
## [1] 119861560
## [1] 109944887
## [1] 115771854

The scale bar doesn’t change (so peaks are ‘equal’ even with different sample sizes (n) and spreads (sd). Notice how the sum of the volume is less for wider spread data (because there are fewer 100 cells), but is not so affected by sample size. This is good… now we are getting somewhere….

To make these values more intuitive (for me with my current goals, higher use = higher value), change all cell values to their inverse.

inverse <- function(x) {
  max(x) - x
}

inv.vud <- vud
for (i in 1:4) {
  inv.vud[[i]]@data$n <- inverse(vud[[i]]@data$n)
}

#now look at the plots again
for (i in 1:4) { 
  plot(inv.vud[[i]], xlim=c(-20,25), ylim=c(-35,50),
       main = names(inv.vud)[i])
}

Now the volume values are inverted - so a higher value means higher use - and the range of values is independent of the spread (sd) of the points, i.e. high peaks in the different ranges are ‘equally high’.

Now extract the values at our random pts again…

rst.vol <- lapply(inv.vud, raster)
vals.vol <- lapply(rst.vol, extract, pts)
df.vol <- as.data.frame(matrix(unlist(vals.vol), 100, 4, byrow=FALSE))
names(df.vol) <- c("SMLn_SMLsd", "LGn_SMLsd", "SMLn_LGsd", "LGn_LGsd")
plot(df.vol, pch=16, cex=0.7, asp=1)

Values extracted from the volume UDs yield easier to interpret plots and visible relationships between values in ranges with different spreads (SMLsd vs LGsd).

Using scaled raw UD values

Using the original UDs’ (before the volume conversion) @data$ud slot, scale the values in this slot for each range so that the values range from 0 to 1…

#duplicate the UD objects, then replace the @data$ud with a rescaled version of itself
ud.rscl <- ud
for (i in 1:4) {
  ud.rscl[[i]]@data$ud <- rescale(ud[[i]]@data$ud, to=c(0,1))
}

#take a look at it...
for (i in 1:4) { 
  plot(ud.rscl[[i]], xlim=c(-20,25), ylim=c(-35,50),
       main = names(ud.rscl)[i])
}

Notice how the scale bar is the same for all ranges. i.e. high peaks in the different ranges are ‘equally high’.

Now extract the values at the random pts and plot them…

rst.rscl <- lapply(ud.rscl, raster)
vals.rscl <- lapply(rst.rscl, extract, pts)
df.rscl <- as.data.frame(matrix(unlist(vals.rscl), 100, 4, byrow=FALSE))
names(df.rscl) <- c("SMLn_SMLsd", "LGn_SMLsd", "SMLn_LGsd", "LGn_LGsd")
plot(df.rscl, pch=16, cex=0.7, asp=1)

Again, these rescaled values are easier to interpret, and the relationships between ranges with different spreads are visible and not artificially flattened.

Zoom in on UD comparisons

Now subset these final data frames to just look at points that are essentially within the first (small sample size, small spread, SMLn_SMLsd) range’s 99%iso (but for the sake of simplicity, I won’t actually calculate the range and put points in there, I’ll just use points for which the values are already calculated, and the SMLn_SMLsd values are >0).

df.ls <- list(raw = subset(df.raw, SMLn_SMLsd > 0), vol = subset(df.vol, SMLn_SMLsd > 0), rscl = subset(df.rscl, SMLn_SMLsd > 0))

par(resetPar())

And look at the relationship between ‘SMLn_SMLsd’ and ‘LGn_SMLsd’

par(mfrow=c(1,3))
for (i in 1:3) { 
  plot(SMLn_SMLsd~LGn_SMLsd, data=df.ls[[i]],
       pch=16, cex=0.7, 
       asp=1, main=names(df.ls)[i])
}

Note here how, because we’re comparing two ranges with similar spread (both SMLsd), it doesn’t really matter too much which values we use, the relationship always looks the same.

And look at the relationship between ‘SMLn_SMLsd’ and ‘SMLn_LGsd’

par(mfrow=c(1,3))
for (i in 1:3) { 
  plot(SMLn_SMLsd~SMLn_LGsd, data=df.ls[[i]],
       pch=16, cex=0.7, 
       asp=1, main=names(df.ls)[i])
}

But now here, comparing a small spread range (SMLsd) to a wide-spread range (LGsd), we see that the raw data looses it’s power to show the relationship because the variance within the wide-spread range is so little since each UD pixel is worth less. The two different scaling methods, however, both produce similar results

And look at the relationship between ‘SMLn_SMLsd’ and ‘LGn_LGsd’

par(mfrow=c(1,3))
for (i in 1:3) { 
  plot(SMLn_SMLsd~LGn_LGsd, data=df.ls[[i]],
       pch=16, cex=0.7, 
       asp=1, main=names(df.ls)[i])
}

Same as the previous one.

Abbreviated Conclusion

I think that using the inverted volume (vol) or the rescaled @data$ud (rscl) methods are both appropriate and both basically the same, in the end. I like the rscl option because it’s fewer steps, and I find the 0 to 1 range to be even more intuitive than a 0 to 100 range.