Normalized difference vegetation index (NDVI)

This dataset is a compilation of openly available, remotely sensed NDVI (normalized difference vegetation index) data for British Columbia (BC). NDVI is a measure of greenness calculated from reflectance in two spectral bands (red and near-infrared) and is regularly used as a measure of ecosystem productivity.

Formally, NDVI is defined as the ratio $(\eta_{\rm NIR}-\eta_{\rm red})/ (\eta_{\rm NIR}+\eta_{\rm red})$, where $\eta_{\rm NIR}$ and $\eta_{\rm red}$ are the values of the reflectance in the near-infrared and in the red bands, respectively. NDVI always falls between $-1$ and $+1$ and can highlight the following features:


Feature | Reflectance characteristics | NDVI range
Dense forest canopy, e.g. in the Amazon | dark in red and bright in NIR | close to +1
Dense vegetation | dark in red and bright in NIR | 0.6 to 0.8
Sparse vegetation (shrub and grassland) | brighter in NIR | 0.2 to 0.3
Dry land with nothing growing | almost equal reflectance | 0 to 0.1
Snow, glaciers, clouds | low reflectance in red, even lower NIR reflectance | -0.5 to 0
Open water | low reflectance in red, almost no NIR reflectance | close to -1
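
As a minimal sketch of the NDVI definition above, the following NumPy snippet computes the index from a pair of reflectance arrays. The function name `ndvi` and the sample reflectance values are illustrative assumptions and are not taken from the dataset.

```python
import numpy as np

def ndvi(nir, red):
    """Return (NIR - red) / (NIR + red) element-wise."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # The ratio is undefined where both reflectances are zero; return NaN there.
    safe_total = np.where(total == 0, 1.0, total)
    return np.where(total == 0, np.nan, (nir - red) / safe_total)

# Made-up reflectances (not from the dataset), one pair per point
nir = np.array([0.50, 0.30, 0.02])
red = np.array([0.05, 0.25, 0.08])
values = ndvi(nir, red)
print(values)
```

For non-negative reflectances the result always lies between -1 and +1, because the magnitude of the numerator can never exceed the denominator.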

You can find more about the NDVI on Wikipedia and in 5 Things To Know About NDVI.

Contest dataset

What makes the current dataset unique is that, while the mean NDVI has been calculated since the 1970s, this dataset is one of the first attempts to map the variance in NDVI over space and time. Both the mean NDVI and its variance provided here were produced by a BC-scale hierarchical GAM (generalized additive model) fitted to a multi-GB raw dataset, so that for a given location and time the mean and the variance are informed by data close in time or space.

The Contest dataset contains 5,935,736 points and 53 time steps. The points are not connected, i.e. they do not form a smooth surface. The 53 time steps are uniformly spread throughout 2022, from Jan-01 (first step) to Dec-31 (last step).
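
If the 53 steps are evenly spaced between the two end dates stated above, the spacing works out to 364 / 52 = 7 days. The sketch below derives a calendar date for each step under that assumption; the exact date attached to each file is not stated here, so treat the result as an estimate.

```python
from datetime import date, timedelta

first, last, nsteps = date(2022, 1, 1), date(2022, 12, 31), 53

# Uniform spacing between the first and last step (assumed, see above)
step = (last - first) / (nsteps - 1)
dates = [first + i * step for i in range(nsteps)]

print(step.days, dates[0], dates[1], dates[-1])
```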

Data are provided in two formats: VTK and compressed CSV. Each format is self-contained – use one of them depending on which data description you like best (no need to use both).

  1. In the VTK files each point is placed in 3D space using its Cartesian coordinates ($x$, $y$, $z$). On top of each point, we store three variables: the mean NDVI ($\mu$), its variance ($\sigma^2$), and elevation in km.

  2. In the compressed CSV format, each row corresponds to a data point with longitude, latitude, elevation in km, two horizontal coordinates ($x_{\rm alb}$, $y_{\rm alb}$) in the Albers equal-area conic projection, the mean NDVI ($\mu$), and its variance ($\sigma^2$).

Downloading the data

To start playing with this dataset, you can download only the first time step, but for a production-quality animation you will need all 53 time steps.

VTK format

File | Size | MD5 checksum
First time step | 138M | df82fda21a542d64255fde9d31856051
All 53 time steps (gzip-compressed file) | 7.1G | 91d3daa7cb75dd124981c80ee3cd74b3

Compressed CSV format

File | Size | MD5 checksum
First time step | 139M | 172176097e1c66cf415b8c22da6dbe94
All 53 time steps (gzip-compressed file) | 7.1G | c7b67777bad4d2b45345db753d4e0963

After you download the files, compare each one against its MD5 checksum listed above to confirm that the download succeeded.
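
One way to run this check from Python is with the standard `hashlib` module. This is a minimal sketch: the helper name `md5sum` is hypothetical, and the file path in the usage comment is a placeholder for whatever name your downloaded file has.

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use low for multi-GB files
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage (replace the placeholder path with your downloaded file):
# print(md5sum("path/to/downloaded_file") == "df82fda21a542d64255fde9d31856051")
```

On Linux the `md5sum` command-line tool, and on macOS the `md5` tool, should report the same digest.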

Loading the data in ParaView

Each data format can be loaded easily into ParaView. When loading from CSV, you have to pass points through the Table To Points filter.

Please note that when you load points, you will normally not see them (they are infinitely small points!), but you can render data by:

  • using the Point Gaussian representation, or
  • using Glyphs, or
  • triangulating or projecting data onto a mesh (uniform or not).

You can easily manipulate data inside ParaView with the Programmable Filter. To give you an example, assuming you have read data from the compressed CSV format, a new filter with Output Type = Same as Input and the following Python code inside the filter

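import numpy as np

# This code assumes the input points hold (longitude, latitude, elevation),
# with angles in degrees and elevation in km
npoints = inputs[0].Points.shape[0]
lon = np.radians(inputs[0].Points[:,0])
lat = np.radians(inputs[0].Points[:,1])
points = vtk.vtkPoints()
radius = 6371   # radius of a spherical Earth in km
for i in range(npoints):
    # distance from the Earth's centre = radius + elevation
    r = radius + inputs[0].Points[i,2]
    # spherical-to-Cartesian conversion
    x = r * np.cos(lon[i]) * np.cos(lat[i])
    y = r * np.sin(lon[i]) * np.cos(lat[i])
    z = r * np.sin(lat[i])
    points.InsertNextPoint(x,y,z)

output.SetPoints(points)
# carry the mean NDVI and its variance over to the new points
output.PointData.append(inputs[0].PointData['mu'], 'mu')
output.PointData.append(inputs[0].PointData['sigma2'], 'sigma2')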

will create a new set of points that are mapped into the 3D space using their longitude, latitude, and elevation. To learn more about ParaView’s Programmable Filter, watch our January 2021 webinar.
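
The same spherical mapping can also be written without the Python loop, which matters with almost six million points. The following is a standalone NumPy sketch of the conversion used in the filter above, meant to run outside ParaView; the function name `lonlat_to_cartesian` and the three sample points are illustrative assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same spherical radius as in the filter above

def lonlat_to_cartesian(lon_deg, lat_deg, elev_km):
    """Map longitude/latitude (degrees) and elevation (km) to x, y, z in km."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    r = EARTH_RADIUS_KM + np.asarray(elev_km, dtype=float)
    x = r * np.cos(lon) * np.cos(lat)
    y = r * np.sin(lon) * np.cos(lat)
    z = r * np.sin(lat)
    return np.column_stack((x, y, z))  # shape (npoints, 3)

# Three made-up points: on the equator, on the equator at 1 km, at the pole
xyz = lonlat_to_cartesian([0.0, 90.0, -123.0], [0.0, 0.0, 90.0], [0.0, 1.0, 0.0])
print(xyz.shape)
```

Each row's distance from the origin equals the Earth radius plus that point's elevation, which gives a quick sanity check on the output.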

NDVI colour map

If you like, you can use the blue-to-brown-to-green NDVI colour map covering the values from $-1$ to $+1$.

Loading the data in Python

To read VTK files in Python, you can use the official VTK Python library or one of several third-party libraries, e.g. meshio.

The compressed CSV files can be read directly with pandas:

import pandas as pd
data = pd.read_csv('step000.csv.gz')
print(data.shape)
print(data.columns)

and then exported to NumPy or xarray.
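
As a rough sketch of the export to NumPy, the snippet below uses a tiny synthetic table in place of a real time step. The column names `mu` and `sigma2` are assumptions based on the array names in the ParaView filter; check `data.columns` on the real file before relying on them.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one time step; the values are made up
data = pd.DataFrame({
    "mu": [0.71, 0.25, -0.10],        # mean NDVI (assumed column name)
    "sigma2": [0.04, 0.01, 0.0025],   # NDVI variance (assumed column name)
})

# The standard deviation is often easier to interpret than the variance
data["sigma"] = np.sqrt(data["sigma2"])

# Export the selected columns to a plain NumPy array of shape (npoints, 2)
arr = data[["mu", "sigma"]].to_numpy()
print(arr.shape)
```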

Reference

N. Pettorelli, S. Ryan, T. Mueller, N. Bunnefeld, B. Jędrzejewska, M. Lima, K. Kausrud (2011): The Normalized Difference Vegetation Index (NDVI): unforeseen successes in animal ecology. Climate Research 46, 15-27.

Acknowledgments

Data courtesy of Michael Noonan and Stefano Mezzini from the University of British Columbia at Okanagan.