Friday, 28 July 2017

Finding the Perfect Place to Put New Measurement Stations

One of the major outputs of this project was an "Optimal Network Design" for the placement of new CO2 measurement stations. South Africa already has a long term monitoring station at Cape Point where various "species" are continually monitored, including CO2. This station is located at one of the most beautiful spots in South Africa, and if you ever get the opportunity to visit Cape Point in the Table Mountain National Park, I would highly recommend it. The purpose of the station is to obtain background levels of the different substances it's geared to measure. It's run by the most wonderful bunch of scientists from the South African Weather Service.

Information about Cape Point GAW Station

The job of the new stations would be to provide information about continental CO2. By using information from what we think we know about where fossil fuel emissions are originating and how much is being emitted, and about the amount of CO2 taken up and respired by vegetation, we had the prior (or initial - best guess) estimates of the different South African emission sources we were interested in. We then hooked this up to a Lagrangian particle dispersion model (a fancy name for a physics model which uses various bits of climate information like wind speed and temperature to drive parcels of air around), which could then tell us, for a particular location, what information we could hope to obtain about the major emission sources (and therefore reduce our uncertainty in what we thought we knew about the emissions) if a new station was located in this position.

From a statistical/mathematical point of view, the interesting thing about this particular task were the different optimisation techniques available for this task (like the Genetic Algorithm and Incremental Optimisation), and what effect changing the spatial resolution of the information would have on the results of the optimisation.

Of all the factors we considered, the spatial resolution had the most profound impact on the solution. The spatial resolution determines something called "aggregation error". This error results from the fact that we are using a mean measure of wind speed to represent an entire block in the grid, whose size of course depends on the resolution. If, for example, the particles being carried by the Lagrangian particle dispersion model skirts the boundary of a block, the information it carries away with it will be the average information of that block. If the section that was skirted is very different from the rest of the block (e.g. the particles skirt over a section of busy road, whereas the rest of the block is covered in natural vegetation), then the information we think we're taking away will look like biogenic activity (and our model will try to correct information about biogenic activity), when in reality we're really seeing a lot more fossil fuel information than the model thinks. And that's why the higher the spatial resolution which can be managed, the better for an optimal network design, as this reduces the amount of aggregation error (which is really difficult to determine).  The limitation on spatial resolution implemented in these optimisation activities is the amount of computational resources and time available.

Another factor which impacts on the computation resources needed is the optimisation algorithm used. We considered two - Incremental Optimisation and the Genetic Algorithm. Incremental Optimisation is pretty straightforward and is actually very handy for this type of optimisation task. A list of all the potential sites for where new stations could be located is given. For every single location in the list, the algorithm will calculate the uncertainty reduction (i.e. the information gained) of having a measurement station here. It then selects the site which gives the highest information gain. That's the first station sorted. It then goes through all the remaining stations again, until it finds the one which gives the best information gain (excluding the one we already have in the solution set), and proceeds in this way until there are the number of stations required in the network.

The genetic algorithm is a little more complicated, but has the potential to find much more interesting solutions. Whereas the incremental optimisation only ever considers one station at a time, and in a sequential manner, the genetic algorithm tries to solve for all five (in our case) stations at the same time. It does this by creating a large population of different five station solutions. Each solution is assigned a fitness value (which is based on the information gained). Those with the highest fitness have the largest probability of surviving and reproducing in the next generation of solutions. The subsequent generation of solutions is then made up of the fittest members from the previous generation, and the offspring of those solutions. Some new combinations are created using mutation and cross-over (all genetic terms), whereby parts of the two solutions can randomly cross over, and where an individual station in a solution can randomly mutate into another station from the list. This ensures diversity in the the population of solutions. In our setup, we always ensured that the best solution of the previous generation made it through to the new generation intact. The solution with the best fitness after a set number of iterations is then selected as the final solution.

If you want to read more about the optimal network solution for South Africa, it's available at Cape Town Optimisation Paper ACP.

Some of the results from the South African optimal network design - Nickless et al. 2015
And at the beginning of next month (Sep 2017) I'll get a chance to present some of the results specifically looking at the Genetic Algorithm and how it compares to the Incremental Optimisation under different conditions, at the RSS 2017 conference. That is - the Royal Statistical Society Conference as opposed to the Robotics: Science and Systems Conference (which would have been cool too).

Thursday, 13 March 2014

Al Gore Gives Climate Talk in South Africa

Presenting at a climate training course in Johannesburg, Al Gore spoke about the effects of Climate Change on South Africa. South Africa is expected to get drier, and if only a 1.5 degree increase in temperature occurs, about half of our indigenous plant species are expected to be lost. But with the current trajectory of climate change, this increase in the average temperature is expected to be much higher. Al Gore also asked the question "Why are we still debating who caused climate change?" when we should be spending more time and resources are finding ways to adapt and mitigate climate change, regardless of why it is happening.

Image of Al Gore from Mail & Guardian at the Climate Reality Leadership Corps training in Johannesburg on Thursday the 13th of March

Friday, 6 December 2013

To take the Gautrain or not to take the Gautrain?

Today I found myself using public transport to get to work for the first time. Although I have a 62km road trip to get from my home in Edenvale, near OR Tambo International Airport, to the CSIR in Pretoria, it usually takes me no more than 45 minutes. This is because I travel almost the entire distance on the R21 highway, which is hardly ever slow in the direction I’m going as I’m travelling against traffic.

Gauteng's Gautrain -

But this week things changed. The etolling system came into effect, and the additional cost that this now adds to my journey now makes the Gautrain combined trip to work cheaper than driving my personal vehicle. Under the current petrol price, my monthly budget to get to work is about R1500. But with the addition of toll roads, which leads to an additional cost of R240, my budget now needs to be increased to R1740. With petrol prices set to increase and the deteriorating South African rand, this amount guaranteed to increase over time.

If I were to take the Gautrain, and taking into account the average number of days I’m in the Pretoria office during a month, the cost for the train trip would be R1272, the cost for parking would be R180 and the cost of the bus would be R132, totaling R1584. If I take into account that I’m only going to be travelling one tenth of my normal monthly road travel by personal vehicle, it means that my petrol to get to work will be approximately R150. Which makes the total cost of my Gautrain trip R1734.

So we’ve established that the Gautrain combined trip is marginally cheaper, and may be more cost effective in the future, but what about the paramount reason for using public transport – mitigation of climate change? Does this choice of transport reduce my own person carbon footprint?

I was particularly concerned because when I started my last leg to work, the bus ride, there were only five people on the bus for the entire length of the journey. Is my personal footprint really going to be reduced if I’m using underutilized public transport? So I decided it was necessary to do the calculation and see for myself.

To start with, what is the amount of carbon dioxide emitted if I travel in my personal vehicle to work? For one journey to work, this is a 62km road trip. Using the guidelines for mobile emissions published by Defra, a medium passenger vehicle emits approximately 186.8 g of CO2 per km traveled. Therefore, for my 62km journey, I’m looking at emitting an astronomical 11.58 kg of carbon dioxide, or 0.01158 tons of carbon dioxide.

Now let’s compare that to the combined Gautrain journey to work. To start the journey from home, I have to travel 8.3km by personal vehicle to get to the Rhodesfield Gautrain station. That means I’m going to emit approximately 1.55 kg of carbon dioxide, or 0.00155 tons of carbon dioxide.  

The next leg of the journey is the train trip from Rhodesfield station to Hatfield station. This represents 27 minutes of travel time. The Gautrain, on a typical four car train, has twelve 200 kWatt motors. That’s a combine value of 2400 kWatts or 2.4 MWatts. Taking into account the 27 minutes, or 0.45 hours, travel time, the MWatt hours are 1.08 MWatt hours.

The Gautrain is running off coal powered electricity. The EPA approximates that a coal fired power station emits 1.020129 tons of carbon dioxide per MWatt hour. This means that the total amount of carbon dioxide emitted from my train leg would be 1.101739 tons of carbon dioxide. Now that’s for the entire train. A four car train can hold 321 seated passengers and 138 standing passengers. When I got on the train this morning, it was reasonably full at about 75% seated occupation. This is approximately 241 passengers. The emissions need to be divided by the number of passengers, to get the emissions per passenger. This works out to be 0.00457 tons of carbon dioxide per passenger.

The last leg of my journey is the bus trip. Using the same Defra guidelines, the emissions from a bus are approximately 1034.61 g of carbon dioxide per km traveled. Therefore, for my 7 km bus trip, the total emissions would have been 8.587 kg of carbon dioxide, or 0.008587 tons of carbon dioxide.  Again, this has to be divided by the number of passengers using the bus. There were only five people on the bus, so the total amount of emission per passenger was 0.001717 tons of carbon dioxide.

The total carbon dioxide emission for the entire journey is then 0.007844 tons of carbon dioxide. This is only 67% of the carbon emissions that I would have emitted if I had traveled by my own personal vehicle.

Carbon dioxide emission (tons) depending on passenger composition 

So as far as carbon dioxide emissions go, public transport is a sure winner. The biggest draw back was the amount of time it took to get to my office, versus the normal 45 minute journey if I took my own personal vehicle. It took almost two hours for me to get to my office. The major cause of the long journey time was waiting for the next train (I managed to just miss my connecting train) and waiting for the bus to leave. This probably added at least half an hour onto my journey time. The other additional leg of my journey is the 20 minute 1.5 km walk from the bus stop to my office. Now I can probably do with the walk, but it certainly adds to the length of the journey, and doesn't do much to help my body odour during the course of the day.

So to make the Gautrain journey to work feasible, I need to learn to be productive on the train and on the bus. But it's not too difficult because after all, I wrote this blog while sitting on the train. There are plenty of activities that I would normally have done at the office, which I could do during the journey, such as reading papers, writing out ideas, or checking emails. But it will require a mind set change to be really successful at this.

So for the time being, while my ire about the etolls is still a strong motivation, I will continue to use the Gautrain and make a concerted effort to adapt to a new way of traveling. 

Friday, 15 November 2013

Technical Post: Rasterizing line data from a shape file - but conserving the sum of the field of interest

Hi there,

It's been a while since I've posted. I'm deep in the data analysis and modelling mode now. I've been trying to convert information I have on vehicles which is represented as lines in a shape file to raster data, and had a major success today. I thought I would share this since I could fine no reference to the problem elsewhere. I'm using R to do my GIS manipulations, particularly the packages rgeos, raster and ncdf.

So the problem is: I have line data in a shape file, but I want it as raster data (i.e. represented in pixels). I'm interested in a particular field (in my case the number of vehicles per hour on a road). This data is represented as a line, which has a field for number of vehicles on that road in an hour. There are functions in R which can rasterize data.

So for examples I could use:

rasterize(VehAMPKhrpoll_trans, q, 'Vkmhr', fun = "sum")

where VehAMPKhrpoll is my shape file which I've imported into R and reprojected.

But I found with this function that the sum over all the lines for the field of interest does not equal the sum for this field over all the pixels. This was a major problem for me. The model for the vehicle numbers already has errors, and I didn't want to compound those errors by incorrectly rasterizing my data.

So after a few days of hacking away at R code I managed to come up with this function which uses the extent from a NETCDF file which has the extent I require, and then rasterizes the line data from my shape file, by cropping the lines within a particular pixel, then extracting the lengths of each line in the pixel, then dividing the length of that line by the original length of the line to get the proportion of the line inside the pixel, and then multiplying the field for that line (i.e. number of vehicles) by the proportion, so that the value of the field for that line inside the pixel is proportional to how much of the line is in the pixel. And in this way, if I sum over all my pixels, I obtain the sum over all my lines for that particular field of interest.

The R function I used is:
## filename1 = file name and/or location of NETCDF file
## filename2 = file name and/or location of shape file (just the first part of the file names)
## field = The field of interest in the shape file
## nrow = number of rows in the raster file
## ncol = number of columns in the raster file

rasterize_sum <-function(filename1, filename2, field, nrow, ncol) {
### Function to obtain the sum of lines in a pixel for a particular field, but where the proportional value 
### of the field that is allocated to a line inside the pixel is based on the length of the line which falls
### into the pixel.
  #Read in the NETCDF file
  ncid_old <- open.ncdf( filename1, write=FALSE )
  #Read in the shape file
  VehAMPKhrpoll <- readOGR(dsn = getwd(), layer = filename2)
  VehAMPKhrpoll_trans <- spTransform(VehAMPKhrpoll, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"))
  #Extract the lon and lat from the NETCDF file
  lon <- ncid_old$dim$lon$vals
  lat <- ncid_old$dim$lat$vals
  ## This part is here because the coordinates in a NETCDF file gives the centre value of each pixel
  lon_min <- min(lon)-(lon[2]-lon[1])*2^(-1)
  lon_max <- max(lon)+(lon[2]-lon[1])*2^(-1)
  lat_min <- min(lat)-(lat[2]-lat[1])*2^(-1)
  lat_max <- max(lat)+(lat[2]-lat[1])*2^(-1)
  space_to_centre_lon <- ((lon_max - lon_min)*ncol^(-1))*2^(-1)
  space_to_centre_lat <- ((lat_max - lat_min)*nrow^(-1))*2^(-1)
  lon1 <- seq((lon_min + space_to_centre_lon),(lon_max - space_to_centre_lon), length = ncol)
  lat1 <- seq((lat_min + space_to_centre_lat),(lat_max - space_to_centre_lat), length = nrow)

  ## Getting the lengths for each line in the original shape file
  BB <- gLength(VehAMPKhrpoll_trans, byid = TRUE)

  ## A box to get the extent of the NETCDF file
  dd <- cbind(x=c(lon_min, lon_min, lon_max, lon_max, lon_min), y=c(lat_min, lat_max, lat_max, lat_min, lat_min)) 
  #Initialising the raster file 
  Vkmhr_RASTER <- raster(ncol = ncol, nrow = nrow)
  extent(Vkmhr_RASTER) <- extent(dd)

  # a loop to go through each pixel in the raster file
  for (i in 1:length(lon1)) {
    for (j in 1:length(lat1)) {
      # Obtaining the extent for a pixel
      xmin <- lon1[i] - (lon1[2]-lon1[1])/2 ;  xmax <- lon1[i] + (lon1[2]-lon1[1])/2; ymin <- lat1[j]- (lat1[2]-lat1[1])/2; ymax <- lat1[j] + (lat1[2]-lat1[1])/2 
      aa <- cbind(x=c(xmin, xmin, xmax, xmax, xmin), y=c(ymin, ymax, ymax, ymin, ymin)) 
      # cropping the SpatialLinesDataFrame by the current pixel
      VEH1 <- crop(VehAMPKhrpoll_trans, extent(aa))
      # Initialising a vector which will contain the proportions of each line in the pixel
      CC <- NULL
      # Initialising a vector which will contain the proportional values of the field of interest
      VEHprop <- NULL
      #Filling the raster file, from bottom row to top row (lowest lats first)
      #Row 1 of the raster file will be for the maximum latitude value
      if (is.null(VEH1)) {
        values(Vkmhr_RASTER)[(nrow-j)*ncol + i] <- NA 
      } else {
        AA <- gLength(VEH1, byid = TRUE)
        for (k in 1:length(names(AA))) {
          CC[k] <- AA[names(AA)==names(AA)[k]]/BB[names(BB)==names(AA)[k]]
          ##BAD BAD hard coded the field value here, need it to be input in function
          VEHprop[k] <- VEH1@data[,field][k]*CC[k]
        values(Vkmhr_RASTER)[(nrow-j)*ncol + i] <- sum(VEHprop, na.rm = TRUE)

filename1 <- ""  ## The NETCDF file
filename2 <- "VehAMPKhrpoll"   ## The shape file

Vkmhr_RASTER1 <- rasterize_sum(filename1, filename2, "Vkmhr", 25, 25) ## Be sure to put the field name
                                                                      ## in inverted commas

Image of the rasterized shape file lines data for vehicle kilometre numbers on Western Cape roads

Hopefully this helps someone out there!

Thursday, 26 September 2013

IPCC to Issue Fifth Assessment Report Friday 27th of Sep 2013

Today the IPCC will be issuing the Fifth Assessment Report on climate change. There promises to be some ominous warnings on the threat of our continued addiction to fossil fuels, but they are going to provide suggestions on the best possible ways to mitigate climate change. There are fears though that the climate deniers are going to put up every effort to dilute the seriousness of the findings and to do their usual doubt casting. Guardian Report

The report will be publicly available at

Neville Smith explains the process behind the report:

Monday, 23 September 2013

On the Side: Please help Save the Rhino

It goes without saying that the natural environment plays an important role in mitigating climate change and keeping Earth as the hospitable place that it is for us humans. It's hard to know for sure the effect that one species can have on the environment, but the rapid removal of one species from the ecosystem can have complex and unpredictable effects on its ecosystem counterparts.

The Rhino (Ceratotherium species) is a keystone species - this means that the activities of the Rhino and the mere presence of the Rhino in the ecosystem is important for the survival of other species and the balance of the ecosystem. If the Rhino is suddenly removed, it will have far reaching consequences for other species in the environment, and could disrupt the delicate balance of the ecosystems where these animals exist. As an example, White Rhinos are responsible for maintaining "grazing lawns", which is beneficial to grazers like wildebees, zebra, and impala. and therefore they also assist in maintaining the balance between trees and grasses which is what makes a savannah a savannah. 

This year alone, 688 Rhinos have already been poached in South Africa. Please lend your voice, or just a tweet, to WWF South Africa, to support their initiative to raise one million tweets in support of saving the Rhino in South Africa. Tweet #iam4rhinos!