Monday, November 20, 2017

QGIS Heatmap Using Kernel Density Estimation Explained

Heatmap is a nice visualization method to display event density or occurrence. Heatmap is also used in clustering points where more points in an area will have higher value compare to less point in the same area. Therefore, with a heatmap we can see a concentration of event's occurrence. For example criminal events in a city can be visualized in a heatmap, so one knows how the crime events spread around the city. Such map is commonly called a crime map.

To create a heatmap is quite easy and straight forward. There are many software and tools that can be  used to produce a heatmap like QGIS, ArcGIS, Crimestat, Google table fusion, etc. We just need to upload point dataset, setting some parameters and the result will come up. But the purpose of this post is not just to produce a heatmap with some clicks. But more than that to explore and trying to get more understanding what is going on behind the screen for data and parameters that have been specified. In another term, how the algorithm  is working in producing a heatmap.

QGIS is an open source GIS software that can be used to produce a heatmap from a set of data point with Heatmap Plugin. The plugin is using Kernel Density Estimation algorithm for creating a heatmap. Because of that I will discuss how this algorithm(Kernel Density Estimation) is applied to process an input point dataset into a heatmap.

Creating A Heatmap in QGIS

To get an insight for what we will discuss, firstly let's create a heatmap of theft from vehicle incident across Vancouver, Canada.

1. Download crime data of Vancouver city 2016 from Vancouver city data catalogue. For convenient I provided the data here. (Actually the data consist of several crime events such as mischief, theft from vehicle, break and enter residential and other theft). I just selected theft from vehicle crime.
2.  Add the data into QGIS map window as in figure 1.

Theft from vehicle accident in Vancouver city 2016
Figure 1. Theft from vehicle accident in Vancouver city 2016
3. Select Heatmap plugin. It can be found in menu Raster >> Heatmap >> Heatmap...(See figure 2)

QGIS Heatmap plugin
Figure 2. Heatmap plugin
4. The heatmap plugin window will appear as in figure 3. Make sure the Input point layer is the theft_fr_vehicle. Give a name for Output raster. I named it theft_fr_vehicle_heatmap. Then set Radius 500. Cliked Advanced option, set Cell size X 20 and Cell size Y 20. Leave the Kernel shape as Quartic(biweight) and the other parameter as default.

QGIS Heatmap Plugin window
Figure 3. Heatmap Plugin window
5. After everything is done. Push OK button and the heatmap will appear as in figure 4.

Theft from vehicle hetmap
Figure 5. Theft from vehicle heatmap
6. Hmm...it looks ugly, doesn't it? So let's make it more colorful. Right click the heatmap layer. In Layer Properties window as in figure 6, choose Style. Change Render type to Singleband pseudo color. Then choose your favorite color in the Color option.

Change heatmap color style
Figure 6. Change heatmap color style
 8. Next goes to figure 7. Still in Layer Properties, here we change the blending mode. Choose Darken (You can experiment with other blending option). Click apply to see the result immediately in the thumbnail preview. Finally push OK button when done. The result of heatmap with the new style can be seen in the figure 8. 

Change Heatmap color blend
Figure 7. Change Heatmap color blend
Theft from vehicle heatmap with singleband pseudo color
Figure 8. Theft from vehicle heatmap with singleband pseudocolor
It's done. The theft from vehicle heatmap of Vancouver city shows us how the incidents are spread in the city. In the north of the city can be seen the color is darker which indicates high density value. So take care when you park a car there :).

Kernel Density Estimation Algorithm

As I mentioned earlier. The heatmap was created with Kernel Density Estimation algorithm. Now let's explore how this algorithm is working, so we can tune related parameters to get a more meaningful heatmap cause we understand how the result comes up.

Kernel Shape

Estimation is predicting an unknown value at a location from reference points. Actually in predicting the unknown value we interpolate the value from known points value. Therefore we also called KDE as Kernel Density Interpolation. In estimation a point value, KDE uses a Probability Density Function (PDF). What is PDF and how could it be used to estimate an unknown value? To answer this question, I will give an illustration. Assume there is a road segment and I want to know how long it is. So I measure it with a tape, first measurement (forward)  I got 100.251 m.Then backward I got 100.310 m? Hmm...why they are different. I am curious and take the third measurement, got the figure 100.279 m. How could this happen? The answer is Error. In every measurement there is an error. Nobody knows a real measurement value except God. But what could I do to get a value closest to the real value? Take more measurement!!! More measurement, more number of  values that close the real value, thus it has more probability to get the real value which is my expected value. But I don't want to waste my time measuring the road 1000 times.......

The same problem already happened hundred years ago. In the 17th century, Galileo made an astronomical observation and noted that the errors from the observation were symmetric and small errors occurred more frequently than large errors. This led to several hypothesized of distribution errors. Later in 1809 Gauss developed the distribution normal function and showed that errors were fitted well by this distribution. (Read the history here). The function of normal distribution, which is also called Gauss distribution can be seen in equation 1. If this function were plotted with some defined parameters, we can see how it looks like in figure 9.

Distribution Normal Equation
Equation 1. Distribution Normal function
Figure 9. Normal distribution curve
The normal distribution curve looks like a bell, and simply in KDE we call this as Kernel Shape. If we look at heatmap plugin, there are some Kernel shapes available, there are: Quartic, Triangular, Uniform and Epanechnikov.  But there are more kernel shapes available like Cosine, Gaussian, Tricube, etc. Some of these shapes can be seen in figure 10.
Kernel Shapes
Figure 10. Kernel Shapes (Source: https://en.wikipedia.org/)
Next question. How those shapes are used in estimation of unknown point? As we already knew, the density value is defined by a Kernel shape with a function . To apply it for a heatmap, the function must be transformed into a "geospatial sense" form, because it will be used to predict an unknown value at a point from a point with known value. Here the distance between known to unknown point is a parameter which must be involved in the function. But where we will put it in the function? For example, in the normal distribution function in equation 1, we can see there is (𝚡-𝜇) term. This term is the distance between a value (x) with the expected value (𝜇). This term can be translated as the distance between known to unknown point (d). Then 𝜎 is standard deviation. Standard deviation in statistics measures variation of dataset values. Actually it is the average distance of x to 𝜇. So this term in KDE is translated into bandwidth (h). For Gaussian, h remains as standard deviation but for other kernels h is radius. Hence the equation of KDE with Gaussian Kernel shape has the form as in equation 2, with the visual illustration can be seen in figure 11. If we standardize the h parameter into 1 (100%), the equation 2 can be simplified into equation 3.
Gaussian KDE function
Equation 2. Gaussian KDE function
Equation 3. Standardize Gaussian KDE Function
Gaussian Kernel Shape
Figure 11. Gaussian Kernel Shape
From figure 11 we can observe that the density value of unknown point will decrease smoothly following the Gaussian Probability Density Function (PDF). It means the unknown value will deviate largely as it moves away from a known point. Other Kernel Shape functions are summarized in the following table.

KernelFunction Shape
Gaussian


Quartic

Epanechnikov

Triangular

Triweight

Uniform

 

How to Choose A "Right" Kernel Shape

This is the hardest part in this post. How to choose a right kernel shape? Honestly I don't know the exact answer. But after some research I found a hint in determining a kernel shape. Based on Crimestat III workbook, there are two questions to answer in deciding a kernel shape. First. How large an area will have the effect from an incidents. Second. How much of the effect will remain at original location and how much will be dispersed throughout the bandwidth interval. For example if you think the effect of accident will decrease in linear pattern from an original location, then Triangular Kernel would be a good option. Otherwise if the accident will have the same effect throughout a specified radius, the Uniform kernel would be a good option. So understanding the two questions above for a case will give a better decision in choosing a kernel shape and related parameters.

The following table gives kernel shape option for some types of crime. Although it is for crime mapping, hopefully it could give the idea in defining a kernel shape for other cases. The table was taken from Crimestat III workbook.

How to choose a kernel shape for estimation

I hope this post could give you an understanding what is a Kernel Density Estimation and its application in creating a heatmap. In the next post I will try to give a numerical example how this algorithm is used to estimate an unknown value from a reference point. So you will get a complete understanding from theory and its implementation.

Wednesday, October 18, 2017

How to Clean Topology Error in QGIS

In the previous tutorial, had been discussed how to check topology error in QGIS with Topology Checker plugin. Some topology errors such as dangle (overshoot, undershoot), overlap and gaps are successfully detected using the topology checker plugin with defining rules in the topology rule settings. There is one question still remain, how to clean the topology error?

The best way to clean topology error is by manual editing, but it's tiring and time consuming. Fortunately, there are some tools that can be used to clean topology error in QGIS such as geometry checker and Grass v.clean tool. But must be kept in mind, using the tools not always working perfectly for any cases, but at least it could save time and speed up the process in cleaning topology error.

Clean Topology Error in Polyline Dataset

Firstly in this tutorial let's see how to clean topology error for dangles (undershoot and overshoot). To clean this error we use v.clean tool. The tool can be found in the toolbox. If the toolbox menu does not show up. At the top menu click Processing >> Toolbox as in figure 1.

QGIS Toolbox Menu
Figure 1. Toolbox Menu
Then the processing toolbox will appear. Type clean, the v.clean tool will appear under GRASS commands tool as in figure 2. Actually there are two tools appear v.clean and v.clean.advanced. What is the difference? The difference is: v.clean allow us to do one process in one time, while v.clean.advanced allow us to do more.
Finding v.clean tool from Processing Toolbox
Figure 2. Finding v.clean tool from Processing Toolbox
In cleaning dangle errors, we use v.clean.advanced tool, because  we will apply several steps, such as break, snap and  remove dangle. Follow these steps to clean the error.

1. Open v.clean.advanced. The window of v.clean.advanced will appear as in figure 3.
2. Select Road layer for Layer to clean.
3. Then we apply three cleaning process: break, snap and remove dangle. For that, type below Cleaning tools: break,snap,rmdangle.
4. In the Threshold, fill with 0,5,5. Those are the threshold for each process. In this case I defined 0 for break. For break it is just a dummy value, because it does not need a threshold value. For snap and remove dangle the threshold value is 5. Threshold means the process will take into account if it is equal or less than the specified threshold value. Otherwise it will ignore it. That's why specifying an appropriate threshold value is very important. The threshold unit is in map unit.
5. The other parameters just leave as default. But if you want to explore more please free to change them and observe the result.
6. Push the Run button to start cleaning process.

v.clean.advanced tool window
Figure 3. v.clean.advanced tool window
Now let's see the result. I captured some parts of the result and compare them before and after the cleaning to see the difference. In figure 4, can be seen there is a gap between two road segments that can be considered as undershoot error. The error was fixed as in left picture. The two segments were joined together.
Road gap (undershoot) and the clean result
Figure 4. Road gap (undershoot) and the clean result
In figure 5 can be seen also undershoot error where a road segment does not connect to another road segment. The error was fixed as in the right picture. Both the error in figure 4 and 5 were fixed with snap process.
Undershoot and the cleaning result
Figure 5. Undershoot error and the cleaning result


Figure 6 shows the overshoot error, where a segment road passes another segment with a length 5 meter or less (because we set the threshold 5, remember :)). The result of cleaning is shown in the right picture, which the overpass segment was cut with break process then the remain part was removed with remove dangle. 

Overshoot error and the cleaning result
Figure 6. Overshoot error and the cleaning result

Clean Topology Error in Polygon Data set

We have seen how to clean topology error for road data. What about topology error in polygon dataset such as overlap, gap and sliver polygon. How to clean these error?  To clean the error we also use the v.clean.advanced tool. For this tutorial I used a building dataset with overlap and gap error as in figure7.

Building dataset with topology error
Figure 7. Building dataset with topology error
 Let's clean the errors with the following steps.
1. Open v.clean.advanced.tool.
2. To clean the error, there are three processes are needed to be done: break polygon, remove duplicate and remove area. Therefore, under Cleaning tools type: bpol, rmdupl,rmarea. See figure 8.
3. In the third step we must define threshold value. For bpol dan rmdupl we can just give a dummy value 0, but  for remove area, I gave it 3. It means an area less or equal with 3 (map unit square) will be removed. Because of that it's very important to know about polygon area in the data. To do this we can make a field containing all polygon area. Sort the field and observe small area to help us in defining threshold value.
4. Leave the other parameter as default. You could change them, and observe the result.
5. Click Run button to start cleaning.

v.clean.advanced tool to clean polygon topology error
Figure 8. v.clean.advanced tool to clean polygon topology error
The result can be seen in figure 9. The errors were removed. Checked it with topology error again, no errors were found.

Building topology error cleaned
Figure 9. Building topology error cleaned

Clean Topology Error with Geometry Checker Plugin

Last part of this tutorial we will use Geometry Checker plugin to clean topology error. I would say, this tool more intuitive than v.clean and also can be used to detect geometry validity. Moreover we can use this tool to detect sliver polygon. To use Geometry Checker Plugin. Ensure to enable this plugin in the Plugin window as in figure 10.

Figure 10. Geometry Checker Plugin
Then the Geometry Checker plugin can be accessed via menu Vector >> Geometry Tools >> Check Geometries. See figure 11.

Figure 11. Check Geometries Tool
Using the same building dataset, I cleaned the topology error with Geometry checker plugin with the following steps:
1. Open Check Geometries tool. The window will appear as in figure 12.
2. In the window can be seen there are a lot of processes could be done such as to check geometry validity, properties, condition and of course topology.
3. We just deal with topology. So check No sliver polygons to not allow sliver polygon exist with maximum area 1 in map unit square.
4. Under topology checks part. Select Check for overlaps smaller than (map unit sqr) and Check for gaps smaller than (map unit sqr). I gave the threshold value for both 5. It means any area with overlap or gaps smaller than 5 square meter will be marked as error.
5. Save the result as a new file in create a new layer option.
6. Click Run to start processing.   
Check Geometries window
Figure 12. Check Geometries window


7. The result can be seen in the figure 13. We can see there are 3 errors were found, 2 overlaps and a gap. No sliver polygon error were found. Click the error and the error position will be marked with circular red marker on the map.

Check geometries topology error
Figure 13. Check geometries topology error
8. Before cleaning the error, we can set how the error will be fixed with Error resolution settings. Click the button and the window shown up as in figure 14. In the respective window we can see and select how the error will be fixed. Choose what appropriate as your purpose.
Figure 14. Set Error Resolutions window
9. To fix errors, select an error or all errors at once and then click button Fix selected errors using default resolution. A fixed error will be highlighted in green color otherwise in red if it can not be fixed. In this case all errors can be fixed. See figure 15.
Topology errors are fixed
Figure 15. Topology errors are fixed
 10. Now let's see how the result look like. It can be seen in figure 16. If you observe the result is different with the result of topology cleaning with v.clean tool. The difference was found in the overlap feature. 
Different result of topology cleaning using v.clean and Check Geometries
Figure 16. Different result of topology cleaning using v.clean and Check Geometries
Overall, this tutorial has shown that topology errors can be cleaned using v.clean and Check Geometries tool in QGIS. The result from each tool could be different. And as mentioned earlier, using the tools not always working perfectly for any cases, but at least it could save time and speed up the process in cleaning topology error.  

Sunday, October 1, 2017

How to Check Topology Error in QGIS

Topology Error

Topology defines mathematical relationship between features in geographic region. It describes a spatial connection between objects like overlap, touches, cross, intersect, within, etc. Topology is very important to check a data free from errors. There are some errors that can be found in a dataset such as overlap, gap, sliver polygon, undershoot and overshoot.

Overlap
Overlap is a topology error where a part or whole part of a feature occupies the same position with another feature. This become an error, because it is impossible for a feature to have same position with other. For example a parcel, it is not possible some part of it, is over another parcel.

Gap
Gap is a topology error which is opposite to overlap, where two adjacent features which share a common boundary, contains a blank area, which is no data. It is impossible for a parcel again which is neighbor with another parcel has no data area (blank area) around their common boundary.

Sliver polygon
Another topology error which can be found in polygon feature is sliver polygon. Sliver polygon happen where there is very small polygon along the common boundary. Or we could say when a gap is filled with polygon, then it becomes sliver polygon with a very small area. In figure 1 can be seen the illustration of these topology errors.

Overlap, gap and sliver polygon topology error
Figure 1. Overlap, gap and sliver polygon topology error
Additional topology error is undershoot and overshoot which are grouped as dangle.

Undershoot
Undershoot is a topology error that can be found in line dataset which is a line is not touching another line segment. For example a road which is represented by a line must be connected with other line at road intersection.

Overshoot
Opposites to undershoot, overshoot is a topology error when a line touch another line but it passes the line with a very short length. So when we zooming in to the intersection, the line is not stop at the boundary, but it passes it a little bit. In figure 2 can be seen the illustration of overshoot and undershoot topology error.
Overshoot and Undershoot topology error
Figure 2. Overshoot and Undershoot topology error

QGIS Topology Checker Plugin

Checking hundreds or even thousands features with topological error is tiring and time consuming. Because of that, we need a tool that can help us to check topology error. Topology Checker is a plugin in QGIS that can be used to check topology error. To use this plugin, make sure the topology checker plugin already installed. To install a plugin, click menu Plugins >> Manage and install plugins.. Then plugins window will appear as in figure 3. Type topology in search menu and topology checker plugin will be shown. Check the topology checker plugin and then push Install Plugin button.

Installing Topology Checker Plugin
Figure 3. Installing Topology Checker Plugin

 If the plugin already installed, then lets see how to check topology error.

Checking Undershoot and Overshoot (Dangle) Topology Error

Firstly we will check topology error for line feature. For this case I used road network dataset.
1. Add the road data to QGIS.
2. Open topology checker. You can use the topology checker's icon or access it from menu vector as in figure 4.

Topology Checker Menu
Figure 4. Topology Checker Menu
3. Topology checker panel will appear. Clik configure button as in figure 5. Topoloy Rule Settings window will appear.

Topology Checker Panel
Figure 5. Topology Checker Panel
4. Now come to very important step, defining rules. For identify undeshoot and overshoot error, the rule which applied is must not have dangles. Select Road layer then select the rule in the drop down list next to it. When finish select Add Rule Button. The layer and selected rule will be added to the list as in figure 6. Then click OK button.

Topology Rule Settings
Figure 6. Topology Rule Settings
5. After defining rule, in topology checker panel click button Validate All or Validate Extent. Validate All will check the topology error for the defined rule all over the data. On the other hand Validate Extent will check error in the extent of map view. If you have a big number of features (thousands) it will be better to validate in extent to speed up the process. The error will be shown in topology checker panel list and also will be marked on the map view with red marker as in figure 7.

Topology errors are marked with red marker
Figure 7. Topology errors are marked with red marker
6. Zoom in to the area with red marker to see the error clearly. You will see the error as in figure 8.

Dangle Error (Undershoot)
Figure 8. Dangle Error (Undershoot)

Checking Polygon Topology Error

We have checked the error for line feature. Now lets move to polygon feature. We will check for topology error such as overlap and gap.
1. Add polygon layer to QGIS. For this tutorial I used building polygon.
2. Click Topology Checker plugin. In the panel click Configure. The Topology Rule Settings window will appear as in figure 6. Choose the layer and then select rule must not overlap and must not have gaps. We have to add each rule one by one. See figure 9.
Topology Rule Settings for not overlap and gaps
Figure 9. Topology Rule Settings for not overlap and gaps
 
3. Click Validate, and then the error will appear and marked with red color as in figure 10.

Topology errors overlap and gaps
Figure 10. Topology errors overlap and gaps are marked with red color

Spatial Integrity

Last part of this tutorial we will talk about spatial integrity. What is spatial integrity? Spatial integrity is to make sure a data complies with the real condition as it is in the real world regarding with another data or features. To implement this spatial integrity we set the rule such as must contain, within, must not overlap with, must intersect with, etc.  For example in the real world, a road can not be overlapped with building. If there is a road which is overlapped with building, it can not be accepted because it is not comply with the real condition.

For this tutorial I used two data, road network and building with the rule must not overlap with. For this case both layers must be in polygon.
1. Add Road network polygon and Building to QGIS.
2. In Topology Rule Settings window, select Road for the first layer, then select rule must not overlap with and select Building for the second layer. See figure 11.
Topology Rule Settings for must not overlap with
Figure 11.Topology Rule Settings for must not overlap with
3. Click Validate. The error which is Road overlap with Building will be marked in red color as in figure 12.

Figure 12. Road which is overlapped with buildings are marked with red color
4. Zoom in to the red area in the map window to see the error clearly as in figure 13.
Figure 13. Zoom in to the overlap area
That's all the way how to check topology error in QGIS. Then how we can fix the error? The best way to fix the error is by manual editing, but it is not smart enough, tiring and time consuming. We need a better way to fix the topology error. I will show you in this tutorial.