ArcGIS Pro 2.1 brings a blizzard of new features

The latest update for ArcGIS Pro, Esri’s flagship desktop application, has arrived. With so many new features, even the most observant eye might miss some of the exciting advances made in this most recent update. This post takes you through my selection of the new tools and features in ArcGIS Pro 2.1.

Density analysis by another name

I was recently writing a geoprocessing model to calculate the density of a point feature class from which all the areas above a specific threshold could be selected. I had been thinking about writing this for a while and had in mind the process I would use, but I soon discovered that I didn’t understand how the Density tools in Spatial Analyst work and would need to find an alternative.

The Density toolset in the Esri Spatial Analyst extension contains three tools: Kernel Density, Line Density and Point Density. I had thought that I would run the Point Density tool and then use the Raster Calculator or even the Contour tool (which would have taken me straight to vector format) to select out the areas above my threshold.

But I hadn’t taken into account the method by which the Point (and Line) Density tool calculates the output cell values. The ArcGIS Desktop 10 help says:

By calculating density, you are in a sense spreading the values (of the input) out over a surface. The magnitude at each sample location (line or point) is distributed throughout the study area, and a density value is calculated for each cell in the output raster.

It was the last part that I hadn’t thought about: a ‘density value’ is calculated for each cell in the output raster. What unit would the density value be in?

Let me give an example. You are analysing population density and want to identify all the areas where the density is greater than 500 people per square kilometre. You open the Point Density tool, choose the neighbourhood, and set the units to square kilometres.

You’ve set the units to square kilometres so the values of the cells in the output raster are ‘number of people per square kilometre’. Right? Well, sort of.

The Point Density tool totals the number of points that fall within a neighbourhood, applies your population weighting if you have chosen one, and then divides this total by the area of the neighbourhood. It then applies a scaling factor according to the area units you selected.

Take, for example, a farmhouse with a population of 4 and no other houses nearby. The Point Density tool totals the number of points within the neighbourhood (1 farmhouse), weights it by the population field (4 people) and divides it by the area of the neighbourhood (here a circle of 250 m radius, or 196,349.5 m²). As the units were set to square kilometres, the resulting figure (0.00002037) is multiplied by 1,000,000 (the number of square metres in a square kilometre), giving a cell value of 20.37. But what does that mean?
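To make the arithmetic explicit, here is the same calculation as a few lines of Python (the population of 4 and the 250 m radius are just the figures from the farmhouse example):

```python
import math

# Figures from the farmhouse example above
population = 4        # sum of the population weight field for points in the neighbourhood
radius_m = 250.0      # radius of the circular neighbourhood, in metres

area_m2 = math.pi * radius_m ** 2        # ~196,349.5 square metres
density_per_m2 = population / area_m2    # ~0.00002037 people per square metre

# Scaling applied because the area units were set to square kilometres
density_per_km2 = density_per_m2 * 1000000
print(round(density_per_km2, 2))         # 20.37
```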

My head says that logically there are 4 people living in the area, but my density raster gives me a value of 20.37. Now apply this to a city, or a country: how do I select out areas above my threshold of 500 people per square kilometre? This was especially confusing as I wasn’t modelling population; I was modelling energy use. I wanted to identify areas where the demand for energy was high, and the output units were simply not what I was expecting.

So I went back to the drawing board, or in this case the Desktop online help. I eventually came across the Neighbourhood toolset, containing six tools which I had never used before: Block Statistics, Filter, Focal Flow, Focal Statistics, Line Statistics and Point Statistics.

It was the last one that caught my eye. The help says that the Point Statistics tool calculates statistics on point features that fall in the neighbourhood around each output raster cell. The statistics available include mean, majority, maximum, minimum, standard deviation and most importantly for me, sum. What if I summed the energy demand in a neighbourhood? If I know the area of the neighbourhood is one square kilometre, then I know the output cell values are ‘energy demand per square kilometre’.
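As a rough illustration (not the model from this post), this is the kind of call involved: a Point Statistics run with the SUM statistic over a circular neighbourhood whose area is about one square kilometre. The paths, field name and cell size below are hypothetical.

```python
import arcpy
from arcpy.sa import NbrCircle, PointStatistics

arcpy.CheckOutExtension("Spatial")

# Hypothetical inputs: a point feature class with an energy demand attribute
points = r"C:\data\energy.gdb\energy_points"
demand_field = "DEMAND_KWH"

# A circular neighbourhood defined in map units; a radius of ~564 m gives
# an area of roughly one square kilometre
neighbourhood = NbrCircle(564, "MAP")

# SUM means each output cell holds the total demand within its neighbourhood,
# i.e. 'energy demand per square kilometre'
demand_sum = PointStatistics(points, demand_field, 100, neighbourhood, "SUM")
demand_sum.save(r"C:\data\energy.gdb\demand_sum")
```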

I soon realised that this was what I wanted. I was using the energy data to find locations where local demand was high enough to support a Combined Heat and Power (CHP) plant. CHP plants create electricity from fuel and circulate the heat produced through a network of pipes to provide hot water for radiators and taps. To make the most of this efficient process there has to be sufficient local demand for hot water, preferably as close as possible to the source.

From here on in it was easy. I decided to use the Reclassify tool to classify areas above the energy demand threshold as 1 and areas below as NoData, and then the Raster to Polygon tool to convert the areas to vector. This gave me polygons within which the density (or sum!) of energy demand met my threshold and would therefore support a CHP plant.
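A minimal sketch of those last two steps follows. I’ve used SetNull in place of the Reclassify step described above, because it expresses ‘below the threshold becomes NoData, everything else becomes 1’ in a single call; the paths and threshold value are placeholders.

```python
import arcpy
from arcpy.sa import SetNull

arcpy.CheckOutExtension("Spatial")

demand_sum = arcpy.Raster(r"C:\data\energy.gdb\demand_sum")  # output of the Point Statistics step
threshold = 500000                                           # hypothetical demand threshold per km2

# Cells below the threshold become NoData; cells at or above it become 1
above_threshold = SetNull(demand_sum < threshold, 1)

# Convert the remaining areas to polygons suitable for further analysis
arcpy.RasterToPolygon_conversion(above_threshold,
                                 r"C:\data\energy.gdb\chp_candidates",
                                 "NO_SIMPLIFY", "VALUE")
```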

So, in conclusion, if you are doing density analysis, take a look at the Neighbourhood toolset to see if it could help you. Although the Desktop help doesn’t describe these tools using the word ‘density’, I think there are useful parallels.

Before you start out, think carefully about what it is you want to do. I had thought that I was doing traditional density analysis, but knowing that ultimately I wanted the sum of something within a defined area might have helped me get there quicker.

Finally, don’t underestimate the ArcGIS help. After 8 years of specialising in Desktop I still use the help most weeks and always learn something new.

Debugging GP Services

So here’s the deal: you’ve spent ages perfecting a GP model, incorporating loads of different tools and adding custom touches in python. In fact, maybe you’ve even gone the whole hog and scripted your GP tool from scratch – and it works like a charm in ArcMap. Bingo.

You publish it to ArcGIS Server, pull up the ArcGIS REST Services Directory and navigate through to the GP server’s REST endpoint. You painstakingly harvest some valid JSON to define your input parameters, and patiently wade through the ‘Invalid value for parameter X’ messages. You hit ‘Submit Job’… and it goes kaput. Nothing, zilch, nada, zippo.

Except of course:

 

Where do you go now? Why does a perfectly valid GP tool break the minute you turn it into a GP Service? Well, sometimes these things happen.

First things first, I extracted the whole thing to one big Python script, and I began to pepper it with arcpy.AddMessage() calls, mainly logging vital information like, err, arcpy.AddMessage(“I am here”) and arcpy.AddMessage(“Now I’m here”). I saved the script, refreshed the model in ArcMap, saved the MXD, refreshed the map/GP services, cleared the REST cache (did a handstand and shut my eyes), and…

 

Hurrah! No, wait…

So I took a different approach. And I’m delighted to say that after a lot of huffing, puffing and flapping; a few cups of tea, the odd lunch break; maybe a quick chat about the weather with someone passing by, a few more cups of tea, a comfort break; some cheese-on-toast, a quick distraction to fix a LocatorHub web service, a stroll into town, and back again….

 

Um. Yeah. Cheese on toast never fixed anything.

So I bit the bullet and decided to recreate my script, line by line, to pin down the problem. With a deep breath, I started out with just a simple opener:

 

Failed

Soon I was pegged back to this:

 

Failed

What is going on?

My colleague Simon suggested I take a look at some log files in C:\Program Files (x86)\ArcGIS\Server10.0\server\user\log\. That was a good start. In one of them I came across the following (emphasis added):

<Msg time='2011-08-12T15:07:22' type='ERROR' code='20010' target='pipelines2.GPServerSync' methodName='GPServerSync.CheckMessages' machine='TAYLOR' process='3156' thread='6244'>Executing (GetIntersections2Model): Model1 Start Time: Fri Aug 12 15:07:22 2011 Executing (GetIntersections2): GetIntersections2 Start Time: Fri Aug 12 15:07:22 2011 ERROR 000576: Script associated with this tool does not exist. Failed to execute (GetIntersections2). Failed at Fri Aug 12 15:07:22 2011 (Elapsed Time: 0.00 seconds) Failed to execute (GetIntersections2Model). Failed at Fri Aug 12 15:07:22 2011 (Elapsed Time: 0.00 seconds)</Msg>

Then it dawned on me. While I had been carefully storing my MXDs and data in a location visible to the ArcGIS SOC process (critical for ArcGIS Server permissions), I had stuffed my Python script in a folder inside my user space. ArcGIS Server doesn’t have permission to see in there. Such a simple and obvious mistake. But such a time-waster!

Having moved the script outside my user space to somewhere more sensible, I regenerated my model, republished my MXD and, once again, cleared my REST cache. My script succeeded first time.

 

But hang on a minute. Where are my debug messages?

Remember, I had bunged a load of arcpy.AddMessage() calls into my script just to try and get some debug info. Hey – I’d even got so frustrated I used an arcpy.AddWarning() just because I like to live dangerously. But now my script was clearly executing, where was my debug info?

Well, ladies and gentlemen, here’s the crux of it. Your GP service properties window has an option in the Parameters tab, called Show Messages, but its default state is unchecked. You need to turn this on in order to see any of your custom message information (and you’ll need to do this while your service is stopped).

For security reasons, debug information is not shown by default. But provided you’re aware of the risks of exposing potentially exploitable information (e.g. file names, paths, usernames), you can turn this feature on. That way your end user sees status information, warnings and, most importantly, error messages, which could be incredibly useful to them. (After all, as an end user I would want to know that a call to a service has failed.) As a developer I can also give users progress updates via status messages, especially for tasks that take more than a few seconds to execute.
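For reference, the sort of calls that generate those messages look roughly like this inside a GP script (a minimal sketch; the input parameter and the checks are hypothetical). With Show Messages enabled, everything written this way is returned to the client in the job messages.

```python
import arcpy

in_features = arcpy.GetParameterAsText(0)      # hypothetical input parameter

arcpy.AddMessage("I am here")                  # status/progress information
arcpy.AddMessage("Processing {0}".format(in_features))

if not arcpy.Exists(in_features):
    arcpy.AddError("Input dataset not found")  # error message returned with the failed job
else:
    # ... geoprocessing steps ...
    arcpy.AddWarning("Fewer features than expected; carrying on anyway")  # non-fatal warning
```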

So in my book, the Show Messages option should be checked as long as we’re aware of the sensitivity of outputting certain types of debug/status information.

 

(Aside: it is also possible to achieve the same results by changing the ShowMessages property in your service’s CFG file, but you’ll need to restart your whole SOM before the changes will take effect.)

So after all that, I finally had my messages coming through:

 

 

In summary…

If you are having problems debugging ArcGIS Server GP Services:

  1. Make sure your GP tool works correctly in ArcMap before publishing it as a service.

  2. Make sure all resources for your service (MXDs, toolboxes, data, script files, etc.) are stored in a location that ArcGIS Server can access (i.e. outside your user space).

  3. Check your log files in C:\Program Files (x86)\ArcGIS\Server10.0\server\user\log\ for clues as to why your GP service is failing.

  4. Turn on Show Messages and use arcpy.AddMessage() to send debug information to the client.

And if all else fails, ask a colleague called Simon*

*not all colleagues called Simon may have ArcGIS Server, GP and python experience. Use of Simons is at your own risk.

ArcPy demonstrations from the 2011 UC

Earlier this week I presented a session on using the ArcPy site package to manipulate map objects. I took this approach since standard geoprocessing through Python is now quite well known. It turned out to be a timely presentation, since only yesterday Esri announced a new online course on the subject: http://training.esri.com/acb2000/showdetl.cfm?DID=6&Product_ID=1011

I also demonstrated a tool validator class, which isn’t exactly new (it was introduced at 9.3) but is little-used functionality within the model/scripting framework. I realise that, whilst the slides will be published, there wasn’t much there without the demos. This article partly redresses that by attaching the scripts I used so that anyone can download them:

https://static1.squarespace.com/static/55bb8935e4b046642e9d3fa7/55bb8e8ee4b03fcc125a74c0/55bb8e90e4b03fcc125a7933/1306326540373/ArcPydemos.zip

Note that throughout the demos I chose to mix and match arcpy and core Python (or os module) methods. In many cases these are interchangeable when dealing with files on disk.

The first demo centred on adding a feature class as a layer in a map. Much of the processing relies on creating a temporary layer file on disk, which is then added to the map. I also played with the progressor bar in this demo, although it runs fairly quickly so you have to be watching to see it change.
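The attached demo script is the real reference; the sketch below just illustrates the pattern described (make a layer, save it to a temporary .lyr file, add it to the map, and step a progressor along the way). The paths and the parameter index are made up.

```python
import os
import tempfile
import arcpy

fc = arcpy.GetParameterAsText(0)                 # feature class to add (hypothetical parameter)
mxd = arcpy.mapping.MapDocument("CURRENT")       # the map the script tool is run from
df = arcpy.mapping.ListDataFrames(mxd)[0]        # first data frame

arcpy.SetProgressor("step", "Adding layer...", 0, 3, 1)

# Make an in-memory layer from the feature class, then persist it to a temporary
# .lyr file on disk (mixing os/tempfile and arcpy methods, as in the demos)
arcpy.MakeFeatureLayer_management(fc, "temp_layer")
arcpy.SetProgressorPosition(1)

temp_lyr = os.path.join(tempfile.gettempdir(), "temp_layer.lyr")
if os.path.exists(temp_lyr):
    os.remove(temp_lyr)
arcpy.SaveToLayerFile_management("temp_layer", temp_lyr)
arcpy.SetProgressorPosition(2)

# Add the saved layer file to the top of the data frame and refresh the display
arcpy.mapping.AddLayer(df, arcpy.mapping.Layer(temp_lyr), "TOP")
arcpy.SetProgressorPosition(3)
arcpy.RefreshTOC()
arcpy.RefreshActiveView()
del mxd
```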

The second demo iterated through the layers in a map and re-pointed them from a file geodatabase to an ArcSDE geodatabase. It was, in fact, a personal ArcSDE geodatabase on my machine, but the same logic applies to an enterprise geodatabase. This showed the simplicity of looping over Python lists, and required nested loops to iterate through layers within data frames within the map. It also showed the perverse way in which group layers are handled. When I started the script I imagined that I’d iterate through a group layer if I found one, but the demo2 script shows this is not the case. The layers are presented in the order they appear in the TOC, as if all the + toggles were expanded: the group layer appears in turn, and the next layer is the first in the group. Once all the layers in the group have been accessed, the next layer is whatever comes after the group in the TOC.
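Again, the demo2 script in the download is the real thing; this is just a minimal sketch of the loop, with made-up paths, using replaceDataSource to re-point each layer at an SDE connection.

```python
import arcpy

# Hypothetical paths: the MXD to fix up and the SDE connection file to point layers at
mxd = arcpy.mapping.MapDocument(r"C:\maps\example.mxd")
sde_connection = r"C:\connections\production.sde"

for df in arcpy.mapping.ListDataFrames(mxd):
    for lyr in arcpy.mapping.ListLayers(mxd, "", df):
        # Group layers are returned in TOC order, immediately before their members,
        # but have no data source of their own, so skip them
        if lyr.isGroupLayer:
            continue
        if lyr.supports("DATASOURCE") and ".gdb" in lyr.dataSource.lower():
            # Re-point the layer at the same-named dataset in the SDE geodatabase
            lyr.replaceDataSource(sde_connection, "SDE_WORKSPACE", lyr.datasetName)

mxd.saveACopy(r"C:\maps\example_sde.mxd")
del mxd
```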

Due to time constraints I didn’t get the chance to talk about error handling. As a result of making the whole thing more “pythonic”, it is no longer enough just to output arcpy.GetMessages(2). Many of the errors are raised as Python errors (AssertionError, TypeError, etc.) and these have to be trapped using the sys module and the traceback module in order to report on them. Both demos show a rather crude example of this, which has the essential steps to get the Python error details out of a failing script.
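The essential shape of that error handling is something like the following skeleton (the CopyFeatures call is just a stand-in for the body of the script):

```python
import sys
import traceback
import arcpy

try:
    # ... the body of the geoprocessing script goes here ...
    arcpy.CopyFeatures_management(r"C:\data\input.gdb\roads", r"C:\data\output.gdb\roads")
except arcpy.ExecuteError:
    # A geoprocessing tool failed: the useful detail is in the tool messages
    arcpy.AddError(arcpy.GetMessages(2))
except Exception:
    # A plain Python error (TypeError, AssertionError, ...): use sys and traceback
    tb = sys.exc_info()[2]
    tb_info = traceback.format_tb(tb)[0]
    arcpy.AddError("Python error:\n" + tb_info + "\n" + str(sys.exc_info()[1]))
```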

The final demo is a simple tool validator class that extracts all the feature classes from a file geodatabase and uses the list to populate a value list on a string parameter. While the example shows what could be done, it is not especially useful, as the parameter has to be of type String rather than Feature Class for a value list to be applied to it, so the called script has to establish the workspace in the same way as the tool validator does. Take care writing code in a tool validator: it can be very hard to debug. Also make sure the code is in the correct method: initializeParameters, updateParameters or updateMessages. initializeParameters runs as the tool dialog opens; updateParameters runs whenever a parameter is changed by the user. Your code must determine whether the parameter being “watched” is the one that has changed; there is no distinction in the invocation of the class method.
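A cut-down sketch of that kind of validator (pasted into the tool’s Validation tab) is shown below, assuming parameter 0 is the workspace the user picks and parameter 1 is the String parameter that receives the value list; see the demo download for the full version.

```python
import arcpy

class ToolValidator(object):
    """Skeleton validator: populates a String parameter's value list with the
    feature classes found in the workspace chosen in the first parameter."""

    def __init__(self):
        self.params = arcpy.GetParameterInfo()

    def initializeParameters(self):
        # Runs once, as the tool dialog opens
        return

    def updateParameters(self):
        # Runs every time any parameter changes; we have to work out ourselves
        # whether the parameter we care about (the workspace) is the one that moved
        if self.params[0].value and self.params[0].altered:
            arcpy.env.workspace = str(self.params[0].value)       # the chosen FGDB
            self.params[1].filter.list = arcpy.ListFeatureClasses()
        return

    def updateMessages(self):
        return
```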

One of the biggest strengths of choosing Python is that it opens up a whole range of pre-built packages for performing common tasks. Yesterday I was advising on using Python modules to zip files and directories once geoprocessing has completed, and a little investigation revealed that at least three (zipfile, gzip and tarfile) ship with the standard Python 2.6 installation.
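For example, zipping an output folder with the standard zipfile module takes only a few lines (the paths here are placeholders):

```python
import os
import zipfile

def zip_directory(folder, zip_path):
    """Zip every file under 'folder' into the archive at 'zip_path'."""
    zf = zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED)
    try:
        for root, dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the folder so the archive is portable
                zf.write(full_path, os.path.relpath(full_path, folder))
    finally:
        zf.close()

zip_directory(r"C:\output\results", r"C:\output\results.zip")
```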

I hope you find the demo scripts useful, and if you attended the session I hope you enjoyed it. Thanks, Rob

Improving the performance of geoprocessing models and services

Recently one of my colleagues wanted to calculate driving routes and CO2 emissions for car journeys in the UK. He put together a prototype in ModelBuilder and published it as a geoprocessing service to ArcGIS Server so that we could run it from the web.

Although the model worked well, we noticed that it felt sluggish for a web service that was going to be accessed repeatedly, so we started looking at straightforward ways to boost the model's performance.

The first thing we did was to measure! We looked carefully at which tools in the model were running quickly and which ones weren't. To do this, we opened up the model for editing in ArcCatalog and from the "Model" menu selected "Run Entire Model". As each tool in the sequence is executed, its process box is temporarily highlighted red.

Any tool that stays red for an extended period of time is a possible bottleneck that needs further investigation. This is a great way to visualise where delays in your model are happening. For the more numerically inclined, the number of seconds taken to execute each tool (rounded down to the nearest second) is also shown in the progress box.

Although how you proceed will depend on your individual circumstances, we found two quick wins to speed up our particular model, which made it run about four times faster!

  1. We used in-memory workspaces to store temporary data
  2. We reduced the number of tools by replacing entire tool sequences with Python scripts

Let's look at these in more detail.

Typically, each tool in the model processes some data and writes its output to a feature class in the scratch workspace (a directory on your computer's hard disk). The next tool in the model reads its input from this location and the cycle repeats.

Writing to the hard disk is much slower than writing to a temporary location in the computer's memory (RAM), so consider using an in-memory workspace for passing data from one tool to the next.

In-memory workspaces have some important limitations, which are discussed in the articles below:

Intermediate data and the scratch workspace (ArcGIS 9.3, scroll down to see the section on in-memory workspaces)

Using in-memory workspace (ArcGIS 10.0)

In particular, data (e.g. feature classes) written to an in-memory workspace cannot subsequently be edited. You should also delete in-memory data when you are finished with them, otherwise they will fill up your computer's RAM until you restart your ArcGIS Desktop client (if running a Desktop model) or your server process is recycled (if running a geoprocessing service), which might not happen for hours. The "Delete" tool can be used to clear the "in_memory" workspace.
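As a minimal illustration of the pattern (with hypothetical data): any tool that takes a workspace path can write to and read from in_memory, and the Delete tool clears it when you're done.

```python
import arcpy

roads = r"C:\data\journeys.gdb\roads"          # hypothetical input data

# Write the intermediate result to the in-memory workspace rather than to the
# scratch workspace on disk
buffered = r"in_memory\roads_buffered"
arcpy.Buffer_analysis(roads, buffered, "100 Meters")

# Subsequent tools read from in_memory just like any other workspace
arcpy.CopyFeatures_management(buffered, r"C:\data\journeys.gdb\roads_buffered")

# Clean up so the data doesn't sit in RAM until the process is recycled
arcpy.Delete_management("in_memory")
```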

Secondly, our model uses a sequence of tools such as "Calculate Field" and "Join Field" to look up CO2 emissions data from a table and calculate an estimate for our car journey. However, we could do this more efficiently by replacing the tool sequence with a single Python script that doesn't spend a lot of time passing data between different tools.

The following pages have useful information and Python code samples that show you how to wire up a Python script's inputs and outputs so it can be used as a tool in a ModelBuilder model.

Setting script tool parameters using the arcgisscripting module (ArcGIS 9.3)

Setting script tool parameters using the arcpy module (ArcGIS 10.0)
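To give a flavour of the approach (this is not our actual model, and the field and parameter names are invented), the sketch below reads the lookup table once into a dictionary and calculates the estimate in a single cursor pass, rather than chaining Join Field and Calculate Field:

```python
import arcpy

# Parameters wired up in the script tool's properties (names are hypothetical):
# 0 = journeys feature class, 1 = emissions lookup table, 2 = name of the output field
journeys = arcpy.GetParameterAsText(0)
lookup_table = arcpy.GetParameterAsText(1)
out_field = arcpy.GetParameterAsText(2) or "CO2_KG"

# Read the lookup table once into a dictionary instead of using Join Field
emissions = {}
rows = arcpy.SearchCursor(lookup_table)
for row in rows:
    emissions[row.getValue("VEHICLE_TYPE")] = row.getValue("KG_PER_KM")
del rows

# Add the output field if it isn't there already
if out_field not in [f.name for f in arcpy.ListFields(journeys)]:
    arcpy.AddField_management(journeys, out_field, "DOUBLE")

# Calculate the estimate in a single pass instead of chaining Calculate Field calls
rows = arcpy.UpdateCursor(journeys)
for row in rows:
    factor = emissions.get(row.getValue("VEHICLE_TYPE"), 0)
    row.setValue(out_field, row.getValue("DIST_KM") * factor)
    rows.updateRow(row)
del rows

# Hand the modified feature class back as a derived output parameter (parameter 3)
arcpy.SetParameterAsText(3, journeys)
```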

The general process of performance tuning involves measuring the areas of your model that are slow and making them more efficient (slow actions might include reading from or writing to disk, using more tools than you need to, or using data without attribute indexes).

If you're interested in learning more, the ArcGIS Resource Center has some general performance tips for geoprocessing services.

For more detailed insights into geoprocessing on Server, there's a useful video presentation from the 2010 Esri, Inc. Developer Summit:

Building and Optimizing Geoprocessing Services in ArcGIS Server

Happy performance tuning!

 

What search distance should I use when I create a hotspot map?

When using some of the Spatial Statistics tools (such as the Getis-Ord Gi*) or when undertaking a Point Density Analysis it is sometimes hard to know what search distance to set. There are different strategies for defining the optimal distance:

  1. You could guess
  2. If you understand your data you could make an educated guess based on needing approximately 8 other features within the search distance
  3. You could let your data tell you.

Clearly 3. is the preferred option!

There has long been a really useful tool in the Spatial Statistics toolbox*, Spatial Autocorrelation (Global Moran’s I), which measures spatial autocorrelation based on feature locations and their attribute values (the data must have attribute values for this method to work). It provides you with a z-score indicating statistical significance. To determine the optimal search distance, however, you need to run this tool multiple times (say 20) over different distances, noting each z-score. Plot the z-scores in Excel (or the charting tools in ArcGIS Desktop); the distance at which the z-score peaks gives you an idea of where you should set the search distance.
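The manual loop looks something like the sketch below (the input path, field and distance bands are hypothetical). One caveat: the position of the z-score among the tool’s derived outputs can vary between releases, so check the result messages for your version.

```python
import arcpy

fc = r"C:\data\analysis.gdb\incidents"     # hypothetical input features
field = "COUNT"                            # attribute the statistic is based on

distances = range(500, 10500, 500)         # 20 distance bands, in map units
z_scores = []

for distance in distances:
    result = arcpy.SpatialAutocorrelation_stats(
        fc, field, "NO_REPORT", "FIXED_DISTANCE_BAND",
        "EUCLIDEAN_DISTANCE", "NONE", distance)
    # The z-score has been the second derived output (index 1) in the releases
    # I've used, but confirm against result.getMessages() for your version
    z_scores.append((distance, float(result.getOutput(1))))

# Paste the pairs into Excel (or chart them) and look for the peak z-score
for distance, z in z_scores:
    print("{0}\t{1}".format(distance, z))
```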

This method works really well, although running the tool multiple times can take a while, especially if your dataset is quite big. You could of course be clever and build this into a geoprocessing model to iterate the Global Moran’s I tool. However, Esri has just released a “Supplementary Spatial Statistics” toolbox which contains an Incremental Spatial Autocorrelation tool that does exactly what is described above, iterating over multiple distances for you so that you don’t have to run the Global Moran’s I tool by hand each time. It also gives you an output table and a nice little graph of the plotted z-scores.

You can download the Toolbox for ArcGIS 10 here.

In addition to the Incremental Spatial Autocorrelation tool there is also a tool for Exploratory Regression.

 

Whilst I’m writing this post, it’s a good opportunity to link to one of the most useful spatial statistics resources on the ArcGIS Resource Centre. It contains many useful “how to” videos, presentations and case studies. If you’re new to spatial statistics in ArcGIS then it is definitely worth a visit. These resources can be found here.

* the Spatial Statistics Toolbox is available at ALL ArcGIS Desktop Licence levels.
