Journal of Geographical Systems, Volume 12, Issue 1, pp 69–87

Arc_Mat: a Matlab-based spatial data analysis toolbox

Original Article

Abstract

This article presents an overview of Arc_Mat, a Matlab-based spatial data analysis software package whose source code has been placed in the public domain. An earlier version of the Arc_Mat toolbox was developed to extract map polygon and database information from ESRI shapefiles and provide high quality mapping in the Matlab software environment. We discuss revisions to the toolbox that utilize the enhanced computing and graphing capabilities of more recent versions of Matlab, restructure the toolbox with object-oriented programming features, and provide more comprehensive functions for spatial data analysis. The Arc_Mat toolbox functionality includes basic choropleth mapping; exploratory spatial data analysis that provides views of spatial data through various graphs, for example, histograms, Moran scatterplots, three-dimensional scatterplots, density distribution plots, and parallel coordinate plots; and more formal spatial data modeling that draws on the extensive Spatial Econometrics Toolbox functions. We briefly review the design of the revised Arc_Mat and provide illustrative examples that highlight representative uses of the toolbox. Finally, we discuss programming with and customizing the Arc_Mat toolbox functionalities.

Keywords

Matlab, Spatial econometrics, Spatial data analysis, Object-oriented

JEL Classification

C87, C88

1 Introduction

Development and implementation of computer programs that support exploration and modeling of spatial phenomena has been an active area of research for the last two decades (Haining 1989; Goodchild et al. 1992; Fischer and Nijkamp 1993; Fotheringham and Rogerson 1994; Fischer and Getis 1997; Goodchild et al. 2000; Goodchild and Haining 2004; Rey and Anselin 2006; De Smith et al. 2009). These programs were developed under a range of design philosophies to provide different functionality. For example, see Geographical Analytical Machine (Openshaw 1990), statistical spatial data analysis (Wise et al. 2001), SpaceStat (Anselin 1992), SANET (Okabe et al. 2005), CrimeStat (Levine 2006), GeoVISTA (Takatsuka and Gahegan 2002), STAR (Rey and Janikas 2006), SAM (Rangel et al. 2006), R-Spatial (Bivand et al. 2008), and GeoDa (Anselin et al. 2005). A common theme is the integration of GIS capability with exploratory spatial analysis and basic statistics, with less emphasis on elaborate modeling. Since all of these programs claim to provide both spatial and statistical analysis of spatially referenced data, we label this software genre “spatial data analysis software” for simplicity.

Three major approaches have been taken to developing such programs. Early efforts focused on connecting commercial GIS and statistical software through formal methods for exchanging data files between independent GIS and statistical programs (Anselin and Getis 1992). Examples include the joint use of GRASS and S (Farley et al. 1990), and the collaboration of ARC/INFO and BMDP (Warren 1990). The linkage between mapping and analysis programs in these efforts can generally be classified as loosely coupled and one-directional.

A second, more recent, development of spatial data analysis software has aimed at producing a fully integrated platform for spatial data analysis. Representative examples of this genre are GeoDa (Anselin et al. 2005), SAM (Rangel et al. 2006), and Mondrian (Theus and Urbanek 2008). GeoDa, a freeware package, was constructed using C++ code that integrates open source components such as Python, OpenGL, and COM objects. Its goal is a dedicated general-purpose program that produces dynamically linked maps and multiple views with sample selection by brushing, as well as spatial regression modeling. SAM is also a stand-alone package, focusing on surface pattern analysis in macro-ecology and biogeography, whereas Mondrian is a Java-based statistical visualization system for categorical and geographical data.

A third research effort has focused on building extensions and add-ons to commercial GIS and statistical software, i.e., extending GIS software with spatial analysis capability or vice versa. On the one hand, spatial analysis functionality has been added to GIS software using plug-ins, niche programs, scripts, dynamic link libraries and middleware (Anselin et al. 2005). Examples include the SpaceStat and DynESDA plug-ins for ArcView (Anselin 2000), as well as the statistical extension for ArcView developed by the SAGE project (Wise et al. 2001). On the other hand, since early statistical software did not provide an environment for GIS components, few attempts were made to add this functionality to statistical programs. Progress in computer software and hardware for data analysis and visualization gradually made spatial analytical functionality possible in statistical software. Many mainstream scientific computing platforms now provide a high level programming language interface that allows the creation of a Graphical User Interface (GUI), and thus of spatial data analysis packages. For example, a suite of spatial analytical packages built in the R environment provides functions for various aspects of applied spatial data analysis (Bivand and Gebhardt 2000; Bivand 2002a, b; Bivand and Portnov 2004; Bivand et al. 2008). These R packages range from data management (sp, shapefiles, GRASS), to visualization (maptools, maps), to spatial interpolation and geostatistics (spatial, gstat, sgeostat), and spatial regression modeling (spdep). Modules for the Stata platform (Pisati 2001) have also been developed for thematic mapping (tmap, spmap), spatial statistics (spatcorr, spatgsa), and estimating spatial models (spatreg). GeoXP (Heba et al. 2002; Laurent et al. 2006), which focuses on geographic data exploration, has been implemented in both the Matlab and R/S+ environments.

The Matlab-based spatial data analysis toolbox Arc_Mat is part of this stream of research effort. Initial work (LeSage and Pace 2004a) focused on extracting map polygon and database information from ESRI shapefiles for use in the Matlab environment. Matlab is a commercial software environment for scientific computing and visualization developed by the MathWorks Inc. The programming language interface in Matlab allows users to create, share, modify and utilize Matlab functions, and toolboxes are the name given by the MathWorks to related sets of Matlab functions aimed at solving a particular class of problems. The graphical capabilities of Matlab were used to produce high quality maps with the imported polygons. The Arc_Mat toolbox revision described here utilizes enhanced computing and GUI functionality in more recent versions of Matlab (R2009a) to restructure the toolbox using new object-oriented programming features, and to provide more comprehensive spatial analysis functionality ranging from basic mapping to exploratory and confirmatory spatial models.

The next section motivates our reliance on the Matlab software platform, discussing issues such as multi-platform support, high level graphical capabilities and the advantages of an interactive development environment. The following section presents the design of the restructured Arc_Mat toolbox, after which we illustrate the functionality of the Arc_Mat toolbox and its use of the new object-oriented programming tools.

2 Motivating Arc_Mat

2.1 Multi-platform functionality

Arc_Mat achieves a certain level of cross-platform functionality in terms of both software and hardware. Arc_Mat and its applications are executable on various operating systems because Matlab has been implemented on Windows, Macintosh, and Linux/Unix, and they draw on computer resources through Matlab. While not freestanding, Arc_Mat and its applications can be used on any platform that supports Matlab. In contrast, portability across different platforms remains a major issue for fully integrated packages. For example, all graphic windows in GeoDa are based on Microsoft Foundation Classes (MFC) and are thus tied to the Microsoft Windows platform. An open source “OpenGeoDa” effort is underway to address this issue, with a beta version recently released. A close examination of SAM suggests that the program is optimized for running in a Windows environment. Moreover, Matlab provides an intelligent interface to underlying hardware and software graphics functionality, whereas handling these low-level interfaces has always been an issue for packages written from scratch like GeoDa and SAM. For example, Matlab automatically handles numerous default decisions regarding OpenGL, detecting graphics hardware and using it to enhance graphing performance when it is available, or otherwise falling back on a software version of these low-level graphics functions.

2.2 Efficient graphing and computing capability

The Arc_Mat toolbox draws on Matlab’s support for recent advances in computer graphics that allow thousands of polygons to be drawn rapidly on a computer screen, which lets us produce efficient mapping functions. High level manipulation of interactive GUI and graphical objects is also a feature of the Matlab programming language, enabling us to create extensive and efficient user interfaces. In contrast, mapping capability may be limited in other programs due to design architecture, software purpose or developers’ backgrounds, and the difficulty of adapting graphing functions to varying software/hardware environments. For example, the underlying R system is not designed specifically for visualizing spatial data, and R mapping packages like sp and maptools cannot provide dynamically linked graphs because of the limited interactive functionality offered by R. SAM was developed for analyzing spatial data in macro-ecology and biogeography, so it does not explicitly offer many standard mapping and GIS functions such as panning and selection. Most importantly, SAM does not incorporate dynamically linked graphics, i.e., all statistical plots in SAM are disconnected and not synchronized. As mentioned above, cross-platform performance has been an issue for GeoDa, whose strength in exploratory spatial data analysis cannot be realized on platforms other than Windows, and the visualization of spatial data in the spmap and tmap packages under the Stata environment is somewhat rudimentary and static. Finally, Mondrian provides fully linked plots and various queries and interactions, and serves as a good benchmark for the development of dynamically linked graphs. However, Mondrian works only with ASCII format data, which is rare in spatial analysis. The Arc_Mat toolbox provides five of the eight types of plots offered by Mondrian, and implements fully linked graphs that allow several types of user interaction, ranging from spatial selection to color theme and variable selection. One drawback of an interpreted language such as Matlab, slower execution, has been overcome by the Just-In-Time compilation technology in more recent versions of Matlab, which enhances the performance of object-oriented code.

As for statistical computing, large spatial datasets can be handled in Arc_Mat with moderate processing time and computer storage because Matlab uses state-of-the-art linear algebra and matrix algorithms, which are useful for working with the large sparse spatial weight matrices used extensively in spatial econometrics. By contrast, the developers of SAM (Rangel et al. 2006) suggest that working with large datasets on a desktop computer would be problematic, and carrying out procedures that require a large number of iterations remains a problem in R (Bivand et al. 2008).

2.3 Customizability and extensibility

Using the Arc_Mat toolbox requires purchasing the Matlab software, but the source code for Arc_Mat is freely available online. Arc_Mat is customizable and extensible, allowing developers to incorporate Arc_Mat functionality in other Matlab programs or to modify the software. This should allow Arc_Mat to be supported by the Matlab community and to draw on other publicly available source code. For example, Arc_Mat uses the open source C library shapelib to extract spatial and attribute information from ESRI shapefiles through the Matlab C/C++ interface C-MEX, and the public domain Matlab function uitable() was adopted to provide a tabular view of data for users of earlier versions of Matlab. Stata, LaTeX and some R programs provide integrated package management, with links in the programming environment that automatically download and install contributed software referenced by procedures, and the MathWorks Inc. is introducing a similar mechanism in the near future. In contrast, updating GeoDa and SAM requires users to download and install the latest version to obtain updated features and functionality.

2.4 Shorter development and debug cycle

The Arc_Mat toolbox consists of Matlab programs that are developed and debugged in an interactive programming environment, avoiding the edit-compile-test-debug loop common to compiled-language development. Developing, debugging and executing the Arc_Mat toolbox and its applications is similar to working in other popular integrated development environments (IDEs), e.g., Visual C++. Moreover, developing applications from scratch, as in the case of GeoDa, which was partially developed in C++, requires a series of informed decisions about the inherent strengths and weaknesses of the underlying graphics hardware/software libraries. As already noted, Matlab automatically handles these decisions as part of the high level Matlab platform, which reduces the burden on developers and allows more rapid results. The interactive development environment allows users to examine and manipulate variables in memory and choose data objects for storage and later use, and it provides standard debugging and diary functionality. This type of environment is particularly useful when working with source code for complex object-oriented systems. In contrast, the architecture of fully integrated packages like GeoDa and SAM does not provide this type of standard integrated development environment.

Freely available source code and documentation also facilitate development of Arc_Mat applications, since they allow students or researchers to examine the precise implementation of spatial analytical procedures. Rey (2009) makes several arguments for publicly available source code, one of which is that errors can be identified and fixed directly by users, whereas users of GeoDa and SAM have no option other than reporting bugs encountered during use. The trend toward open source spatial data analysis software is also discussed in other publications such as Buliung and Remmel (2008).

2.5 Matlab object-oriented design

The MathWorks Inc. has consistently improved the Matlab programming language, one example being the new object-oriented programming capability. Starting with version R2008a, new classes can be defined and implemented, allowing the use of standard object-oriented design paradigms. The updated Arc_Mat toolbox has changed from a function-based toolbox to a class-based toolbox built on this improved object-oriented design. The whole toolbox is developed as a set of generic Matlab classes, and different spatial analysis programs can be constructed from combinations of instances of these classes. Through this object-oriented design, the Arc_Mat toolbox provides code reuse, inheritance, encapsulation, and reference behavior without the low-level housekeeping tasks required by other languages, all of which further facilitate the customizability and extensibility of the code. A more detailed description of the object-oriented programming used in the Arc_Mat toolbox is given in the toolbox design section.

2.6 Availability of spatial econometrics/statistics functions

Built to complement the Matlab spatial econometrics toolbox, the Arc_Mat toolbox can use the wide range of spatial statistics/econometrics functions available for spatial modeling and exploratory spatial data analysis. These functions form an extensive toolbox for estimating various types of spatial models, calculating spatial weight matrices, testing for spatial dependence, Bayesian MCMC estimation of model parameters, comparing different spatial model specifications, and predicting at unobserved data locations (LeSage and Pace 2009). Apart from GeoDa, SAM and some R-spatial packages, the spatial modeling functionality in other major spatial data analysis software remains somewhat rudimentary; most of these programs provide only geometrical spatial data analysis, e.g., buffering and overlay, under the name of “spatial analysis”. As for widely used closed-source packages like GeoDa and SAM, most of the modeling routines offered are fairly standard, and users cannot incorporate more experimental routines. In contrast, Arc_Mat can use the modeling techniques provided in the underlying spatial econometrics toolbox and a similar spatial statistics toolbox (LeSage and Pace 2009). In fact, some modeling procedures in SAM are adopted from the spatial econometrics toolbox.

3 Toolbox design

3.1 Toolbox re-structuring

The previous Arc_Mat toolbox was created using procedure-oriented programming principles making it a function-based toolbox, where its functions were grouped with the aim of solving particular problems. This resulted in a toolbox consisting of several folders, each of which contained functions that created one type of mapping GUI. For example, the folder ‘sarmap’ included all functions used to generate a map coupled with spatial autoregressive model estimation results.

This procedure-oriented design has two main drawbacks. The first is a lack of code reusability: functions are grouped to create particular types of GUIs, yet different GUIs can have a great deal of functionality in common. For instance, the folders ‘histmap’ and ‘moranmap’ contain functions to generate a coupled map and histogram and a coupled map and Moran scatterplot, respectively. Both folders include functions to generate color themes for a choropleth map, to select map polygons, and to exit the GUI. Each folder contains its own version of the functions that provide these common functionalities, resulting in a great deal of code duplication. This redundancy increases development and testing time as well as the size of the programs.

A second drawback of the procedure-oriented design is the difficulty of maintaining and enhancing code. An illustrative example is the implementation of dynamic linking and brushing, a central organizing technique for multiple-viewport data visualization. Dynamically linked graphs allow selection of data objects in one figure to automatically highlight the corresponding representations in other figures. The previous version of Arc_Mat offered a one-directional linkage between the map and exploratory graphs, i.e., selecting polygons highlighted their representation in a Moran scatterplot or histogram, but users could not make selections on these exploratory graphs. The difficulty of implementing dynamic linking among multiple “views” of the data in the previous version of Arc_Mat stemmed from the procedure-oriented design, which requires a callback function for every operation on maps, graphs or tables. As the number of operations increases and more views of the data are dynamically linked, the callback functions become complex and lengthy. Moreover, it is difficult to synchronize data among multiple visualizations, since each “view” has its own copy of the data and these “views” must be synchronized programmatically after every operation on any single viewport.

New object-oriented design features introduced in Matlab version R2008a allow us to plan and implement a system of interacting classes/objects for spatial data analysis. In this approach to software design, the Matlab program is conceptualized as a group of classes. Each class consists of encapsulated data that describe the class properties, procedures that describe how the class behaves, and the ‘class interface’, which determines how the class can be accessed.

The Matlab object-oriented approach improves our ability to manage code complexity and has other advantages that we discuss below (Register 2007). First, the Matlab object-oriented programming paradigm decomposes the entire toolbox into a set of abstract classes from a modular perspective. The toolbox can be conceptualized as three main types of objects: data to be mapped, viewports of the data, and objects designed to manage all visualizations. Applications based on the Arc_Mat toolbox become interactions among various instantiations of these classes, i.e., objects. Second, the encapsulation feature of Matlab object-oriented programming provides modularity and information hiding to developers. By encapsulating related variables and functions in objects, the toolbox achieves modularity, so the source code for an object can be written and maintained independently of the source code for other objects. Third, the inheritance feature provides code reusability, allowing existing functionality to be extended without changing existing code. Fourth, the reference behavior of Matlab objects can be used to solve the problem of data synchronization. Instead of keeping individual copies of the data to be mapped, different viewports share the same data object through the reference mechanism. Thus, operations on the data originating in any viewport are automatically reflected in the other views.
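The reference mechanism can be sketched outside Matlab. In Matlab it comes from classes derived from `handle`; the minimal Python sketch below relies on the fact that Python objects are likewise shared by reference. The class names `DataSource` and `View`, and the `selected` list, are hypothetical stand-ins and not Arc_Mat's actual interface.

```python
class DataSource:
    """Hypothetical stand-in: a single shared copy of the data and its selection."""

    def __init__(self, names, values):
        self.names = list(names)
        self.values = list(values)
        self.selected = [False] * len(self.values)  # one flag per region


class View:
    """Hypothetical viewport that keeps a reference to, not a copy of, the data."""

    def __init__(self, title, data):
        self.title = title
        self.data = data  # every view points at the same DataSource object

    def select(self, index):
        self.data.selected[index] = True

    def highlighted(self):
        return [n for n, s in zip(self.data.names, self.data.selected) if s]


ds = DataSource(["A", "B", "C"], [10.0, 25.0, 40.0])
map_view = View("map", ds)
hist_view = View("histogram", ds)
map_view.select(1)              # select region "B" in the map view
print(hist_view.highlighted())  # the histogram sees the same selection
```

Because no view holds its own copy, no explicit synchronization step is needed after the selection.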

However, the current implementation of object-oriented programming in Matlab has some drawbacks that carry over to Arc_Mat. Matlab object-oriented programming does not support polymorphism of classes, one of the key features of a pure object-oriented paradigm. The lack of polymorphism adds to development time, since additional code is needed to handle different types of input to the same function. Furthermore, since all manipulation of object properties and functions is done indirectly through object handles, referencing variables may take more time under the object-oriented design. This may become important when a very large volume of data is processed. For example, a typical retrieval of the indices of map polygons without variable values is carried out using “obj.DataObj.results.missing”, where each “.” operation may take a unit of processing time, and the accumulated elapsed time for many such retrievals may become significant.
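A common way to limit the cost of repeated chained lookups, assumed here rather than taken from Arc_Mat, is to resolve the chain once and reuse a local reference inside loops. The sketch below uses Python's `SimpleNamespace` to mimic the nested `obj.DataObj.results.missing` structure; the stored indices are illustrative only.

```python
from types import SimpleNamespace

# Hypothetical nested structure mirroring obj.DataObj.results.missing;
# the field names follow the article's example, the indices are illustrative.
obj = SimpleNamespace(
    DataObj=SimpleNamespace(results=SimpleNamespace(missing=[2, 5, 7]))
)


def flag_missing_chained(obj, n):
    # Re-resolves obj.DataObj.results.missing on every iteration
    return [i in obj.DataObj.results.missing for i in range(n)]


def flag_missing_cached(obj, n):
    # Resolves the reference chain once and reuses the local alias
    missing = obj.DataObj.results.missing
    return [i in missing for i in range(n)]


print(sum(flag_missing_cached(obj, 10)))
```

Both functions return the same flags; the cached version simply performs the three attribute lookups once instead of once per polygon.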

3.2 Toolbox classes design

3.2.1 Definition of classes

The new Matlab object-oriented design for the Arc_Mat toolbox consists of three main types of classes, DataSource, Linkage, and Representation, as shown in Fig. 1.
Fig. 1

Hierarchy of Arc_Mat classes

The DataSource class stores the data to be mapped and analyzed by application programs, and contains both spatial and thematic information. Because current Arc_Mat applications typically do not analyze linear phenomena or network data, the information held by DataSource objects represents areal or point data. The DataSource object also works in a similar manner to the “bit-string” in DynESDA and GeoDa (Anselin et al. 2002, 2005), serving as a common repository of the current selection states of data objects.

The representation classes are a set of classes that visualize DataSource information in various ways. An instantiation of a representation class may be a choropleth map, a statistical plot, or a table showing model estimates. Different representation objects share the same DataSource object to help researchers view data from different perspectives. The functions within representation classes serve two general purposes: rendering the GUI of the specific viewport, and handling users’ operations on elements of the GUI. The set of representation classes is organized hierarchically. All classes are derived from the generic representation class (shown in Fig. 2), and several classes designed for more specific tasks also exhibit this inheritance. For example, the GlobalModel class, which provides a general tabular view of spatial regression model estimation results, is derived from the generic representation class and is in turn inherited by the GlobalSAR and GlobalSEM classes. The latter two provide tabular views of results from SAR and SEM model estimation, respectively.
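The hierarchy can be illustrated with a short Python sketch of the same inheritance pattern. Only the class names come from the text; the methods, the dictionary-based data store and the `model_name` attribute are hypothetical simplifications, and the estimates are placeholder numbers. Subclasses inherit both the selection handling of the generic class and the table rendering of `GlobalModel`, changing only which results they display.

```python
class Representation:
    """Hypothetical generic viewport: renders data and handles user operations."""

    def __init__(self, data):
        self.data = data  # shared, DataSource-like dictionary

    def render(self):
        raise NotImplementedError("each viewport defines its own rendering")

    def on_select(self, index):
        # Common behavior inherited by every viewport
        self.data["selected"].add(index)


class GlobalModel(Representation):
    """Generic tabular view of spatial regression estimates."""

    model_name = "generic"

    def estimates(self):
        return self.data["results"].get(self.model_name, {})

    def render(self):
        rows = [f"{name:>6}: {value:8.4f}" for name, value in self.estimates().items()]
        return "\n".join([f"[{self.model_name}]"] + rows)


class GlobalSAR(GlobalModel):
    model_name = "SAR"  # only the results key differs; rendering is inherited


class GlobalSEM(GlobalModel):
    model_name = "SEM"


# Placeholder numbers for illustration only, not estimates from the article
data = {"selected": set(), "results": {"SAR": {"rho": 0.5}, "SEM": {"lambda": 0.4}}}
sar, sem = GlobalSAR(data), GlobalSEM(data)
sar.on_select(3)
print(sar.render())
print(sem.render())
```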
Fig. 2

Inheritance of representation classes

The linkage class provides dynamic linking and brushing of all representation objects. Since every representation object registers itself in the linkage object through a handle, developers can access representation objects and perform operations on them, for example, calling class functions and modifying class properties through the linkage class. In addition, the linkage object detects events raised by representation objects and updates them in a pre-defined sequence.

3.2.2 Interaction among objects

The interaction among objects is accomplished by implementing a Matlab events-listener mechanism. Events represent changes or actions that occur within class instances and Matlab uses an events-listener mechanism to communicate the occurrence of events and react to these events.

For any GUI operation, e.g., selection of graphical objects, selection of variables, or modification of selected objects, the corresponding representation object handles it with a callback function that generates a Matlab event. This event notifies other objects that the DataSource object has been modified, either in the selected spatial objects or in the variable under investigation. The linkage object has a listener function that responds to these events. Once an event is detected, the linkage object responds by re-brushing all representation objects, calling each object’s rendering function through the handle stored in the linkage object.
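The event flow described above is an observer pattern. In Matlab it is expressed with an `events` block in a `classdef` file, `notify` to raise an event and `addlistener` to attach a callback; the Python sketch below mimics that flow with plain callbacks. All class, method and event names are hypothetical, and `render` only counts calls where a real viewport would redraw.

```python
class DataSource:
    """Hypothetical shared data object holding the selection state."""

    def __init__(self, n):
        self.selected = [False] * n


class Representation:
    """Hypothetical viewport that raises an event after each GUI operation."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.render_count = 0
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def notify(self, event):
        for callback in self._listeners:
            callback(self, event)

    def on_click(self, index):
        # GUI callback: modify the shared DataSource, then raise an event
        self.data.selected[index] = not self.data.selected[index]
        self.notify("SelectionChanged")

    def render(self):
        self.render_count += 1  # a real viewport would re-brush here


class Linkage:
    """Keeps handles to all views and re-renders them when any view fires."""

    def __init__(self):
        self.views = []
        self.log = []

    def register(self, view):
        self.views.append(view)
        view.add_listener(self.on_event)

    def on_event(self, source, event):
        self.log.append((event, source.name))
        for view in self.views:  # re-brush in registration order
            view.render()


data = DataSource(3)
link = Linkage()
map_view = Representation("map", data)
hist_view = Representation("histogram", data)
link.register(map_view)
link.register(hist_view)
hist_view.on_click(2)  # a selection made in the histogram
print(data.selected, map_view.render_count, hist_view.render_count)
```

Because the linkage object owns the update loop, adding a new viewport only requires registering it; no existing callback has to change.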

3.3 “Legacy code” issue

With the goals of improving design, adding features and optimizing resource usage, Arc_Mat has been upgraded from a toolbox containing interrelated Matlab functions to one consisting of generic classes. This change makes existing Matlab programs based on the previous version of Arc_Mat “Legacy code”, i.e. code whose functionality should be preserved when the underlying toolbox or libraries are changed.

Our treatment of “Legacy code” should promote adoption of the new version of Arc_Mat, since no extra effort is required to replace the existing Arc_Mat toolbox with the new version and previous applications will still work. Functions called by users’ code to create mapping GUIs have been rewritten in the new version of Arc_Mat. For example, the function arc_histmap() generates a map coupled with a histogram. Instead of calling sub-procedures in a certain sequence as in the former version, the rewritten arc_histmap() consists of the instantiation and initialization of classes, as shown in Fig. 3.
Fig. 3

Comparison of the arc_histmap() function in the previous and current versions of Arc_Mat

Another important aspect of the “Legacy code” issue is familiarizing users with the new toolbox. The naming conventions and associated functionalities of many variables in the former version of Arc_Mat were preserved to facilitate this process. For instance, in both versions of Arc_Mat, a “results” structure contains all estimation results, the “option” structure contains all user specified options, and a “variable” structure contains names of all explanatory variables.

4 Toolbox functionalities

4.1 Mapping

Mapping functions describe the basic spatial distribution of observations. The onPlot() function in the BaseMap class handles plotting of map polygons. This function calls an underlying utility function, make_map(), to extract the polygon coordinates contained in the DataSource object, plot the map polygons in the GUI, and generate a structure variable containing ‘graphics handles’ to each polygon and its possible parts. Using these ‘graphics handles’, the characteristics of the GUI and associated graphical objects can be defined and altered. A description of design considerations for plotting map polygons is provided in LeSage and Pace (2004b). We note that the objective of the Arc_Mat toolbox is not cartographic realism, but rather comparative visualization of sample data relationships that may exist between regions. This focus on relationship visualization partially explains why Arc_Mat adheres to spatial data with planar coordinates.

4.2 Exploratory spatial data analysis (ESDA)

Dynamically linked graphs have become a popular and useful interactive spatial data analysis tool; for example, dynamic linking and brushing in GeoDa has been credited as a useful tool in crime and disease analysis (Leitner and Brecht 2007). With dynamically linked windows and other interactive facilities, it is easier for users to describe the spatial pattern of observations, identify spatial and non-spatial outliers, differentiate between global and local spatial patterns, and gain a better understanding of spatial statistics (Anselin 2002a), as well as to analyze multivariate spatial data (Cook et al. 1996, 1997). Arc_Mat also provides exploratory spatial data analysis through dynamic linking and brushing of maps and statistical figures, which include the three-dimensional scatterplot, choropleth map, parallel coordinate plot (PCP), Moran scatterplot, histogram, and density distribution plot.

We illustrate some of this functionality using the relationships between crime rates, housing values and income levels (see Fig. 4), from the Anselin (1988) Columbus, Ohio neighborhood dataset that contains 49 observations.
Fig. 4

Exploratory spatial data analysis of Columbus crime data

Various plots are linked to the map of neighborhoods. The upper-left panel contains a choropleth map whose color scheme is determined by the Moran scatterplot of the 1980 crime rates in the lower-right panel, while the upper-right panel contains a three-dimensional scatterplot of crime, house values, and income. The lower-left panel shows a parallel coordinate plot (PCP) for the 1980 crime rates, housing values and neighborhood income levels. The map polygons in the center of Columbus and their corresponding representations in the other viewports are highlighted.

All plots illustrate that high crime rates are associated with low household income levels and housing values. The choropleth map and Moran scatterplot suggest a cluster of high crime rates in central Columbus neighborhoods, where higher-than-average crime rates coincide with an above-average mean crime rate in nearby regions; this cluster can also be seen in the three-dimensional scatterplot. Changing variables using the GUI reveals clusters of low household income and house values in the same central city neighborhoods. Moreover, the highlighted lines in the PCP show this relationship between high crime rates, low household income and low housing values.
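The Moran scatterplot quantities behind this reading can be computed directly. The sketch below assumes a neighbor-list input in which every region has at least one neighbor, and row-standardized weights, so the spatial lag of each region is the average standardized value of its neighbors; under that standardization Moran's I equals the slope of the scatterplot's regression line. It is an illustration on a hypothetical four-region example, not Arc_Mat code and not the Columbus data.

```python
from statistics import mean, pstdev


def moran_scatter(values, neighbors):
    """Return standardized values, their spatial lag, Moran's I, and quadrants.

    `neighbors[i]` lists the regions adjacent to region i (an assumed input
    format); every region is assumed to have at least one neighbor.
    """
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]
    # Row-standardized spatial lag: the average z of each region's neighbors
    lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(len(z))]
    # With row-standardized W, Moran's I = z'Wz / z'z, which is also the
    # OLS slope of the lag on z (the line drawn in a Moran scatterplot)
    moran_i = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    labels = {(True, True): "HH", (False, False): "LL",
              (True, False): "HL", (False, True): "LH"}
    quadrants = [labels[(zi >= 0, li >= 0)] for zi, li in zip(z, lag)]
    return z, lag, moran_i, quadrants


# Toy example: four regions in a row, 0-1-2-3
values = [10.0, 8.0, 2.0, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
z, lag, moran_i, quadrants = moran_scatter(values, neighbors)
print(round(moran_i, 4), quadrants)
```

Regions labeled "HH" are the high-value regions surrounded by high-value neighbors, the pattern described above for central Columbus.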

Besides providing dynamic linking and brushing among different plots, the Arc_Mat application allows users to make real-time changes in the variable selected for investigation, control whether each plot is synchronized with other plots, and zoom in/out or export a single plot using the Matlab figure toolbar.

For each application, Arc_Mat allows users to select one color palette from thirteen built-in Matlab colormaps, either programmatically or through a pop-up menu, and then changes the color theme automatically by calling the underlying Matlab function “colormap.m”. The parameters used to call “colormap.m” are self-descriptive and fairly standard: Jet, HSV, Hot, Cool, Spring, Summer, Autumn, Winter, Gray, Bone, Copper, Pink, and Lines. When producing illustrations for publication, a change from the default Matlab palette would usually be preferable. More advanced users can define their palette programmatically, whereas less experienced users can select a color theme interactively with the Matlab GUI function “colormapeditor”.

One advantage of distributing source code is that others can implement suggested future enhancements. For example, Arc_Mat includes the public domain Matlab function hatch.m, which provides hatching effects on selected data objects. Its call is currently commented out because hatching is time consuming for fine-scale spatial datasets with numerous polygons. A future enhancement would be to enable hatching only when the number of polygons/regions being analyzed falls below some small threshold, say 50 regions. Other enhancement opportunities concern the histograms, which by default use a user-selected number (20 or fewer) of equal-interval classes, and presentation options such as line styles, widths, and colors. These features of the current version could easily be changed or augmented using the quantile classification available in the MathWorks Statistics Toolbox and the many graphics options in the standard release of Matlab.
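Two of these enhancements can be sketched briefly in Python: equal-interval versus quantile class breaks, and a region-count threshold for hatching. The function names, the 50-region default, and the ten values are hypothetical illustrations. This is not code from Arc_Mat or from the Statistics Toolbox.

```python
import numpy as np


def equal_interval_breaks(values, k):
    """Upper class limits for k equal-width classes (the current histogram default)."""
    v = np.asarray(values, dtype=float)
    return np.linspace(v.min(), v.max(), k + 1)[1:]


def quantile_breaks(values, k):
    """Upper class limits for k classes holding roughly equal numbers of regions."""
    v = np.asarray(values, dtype=float)
    return np.quantile(v, np.linspace(0, 1, k + 1)[1:])


def classify(values, breaks):
    """0-based class of each value, given ascending upper class limits."""
    return np.searchsorted(breaks, np.asarray(values, dtype=float), side="left")


def should_hatch(n_regions, threshold=50):
    """Hatch only small datasets, where drawing the hatching stays cheap."""
    return n_regions < threshold


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]          # hypothetical, one large outlier
eq = classify(values, equal_interval_breaks(values, 2))
qu = classify(values, quantile_breaks(values, 2))
print(np.bincount(eq).tolist(), np.bincount(qu).tolist())
```

With one outlier, equal intervals put nine regions in one class and one in the other. Quantile breaks split them five and five, which is why quantile classes often give more informative choropleth maps.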

4.3 Spatial data modeling

As already noted, the Arc_Mat toolbox was built as part of an extensive spatial econometrics toolbox. This supports analyses ranging from basic utility functions for computing spatial weight matrices to comprehensive model estimation, comparison, and prediction.
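Spatial weight matrices can be defined in several ways, for example by polygon contiguity or by distance. As one illustration, the Python sketch below builds a row-standardized k-nearest-neighbor matrix from region centroids. `knn_weights` and the four centroids are hypothetical. The dense matrix is for readability only; for thousands of regions a sparse representation would normally be used. Ties between equally distant neighbors are broken arbitrarily by the sort.

```python
import numpy as np


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights from (x, y) centroids."""
    xy = np.asarray(coords, dtype=float)
    n = xy.shape[0]
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))           # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                     # a region is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]          # k closest regions in each row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0 / k   # equal weights, rows sum to 1
    return W


centroids = [(0, 0), (1, 0), (3, 0), (10, 0)]      # hypothetical centroids
W1 = knn_weights(centroids, 1)
W2 = knn_weights(centroids, 2)
print(W1)
```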

Model estimation procedures in Arc_Mat stand out for their use of efficient algorithms. These include Matlab's sparse matrix algorithms and the computationally efficient maximum likelihood methods described by Pace and Barry (1997) and Barry and Pace (1999).
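The main computational cost in these likelihoods is the log-determinant ln|I - ρW|. The Python sketch below follows the idea of the Monte Carlo estimator in Barry and Pace (1999) as we understand it. It expands the log-determinant as a power series in traces of W^k and estimates each trace from random normal vectors, so only products with W are needed. The function name, the 30 series terms, the 1,000 draws, and the 50-region ring are our own illustrative choices, not the toolbox's settings.

```python
import numpy as np


def mc_logdet(W, rho, n_terms=30, n_draws=1000, seed=0):
    """Monte Carlo estimate of ln|I - rho W| (after Barry and Pace 1999).

    Uses ln|I - rho W| = -sum_{k>=1} rho^k tr(W^k) / k, which converges for
    |rho| < 1 when W is row-standardized, and estimates tr(W^k) by
    n * x'W^k x / x'x averaged over standard normal vectors x.
    Only products with W are needed, so W could also be a sparse matrix.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    X = rng.standard_normal((n, n_draws))   # one column per random vector
    xx = np.sum(X * X, axis=0)              # x'x for each draw
    V = X.copy()
    series = np.zeros(n_draws)
    for k in range(1, n_terms + 1):
        V = W @ V                           # V now holds W^k x for every draw
        series += rho ** k * np.sum(X * V, axis=0) / xx / k
    return -n * series.mean()


# Hypothetical ring of 50 regions, each with two neighbors weighted 1/2.
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
rho = 0.5
approx = mc_logdet(W, rho)
exact = np.linalg.slogdet(np.eye(n) - rho * W)[1]
print(round(float(approx), 2), round(float(exact), 2))
```

The exact dense log-determinant is computed here only to check the approximation. Avoiding that dense computation for large n is the point of the method.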

We illustrate maximum likelihood estimation of a spatial autoregressive (SAR) model of employment growth for the complete set of 3,111 continental US counties in 1980, as well as local estimates for 1,683 counties in the eastern part of the US (see Fig. 5). The left panel shows a map of all 3,111 polygons with the 1,683 eastern counties highlighted by bold boundaries. The right panel contains tabular views of the global and local estimation results for the SAR regression models. The independent variables include 1980 census information on employment density, population density, land area, educational attainment levels, employment in manufacturing, unemployment, per capita income (in 1979), local government spending on education, highways, and police (in 1982), non-white population, and a 1990 urban indicator variable. Arc_Mat also lets users map model-specific residuals onto the associated BaseMap by clicking the ‘map this graph’ checkbox on these panels. Fig. 5 displays the distribution of residuals from the global SAR model (with ‘map this graph’ checked), and outliers are easy to spot as excessively bright or dark polygons.

Arc_Mat also supports comparisons among different spatial models, such as the spatial error model (SEM), the spatial Durbin model (SDM), and the general spatial model (SAC). To preserve the readability of the figure, we do not present these comparisons here. However, another test on the same dataset with both SAR and SEM models took 8.7 s on a Lenovo laptop with an Intel(R) Core(TM)2 Duo CPU @ 1.80 GHz. This test included importing the ArcView shapefile, mapping the 3,111 polygons, and estimating both models. This speed allows users to explore local versus global relationships, as well as inferences from alternative models, through the GUI.
Fig. 5

Estimation results of spatial modelling for national and local unemployment rate for 1980
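For the SAR model y = ρWy + Xβ + ε, the likelihood can be concentrated with respect to β and σ², leaving a one-dimensional search over ρ. As we understand it, this is the approach in Pace and Barry (1997) and LeSage and Pace (2009). The Python sketch below is a simplified illustration, not Arc_Mat's estimator. It uses simulated data on a hypothetical 15 × 15 rook-contiguity grid instead of the 3,111 counties, and an exact dense log-determinant with a grid search instead of sparse or Monte Carlo methods. It also returns the residuals that a ‘map this graph’ view would display.

```python
import numpy as np


def rook_grid_weights(nrow, ncol):
    """Row-standardized rook-contiguity weights for a regular nrow x ncol lattice."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:                      # neighbor below
                A[i, i + ncol] = A[i + ncol, i] = 1.0
            if c + 1 < ncol:                      # neighbor to the right
                A[i, i + 1] = A[i + 1, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)


def sar_fit(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Concentrated maximum likelihood for y = rho W y + X beta + e via a grid over rho."""
    n = len(y)
    Wy = W @ y
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS of y on X
    bd = np.linalg.lstsq(X, Wy, rcond=None)[0]    # OLS of W y on X
    e0, ed = y - X @ b0, Wy - X @ bd
    best_ll, best_rho = -np.inf, None
    for rho in grid:
        e = e0 - rho * ed                          # residuals at this rho
        logdet = np.linalg.slogdet(np.eye(n) - rho * W)[1]
        ll = logdet - n / 2 * np.log(e @ e / n)    # log-likelihood up to a constant
        if ll > best_ll:
            best_ll, best_rho = ll, rho
    beta = b0 - best_rho * bd                      # beta(rho) = b0 - rho * bd
    resid = y - best_rho * Wy - X @ beta
    return best_rho, beta, resid


# Simulated (hypothetical) data: rho = 0.6, beta = (1, 2), unit-variance errors.
rng = np.random.default_rng(42)
W = rook_grid_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + rng.standard_normal(n))

rho_hat, beta_hat, resid = sar_fit(y, X, W)
print(round(float(rho_hat), 2), np.round(beta_hat, 2))
```

Concentrating out β means only two least-squares fits are needed, whatever the number of grid points. Each point then costs a log-determinant and a scalar residual sum of squares.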

4.4 Programming with Arc_Mat

In this subsection, we illustrate the procedures involved in building an Arc_Mat application. An Arc_Mat program can be viewed as a combination of a DataSource object, multiple representation objects, and a linkage object.

In a typical Arc_Mat application, a DataSource object is created from three structure variables that are analogous to the function inputs in the previous version of Arc_Mat. Next, a group of viewports is constructed by instantiating the corresponding representation objects with the DataSource object. Finally, a linkage object is created that registers all existing representation objects.
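The three-part structure can be mirrored in a few Python classes. This sketch does not reproduce Arc_Mat's Matlab API. The names DataSource, BaseMap, and MoranPlot echo the roles described in the text, while `Representation`, `Linkage`, their methods, the `synchronized` flag, and the data values are hypothetical. It shows a shared data object, views built on that object, and a linkage that forwards a brushed selection to every view that is currently synchronized.

```python
class DataSource:
    """Shared data: variable names, per-region values, and GUI options."""

    def __init__(self, variables, values, options=None):
        self.variables = list(variables)          # e.g. ["income", "growth"]
        self.values = dict(values)                # variable name -> list of values
        self.options = dict(options or {})


class Representation:
    """Base class for a viewport built on a DataSource."""

    def __init__(self, source, variable):
        if variable not in source.variables:
            raise KeyError(variable)
        self.source = source
        self.variable = variable
        self.synchronized = True                  # follow selections made in other views?
        self.highlighted = set()

    def set_variable(self, name):
        """Switch the displayed variable at run time."""
        if name not in self.source.variables:
            raise KeyError(name)
        self.variable = name
        self.redraw()

    def highlight(self, ids):
        self.highlighted = set(ids)
        self.redraw()

    def redraw(self):
        """Subclasses would redraw their graphics here (a no-op in this sketch)."""


class BaseMap(Representation):
    pass


class MoranPlot(Representation):
    pass


class Linkage:
    """Registers views and forwards a selection to every synchronized view."""

    def __init__(self, *views):
        self.views = list(views)

    def brush(self, origin, ids):
        for view in self.views:
            if view is origin or view.synchronized:
                view.highlight(ids)


data = DataSource(["income", "growth"],
                  {"income": [10.2, 8.7, 15.1], "growth": [0.4, 0.1, 0.9]})
map_view, moran_view = BaseMap(data, "income"), MoranPlot(data, "income")
moran_view.synchronized = False                   # the user unlinks this plot
link = Linkage(map_view, moran_view)
link.brush(map_view, {0, 2})
moran_view.set_variable("growth")
```

Keeping the data in one shared object means each view stores only a variable name and a selection. A new view type therefore needs no changes to the linkage.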

Arc_Mat application development is illustrated by the code used to generate the GUI shown in Fig. 6, which analyzes per capita income levels for 138 European Union regions (Le Gallo and Ertur 2003). After data preparation, the program generates a DataSource object (data) from the variable names (variable), the spatial and thematic attributes (results), and the user's specification of GUI settings (options). A BaseMap (map), a Moran scatterplot (moran), a Global SAR model estimates table (gsar), and a three-dimensional scatterplot (dens) are then constructed using the DataSource object. Finally, these representation objects are registered in one linkage object to achieve dynamic linking and brushing among all viewports. Fig. 6 also demonstrates the dynamic linking among the base map, exploratory statistical plots, and spatial modeling results.
Fig. 6

Matlab code used to generate the GUI for analysis of per capita income levels for 138 European Union regions

4.5 Toolbox deployment and customization

Arc_Mat has been freely available since 2004 at www.spatial-econometrics.com, a web site that has existed since 1999. The (Matlab) source code for the econometric and spatial econometric modeling functions is provided in zip file format, along with old and new versions of the Arc_Mat toolbox, also in zip format. Deploying the Arc_Mat toolbox on a single computer takes three steps: (1) download the Arc_Mat zip file from the web site, (2) unzip the Arc_Mat files to a user-designated folder, and (3) use the “set path” menu in the Matlab GUI to add the designated folder (and its subfolders) to Matlab’s search path. Users can then call Arc_Mat functions from any of their Matlab programs.

As already noted, Arc_Mat is customizable because it is an application-level open source effort based on the object-oriented paradigm, which makes it easy for the Matlab user community to adopt. The toolbox can easily be combined with other Matlab code. It can extract ESRI shapefile information for use in other Matlab programs, e.g., GeoXP, and, as noted previously, other public domain Matlab functions can be used within Arc_Mat. Users may copy any part of the Arc_Mat code and paste it into their own projects. More importantly, Arc_Mat achieves its extensibility through the object-oriented programming paradigm. For example, a new class defining another “view” of areal data could easily be added to the toolbox, as long as it implements all functions of its representation superclass as well as its own class-specific functions.

We illustrate the extensibility of Arc_Mat by adding a new subclass to the GlobalModel superclass that provides a tabular view of estimation results for the general spatial regression model (SAC). Creating the GlobalSAC class requires three steps. First, a new Matlab M-file “GlobalSAC.m” is created, and the class definition of GlobalSEM is copied into it. Second, the developer changes the function and variable names. Third, the developer rewrites the callModel() and getResidual() functions, as a comparison of the GlobalSEM and GlobalSAC class definitions in Fig. 7 shows.
Fig. 7

Definition and comparison of GlobalSEM and GlobalSAC classes
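The same subclassing pattern can be made concrete outside Matlab. The Python sketch below is not the code in Fig. 7. An abstract `GlobalModel` declares `call_model()` and `get_residual()`, hypothetical stand-ins for callModel() and getResidual(), and `GlobalSEM` and `GlobalSAC` each supply their own estimation. The grid-search concentrated likelihood, the coarse 37-point grid, the `table()` method, and the ring data are our own simplifying assumptions.

```python
import numpy as np
from abc import ABC, abstractmethod


class GlobalModel(ABC):
    """Superclass for tabular views of global spatial model estimates."""

    grid = np.linspace(-0.9, 0.9, 37)   # search grid for the spatial parameters

    def __init__(self, y, X, W):
        self.y, self.X, self.W = np.asarray(y, float), np.asarray(X, float), W
        self.n = len(self.y)
        eye = np.eye(self.n)
        self._mats = [eye - a * W for a in self.grid]                  # I - a W
        self._logdets = [np.linalg.slogdet(M)[1] for M in self._mats]  # ln|I - a W|
        self.params = self.call_model()

    @abstractmethod
    def call_model(self):
        """Estimate the model and return a dict of named estimates."""

    @abstractmethod
    def get_residual(self):
        """Return one residual per region (what 'map this graph' would display)."""

    def _profile(self, ys, Xs, logdet):
        """OLS on transformed data plus the concentrated log-likelihood (up to a constant)."""
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        e = ys - Xs @ beta
        return logdet - self.n / 2 * np.log(e @ e / self.n), beta, e

    def table(self):
        """Rows of (name, value) for a tabular viewport."""
        rows = [(k, float(v)) for k, v in self.params.items() if k != "beta"]
        return rows + [(f"beta{j}", float(b)) for j, b in enumerate(self.params["beta"])]


class GlobalSEM(GlobalModel):
    """y = X beta + u,  u = lam W u + e."""

    def call_model(self):
        best = None
        for lam, B, ld in zip(self.grid, self._mats, self._logdets):
            ll, beta, e = self._profile(B @ self.y, B @ self.X, ld)
            if best is None or ll > best[0]:
                best = (ll, {"lambda": lam, "loglik": ll, "beta": beta}, e)
        self._resid = best[2]
        return best[1]

    def get_residual(self):
        return self._resid


class GlobalSAC(GlobalModel):
    """y = rho W y + X beta + u,  u = lam W u + e (two-dimensional grid)."""

    def call_model(self):
        best = None
        for rho, A, ld_a in zip(self.grid, self._mats, self._logdets):
            Ay = A @ self.y
            for lam, B, ld_b in zip(self.grid, self._mats, self._logdets):
                ll, beta, e = self._profile(B @ Ay, B @ self.X, ld_a + ld_b)
                if best is None or ll > best[0]:
                    best = (ll, {"rho": rho, "lambda": lam, "loglik": ll, "beta": beta}, e)
        self._resid = best[2]
        return best[1]

    def get_residual(self):
        return self._resid


# Hypothetical data on a ring of 80 regions (each region has two neighbors).
n = 80
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1.0
W = ring / ring.sum(axis=1, keepdims=True)
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.standard_normal(n))

sem, sac = GlobalSEM(y, X, W), GlobalSAC(y, X, W)
print(sac.table())
```

Because SAC nests SEM (at ρ = 0), its maximized log-likelihood can be no lower than SEM's on the same grid. The shared `table()` method works unchanged for both subclasses.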

5 Conclusions and future work

This article described an object-oriented redesign of the Arc_Mat Toolbox for spatial data analysis, which also served as a test of the object-oriented programming facilities in newer versions of Matlab. We set forth the design philosophy and functionality of the prototype objects that make up the new Arc_Mat Toolbox and discussed “legacy code” issues regarding compatibility with the previous version of the toolbox. We also discussed a number of advantages of the new object-oriented approach over the previous function-based toolbox. Although it is not freestanding, Arc_Mat and its applications can be used on Windows, Mac, and Linux/Unix platforms, all of which Matlab supports. Because Matlab provides an intelligent interface to the underlying hardware and software graphics functionality, the toolbox can consistently take advantage of emerging computer hardware and software. In contrast, portability across platforms remains a major issue for other spatial data analysis packages, most of which, like GeoDa and SAM, are tied to the Windows platform.

The previous version of the toolbox was extended to incorporate dynamic linking of plots and data brushing, supporting the trend towards close-coupling, modulation, and dynamic interaction of GIS and spatial analysis components. As in the previous version, source code is provided, and the newly developed objects should make it easy for the Matlab user community to incorporate spatial data analysis functionality into other Matlab programs. The transformation to an object-oriented design produced substantial code simplification, easier maintenance, and greater code reusability, all frequently cited advantages of object-oriented programming.

The toolbox revision also aimed to provide a more extensive set of exploratory and spatial modeling functionality than the previous version. For example, new objects allow users and developers to create tabular GUI viewports for examining the results of spatial econometric model estimation. New statistical views of data linked to maps, such as the PCP, the three-dimensional scatterplot, and density and double density plots, also provide new ESDA tools. The ‘map this graph’ function on statistical plots makes it easy for users to map the values shown in a plot onto a choropleth map.

Finally, we provided some guidance to users wishing to incorporate the new Arc_Mat objects to produce extensions of the applications currently in the toolbox. Our illustrative example demonstrated how to add a new general spatial model (SAC) estimation application using the object-oriented interface.

Beyond providing a broader range of spatial econometric and statistical tools, the toolbox could be improved by extending its functionality to other data structures. The current tools apply mainly to lattice data (Cressie 1991), such as the polygons and points illustrated in the examples, so support for point pattern, geostatistical, and spatiotemporal flow data would be a useful extension. The spatial context and arrangement of these data types are not captured by the spatial contiguity matrix used in spatial regression modeling, so other analytical procedures would need to be incorporated into the toolbox.

One drawback we found is that the current implementation of object-oriented programming in Matlab does not support polymorphism of classes in the form of method overloading (several versions of a method or constructor distinguished by their argument types), which is typically a key feature of the pure object-oriented paradigm. This lack requires additional code to handle different types of inputs to the same function. An ongoing effort aims to bypass the issue by using default parameters when constructing classes.


Acknowledgments

The authors would like to acknowledge support for this research provided by the National Science Foundation, SES-0729264. Additional funding was provided by the Gulf of Mexico, Texas Sea Grant programs NA06OAR41770076, but the conclusions and recommendations are those of the authors and do not reflect the views of the US Department of Commerce - National Oceanic and Atmospheric Administration (NOAA). The first author would also like to acknowledge financial support from the Graduate Student Affinity Group of the Association of American Geographers.

References

  1. Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, Dordrecht
  2. Anselin L (1992) SpaceStat, a software program for analysis of spatial data. National Center for Geographic Information and Analysis (NCGIA), University of California, Santa Barbara, CA
  3. Anselin L (2000) Computing environments for spatial data analysis. J Geogr Syst 2(3):201–220
  4. Anselin L (2002) Mapping and analysis for spatial social science. Center for Spatially Integrated Social Science, University of California, Santa Barbara, CA
  5. Anselin L, Getis A (1992) Spatial statistical analysis and geographic information systems. Ann Reg Sci 26(1):19–33
  6. Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. Center for Spatially Integrated Social Science, University of California, Santa Barbara, CA
  7. Anselin L, Syabri I, Kho Y (2005) GeoDa: an introduction to spatial data analysis. Geogr Anal 38(1):5–22
  8. Barry R, Pace RK (1999) A Monte Carlo estimator of the log determinant of large sparse matrices. Linear Algebra Appl 289:41–54
  9. Bivand R (2002a) Implementing spatial data analysis software tools in R. Center for Spatially Integrated Social Science, University of California, Santa Barbara, CA
  10. Bivand R (2002b) Spatial econometrics functions in R: classes and methods. J Geogr Syst 4(4):405–421
  11. Bivand R, Gebhardt A (2000) Implementing functions for spatial statistical analysis using the R language. J Geogr Syst 2(3):307–317
  12. Bivand R, Portnov B (2004) Exploring spatial data analysis techniques using R: the case of observations with no neighbors. In: Anselin L, Florax R, Rey S (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin, pp 121–142
  13. Bivand R, Pebesma E, Gomez-Rubio V (2008) Applied spatial data analysis with R. Springer, Berlin
  14. Buliung R, Remmel T (2008) Open source, spatial analysis and activity-travel behaviour research: capabilities of the aspace package. J Geogr Syst 10(2):191–216
  15. Cook D, Majure J, Symanzik J, Cressie N (1996) Dynamic graphics in a GIS: a platform for analyzing and exploring multivariate spatial data. Comput Stat 11(4):467–480
  16. Cook D, Symanzik J, Majure J, Cressie N (1997) Dynamic graphics in a GIS: more examples using linked software. Comput Geosci 23(4):371–385
  17. Cressie N (1991) Statistics for spatial data. Terra Nova 4(5):613–617
  18. De Smith M, Goodchild M, Longley P (2009) Geospatial analysis—a comprehensive guide to principles, techniques and software tools. Matador, Leicester
  19. Farley JA, Limp WE, Lockhart J (1990) The archaeologist’s workbench: integrating GIS, remote sensing, EDA and database management. In: Allen KMS, Green FSW, Zubrow EBW (eds) Interpreting space: GIS and archaeology. Taylor and Francis, London, pp 141–164
  20. Fischer M, Getis A (eds) (1997) Recent development in spatial analysis. Springer, Berlin
  21. Fischer M, Nijkamp P (eds) (1993) Geographic information systems, spatial modelling and policy evaluation. Springer, Berlin
  22. Fotheringham A, Rogerson P (1994) Spatial analysis and GIS. Taylor and Francis, London
  23. Goodchild MF, Haining RP (2004) GIS and spatial data analysis: converging perspectives. Pap Reg Sci 83(1):363–385
  24. Goodchild MF, Haining RP, Wise S (1992) Integrating GIS and spatial analysis—problems and possibilities. Int J Geogr Inf Syst 6(5):407–423
  25. Goodchild MF, Anselin L, Appelbaum R, Harthorn B (2000) Toward spatially integrated social science. Int Reg Sci Rev 23(2):139–159
  26. Haining RP (1989) Geography and spatial statistics: current positions, future developments. In: Macmillan B (ed) Remodelling geography. Blackwell, Oxford, pp 191–203
  27. Heba I, Malin E, Thomas-Agnan C (2002) Exploratory spatial data analysis with GEOXP. In: European regional science association conference papers. Available via ERSA 2002. http://www-sre.wu-wien.ac.at/ersa/ersaconfs/ersa02/cdrom/papers/498.pdf. Accessed 26 August 2009
  28. Laurent T, Ruiz-Gazen A, Thomas-Agnan C (2006) GeoXp: an R package for interactive exploratory spatial data analysis. In: useR! 2006 presentations. Available via CRAN. http://www.r-project.org/user-2006/Slides/LaurentEtAl.pdf. Accessed 26 August 2009
  29. Le Gallo J, Ertur C (2003) Exploratory spatial data analysis of the distribution of regional per capita GDP in Europe, 1980–1995. Pap Reg Sci 82(2):175–201
  30. Leitner M, Brecht H (2007) Software review: crime analysis and mapping with GeoDa 0.9.5-i. Soc Sci Comput Rev 25(2):265–271
  31. LeSage JP, Pace RK (2004a) Arc_Mat, a toolbox for using ArcView shapefiles for spatial econometrics and statistics. In: Egenhofer MJ, Freksa C, Miller HJ (eds) GIScience 2004. LNCS, vol 3224, pp 179–190
  32. LeSage JP, Pace RK (2004b) Arc_Mat, a toolbox for using ArcView shapefiles for spatial econometrics and statistics. Econometrics toolbox for Matlab. Available via spatial-econometrics.com. http://www.spatial-econometrics.com/html/arc_map.pdf. Accessed 16 August 2009
  33. LeSage JP, Pace RK (2009) Introduction to spatial econometrics. Chapman & Hall/CRC Press, Boca Raton, FL
  34. Levine N (2006) Crime mapping and the CrimeStat program. Geogr Anal 38(1):41–56
  35. Okabe A, Okunuki K, Shiode S (2005) SANET: a toolbox for spatial analysis on a network. Geogr Anal 38(1):57–66
  36. Openshaw S (1990) Spatial analysis and geographical information systems: a review of progress and possibilities. In: Scholten HJ, Stillwell JCH (eds) Geographical information systems for urban and regional planning. Kluwer, Dordrecht, pp 153–163
  37. Pace RK, Barry R (1997) Quick computation of regressions with a spatially autoregressive dependent variable. Geogr Anal 29(3):232–247
  38. Pisati M (2001) Tools for spatial data analysis. Stata Tech Bull 10(60):21–36
  39. Rangel T, Diniz-Filho J, Bini L (2006) Towards an integrated computational tool for spatial analysis in macroecology and biogeography. Glob Ecol Biogeogr 15(4):321–327
  40. Register A (2007) A guide to Matlab object-oriented programming. Chapman & Hall/CRC Press, Boca Raton, FL
  41. Rey S (2009) Show me the code: spatial analysis and open source. J Geogr Syst 11(2):191–207
  42. Rey S, Anselin L (2006) Recent advances in software for spatial analysis in the social sciences. Geogr Anal 38(1):1–4
  43. Rey S, Janikas MV (2006) STARS: space–time analysis of regional systems. Geogr Anal 38(1):67–86
  44. Takatsuka M, Gahegan M (2002) GeoVISTA studio: a codeless visual programming environment for geoscientific data analysis and visualization. Comput Geosci 28(10):1131–1141
  45. Theus M, Urbanek S (2008) Interactive graphics for data analysis: principles and examples. Chapman & Hall/CRC Press, Boca Raton, FL
  46. Warren RE (1990) Predictive modelling of archaeological site location: a case study in the Midwest. In: Allen KMS, Green SW, Zubrow EBW (eds) Interpreting space: GIS and archaeology. Taylor and Francis, London, pp 201–215
  47. Wise S, Haining RP, Ma J (2001) Providing spatial statistical data analysis functionality for the GIS user: the SAGE project. Int J Geogr Inf Sci 15(3):239–254

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Department of Geography, Texas State University-San Marcos, San Marcos, USA
  2. Fields Endowed Chair in Urban and Regional Economics, Department of Finance and Economics, Texas State University-San Marcos, San Marcos, USA
