April 21, 2006

Transfer and Transaction Models

When GML was first developed, many people thought of it as another kind of "shape file", that is something supporting the transfer model for geographic data. By transfer model I mean the movement of geographic data from one system to another using files. While GML does in fact support the transfer model, and can do so better than most transfer formats, this was never the real reason for its development - GML was created to support geospatial transactions on the Internet.

In the transfer model, the exporter has no knowledge of the schemas of the intended target. A schema can be, and usually is, constructed from the data source, and data conformant with this schema is written to the transfer file. Usually the schema, if explicitly recorded, is carried along with other transfer "metadata" in the file header. In the case of GML there is no header to speak of; however, more or less the same ideas apply. The data instances refer to the schema to which they conform. Additional metadata for the transfer could then be encoded in this application schema.

The chief benefit of the transfer model is that it captures the exported data regardless of the intended transfer target. Since many transfer formats are binary they may be smaller (sometimes a lot smaller) than uncompressed GML data used in the transfer fashion. I can thus send you an export of data about some subject without having any knowledge of your system or data schemas.

The negative aspects of the transfer model are also fairly clear. Suppose 6 features out of 100 change in some area of interest. I can export these 6 features into a transfer file. Perhaps only some properties of these features have changed. Nonetheless I need to transfer all 6 features at a minimum since I typically have no way to encode updates or inserts in my transfer syntax. We will show how this can be done using the OGC WFS protocol in the following article. In many cases it is not so convenient to determine that 6 features have changed and as a result I send all 100 features. So very often the transfer model leads to a much larger amount of data being transferred than with the transactional model.
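To make the size difference concrete, here is a minimal sketch in Python. The feature ids and property names are invented for illustration, and the two dictionaries merely stand in for "before" and "after" snapshots of a data store; it only compares a property-level delta with a full re-export.

```python
# Hypothetical snapshots of the same feature set, keyed by feature id.
# All ids and property names are made up for illustration.

def property_delta(old, new):
    """Return {feature_id: {property: new_value}} for changed properties only.

    Deleted features or properties are not handled in this sketch.
    """
    delta = {}
    for fid, new_props in new.items():
        old_props = old.get(fid, {})
        changed = {k: v for k, v in new_props.items() if old_props.get(k) != v}
        if changed:
            delta[fid] = changed
    return delta


# 100 features with 3 properties each.
old = {f"road.{i}": {"name": f"Road {i}", "lanes": 2, "surface": "paved"}
       for i in range(100)}
new = {fid: dict(props) for fid, props in old.items()}

# 6 features change, and only one property on each.
for i in range(6):
    new[f"road.{i}"]["lanes"] = 4

delta = property_delta(old, new)
full_export_values = sum(len(props) for props in new.values())
delta_values = sum(len(props) for props in delta.values())
print(len(delta), delta_values, full_export_values)  # 6 6 300
```

A transfer file that cannot express updates has to carry at least the 6 whole features (18 values here), and often all 300 values; a transactional update only needs the 6 changed ones.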

When I get to the point of loading these into the target system I now have to match these features to features in the target system. This may require schema-based data translation since the source and target schemas can be very different. Quite often this is a manual process and sometimes fairly prone to introducing errors. If the transfer scheme allows me to select and transfer only the changed features then things are not too bad. If it does not I may need to overwrite the complete set of features (e.g. the 100 in our example).

Clearly the transfer model can be quite slow, can require manual intervention, and is prone to the introduction of errors.

In the transactional model, data is transferred from one system to the other using database transactions. This is the method employed with the OGC Web Feature Service. A transaction is a message that is transferred from one system to the other and which requests the receiving system to modify its data store in a specific way as requested in the message. Furthermore the actions of the receiver must be carried out in accordance with the usual rules that ensure the integrity of such transactions, namely that they are atomic, consistent, isolated and durable. It is the responsibility of the interacting servers to ensure that these characteristics are maintained.

GML supports the WFS transaction by providing a data description capability used in formulating the request, and as a transport for the returned data (e.g. for read operations that return data). Note that it is for this reason that GML provides a rather rich expression capability for database schemas. Any relational-spatial schema can be expressed in GML, and GML together with the WFS protocol can thus express requests against any such database.
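As a rough illustration of what such a transaction message looks like, the following Python sketch builds a WFS 1.1-style Transaction request containing Update actions. The feature type "Road", the feature ids and the property names are hypothetical, and the output has not been validated against the WFS schemas; it is only meant to show the shape of the message.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WFS 1.1 and the OGC Filter encoding.
WFS = "http://www.opengis.net/wfs"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("wfs", WFS)
ET.register_namespace("ogc", OGC)


def build_update_transaction(type_name, changes):
    """Build a wfs:Transaction with one wfs:Update per changed feature.

    `changes` maps a feature id to {property_name: new_value}.
    """
    tx = ET.Element(f"{{{WFS}}}Transaction",
                    {"service": "WFS", "version": "1.1.0"})
    for fid, props in changes.items():
        update = ET.SubElement(tx, f"{{{WFS}}}Update", {"typeName": type_name})
        for name, value in props.items():
            prop = ET.SubElement(update, f"{{{WFS}}}Property")
            ET.SubElement(prop, f"{{{WFS}}}Name").text = name
            ET.SubElement(prop, f"{{{WFS}}}Value").text = str(value)
        # The filter selects exactly the feature to be modified.
        flt = ET.SubElement(update, f"{{{OGC}}}Filter")
        ET.SubElement(flt, f"{{{OGC}}}FeatureId", {"fid": fid})
    return tx


# Hypothetical changes: only the modified properties are sent.
changes = {"road.17": {"lanes": 4}, "road.42": {"surface": "gravel"}}
request = build_update_transaction("Road", changes)
print(ET.tostring(request, encoding="unicode"))
```

The receiving server is expected to apply the whole message as one transaction, so either both updates succeed or neither does.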

Support for the transaction model means that GML/WFS can actually be much faster than "conventional" transfer models - but it requires an adjustment in your thinking. Are you ready for it?

Posted by RLake at 16:31:19 | Permanent Link | Comments (1) |

April 12, 2006

Feature Catalogues/Dictionaries, GML and RDF/S

Feature type dictionaries have been with us for quite some time, either within proprietary products, or in open standards such as the Digest FACC. Such feature dictionaries identify concepts or terms of interest within a particular domain of discourse, but do not assign or bind specific properties to these feature types or abstract concepts. Concrete feature types are then constructed from these feature types (a 1:many relationship) in one or more feature catalogues.

This idea is very similar to the model employed by RDFS (Resource Description Framework Schema Language) in which Classes are defined by assertion (the rdfs:Class statement in XML), assertions which do not bind properties to the class definition as would be the case in many object oriented models. Properties in RDF are defined by assigning their domain and range, the domain being the class on which the Property is defined, the range being the class from which the Property takes its values. Property definitions can live in completely different namespaces than the Class definitions - thus different people and organizations can see a single concept as having different concrete realizations (feature types in a feature catalogue).
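The separation can be sketched with plain triples in Python, without any RDF library. The ex:, orgA: and orgB: names below are hypothetical; only the rdf: and rdfs: terms come from the RDF vocabularies.

```python
# Each triple is (subject, predicate, object).
# ex:, orgA: and orgB: are hypothetical namespaces used for illustration.
triples = [
    # Classes are asserted on their own, with no properties attached.
    ("ex:Road", "rdf:type", "rdfs:Class"),
    ("ex:Lake", "rdf:type", "rdfs:Class"),
    # Two organizations define properties on ex:Road in their own namespaces.
    ("orgA:laneCount", "rdf:type", "rdf:Property"),
    ("orgA:laneCount", "rdfs:domain", "ex:Road"),
    ("orgA:laneCount", "rdfs:range", "xsd:integer"),
    ("orgB:surfaceType", "rdf:type", "rdf:Property"),
    ("orgB:surfaceType", "rdfs:domain", "ex:Road"),
    ("orgB:surfaceType", "rdfs:range", "rdfs:Literal"),
]


def properties_with_domain(cls, graph):
    """List the properties whose rdfs:domain is the given class."""
    return sorted(s for s, p, o in graph if p == "rdfs:domain" and o == cls)


print(properties_with_domain("ex:Road", triples))  # ['orgA:laneCount', 'orgB:surfaceType']
print(properties_with_domain("ex:Lake", triples))  # []
```

The class assertions never mention the properties; the binding lives entirely in the property definitions, which is what lets different parties attach different properties to the same concept.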

Since GML was originally written in RDF/S (GML v1.0 profile 1.3), and since many of the GML constructs are copied from RDF/S it should not be too surprising that GML can also represent this separation of Class definitions and Properties, and feature dictionaries and feature catalogues. Since GML is currently written in XML Schema (and not in RDF/S) we can also expect that there are some small things lacking in the GML description.

To create a feature type dictionary in GML, one just creates a set of element declarations, all of which are abstract (abstract="true"), which have no properties and which derive from gml:AbstractFeatureType. Note that such elements automatically have properties including name, description, ID etc that one would find in a feature dictionary.
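A minimal sketch of such a dictionary follows, held in a Python string so that it can be parsed and checked. It assumes the GML 3.1 namespace and the gml:_Feature substitution group head; the fd: namespace and the Road and Lake names are hypothetical, and the schema is only checked for well-formedness here, not validated against GML.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical feature type dictionary: abstract elements with no own properties.
DICTIONARY_XSD = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:gml="http://www.opengis.net/gml"
        xmlns:fd="http://example.org/feature-dictionary"
        targetNamespace="http://example.org/feature-dictionary"
        elementFormDefault="qualified">
  <import namespace="http://www.opengis.net/gml"/>

  <element name="Road" type="fd:RoadType" abstract="true"
           substitutionGroup="gml:_Feature"/>
  <complexType name="RoadType" abstract="true">
    <complexContent>
      <extension base="gml:AbstractFeatureType"/>
    </complexContent>
  </complexType>

  <element name="Lake" type="fd:LakeType" abstract="true"
           substitutionGroup="gml:_Feature"/>
  <complexType name="LakeType" abstract="true">
    <complexContent>
      <extension base="gml:AbstractFeatureType"/>
    </complexContent>
  </complexType>
</schema>"""

root = ET.fromstring(DICTIONARY_XSD)
# Collect the names of the top-level abstract element declarations.
abstract_names = [e.get("name")
                  for e in root.findall(f"{{{XS}}}element")
                  if e.get("abstract") == "true"]
print(abstract_names)  # ['Road', 'Lake']
```

Because each type extends gml:AbstractFeatureType without adding content, the dictionary entries carry only what GML itself supplies (name, description, id and so on).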

Property dictionaries can be created in exactly the same manner although these dictionaries MUST import the associated feature type schema for the classes on which the properties are defined (domain). Note that GML has no standard way of saying that the domain of a property is a particular class (feature type). The range part is ok, but the domain would require the use of an additional element (e.g. in AppInfo) to clearly designate the property's domain. Otherwise such properties could apply to ANY feature type in the associated feature type dictionary.

To create a feature catalogue we create a GML application schema in the usual fashion, with each bound feature type's content model (a concrete feature type with bound properties) deriving from the appropriate abstract feature type declaration in the feature type dictionary schema. In such a schema you will see all the properties by ref, i.e. <element ref="something in the Property Dictionary"/>.
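Here is a sketch of one catalogue entry, again as a Python string that is parsed for a quick check. The fd:, pd: and cat: namespaces, the Highway feature type and the property names are all hypothetical; the referenced dictionary schemas are assumed to exist elsewhere and are not resolved by this code.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical feature catalogue: a concrete type derived from the abstract
# dictionary type fd:RoadType, with properties pulled in by ref.
CATALOGUE_XSD = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:fd="http://example.org/feature-dictionary"
        xmlns:pd="http://example.org/property-dictionary"
        xmlns:cat="http://example.org/catalogue"
        targetNamespace="http://example.org/catalogue"
        elementFormDefault="qualified">
  <import namespace="http://example.org/feature-dictionary"/>
  <import namespace="http://example.org/property-dictionary"/>

  <element name="Highway" type="cat:HighwayType" substitutionGroup="fd:Road"/>
  <complexType name="HighwayType">
    <complexContent>
      <extension base="fd:RoadType">
        <sequence>
          <element ref="pd:laneCount"/>
          <element ref="pd:surfaceType" minOccurs="0"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>"""

root = ET.fromstring(CATALOGUE_XSD)
# Every bound property appears as an element reference into the property dictionary.
property_refs = [e.get("ref") for e in root.iter(f"{{{XS}}}element") if e.get("ref")]
print(property_refs)  # ['pd:laneCount', 'pd:surfaceType']
```

A second catalogue could derive a different concrete type from the same fd:RoadType and bind a different set of properties, which gives the 1:many relationship between dictionary entries and catalogue feature types.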

So modulo the non-explicit designation for domains this captures the feature dictionary and feature catalogue structure pretty well.

Note that the "abstract" feature type dictionaries are often hierarchical in nature and this hierarchy can also be captured directly in the feature type dictionary schema using XML Schema inheritance in the usual fashion.

This leads to another interesting connection - namely that such a feature type hierarchy can also be viewed as a classification scheme in the sense of ebRIM. Feature dictionaries thus map to ebRIM classification schemes. Concrete feature types with bound properties can then be seen as being classified by their parent feature type (classification leaf) in the ebRIM scheme.

Like they say - "everything is connected"

Posted by RLake at 22:55:38 | Permanent Link | Comments (1) |

April 10, 2006

Genius Loci

I recently had the opportunity to listen to a keynote address by Michael Jones, the CTO of Google Earth, given at Map Middle East 2006, in Dubai, U.A.E. Most of his talk dealt with the problem he felt GE was focused on, and how it differed from the problems that had been and were being attacked by most of the GIS industry. It was an entertaining and at times thoughtful talk. He said that GE wanted to create what he called a "sense of place" or rather wished to enable people to create that "sense of place" and to share it with their friends and colleagues whether nearby or across the globe. While this was probably not the original motivation of GE, I think it is a clear summary of why GE has been successful and what in the GE experience really appeals to people. Of course the global imagery is nice, as is the smooth pan and zoom and the neat fly over from one place to another. All of this is clearly a necessary component of their success - but the sufficient bit, as Michael alluded to, is that these things enable a sense of sharing of place - where I went on my vacation - what the area around my cottage looks like - that is something socially valuable to most of the people on the earth - and something that can drive Google's core business, namely the selling of advertising. It may in the process also contribute to a shared sense of the earth itself.

Given all this, there is clearly also a confluence between the objectives of GE and those of the conventional GI community. Traditional users of geographic information - meaning larger corporations and governments at all levels - also desperately need and want to share geographic information with one another. While they may not be driven by a shared sense of place (genius loci) they are increasingly realizing that their own business processes demand access to information they don't have, and cannot afford to collect.

It is my view that these common objectives can best be met by a global linking of spatial information systems - those that collect and maintain geographic information for operational and decision making reasons - with one another - for broader and higher level decision making and to share the state of the world with one another. Marshall McLuhan said we were living in a Global Village - perhaps with GE to bring the awareness and GI technology to provide the foundations - such a village can yet be a nice place to call home.

Posted by RLake at 02:34:50 | Permanent Link | Comments (0) |

April 04, 2006

GeoWeb and Survival Part II - Towards Environmental Security

In our previous note, GeoWeb and Survival, we looked at the importance of managing the environment on natural zone boundaries rather than in terms of the political units that exist today. Of course it is highly unlikely that we will in the near future actually alter the existing political boundaries. Even if we did, such a move would be insufficient because the zones of natural management overlap one another and often in complex ways. Hence we need a way to acquire information on a natural management zone basis while at the same time retaining our existing political infrastructure. I agree that this is only half the story, since to act effectively we must also modify the interaction of the political institutions so that they can react in the appropriate fashion to the information views organized on the basis of natural management zones. This second and vital component of the response we will have to leave to others, noting that without the unified information view such a new direction for the management response is both unlikely and unworkable.

It should be mentioned in passing that we have in effect two notions of GeoWeb in this discussion. The first is the information GeoWeb that is the subject of this blog. The second is that of the "web of life" that natural scientists and system thinkers have embraced for a very long time. Since the natural processes are by definition distributed over the surface of the earth, it is no stretch to think of this as also a GeoWeb. It is the fusion of these concepts of GeoWeb that is at the heart of the current discussion.

So how can the GeoWeb (information technology) help us to deal with the GeoWeb (natural systems and the environment)?

One of the difficulties that we face in moving to management based on natural zones is the misalignment of information boundaries. As we have already noted, existing information boundaries are based on more or less arbitrary political units defined by nation states and subdivided into states, provinces, counties, communities, cities and municipalities. In fact there is a myriad of such boundaries which overlap one another in a completely arbitrary fashion. None of this is likely to change.

The way forward is to put in place Spatial Data Infrastructures (SDI) that provide transparent access to the information managed by these politically defined jurisdictions. Applications for analysis, mapping, display, and other forms of decision support can then be constructed on top of the SDI layer thus providing the applications for negotiation and management in the natural management zone.

Now I did not say that this is an easy task. Different jurisdictions mean that different vendor technologies are used. It also means that the world is modeled in different ways with only a rough correspondence between the entities of one jurisdiction and those of another. Furthermore, the jurisdictions possess the expertise to actually create, document and manage their part of the information. They are the stewards or custodians of that information and this must be respected if we are to have any hope that the information we are sharing is accurate and current. Finally, we must note that bringing disparate information sources together will reveal not only intrinsic errors within the individual data components, but also conflicts between one component and another.

Any solution to these problems is only going to be approximate at best, but this is still miles ahead of moving forward without any information or with information which is very incomplete or very out of date.

Existing SDI technology can go a long way to addressing these problems.

GML can provide a common schema language by which information providers can expose their information models to one another and do so in the context of the Internet. Furthermore such models can readily be maintained and shared as they change in one jurisdiction or another. The mere fact of sharing these models can lead to changes and to important integration of concepts and vocabulary. When I see that your street is the same as my road, either one of us can change or we can provide the appropriate automated mapping tools to transform requests and data from one system to the other. In the not too distant future technologies such as OWL will allow us to define the underlying objects ("what is a lake?") in a machine readable form thus enabling such mappings to be defined by computer assisted techniques, and possibly completely automated ones in the more distant future.
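A very small sketch of such a mapping tool, in Python, might look like this. The two vocabularies (Street with streetName and numLanes on one side, Road with name and laneCount on the other) are invented for illustration, and a real tool would work from the published GML schemas rather than hand-written tables.

```python
# Hypothetical mapping from jurisdiction A's vocabulary to jurisdiction B's.
TYPE_MAP = {"Street": "Road"}
PROPERTY_MAP = {"Street": {"streetName": "name", "numLanes": "laneCount"}}


def translate_feature(feature):
    """Rename the feature type and its properties; unmapped names pass through."""
    src_type = feature["type"]
    dst_type = TYPE_MAP.get(src_type, src_type)
    prop_map = PROPERTY_MAP.get(src_type, {})
    props = {prop_map.get(k, k): v for k, v in feature["properties"].items()}
    return {"type": dst_type, "properties": props}


street = {"type": "Street",
          "properties": {"streetName": "Main St", "numLanes": 2}}
print(translate_feature(street))
# {'type': 'Road', 'properties': {'name': 'Main St', 'laneCount': 2}}
```

The same tables can be applied in reverse to rewrite a request phrased in one vocabulary before it is sent to the other system.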

Web Feature Services (WFS) can provide the necessary movement of data and can do so in a transparent manner, possibly exploiting the schema-based data transformations referenced above. Furthermore, advanced WFS can also apply on-the-fly data integrity checks to provide assurance that data meets the required "community" data quality standards, and this can be done in an open and transparent fashion.

Web Registry Services (WRS) can be used to "register" the members of the community - i.e. the set of data providers and processing services that make up a given natural management zone - and can enable automated (machine driven) access to information resources distributed on the various WFS. Furthermore, the WRS can manage projects and other activities managed by the multiple jurisdictions (multiple agencies) that interact within the management zone.

In effect, SDI technology provides the foundation layer to create a virtual (or realized) information base that underpins the decision support applications on which management of natural zones will depend. Different government agencies can thus more readily co-operate (and negotiate) on how the zone is to be managed. Moreover such an infrastructure can be deployed in such a way as to survive the innumerable reorganizations that government agencies and large corporations are heir to.

By providing this information base without upsetting the existing apple cart of political and administrative authorities we may find a way forward to manage our interaction with the world in a saner manner than is possible today. Perhaps this arcane world of XML and Web Services may make a not insignificant contribution to our long term survival.

Posted by RLake at 23:40:49 | Permanent Link | Comments (1) |

GeoWeb and Survival

In the recent book Collapse, by Jared Diamond, the author asks us to speculate on what was in the mind of the Easter Islander as he felled the last remaining tree on the island. Perhaps he thought there were still trees elsewhere? Perhaps the trees would still grow back? Perhaps they could obtain timber from another source on a nearby island? Of course we can never know the answer to these questions. Diamond asks us to speculate in order to get us to reflect on our own decisions in the 21st century.

Regardless of your views on the question, it is clear that the decision of the tree cutter, like those of the manifold decision makers of our era, depends on access to information. Increasingly, it is also becoming clear that the information that we need to access, and the domains on which we may need to make decisions, are not likely to coincide with the administrative regions that we as political animals have heretofore established. Our world is a complex set of interacting systems within which there are natural regions or zones. Simple examples are obvious enough, such as watersheds and ocean basins. These bio-geo-climatic zones provide in effect a natural decomposition of the world, and may serve to provide a better basis for long term management of the planet than our current political boundaries are able to. They may also provide the basis for an ecologically focused economics in which the flows of natural capital are integrated into the flows of monetary capital, for these two things are inextricably interconnected whether we acknowledge it or not.

The GeoWeb, as we have used the term in these pages, refers to the ability to transparently share information about the world without regard to vendor technology, and in a way which at the same time respects the stewardship of information by various organizations. In our current context we could see the GeoWeb as providing the information base for our natural zonal accounting system, since the purpose of any accounting system is to make visible what is going on. In the corporation it is to make visible the components of the company that function well and those which need improvement. In the context of managing the world around us it is no different. In order to act, we must know what is happening, else we too may cut down that last remaining tree and not be around for further speculation.

Posted by RLake at 22:16:58 | Permanent Link | Comments (1) |