March 17, 2006

Schemas, Interoperability and RDBMS

Many have asked me why GML offers application schemas. Would it not be simpler, they say, if GML just provided one single schema, and we used that for all of our geographic information? Isn't interoperability hindered by the open-ended nature of GML application schemas? One even sees these kinds of remarks in OGC discussion papers (e.g. SOS).

One response to this is to consider interoperation between relational databases. They all understand SQL, and they all understand relational schemas. The relational schemas differ from one implementation to another - no one requires that all RDBMS support the same schema - in fact the idea is silly from the outset - after all, the different DBMS instances are all supporting different applications - different domains of interest. It is, of course, exactly the same for GML. GML application schemas apply to different domains of interest, and differ in just the same way that relational schemas do.
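The analogy can be made concrete with a small sketch. Here Python's built-in sqlite3 module stands in for any RDBMS, and the two schemas (a roads table and a parcels table) are invented purely for illustration. The query language is shared; the schemas are not, and each can still be discovered at run time. Note that PRAGMA table_info is SQLite's own way of describing a table - other products expose the same idea differently.

```python
import sqlite3

def make_db(ddl, insert_sql, rows):
    """Create an in-memory database with its own application-specific schema."""
    db = sqlite3.connect(":memory:")
    db.execute(ddl)
    db.executemany(insert_sql, rows)
    return db

# Two hypothetical applications, each with a schema suited to its own domain.
roads_db = make_db(
    "CREATE TABLE roads (name TEXT, lanes INTEGER)",
    "INSERT INTO roads VALUES (?, ?)",
    [("Main St", 2), ("Highway 1", 4)],
)
parcels_db = make_db(
    "CREATE TABLE parcels (parcel_id TEXT, area_m2 REAL)",
    "INSERT INTO parcels VALUES (?, ?)",
    [("P-001", 650.0)],
)

def table_columns(db, table):
    """Ask the database to describe its own schema (column names only)."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]

road_columns = table_columns(roads_db, "roads")
parcel_columns = table_columns(parcels_db, "parcels")

# The same query language works against either database.
wide_roads = [r[0] for r in roads_db.execute("SELECT name FROM roads WHERE lanes > 2")]
```

Nobody would ask the two applications to share one table design; what they share is the means of declaring and querying a design. GML and its application schemas divide the work in the same way.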

For some this is still not sufficient. They still seem to think that some sort of closed GML schema would be enough. One can respond in two ways: Yes and No. Yes, they are right in the sense that there are some things we ALL might agree upon (i.e. that hold across multiple application domains) - for these GML has provided standard explicit encodings. These include MOST of the base things that people need to share, like geometry, topology, observations, coverages, coordinate reference systems, units of measure, time and direction. These items are covered by FIXED schema components. No, in that GML does not want to invent yet another schema language, and this is exactly what would be required if we were to have a single GML schema that covers a broad range of application domains. For schema definition we have elected (at least for the near term) XML Schema. Readers may note that GML has used other schema languages in the past, such as DTD and RDF - other schema languages may be used in the future - but it does not make sense to try and create yet another one within the boundaries of GML.
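To illustrate the split between fixed components and application-defined ones, here is a minimal sketch using Python's standard xml.etree.ElementTree. The gml namespace URI is the one used by GML 3.1; everything in the app namespace (the URI, Road, name, startPoint) is a hypothetical application vocabulary made up for this example, and the output is not validated against any schema.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"   # namespace of the fixed GML 3.1 components
APP_NS = "http://example.com/roads"     # hypothetical application schema namespace

ET.register_namespace("gml", GML_NS)
ET.register_namespace("app", APP_NS)

def road_feature(fid, name, x, y):
    """Build an application-defined feature that reuses fixed GML geometry."""
    # Road, name and startPoint belong to the (invented) application schema.
    road = ET.Element(f"{{{APP_NS}}}Road", {f"{{{GML_NS}}}id": fid})
    ET.SubElement(road, f"{{{APP_NS}}}name").text = name
    start = ET.SubElement(road, f"{{{APP_NS}}}startPoint")
    # Point and pos come from GML itself and mean the same in every domain.
    point = ET.SubElement(start, f"{{{GML_NS}}}Point")
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{x} {y}"
    return road

road = road_feature("road.1", "Main St", 491000.0, 5459000.0)
xml_text = ET.tostring(road, encoding="unicode")
```

A client that knows nothing about roads can still locate and interpret the gml:Point, while the meaning of app:Road is left to the application schema.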

 

Posted by RLake at 19:38:43 | Permanent Link | Comments (0) |

March 14, 2006

SDI Concepts

SDI = Spatial Data Infrastructure. Every national government seems to have one. We even talk of a GSDI (Global) - but there have been few if any realizations. What do we mean by SDI? How close are we to creating an SDI with commercial software technology? What functionality should an SDI provide?

Let's put things in some concrete context. You are a highway or subdivision planner. To do your job you need access to lots of information: the location of existing street and highway networks, the water network, the electrical system, telecommunication systems, etc. All of this information is likely held by multiple organizations in a multitude of formats and with many disparate and possibly inconsistent data models. You will use that information in a variety of planning, design and project management tools to create proposed highway designs, subdivision designs etc., which you will need to share with your colleagues in the transportation authorities, land development organization, building approval, land reclamation etc. Furthermore you will need to be able to share this information in a secure manner, such that some people can see some things and others can see other things. The things you share will be both actual existing structures and proposed and planned ones. The things you want from others will be much the same. So information must flow in a controlled and secure manner between multiple parties - in as near to real time a manner as possible. To achieve this, however, neither you nor your colleagues want to give up the planning, design and project management software you have grown to love and to hate. Better the devil you know than the one you do not. So this SDI thing must do a lot. It is clear that:

An SDI is much more than a portal:

A portal is a set of presentation services - user interfaces - that provide access to things for people. From a portal I could look at maps in my web browser - but only if there is a set of back end services. Moreover, since I want to continue to use my existing planning, design and project management tools - unless these are all integrated into the portal I am not going to be very happy. A portal is then just part of the story.

Note that I really do need much more than just the presentation of maps. This is very nice for planning and for discussion - but I need the actual dimensions and other properties of structures and natural objects - how else can I plan the ones I will introduce into the world? So an SDI must provide:

Universal Data Discovery:

I need a way to find all those needed information sources. I need a way to determine my access rights. I need a way to specify the access rights of those with whom I am willing to share my data - my plans, designs and proposals for the future.
I need a way to register what I am interested in and find out what is available and how to get it. Ideally I can access all of the needed information online - but we all know this is off somewhere in the future - but I do need to access what I can access and access it now in real time. And please don't tell me I need to worry about format conversion - or changes of coordinates and such. Surely the SDI can help me with ...

Universal Data Access:

Of course, I don't expect that all data will be free or freely accessible. After all I know much of the information I have is confidential (the new highway route is significant economically and premature release of this information could be disastrous). In some cases I know I will need to pay for data, in some cases not. The SDI should enable these kinds of access - meaning access based on who you are and access based on whether or not you have paid your accounts. So Universality yes, but circumscribed by appropriate access control. So the SDI needs to support Universal Access with Universal Access Control and Authentication - meaning across the SDI, hence across the community of interest.
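The two conditions just described - who you are, and whether you have paid - can be sketched in a few lines. This is only an illustration of the rule; the class names, role names and dataset names are all invented, and a real SDI would of course need proper authentication behind it.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    allowed_roles: set              # roles permitted to see this dataset at all
    requires_payment: bool = False  # whether access is also conditional on payment

@dataclass
class User:
    name: str
    roles: set
    paid_for: set = field(default_factory=set)  # names of datasets already paid for

def can_access(user, dataset):
    """Grant access only if identity and (where required) payment both check out."""
    if not (user.roles & dataset.allowed_roles):
        return False  # access based on who you are
    if dataset.requires_payment and dataset.name not in user.paid_for:
        return False  # access based on whether you have paid
    return True

# Hypothetical datasets and users.
proposed_route = Dataset("proposed-highway-route", {"transport-planner"})
utility_network = Dataset("water-network", {"transport-planner", "developer"}, True)

planner = User("planner", {"transport-planner"}, {"water-network"})
developer = User("developer", {"developer"})
```

The point is that the same check applies to every dataset in the community, whoever holds it, rather than each organization inventing its own.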

Of course Universal Data Access is not all that helpful unless I can access the models of the information being supplied. What is a road in one community is something else in another. So there must be a means for the SDI to provide access to the

Community Vocabulary:

By this I mean the various objects shared by the community - their names and properties - does a road have a width? a surface type? a classification? How are these expressed? I am not sure I need (or could use as yet) a full ontology - but at least a dictionary of the shared (common to the community) objects is needed and in both human and machine readable terms.

Given this information - I can advertise my own data to suit or provide the appropriate translation between my view of the world and that which I share with the community. So I expect the SDI to provide me access to these common vocabularies and perhaps some tools to help me with translation.
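As a rough sketch of what a machine readable dictionary plus a translation might look like: the community vocabulary below defines a Road with the properties mentioned above, and a local mapping translates my own view (a Street with a width in feet) into it. Every name here, including Street, wid_ft and the units, is a made-up example rather than any real community model.

```python
# Shared dictionary: community object types and their properties (hypothetical).
COMMUNITY_VOCABULARY = {
    "Road": {"width": "metres", "surfaceType": "text", "classification": "text"},
}

# My own view of the world, mapped onto the community's terms (hypothetical).
# Each local property maps to (community property, conversion function).
LOCAL_TO_COMMUNITY = {
    "Street": ("Road", {
        "wid_ft": ("width", lambda feet: round(feet * 0.3048, 2)),
        "surface": ("surfaceType", str),
        "class": ("classification", str),
    }),
}

def to_community(local_type, local_properties):
    """Translate a local object into the shared community vocabulary."""
    community_type, property_map = LOCAL_TO_COMMUNITY[local_type]
    translated = {}
    for local_name, value in local_properties.items():
        community_name, convert = property_map[local_name]
        translated[community_name] = convert(value)
    # Refuse to publish anything the community dictionary does not define.
    unknown = set(translated) - set(COMMUNITY_VOCABULARY[community_type])
    if unknown:
        raise ValueError(f"not in community vocabulary: {sorted(unknown)}")
    return community_type, translated

shared = to_community("Street", {"wid_ft": 30, "surface": "asphalt"})
```

This is far short of an ontology, but it is enough for software on both sides to agree on what a road is and how its width is expressed.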

While I think of it - I am not sure I really want to think about the SDI itself all that much. Even going to a portal seems outside my normal experience. In fact, I think a requirement for an SDI should be transparency or even invisibility.

Transparency:

If I work in planning, I already have a set of tools that I work with - ones I am used to and ones which have developed over time in my own field - hence provide user features that enhance my productivity. I am not going to give these up for an SDI. Of course, one should not have to. The SDI should take care of that for you. The SDI should enlarge, in as transparent a manner as possible, your application's access to data and services beyond your application domain - but in such a way that your existing application can use them. This is in effect the key problem for SDI.

Even transparency is not enough, however. Why should an SDI be restricted to spatial information? How could one restrict it to spatial information anyway? Would I need to be able to distinguish between spatial and non-spatial information? I am not sure I would even know how to do that. So maybe the spatial in SDI has as much to do with the spatial distribution of the actors involved as it does with the partly spatial character of the information being manipulated. Or perhaps it is the spatial (geographic) nature of the information that demands integration because of the inherently integrated nature of the world.

There are indeed many points to ponder - there is much more to SDI than we might first suppose.

 

Posted by RLake at 01:48:30 | Permanent Link | Comments (1) |

March 05, 2006

GML Complexity Re-visited

I have discussed the issue of GML complexity a number of times in this blog. Mostly we have looked at things like the number of tags, use of XML Schema, subject complexity and so forth. Most of it was pretty qualitative. We had no real measures of the complexity, nor comparisons to other established XML grammars to see how GML stacked up. Well, now some folks over at Microsoft, led by Stan Kitsis, have set about creating a number of XML Schema metrics and have applied these to a large number of schemas, GML among them. Their work used GML v3.1, which is close enough to the current release (GML v3.1.1 and the pending GML v3.2) to mean their results are completely reflective of the GML we are all working with or planning to. The paper is entitled "Analysis of XML Schema Usage" and begins by developing a variety of metrics for XML Schema size and complexity and utilization of particular XML Schema features (e.g. Model-group operators, Simple type features, Occurrence features, subtyping and friends, mixed content, wild cards, identity constraints and modularization).

They then provide statistics on the application of these metrics to a set of 63 schema projects from different IT Sectors. Some were internal to Microsoft and some were external, including, of course, GML. The schemas included some 6000 individual schema files, with roughly 82,000 global element names.

So how did GML stack up? There is not space to go over all of the findings and I will leave that to Stan and the Microsoft folks. However just a few items will give you the general idea.

Schema Size based on Lines of Code (LOC)

The range of schema sizes is shown in the table below, with GML's place in it marked.

 

LOC-based category    Definition            Schema count
Mini                  0 – 100               0
Small                 100 – 1,000           12
Medium                1,000 – 10,000        24
Large                 10,000 – 100,000      23    (GML: 10,291 lines)
Huge                  100,000 – …           4

It is clear from this measure that GML is at the bottom end of the large schemas.

Schema Size - Based on size in kilobytes.

The schemas in the study ranged from 6 Kbytes to 18 Mbytes. Many of these schemas (26 of the 63) are in the range of 100 KB to 1 MB, and this is indeed where we find GML, at 532 Kbytes. There were NOT many small schemas (only 6 less than 10 Kbytes), and as one might expect not many really large schemas (only 11 in this range).

Number of Complex Type Definitions:

Some people think GML is complex because it declares so many complex types - well does it?

According to the Microsoft study this metric ranged over the following:

 

#CT-based category    Definition            Schema count
Mini                  0 – 32                13
Small                 32 – 100              12
Medium                100 – 256             14
Large                 256 – 1,000           12
Huge                  1,000 – …             12

and GML - well 287 - so again at the bottom end of the large schemas.
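For anyone who wants to check the placement, the two category tables can be written down and applied directly. The thresholds and counts below are copied from the tables above; treating each lower bound as inclusive and each upper bound as exclusive is my own assumption, since the tables do not say how a value exactly on a boundary is classified.

```python
# (category, lower bound, upper bound); None means no upper bound.
LOC_CATEGORIES = [
    ("Mini", 0, 100), ("Small", 100, 1_000), ("Medium", 1_000, 10_000),
    ("Large", 10_000, 100_000), ("Huge", 100_000, None),
]
CT_CATEGORIES = [
    ("Mini", 0, 32), ("Small", 32, 100), ("Medium", 100, 256),
    ("Large", 256, 1_000), ("Huge", 1_000, None),
]

# Schema counts per category, in the same order as the tables.
LOC_COUNTS = [0, 12, 24, 23, 4]
CT_COUNTS = [13, 12, 14, 12, 12]

def categorize(value, categories):
    """Return the category whose band contains value (assumed low <= value < high)."""
    for name, low, high in categories:
        if value >= low and (high is None or value < high):
            return name
    raise ValueError(f"{value} is below every category")

def position_in_band(value, categories):
    """Fraction of the way through its band (0.0 = bottom edge); None if unbounded."""
    for _, low, high in categories:
        if value >= low and (high is None or value < high):
            return None if high is None else (value - low) / (high - low)
    raise ValueError(f"{value} is below every category")

gml_loc_category = categorize(10_291, LOC_CATEGORIES)   # GML lines of code
gml_ct_category = categorize(287, CT_CATEGORIES)        # GML complex types
gml_loc_position = position_in_band(10_291, LOC_CATEGORIES)
gml_ct_position = position_in_band(287, CT_CATEGORIES)
```

Both counts columns add up to the 63 schema projects in the study, and on both measures GML sits within the first few percent of the Large band.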

 

Posted by RLake at 23:09:32 | Permanent Link | Comments (10) |

Observations are for more than sensor data

The idea of a GML observation (see other notes in this blog) was conceived as a model of the "act of observing" - the doing of it. While it has been fairly widely applied for sensor data, and even for tourist photographs, the concept equally applies to the usual update of geographic features based on land surveys, GPS or photogrammetry.

Consider the example of a crew of a water supply company out surveying a water main. They are responsible for the accurate location of the water main, which will be recorded in the company's geographic database. In the course of doing this survey they "observe" that one of the parcel boundaries is in error, or at least in error with respect to their information baseline. They record this for company records. They would like to update the City's database, but that is not their responsibility and in most jurisdictions there is no formal mechanism to make the update. Even if there were, what should they report? They cannot change the location of the parcel for the city. Even if they had the authority, their "observation" may need to be modified (or the existing parcel fabric may need to be modified) before their observation can be integrated into the City's geographic database. They can, however, submit a GML observation recording the time, the observed feature characteristics (i.e. the parcel boundary geometry), and the means of making the observation. No one can argue with this information and it is quite conceivable that this information could be directed to and recorded by the city. The city could then process this GML Observation (provided as a WFS transaction) and then use it to create a transaction that modifies the actual parcel fabric.
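The separation of roles in this example can be sketched in code. This is not the GML Observation encoding or a WFS transaction; it is a plain Python stand-in with invented names (BoundaryObservation, ProposedUpdate, review) that shows the idea: the crew only records what was seen, when, and how, and the city alone decides whether that leads to a change.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class BoundaryObservation:
    observed_at: datetime   # time of the observation
    parcel_id: str          # the feature that was observed
    boundary: tuple         # observed geometry as ((x, y), ...)
    method: str             # means of making the observation
    observer: str

@dataclass(frozen=True)
class ProposedUpdate:
    parcel_id: str
    new_boundary: tuple
    evidence: BoundaryObservation

def review(observation, recorded_boundary, tolerance_m):
    """City-side step: turn an observation into a proposed update, or into nothing."""
    same_shape = len(observation.boundary) == len(recorded_boundary)
    if same_shape:
        # Largest distance between corresponding observed and recorded vertices.
        shift = max(math.dist(p, q)
                    for p, q in zip(observation.boundary, recorded_boundary))
        if shift <= tolerance_m:
            return None  # within tolerance: the record stands
    return ProposedUpdate(observation.parcel_id, observation.boundary, observation)

# Hypothetical data: the city's record and what the water crew observed.
recorded = ((0.0, 0.0), (20.0, 0.0), (20.0, 30.0), (0.0, 30.0))
observation = BoundaryObservation(
    observed_at=datetime(2006, 3, 5, 14, 30, tzinfo=timezone.utc),
    parcel_id="parcel-42",
    boundary=((0.0, 0.0), (20.5, 0.0), (20.5, 30.0), (0.0, 30.0)),
    method="GPS survey",
    observer="water supply crew",
)
proposal = review(observation, recorded, tolerance_m=0.1)
```

The observation is immutable and carries its own provenance, so it can be stored by the city as a fact in its own right even if no update ever follows.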

Observations can be useful items.

Posted by RLake at 22:47:54 | Permanent Link | Comments (5) |

Application Schemas Drive Profiles

While there has been a lot of noise of late respecting the importance of profiles, this in many ways puts the cart before the horse. I will agree that simplicity and a low entry barrier are important for widespread adoption. At the same time, however, one might be more focused on real user need rather than the need of existing vendor product implementations.

If we take a more market-oriented perspective - we would then focus first on application schemas - these after all define the needs - the vocabulary of the application domain in question. Here we see a wide variety of demand. Some areas really need unusual geometries like Clothoids or Geodesics - unusual only to traditional GIS - while others need very simple structures (e.g. web news feeds like geoRSS). To expect that a single profile will cover all of these is quite unrealistic. This is the beauty of GML - it is rich enough to cover this wide range of application domains. At the same time, we need only pick and choose what the particular application domain requires. If no Clothoids are needed, then don't use them.

In this manner we see that Application Schemas drive profiles - and a user in a given application really focuses on the Application Schema and the profile (subset of GML components) that this application schema requires.
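One way to see an application schema driving a profile is to list the GML components the schema actually refers to. The sketch below does this for a small, invented road schema using Python's standard xml.etree.ElementTree. It assumes the GML namespace is bound to the literal prefix gml and only finds direct references; a real profile would also need everything those components depend on in turn.

```python
import xml.etree.ElementTree as ET

# A hypothetical application schema; only the gml: names refer to real GML 3.1 components.
APP_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:app="http://example.com/roads"
           targetNamespace="http://example.com/roads">
  <xs:element name="Road" type="app:RoadType" substitutionGroup="gml:_Feature"/>
  <xs:complexType name="RoadType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType">
        <xs:sequence>
          <xs:element name="centreLine" type="gml:CurvePropertyType"/>
          <xs:element name="opened" type="xs:date"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""

# XML Schema attributes whose values name another component.
REFERENCE_ATTRIBUTES = ("type", "ref", "base", "substitutionGroup")

def gml_components_used(schema_text, prefix="gml"):
    """Collect the GML names an application schema refers to directly."""
    used = set()
    for element in ET.fromstring(schema_text).iter():
        for attribute in REFERENCE_ATTRIBUTES:
            value = element.get(attribute, "")
            if value.startswith(prefix + ":"):
                used.add(value.split(":", 1)[1])
    return sorted(used)

profile_seed = gml_components_used(APP_SCHEMA)
```

This road schema never mentions Clothoids, so they never appear in the list; the profile falls out of what the application needs rather than being fixed in advance.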

This is not a critique of GML Simple Features (GML SF) - it is simply meant to point out the value of a more application driven perspective.

Posted by RLake at 22:36:52 | Permanent Link | Comments (0) |