February 25, 2006

The problem with XML

One might think that the problem with XML has something to do with size or the cost of XML processing. While these are indeed issues, they are issues that time and Moore's law make increasingly unimportant. On the other hand, there is an issue with XML that will not be resolved by new technology: the simplicity and low-level nature of XML itself. This was well stated in a Geotec paper presented by Milan Trninic on why people believe GML is complex. He looked at the origins of complexity in XML - and in a sense found them in its simplicity.

It is easy to write XML. Learn a few syntactic rules and in moments anyone can be creating their own XML encoding of just about anything. This is a strength - but it is also a weakness. It means we can have an XML language for anything - and we do. Geography Markup Language of course - but also Keyhole Markup Language, WorldWind Markup Language, Transportation Markup Language, Justice Markup Language, Mars Rover Markup Language - my Dogs Markup Language. In fact, all we need to do is take any decent schema editor and with a small amount of work we have a new something-ML.

This is a problem. If we proliferate markup languages for applications, we can readily re-enter the world of Babel we were trying to escape. Solving this problem requires us to think in a more global fashion, meaning to build on work that has already been established. In many ways one could say that this is the program of GML. GML is not an end-use language. It does not define all of the concrete things that are needed to describe transportation systems, geo-enabled web feeds or nautical charts. GML provides two things: a fixed encoding structure (the so-called object-property-value rule) and a set of primitives from which you can construct other XML languages for vertical domains. Many application communities have understood this and are building their application languages on GML, from mining and mineral exploration (XMML) to geoRSS (geoRSS GML) and aeronautical information systems (AIXM GML).
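To make the object-property-value rule a little more concrete, here is a minimal Python sketch that checks the alternation in a GML-style document: object elements contain property elements, and property elements contain either an object or a simple value. It assumes the usual GML naming convention (objects in UpperCamelCase, properties in lowerCamelCase) and does no XML Schema validation at all; the `Road`, `name` and `centerLine` elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def looks_like_object(tag: str) -> bool:
    # Assumption: objects use UpperCamelCase names, properties lowerCamelCase.
    return local_name(tag)[:1].isupper()


def check_alternation(elem: ET.Element, expect_object: bool = True) -> list:
    """Return messages for elements that break the object/property alternation."""
    problems = []
    if looks_like_object(elem.tag) != expect_object:
        wanted = "object" if expect_object else "property"
        problems.append(f"<{local_name(elem.tag)}> found where a {wanted} was expected")
    # Children of an object are properties; children of a property are objects.
    for child in elem:
        problems.extend(check_alternation(child, not expect_object))
    return problems


# Hypothetical application object built from a GML geometry primitive.
ROAD = """<Road xmlns:gml="http://www.opengis.net/gml">
  <name>Main Street</name>
  <centerLine>
    <gml:LineString><gml:posList>0 0 10 0</gml:posList></gml:LineString>
  </centerLine>
</Road>"""

print(check_alternation(ET.fromstring(ROAD)))  # []
print(check_alternation(ET.fromstring("<Road><Name>x</Name></Road>")))
# ['<Name> found where a property was expected']
```

The point of the exercise is that the same walk works for any application schema that follows the rule, which is what gives builders of GML-based languages a uniform structure to code against.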

 

Posted by RLake at 01:15:11

February 15, 2006

The importance of profiles

A great deal of noise has been made over the past year or two about the importance of profiles. In some cases the aim is to control some domain. In others it is to lower the entry bar and so encourage adoption of a specification. In still others it is an organization's attempt to put a particular spin on a specification or technology.

As readers will be aware, profiles exist for GML and have been discussed here previously. Profiles which are strict subsets of GML have a definite use: they can limit the vocabulary to only what is necessary to support a given range of applications. They can adapt a specification to the requirements of a narrow application. In some cases they can lay the foundation for a lowest common denominator - a reduced version of a standard that can get others on the bandwagon. The profile of GML for JPEG 2000 (part of the recently announced GMLJP2 Specification) is an example of adapting GML to a narrow application, while GML for Simple Features is an example of a lowest common denominator. All of this is well and good. At the same time it is interesting to see the arguments put forward in support of GML profiles - some of which are not so clearly founded.

For example - "GML Profiling - Why it is important".

The article begins simply enough, outlining a more or less accurate version of ESRI's participation in the OGC and then summarizing the "rules" of GML. It notes that the root tag is always FeatureCollection. This was more or less true in GML 2 (the root element had to have a content model derived from AbstractFeatureCollectionType), but it is not true in GML 3, since GML 3 "documents" are NOT restricted to being geographic features.

It further notes that a FeatureCollection need not be homogeneous: it can mix features of different types, and those features can have mixed geometry models. This is indeed the case - not only for GML but for real-world things. A city, for example, can be seen as a feature collection, and its features will be of many different types (roads, buildings, parks etc.) with quite varied geometric descriptions. GML is intended to represent that reality.
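As a rough illustration of such a collection, the Python sketch below parses a small GML 3-style FeatureCollection in which a road, a building and a park sit side by side with line, polygon and point geometries, and tallies the (feature type, geometry type) pairs it finds. The `city` namespace and its element names are hypothetical, and the list of geometry types is deliberately limited to three.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GML = "http://www.opengis.net/gml"
GEOMETRY_TYPES = {"Point", "LineString", "Polygon"}  # deliberately partial list

# Hypothetical application namespace for the example city features.
CITY = f"""<gml:FeatureCollection xmlns:gml="{GML}" xmlns:city="http://example.org/city">
  <gml:featureMember>
    <city:Road><city:centerLine><gml:LineString>
      <gml:posList>0 0 10 0</gml:posList>
    </gml:LineString></city:centerLine></city:Road>
  </gml:featureMember>
  <gml:featureMember>
    <city:Building><city:footprint><gml:Polygon><gml:exterior><gml:LinearRing>
      <gml:posList>0 0 1 0 1 1 0 0</gml:posList>
    </gml:LinearRing></gml:exterior></gml:Polygon></city:footprint></city:Building>
  </gml:featureMember>
  <gml:featureMember>
    <city:Park><city:entrance><gml:Point>
      <gml:pos>5 5</gml:pos>
    </gml:Point></city:entrance></city:Park>
  </gml:featureMember>
</gml:FeatureCollection>"""


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def summarize(collection: ET.Element) -> Counter:
    """Count (feature type, geometry type) pairs across all feature members."""
    summary = Counter()
    for feature in collection.iterfind(f"{{{GML}}}featureMember/*"):
        geometries = {local_name(e.tag) for e in feature.iter()
                      if local_name(e.tag) in GEOMETRY_TYPES}
        for geometry in geometries:
            summary[(local_name(feature.tag), geometry)] += 1
    return summary


print(summarize(ET.fromstring(CITY)))
```

Nothing in the encoding forces the members to share a type or a geometry model; the variety comes from the things being described.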

The author then goes on to note "Although these rules can allow great richness in feature model descriptions, this richness doesn't necessarily help achieve other goals such as interoperability and wide implementation."

This seems a strange comment, since it is precisely this richness that is essential for interoperability. I am reminded of a presentation often used by Safe Software to explain their FME format translator. They noted that one could either build a whole set of x-to-y translators (a sort of n^2 problem) or use a rich geometric/semantic model. I think they called it a "thick" pipe - their analogy for the richness of their internal proprietary representation - and that richness was essential to adapt to the systems at each end of the pipe and, by extension, to achieve interoperability. It is thus GML's "thick expressive pipe" that is essential to interoperability. At the same time, it is clear that this presents developers with implementation challenges, but hardly ones that are insurmountable. Moreover, profiles can play a role, as can standardized application schemas for a particular domain (e.g. geoRSS or AIXM GML).

It is interesting to note another of the author's complaints - namely:

"From a GIS point of view, a FeatureCollection corresponds to a Layer. If a FeatureCollection contains endlessly nested levels of other FeatureCollections, this translates into one layer with endless sublayers—not very good practice from a GIS point of view."

and

"Also, nonhomogeneous features inside a FeatureCollection do not correspond to the structure of a layer with homogeneous features"

We should note that layers are really a visualization or presentation construct (the term is related to the layers of the printing process). Early computer-aided mapping and GIS systems directly bound layers in presentation to "layers in the data", and this gradually migrated into the database architecture. Mapped to the world of GML, a "GIS layer" is more the presentation of specific feature types, as in a transport layer consisting of road, rail and ferry route feature types - and this is clearly enshrined as such in the OGC Styled Layer Descriptor (SLD).
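The distinction can be sketched in a few lines of Python: the data is simply a set of typed features, and a "layer" is a presentation-side selection of feature types, much as the transport example groups roads, rail and ferry routes. This is only an analogy for what an SLD expresses, not SLD itself; the `LAYERS` table, the `Feature` class and the feature type names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_type: str
    name: str


# Hypothetical presentation layers: each is just a selection of feature types.
LAYERS = {
    "transport": {"Road", "Railway", "FerryRoute"},
    "green space": {"Park"},
}


def features_in_layer(features, layer):
    """Select the features whose type belongs to the named presentation layer."""
    wanted = LAYERS[layer]
    return [f for f in features if f.feature_type in wanted]


city = [
    Feature("Road", "Main Street"),
    Feature("Building", "City Hall"),
    Feature("FerryRoute", "Harbour Crossing"),
    Feature("Park", "Central Park"),
    Feature("Railway", "Coastal Line"),
]

print([f.name for f in features_in_layer(city, "transport")])
# ['Main Street', 'Harbour Crossing', 'Coastal Line']
```

Because the layer lives entirely on the presentation side, a feature type can appear in several layers or in none (the building above) without any change to the underlying data.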

The author appeals to a particular technology implementation when he invokes the term GIS, rather than thinking about the nature of the real world. Do geographic entities not often have a hierarchical structure? Do they not often have multiple geometric characteristics? Why would the ability to express reality more adequately be a failing of GML? One might more correctly argue that existing "GIS" fail to capture important aspects of the geographic world. This is similar to the arguments made about object technology in the early 1990s: that it did not easily match existing flat data structures.

None of this is to say that profiles are not important - we would argue that they are - but to recall T.S. Eliot's Murder in the Cathedral:
"The last temptation is the greatest treason:
To do the right deed for the wrong reason."
Posted by RLake at 00:03:03

February 08, 2006

One person's metadata is another person's ...

I have always found the discussion of metadata to be problematic - somewhat like the discussion of "objects vs data" in the early days of object orientation in general computing. For some people it seems to make sense to refer to metadata as anything that is not a raw number - as in "I will send you the image and the metadata" or "I will send you the sensor readings and the metadata". One result of this viewpoint is metadata specifications that can easily subsume all other specifications - since it is ALL metadata.

A more sensible approach is to separate the roles of data and metadata. Data is about defining objects (a vocabulary of terms) that describe an enterprise, an application domain or an application. The definition of such objects is a key part of information modeling. Up to this point we don't need the construct of metadata.

Once we have defined the set of objects for a given domain we can then talk about metadata for those objects - i.e. who created them, when, why etc.

If we apply this to imagery - we see first that we need to define various types of "image objects" - what in OGC/ISO terminology are called coverages - and then we can speak of metadata for these coverages.

In the GML world this is handled by the inclusion of a metaDataProperty whose content model is essentially XML Schema's any. The metaDataProperty (or some other named property with the same content model) is intended to hold the metadata properties of the GML application object to which it is attached (its parent element). As with other GML properties, the metaDataProperty can carry an xlink:href that points to the metadata properties rather than having them inline. Metadata can thus be readily shared across multiple feature instances.

Every GML object contains an optional metaDataProperty by default - hence metadata can be attached to features, coverages, temporal objects, coordinate reference systems etc.

Note that GML does NOT specify the metadata properties pointed to or contained within the metaDataProperty - this is to be defined by a metadata application schema. Such a metadata application schema can be something widely standardized (e.g. ISO 19139) or could be defined by a single organization. This is not up to GML. GML simply provides a way of saying that the things in here (inside the metaDataProperty) are metadata. The other properties of the object, as defined by the GML application schema, are the properties that characterize (or define) the object.
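A minimal Python sketch of the inline and by-reference cases: one feature carries its metadata inside `gml:metaDataProperty`, a second points at the same block with `xlink:href`, and a small resolver returns the shared metadata element for each feature. The `app` and `md` namespaces, the `Well` and `Lineage` elements (standing in for whatever metadata application schema is chosen) and the resolver itself are hypothetical; only same-document `#id` references are handled.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
XLINK = "http://www.w3.org/1999/xlink"
GML_ID = f"{{{GML}}}id"
XLINK_HREF = f"{{{XLINK}}}href"

# Hypothetical app and md namespaces; md:Lineage stands in for any metadata schema.
DOC = f"""<gml:FeatureCollection xmlns:gml="{GML}" xmlns:xlink="{XLINK}"
    xmlns:app="http://example.org/app" xmlns:md="http://example.org/md">
  <gml:featureMember>
    <app:Well gml:id="w1">
      <gml:metaDataProperty>
        <md:Lineage gml:id="md1"><md:source>2005 field survey</md:source></md:Lineage>
      </gml:metaDataProperty>
    </app:Well>
  </gml:featureMember>
  <gml:featureMember>
    <app:Well gml:id="w2">
      <gml:metaDataProperty xlink:href="#md1"/>
    </app:Well>
  </gml:featureMember>
</gml:FeatureCollection>"""


def metadata_by_feature(root: ET.Element) -> dict:
    """Map each feature's gml:id to its metadata element, inline or referenced."""
    by_id = {e.get(GML_ID): e for e in root.iter() if e.get(GML_ID) is not None}
    result = {}
    for feature in root.iterfind(f"{{{GML}}}featureMember/*"):
        prop = feature.find(f"{{{GML}}}metaDataProperty")
        if prop is None:
            continue
        href = prop.get(XLINK_HREF)
        if href is not None:
            # Only same-document references of the form "#id" are resolved here.
            result[feature.get(GML_ID)] = by_id.get(href.lstrip("#"))
        else:
            # Inline case: the metadata is the property's first child element.
            result[feature.get(GML_ID)] = next(iter(prop), None)
    return result


meta = metadata_by_feature(ET.fromstring(DOC))
print(meta["w1"] is meta["w2"])  # True
```

The resolver never looks inside `md:Lineage`; it only knows that whatever sits in (or is pointed to by) the metaDataProperty is metadata, which mirrors the division of labour described above.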
Posted by RLake at 19:24:41

February 07, 2006

From Soup to Nuts

The idea of GML at the outset was to provide a common tool box for the widest possible range of applications consistent with the open ended nature of geography (i.e. almost all data is geographic), and the diversity of uses to which such information is put.  This in turn shaped the idea of a consistent encoding model (it took a few iterations to get the consistency) called the object-property (object-property-value) model, and the use of application schemas for the synthesis of new foundation and new domain objects.  This provides GML with uniformity for object builders (i.e. code developers), and the flexibility to support an almost unlimited set of applications.

The importance of these two aspects of GML - the consistent encoding model and the notion of application schemas - is being borne out by the growing family of application schemas and application languages based on GML. These include (this is only a sample):
  • CSML (Climate Science)
  • TransXML (Transportation Engineering)
  • LandGML (Land Survey)
  • S57GML (Nautical Charting)
  • AIXM GML (Aviation)
  • XMML (Mining and Mineral Exploration)
  • geoRSS GML (news feeds)
  • GMLJP2 (Imaging)
GML has shown the ability to support very simple applications such as news feeds, or very complex ones like nautical charting and climate science. In the near future we can expect this family to grow as more and more organizations understand the utility of the core GML model and the efficiency of building on established constructs, from geometry and features to coverages and observations: soup to nuts.
Posted by RLake at 06:23:36

February 02, 2006

GeoRSS - GML in news feeds

One of the interesting and recent applications of GML is to news feeds in Atom or RSS. This is an old idea, but one that has taken a major leap forward with the creation of a GML application schema for embedding geo-tags in RSS and Atom feeds. While the formal schema has not yet been published, it should be out soon. To begin with, it builds on simple GML geometries, but it has scope for non-default coordinate reference systems and may be extended to use GML temporal constructs in the future. Describing a hike or a sailboat race may never be the same.

For some examples and the GML Application Schema (soon to be posted) see http://georss.org/gml.html#examples.
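As a sketch of what a geo-tagged entry looks like, the Python below builds an Atom entry carrying a `georss:where` element that wraps a `gml:Point`, following the form used in the GeoRSS GML examples, and then reads the position back. Since the schema is not yet final, treat the exact element names as an assumption; the helper functions are hypothetical, positions are written latitude first as in the GeoRSS examples, and a real Atom entry also needs `id`, `updated` and other required elements omitted here for brevity.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
GML = "http://www.opengis.net/gml"

# Readable prefixes when the entry is serialized with ET.tostring().
for prefix, uri in (("atom", ATOM), ("georss", GEORSS), ("gml", GML)):
    ET.register_namespace(prefix, uri)


def geotagged_entry(title: str, lat: float, lon: float) -> ET.Element:
    """Build an Atom entry whose location is a GML point inside georss:where."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    where = ET.SubElement(entry, f"{{{GEORSS}}}where")
    point = ET.SubElement(where, f"{{{GML}}}Point")
    ET.SubElement(point, f"{{{GML}}}pos").text = f"{lat} {lon}"  # latitude first
    return entry


def entry_position(entry: ET.Element) -> tuple:
    """Read (lat, lon) back from the georss:where/gml:Point/gml:pos path."""
    pos = entry.find(f"{{{GEORSS}}}where/{{{GML}}}Point/{{{GML}}}pos")
    lat, lon = (float(v) for v in pos.text.split())
    return lat, lon


entry = geotagged_entry("Trailhead", 45.256, -71.92)
print(ET.tostring(entry, encoding="unicode"))
print(entry_position(ET.fromstring(ET.tostring(entry))))  # (45.256, -71.92)
```

Wrapping the geometry in a GML element rather than a bare coordinate string is what leaves room for the richer constructs mentioned above, such as other geometry types or a non-default coordinate reference system.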

Also look to this blog in the future as we will be looking at GeoRSS in the context of WFS and WRS, as well as offering some GeoRSS GML tips and guidelines.
Posted by RLake at 03:55:26