GML Complexity
- The specification is thick (over 600 pages).
- The specification describes many objects (over 1000 tags identified).
- GML uses application schemas.
- GML deals with complex topics (geometry, topology, coverages).
- GML separates presentation from content.
- GML has an object-property rule.
- GML is written in XML Schema.
The specification is thick
True. The specification is indeed long. It is, however, not longer than other important specifications such as XML Schema or SQL (over 1200 pages). One might compare the "complexity" of GML in terms of the size of the spec to the complexity of a telephone book. The latter is also very thick. The book for any large city (where "books" are still in use) will run to a thousand or more pages of very fine print. In all cases, however, the model underlying the telephone book will be more or less the same and quite simple - namely a person's name, address, and telephone number. Much the same could be said of GML. The specification is long - but behind all of the objects described there is the same model and it is quite simple - namely an object (Curve, Point, Feature) and the object's properties. Moreover, this model has not changed to any appreciable degree since GML Version 1.0. So how to read the GML "phone book" is the same now as it always was. A simple model and a thick specification. This is because GML is essentially a content specification - it uses a simple model to describe a large number of kinds of objects.
The specification describes many objects
This is true. There are over 1000 tags in GML and hence a few hundred object types are described. How should I read the GML specification? To start with, read the parts that interest you or are important for your area of application. If you are not concerned with topology you do not need to read that section. Ditto for coverages, observations etc. For many users, a general understanding of features and geometries is enough. For others only coordinate systems are important. It is just a function of the concepts you need in your application domain.
GML uses application schemas
Unlike many other XML Schema grammars, GML does not rely on a single closed schema to define GML application objects. If you want to have a road, river or church steeple you will need to create an application schema. Some people find this requirement complex. It has a number of well-known precedents, however, including:
- Relational schemas - to create a table in a relational DBMS you need to decide on the table structure or schema. In the same way, to create an object in GML you need to create a GML application schema using XML Schema.
- Objects - to create an instance of an object in object-oriented languages like C++, Java etc. you need to create a class - the class defines a "schema" for the object.
Early in GML we considered creating a schema language in GML itself. Thus in one of the profiles of GML 1.0 you will see something like:
<gml:Feature typeName="Road"> .. </gml:Feature>
This was an attempt to make GML only a single schema. The difficulty with this approach is that:
- We are creating our own schema language in GML for which no tools exist or are likely to exist.
- While it might start very simple - it would likely grow into something complex like XML Schema as we added support for enumerations, ranges etc.
One might note that other geographic languages also use schemas, in particular KML (Google). It has gone the GML v1.1 route and defined a new schema language. At the moment this seems to support only simple types - but people will surely want more ... and then what do we do?
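The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not anything from the GML specification: the abc namespace URI and the feature_type helper are made up for the example, and the GML namespace URI is the one used by GML 2 and 3.1, assumed here for the typeName snippet as well.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"   # GML 2 / 3.1 namespace, assumed here
ABC_NS = "http://example.com/abc"       # hypothetical application namespace

# Single-schema style: one generic element, the type name is attribute data.
generic = ET.fromstring(
    f'<gml:Feature xmlns:gml="{GML_NS}" typeName="Road"/>'
)

# Application-schema style: the element name itself is the type.
typed = ET.fromstring(
    f'<abc:Road xmlns:abc="{ABC_NS}" xmlns:gml="{GML_NS}" gml:id="hj1"/>'
)

def feature_type(elem):
    """Return the feature type name under either encoding (hypothetical helper)."""
    if elem.tag == f"{{{GML_NS}}}Feature":
        # Generic encoding: a general XML tool sees only "Feature";
        # the real type is hidden in the typeName attribute.
        return elem.get("typeName")
    # Application-schema encoding: strip the "{namespace}" part of the tag.
    return elem.tag.split("}")[-1]

generic_type = feature_type(generic)
typed_type = feature_type(typed)
print(generic_type, typed_type)
```

In the first form, anything we want to say about a Road (its properties, their types, enumerations) would have to be expressed in a schema language of our own. In the second, an ordinary XML Schema validator can do that work.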
GML deals with complex topics
This is certainly true. The topics that underlie geography are not necessarily simple. Since GML exposes these objects directly, it exposes the complexity of the objects themselves. What is a Polygon? Can it have holes? What is a geometry complex? And so on. GML is the raw nuts and bolts of geography. To use GML, you need only understand the objects you actually have to deal with.
GML separates presentation from content
This is a commonplace of XML. Lots of XML is commonly styled to XHTML or HTML for presentation. Nonetheless this does introduce an additional level of complexity - as is always the case when a general problem (in this case map generation) is decomposed into multiple constituent parts. The parts need to be composed together to do the task - something that was not necessary when it was all one thing. Of course this decomposition provides other benefits - the components are simpler and one can use different styling mechanisms for the same data - or apply a single styling mechanism to multiple kinds of data. Hence this is a tradeoff. Note that KML (Google) is currently a graphic presentation language (like SVG), a style description language (like SLD/XSLT) and a geographic representation language (like GML), all in one.
GML has an object-property rule
GML provides a thin layer of semantics, namely the object-property rule. This means that if you look at GML and you find an object, the children of that object (in the XML sense) are the properties of that object - no more and no less. The children are not subtypes, nor are they objects contained in the parent object. GML properties express attributes and associations (relationships) of the parent object. When you create an application object (e.g. Road) you are expected to follow this same rule. Properties of a Road are encoded in XML (GML) as child elements - hence:
<abc:Road gml:id="hj1">
<abc:numLanes>3</abc:numLanes>
<abc:surfaceType>gravel</abc:surfaceType>
...
</abc:Road>
So numLanes is always the numLanes(Road) or Road.numLanes. GML core schemas follow this same model. This means that a point in GML is not the minimal:
<Point>100 200</Point>
but rather the somewhat longer:
<Point>
<pos>100 200</pos>
</Point>
where pos holds the coordinates of the Point. GML stays as true as possible to the object-property model. Note that the object-property rule, like many things in GML, is borrowed from RDF.
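One practical consequence of the rule is that a reader can walk any GML instance by alternating between two levels: object, property, object, property. Here is a minimal Python sketch of that idea. The abc namespace URI, the abc:location property and the two helper functions are invented for the illustration, and the sketch assumes every property holds either text or exactly one nested object.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"   # GML 2 / 3.1 namespace
ABC_NS = "http://example.com/abc"       # hypothetical application namespace

DOC = f"""
<abc:Road xmlns:abc="{ABC_NS}" xmlns:gml="{GML_NS}" gml:id="hj1">
  <abc:numLanes>3</abc:numLanes>
  <abc:surfaceType>gravel</abc:surfaceType>
  <abc:location>
    <gml:Point><gml:pos>100 200</gml:pos></gml:Point>
  </abc:location>
</abc:Road>
"""

def local(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of names."""
    return tag.split("}")[-1]

def read_object(obj):
    """Object level: every child element is a property of this object."""
    return local(obj.tag), {local(p.tag): read_property(p) for p in obj}

def read_property(prop):
    """Property level: the value is plain text or one nested object."""
    children = list(prop)
    if children:
        return read_object(children[0])
    return (prop.text or "").strip()

name, props = read_object(ET.fromstring(DOC.strip()))
print(name, props)
```

Note that the same two functions handle both the application object (Road) and the core GML object (Point), because both follow the same rule.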
GML is written in XML Schema
As we noted above, an early design decision in GML was that it must be inherently extensible, and that such extensibility should come from an external schema language and NOT from GML itself. In GML v1.0, both DTD and RDFS were provided as the schema languages. From GML 2.0 on we have chosen XML Schema. This rests on a few basic principles:
- We did not want to create a new schema language just for GML.
- We wanted something that was widely used.
- We wanted something for which there were many fast parsers.
From these requirements, XML Schema was selected. This is NOT to say that GML can only be expressed in XML Schema. In fact there is consideration of also providing GML in OWL or RDFS (once again). Someone may construct a RelaxNG version of GML. This would be perfectly valid. Of course this implies interoperability issues between one representation and another.
Much of the processing complexity of GML (and the visual complexity) derives from XML Schema. Some people will argue that another schema language would make things much simpler. I think this is not likely the case - at least if the schema language offers comparable functionality. Nonetheless, the implementation of the GML model in XML Schema does entail that GML application schema processors be able to do certain operations that are not completely trivial - such as tracing inheritance or dealing with substitution groups. For this reason, various vendors offer GML SDKs that hide these XML Schema details from software developers.
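To give a feel for what "dealing with substitution groups" means, here is a rough Python sketch. It assumes the substitutionGroup declarations have already been extracted from the schemas into a simple lookup table. The table below is hand-written for the example: the gml: entries are modelled on GML 3.1 element names, and the abc: entries are hypothetical. A real processor would read them from the schema documents.

```python
# Hand-written extract: element name -> head of its substitution group.
# Modelled on GML 3.1 names; the abc: elements are hypothetical.
SUBSTITUTION_GROUP = {
    "abc:Road": "abc:TransportFeature",
    "abc:TransportFeature": "gml:_Feature",
    "gml:_Feature": "gml:_GML",
    "gml:_GML": "gml:_Object",
    "gml:Point": "gml:_GeometricPrimitive",
    "gml:_GeometricPrimitive": "gml:_Geometry",
    "gml:_Geometry": "gml:_GML",
}

def substitution_chain(name, table=SUBSTITUTION_GROUP):
    """Follow substitutionGroup links from an element up to its root head."""
    chain = [name]
    seen = {name}
    while chain[-1] in table:
        head = table[chain[-1]]
        if head in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"cycle in substitution groups at {head}")
        chain.append(head)
        seen.add(head)
    return chain

def is_feature(name):
    """An element is a feature if gml:_Feature appears somewhere in its chain."""
    return "gml:_Feature" in substitution_chain(name)

road_chain = substitution_chain("abc:Road")
print(road_chain)
print(is_feature("abc:Road"), is_feature("gml:Point"))
```

The point is that nothing in an instance document says "abc:Road is a feature". A processor only learns that by tracing the declarations in the application schema back to the GML core.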
So while I would not call GML "simple" - it is what one might call appropriately complex!!


Oversimplification is the root cause of much complexity. It all depends on who "pays for" the complexity. It's much better to have a complete model so that it's easy to use than to expect the data consumers to have to cope with lots of inconsistent usage of a trivial model. After all we can express anything in RDF... it's very simple. But ask 100 people to describe a cadastral dataset in RDF and you'd get 100 different answers.
Imagine a bucket of nuts and bolts. The bucket is a very simple model. But it's hard to use when the number of types of bolts grows beyond a very small number. A row of bolts and matching nuts on a shelf is much more complex. Which hardware store would you go to? The one cheaper to _use_ I'd imagine!
The main driver for simplification during development is modularisation. This is where we are poorly served by lack of mature governance models - who is going to look after what part? GML does a good job (in general) of not treading where it has no right to go, but these application schemas actually need to be glued together from bigger modules.
We don't have anywhere to get these model components from (yet). But a number of us are working on this. Watch this space.
However, the best way to deal with the GML 'complexity' issue is through the new GML Simple Features Profile (GMLSF). GMLSF makes it much easier to use GML and WFS in multiple applications, and will translate to lower overall implementation costs and greater flexibility.
An article on using GMLSF in 'real-life' is available here http://www.directionsmag.com/article.php?article_id=1971&trv=1
Regards,
Jeff
1) The specification is thick:
Darn right it is! Way too thick. I find ArcObjects easier to work with, and that's saying something.
WKT is approx 10,000% easier to work with than the equivalent GML. GML forces people who are trying to accomplish simple tasks to learn and understand a gigantic pile of irrelevant crud.
A good tool should make simple things easy, difficult things possible, remember? The first is more important than the second: this is something that the GML designers forgot!
And yet, with all this complexity, somehow the geniuses defining GML managed to forget to specify a clear and unambiguous way for a system to communicate the spatial reference it is using. Whoops! Design-by-committee at its worst.
2) Specification describes many objects.
Yup -- a clear case of "Let's do everything!" which drove the need to create the simple features profile. I predict that 99.999% of GML-producing/consuming applications will support only the simple features profile.
3) GML uses application schemas
This is in a direction I'd like to see, but backwards. GML should have addressed two major roles:
1) Web-enabled shapefile replacement.
1a) Extension modules for things like TIN, CAD, topology for when/if people needed it.
2) Set of spatial datatypes and components that application schemas can themselves embed.
Instead, GML is the entire kitchen sink, plus arbitrary embedded application schemas. Not very easy to work with.
4) Complex topics
Right-oh. And it deals with complex topics in complex ways, making things even more complex. Very helpful, thanks!
The problem here is that unlike, say, Arc/INFO, there is no unifying model to drive decisions about what is in the standard vs what is out -- or to drive a consistent way of representing what does make it in.
5) Presentation/content.
Right. What was the @show attribute for again?
More seriously, there is a huge breakdown in GML between "policy" and "mechanism", another one of those important dichotomies.
If some application requires distributed resources and multi-part datasets, then it should be up to the application designers to make such a thing work on top of GML. Instead, GML is littered with bogosities like those ridiculous actuate attributes.
6) GML has an object-property rule.
Well, yeah, so, but this isn't really phrased as a criticism. I think the need for this is an unfortunate side effect of the fact that 99% of GML's uses are as "shapefile replacement" and this was a hack to fit in the "attribute table".
7) GML is written in XML Schema
... which is why I use GML as exhibit number one when I say why XML Schema stinks. It uses all of the fanciest XML Schema functionalities and as a result breaks all kinds of tooling that never fully supported XML Schema (due to XML Schema's own complexity!).
(Off topic: It's ironic that every single restriction you can put into XML Schema can be put into RELAX NG, but not vice versa -- in particular, in XML Schema you cannot tie the contents of an element to an attribute of that element... rules like: if the 'href' is set, then the polygon element must be empty; if not, it must have at least one polygonMember.)
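For what it's worth, a rule of this kind can always be checked in application code once the document is parsed. A minimal Python sketch, taking the polygon and polygonMember names from the rule above (un-namespaced here for brevity), assuming 'href' means the standard xlink:href attribute, and using a made-up check_polygon helper:

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"   # ElementTree's name for xlink:href

def check_polygon(polygon):
    """Return None if the href/content rule holds, else an error message."""
    has_href = HREF in polygon.attrib
    members = polygon.findall("polygonMember")
    has_content = len(polygon) > 0 or bool((polygon.text or "").strip())
    if has_href and has_content:
        return "href is set, so the element must be empty"
    if not has_href and not members:
        return "no href, so at least one polygonMember is required"
    return None

by_reference = ET.fromstring(
    f'<polygon xmlns:xlink="{XLINK_NS}" xlink:href="#p1"/>'
)
inline = ET.fromstring("<polygon><polygonMember/></polygon>")
both = ET.fromstring(
    f'<polygon xmlns:xlink="{XLINK_NS}" xlink:href="#p1"><polygonMember/></polygon>'
)
neither = ET.fromstring("<polygon/>")

results = [check_polygon(p) for p in (by_reference, inline, both, neither)]
print(results)
```

The first two documents pass and the last two are rejected. The catch is that the rule now lives in code rather than in the schema, which is exactly the complaint.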
Alas, GML will be the language du jour for a while, just like XML Schema. Both have made my life inordinately more difficult: hence the bashing in a comment on a random blog.