Blog moved to Galdos corporate site
Hi,
This blog has now been moved to http://www.galdosinc.com/archives/category/media-center/blog
See you there!!
Ron
ebRIM is one of the key specifications from OASIS. It provides a general data model for metadata management. It is also the basis of the WRS or ebRIM profile of the OGC Catalogue 2.0 specification.
ebRIM provides a rich set of features including:
These features are supported through the e-business registry information model which provides a class hierarchy including RegistryObject, RegistryPackage, ExtrinsicObject, Classification, ClassificationNode, and Association.
ebRIM also provides the concept of Registry and Repository, with the ExtrinsicObjects of the Registry acting as proxies for items in a Repository associated with the Registry.
In this note we outline the use of ebRIM in relation to GML.
The basic idea is to create a set of ebRIM objects that correspond to the GML core object types including feature, coverage, observation, coordinate reference system, and unit of measure definition.
This approach allows ebRIM to provide a logical model of the GML type hierarchy.
The root of the GML object hierarchy, gml:_GML, is represented in our ebRIM model by GMLObject, a subtype of ebRIM RegistryObject. We choose this since GMLObject cannot be instantiated and hence cannot have a corresponding repository item. All other GML objects in our model are then derived from this GMLObject, including Feature, CRS, UofM, and valueObject. The subtypes of Feature, namely Coverage, Observation and DynamicFeature, are also included.
Other deeper objects are not represented in this model (some will argue that they should be) since they are typically used to describe the more "primary" objects identified above.
Each of the primary objects identified in our model is mapped to a repository item. This is in effect the meaning of "Extrinsic" in ExtrinsicObject. The repository item is then typically a GML core schema fragment.
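The type hierarchy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ObjectType record, the has_repository_item flag and the ancestors helper are hypothetical names, not part of ebRIM or GML. Only the type names themselves come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ObjectType:
    """Hypothetical record for one node of the registry type hierarchy."""
    name: str
    parent: Optional[str]        # name of the supertype, None for the root
    has_repository_item: bool    # True for the "primary" objects in the text


# Type names follow the text; GMLObject stands for the abstract gml:_GML root.
TYPES: Dict[str, ObjectType] = {
    t.name: t
    for t in [
        ObjectType("RegistryObject", None, False),
        ObjectType("GMLObject", "RegistryObject", False),
        ObjectType("Feature", "GMLObject", True),
        ObjectType("CRS", "GMLObject", True),
        ObjectType("UofM", "GMLObject", True),
        ObjectType("valueObject", "GMLObject", True),
        ObjectType("Coverage", "Feature", True),
        ObjectType("Observation", "Feature", True),
        ObjectType("DynamicFeature", "Feature", True),
    ]
}


def ancestors(name: str) -> List[str]:
    """Walk from a type up to the root, returning its supertypes in order."""
    chain: List[str] = []
    parent = TYPES[name].parent
    while parent is not None:
        chain.append(parent)
        parent = TYPES[parent].parent
    return chain


print(ancestors("Coverage"))
```

A registry built along these lines could answer questions such as "is a Coverage a kind of Feature?" without opening any schema file, which is the point of the logical model.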
With this more logical model one can readily publish GML application schemas and other encodings (e.g. UML models) from this ebRIM representation. It can also provide the basis for feature catalogues, image catalogues and others. This will be discussed in greater detail in upcoming blogs. It will also provide the basis for advanced treatment of GML schema components, liberating them from the current notion of schema files. More on this later ...
When we say that a Road feature has 3 lanes, what do we mean? Does it mean that a friend of mine has observed this to be the case? That a government body or private company will attest that this is a fact? What is the difference? Perhaps the issue becomes clearer if we replace the number of lanes property by, say, the centerline of the road - its geometry. Presumably you would be much less ready to accept my friend's version than one which has received the blessing of the government or a private corporation responsible for data distribution. At the same time, you know that the government or private company had to at some point get the information, perhaps not from a friend of mine, but nonetheless from some individual or group of individuals out in the field actually measuring or observing what is "true". So there is clearly an important distinction here.
The distinction is between what one might call an authorized feature - meaning that it has the support of some organization (e.g. government department, private company) and an observation - meaning that some person or organization has observed something, but it has not yet received the blessing of "authorization".
One may ask why such a formal distinction need be made. Why not just treat everything as an observation and get the authorization part by simply noting who made it? The problem with such an approach is that the same organization may be responsible for both observing and authorizing. Additionally, an organization may be the authority (or custodian) for some types of information, while being simply an observer (albeit an authorized one) for something else. A water company would generally be accepted as the authority for the location of water mains, while it might be considered an authorized observer (one that can be trusted by the custodian) for the location of street centerlines or parcel boundaries. Note that this does not mean that the custodian takes the observations of the authorized observer at face value - simply that they are taken into account in determining the authorized location of these objects, where the survey conducted by my "friend" would not be. Of course this is equally true within the organization: the authorized location, extent etc. of an object results from multiple observations, all presumably obtained from authorized observers.
The passage from observation to authorized feature is not typically a completely formal process. It involves QA of course and very often human judgement as to the fitness for purpose of the resulting data object.
These notions of authorized feature and observation are captured in an approximate way in GML. I say approximate in that in GML there is no "authorized" construct. One can interpret, however, every GML feature to be an authorized feature, and then use GML observations to capture information about features which have not been "authorized". Such observations are then authorized as observations and serve as the input to generating authorized features.
When GML was first developed, many people thought of it as another kind of "shape file", that is something supporting the transfer model for geographic data. By transfer model I mean the movement of geographic data from one system to another using files. While GML does in fact support the transfer model, and can do so better than most transfer formats, this was never the real reason for its development - GML was created to support geospatial transactions on the Internet.
In the transfer model, the exporter has no knowledge of the schemas of the intended target. A schema can be, and usually is, constructed from the data source, and data conformant with this schema is written to the transfer file. Usually the schema, if explicitly recorded, is carried along with other transfer "metadata" in the file header. In the case of GML there is no header to speak of; however, more or less the same ideas apply. The data instances refer to the schema to which they conform. Additional metadata for the transfer could then be encoded in this application schema.
The chief benefit of the transfer model is that it captures the exported data regardless of the intended transfer target. Since many transfer formats are binary they may be smaller (sometimes a lot smaller) than uncompressed GML data used in the transfer fashion. I can thus send you an export of data about some subject without having any knowledge of your system or data schemas.
The negative aspects of the transfer model are also fairly clear. Suppose 6 features out of 100 change in some area of interest. I can export these 6 features into a transfer file. Perhaps only some properties of these features have changed. Nonetheless I need to transfer all 6 features at a minimum since I typically have no way to encode updates or inserts in my transfer syntax. We will show how this can be done using the OGC WFS protocol in the following article. In many cases it is not so convenient to determine that 6 features have changed and as a result I send all 100 features. So very often the transfer model leads to a much larger amount of data being transferred than with the transactional model.
When I get to the point of loading these into the target system I now have to match these features to features in the target system. This may require schema-based data translation since the source and target schemas can be very different. Quite often this is a manual process and sometimes fairly prone to introducing errors. If the transfer scheme allows me to select and transfer only the changed features then things are not too bad. If it does not I may need to overwrite the complete set of features (e.g. the 100 in our example).
Clearly the transfer model can be quite slow, can require manual intervention, and is prone to the introduction of errors.
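To make the cost of resending everything concrete, here is a minimal sketch of working out only what changed between two snapshots of the same data. The dictionary layout, the feature ids and the lanes and surface properties are assumptions made up for illustration. Deleted features are not handled.

```python
from typing import Any, Dict

# feature id -> property name -> value (a made-up in-memory layout)
Snapshot = Dict[str, Dict[str, Any]]


def changed_properties(old: Snapshot, new: Snapshot) -> Snapshot:
    """Return only the properties whose values differ from the old snapshot."""
    delta: Snapshot = {}
    for fid, props in new.items():
        before = old.get(fid, {})
        diff = {
            key: value
            for key, value in props.items()
            if key not in before or before[key] != value
        }
        if diff:
            delta[fid] = diff
    return delta


# 100 hypothetical road features, 6 of which gain a lane.
old = {f"road.{i}": {"lanes": 2, "surface": "asphalt"} for i in range(100)}
new = {fid: dict(props) for fid, props in old.items()}
for i in range(6):
    new[f"road.{i}"]["lanes"] = 3

delta = changed_properties(old, new)
print(len(delta), "of", len(new), "features need to be sent")
```

Here the delta holds 6 entries, each carrying a single property value, where a file-based export without update semantics would carry at least 6 whole features and often all 100.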
In the transactional model, data is transferred from one system to the other using database transactions. This is the method employed with the OGC Web Feature Service. A transaction is a message that is transferred from one system to the other and which requests the receiving system to modify its data store in a specific way as requested in the message. Furthermore the actions of the receiver must be carried out in accordance with the usual rules that ensure the integrity of such transactions, namely that they are atomic, consistent, isolated and durable. It is the responsibility of the interacting servers to ensure that these characteristics are maintained.
GML supports the WFS transaction by providing a data description capability used in formulating the request, and as a transport for the returned data (e.g. for read operations that return data). Note that it is for this reason that GML provides a rather rich expression capability for database schemas. Any relational-spatial schema can be expressed in GML, and GML together with the WFS protocol can thus express requests against any such database.
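As a rough illustration of what such a transaction message looks like, the sketch below builds a WFS Update request with Python's standard xml.etree module. It assumes WFS 1.1.0 element names and an ogc:FeatureId filter. The feature type Road, the id road.17 and the lanes property are hypothetical, and a real request would normally use a namespace-qualified type name from the application schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

WFS_NS = "http://www.opengis.net/wfs"
OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("wfs", WFS_NS)
ET.register_namespace("ogc", OGC_NS)


def build_update(type_name: str, fid: str, changes: Dict[str, object]) -> str:
    """Build a WFS Transaction that updates selected properties of one feature."""
    tx = ET.Element(
        f"{{{WFS_NS}}}Transaction", {"service": "WFS", "version": "1.1.0"}
    )
    update = ET.SubElement(tx, f"{{{WFS_NS}}}Update", {"typeName": type_name})
    # One wfs:Property per changed value; unchanged properties are not sent.
    for name, value in changes.items():
        prop = ET.SubElement(update, f"{{{WFS_NS}}}Property")
        ET.SubElement(prop, f"{{{WFS_NS}}}Name").text = name
        ET.SubElement(prop, f"{{{WFS_NS}}}Value").text = str(value)
    # The filter selects which feature the update applies to.
    flt = ET.SubElement(update, f"{{{OGC_NS}}}Filter")
    ET.SubElement(flt, f"{{{OGC_NS}}}FeatureId", {"fid": fid})
    return ET.tostring(tx, encoding="unicode")


print(build_update("Road", "road.17", {"lanes": 3}))
```

The message names one feature and one property, so the receiving system has nothing to match or overwrite beyond what the request states.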
Support for the transaction model means that GML/WFS can actually be much faster than "conventional" transfer models - but it requires an adjustment in your thinking. Are you ready for it?
Feature type dictionaries have been with us for quite some time, either within proprietary products, or in open standards such as the Digest FACC. Such feature dictionaries identify concepts or terms of interest within a particular domain of discourse, but do not assign or bind specific properties to these feature types or abstract concepts. Concrete feature types are then constructed from these feature types (a 1:many relationship) in one or more feature catalogues.
This idea is very similar to the model employed by RDFS (Resource Description Framework Schema Language) in which Classes are defined by assertion (the rdfs:Class statement in XML), assertions which do not bind properties to the class definition as would be the case in many object oriented models. Properties in RDF are defined by assigning their domain and range, the domain being the class on which the Property is defined, the range being the class in which the Property takes its values. Property definitions can live in completely different namespaces than the Class definitions - thus different people and organizations can see a single concept as having different concrete realizations (feature types in a feature catalogue).
Since GML was originally written in RDF/S (GML v1.0 profile 1.3), and since many of the GML constructs are copied from RDF/S it should not be too surprising that GML can also represent this separation of Class definitions and Properties, and feature dictionaries and feature catalogues. Since GML is currently written in XML Schema (and not in RDF/S) we can also expect that there are some small things lacking in the GML description.
To create a feature type dictionary in GML, one just creates a set of element declarations, all of which are abstract (abstract="true"), which have no properties and which derive from gml:AbstractFeatureType. Note that such elements automatically have properties including name, description, ID etc that one would find in a feature dictionary.
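The recipe above can be sketched by generating such a dictionary schema with Python's standard xml.etree module. This is an illustration only: it assumes GML 3.1 names (gml:AbstractFeatureType and the gml:_Feature substitution group), and the target namespace and the feature type names Road and Building are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

XS_NS = "http://www.w3.org/2001/XMLSchema"
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("xs", XS_NS)


def feature_type_dictionary(target_ns: str, names: Iterable[str]) -> str:
    """Emit an XML Schema holding one abstract, property-less element per term."""
    schema = ET.Element(
        f"{{{XS_NS}}}schema",
        {
            "targetNamespace": target_ns,
            # Declared by hand so the gml: prefix used in attribute values resolves.
            "xmlns:gml": GML_NS,
            "elementFormDefault": "qualified",
        },
    )
    ET.SubElement(schema, f"{{{XS_NS}}}import", {"namespace": GML_NS})
    for name in names:
        ET.SubElement(
            schema,
            f"{{{XS_NS}}}element",
            {
                "name": name,
                "type": "gml:AbstractFeatureType",
                "substitutionGroup": "gml:_Feature",
                "abstract": "true",
            },
        )
    return ET.tostring(schema, encoding="unicode")


print(feature_type_dictionary("http://example.org/dictionary", ["Road", "Building"]))
```

Each generated element adds nothing to gml:AbstractFeatureType, so it carries only the inherited name, description and id, which is what a dictionary entry needs.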
Property dictionaries can be created in exactly the same manner although these dictionaries MUST import the associated feature type schema for the classes on which the properties are defined (domain). Note that GML has no standard way of saying that the domain of a property is a particular class (feature type). The range part is OK, but the domain would require the use of an additional element (e.g. in AppInfo) to clearly designate the property's domain. Otherwise such properties could apply to ANY feature type in the associated feature type dictionary.
To create a feature catalogue we create a GML application schema in the usual fashion, with each bound feature type's content model (a concrete feature type with bound properties) deriving from the appropriate abstract feature type declaration in the feature type dictionary schema. In such a schema you will see all the properties by ref (i.e. element ref="something in the Property Dictionary").
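One possible reading of this construction is sketched below, assuming the dictionary entries are abstract elements typed as gml:AbstractFeatureType. The helper name, the prefixes dict:, prop: and cat:, and the Highway example are all hypothetical. The prefixes would have to be declared on the enclosing schema element, which is omitted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XS_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS_NS)


def concrete_feature_type(
    name: str, dictionary_element: str, property_refs: List[str]
) -> Tuple[ET.Element, ET.Element]:
    """Return the complexType and element declarations for one bound feature type."""
    ctype = ET.Element(f"{{{XS_NS}}}complexType", {"name": f"{name}Type"})
    content = ET.SubElement(ctype, f"{{{XS_NS}}}complexContent")
    ext = ET.SubElement(
        content, f"{{{XS_NS}}}extension", {"base": "gml:AbstractFeatureType"}
    )
    seq = ET.SubElement(ext, f"{{{XS_NS}}}sequence")
    # Properties are bound by reference to the property dictionary.
    for ref in property_refs:
        ET.SubElement(seq, f"{{{XS_NS}}}element", {"ref": ref})
    # The concrete element is substitutable for its abstract dictionary entry.
    elem = ET.Element(
        f"{{{XS_NS}}}element",
        {
            "name": name,
            "type": f"cat:{name}Type",
            "substitutionGroup": dictionary_element,
        },
    )
    return ctype, elem


ctype, elem = concrete_feature_type(
    "Highway", "dict:Road", ["prop:numberOfLanes", "prop:centerline"]
)
print(ET.tostring(ctype, encoding="unicode"))
print(ET.tostring(elem, encoding="unicode"))
```

Several catalogues could each define their own Highway against the same dict:Road entry, which is the 1:many relationship between dictionary terms and concrete feature types.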
So modulo the non-explicit designation for domains this captures the feature dictionary and feature catalogue structure pretty well.
Note that the "abstract" feature type dictionaries are often hierarchical in nature and this hierarchy can also be captured directly in the feature type dictionary schema using XML Schema inheritance in the usual fashion.
This leads to another interesting connection - namely that such a feature type hierarchy can also be viewed as a classification scheme in the sense of ebRIM. Feature dictionaries thus map to ebRIM classification schemes. Concrete feature types with bound properties can then be seen as being classified by their parent feature type (classification leaf) in the ebRIM scheme.
Like they say - "everything is connected"
I recently had the opportunity to listen to a keynote address by Michael Jones, the CTO of Google Earth, given at Map Middle East 2006, in Dubai, U.A.E. Most of his talk dealt with the problem he felt GE was focused on, and how it differed from the problems that had been and were being attacked by most of the GIS industry. It was an entertaining and at times thoughtful talk. He said that GE wanted to create what he called a "sense of place" or rather wished to enable people to create that "sense of place" and to share it with their friends and colleagues whether nearby or across the globe. While this was probably not the original motivation of GE, I think it is a clear summary of why GE has been successful and what in the GE experience really appeals to people. Of course the global imagery is nice, as is the smooth pan and zoom and the neat fly over from one place to another ... all of this is clearly a necessary component of their success - but the sufficient bit, as Michael alluded to, is that these things enable a sense of sharing of place - where I went on my vacation - what the area around my cottage looks like - that is something socially valuable to most of the people on the earth - and something that can drive Google's core business, namely the selling of advertising. It may in the process also contribute to a shared sense of the earth itself.
Given all this, there is clearly also a confluence between the objectives of GE and those of the conventional GI community. Traditional users of geographic information - meaning larger corporations and governments at all levels - also desperately need and want to share geographic information with one another. While they may not be driven by a shared sense of place (genius loci) they are increasingly realizing that their own business processes demand access to information they don't have, and cannot afford to collect.
It is my view that these common objectives can best be met by a global linking of spatial information systems - those that collect and maintain geographic information for operational and decision making reasons - with one another - for broader and higher level decision making and to share the state of the world with one another. Marshall McLuhan said we were living in a Global Village - perhaps with GE to bring the awareness and GI technology to provide the foundations - such a village can yet be a nice place to call home.
In our previous note, GeoWeb and Survival, we looked at the importance of managing the environment on natural zone boundaries rather than in terms of the political units that exist today. Of course it is highly unlikely that we will in the near future actually alter the existing political boundaries. Even if we did, such a move would be insufficient because the zones of natural management overlap one another and often in complex ways. Hence we need a way to acquire information on a natural management zone basis while at the same time retaining our existing political infrastructure. I agree that this is only half the story, since to act effectively we must also modify the interaction of the political institutions so that they can react in the appropriate fashion to the information views organized on the basis of natural management zones. This second and vital component of the response we will have to leave to others, noting that without the unified information view such a new direction for the management response is both unlikely and unworkable.
It should be mentioned in passing that we have in effect two notions of GeoWeb in this discussion. The first is the information GeoWeb that is the subject of this blog. The second is that of the "web of life" that natural scientists and system thinkers have embraced for a very long time. Since the natural processes are by definition distributed over the surface of the earth, it is no stretch to think of this as also a GeoWeb. It is the fusion of these concepts of GeoWeb that is at the heart of the current discussion.
So how can the GeoWeb (information technology) help us to deal with the GeoWeb (natural systems and the environment)?
One of the difficulties that we face in moving to management based on natural zones is the misalignment of information boundaries. As we have already noted, existing information boundaries are based on more or less arbitrary political units defined by nation states and subdivided into states, provinces, counties, communities, cities and municipalities. In fact there is a myriad of such boundaries which overlap one another in a completely arbitrary fashion. None of this is likely to change.
The way forward is to put in place Spatial Data Infrastructures (SDI) that provide transparent access to the information managed by these politically defined jurisdictions. Applications for analysis, mapping, display, and other forms of decision support can then be constructed on top of the SDI layer thus providing the applications for negotiation and management in the natural management zone.
Now I did not say that this is an easy task. Different jurisdictions means different vendor technologies are used. It also means that the world is modeled in different ways with only a rough correspondence between the entities of one jurisdiction and those of another. Furthermore, the jurisdictions possess the expertise to actually create, document and manage their part of the information. They are the stewards or custodians of that information and this must be respected if we are to have any hope that the information we are sharing is accurate and current. Finally, we must note that bringing disparate information sources together will reveal not only intrinsic errors within the individual data components, but also conflicts between one component and another.
Any solution to these problems is only going to be approximate at best, but this is still miles ahead of moving forward without any information or with information which is very incomplete or very out of date.
Existing SDI technology can go a long way to addressing these problems.
GML can provide a common schema language by which information providers can expose their information models to one another and do so in the context of the Internet. Furthermore such models can readily be maintained and shared as they change in one jurisdiction or another. The mere fact of sharing these models can lead to changes and to important integration of concepts and vocabulary. When I see that your street is the same as my road, either one of us can change or we can provide the appropriate automated mapping tools to transform requests and data from one system to the other. In the not too distant future technologies such as OWL will allow us to define the underlying objects ("what is a lake?") in a machine readable form thus enabling such mappings to be defined by computer assisted techniques, and possibly completely automated ones in the farther future.
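The "your street is my road" case can be pictured as a small renaming table applied to each feature as it moves between systems. The sketch below is only that: the feature layout, the type names and the property names are invented for illustration, and real schema mappings usually involve more than renaming.

```python
from typing import Any, Dict

# Hypothetical mapping from one jurisdiction's Street to another's Road.
STREET_TO_ROAD: Dict[str, Any] = {
    "source_type": "Street",
    "target_type": "Road",
    "properties": {"streetName": "name", "numLanes": "lanes"},
}


def translate(feature: Dict[str, Any], mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the feature type and its properties according to the mapping."""
    if feature["type"] != mapping["source_type"]:
        raise ValueError(f"mapping does not apply to {feature['type']}")
    renames = mapping["properties"]
    # Properties without an entry in the mapping keep their original name.
    properties = {
        renames.get(name, name): value
        for name, value in feature["properties"].items()
    }
    return {"type": mapping["target_type"], "properties": properties}


street = {"type": "Street", "properties": {"streetName": "Main St", "numLanes": 2}}
print(translate(street, STREET_TO_ROAD))
```

Because the table is plain data, it can be published and maintained alongside the shared schemas rather than buried in each application.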
Web Feature Services (WFS) can provide the necessary movement of data and can do so in a transparent manner, possibly exploiting the schema-based data transformations referenced above. Furthermore, advanced WFS can also apply on-the-fly data integrity checks to provide assurance that data meets the required "community" data quality standards. Moreover, this can be done in an open and transparent fashion.
Web Registry Services (WRS) can be used to "register" the members of the community - i.e. the set of data providers and processing services that make up a given natural management zone - and can enable automated (machine driven) access to information resources distributed on the various WFS. Furthermore, the WRS can manage projects and other activities managed by the multiple jurisdictions (multiple agencies) that interact within the management zone.
In effect, SDI technology provides the foundation layer to create a virtual (or realized) information base that underpins the decision support applications on which management of natural zones will depend. Different government agencies can thus more readily co-operate (and negotiate) on how the zone is to be managed. Moreover such an infrastructure can be deployed in such a way as to survive the innumerable reorganizations that government agencies and large corporations are heir to.
By providing this information base without upsetting the existing apple cart of political and administrative authorities we may find a way forward to manage our interaction with the world in a saner manner than is possible today. Perhaps this arcane world of XML and Web Services may make a not insignificant contribution to our long term survival.
In the recent book Collapse, by Jared Diamond, the author asks us to speculate on what was in the mind of the Easter Islander as he felled the last remaining tree on the island. Perhaps he thought there were still trees elsewhere? Perhaps the trees would still grow back? Perhaps they could obtain timber from another source on a nearby island? Of course we can never know the answer to these questions. Diamond asks us to speculate in order to get us to reflect on our own decisions in the 21st century.
Regardless of your views on the question, it is clear that the decision of the tree cutter, like the manifold decision makers of our era, depends on access to information. Increasingly, it is also becoming clear that the information that we need to access, and the domains on which we may need to make decisions are not likely to coincide with the administrative regions that we as political animals have heretofore established. Our world is a complex set of interacting systems within which there are natural regions or zones. Simple examples are obvious enough such as watersheds and ocean basins. These bio-geo-climatic zones provide in effect a natural decomposition of the world, and may serve to provide a better basis for long term management of the planet than our current political boundaries are able to. They may also provide the basis for an ecologically focused economics in which the flows of natural capital are integrated into the flows of monetary capital, for these two things are inextricably interconnected whether we acknowledge it or not.
The GeoWeb, as we have used the term in these pages, refers to the ability to transparently share information about the world without regard to vendor technology, and which at the same time respects the stewardship of information by various organizations. In our current context we could see the GeoWeb as providing the information base for our natural zonal accounting system, since the purpose of any accounting system is to make visible what is going on. In the corporation it is to make visible the components of the company that function well and those which need improvement. In the context of managing the world around us it is no different. In order to act, we must know what is happening, else we too may cut down that last remaining tree and not be around for further speculation.
Many have asked me why GML offers application schemas. Would it not be simpler, they say, if GML just provided one single schema, and we used that for all of our geographic information? Isn't interoperability hindered by this open ended nature of GML application schemas? One even sees these kinds of remarks in OGC discussion papers (e.g. SOS).
One response to this is to consider interoperation between relational databases. They all understand SQL, and they all understand relational schemas. The relational schemas are different from one implementation to another - no one requires that all RDBMS support the same schema - in fact the idea is silly from the outset - after all the different DBMS instances are all supporting different applications - different domains of interest. It is of course, exactly the same for GML. GML application schemas apply to different domains of interest, and are different in just the same way that relational schemas are.
For some this is still not sufficient. They still seem to think that some sort of closed GML schema is sufficient. One can respond in two ways. Yes and No. Yes, they are right in the sense that there are some things we ALL might agree upon (i.e. hold across multiple application domains) - for these GML has provided standard explicit encodings. These include MOST of the base things that people need to share like geometry, topology, observations, coverages, coordinate reference systems, units of measure, time and direction. These items are covered by FIXED schema components. No, in that GML does not want to invent yet another schema language, and this is exactly what is required if we are to have a single GML schema that covers a broad range of application domains. For schema definition we have elected (at least for the near term) XML Schema. Readers may note that GML has used other schema languages in the past such as DTD and RDF - other schema languages may be used in the future - but it does not make sense to try to create yet another one within the boundaries of GML.
SDI = Spatial Data Infrastructure. Every national government seems to have one. We even talk of a GSDI (Global) - but there have been few if any realizations. What do we mean by SDI? How close are we to creating an SDI with commercial software technology? What functionality should an SDI provide?
Let's put things in some concrete context. You are a highway or subdivision planner. To do your job you need access to lots of information. Location of existing street and highway networks, the water network, the electrical system, telecommunication systems etc. All information that is likely held by multiple organizations in a multitude of formats and with many disparate and possibly inconsistent data models. You will use that information in a variety of planning, design and project management tools to create proposed highway designs, subdivision designs etc. which you will need to share with your colleagues in the transportation authorities, land development organization, building approval, land reclamation etc. Furthermore you will need to be able to share this information in a secure manner and such that some people can see some things and others can see other things. The things you share will be both actual existing structures and proposed and planned ones. The things you want from others will be much the same. So information must flow in a controlled and secure manner between multiple parties - in as near to real time a manner as possible. To achieve this, however, neither you nor your colleagues want to give up the planning, design and project management software you have grown to love and to hate. Better the devil you know than the one you do not. So this SDI thing must do a lot. It is clear that:
An SDI is much more than a portal:
A portal is a set of presentation services - user interfaces - that provide access to things for people. From a portal I could look at maps in my web browser - but only if there is a set of back end services. Moreover, since I want to continue to use my existing planning, design and project management tools - unless these are all integrated into the portal I am not going to be very happy. A portal is then just part of the story.
Note that I really do need much more than just the presentation of maps. This is very nice for planning and for discussion - but I need the actual dimensions and other properties of structures and natural objects - how else can I plan the ones I will introduce into the world? So an SDI must provide:
Universal Data Discovery:
I need a way to find all those needed information sources. I need a way to determine my access rights. I need a way to specify the access rights of those with whom I am willing to share my data - my plans, designs and proposals for the future.
I need a way to register what I am interested in and find out what is available and how to get it. Ideally I can access all of the needed information online - but we all know this is off somewhere in the future - but I do need to access what I can access and access it now in real time. And please don't tell me I need to worry about format conversion - or changes of coordinates and such. Surely the SDI can help me with ...
Universal Data Access:
Of course, I don't expect that all data will be free or freely accessible. After all I know much of the information I have is confidential (the new highway route is significant economically and premature release of this information could be disastrous). In some cases I know I will need to pay for data, in some cases not. The SDI should enable these kinds of access - meaning access based on who you are and access based on whether or not you have paid your accounts. So Universality yes, but circumscribed by appropriate access control. So the SDI needs to support Universal Access with Universal Access Control and Authentication - meaning across the SDI, hence across the community of interest.
Of course Universal Data Access is not all that helpful unless I can access the models of the information being supplied. What is a road in one community is something else in another. So there must be a means for the SDI to provide access to the
Community Vocabulary:
By this I mean the various objects shared by the community - their names and properties - does a road have a width? a surface type? a classification? How are these expressed? I am not sure I need (or could use as yet) a full ontology - but at least a dictionary of the shared (common to the community) objects is needed and in both human and machine readable terms.
Given this information - I can advertise my own data to suit or provide the appropriate translation between my view of the world and that which I share with the community. So I expect the SDI to provide me access to these common vocabularies and perhaps some tools to help me with translation.
While I think of it - I am not sure I really want to think about the SDI itself all that much. Even going to a portal seems outside my normal experience. In fact, I think a requirement for an SDI should be transparency or even invisibility.
Transparency:
If I work in planning, I already have a set of tools that I work with - ones I am used to and ones which have developed over time in my own field - hence provide user features that enhance my productivity. I am not going to give these up for an SDI. Of course, one should not have to. The SDI should take care of that for you. The SDI should enlarge, in as transparent a manner as possible, your application's access to data and services beyond your application domain - but in such a way that your existing application can use them. This is in effect the key problem for SDI.
Even transparency is not enough, however. Why should an SDI be restricted to spatial information? How could one restrict it to spatial information anyway? Would I need to be able to distinguish between spatial and non-spatial information? I am not sure I would even know how to do that. So maybe the spatial in SDI has as much to do with the spatial distribution of the actors involved as it does with the partly spatial character of the information being manipulated. Or perhaps it is the spatial (geographic) nature of the information that demands integration because of the inherently integrated nature of the world.
There are indeed many points to ponder - there is much more to SDI than we might first suppose.