Tuesday, May 17, 2005

Ontological Classification

I contend that IT standards need to be organized around a taxonomy, where taxonomy is defined as "classification into ordered groups or categories." A taxonomy is necessary to ensure that similar products with overlapping functionality cluster close together. That clustering serves the primary rationale for establishing IT standards: reducing complexity and increasing efficiency through consolidation and improved collaboration.

While you'd think it impossible to imagine an uglier word than taxonomy, I believe the IT industry may well have found one with the recent rise in popularity of the term ontology. Defined as a branch of metaphysics, ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related.

The newfound appeal of the word ontology is probably a direct result of its use within the W3C's next-generation initiative, the Semantic Web.

In Scientific American, May 2001, Tim Berners-Lee, original creator of the World Wide Web and inventor of HTML and HTTP, wrote:

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

The Semantic Web's Ontology Language is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. This ontology language formally describes the meaning of terminology used in Web documents.

In his essay "Ontology is Overrated: Categories, Links, and Tags", Clay Shirky describes ontology as follows:

It is a rich irony that the word "ontology", which has to do with making clear and explicit statements about entities in a particular domain, has so many conflicting definitions. I'll offer two general ones.

The main thread of ontology in the philosophical sense is the study of entities and their relations. The question ontology asks is: What kinds of things exist in the world, and what manner of relations can those things have to each other? Ontology is less concerned with what is than with what is possible.

The knowledge management and AI communities have a related definition -- they've taken the word "ontology" and applied it more directly to their problem. The sense of ontology here is something like "an explicit specification of a conceptualization."

Shirky goes on to define categorization and classification as the act of organizing a collection of entities into related groups, and ontological classification or categorization as organizing a set of entities into groups based on their essences and possible relations.

Chemistry's Periodic Table of Elements, which organizes the elements by the number of protons in the nucleus, is probably the best classification ever created.

Another famous example of categorization and classification is the Dewey Decimal Classification, used in the card catalogs of American libraries.

Ontological classification works best when you need a domain-specific hierarchy with formal categories. To help make ontology a workable classification strategy, you need expert catalogers and coordinated expert users. On the other hand, if you've got a large, ill-defined corpus, naive users, catalogers who aren't experts, and no one to say authoritatively what's going on, then ontology is going to be a bad strategy. Based on this, it's not necessarily clear how good a fit the Semantic Web's Ontology Language will be. However, a Technology Architecture for specifying IT standards is an ideal candidate for ontological classification.

Look, for example, at IT Infrastructure. It consists of Clientware, Middleware, Serverware, Manageware, and Platforms, as illustrated below:

[Image: IT Infrastructure model]

The $64,000 question is, how well does the above model describe infrastructure in terms of its descriptive and predictive value?
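
One rough way to probe that question is to encode the model and see whether overlapping products really do land next to each other. The following is only a minimal Python sketch, not the actual ITscout model: the five top-level categories come from the text above, while the subcategories ("Message Queuing", "Web Servers"), the product names, and the helper names `Category`, `classify`, and `overlaps` are hypothetical and exist purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the taxonomy: a named group holding subcategories and products."""
    name: str
    children: dict[str, Category] = field(default_factory=dict)
    products: list[str] = field(default_factory=list)

    def child(self, name: str) -> Category:
        # Create the subcategory on first use so paths can be added incrementally.
        return self.children.setdefault(name, Category(name))


def classify(root: Category, path: list[str], product: str) -> None:
    """File a product under the category reached by walking `path` from the root."""
    node = root
    for name in path:
        node = node.child(name)
    node.products.append(product)


def overlaps(root: Category) -> dict[str, list[str]]:
    """Return each category path that holds more than one product.

    Products sharing a category are assumed to overlap in functionality,
    which makes them candidates for consolidation.
    """
    found: dict[str, list[str]] = {}

    def walk(node: Category, trail: list[str]) -> None:
        here = trail + [node.name]
        if len(node.products) > 1:
            found[" > ".join(here)] = sorted(node.products)
        for sub in node.children.values():
            walk(sub, here)

    walk(root, [])
    return found


# Top-level categories are taken from the post; everything below them is made up.
infrastructure = Category("IT Infrastructure")
for top in ("Clientware", "Middleware", "Serverware", "Manageware", "Platforms"):
    infrastructure.child(top)

classify(infrastructure, ["Middleware", "Message Queuing"], "Product A")
classify(infrastructure, ["Middleware", "Message Queuing"], "Product B")
classify(infrastructure, ["Serverware", "Web Servers"], "Product C")

print(overlaps(infrastructure))
```

In this toy data, the two hypothetical message-queuing products end up in the same category and are reported together, while the lone web server is not. That is the descriptive half of the question; whether a new product can be placed without inventing a new category is one crude test of the predictive half.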


Anonymous said...

Clay's points on when NOT to use formal ontology make sense to me. The points you are making in reference to the graphics would be helped if those graphics were visible in my IE browser.


5:07 PM  
Blogger ITscout said...

To view the graphic in detail, you have a couple of choices.

You can access the PDF version of the IT Infrastructure Roadmap poster and use Acrobat's Zoom In Tool.

Alternatively, you can log in to ITscout and use Flash's Zoom In feature.

11:59 PM  
Blogger Jack Krupansky said...

I expect the confusion between the terms taxonomy and ontology will continue for quite some time.

The easy way to understand the difference is to think of an ontology as a vocabulary: the words you use to describe entities and their relationships and interactions. An ontology is essentially the set of characteristics or attributes used to describe entities. A taxonomy is a hierarchical categorization of entities based on *some* selected characteristics.

Taxonomy is nice when it actually elaborates some interesting entity characteristics, but a lot of situations have entities which don't nest cleanly into anything other than a relatively flat hierarchy.

Ontology is the right place to start; that's what a database architect does.

In the case of open semantic networks, some standards for ontology are essential. The actual taxonomy will evolve and "emerge" over time as actual entities come into existence that utilize the attributes described by the ontology.

-- Jack Krupansky

3:05 PM  
