The Semantic Web: A guide to the future of XML, Web services and knowledge management


By Michael C. Daconta, Leo J. Obrst and Kevin T. Smith

 

Reviewed by B. Goolpour

 

Overview:

 

XML is now a universal syntax for exchanging data between organizations. By agreeing on a standard schema, organizations can produce text documents that can be validated, transmitted and parsed by any application, regardless of hardware or operating system.


XML underlies many of today's applications, and using a single syntax for communication enables organizations to exchange information both inside and outside the company.


Knowledge management involves exchanging large amounts of data while capturing, cataloguing and disseminating it through intranets, extranets and the internet, so understanding XML is an important part of studying knowledge management. XML appears to be driving the future of knowledge management, with the semantic web as a suitable platform for representing and transforming knowledge.

 

What is the semantic web and where does it apply?

 

“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines.” Tim Berners-Lee (1999)

 

Tim Berners-Lee has a two-part vision: first, to make the web a more collaborative medium, and second, to make the web understandable and processable by machines. This vision depends on typed relationships between pieces of data, such as “includes”, “describes” and “writes”, being shared between different applications. Unfortunately these relationships are not yet captured on the web at large; the building blocks are so far deployed only in small domains and prototypes. It is important to know that Berners-Lee's original vision encompassed additional metadata above and beyond what is currently on the web.

 

The semantic web is a machine-processable web of smart data, where smart data is data that is application independent, composable, classified and part of a larger information ecosystem.

 

The semantic web is not just for the internet; companies will need it on their intranets and extranets too, because all applications must share a common language to understand one another. In general, the need for the semantic web can be explained as follows:

 

Information overload severely restricts the reuse of experience, information and existing data. In other words, much of the information received remains unused because there is no time to process it.

 

A stovepipe system is one in which all the components are hardwired to work only with each other, so information travels only between the clients that share a single database with a fixed schema. Stovepipe systems need to be broken down on all tiers of the enterprise information architecture; semantic web technologies will be especially effective in breaking down stovepiped database systems.

 

Poor content aggregation is another problem that the semantic web will help solve. Summarizing information from different sources, in areas such as financial account aggregation, portal aggregation, comparison shopping and content mining, is very difficult today.

 

Traditional knowledge management technologies face new challenges from today's internet: information overload, the inefficiency of keyword searching, the lack of authoritative information and the lack of natural-language computer systems. The semantic web will bring the technology to tag information with machine-understandable markup and to assure that information is authoritative.

 

A collection of information with semantics can lead to knowledge that enables staff to make well-informed decisions and allows clients to find hidden relationships among data that already exists in databases.
 

Another important application of the semantic web is business development, in scenarios such as customer relationship management and decision making.

 

Semantic web and smart data:

 

The semantic web will be built on the Extensible Markup Language (XML). The shift of intelligence from applications to data will continue as data becomes machine processable and machine understandable, and each step along this path makes the data smarter.
 

The smart data continuum can be shown in four stages:

 

Text and databases (pre-XML):

 

In the initial stage, most data is proprietary to an application and cannot move between applications. The smartness is therefore in the application, not in the data.

 

XML documents for a single domain:

 

In this stage data achieves application independence within a specific domain. Data is now smart enough to move between applications in a single domain.

 

XML is a set of syntax rules for creating semantically rich markup languages in a particular domain. In other words, you apply XML to create a new language, and any language created via the rules of XML is called an application of XML. One of the key advantages of XML is that it provides a robust, simple and standard syntax for encoding the meaning of metadata.

 

The concept behind XML is not new. XML is a subset of the Standard Generalized Markup Language (SGML), which grew out of the Generalized Markup Language (GML) invented in 1969 by Dr. Charles Goldfarb, Ed Mosher and Ray Lorie. The concepts behind XML were thus devised over 30 years ago and have been continuously perfected, tested and broadly implemented since.

 

The XML specification defines two levels of conformance for XML documents: well formed and valid. Well-formedness is mandatory, while validity is optional. A well-formed XML document complies with all the W3C syntax rules of XML (explicitly called out in the XML specification as well-formedness constraints), such as naming, nesting and attribute quoting. This requirement guarantees that an XML processor can parse the document without error. A valid XML document references and satisfies a schema, which is a separate document whose purpose is to define the legal elements, attributes and structure of an XML instance document. In general, a schema defines the legal vocabulary and the number and placement of elements and attributes in a markup language, so a particular type or class of document is well defined by its schema. (http://www.w3c.org)
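
The difference between the two conformance levels can be illustrated with a small Python sketch. It uses the standard library parser to test well-formedness. The validity check is only a toy stand-in for a real schema: the <book> vocabulary, the REQUIRED_CHILDREN list and both function names are assumptions made for this illustration and do not come from the book.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. it is well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical stand-in for a schema: the children <book> must have, in order.
REQUIRED_CHILDREN = ["title", "author"]


def is_valid_book(text: str) -> bool:
    """Return True if the document is well formed and matches the toy schema."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    if root.tag != "book":
        return False
    child_tags = [child.tag for child in root]
    return child_tags == REQUIRED_CHILDREN


good = "<book><title>The Semantic Web</title><author>Daconta</author></book>"
no_author = "<book><title>The Semantic Web</title></book>"
broken = "<book><title>The Semantic Web</book>"  # <title> is never closed

for name, doc in [("good", good), ("no_author", no_author), ("broken", broken)]:
    print(name, is_well_formed(doc), is_valid_book(doc))
```

The second document parses without error but fails the toy schema, which is the sense in which validity is an optional, stricter level on top of well-formedness. A real project would use an actual schema language such as DTD or XML Schema with a validating parser.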

 

With the specifications mentioned above, XML has become the universal syntax for exchanging data and information between organizations. By agreeing on a standard schema, organizations can produce text documents that can be validated, transmitted and parsed by any application regardless of hardware or operating system.

 

XML Taxonomies and documents with mixed vocabularies:

 

Taxonomy is, basically, the study of the general principles for classifying objects into hierarchical levels. Using taxonomies to classify information is important because data and information can then be found by general topic as well as by keyword.

 

In this stage, data is organized into classifications, and the classifications can be used to discover data. Simple relationships between categories in the taxonomy can be used to relate, combine and process the data. Data is now smart enough to be discovered and combined within a specific domain.
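
As a rough illustration of discovery through a taxonomy, the Python sketch below files a few documents under categories and then finds everything under a broader category. The category names, file names and helper functions are all hypothetical and chosen only for this example.

```python
# Hypothetical taxonomy: each category maps to its broader parent category.
PARENT = {
    "web standards": "technology",
    "XML": "web standards",
    "RDF": "web standards",
    "knowledge management": "business",
}

# Hypothetical documents, each filed under exactly one category.
DOCUMENTS = {
    "xml-primer.txt": "XML",
    "rdf-notes.txt": "RDF",
    "km-strategy.txt": "knowledge management",
}


def ancestors(category):
    """Return the chain of broader categories above the given one."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def find_documents(category):
    """Return documents filed under the category or any narrower category."""
    return sorted(
        name
        for name, filed_under in DOCUMENTS.items()
        if filed_under == category or category in ancestors(filed_under)
    )


print(find_documents("web standards"))
print(find_documents("business"))
```

A search for "web standards" returns the XML and RDF documents even though neither contains that phrase, which is the advantage of classification over plain keyword matching.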

 

XML Ontologies and rules:

 

An ontology defines the common words and concepts, and their meanings, used to describe and represent an area of knowledge, so that fine, accurate, consistent and meaningful distinctions can be made among classes, instances, properties, attributes and relations. Some ontology tools can perform automated reasoning over ontologies and thus provide advanced services to intelligent applications such as conceptual semantic search, software agents, decision support, knowledge management, intelligent databases and electronic commerce. An ontology can range from a simple taxonomy (knowledge with minimal hierarchical structure) to a conceptual model or to a logical theory with rich, complex, consistent and meaningful knowledge.
 

In this stage new data can be inferred from existing data by following logical rules. In essence, data is now smart enough to be described with concrete relationships and sophisticated formalisms, so that logical calculations can be made on this “semantic algebra”. This allows the combination and recombination of data at a more atomic level and very fine-grained analysis of data.
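
A minimal sketch of rule-based inference, assuming facts are stored as simple (subject, relation, object) triples, might look like the Python below. The facts, the relation names subclass_of and instance_of, and the two rules are invented for this illustration; real ontology languages and reasoners are far richer.

```python
# Hypothetical facts as (subject, relation, object) triples.
FACTS = {
    ("Manager", "subclass_of", "Employee"),
    ("Employee", "subclass_of", "Person"),
    ("alice", "instance_of", "Manager"),
}


def infer(facts):
    """Apply two rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for (a, rel1, b) in known:
            for (c, rel2, d) in known:
                if b != c or rel2 != "subclass_of":
                    continue
                if rel1 == "subclass_of":
                    # Rule 1: subclass_of is transitive.
                    new.add((a, "subclass_of", d))
                elif rel1 == "instance_of":
                    # Rule 2: an instance of a class is also an
                    # instance of each of its superclasses.
                    new.add((a, "instance_of", d))
        if new <= known:
            return known
        known |= new


for fact in sorted(infer(FACTS) - FACTS):
    print(fact)
```

Nothing in the stored facts says that alice is a Person; that statement is derived by the rules, which is what it means for data to be inferred.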

 

What are the steps for learning about semantic web implementation, and how does this book help?

 

In most cases, information captured from the environment is lost before it is saved or transformed, and the knowledge process stops there. Otherwise the data is saved as a digital file and indexed by a search engine. An organization typically has many stovepiped databases, which leads to a lack of integration that grows with the complexity of the organization. Any attempt to combine this information is a tedious process, involving data conversion, incompatible software systems and frustrated system integrators. Often the organization must pay a great deal of money to a system integrator to create yet another very expensive stovepiped system that integrates with the other parts of the system and speaks their language. Only at that point does the organization's existing information become discoverable and accessible. Even then, search engine discovery is mostly based on keywords and Boolean logic, which returns irrelevant results along with relevant ones.
 

All of these challenges can lead an organization to use semantic web technologies to craft an information architecture vision that touches every part of the organization's life cycle.

 

In “The Semantic Web” the related material is organized as follows:

- The first step in implementing the semantic web is convincing managers, who need to understand the reasoning behind the change but may not want to focus on the technologies. This book was written with management in mind, and chapters 1 and 2 have enough material to understand the semantic web vision and its applications. Chapters 3 and 6 are about XML, the foundation of the semantic web, which is introduced well there. Web services wrap all the functions. Chapter 5 covers RDF, which works as an intermediary between an XML file and the taxonomy and ontology. Thus a document is XML inside, RDF outside, filed in a branch of a taxonomy and related to classes in the ontology. Chapters 7 and 8 are about taxonomies and ontologies.

 

This book changed my technical attitude toward web services, e-business, communication over networks and even knowledge management. I did not have a useful understanding of the technical issues in this field, which was the main reason I chose this title to review. The easygoing language of the book helped me understand all of the material very easily.