The Semantic Web: A Guide to the Future of XML, Web Services and Knowledge Management
By Michael C. Daconta, Leo J. Obrst and Kevin T. Smith
Overview:
XML is nowadays the universal syntax for exchanging data between organizations. By using a standard schema, organizations can produce text documents that can be validated, transmitted and parsed by any application, regardless of hardware or operating system.
What is the Semantic Web and where is it applied?
“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines.” Tim Berners-Lee (1999)
Tim Berners-Lee has a two-part vision: first, to make the web a more collaborative medium, and second, to make the web understandable and processable by machines. This vision deals with types of relationships and data transfers between different applications, such as “includes”, “describes” and “writes”. Unfortunately, these relationships are not yet captured, and the building blocks are so far being deployed only in small domains and prototypes.
It is
so important to know that the original vision of Berners-Lee
encompassed
additional
The Semantic Web is a machine-processable web of smart data, where smart data can be defined as data that is application-independent, composable, classified and part of a larger information ecosystem.
It is clear that the Semantic Web is not just for the Internet. Companies will need it on their intranets and extranets too, because all applications should use the same language in order to understand each other. In general, the need for the Semantic Web can be explained as follows:
Information overload is a major obstacle to reusing experience, information and existing data. In other words, much of the information received remains unused because there is no time to process it.
A stovepipe system is one in which all the components are hardwired to work only with each other. Information therefore travels only between the clients that use a single database with a fixed schema. Stovepipe systems need to be broken down on all tiers of the enterprise information architecture; however, Semantic Web technologies will be most effective in breaking down stovepiped database systems.
Poor content aggregation is another problem that the Semantic Web will address. Summarizing information from different sources, in fields such as financial account aggregation, portal aggregation, comparison shopping and content mining, is currently very difficult.
Traditional knowledge management technologies face new challenges on today's Internet: information overload, the inefficiency of keyword searching, the lack of authoritative information and the lack of natural-language computer systems. The Semantic Web will bring the ability and the technology to tag information with machine-understandable markup and to assure that information is authoritative.
Collecting information with semantics may lead to knowledge that enables staff to make well-informed decisions and allows clients to find hidden relationships between data in databases that already exist.
Another important application of the Semantic Web is business development, in scenarios such as customer relationship management, decision making and so forth.
Semantic Web and smart data:
The Semantic Web will be built on the Extensible Markup Language (XML). The shift of emphasis from applications to data will continue as data becomes machine-processable and machine-understandable, and each step along this path makes the data smarter.
The smart data continuum can be shown in four stages:
Text and databases (pre-XML):
In the initial stage, most data is proprietary to an application and cannot move between applications. The smartness is therefore in the application, not in the data.
XML documents for a single domain:
In this stage, data achieves application independence within a specific domain. Data is now smart enough to move between applications in a single domain.
XML is a set of syntax rules for creating semantically rich markup languages in a particular domain. In other words, you apply XML to create a new language, and any language created via the rules of XML is called an application of XML. One of the key advantages of XML is that it provides a robust, simple and standard syntax for encoding the meaning of data in metadata.
The concept behind XML is not new. XML is a subset of the Standard Generalized Markup Language (SGML), whose predecessor GML was invented in 1969 by Dr. Charles Goldfarb, Ed Mosher and Ray Lorie. The concepts behind XML were thus devised decades ago and have been continuously refined, tested and broadly implemented.
The XML specification defines two levels of conformance for XML documents: well-formed and valid. Well-formedness is mandatory, while validity is optional. A well-formed XML document complies with all the W3C syntax rules of XML (explicitly called out in the XML specification as well-formedness constraints), such as naming, nesting and attribute quoting. This requirement guarantees that an XML processor can parse the document without error.
A valid XML document references and satisfies a schema. A schema is a separate document whose purpose is to define the legal elements, attributes and structure of an XML instance document. In general, a schema defines the legal vocabulary and the number and placement of elements and attributes in a markup language, so a particular type or class of document is well defined by its schema. (http://www.w3c.org)
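The well-formedness level described above can be checked with any XML parser. The following is a minimal sketch using Python's standard library; the sample documents and the helper name `is_well_formed` are invented for illustration and do not come from the book. It covers only well-formedness: checking validity would additionally require a schema and a schema-aware validator, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, invented for this illustration.
good_doc = '<book id="b1"><title>The Semantic Web</title></book>'
bad_nesting = "<book><title>The Semantic Web</book></title>"  # tags overlap
unquoted_attr = "<book id=b1><title>The Semantic Web</title></book>"  # no quotes

for doc in (good_doc, bad_nesting, unquoted_attr):
    print(is_well_formed(doc))
```

Only the first document passes; the other two break the nesting and attribute-quoting rules mentioned above, so the parser rejects them.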
With these properties, XML has become the universal syntax for exchanging data and information between organizations. By agreeing on a standard schema, organizations can produce text documents that can be validated, transmitted and parsed by any application, regardless of hardware or operating system.
XML taxonomies and documents with mixed vocabularies:
Taxonomy is, basically, the study of the general principles for classifying objects into different hierarchical levels. Using taxonomies to classify information is important because data and information can then be mined by general idea as well as by keyword.
In this stage, data is organized into classifications, and the classification can be used to discover data. Simple relationships between categories in the taxonomy can be used to relate, combine and process the data. Data is now smart enough to be discovered and combined in one specific domain.
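As a rough illustration of discovery through a classification, the sketch below files a few items under categories in a small hierarchy and then finds everything that belongs to a category or any of its subcategories. The category names, file names and function names are all hypothetical, chosen only for this example.

```python
# Hypothetical taxonomy: each category maps to its parent (None marks the root).
PARENT = {
    "Document": None,
    "Report": "Document",
    "Financial report": "Report",
    "Memo": "Document",
}

# Hypothetical items, each filed under exactly one category.
ITEMS = {
    "q3-earnings.xml": "Financial report",
    "status-update.xml": "Report",
    "lunch-menu.xml": "Memo",
}


def ancestors(category):
    """Return the category followed by all of its ancestors up to the root."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain


def discover(category):
    """Return the sorted items filed under a category or any of its descendants."""
    return sorted(item for item, cat in ITEMS.items() if category in ancestors(cat))


print(discover("Report"))
```

Asking for "Report" also returns the item filed under "Financial report", because the parent/child relationship between the two categories is part of the data rather than of the application.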
XML Ontologies and rules:
An ontology defines the common words and concepts, and their meanings, used to describe and represent an area of knowledge, so that fine, accurate, consistent and meaningful distinctions can be made among classes, instances, properties, attributes and relations. Some ontology tools can perform automated reasoning using ontologies and thus provide advanced services to intelligent applications such as conceptual semantic search, software agents, decision support, knowledge management, intelligent databases and electronic commerce. An ontology can range from the simple notion of a taxonomy (knowledge with minimal hierarchy) to a conceptual model or to a logical theory with rich, complex, consistent and meaningful knowledge.
In this stage, new data can be inferred from existing data by following specific logical rules. In essence, data is now smart enough to be described with concrete relationships and sophisticated formalisms, so that logical calculations can be made on this “semantic algebra”. This allows the combination and recombination of data at a more atomic level and very fine-grained analysis of data.
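To make the idea of inferring data from logical rules concrete, here is a small sketch that stores facts as (subject, predicate, object) triples and repeatedly applies two rules until nothing new can be derived. The facts, the predicate names `subClassOf` and `type` (used here as plain strings), and the function `infer` are assumptions made for this example; it is not the mechanism of any particular ontology tool.

```python
def infer(triples):
    """Apply two rules until no new facts appear (forward chaining).

    Rule 1: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A) and (A subClassOf B)        =>  (x type B)
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Both rules chain a fact whose object is the subject
                # of some subClassOf fact.
                if p2 == "subClassOf" and o == s2 and p in ("subClassOf", "type"):
                    new.add((s, p, o2))
        if new <= facts:  # nothing new was derived, so stop
            return facts
        facts |= new


# Hypothetical facts, invented for this illustration.
stated = {
    ("Novel", "subClassOf", "Book"),
    ("Book", "subClassOf", "Publication"),
    ("moby_dick", "type", "Novel"),
}

inferred = infer(stated)
print(("moby_dick", "type", "Publication") in inferred)
```

Nobody stated that `moby_dick` is a publication; the fact follows from the rules, which is the sense in which data at this stage can be inferred rather than only stored.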
What are the steps in learning about Semantic Web implementation, and how does this book work?
In most cases, information captured from the environment is lost before it is saved or transformed, and the knowledge process stops there. Otherwise, the data is saved as a digital file and indexed by a search engine. An organization typically has many stovepiped databases, which leads to a lack of integration that grows with the complexity of the organization. Any attempt to combine this information is a tedious process involving data conversion, incompatible software systems and frustrated system integrators.
Most of the time, the organization has to pay a system integrator a great deal of money to create yet another very expensive stovepiped system that integrates with the other parts of the system and speaks their language. Only at this point does the organization's existing information become meaningful and accessible for discovery. The discovery process of a search engine is usually based on keywords and Boolean logic, which may return relevant or irrelevant results.
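The limitation of keyword and Boolean search mentioned above can be seen in a toy example. The sketch below is an assumption-laden illustration, not a description of any real search engine: the three documents and the function `boolean_and` are invented, and matching is done on whole lowercase words only.

```python
# Hypothetical mini-corpus, invented for this illustration.
DOCS = {
    "d1": "jaguar dealership opens new showroom for luxury cars",
    "d2": "the jaguar is the largest cat in the americas",
    "d3": "used cars for sale near the city centre",
}


def boolean_and(query):
    """Return ids of documents containing every query keyword (Boolean AND)."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if all(term in text.split() for term in terms)
    )


print(boolean_and("jaguar"))       # mixes the car brand and the animal
print(boolean_and("jaguar cars"))  # narrower, but only by exact words
print(boolean_and("automobile"))   # misses d1 and d3, which are about cars
```

The search matches strings, not meanings: it cannot tell the two senses of "jaguar" apart, and it cannot know that "automobile" and "cars" refer to the same concept.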
All of the challenges mentioned above can lead an organization to use Semantic Web technologies to craft an information architecture vision that touches every part of the organization's life cycle.
In “The Semantic Web”, the related material is organized as follows:
-The first step in a Semantic Web implementation is convincing the managers, who need to understand the reasoning behind the change but may not want to focus on the technologies. The book was written with management in mind, and chapters 1 and 2 have enough material to understand the Semantic Web vision and its applications. Chapters 3 and 6 are about XML, the foundation of the Semantic Web, and XML is well introduced there. Web services wrap all the functions. Chapter 5 covers RDF, which works as an intermediary between an XML file and the taxonomy and ontology. Thus a document is XML inside, RDF outside, filed in a branch of a taxonomy and related to classes in the ontology. Chapters 7 and 8 are about taxonomies and ontologies.
This book changed my technical attitude to web services, e-business, communication through networks and even knowledge management. I did not have a useful understanding of the technical issues in this field, and that was the main reason I chose this title to review. The easygoing language of the book helped me understand all of the material very easily.