Downes.ca ~ Atlantic Workshop on Semantics and Services

Atlantic Workshop on Semantics and Services - Day 2

Jun 15, 2010
By Stephen Downes

Originally posted on Half an Hour, June 15, 2010.

Ontology-centric Knowledge Discovery in a Contact Centre for Technical Product Support
Brad Shoebottom, Innovatia

This paper outlines a technical support product that helps people find answers to customers' questions. It annotates named elements in technical documents, with the annotations derived from an ontology via a web services framework.

Description of a pilot study of searches with up to four terms. Phase 1 - ontology and usability test. This has been completed, and asked whether users can find answers to some common queries. Phase 2, scenario testing, begins next month.

The pilot survey results showed that people were finding results more quickly, including on the visual query. There were testing challenges, especially in finding enough people. Extra time was needed to gather baseline results with the old toolset, because no baseline existed.

Performance metrics: productive vs non-productive time, first call resolution, case closed timeframe, filtration rate, and revenue model.

The benefit for New Brunswick is - if we can save, on the technical support side, we can save a lot of money. NB employs 18,000 people in contact centres. The projected savings is 26 percent. It's a reusable methodology applicable across multiple platforms.

Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties....
A Kouznetsov, UNB-SJ

Why text mining? A significant amount of time is spent populating ontologies. We want to reduce the workload. Challenges - complications - multiple properties between keywords. Eg. you have two key words, Sam and Mary - you want to know what the relations are between them, but there may be more than one.

Methodology - ontology-based information retrieval applies natural language processing (NLP) to link text segments and named entities. Each candidate linkage is given a (weighted) score. Properties are scored by using the distance between co-occurring terms.

Main tools used were GATE/JAPE for text mining and OWLAPI for the ontology. http://en.wikipedia.org/wiki/Jape_%28software%29

(Boxes and levels diagram - pre-processing, text segments processing, ontology population).

(Flow chart diagram with the same information)

Basically, the ontology seeds the search for candidates, which after scoring may result in new candidates for the ontology, as a result of co-occurrence.

(Flow chart of the co-occurrence based scores generator) uses a synonyms list.

Diagram of extensible data model for text and a more complicated version for tables.

(This talk is being given in semi-literate sentence fragments - the speaker looks at the slide, reads part of it, then adds a half-sentence expanding on it - SD)

Sample sentence: "We always base our decision of this part, but partially use this."

List of primary scoring methods for table segments. (This doesn't actually tell us how the scoring happens, consisting of overlapping circles representing the different terms).

Sentence scoring:
sentencescore = 1/(distance+1)+bonus
(for example)
The mechanism is based on domain and range (domain relates to subject, range to object).
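
As a rough illustration of the sentence score formula above, here is a minimal Python sketch. The talk did not say how distance is measured or how the bonus is set, so the token-based distance, the function name and the default bonus of zero are all assumptions made for illustration.

```python
def sentence_score(tokens, domain_term, range_term, bonus=0.0):
    """Score one sentence as 1/(distance+1) + bonus.

    Assumption: distance is the number of tokens between the first
    occurrence of the domain (subject) term and the range (object) term.
    """
    if domain_term not in tokens or range_term not in tokens:
        return 0.0  # the two terms do not co-occur in this sentence
    gap = abs(tokens.index(domain_term) - tokens.index(range_term)) - 1
    distance = max(gap, 0)  # adjacent (or identical) terms count as distance 0
    return 1.0 / (distance + 1) + bonus


tokens = "Sam is married to Mary".split()
score = sentence_score(tokens, "Sam", "Mary")  # three tokens lie between the terms
```

The "+1" in the denominator keeps the score finite when the two terms are adjacent, which is presumably why the slide wrote it that way.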

Quote from slide:
Normalization - Log(1.0+(NSD+1.0/Cd)*(NSR+1.0/Cr))
NSD - number of sentences domain occurred
Cd - domain synonyms list cardinality
NSR - number of sentences range occurred
Cr - range synonyms list cardinality
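
A minimal Python sketch of the normalization term, read literally from the slide quote above. Two things are assumptions here: that "Log" means the natural logarithm, and that "NSD+1.0/Cd" means NSD + (1.0/Cd) under ordinary operator precedence rather than (NSD+1.0)/Cd. The slide did not say how the term is combined with the sentence score, so this only computes the term itself.

```python
import math


def normalization(nsd, cd, nsr, cr):
    """Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr)), as quoted from the slide.

    nsd: number of sentences in which the domain term occurred
    cd:  cardinality of the domain synonyms list (must be > 0)
    nsr: number of sentences in which the range term occurred
    cr:  cardinality of the range synonyms list (must be > 0)
    """
    return math.log(1.0 + (nsd + 1.0 / cd) * (nsr + 1.0 / cr))


# Hypothetical counts, chosen only to exercise the function.
value = normalization(nsd=1, cd=1, nsr=1, cr=1)
```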

How can we evaluate our prediction? (or what? I don't know - this is the first mention of a prediction in the talk - SD) "Focus on positive class recall and positive class precision." Best results when both sentences and tables joined.
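
For reference, positive class precision and recall can be computed as below. This is a generic sketch, not the speaker's evaluation code; the triples are made-up placeholders, and treating each proposed property as a (subject, property, object) triple is an assumption.

```python
def positive_class_metrics(predicted, actual):
    """Return (precision, recall) for the positive class.

    predicted: set of property assertions the system proposed
    actual:    set of property assertions that are really true
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example data.
predicted = {
    ("Sam", "marriedTo", "Mary"),
    ("Sam", "worksWith", "Mary"),
    ("Sam", "managedBy", "Bob"),
}
actual = {
    ("Sam", "marriedTo", "Mary"),
    ("Sam", "worksWith", "Mary"),
    ("Mary", "managedBy", "Bob"),
    ("Bob", "worksWith", "Mary"),
}
precision, recall = positive_class_metrics(predicted, actual)
```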

(I give up... this is just an awful awful presentation)

Leverage of OWL-DL axioms in a Contact centre for Technical Product Support
A Kouznetsov, UNB-SJ

(ack, no, it's the same speaker!)

The 'semantic assistant' framework was developed at Concordia (a remarkably clear slide, must be borrowed). It's where semantic analysis services relevant to the user's current tasks are offered directly within the desktop application. It relies on an OWL ontology model.

(layers and boxes diagram, sideways)

We extended the system with an ontology and axiom extractor.

(New speaker - Chris Baker)

The point is that we are able to leverage mappings between synonyms and the name density in the client interface. It leverages any OWL ontology. Now we're looking at different plugins, so we don't have to rely on OpenOffice as a client. The idea is to help people in the middle of a workflow by telling them that somebody has already written a document about this.

The Semantic Assistant framework can be downloaded from Concordia - it's open source. http://www.semanticsoftware.info/

Basically we picked up a technology and played around with it.

NLP pipeline for protein mutation knowledgebase construction
Jonas B. Laurila, UNB-SJ

Knowledge about mutations is critical for many applications, eg. biomedicine. Protein mutations are described in scientific literature. Today, databases are curated manually, but the amount of literature is growing faster than humans can populate databases.

There is a typical description of mutations. Terms include the mutation, directionality of impact (increased, reduced), and the property. Also you need information about protein name, gene name, and organism name, which is usually in the title of the paper.

We created an upper-level ontology with proteins and different kinds of mutations.

(Diagram of the framework, consisting of a series of gates)

For example, named-entity recognition. We use a gazetteer list based on SwissProt, and store mappings in a database. To find mutation mentions, we use rules rather than a gazetteer, normalizing into a wNm-format. We identify protein functions from noun phrases extracted with MuNPEx (a NP extractor). We use JAPE rules to extract rates and quantities.
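
To make the rule-based step concrete, here is a small Python sketch that finds point-mutation mentions and normalizes them to wNm form (wild-type residue, position number, mutant residue, eg. A123G). The speaker's system used JAPE rules; this regular expression is a stand-in written for illustration, not their grammar, and it only covers simple substitutions written with one-letter or three-letter amino acid codes.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))
RESIDUE = "|".join(THREE_TO_ONE) + "|[" + ONE_LETTER + "]"
# Matches forms such as "Ala123Gly", "Ala-123-Gly" and "W45F".
MUTATION = re.compile(r"\b(" + RESIDUE + r")-?(\d+)-?(" + RESIDUE + r")\b")


def find_mutations(text):
    """Return mutation mentions in text, normalized to wNm form."""
    found = []
    for wild, position, mutant in MUTATION.findall(text):
        wild = THREE_TO_ONE.get(wild, wild)
        mutant = THREE_TO_ONE.get(mutant, mutant)
        found.append(f"{wild}{position}{mutant}")
    return found


mentions = find_mutations("The Ala123Gly and W45F variants reduced activity.")
```

A pattern this loose will also match strings that are not mutations, which is one reason a real pipeline needs the grounding step against the sequence.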

(This presenter is a low-talker and mumbler and sometimes gives up on sentences partway through with a 'whatever')

They also need to correctly position the mutation in the gene sequence - they use regular expressions to identify it (because the writer sometimes clips part of the sequence or changes the numbering scheme). mSTRAPviz is used to provide a visualization.

Once that is done, you can do queries. Eg. find all mutations on so-and-so that do not have an impact on the such-and-such activity. You can also do mutation grounding performance (ie., the number of correctly grounded mutations overall).

What's next is to modularize the work into web services, database recreation, and to reuse the work in phenotype prediction algorithms.

C-BRASS - Canadian Bioinformatics Resources and Semantic Services
Chris Baker, UNB-SJ

First widespread deployment of a grid framework where the messages are meaningful to machine interpreters. The idea is to create toolkits to 'lift' legacy resources into a semantic web framework.

Currently, there is a low uptake of semantic web integration in the bioinformatics community. This is because of challenges in implementing solutions, and a gap between what the services offer and what they need. The lack of semantics in service discovery makes them hard to discover and use. Semantic web services are designed to list services based on the meanings of the inputs and outputs.

SWS frameworks describe input & output data structures, operations of the web service. Eg. BioMoby is a service type ontology. Single term semantics are too simplistic, process descriptions are too complex. So we want to model the inputs and outputs in ontological models.

An end-user community doesn't have a process-model or business-model in mind when they're searching. They execute a BLAST alignment not because they want to run a sequence similarity matrix, but because they are looking for a certain sort of output. So ontologies of inputs and outputs are generated by the service.

SADI - Semantic Automated Discovery and Integration - a set of best practices for data representation. Eg: my service consumes OWL individuals of class #1 and returns OWL individuals of class #2.

(chart of some of the SADI recommendations)

How this works - a query-specific database is being generated dynamically as the query is being processed. Eg. 'find gene ontologies related to Parkinson's disease'. It goes to the registry and looks for specific predicates, finds the associated services, and pulls back the information for the database that will address the query.
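
The registry lookup described above can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the registry is a plain dictionary, the "services" are local functions rather than web services, and the predicate names, gene identifiers and GO terms are made-up placeholders, not real SADI services or real data.

```python
def genes_for_disease(disease):
    # Placeholder data standing in for a remote service response.
    return {"Parkinson's disease": ["GENE_A", "GENE_B"]}.get(disease, [])


def go_terms_for_gene(gene):
    # Placeholder data standing in for a remote service response.
    return {"GENE_A": ["GO:term1"], "GENE_B": ["GO:term2", "GO:term3"]}.get(gene, [])


# Hypothetical registry: predicate -> service that can produce it.
REGISTRY = {
    "isAssociatedWithGene": genes_for_disease,
    "hasGOTerm": go_terms_for_gene,
}


def resolve(subject, predicates):
    """Follow a chain of predicates, building a query-specific triple store."""
    triples = []
    frontier = [subject]
    for predicate in predicates:
        service = REGISTRY[predicate]  # look the predicate up in the registry
        next_frontier = []
        for node in frontier:
            for value in service(node):  # call the associated service
                triples.append((node, predicate, value))
                next_frontier.append(value)
        frontier = next_frontier
    return triples


triples = resolve("Parkinson's disease", ["isAssociatedWithGene", "hasGOTerm"])
```

The list of triples plays the role of the query-specific database: it contains only what the services returned for this one query.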

(Better presentation, but still a lot of slide reading)

Deliverable: SADI semantic web service framework as a candidate recommendation, a set of core ontologies in biomedical domain, and a costing model for future semantic web service providers, defining establishment and maintenance costs for migrating non-semantic data.

Semantic Spaces for Communication and Coordination
Omair Shafig, University of Calgary

Semantic spaces - the idea comes from the fact that today's web services are not following the principles of web communication. Web services communicate in a synchronous manner, but the web should be stateless. We want in a similar way to create a communication, coordination and storage infrastructure for services over the web. This would be basically a single, globally shared semantic space.

We would realize this by joining semantics and space-based technologies (as in 'triple-space' not outer space). The semantic space is accessed and fed by semantic space kernels. These kernels take into account the coordination of different data stores and coordinate in a peer-to-peer fashion, and present a single unified access to users.

(Chart of existing technologies - existing technologies are limited in semantic support, query processing and knowledge support).

Requirements:
- persistent data storage
- communication - many-to-many machine communication
- information publishing
- globally accessible - accessed anywhere, anytime
- subscription and notification - subscribe to particular data
- information retrieval, ie., search/querying
- decoupling in communication
- semantic annotation of data
- reliability - alternative paths

Semantic Space Kernel - these are the entities that present the single point of access to users. It should provide layers (what else?) for:
- User API / management API
- Publish-API
- kernel-to-kernel communication
- data access

The semantic space API would include operations such as 'publish triples', 'read triples', subscribe and unsubscribe, create, execute and delete transactions, etc.
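
A minimal in-memory sketch of what such an API might look like, in Python. The class and method names are hypothetical, not the project's actual interface; there is no persistence, no kernel-to-kernel communication and no transaction support here, just publish, read, subscribe and unsubscribe over triples, with None acting as a wildcard in patterns.

```python
class SemanticSpace:
    """Toy triple space: publish/read triples and subscribe to patterns."""

    def __init__(self):
        self._triples = set()
        self._subscriptions = {}  # subscription id -> (pattern, callback)
        self._next_id = 0

    @staticmethod
    def _matches(pattern, triple):
        # None in a pattern position matches any value.
        return all(p is None or p == v for p, v in zip(pattern, triple))

    def publish(self, triples):
        for triple in triples:
            if triple in self._triples:
                continue  # already known, do not notify again
            self._triples.add(triple)
            for pattern, callback in list(self._subscriptions.values()):
                if self._matches(pattern, triple):
                    callback(triple)

    def read(self, pattern):
        return {t for t in self._triples if self._matches(pattern, t)}

    def subscribe(self, pattern, callback):
        self._next_id += 1
        self._subscriptions[self._next_id] = (pattern, callback)
        return self._next_id

    def unsubscribe(self, subscription_id):
        self._subscriptions.pop(subscription_id, None)


space = SemanticSpace()
seen = []
sub = space.subscribe((None, "hasStatus", None), seen.append)
space.publish([("svc:1", "hasStatus", "up"), ("svc:1", "hasType", "discovery")])
space.unsubscribe(sub)
space.publish([("svc:2", "hasStatus", "down")])  # no notification after unsubscribe
```

The publisher and the subscriber never address each other directly, which is the decoupling the requirements list asks for.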

We want to use the infrastructure to provide semantic spaces for storage, eg., semantic descriptions of web services, monitoring data, intermediate event processing information, service compositional data, etc. Semantic based event driven and publish-subscribe mechanisms facilitate communication. Also would provide space-based coordination of a service bus platform.

We applied the proposed solution to the Web Service Execution Environment (WSMX), a reference implementation of the WSMO conceptual model. The semantic space itself is available as a web service. Services accessing the semantic space could also be services, so, for example, you could use someone else's 'discovery' service to access the semantic space.

We could also ground and store WSMO objects (semantic descriptions of web services) in the semantic space.

To bind existing services to semantic space, we recommend three changes to WSDL. First, to change the transport mechanism, then second to encode SOAP messages as semantic space RDF, and third, the address location. http://www.soa4all.eu

Question - what about spammy content? http://seekda.com
(It checks services to see whether they are working or not)
(SD - this isn't a solution at all)
other comment - weight by trust
Stephen - 'the false triple problem'

Collaboration as a Service
Sandy Liu, NRC

This is a case study based on real systems and real platforms.

Collaboration is a coordinated collection of activities performed by collaborators in order to achieve a set of common goals. It is action-oriented, goal-oriented, involves a team, and is coordinated. Collaborators could be agents, machines, organizations or institutes, or individuals.

The simplest model is the 3C model (Ellis, 1991). The Cs are cooperation, communication, coordination. There are ontologies for each, such as the cooperation ontology (Oliveira, et al. 2007, 'Towards a Collaboration Ontology').

Key outcomes can be defined in six tuples: coordination, outcome, goals, collaborators, resources and activities, all linked to collaboration.

In Collaboration as a Service, the idea is that a request comes in, CaaS coordinates everything, and the outcome is collaboration (this is a very oversimplified model). Subservices would manage coordination, collaborators, etc.

In the virtual organization, it's the flexible, secure, coordinated resource sharing among dynamic collections of individuals. This is based on work in the 90s. Architects, for example, will often have people doing field work, they will have technical people in the office, stone masons elsewhere, etc.

Or a health service training, where two students discuss a case, move to a mannequin to try it out, and then debrief with the class. To support even a simple scenario like this you have to support a large number of connections. So we have built something called SAVOIR in order to support this, to manage users, resources, sessions and workflow.

SAVOIR - Service-oriented Architecture for Virtual Organization Infrastructure and Resources. It provides a single entry point for provisioning services and tools. It's generic to different types of resources.

SAVOIR was based on similar concepts - Software as a Service, Infrastructure as a Service, and then, finally, Collaboration as a Service.

(Diagram of system overview)

The front end of SAVOIR is a web-based front end. Eg. you can create a session based on the tools available.

The message bus - hosts the management services. It handles messages coming from many different types of transport protocols - http, tcp, jms, sms. Within SAVOIR we use JMS messages. In order to talk to SAVOIR we defined a set of specifications; there's a bus interface that talks to the device in its native language. Now with the bus interface, you can talk to many different transport protocols.

The SAVOIR messaging specification defines an asynchronous communication. Each message has to be acknowledged back to SAVOIR. There is a session manager that is rules based, using Drools. Sessions have 'authoring time' and 'run time'. A session is an instance of a scenario.

(Diagram of scenarios)

(Video demonstration) http://www.hsvo.ca

SAVOIR can send out messages to the devices. The devices have different states - 'inactive', 'authenticated', 'loaded', 'running', 'paused', 'stopped'. When messages for a device are received, it looks at the state of the device, and if the device is not available, the message is cached until the device becomes available. Various messages require the device to be in certain states. These are defined by rules.
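
The state handling described above can be sketched as a small state machine. SAVOIR itself uses Drools rules for this; the plain Python dictionary below is a stand-in, and the message names and the particular transitions are assumptions made for illustration. Only the six state names come from the talk.

```python
from collections import deque

STATES = ("inactive", "authenticated", "loaded", "running", "paused", "stopped")

# Hypothetical rules: message -> (states in which it is allowed, resulting state).
RULES = {
    "authenticate": ({"inactive"}, "authenticated"),
    "load": ({"authenticated", "stopped"}, "loaded"),
    "start": ({"loaded"}, "running"),
    "pause": ({"running"}, "paused"),
    "resume": ({"paused"}, "running"),
    "stop": ({"running", "paused"}, "stopped"),
}


class Device:
    def __init__(self):
        self.state = "inactive"
        self.available = False
        self.pending = deque()  # messages cached while the device is unavailable

    def receive(self, message):
        if not self.available:
            self.pending.append(message)
            return "cached"
        allowed_states, next_state = RULES[message]
        if self.state not in allowed_states:
            return "rejected"  # the device is in the wrong state for this message
        self.state = next_state
        return "acknowledged"

    def come_online(self):
        """Mark the device available and deliver cached messages in order."""
        self.available = True
        results = []
        while self.pending:
            results.append(self.receive(self.pending.popleft()))
        return results


device = Device()
first = device.receive("authenticate")  # device is offline, so this is cached
second = device.receive("load")
replayed = device.come_online()
```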

The session manager acts like a facade that takes in messages, provides a new session, starts and stops sessions, triggers the start rule, and delegates responsibilities to other components.

(Flow chart with message flow)

The Intelligent City Project: Transit Info
William McIver, Jr., NRC

I am based in people-centered technologies, so we're not directly related to semantic services. But we have access to them. The system was inspired by a case in Toronto where the driver did not consistently announce the stops.

The idea is to use some semantic technologies to address this issue. Here in Fredericton we have a unique research environment in which to do this work - free city 802.11 Wireless (Fred-E-Zone), as well as Red Ball internet iBurst IEEE 802.20 (which is what we ended up using for the transit info project). And this fits into a wider transit infosystem project, for example, real-time information provided by audio to bus riders.

(Presenters *must* learn to speak to the audience, rather than speaking to their slides)

An early prototype involved the development of a bus locator architecture. It used 802.11 to locate buses (the 802.20 wasn't available yet). The buses carried GPS locators, using a Java VM and published info using JSON. This project outlined issues involved in collecting data, but as expected, communications would often break down.

By showing this work we met up with Red Ball Internet, who had been working on a similar system. We ended up with a system that supports various types of transit system. It's a comprehensive set of services offering support for both riders and administrators. It's a RESTful implementation based on Ruby (using Sinatra, a smaller-footprint alternative to Rails).

(System architecture diagram with tiny tiny unreadable text)

We use the 'service day schema' to collect real-time data from the vehicles. Some reports were coming in every 5 seconds, others every 15 seconds (these can be changed). There is also a storage schema to allow management to manage the buses, see if they're late, maybe to add routes, etc. There are also some remote services - the key one focuses on SMS as an alternate interface to the web interface. This was what was good about using a RESTful approach, once we had the schema worked out we could repurpose one service to add another.

All the buses in the Moncton system have been outfitted with vehicle peers. Red Ball is commercializing this technology. The fleet server manages the web user interfaces, kiosks, etc. The vehicle peers can respond to requests - they report GPS data, predict arrival times, announce next stops, and display next stops.

(Image of Moncton 'Bus Catcher')

Photos of bus kit, with an ALIX board from PC Engines. The device also provides the mobile hot spots on the buses.

Lessons learned:

Architecture is more extensible by constraining it to a stateless system of resources (ie., RESTful). CRUD via HTTP is very intuitive and easily maintained. System integration with security can be accomplished using HTTP methods. Representation of resources was easily manipulated to send HTML, JSON, XML. Custom mime types could be created for specialized data. Higher level features are easily added by composing low-level features.
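
The point about representations can be illustrated with a short sketch: one resource, rendered as JSON, XML or HTML depending on the requested media type. The project itself was written in Ruby with Sinatra; this Python version is only an illustration, and the resource fields (route, stop, eta_minutes) and the XML element name are hypothetical.

```python
import json
from html import escape
from xml.etree import ElementTree as ET


def render(resource, media_type):
    """Render one resource (a flat dict) in the requested representation."""
    if media_type == "application/json":
        return json.dumps(resource, sort_keys=True)
    if media_type == "application/xml":
        root = ET.Element("arrival")
        for key, value in sorted(resource.items()):
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    if media_type == "text/html":
        items = "".join(
            f"<li>{escape(key)}: {escape(str(value))}</li>"
            for key, value in sorted(resource.items())
        )
        return f"<ul>{items}</ul>"
    raise ValueError(f"unsupported media type: {media_type}")


# Hypothetical next-stop resource.
next_stop = {"route": "10", "stop": "Main Street", "eta_minutes": 4}
as_json = render(next_stop, "application/json")
as_xml = render(next_stop, "application/xml")
as_html = render(next_stop, "text/html")
```

Because the resource is the same in every case, adding a new interface such as SMS comes down to adding one more representation rather than a new service.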