Having the good fortune to work as CSO for Talis, an innovative UK software company, at one of the most exciting times for software and the internet, I thought I would share some of the ideas and insights I am finding exciting at the moment.

Entries by Justin Leavesley (10)

The puzzle of semantic web adoption

I am a believer in the rise of the web of data. In fact I am CTO of Talis which is investing heavily in semantic web technologies. So don’t take this the wrong way but I can’t help but feel the semantic web community is ignoring a vital part of the semantic web jigsaw and this is creating a major credibility problem between it and large parts of the technology community. I am concerned because I think that the semantic web currently lacks two critical things that drove mass adoption of the web.

To be fair the W3C has created a semantic web outreach group and Talis has two representatives on this, so we are doing our bit to help to spread the word :-) but this is only going to really work if the semantic web community really understands what the major missing pieces are for mass adoption. Today, looking at the conversations in the semantic web community, I don’t think the real barrier is being seen clearly.

So here is my personal view on what is going on here.

Many in the semantic web community have been concerned mainly with the rightness of the technology and not the utility of the technology. That is fine for the invention process but badly wrong for the adoption process. Just ask the inventors of Betamax :-). It doesn’t matter how right you are!

Adoption is a strong function of day 0 utility. That means: “What can I do better today by using semantic web technology rather than existing technology?” You can’t use the argument that when everyone has adopted RDF it will be really useful, because the people who need to adopt the technology in order to reach that critical mass of RDF won’t do it out of belief in the semantic web vision. These adopters are pragmatic and need technology to give them an advantage today, not in 5 years. Features based on a network effect always have this kind of initiation problem.

To overcome the network effect initiation problem there needs to be day 0 value to drive adoption until the network effect kicks in and takes over as the main reason for adoption. In short there needs to be a killer application of the technology.

What is the semantic web killer application?

So my question to the semantic web community is what exactly can I do far better today with the more unproven semantic web technology than I can today with more established technology such as agreeing simple XML standards?

 

A clear answer to this question is vital. 

I actually think that for most specific instances of usage you could achieve faster adoption and lower risk through de facto standards agreement with a simple XML approach. Take RSS. Was this a success because it was RDF, or because it was the de facto emergence of a simple standard based on its raw day 0 utility rather than some far-off value based on a network effect? It is not a semantic web killer application.

 

So it seems to me that semantic web adoption is a very different problem to the web of documents in the early days.

The web had day 0 utility. Many people will remember that feeling of seeing the web for the first time and knowing that you could easily publish anything you liked and the whole world could read it instantly. Mind blowing.
You didn’t need any special tool to write an HTML document; doing it by hand was easy enough.

The web was its own killer application. The semantic web is not.

But the web had a piece missing. You could start at any resource and navigate the links, but you couldn’t search the space itself to find a good starting resource in the first place. This meant that as the web grew, more and more of the content could not in practice add any extra value to a user’s experience.
Of course the missing piece was the search engine. This allowed a user of the web to query the whole space, and now every document, no matter how obscure, could potentially enrich a user’s web experience.
I don’t think it is right to characterise this as something missing from the architecture of the web, because search engines could be layered on top and that is better than building complexity into the core standards. But from a user’s point of view, the real potential of the web of documents could not be realised until it was possible to query the whole web space.

We talk about the semantic web in terms of a web as database. But where is the database engine? Google is the free text engine for the web of documents. Where is the equivalent for the semantic web? 
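To make the "web as database" idea concrete, here is a minimal sketch in Python of what querying across independently published statements could look like. Everything in it is an assumption for illustration: the identifiers (`book:hp1`, `person:rowling`), the two sources and the `match` helper are hypothetical, and a real engine would have to do this at web scale rather than over two small lists.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements published independently by two sources.
SOURCE_A: List[Triple] = [
    ("book:hp1", "title", "Harry Potter and the Philosopher's Stone"),
    ("book:hp1", "author", "person:rowling"),
]
SOURCE_B: List[Triple] = [
    ("person:rowling", "name", "J. K. Rowling"),
    ("book:hp1", "heldBy", "library:example"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def author_names(triples: List[Triple], book: str) -> List[str]:
    """Follow book -> author -> name, a join that may cross sources."""
    names = []
    for _, _, person in match(triples, s=book, p="author"):
        for _, _, name in match(triples, s=person, p="name"):
            names.append(name)
    return names


# The query only succeeds once both sources can be searched as one space.
print(author_names(SOURCE_A + SOURCE_B, "book:hp1"))
```

Neither source can answer the question on its own. The answer only appears when the whole space can be queried at once, which is the role the search engine plays for the web of documents.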

So the semantic web appears to have little day 0 utility over a specific approach, it is not its own killer application and it lacks the ability to query the semantic web information space itself.

This may appear a provocative conclusion, I don’t know. But is it correct? If it is then who is doing what about it?

If true does this mean the semantic web will forever be a dream?
No I don’t believe that. But I do believe it changes the way we should think about semantic web adoption.

For example, it would be crazy to believe all data must be in RDF; that would create a huge barrier. Instead the question should be how RDF and other data approaches can work together to create a powerful web of data. The superior value of the RDF approach should, over time, increase the amount of RDF versus other approaches.

But I think the single biggest blocker on web of data adoption and by extension, the semantic web, is the lack of ability to query the whole space. Where is the database engine for the semantic web?

 

 

Posted on Friday, December 8, 2006 at 09:39PM by Justin Leavesley

Ecosystem 1 - Physical technology meets social technology

I’m pretty sure that the concepts of co-operation, platforms, webs of data and webs of functions will be central to understanding how the internet and web will continue to change our world and the way technology companies can make defensible long term value. In the next few posts I will look at web1.0, web2.0 and the semantic web from the point of view of ecosystems, drawing on the very useful new view of economics known variously as evolutionary or complexity economics. Over on Nodalities you can follow how Talis is putting these ideas to work in the real world of high-tech, innovation-led business. It is this special combination of theory and practice that makes Talis such an intense and wonderful company to work for.

Ecosystem
It is the constant dance between physical and social technology,


Posted on Sunday, November 26, 2006 at 08:25AM by Justin Leavesley

I've been away for a while

You may have noticed that I haven’t blogged for over a year.  How lazy is that! Well give me a break, I haven’t exactly been sitting around.

At Talis we spent a lot of 2005 thinking about the next wave of technology in the world of software and the internet. I ran the Talis research group and we had lots of fun offsites and crazy discussions, like you should in research. But unlike many places, Talis and its management team are focused on putting research and innovation into action (cheesy but true). So throughout 2006 we have been putting our money where our mouth is and building something amazing, something practical yet hugely innovative, something that takes our 36 year old core business forward but aligns Talis with the coming semantic web wave. And something which I am not going to talk about in detail on this blog :-). To follow the Talis semantic platform you can go to Nodalities, where over the coming months you can find out what we have been building, tell us what you think about it and, most importantly, have a go with the APIs yourself!

I am going to keep my blog focused on the underlying principles of technology innovation, economics and ecosystems. On Nodalities you can see how Talis is putting the principles into action.

In case you are wondering, Nodalities -> Nodes -> network theory, ecosystems, architecture of participation.  You got it.

Posted on Sunday, November 26, 2006 at 07:40AM by Justin Leavesley

The platform is dead. Long live the platform

It seems to me that as we pass into the era of web 2.0, the software platform as we know it today will cease to have significant commercial value.

The principal reason is that the internet and web2.0 are allowing a move from code sharing to instance sharing for software platforms, causing the existing network effect mechanism for platforms to fail.

The good news is we can expect new platform models to emerge based on the properties of sharing a single, persistent online instance rather than code sharing and multiple isolated instances (e.g. windows).

Some companies have already hooked into some aspects of this new model. eBay and Amazon as platforms have it, Google as a platform does not. I am of course talking about the architecture of participation becoming the principal network effect mechanism for web2.0 platforms. That is, if the actions of the users contribute to the shared state of the platform (through whichever platform application they may use) in such a way as to enhance the experience of other users, then there is a strong network effect based upon participation.

It is important to note that the forces enabling this new model are also undoing the previous model.

Here's why (IMHO).
 
Platforms
Over the past 10-15 years, Microsoft demonstrated both the enormous intrinsic and commercial value of software platforms. We have seen this battle for control of the platform played out over many segments of the software industry and layers in the software stack (Oracle and Sybase at the DB layer, Windows and OS/2, IBM WebSphere vs BEA WebLogic for application servers, Symbian etc).

The return on capital invested simply dwarfed other software models, and so platform leadership passed into law for many software companies as the one true strategy for growth. The amazing value creation was principally driven by two forces: reuse and the network effect.

Reuse: every application built on a platform is saved from having to make the investment to build  features that the platform provides. This massively lowers the cost of production (therefore capital invested) for application developers.

Software libraries and software components do this too, yet they are not considered platforms. The difference between a software platform and a software library is the network effect, or ecosystem.

Network Effect: Each application built for a particular platform increases the value of having that platform and, by extension, of every other application that already uses the platform. So the more applications for a platform, the greater the value of the platform. For a platform owner with a model that can extract commercial value from the platform's massive intrinsic value, the return on capital is a function of the investment that OTHER people have made. Or put another way, they achieve a return on capital NOT invested by them. Pretty sweet.

But the real question should be "What causes the network effect in platforms?" What is the mechanism by which the investment of application developer A has increased the value of the platform and of application B?

Does that same mechanism hold in the world of web2.0? My belief is NO, it doesn't. And that will have a profound effect on the strategy of software companies over the next 10 years. In fact we are already seeing it.

Traditional cause of the platform network effect

The traditional cause was the dependency on users having purchased and installed the platform in order to use the applications.
The choice of purchase defined which applications you could use, so naturally the platform with the better range and quality of software was more valuable (just like in the games console industry).

Web 2.0 removes the need for user purchase of the platform
As functionality moves off the user's machine into a standards-based cloud, the user's choice of application platform effectively disappears. By definition a web 2.0 platform's API is web based and implementation neutral.
Consider the Google search APIs. Whether there is one application based on them or 100, the value of the platform is not much enhanced. Those applications do not add anything to each other, so there is no network effect. From an ecosystem point of view Web2.0 APIs are much more like software libraries than platforms.

Web 2.0 platform network effect

But web2.0 platforms have a new trick that traditional platforms don't have. They can easily present one shared instance, i.e. one shared state, to all the different users of all the different applications. This allows the actions of one user using application A of the platform to enhance the experience of another user using application B of the platform. This is the architecture of participation. It is easy to see how both eBay and Amazon increase the power of their content-based network effect through open access APIs. It is also easy to see how this doesn't work for Google: the end user of a search app typically can't affect the shared state of the platform.
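The shared-instance idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `SharedPlatform`, `ReviewApp` and `ShoppingApp` are hypothetical and are not meant to represent how eBay or Amazon actually work.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class SharedPlatform:
    """One persistent instance whose state every application reads and writes."""

    def __init__(self) -> None:
        self._ratings: Dict[str, List[int]] = defaultdict(list)

    def add_rating(self, item: str, stars: int) -> None:
        self._ratings[item].append(stars)

    def average_rating(self, item: str) -> Optional[float]:
        scores = self._ratings.get(item)
        return sum(scores) / len(scores) if scores else None


class ReviewApp:
    """Application A: its users contribute to the shared state."""

    def __init__(self, platform: SharedPlatform) -> None:
        self.platform = platform

    def submit(self, item: str, stars: int) -> None:
        self.platform.add_rating(item, stars)


class ShoppingApp:
    """Application B: its users benefit from what users of A contributed."""

    def __init__(self, platform: SharedPlatform) -> None:
        self.platform = platform

    def describe(self, item: str) -> str:
        average = self.platform.average_rating(item)
        if average is None:
            return f"{item}: no ratings yet"
        return f"{item}: {average:.1f} stars"


platform = SharedPlatform()
reviews, shop = ReviewApp(platform), ShoppingApp(platform)
reviews.submit("book-1", 4)
reviews.submit("book-1", 5)
print(shop.describe("book-1"))
```

The two applications never talk to each other. The only thing they have in common is the single platform instance, and that is where the participation network effect lives. A read-only API, such as a search API, has no equivalent of `add_rating`, so nothing one user does can improve what another user sees.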

Open Source Network Effect
Developers still need traditional software platforms though.
So web2.0 platforms allow sharing of the state of the platform, whereas traditional platforms allowed read-only sharing of the code.
There is a way that traditional platforms can drive a network effect by allowing participation in the shared code of the platform. They can let users contribute to the code. This can immediately drive a whole new network effect which hugely increases the intrinsic value of the platform. Unfortunately for the existing platform vendors, nobody wants to submit code that somebody else will make money off. So this can only be done through open source. Linux is hugely valuable, but nobody can make $billions off its commercial sale, at least not directly.
Interestingly, as more open source code is created, it becomes easier to remix the code and create yet more open source software. The more general the software, the more valuable it is for it to have an open source incarnation, i.e. platforms are the natural place for open source to target, as we have seen with Linux, MySQL, JBoss etc.

So for all the reasons above, I am pretty sure that as web2.0 progresses, we will see the rise of a different type of platform and the existing platform players will have a very hard time holding onto any serious returns.

Long live the platform of participation. 

Posted on Friday, August 12, 2005 at 09:08AM by Justin Leavesley

Will the real Semantic Web please stand up

We live in an amazing and unique time. Most of you reading this blog were alive at the birth of the global computer, around 15 years ago. In that time the computer has never been switched off, never been rebooted and has grown to an almost inconceivable size and complexity. The sheer storage and processing power is almost impossible to calculate. The computer is fed information and programmed by the actions of around a billion users, night and day, evolving at an incredible speed. For example, in the last two years, over 14 million blogs alone have appeared, seemingly with no effort or investment!

But there is something else going on other than computing on a grand scale. A new type of approach to computing is arising, one which fundamentally changes the relationship between the user and the computer. I am talking about a new approach which is based on tapping into the collaborative effort of millions of users to programme software through the everyday actions of the users. The new programs are effectively learning systems that extract training and feedback from users' actions on an unprecedented scale. Fuzziness, statistics and learning over programmatic logic.
The Google spell checker is a great example of this. Google could have sat a bunch of programmers down and coded a spell checker using a dictionary and lots of rules. Doing this in every language under the sun and keeping it current as new words come into being (e.g. blogging) would have been a great effort. Instead, Google uses the actions of the users to programme the spell checking, extracting patterns of behaviour from users retyping misspelled words and feedback on when the user accepts a suggested spelling correction. Amazon's "people who bought this book also bought these" system is a more limited example.
Built on participation between the users and the system, the result is what you might call collaborative intelligence.
It is emergent rather than programmed.
It is interesting to note that this is also the same transition that artificial intelligence went through. It became clear that solutions based on predicate logic did not scale well, and the field turned to fuzzy logic, statistics and neural networks, where systems required training rather than programming.
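As a rough illustration of "training rather than programming", here is a toy spell checker in Python that contains no dictionary and no spelling rules. It only counts what users retype and whether they accept suggestions. It is a sketch under my own assumptions (the class name, the `min_votes` threshold and the voting scheme are all hypothetical), not a description of how Google's system actually works.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class LearnedSpellChecker:
    """Learns corrections purely from observed user behaviour."""

    def __init__(self, min_votes: int = 2) -> None:
        # typed word -> Counter of the words users replaced it with
        self._votes: DefaultDict[str, Counter] = defaultdict(Counter)
        self.min_votes = min_votes  # evidence needed before suggesting anything

    def observe_retype(self, typed: str, retyped: str) -> None:
        """A user typed one word, then immediately retyped it as another."""
        if typed.lower() != retyped.lower():
            self._votes[typed.lower()][retyped.lower()] += 1

    def observe_feedback(self, typed: str, suggestion: str, accepted: bool) -> None:
        """Accepting a suggestion strengthens it; rejecting it weakens it."""
        self._votes[typed.lower()][suggestion.lower()] += 1 if accepted else -1

    def suggest(self, word: str) -> Optional[str]:
        candidates = self._votes.get(word.lower())
        if not candidates:
            return None
        best, votes = candidates.most_common(1)[0]
        return best if votes >= self.min_votes else None


checker = LearnedSpellChecker()
for _ in range(3):
    checker.observe_retype("bloging", "blogging")
checker.observe_retype("bloging", "bogging")
print(checker.suggest("bloging"))
```

Note that the minority retype ("bogging") is simply outvoted. Inconsistent input is tolerated rather than treated as an error, and more users means more evidence, which is why this kind of approach improves with scale.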

The other important quality of this approach is scalability. Implicitly this scales, in fact it thrives on scale.
Traditional programmatic approaches, essentially based on logic, have a harder time scaling.

Considering that it is only in the last few years that hardware costs and online community size have enabled experimentation at scale, I am very excited about what the next 10 years will bring in this direction.

So this brings me to the title of this blog. It seems to me that humans are very good at semantics and that systems based on human-computer collaboration (i.e. the emergent properties of large numbers of users) will be very important in semantic systems. You could consider del.icio.us and Flickr and the massive rise of tagging and microformats to be very early examples. If the collaborative approach of del.icio.us could be synthesised with more sophisticated semantic methods such as RDF then we might really be cooking with gas.

So I conceive of the Semantic Web including applications built as collaborative emergent systems.
Herein lies my problem. The Semantic Web as defined by Tim Berners-Lee, and expressed in his paper on the design issues for the Semantic Web, expressly excludes any type of fuzzy system from being a Semantic Web application (see the excerpt and comments below). This is because he requires applications to be logically provable and guaranteed, so that first order predicate calculus (predicate logic) is the only logic that the Semantic Web admits. The example TBL gives is of a banking application needing to be guaranteed.
I have two main issues with this:

1) Why exclude the Semantic Web from the exciting possibilities of fuzzy and statistical approaches to semantic systems? Can't both be included? A banking application just requires stricter criteria for the statements it can operate on. Applications don't need to be guaranteed to be useful (although I admit banking applications do!!).

2) Will this massively scale? What gives us reason to believe it will? FOPC-based systems have proven difficult to scale in several fields so far. TBL admits that the Semantic Web approach is not very different from previous approaches that did fail to scale. The basic point is that FOPC-based systems cannot cope with inconsistency (as TBL points out), and as you scale, keeping consistency in practice becomes harder.

So, what will the semantic web be like? I guess in time the real semantic web will stand up.

The rest of the blog looks at TBL's semantic web design paper in more detail and may not be of great interest to most readers.

First of all, thanks Rick and Ian for persevering with all my questions.

Fuzzy or not has been the main theme behind all my SW blogs to date. Tim Berners-Lee is quite clear: not.
I just don't get why not. Certainty is just a special case of fuzziness, so why can't we include both?

We are back again to where I started, in my post on perfect or sloppy RDF, Shirky and Wittgenstein, which was based on the Tim Berners-Lee paper you mentioned, Rick.

This quote has almost the entire point I am trying to make in it. I'll take a few sentences at a time and explain what they mean to me.

"The FOPC inference model is extremely intolerant of inconsistency [i.e. P(x) & NOT (P(X)) -> Q], the semantic web has to tolerate many kinds of inconsistency.

Toleration of inconsistency can only be done by fuzzy systems. We need a semantic web which will provide guarantees, and about which one can reason with logic. (A fuzzy system might be good for finding a proof -- but then it should be able to go back and justify each deduction logically to produce a proof in the unifying HOL language which anyone can check) Any real SW system will work not by believing anything it reads on the web but by checking the source of any information. (I wish people would learn to do this on the Web as it is!). So in fact, a rule will allow a system to infer things only from statements of a particular form signed by particular keys. Within such a system, an inconsistency is a serious problem, not something to be worked around. If my bank says my bank balance is $100 and my computer says it is $200, then we need to figure out the problem. Same with launching missiles, IMHO. The semantic web model is that a URI dereferences to a document which parses to a directed labeled graph of statements. The statements can have URIs as parameters, so they can make statements about documents and about other statements. So you can express trust and reason about it, and limit your information to trusted consistent data."
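The bracketed formula in the quote, P(x) & NOT (P(X)) -> Q, is the classical "principle of explosion": from a contradiction, anything follows. A small Python sketch makes this concrete for the propositional case by brute-force checking every possible world. The proposition names (`balance_is_100`, `launch_missiles`) are my own illustrative assumptions, borrowed loosely from the examples in the quote.

```python
from itertools import product
from typing import Callable, Dict, List

World = Dict[str, bool]
Formula = Callable[[World], bool]


def entails(premises: List[Formula], conclusion: Formula, variables: List[str]) -> bool:
    """Classical entailment: the conclusion holds in every world where all premises hold."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(premise(world) for premise in premises) and not conclusion(world):
            return False  # found a counterexample world
    return True


VARIABLES = ["balance_is_100", "launch_missiles"]


def bank_says(world: World) -> bool:
    return world["balance_is_100"]


def computer_says(world: World) -> bool:
    return not world["balance_is_100"]  # contradicts the bank


def launch(world: World) -> bool:
    return world["launch_missiles"]


# Consistent premises do not let us conclude something unrelated.
print(entails([bank_says], launch, VARIABLES))
# Inconsistent premises entail everything, including the unrelated conclusion.
print(entails([bank_says, computer_says], launch, VARIABLES))
```

With the two contradictory premises there is no world in which both hold, so no counterexample can ever be found and every conclusion is "proved". This is why a system built on classical logic has to treat a single inconsistency as a serious fault rather than as noise.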

1) Toleration of inconsistency can only be done by fuzzy systems. We need a semantic web which will provide guarantees, and about which one can reason with logic.
Here TBL specifically excludes fuzzy approaches from the semantic web. By extension, other statistical and learning-based approaches to knowledge systems are also excluded. The reason given is that being guaranteed and provable is an absolute requirement. If your app is not guaranteed it is not a semantic web app. This immediately limits the concept of the semantic web to what is computable by logic rather than what is usefully computable by any means.
Sure, banking applications do need to be guaranteed, so they should use rules that only operate on provable, trusted statements. But there are loads of applications of semantics where usefulness rather than guarantees is the goal.
I do not see why it need be one or the other. You just have stricter requirements for proof in a banking app than in a fuzzy app. See Semantic Superpositions for thoughts on a semantic web that included fuzziness.

Considering FOPC approaches have been largely discredited in the field of AI and replaced by fuzziness, this would seem a risky limitation to impose.

2) Any real SW system will work not by believing anything it reads on the web but by checking the source of any information. (I wish people would learn to do this on the Web as it is!). So in fact, a rule will allow a system to infer things only from statements of a particular form signed by particular keys. Within such a system, an inconsistency is a serious problem, not something to be worked around.
The necessary consequence of 1) is, as TBL states here, that in any SW system an inconsistency is a serious problem. Because of the guarantee requirement, it isn't even enough that the data is accidentally consistent. It must be logically consistent, i.e. you will only encounter an inconsistency if there is a programming fault or corruption, and standard user action should not be a factor. That is, the statements a SW app is using must be guaranteed consistent.

This means semantic web applications are quite fragile, the larger the scale the harder to maintain consistency in practice, whereas statistical approaches work the opposite way, the larger the scale the better they work. 

Any SW application therefore requires there to be only one version of the truth, i.e. it can only work with consistent statements.  However, there are many things we wish to describe where there is no one version of the truth.
Here is the rub: this is a result only of the requirement to be logically guaranteed. There are many computational approaches that can operate on inconsistent statements: fuzzy systems, statistical approaches, neural networks. These can mine huge value out of those statements. None of that is possible with Semantic Web applications (as defined above); all those rich patterns must be collapsed into a single consistent version of the truth before the application can operate on it. The Google approach to spell checking is a great example of using such statistical approaches rather than logic to programme the spell checker.


The requirement for consistency in practice is very tough because humans are in the loop of data. Here we run straight into the fact that RDF is designed to allow multiple agencies to make statements about the same thing. Even if two agencies are using the same URI and the same definition of a particular property, when users come to enter data and have to make classification decisions based on that URI description, the users will not classify the same thing in exactly the same way. The URI is not an authority and it cannot guarantee consistency between agencies, e.g. you cannot show two copies of Harry Potter to the Editions URI and ask it if they are different editions or the same. People make that call according to their own interpretation of the description of the concept.
Turning that around, if you receive two statements about the number of editions that exist for a Harry Potter book, and one states 1 edition and the other states 2 editions, the only way to arbitrate between them is to get the actual real books out and examine them against your own interpretation of the URI definition.
What I have described above is the fact that single authorities only make sense for certain classes of problem, i.e. where there is only one version of the truth. They make perfect sense for bank accounts. In the library domain, however, each library has an equal right to make statements about a book whilst cataloguing it, so there is no concept of one authority. Similarly, who is the authority that decides if a photo is a smiling face or a sad face?
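The Harry Potter editions example can be sketched in Python to contrast the two attitudes to conflicting statements. The library names and counts below are made up for illustration, and both functions are simplifications of my own: one stands in for an application that demands a single version of the truth, the other for a statistical application that works with the disagreement.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical statements: (who said it, number of editions they claim exist).
Statement = Tuple[str, int]
STATEMENTS: List[Statement] = [
    ("library:a", 2),
    ("library:b", 2),
    ("library:c", 1),
]


def strict_value(statements: List[Statement]) -> int:
    """Guaranteed-consistency style: any disagreement is a fault."""
    values = {value for _, value in statements}
    if len(values) != 1:
        raise ValueError(f"inconsistent statements: {sorted(values)}")
    return values.pop()


def belief(statements: List[Statement]) -> Dict[int, float]:
    """Statistical style: keep every answer, weighted by how often it is asserted."""
    counts = Counter(value for _, value in statements)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


try:
    strict_value(STATEMENTS)
except ValueError as error:
    print("strict application stops:", error)

print("statistical application continues:", belief(STATEMENTS))
```

The strict version can only proceed once someone with authority has resolved the conflict. The statistical version gives a usable, if uncertain, answer immediately, and a banking-style application could still be built on top of it by demanding that a single value carry all of the weight before acting.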

The result of all that is that to guarantee consistency, for a particular SW system, there can be only one authority for statements or else inconsistency will arise from user actions. This allows any conflict to be resolved by asking the authority to decree. Note also, that it is not good enough that statements don't conflict with published statements from the authority, the authority may not have published all possible statements, statements must actually agree with statements made by the authority.

TBL also says

"

A semantic web is not an exact rerun of a previous failed experiment

Other concerns at this point are raised about the relationship to Knowledge representation systems: has this not been tried before with projects such as KIF and Cyc? The answer is yes, it has, more or less, and such systems have been developed a long way. They should feed the semantic Web with design experience and the Semantic Web may provide a source of data for reasoning engines developed in similar projects.

Many KR systems had a problem merging or interrelating two separate knowledge bases, as the model was that any concept had one and only one place in a tree of knowledge. They therefore did not scale, or pass the test of independent invention. [see evolvability]. The RDF world, by contrast is designed for this in mind, and the retrospective documentation of relationships between originally independent concepts."

3) They therefore did not scale, or pass the test of independent invention
For any SW app to have guaranteed consistency, independent invention is not possible, because you would need to force all statements from two separate agencies to be the same, and that means they are not independent at all, i.e. one agency is not free to act independently of another because that will cause inconsistency.
It then rather seems that, to all intents and purposes, independent descriptions are excluded from any particular SW app by the requirement to achieve consistency. How exactly does a semantic web app then differ from those failed experiments?

 

To sum up, I can't understand why the semantic web (at least as described by TBL) should exclude any approach based on fuzziness, statistics and inconsistency. The requirement of consistency, when taking statements from different systems, cannot be met because humans cannot all be made to agree on classification statements (whatever training or manuals you give them) and therefore will make inconsistent statements through their use of the computer systems. Whilst RDF is free to describe all the variety in the world, the Semantic Web application can only make use of the tiniest portion of it.

From some of the comments I have received, clearly some people agree with the TBL vision and others don't.
In the end I guess it doesn't really matter. People will use RDF to do cool things and call them semantic apps even if they don't conform to TBL's FOPC requirement for proof. I do think it is at the root of a lot of scepticism from outside the Semantic Web community though, given the spectacular failure of FOPC to scale in previous attempts by the AI and KR communities. It might be an idea to present this stuff really clearly, to either face up to this criticism or prove it false.

I personally have had enough of this topic now and am going to think about other things for a while :-)

Thanks to all those who have contributed to the discussion. I'm sure there are lots of people out there who will disagree with things I have said above.  Just goes to show how hard it is to get people to share the same concept of things, the world is fuzzy after all.

Posted on Wednesday, August 10, 2005 at 06:27AM by Justin Leavesley | 5 Comments