

8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More

8.4. The Semantic Web: An Evolutionary Revolution

8.4.2. Inferencing About an Open World

Foundational languages such as RDF Schema and OWL are designed so that precise vocabularies can be used to express facts such as the triple (Mr. Green, killed, Colonel Mustard) in a machine-readable way, and this is a necessary (but not sufficient) condition for the semantic web to be fully realized. Generally speaking, once you have a set of facts, the next step is to perform inference over the facts and draw conclusions that follow from the facts. The concept of formal inference dates back to at least ancient Greece with Aristotle’s syllogisms, and the obvious connection to how machines can take advantage of it has not gone unnoticed by researchers interested in artificial intelligence for the past 50 or so years. The Java-based landscape that’s filled with enterprise-level options such as Jena and Sesame certainly seems to be where most of the heavyweight action resides, but fortunately, we do have a couple of solid options to work with in Python.

One of the best Pythonic options capable of inference that you’re likely to encounter is FuXi. FuXi is a powerful logic-reasoning system for the semantic web that uses a technique called forward chaining to deduce new information from existing information by starting with a set of facts, deriving new facts from the known facts by applying a set of logical rules, and repeating this process until a particular conclusion can be proved or disproved, or there are no more new facts to derive. The kind of forward chaining that FuXi delivers is said to be both sound (because any new fact that is produced follows from the known facts and rules) and complete (because any fact that follows from them can eventually be derived). A full-blown discussion of propositional and first-order logic could easily fill a book; if you’re interested in digging deeper, the now-classic textbook Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig (Prentice Hall), is probably the most comprehensive resource.

5. In modern parlance, a syllogism is more commonly called an implication.
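The forward-chaining loop described above can be sketched in a few lines of plain Python. This is only an illustration of the general technique over propositional rules, not FuXi's implementation; the function name `forward_chain`, the rule representation, and the symbols `A` through `E` are all assumptions made for the example.

```python
def forward_chain(facts, rules, goal=None):
    """Naive forward chaining over propositional rules (illustrative sketch).

    facts: iterable of symbols known to be true
    rules: iterable of (premises, conclusion) pairs
    goal:  optional symbol; the loop stops once it has been derived
    Returns the set of facts known when the loop stops.
    """
    known = set(facts)
    changed = True
    # Keep making passes until the goal is known or a pass adds nothing new.
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base: C implies D, and A together with B implies C.
rules = [(("C",), "D"), (("A", "B"), "C")]
facts = {"A", "B"}

print(sorted(forward_chain(facts, rules)))            # ['A', 'B', 'C', 'D']
print(sorted(forward_chain(facts, rules, goal="C")))  # ['A', 'B', 'C']
print("E" in forward_chain(facts, rules, goal="E"))   # False
```

Deriving `D` takes a second pass because it depends on `C`, which only becomes known during the first pass. Asking for `E` shows the other stopping condition: the loop ends when a pass produces nothing new, and the goal remains unproven.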

To demonstrate the kinds of inferencing capabilities a system such as FuXi can provide, let’s consider the famous example of Aristotle’s syllogism⁵ in which you are given a knowledge base that contains the facts “Socrates is a man” and “All men are mortal,” which allows you to deduce that “Socrates is mortal.” While this problem may seem too trivial, keep in mind that the deterministic algorithms that produce the new fact that “Socrates is mortal” work the same way when there are significantly more facts available, and those new facts may produce additional new facts, which produce additional new facts, and so on. For example, consider a slightly more complex knowledge base containing a few additional facts:

• Socrates is a man.

• All men are mortal.

• Only gods live on Mt. Olympus.

• All mortals drink whisky.

• Chuck Norris lives on Mt. Olympus.

If presented with the given knowledge base and then posed the question, “Does Socrates drink whisky?” you must first infer an intermediate fact before you can definitively answer the question: you would have to deduce that “Socrates is mortal” before you could conclusively affirm the follow-on fact that “Socrates drinks whisky.” To illustrate how all of this would work in code, consider the same knowledge base now expressed in Notation3 (N3), a simple yet powerful syntax that expresses facts and rules in RDF, as shown here:

#Assign a namespace for logic predicates
@prefix log: <http://www.w3.org/2000/10/swap/log#> .

#Assign a namespace for the vocabulary defined in this document
@prefix : <MiningTheSocialWeb#> .

#Socrates is a man
:Socrates a :Man .

@forAll :x .

#All men are mortal: Man(x) => Mortal(x)
{ :x a :Man } log:implies { :x a :Mortal } .

#Only gods live at Mt Olympus: Lives(x, MtOlympus) <=> God(x)
{ :x :lives :MtOlympus } log:implies { :x a :god } .
{ :x a :god } log:implies { :x :lives :MtOlympus } .

#All mortals drink whisky: Mortal(x) => Drinks(x, whisky)
{ :x a :Mortal } log:implies { :x :drinks :whisky } .

#Chuck Norris lives at Mt Olympus: Lives(ChuckNorris, MtOlympus)
:ChuckNorris :lives :MtOlympus .
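To make the logic of this knowledge base concrete without any semantic web tooling, here is a minimal plain-Python sketch of the same facts and rules, following the rules as stated in the N3 comments. It is a hand-rolled illustration and not how FuXi works internally. The tuple encoding, the `?x` placeholder standing in for the quantified `:x`, and the helper names `bind`, `substitute`, and `closure` are all assumptions of this example.

```python
X = "?x"  # placeholder standing in for the universally quantified :x

# Facts as (subject, predicate, object) triples.
facts = {
    ("Socrates", "a", "Man"),
    ("ChuckNorris", "lives", "MtOlympus"),
}

# Rules as (premise, conclusion) pairs of triple patterns.
rules = [
    ((X, "a", "Man"), (X, "a", "Mortal")),          # all men are mortal
    ((X, "lives", "MtOlympus"), (X, "a", "god")),   # only gods live on Mt Olympus
    ((X, "a", "god"), (X, "lives", "MtOlympus")),   # (the other direction)
    ((X, "a", "Mortal"), (X, "drinks", "whisky")),  # all mortals drink whisky
]


def bind(pattern, triple):
    """Return the value bound to ?x if triple matches pattern, else None.

    Every premise in this sketch contains ?x, so None always means "no match".
    """
    value = None
    for term, actual in zip(pattern, triple):
        if term == X:
            if value is not None and value != actual:
                return None
            value = actual
        elif term != actual:
            return None
    return value


def substitute(pattern, value):
    """Replace ?x in a pattern with a concrete value."""
    return tuple(value if term == X else term for term in pattern)


def closure(facts, rules):
    """Apply every rule to every known fact until nothing new appears."""
    known = set(facts)
    while True:
        derived = set()
        for premise, conclusion in rules:
            for triple in known:
                value = bind(premise, triple)
                if value is not None:
                    derived.add(substitute(conclusion, value))
        if derived <= known:
            return known
        known |= derived


inferred = closure(facts, rules) - facts
for triple in sorted(inferred):
    print(triple)
# ('ChuckNorris', 'a', 'god')
# ('Socrates', 'a', 'Mortal')
# ('Socrates', 'drinks', 'whisky')
```

The fact that Socrates drinks whisky only appears on the second pass, after the first pass has established that Socrates is mortal. That is the intermediate inference step described above.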

While there are many different formats for expressing RDF, many semantic web tools choose N3 because its readability and expressiveness make it accessible. Skimming down the file, we see some namespaces that are set up to ground the symbols in the vocabulary that is used, and a few assertions that were previously mentioned. Let’s see what happens when you run FuXi from the command line and tell it to parse the facts from the sample knowledge base that was just introduced and to accumulate additional facts about it with the following command:

$ FuXi --rules=chuck-norris.n3 --ruleFacts --naive

If you are installing FuXi on your own machine for the first time, your simplest and quickest option for installation may be to follow these instructions. Of course, if you are following along with the IPython Notebook as part of the virtual machine experience for this book, this installation dependency (like all others) is already taken care of for you, and the sample code in the corresponding IPython Notebook for this chapter should “just work.”

You should see output similar to the following if you run FuXi from the command line against a file named chuck-norris.n3 containing the preceding N3 knowledge base:

('Parsing RDF facts from ', 'chuck-norris.n3')
('Time to calculate closure on working memory: ', '1.66392326355 milli seconds')
<Network: 3 rules, 6 nodes, 3 tokens in working memory, 3 inferred tokens>

@prefix : <file:///.../ipynb/resources/ch08-semanticweb/MiningTheSocialWeb#> .
@prefix iw: <http://inferenceweb.stanford.edu/2004/07/iw.owl#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skolem: <http://code.google.com/p/python-dlp/wiki/SkolemTerm#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:ChuckNorris a :god .

:Socrates a :Mortal ;
    :drinks :whisky .


6. See http://www.chucknorrisfacts.com.

The output of the program tells us a few things that weren’t explicitly stated in the initial knowledge base:

• Chuck Norris is a god.

• Socrates is a mortal.

• Socrates drinks whisky.

Although deriving these facts may seem obvious to most human beings, it’s quite another story for a machine to have derived them—and that’s what makes things exciting. Also keep in mind that the facts that are given or deduced obviously don’t need to make sense in the world as we know it in order to be logically inferred from the initial information contained in the knowledge base.

Careless assertions about Chuck Norris (even in an educational context involving a fictitious universe) could prove harmful to your computer’s health, or possibly even your own health.⁶

If this simple example excites you, by all means, dig further into FuXi and the potential the semantic web holds. The example that was just provided barely scratches the surface of what FuXi is capable of doing. There are numerous data sets available for mining, and vast technology toolchains that are a part of an entirely new realm of exciting technologies that you may not have previously encountered. The semantic web is arguably a much more advanced and complex topic than the social web, and investigating it is certainly a worthy pursuit, especially if you’re excited about the possibilities that inference brings to social data. It seems pretty clear that the future of the semantic web is largely undergirded by social data and many, many evolutions of technology involving social data along the way. Whether the semantic web is a journey or a destination, however, is up to us.
