Clio Knows

A Knowledge Representation Corpus
of Historical Events

 

Introduction

Research communities need a large corpus of representative, relevant, and interesting problems in order to evaluate proposed solutions meaningfully, systematically, and repeatably, with statistically significant results. Unfortunately, the knowledge representation and reasoning community currently lacks such a corpus.

Clio Knows is an attempt to construct just such a corpus of knowledge representation and reasoning problems, drawing on readily available real-world historical events and their interpretations for its content.

Construction Principles

Types of Information in the Corpus

The corpus contains different types of information:

  • Questions about historical events, e.g.
    "How did Wellington react to the report of Napoleon's death?"
  • One or more answers for each of the questions, e.g.
    "Wellington cried."
  • One or more explanations (or justifications) for each of the answers given, e.g.
    "Wellington cried because he admired Napoleon as a general."
    or
    "Wellington cried because he felt an era was coming to an end."
  • Foreground and background knowledge to successfully understand the question and to arrive at an answer with justifications; e.g.
    "Wellington fought Napoleon at the battle of Waterloo in 1815."
    "Great generals admire their opponent generals."
    "Some people weep when people they admire die."
    (The line between foreground and background knowledge is blurry, but intuitively the second and third statements above appear more background-like than the first, perhaps because they apply in more situations.)
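
The four types of information listed above can be pictured as a small data structure. The following Python sketch is only an illustration: the class and field names (`Problem`, `Answer`, `foreground`, `background`) are assumptions made for this example and are not part of any Clio Knows specification. The example texts are the ones quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One answer to a question, together with its explanations."""
    text: str
    explanations: list[str] = field(default_factory=list)


@dataclass
class Problem:
    """One corpus entry (hypothetical layout): question, answers, knowledge."""
    question: str
    answers: list[Answer] = field(default_factory=list)
    foreground: list[str] = field(default_factory=list)  # case-specific facts
    background: list[str] = field(default_factory=list)  # widely applicable rules


wellington = Problem(
    question="How did Wellington react to the report of Napoleon's death?",
    answers=[
        Answer(
            text="Wellington cried.",
            explanations=[
                "Wellington cried because he admired Napoleon as a general.",
                "Wellington cried because he felt an era was coming to an end.",
            ],
        )
    ],
    foreground=["Wellington fought Napoleon at the battle of Waterloo in 1815."],
    background=[
        "Great generals admire their opponent generals.",
        "Some people weep when people they admire die.",
    ],
)
```

Keeping foreground and background knowledge in separate lists is a design choice of this sketch; since the boundary between the two is blurry, a single list with a per-statement tag would serve equally well.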

Types of Corpus Contents Representation

While the corpus contains the types of information specified above — questions, answers, justifications and required knowledge — that information is stored in multiple forms or representations. Some of these representations are formal, as is usual in knowledge representation and reasoning; others employ natural language.

  • A colloquial natural-language representation of the information; the Wellington-Napoleon example introduced above is of this form.
  • Multiple formal language representations of the information, in languages with well-specified semantics. Many of these formal languages will be related to first-order predicate logic or interesting fragments thereof. Notable examples include OWL, CycL, Concept Graphs, Situation Calculus, and Prolog.
  • Multiple ontological grounding representations of the information, using one of the formal languages, but taking their vocabulary from different ontologies. Notable examples include ResearchCYC, SUMO, and any of a variety of OWL-compatible ontologies.
  • An explicit Natural-Language representation of the information, that is, a rendering in short natural language sentences that attempts to be as unambiguous as possible. Also, in the case of the justifications for the answers, the representation gives a rather detailed proof sketch. This representation, sometimes informally called English Zero, is intended to bridge the gap between the ambiguity of colloquial natural language and the rigor of formal representations.

Such multiplicity of representations is neither a redundancy nor an accident, but one of the research contributions the corpus makes. One of the goals of this corpus is to help investigate which formalisms and ontologies are most suitable for historical research, and along which dimensions. Providing multiple representations of what aspires to be the same content supports research into the relative strengths and weaknesses of different formal languages and/or approaches to ontologizing concepts.

Equally, supporting multiple natural languages (at least by design) assists those forms of natural language processing research that work with parallel corpora.
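
One way to picture parallel representations of the same content is as a list of tagged renderings that can be filtered by kind or language. The sketch below is a minimal illustration under stated assumptions: the `Representation` class, the `select` helper, the "explicit" sentence, the Prolog fact `cried(wellington).`, and the "ad hoc" ontology label are all invented for this example. Only the colloquial sentence is taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Representation:
    """One rendering of a single piece of corpus information (hypothetical)."""
    kind: str                       # "colloquial", "explicit", or "formal"
    language: str                   # e.g. "English" or "Prolog"
    text: str
    ontology: Optional[str] = None  # vocabulary source, for formal renderings


def select(representations, kind=None, language=None):
    """Return the representations matching the given kind and/or language."""
    return [
        r for r in representations
        if (kind is None or r.kind == kind)
        and (language is None or r.language == language)
    ]


# Three renderings of the same answer. Only the colloquial text comes from
# the corpus description; the other two are invented for illustration.
answer = [
    Representation("colloquial", "English", "Wellington cried."),
    Representation("explicit", "English", "The person Wellington wept."),
    Representation("formal", "Prolog", "cried(wellington).", ontology="ad hoc"),
]

formal = select(answer, kind="formal")
print([r.text for r in formal])
```

Because every rendering carries its own kind, language, and ontology tags, adding a new formalism or a translation into another natural language only appends entries and leaves the existing ones untouched.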

Organizational Considerations

TBD

How to Contribute

From people with basic literacy to researchers with advanced degrees in history or knowledge representation, almost anyone can contribute to Clio Knows. Here are just some of the ways, in ascending order of required skill:

  • Submit a Problem: Merely submitting a historical question is a big help. If the answer is readily available, then please include it; but if not, submit the question anyway. For extra credit, provide some or all of the sources used by historians to support their answer to the question.
  • Convert a Problem to "English-Zero": Some people are good at hunting down questions and citations, others are good at copy editing and simplifying. Converting a problem to "English-Zero" is one way the latter skill set can be put to good use. This includes suggesting alternate forms of stating the problem; we do not assume a 1:1 correspondence between colloquial and English-Zero formulations.
  • Translate a Problem into another Natural Language: Akin to the conversion to "English-Zero", this contribution includes suggesting alternate forms of stating a problem in another natural language. Again, we make no assumptions about a 1:1 correspondence between the original natural language of a problem and its translations into another natural language.
  • Suggest an alternate Proof: Often, there are multiple ways to get the right answer, and we want to cover as many of them as possible. Usually, this will include either the use of alternate sources or the application of alternate rules and heuristics.
  • Formalize a Problem: Eventually, this corpus should be available in all of the relevant formal representation languages and ontologies. This means that for any new colloquial problem, there will be n formalizations of that problem.
  • Support a new Formalism: Extending the corpus to include a new formal representation language is a lot of work. Hopefully, much of it could be done via automatic conversion from an existing formal representation; but that still requires conversion tools.

Appendix: Bibliography

 

$Id: index.html,v 1.9 2006/11/18 18:15:05 canonical_chris Exp $