What:
Limit myself to email and closely related data (such as contacts) for now. Some basic features:
- simple and complex queries to the database
- control over the order in which results are returned
- context of previous statements taken into account
- contact list management
How:
Database:
I will use an RDF database (served through a Joseki server) and access it with SPARQL queries.
- why RDF?
- makes it very easy to describe a large number of different data types
- similar to SQL in that it has a structured query language
- algebraic query language
- on the cutting edge of the semantic web
- the same kind of database that DBpedia uses, so it is easy to get access to much more data
- properties are named, so default grammars can easily be built from them as a base
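To make the database access concrete, here is a minimal sketch of how a SPARQL query with result ordering could be built as a string. The `ex:` prefix, the property names (`ex:from`, `ex:subject`, `ex:date`) and the function `build_email_query` are assumptions made up for illustration; the real vocabulary is not decided in these notes.

```python
# Hypothetical vocabulary: the prefix and property names below are invented
# for illustration and do not come from any real email ontology.
PREFIX = "PREFIX ex: <http://example.org/email#>"


def build_email_query(sender, order_by="date", descending=True, limit=10):
    """Build a SPARQL SELECT for emails from one sender, with result ordering."""
    if order_by not in {"date", "subject"}:
        raise ValueError(f"cannot order by {order_by!r}")
    # Escape backslashes and quotes so the sender fits in a SPARQL string literal.
    safe_sender = sender.replace("\\", "\\\\").replace('"', '\\"')
    order = f"DESC(?{order_by})" if descending else f"ASC(?{order_by})"
    return "\n".join([
        PREFIX,
        "SELECT ?email ?subject ?date WHERE {",
        "  ?email a ex:Email ;",
        f'         ex:from "{safe_sender}" ;',
        "         ex:subject ?subject ;",
        "         ex:date ?date .",
        "}",
        f"ORDER BY {order}",
        f"LIMIT {int(limit)}",
    ])


if __name__ == "__main__":
    # Newest emails first; the resulting string would be sent to the SPARQL endpoint.
    print(build_email_query("alice@example.org"))
```

The ordering is just the `ORDER BY` clause, so "control over result order" mostly comes down to choosing the variable and the direction.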
From English to SPARQL:
- top down parser
- very simple grammar rules (for now)
- e.g.: words more words %type name% and more words %and another type% => %result%, implemented as fn(x, y) => z
- where x is of type %type name%, y is of type %and another type%, and z is of type %result%
- the literal words are matched exactly, and each subexpression is matched by type
- the types are all loosely defined
- function inputs and outputs are, at the programming-language level, either plain strings or RDF data (itself a string)
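The rule format above can be sketched roughly as follows. This is only an illustration under assumptions: the three rules, the type names (`query`, `email_set`, `person`, `topic`), the `ex:` properties, and the helper names `Rule` and `parse` are all hypothetical. Each rule is turned into a regular expression whose literal words must match exactly, and each %type% slot is parsed recursively, top down, as that type. A type with no rules is treated as a loosely defined leaf and accepts the raw text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

SLOT = re.compile(r"%([^%]+)%")


@dataclass
class Rule:
    pattern: str              # literal words plus %type% slots
    result_type: str          # type of the value the rule produces
    fn: Callable[..., str]    # builds the output string from the slot values

    def __post_init__(self):
        # split() with a capturing group alternates: literal, slot, literal, ...
        parts = SLOT.split(self.pattern)
        self.slot_types = parts[1::2]
        self.regex = re.compile("".join(
            "(.+)" if i % 2 else re.escape(part)
            for i, part in enumerate(parts)))


# Hypothetical rules; more specific patterns are listed before general ones.
RULES = [
    Rule("show %email_set%", "query",
         lambda s: f"SELECT ?email WHERE {{ {s} }}"),
    Rule("emails from %person% about %topic%", "email_set",
         lambda p, t: f'?email ex:from "{p}" ; ex:subject "{t}" .'),
    Rule("emails from %person%", "email_set",
         lambda p: f'?email ex:from "{p}" .'),
]


def parse(text: str, wanted_type: str) -> Optional[str]:
    """Top-down parse of `text` as `wanted_type`; returns None on failure."""
    text = text.strip()
    candidates = [r for r in RULES if r.result_type == wanted_type]
    if not candidates:
        # Loosely defined leaf type: accept the raw text as a string.
        return text
    for rule in candidates:
        match = rule.regex.fullmatch(text)
        if match is None:
            continue
        args = [parse(g, t) for g, t in zip(match.groups(), rule.slot_types)]
        if all(a is not None for a in args):
            return rule.fn(*args)
    return None


if __name__ == "__main__":
    print(parse("show emails from alice about lunch", "query"))
```

This sketch only tries the first way the regular expression splits the sentence, so a real implementation would likely need backtracking over alternative splits, and it would have to avoid left-recursive rules.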
How is this different from the literature?
- focus on a small domain. In the implementation, I am only handling the viewing and manipulation of email. However, the system is being designed for a much larger but still very limited domain: functional, practical interaction with a computer. This includes asking the computer for information (personal as well as common knowledge) and giving the computer commands. This is a much smaller domain than that of, for example, a general-purpose translator. Ambiguous sentences of the "time flies like an arrow" kind should not come up here. (Not entirely true once you get to accessing all knowledge on the internet, but that is a long way off.)
- syntax and grammar are defined not over individual words and their parts of speech, but over exact phrases and a small set of data types
- it's not that different. Many of the algorithms and optimizations that are used to work with individual words and their parts of speech will still apply at this level.
What to compare this to?
- I will compare this method to a keyword system.
- If it is not obvious how different the two are, I will implement a simple keyword system for benchmarking.
- It may be very obvious. In either case, some kind of comparison will be made.
- Is there a system which would be better to compare to?
- One that might be more challenging to compete with?
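For reference, the simple keyword system mentioned above could look something like the sketch below. The intent names and keyword sets are hypothetical placeholders: each intent is scored by how many of its keywords appear in the sentence, and the best-scoring intent wins.

```python
import re

# Hypothetical keyword table: each intent is triggered by a set of keywords.
INTENTS = {
    "list_emails": {"email", "emails", "mail", "inbox"},
    "list_contacts": {"contact", "contacts", "address"},
}


def keyword_intent(sentence):
    """Return the intent sharing the most keywords with the sentence, or None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(keyword_intent("Show me the emails from Alice"))
```

A baseline like this ignores word order and structure, so "emails from alice" and "emails to alice" map to the same intent. That is the kind of difference the comparison would need to measure.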