Independent Project.Matching

Formal Grammar Matcher:

takes in as arguments:

returns:

returns array of nodes


Some problems to solve:


Multiple matches:

Required data:


Dealing with punctuation and whitespace variations

The following substitution is applied to all incoming queries to normalize whitespace: s/\s+/ /g; or gsub("%s+", " ")
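As a minimal sketch, the same normalization can be written in Python with the standard `re` module. The function name `normalize_whitespace` is only illustrative and is not part of the project.

```python
import re

def normalize_whitespace(query: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", query)

print(normalize_whitespace("call   Bob\t at  5"))  # call Bob at 5
```

Note that leading and trailing whitespace is collapsed to a single space rather than removed; whether it should also be stripped is left open here.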

How to deal with situations like:

When will extra spaces or punctuation be important?

Still, when is whitespace necessary?

What about this type of case:

You don't want just '%number%-%number%'; you want '%number%-%number%' but not '%tens place number%-%single digit number%'.


Complex matching

Abstracting above match types

Abstractions that would be useful:

Simplify this to:

Matching types and the existence of attributes
Matching data and the values of their properties

This can be addressed by letting a rule define a function that is called whenever the match is being tested. The environment would provide functions that execute a query and return its results.

Some examples of what you would want to use this for:

This function is called when testing whether a rule matches, and only if the simple text match has already succeeded. If the function returns true, the rule really is a match; if it returns false, it overrides the simple text match and the rule does not match.
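The behaviour described above can be sketched as follows, using the hyphenated-number case from earlier as the example. Everything here is an assumption made for illustration: the `Rule` class, its `check` field, the word lists, and the use of a regular expression as the "simple text match" are not taken from the project.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical word lists, just large enough for the example.
TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}
UNITS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}

@dataclass
class Rule:
    pattern: re.Pattern                              # the simple text match
    check: Optional[Callable[[re.Match], bool]] = None  # optional veto function

    def matches(self, text: str) -> bool:
        m = self.pattern.fullmatch(text)
        if m is None:
            return False      # text match failed, so check is never called
        if self.check is None:
            return True       # no function defined: the text match decides
        return self.check(m)  # False here overrides the text match

def not_compound_number(m: re.Match) -> bool:
    """Reject '<tens>-<unit>' pairs such as 'twenty-three'."""
    return not (m.group(1) in TENS and m.group(2) in UNITS)

range_rule = Rule(re.compile(r"(\w+)-(\w+)"), not_compound_number)

print(range_rule.matches("three-five"))    # True
print(range_rule.matches("twenty-three"))  # False
```

Keeping the function as a veto on top of the text match means the cheap test runs first and the function only sees candidates that already look right.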

This actually requires a new kind of match type. Take the case of matching numbers entered as actual ASCII digits: what do you match against?


Need to be able to match against parts of the RDF database. Ex:

... ?person contacts:person ?name ... 
All of the ?name results can be matched against.

or

?contact ex:name ?name .
?contact ex:email ?email ...
Match ?name as a %person% type. The result is ?email, of type %email address%.
match: "%person%"
match-sparql: "/intro/ SELECT ?email WHERE { ?contact ex:name %person% . ?contact ex:email ?email }"
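To make the idea concrete, here is a rough sketch that stands in for the RDF database with a plain list of triples. It is not a SPARQL engine; the data, the helper names, and the `match_person` function are all hypothetical and exist only to show the two steps: match the input against every known ?name, then look up ?email for the same ?contact.

```python
# Stand-in for the RDF store: (subject, predicate, object) triples.
TRIPLES = [
    ("c1", "ex:name", "Alice"),
    ("c1", "ex:email", "alice@example.org"),
    ("c2", "ex:name", "Bob"),
    ("c2", "ex:email", "bob@example.org"),
]

def subjects(predicate, obj):
    """All ?s such that (?s predicate obj) is in the store."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]

def objects(subject, predicate):
    """All ?o such that (subject predicate ?o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def known_people():
    """Every ?name value; these are the strings %person% can match."""
    return {o for _, p, o in TRIPLES if p == "ex:name"}

def match_person(text):
    """Return the %email address% results for text, or None if it is no %person%."""
    if text not in known_people():
        return None
    emails = []
    for contact in subjects("ex:name", text):
        emails.extend(objects(contact, "ex:email"))
    return emails

print(match_person("Alice"))  # ['alice@example.org']
print(match_person("Carol"))  # None
```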

How does this type of search work?

Optimizations

The search is a complete search. It finds all possible matches, not just one. This makes sense now, but an optimization algorithm here is going to be closely related to the algorithm that predicts how to interpret each ambiguous statement.
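As an illustration of what "complete" means here, the sketch below enumerates every way a token sequence can be covered by rules instead of stopping at the first one. The rule table, the type names, and `all_parses` are invented for the example and say nothing about how the actual matcher is organized.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical rules: a phrase (tuple of tokens) maps to a type name.
RULES: Dict[Tuple[str, ...], str] = {
    ("new", "york"): "%city%",
    ("new", "york", "times"): "%newspaper%",
    ("times",): "%noun%",
}

Parse = List[Tuple[str, Tuple[str, ...]]]

def all_parses(tokens: Tuple[str, ...]) -> Iterator[Parse]:
    """Yield every segmentation of tokens into rule matches."""
    if not tokens:
        yield []  # nothing left to cover: one empty parse
        return
    for phrase, type_name in RULES.items():
        n = len(phrase)
        if tokens[:n] == phrase:
            # Try every continuation, not just the first that works.
            for rest in all_parses(tokens[n:]):
                yield [(type_name, phrase)] + rest

for parse in all_parses(("new", "york", "times")):
    print(parse)
```

For this input the search yields two readings, a %city% followed by a %noun% and a single %newspaper%, which is exactly the kind of ambiguity a later ranking step would have to resolve.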


Old issues (already dealt with) - here for completeness and reference

Can a rule have multiple match types?

How does this affect performance? It seems like it may affect performance, but that is really unavoidable, and the abstraction is very useful. Tests could be done later to determine whether expanding rules with multiple match types into the equivalent set of separate rules performs better.

Is there any other way to handle this type of abstraction?
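The expansion mentioned above, turning one rule with multiple match types into the equivalent set of separate rules, can be sketched as a Cartesian product. The representation is an assumption: a rule is taken to be a list of slots, and each slot lists the match types it accepts.

```python
from itertools import product
from typing import List

def expand(rule: List[List[str]]) -> List[List[str]]:
    """Expand a rule whose slots accept several match types into
    one rule per combination, each slot holding a single type."""
    return [list(combo) for combo in product(*rule)]

# Hypothetical rule: two slots, each accepting two match types.
rule = [["%person%", "%group%"], ["%email address%", "%phone number%"]]

for expanded in expand(rule):
    print(expanded)
```

The number of expanded rules is the product of the slot sizes (four here), which is why the cost of expansion versus keeping the abstraction would need to be measured rather than assumed.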