Strus query evaluation configuration source
Language grammar
The following grammar (in EBNF) defines the formal language for describing a query evaluation scheme as used by the strus utilities (strusUtilities).
Comments
Comments start with # and extend to the end of the line. A # that is part of a single or double quoted string does not start a comment, so it can still be used as part of a symbol there.
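The comment rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the code used by strus; the function name strip_comments is hypothetical, and backslash escaping inside strings is assumed to work as described for the STRING token.

```python
def strip_comments(source: str) -> str:
    """Remove '#' comments, keeping any '#' that sits inside a quoted string."""
    out = []
    quote = None  # the quote character of the string we are inside, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Copy the escaped character verbatim so that an escaped
                # quote does not end the string.
                out.append(source[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif ch == "#":
            # Skip to the end of the line; the newline itself is kept.
            while i < len(source) and source[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For instance, a trailing comment after SELECT selfeat; is dropped, while the # in a term value such as "#1" is preserved.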
Handling of spaces
Spaces, control characters and line breaks have no meaning in the language.
Case sensitivity/insensitivity
Parameter names (keys) of the query evaluation scheme are case insensitive. Keywords and identifiers referring to elements in the storage are case insensitive.
EBNF
IDENTIFIER     : [A-Za-z][A-Za-z0-9_]*
STRING         : <single or double quoted string with backslash escaping>
NUMBER         : <integer or floating point number in non exponential notation>

config         = statement ";" config
               |
               ;
statement      = evalexpr
               | scalarexpr
               | selectexpr
               | weightexpr
               | restrictexpr
               | termdef
               | summarizeexpr
               ;
evalexpr       = "EVAL" [ NUMBER "*" ] functionname "(" parameterlist ")" ;
scalarexpr     = "FORMULA" STRING ;
selectexpr     = "SELECT" featureset ;
weightexpr     = "WEIGHT" featureset ;
restrictexpr   = "RESTRICT" featureset ;
termdef        = "TERM" featureset termvalue termtype ;
summarizeexpr  = "SUMMARIZE" functionname "(" parameterlist ")" ;
functionname   = IDENTIFIER ;
featureset     = IDENTIFIER ;
termtype       = IDENTIFIER ;
termvalue      = IDENTIFIER | STRING ;
parameterlist  = parameter { "," parameter }
               |
               ;
parameter      = parametername "=" parametervalue ;
parametername  = [ "." ] IDENTIFIER ;
parametervalue = IDENTIFIER | STRING | NUMBER ;
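To make the grammar more concrete, here is a minimal recursive-descent parser sketch in Python. It is not the strus implementation. The names tokenize and parse_config and the dictionary layout of the result are hypothetical, parameter values are kept as raw token text, and the optional "name =" prefix after SUMMARIZE is an assumption taken from the example at the end of this page rather than from the grammar.

```python
import re

# Token classes follow the IDENTIFIER, STRING and NUMBER definitions of the grammar.
TOKEN_RE = re.compile(r"""
    (?P<SKIP>\s+|\#[^\n]*)
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<IDENTIFIER>[A-Za-z][A-Za-z0-9_]*)
  | (?P<STRING>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<PUNCT>[;(),=*.])
""", re.VERBOSE)

def tokenize(source):
    """Split the source into (kind, text) pairs, dropping spaces and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse_config(source):
    """Parse a configuration source into a list of statement dictionaries."""
    tokens, pos = tokenize(source), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, value=None):
        nonlocal pos
        tok_kind, tok_value = peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    def take_one_of(*kinds):
        kind = peek()[0]
        if kind not in kinds:
            raise SyntaxError(f"expected one of {kinds}, got {peek()[1]!r}")
        return take(kind)

    def parameters():
        params = {}
        take("PUNCT", "(")
        while peek() != ("PUNCT", ")"):
            if params:
                take("PUNCT", ",")
            # A leading '.' marks a feature parameter.
            prefix = take("PUNCT", ".") if peek() == ("PUNCT", ".") else ""
            name = prefix + take("IDENTIFIER")
            take("PUNCT", "=")
            # Parameter names are case insensitive, so store them lowercased.
            params[name.lower()] = take_one_of("IDENTIFIER", "STRING", "NUMBER")
        take("PUNCT", ")")
        return params

    statements = []
    while pos < len(tokens):
        keyword = take("IDENTIFIER").upper()  # keywords are case insensitive
        if keyword == "EVAL":
            factor = 1.0
            if peek()[0] == "NUMBER":
                factor = float(take("NUMBER"))
                take("PUNCT", "*")
            function = take("IDENTIFIER")
            stmt = {"op": keyword, "factor": factor,
                    "function": function, "params": parameters()}
        elif keyword == "SUMMARIZE":
            result, function = None, take("IDENTIFIER")
            if peek() == ("PUNCT", "="):  # assumption: optional result name
                take("PUNCT", "=")
                result, function = function, take("IDENTIFIER")
            stmt = {"op": keyword, "result": result,
                    "function": function, "params": parameters()}
        elif keyword == "FORMULA":
            stmt = {"op": keyword, "formula": take("STRING")[1:-1]}
        elif keyword in ("SELECT", "WEIGHT", "RESTRICT"):
            stmt = {"op": keyword, "featureset": take("IDENTIFIER")}
        elif keyword == "TERM":
            featureset = take("IDENTIFIER")
            value = take_one_of("IDENTIFIER", "STRING")
            stmt = {"op": keyword, "featureset": featureset,
                    "value": value, "type": take("IDENTIFIER")}
        else:
            raise SyntaxError(f"unknown statement {keyword}")
        take("PUNCT", ";")
        statements.append(stmt)
    return statements
```

Calling parse_config on a source such as 'select selfeat; EVAL 2 * bm25( .match=docfeat );' yields one dictionary per statement, with the keyword normalized to upper case and the weighting factor defaulting to 1.0 when it is omitted.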
Meaning of the grammar elements
functionname
Name of the weighting or summarization function as provided by the query processor.
parametername
Name of the parameter passed to the weighting or summarization function. A parameter name prefixed with a dot '.' declares a feature parameter. Which parameter names a weighting or summarization function accepts depends on its implementation.
EVAL function
Defines a query evaluation function used for weighting.
FORMULA scalar-function
Defines a scalar function (with _0, _1, ... referring to the query evaluation function results in order of their definition) used to combine the query evaluation function results into one result. If no formula is specified, the individual results are simply added up.
SUMMARIZE function
Defines a summarizer function used for building the results.
SELECT featureset
Defines the feature set used for selecting the documents to weight.
WEIGHT featureset
Defines the feature set used for weighting.
RESTRICT featureset
Defines the feature set used as restriction.
Example
The following example uses the feature set 'selfeat' to define which documents are weighted.
All documents containing the feature 'selfeat' will be selected for ranking.
Two query evaluation functions are defined: the 'bm25' weight of the document (_0) and
the value of the meta data element called 'pageweight' (_1).
The FORMULA statement combines them into one weight as 0.7 * _1 * _0 + 0.3 * _0.
For presentation of the result we use one summarizer extracting the title attribute and
another one taking the content elements of the best matching phrases.
SELECT selfeat;
EVAL bm25( k1=0.75, b=2.1, avgdoclen=1000, .match=docfeat );
EVAL metadata( name=pageweight );
FORMULA "0.7 * _1 * _0 + 0.3 * _0";
SUMMARIZE title = attribute( name=title );
SUMMARIZE content = matchphrase( type=orig, nof=4, len=60, structseek=40, .struct=sentence, .match=docfeat );
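How the FORMULA string combines the EVAL results can be illustrated with a small Python sketch. It is not the scalar function interpreter of strus: the function name combine is hypothetical, only the operators +, -, * and / are handled, and the input values in the usage note are made up for illustration. The placeholders _0, _1, ... are resolved against the results in the order of the EVAL statements, and without a formula the results are simply added up.

```python
import ast
import operator
import re

# Binary operators supported by this sketch (an assumption, not the strus set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _evaluate(node, results):
    """Evaluate one node of the parsed formula against the EVAL results."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, results)
        right = _evaluate(node.right, results)
        return _OPS[type(node.op)](left, right)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and re.fullmatch(r"_\d+", node.id):
        # _N refers to the result of the N-th EVAL statement (0-based).
        return results[int(node.id[1:])]
    raise ValueError("unsupported element in formula")

def combine(results, formula=None):
    """Combine EVAL results with a formula, or sum them if none is given."""
    if formula is None:
        return sum(results)
    tree = ast.parse(formula, mode="eval")
    return _evaluate(tree.body, results)
```

With hypothetical values bm25 = 2.0 and pageweight = 0.5, combine([2.0, 0.5], "0.7 * _1 * _0 + 0.3 * _0") evaluates 0.7 * 0.5 * 2.0 + 0.3 * 2.0, which is approximately 1.3, while combine([2.0, 0.5]) without a formula returns 2.5.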