Strus query language
Language grammar
The query language used by the command line utility strusQuery has only a few syntax elements, and all of them are optional. Plain text without any operators of the strus query language is always a valid query.
Comments
Comments start with # and extend to the end of the line. To use # as part of a symbol, put it inside a single or double quoted string.
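The comment rule can be illustrated with a short Python sketch. It is not how strusQuery itself is implemented; the function name strip_comments is hypothetical, and the sketch assumes that quoted strings do not span lines and contain no escaped quotes.

```python
def strip_comments(query: str) -> str:
    """Remove '#' comments, keeping '#' inside single or double quoted strings.

    Assumption (not stated in the documentation): a quoted string ends on
    the line where it starts and contains no escaped quote characters.
    """
    out_lines = []
    for line in query.splitlines():
        quote = None  # the quote character currently open, if any
        kept = []
        for ch in line:
            if quote is not None:
                if ch == quote:
                    quote = None  # closing quote
            elif ch in ("'", '"'):
                quote = ch  # opening quote
            elif ch == "#":
                break  # comment runs to the end of the line
            kept.append(ch)
        out_lines.append("".join(kept).rstrip())
    return "\n".join(out_lines)


print(strip_comments("Hello World # a greeting"))
print(strip_comments("Tag:'C#' # the '#' in the string is kept"))
```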
Handling of spaces
Spaces, control characters and line ends have no meaning in the language.
Case sensitivity/insensitivity
Keywords and identifiers referring to element types, metadata fields, section names and feature set identifiers are case insensitive. Whether query terms are case sensitive depends on the query analyzer configuration.
Relation Query Field / Term
A query field is mapped by the query analyzer to one or more query terms. When a query field is used in an expression, the query terms resulting from it are implicitly grouped together, so that the resulting expression still corresponds to the original query expression.
Selection feature
If no selection features are explicitly specified, the query parser derives one from the set of features specified. Use the operator '~' to mark features that should be excluded from the implicitly defined selection features.
Query elements
A query consists of expressions of query fields that are mapped by the query analyzer to
expressions of query terms. The resulting expressions of query terms are internally represented
as trees that can be translated to query instructions sent to the storage for evaluation.
The original expressions of query fields are parsed from a query string.
Each query field has a type, identified by a name, that determines how the query analyzer
processes the field.
In the simplest case, we have a query string without any syntactic elements that
is interpreted as a single query field. In this case, the name of the query field type
is determined by the query analyzer program. Default query field names are all
search query fields used in the query analyzer program.
If the query analysis gets more complex and uses more than one query field, then
assigning default query field names to plain text queries may no longer make sense.
Syntax elements
If you want to form a query beyond the default case of a single query field, you can use the following syntax elements:
':' TYPE
A colon followed by an identifier <TYPE> assigns the query field type <TYPE>
to the preceding phrase or token.
Examples
Hello:WORD Nature:CATEGORY
Basketball Sports:CATEGORY
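As a rough illustration of how a phrase or token and its optional type suffix belong together, the following Python sketch splits a query string into (text, type) pairs. The names FIELD_RE and parse_fields are hypothetical, and the handling of quoted phrases is an assumption; the real parser of strusQuery may behave differently.

```python
import re

# A field is a quoted phrase or a bare token, optionally followed by ':' TYPE.
# Assumption: phrases are written as quoted strings without escaped quotes.
FIELD_RE = re.compile(r"""("[^"]*"|'[^']*'|[^\s:'"]+)(?::(\w+))?""")


def parse_fields(query, default_type=None):
    """Return a list of (text, field_type) pairs found in the query string."""
    fields = []
    for match in FIELD_RE.finditer(query):
        text, field_type = match.group(1), match.group(2)
        if text[0] in "\"'":
            text = text[1:-1]  # drop the surrounding quotes
        # Fields without an explicit type fall back to the default type.
        fields.append((text, field_type or default_type))
    return fields


print(parse_fields("Hello:WORD Nature:CATEGORY"))
print(parse_fields("Basketball Sports:CATEGORY"))
```

In the second query, Basketball carries no explicit type, so the sketch leaves its type at the default, which mirrors the role of the default query field names of the query analyzer program.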
'~' FIELD
A field following the operator '~' is not considered a
selection feature if the selection features are defined implicitly.
Examples
Hello ~World
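The effect of '~' on the implicit selection can be sketched in Python as follows. The function split_selection is hypothetical and handles whitespace-separated plain tokens only; how the query parser actually combines the remaining features into a selection is not described here and is not modelled.

```python
def split_selection(query):
    """Return (all_features, selection_candidates) for a plain-token query.

    Tokens prefixed with '~' stay in the feature list but are left out of
    the candidates for the implicitly defined selection features.
    """
    features, selection = [], []
    for token in query.split():
        excluded = token.startswith("~")
        name = token[1:] if excluded else token
        if not name:
            continue  # a lone '~' carries no feature
        features.append(name)
        if not excluded:
            selection.append(name)
    return features, selection


print(split_selection("Hello ~World"))
```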
NAME compareop TERM {',' TERM}
An identifier followed by a comparison operator
(one of '<=','>=','<','>','=','==','!=') and a term <TERM> or a comma-separated
list of terms specifies a query restriction.
<NAME> refers to a metadata field and <TERM> to a value to compare
the metadata field with. If you specify more than one <TERM>, then the restriction
condition is true if at least one of the listed terms fulfills the condition.
Examples
Date <= '3/3/1979'
Category = 'Sports','Politics'
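The rule that a restriction with several terms holds if any one of them fulfills the condition can be sketched in Python. All names below (OPS, parse_restriction, matches) are hypothetical. The sketch assumes that '<' belongs to the operator set, compares values as plain strings, and splits the term list at every comma, so it does not interpret dates or numbers the way a real storage would.

```python
import operator
import re

# Comparison operators; '<' is an assumption, '=' and '==' are treated alike.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
       ">": operator.gt, "=": operator.eq, "==": operator.eq,
       "!=": operator.ne}

# Two-character operators come first so that '<=' is not read as '<'.
RESTRICTION_RE = re.compile(r"\s*(\w+)\s*(<=|>=|==|!=|<|>|=)\s*(.+?)\s*$")


def parse_restriction(text):
    """Parse NAME compareop TERM {',' TERM} into (name, op, [terms])."""
    match = RESTRICTION_RE.match(text)
    if match is None:
        raise ValueError(f"not a restriction: {text!r}")
    name, op, terms = match.groups()
    # Simplification: terms must not contain commas themselves.
    values = [t.strip().strip("'\"") for t in terms.split(",")]
    # Metadata field names are case insensitive, so normalise them.
    return name.lower(), op, values


def matches(restriction, metadata):
    """True if at least one term fulfills the condition (string comparison)."""
    name, op, values = restriction
    if name not in metadata:
        return False
    return any(OPS[op](metadata[name], value) for value in values)


restriction = parse_restriction("Category = 'Sports','Politics'")
print(restriction)
print(matches(restriction, {"category": "Politics"}))
```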
OP '(' ARG { ',' ARG } { '|' RANGE } { '^' CARDINALITY } ')'
An identifier followed by an opening parenthesis '(' starts a join of posting sets.
The argument features <ARG> are query fields, with or without a field type name,
or expressions themselves.
Arguments are separated by a comma ','. At the end of the argument list, you can
add a range and a cardinality specifier.
The range specifies the proximity of the terms involved and the cardinality
specifies the number of elements needed for a valid result in the case of operators
selecting a subset of the posting sets represented by the arguments.
Examples
within( "War", "Religion" | 30 )
sequence_struct( :SENTDELIM, "painting", "exhibition" | 30 )
sequence_imm( any("John", "Anne"), "Doe" )
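To show how such join expressions nest into a tree, here is a small recursive descent parser in Python. It is a sketch under assumptions, not the parser used by strusQuery: the names tokenize, Parser and parse_expression are hypothetical, range and cardinality are assumed to be integers that each appear at most once, and quoted strings are assumed to contain no escaped quotes.

```python
import re

# Tokens: quoted string, integer, identifier, or one punctuation character.
TOKEN_RE = re.compile(r"""\s*(?:("[^"]*"|'[^']*')|(-?\d+)|(\w+)|([(),|^:]))""")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        string, number, word, punct = match.groups()
        if string is not None:
            tokens.append(("str", string[1:-1]))
        elif number is not None:
            tokens.append(("int", int(number)))
        elif word is not None:
            tokens.append(("id", word))
        else:
            tokens.append(("punct", punct))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ValueError(f"expected {value or kind}, got {tok[1]!r}")
        self.pos += 1
        return tok

    def accept(self, punct):
        if self.peek() == ("punct", punct):
            self.pos += 1
            return True
        return False

    def parse_arg(self):
        kind, _ = self.peek()
        # An identifier directly followed by '(' starts a nested join.
        if kind == "id" and self.tokens[self.pos + 1:self.pos + 2] == [("punct", "(")]:
            return self.parse_join()
        text = self.take(kind)[1] if kind in ("str", "id") else None
        field_type = self.take("id")[1] if self.accept(":") else None
        if text is None and field_type is None:
            raise ValueError("expected an argument")
        return {"field": text, "type": field_type}

    def parse_join(self):
        op = self.take("id")[1]
        self.take("punct", "(")
        args = [self.parse_arg()]
        while self.accept(","):
            args.append(self.parse_arg())
        rng = self.take("int")[1] if self.accept("|") else None
        card = self.take("int")[1] if self.accept("^") else None
        self.take("punct", ")")
        return {"op": op, "args": args, "range": rng, "cardinality": card}


def parse_expression(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_join()
    if parser.peek() != (None, None):
        raise ValueError("unexpected input after the closing parenthesis")
    return tree


print(parse_expression('sequence_imm( any("John", "Anne"), "Doe" )'))
```

Each join becomes a dictionary with its operator name, its argument list and the optional range and cardinality, and a nested join such as any(...) simply appears as an argument of the enclosing one. An argument like :SENTDELIM, which has a type but no text, is kept with its field set to None.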
Putting it all together
Finally, we present a query that uses all the syntax elements introduced:
Example
Category = 'Sports','Politics' Date <= '3/3/1979' university ~graduate sequence_imm( any("John", "Anne"), "Doe" )