StrusContext
Object holding the global context of the strus information retrieval engine
There are two modes of this context object, each operating on a different base.
If you create this object without parameter, then the context is local.
In a local context you can load modules, define resources, etc. If you create
this object with a connection string as parameter, then all objects created by
this context reside on the server (strusRpcServer) addressed with the connection string.
In this case loaded modules and resources are ignored. What modules to use is then
specified on server startup.
StrusStorageClient
Object representing a client connection to the storage
The only way to construct a storage client instance is to call Context::createStorageClient( config)
StrusStorageTransaction
Object representing a transaction of the storage
The only way to construct a storage transaction instance is to call StorageClient::createTransaction()
StrusInserter
Object representing a client connection to the storage and an analyzer providing inserts of content to be analyzed
The only way to construct an inserter instance is to call Context::createInserter( storage, analyzer)
StrusInserterTransaction
Object representing a transaction of the storage for inserting content to be analyzed
The only way to construct an inserter transaction instance is to call Inserter::createTransaction()
StrusVectorStorageSearcher
Object used to search for similar vectors in the collection
The only way to construct a vector storage searcher instance is to call VectorStorageClient::createSearcher( from, to)
StrusVectorStorageClient
Object representing a client connection to a vector storage
The only way to construct a vector storage client instance is to call Context::createVectorStorageClient( config) or Context::createVectorStorageClient()
StrusVectorStorageTransaction
Object representing a vector storage transaction
The only way to construct a vector storage transaction instance is to call VectorStorageClient::createTransaction()
StrusDocumentAnalyzer
Analyzer object representing a program for segmenting,
tokenizing and normalizing a document into atomic parts, that
can be inserted into a storage and be retrieved from there.
StrusQueryAnalyzer
Analyzer object representing a set of functions for transforming a field,
the smallest unit in any query language, to a set of terms that can be used
to build a query.
StrusQueryEval
Query evaluation program object representing an information retrieval scheme for documents in a storage.
The only way to construct a query eval instance is to call Context::createQueryEval()
StrusQuery
Query program object representing a retrieval method for documents in a storage.
The only way to construct a query instance is to call QueryEval::createQuery( storage)
new StrusContext
Constructor
Parameter
config
(optional) context configuration. If not defined, create context for local mode with own module loader
[rpc=>"localhost:7181"]
[trace=>"log=dump;file=stdout"]
[threads=>12]
loadModule
Load a module
Parameter
name
name of the module to load
"analyzer_pattern"
"storage_vector"
Remarks
Notes
This function is not thread safe. In a multithreaded context, call it only in the initialization phase before calling endConfig.
Examples
loadModule( "storage_vector")
Result
addModulePath
Add one or more paths from which to try to load modules
Parameter
paths
a string or a list of module search paths
["/home/bob/modules", "/home/anne/modules"]
"/home/bob/modules"
Remarks
Notes
This function is not thread safe. In a multithreaded context, call it only in the initialization phase before calling endConfig.
Examples
addModulePath( "/home/bob/modules")
Result
addResourcePath
Add one or more paths from which to load analyzer resource files
Parameter
paths
a string or a list of resource search paths
["/home/bob/resources", "/home/anne/resources"]
"/home/bob/resources"
Remarks
Result
defineWorkingDirectory
Define the working directory where files are written to
Parameter
path
a string specifying the working directory
Notes
If the working directory is defined, all paths used for written data must be relative to it
Result
endConfig
End the configuration of the context and create the object builders
Parameter
Remarks
Result
createStorageClient
Create a storage client instance
Parameter
config
(optional) configuration (string or structure with named elements) of the storage client or undefined, if the default remote storage of the RPC server is chosen
Examples
createStorageClient( )
createStorageClient( "path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT")
createStorageClient( [path=>"/srv/searchengine/storage", metadata=>"doclen UINT32, date UINT32, docweight FLOAT"])
createStorageClient( [path=>"/srv/searchengine/storage", metadata=>"doclen UINT32, date UINT32, docweight FLOAT", max_open_files=>256, write_buffer_size=>"4K", block_size=>"4M", cache=>"1G"])
Result
storage client interface (class StorageClient) for accessing the storage
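The examples above pass the same configuration either as a string of `key=value` pairs separated by semicolons or as a structure with named elements. The following minimal Python sketch shows how the two forms correspond. `parse_config_string` and `to_config_string` are hypothetical helpers written for this illustration, not part of the strus API, and the real parser may treat quoting, escaping or whitespace differently.

```python
def parse_config_string(config):
    """Split 'key=value; key=value' into a dict (hypothetical helper)."""
    result = {}
    for item in config.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        # split at the first '=' only, so values may contain '=' themselves
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def to_config_string(config):
    """Join a dict back into the 'key=value; key=value' form."""
    return "; ".join(f"{key}={value}" for key, value in config.items())


config_string = (
    "path=/srv/searchengine/storage; "
    "metadata=doclen UINT32, date UINT32, docweight FLOAT"
)
parsed = parse_config_string(config_string)
print(parsed)
print(to_config_string(parsed))
```

Note that the commas inside the `metadata` value are left untouched, because only the semicolon separates configuration items in this sketch.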
createVectorStorageClient
Create a vector storage client instance
Parameter
config
(optional) configuration (string or structure with named elements) of the storage client or undefined, if the default remote vector storage of the RPC server is chosen
Examples
createVectorStorageClient( )
createVectorStorageClient( "path=/srv/searchengine/vecstorage")
createVectorStorageClient( [path=>"/srv/searchengine/vecstorage"])
Result
vector storage client interface (class VectorStorageClient) for accessing the vector storage
createStorage
Create a new storage (physically) described by config
Parameter
config
storage configuration (string or structure with named elements)
Remarks
Examples
createStorage( "path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT; acl=yes")
createStorage( [path=>"/srv/searchengine/storage", metadata=>"doclen UINT32, date UINT32, docweight FLOAT", acl=>TRUE])
Result
createVectorStorage
Create a new vector storage (physically) described by config
Parameter
config
vector storage configuration (string or structure with named elements)
Remarks
Examples
createVectorStorage( "path=/srv/searchengine/vecstorage")
createVectorStorage( [path=>"/srv/searchengine/vecstorage"])
Result
destroyStorage
Delete the storage (physically) described by config
Parameter
config
storage configuration (string or structure with named elements)
Remarks
Notes
Also works on vector storages
Examples
destroyStorage( "path=/srv/searchengine/storage")
destroyStorage( [path=>"/srv/searchengine/storage"])
Result
storageExists
Tests if the storage described by config exists
Parameter
config
storage configuration (string or structure with named elements)
Notes
Also works on vector storages; it does not distinguish between the two
Examples
storageExists( "path=/srv/searchengine/storage")
storageExists( [path=>"/srv/searchengine/storage"])
Result
true, if the storage with this configuration exists
detectDocumentClass
Detect the type of document from its content
Parameter
content
the document content to classify
Examples
detectDocumentClass( "<?xml version='1.0' encoding='UTF-8'?><doc>...</doc>")
Result
the document class
[mimetype=>"application/xml", encoding=>"UTF-8", scheme=>"customer"]
[mimetype=>"application/json", encoding=>"UTF-8"]
createDocumentAnalyzer
Create a document analyzer instance
Parameter
doctype
structure describing the segmenter to use (either document class description structure or segmenter name)
[mimetype=>"application/xml", encoding=>"UTF-8", scheme=>"customer"]
[mimetype=>"application/json", encoding=>"UTF-8"]
[segmenter=>"textwolf"]
"application/json"
"json"
Examples
createDocumentAnalyzer( [mimetype=>"application/xml", encoding=>"UTF-8"])
Result
document analyzer interface (class DocumentAnalyzer)
createQueryAnalyzer
Create a query analyzer instance
Parameter
Examples
Result
query analyzer interface (class QueryAnalyzer)
createQueryEval
Create a query evaluation instance
Parameter
Examples
Result
query evaluation interface (class QueryEval)
createInserter
Create an inserter based on a storage and a document analyzer
Parameter
storage
storage client to insert into
analyzer
document analyzer to use for preparing the documents to insert
Examples
createInserter( $storage, $analyzer)
Result
inserter interface (class Inserter)
unpackStatisticBlob
Unpack a statistics blob retrieved from a storage
Parameter
blob
binary blob with statistics to decode (created by StorageClient::getAllStatistics or StorageClient::getChangeStatistics)
procname
(optional) name of statistics processor to use for decoding the message (use default processor, if not defined)
Result
the statistics structure encoded in the blob passed as argument
close
Force cleanup to circumvent object pooling mechanisms in an interpreter context
Parameter
Result
debug_serialize
Debug method that returns the serialization of the arguments as string
Parameter
arg
structure to serialize as string for visualization (debugging)
deterministic
(optional) true, if output is deterministic
Notes
This function is used to verify that the deserialization of binding language data structures works as expected
Examples
debug_serialize( [surname=>"John", lastname=>"Doe", company=>[name=>"ACME", url=>"acme.com"]])
Result
the input serialization as string
"open name 'surname' value 'John' name 'lastname' value 'Doe' name 'company' open name 'name' value 'ACME' name 'url' value 'acme.com' close close"
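The example output suggests a simple recursive layout: a structure is wrapped in `open` ... `close`, and every named element is announced by `name` followed by either a `value` or a nested structure. The following Python sketch models that layout. It is inferred only from the example above and is not the binding's implementation; `sketch_serialize` is a hypothetical name, and lists and non-string scalars are not modeled.

```python
def sketch_serialize(value):
    """Hypothetical model of the string layout shown in the example above."""
    if isinstance(value, dict):
        parts = ["open"]
        for key, item in value.items():
            # each named element is announced by its name, then its content
            parts.append(f"name '{key}'")
            parts.append(sketch_serialize(item))
        parts.append("close")
        return " ".join(parts)
    # assumption: every non-structure value is rendered as a quoted string
    return f"value '{value}'"


person = {
    "surname": "John",
    "lastname": "Doe",
    "company": {"name": "ACME", "url": "acme.com"},
}
serialized = sketch_serialize(person)
print(serialized)
```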
introspection
Introspect a structure starting from a root path
Parameter
path
list of identifiers describing the access path to the element to introspect
["queryproc", "weightfunc"]
["weightfunc"]
["env"]
Result
the structure to introspect starting from the path
enableDebugTrace
Enable the debug trace interface for a named component for the current thread
Parameter
component
name of component to enable debug tracing for
Result
disableDebugTrace
Disable the debug trace interface for a named component for the current thread
Parameter
component
name of component to disable debug tracing for
Result
fetchDebugTrace
Fetch all debug trace messages of the current thread
Parameter
Notes
Clears all messages stored for the current thread
Result
nofDocumentsInserted
Get the number of documents inserted into this storage
Parameter
Result
the total number of documents
documentFrequency
Get the number of inserted documents in which a specific feature occurs
Parameter
type
the term type of the feature queried
term
the term value of the feature queried
"John"
"21314"
"Z0 ssd-qx"
Result
the number of documents in which the argument feature occurs
documentNumber
Get the internal document number from the document identifier
Parameter
docid
document identifier
"doc://2132093"
"http://www.acme.com/pub/acme?D=232133"
Result
internal document number or 0, if no document with this id is inserted
documentForwardIndexTerms
Get an iterator on the tuples (value,pos) of the forward index of a given type for a document
Parameter
docno
internal local document number
termtype
term type string
pos
(optional) ordinal start position in forward index (where to start iterating)
Result
iterator on tuples (value,pos)
documentSearchIndexTerms
Get an iterator on the tuples (value,tf,firstpos) of the search index of a given type for a document
Parameter
docno
internal local document number
termtype
term type string
Result
iterator on tuples (value,tf,firstpos)
postings
Get an iterator on the set of postings inserted
Parameter
expression
query term expression
["sequence", 10, ["sequence", 2, ["word", "complet"], ["word", "diff"]], ["sequence", 3, ["word", "you"], ["word", "expect"]]]
["word", "hello"]
["sequence", 2, ["word", "ch"], ["number", "13"]]
restriction
(optional) meta data restrictions
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]
start_docno
(optional) starting document number
Result
iterator on a set of postings
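The expression examples above nest operator nodes around term leaves. The following Python sketch walks such an expression and lists its terms. It assumes a leaf is a two-element list of strings `[type, value]` and that any other list is an operator node whose first element is the operator name, optionally followed by an integer argument and then the sub-expressions. This reading is inferred from the examples, not from a documented grammar, and `collect_terms` is a hypothetical helper.

```python
def collect_terms(expression):
    """Return the (type, value) leaves of a nested expression, left to right."""
    if len(expression) == 2 and all(isinstance(e, str) for e in expression):
        # assumed leaf form: [type, value]
        return [(expression[0], expression[1])]
    terms = []
    for element in expression[1:]:  # skip the operator name
        if isinstance(element, list):
            terms.extend(collect_terms(element))
        # anything else (e.g. the integer after the operator name) is skipped
    return terms


expression = [
    "sequence", 10,
    ["sequence", 2, ["word", "complet"], ["word", "diff"]],
    ["sequence", 3, ["word", "you"], ["word", "expect"]],
]
terms = collect_terms(expression)
print(terms)
```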
select
Get an iterator on records of selected elements for matching documents starting from a specified document number
Parameter
what
list of items to select: names of document attributes or meta data or "position" for matching positions or "docno" for the document number
["docno", "title", "position"]
expression
query term expression
["within", 5, ["word", "world"], ["word", "money"]]
["word", "hello"]
["sequence", 2, ["word", "ch"], ["number", "13"]]
restriction
(optional) meta data restrictions
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]
start_docno
(optional) starting document number
accesslist
(optional) list of access restrictions (one of them must match)
Result
iterator on records of the selected elements
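The restriction examples for postings and select can be read as a conjunction of groups, where a group is either a single condition `[operator, name, value]` or a list of alternative conditions. The following Python sketch evaluates a restriction against one document's meta data under that reading. The conjunction-of-alternatives interpretation, the numeric comparison of values, and every operator other than `=` and `<` are assumptions made for this illustration; `matches` is a hypothetical helper, not a strus function.

```python
import operator

# Only "=" and "<" appear in the examples; the other operators are assumed.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def is_condition(node):
    """A single condition has the form [operator, name, value]."""
    return len(node) == 3 and isinstance(node[0], str)


def check(condition, metadata):
    op, name, value = condition
    # assumption: values are compared numerically, even when given as strings
    return OPERATORS[op](float(metadata[name]), float(value))


def matches(restriction, metadata):
    """True if every group has at least one condition that holds."""
    if is_condition(restriction):
        return check(restriction, metadata)
    for group in restriction:
        alternatives = [group] if is_condition(group) else group
        if not any(check(c, metadata) for c in alternatives):
            return False
    return True


restriction = [[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
print(matches(restriction, {"country": 17, "year": 2005}))
print(matches(restriction, {"country": 3, "year": 2005}))
```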
termTypes
Get an iterator on the term types inserted
Parameter
Result
iterator on the term types
docids
Get an iterator on the document identifiers inserted
Parameter
Result
docid
Get the document identifier associated with a local document number
Parameter
docno
local document number queried
Result
usernames
Get an iterator on the user names (roles) used in document access restrictions
Parameter
Result
iterator on the user names (roles)
attributeNames
Get the list of inserted document attribute names
Parameter
Result
list of names
["name", "title", "docid"]
metadataNames
Get the list of inserted document metadata names
Parameter
Result
list of names
["date", "ccode", "category"]
getAllStatistics
Get an iterator on message blobs that encode all statistics of the storage (e.g. feature occurrences and number of documents inserted)
Parameter
sign
(optional) true = registration, false = deregistration; if false, the sign of all statistics is inverted
Notes
The blobs can be decoded with Context::unpackStatisticBlob
Result
iterator on the encoded blobs of the complete statistics of the storage
getChangeStatistics
Get an iterator on message blobs that encode changes in statistics of the storage (e.g. feature occurrences and number of documents inserted)
Parameter
Notes
The blobs can be decoded with Context::unpackStatisticBlob
Result
iterator on the encoded blobs of the statistic changes of the storage
createTransaction
Create a transaction
Parameter
Result
the transaction object (class StorageTransaction) created
config
Get the configuration of this storage
Parameter
Result
the configuration as structure
[path=>"/srv/searchengine/storage", metadata=>"doclen UINT32, date UINT32, docweight FLOAT"]
configstring
Get the configuration of this storage as string
Parameter
Result
the configuration as string
"path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT"
close
Close the storage client
Parameter
Result
introspection
Introspect a structure starting from a root path
Parameter
path
list of identifiers describing the access path to the element to introspect
["config"]
["termtypes"]
["attributenames"]
["metadatanames"]
Result
the structure to introspect starting from the path
insertDocument
Prepare the insertion of a document into the storage
Parameter
docid
the identifier of the document to insert
doc
the structure of the document to insert (analyzer::Document)
Notes
The document is physically inserted with the call of 'commit()'
Result
deleteDocument
Prepare the deletion of a document from the storage
Parameter
docid
the identifier of the document to delete
Notes
The document is physically deleted with the call of 'commit()'
Result
deleteUserAccessRights
Prepare the deletion of all document access rights of a user
Parameter
username
the name of the user whose access rights are to be deleted (in the local collection)
Notes
The user access rights are changed accordingly with the next implicit or explicit call of 'flush'
Result
commit
Commit all insert or delete or user access right change statements of this transaction.
Parameter
Remarks
Result
rollback
Rollback all insert or delete or user access right change statements of this transaction.
Parameter
Result
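As the notes above state, insertDocument and deleteDocument only prepare an operation; nothing is physically changed until commit() is called, and rollback() discards what was prepared. The following Python sketch models that behavior with an in-memory dictionary. `SketchTransaction` is a hypothetical class for illustration only; it does not reflect how the storage implements transactions.

```python
class SketchTransaction:
    """Hypothetical in-memory model of staged inserts and deletes."""

    def __init__(self, store):
        self._store = store   # dict: docid -> document
        self._pending = []    # staged operations, in call order

    def insertDocument(self, docid, doc):
        self._pending.append(("insert", docid, doc))

    def deleteDocument(self, docid):
        self._pending.append(("delete", docid, None))

    def commit(self):
        # apply the staged operations in the order they were prepared
        for op, docid, doc in self._pending:
            if op == "insert":
                self._store[docid] = doc
            else:
                self._store.pop(docid, None)
        self._pending.clear()

    def rollback(self):
        # drop everything that was prepared; the store is left unchanged
        self._pending.clear()


store = {}
transaction = SketchTransaction(store)
transaction.insertDocument("doc://2132093", {"title": "example"})
visible_before_commit = "doc://2132093" in store
transaction.commit()
visible_after_commit = "doc://2132093" in store

transaction = SketchTransaction(store)
transaction.deleteDocument("doc://2132093")
transaction.rollback()
still_present_after_rollback = "doc://2132093" in store
print(visible_before_commit, visible_after_commit, still_present_after_rollback)
```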
createTransaction
Create a transaction
Parameter
Result
the transaction object (class InserterTransaction) created
insertDocument
Prepare the insertion of a document into the storage
Parameter
docid
the identifier of the document to insert or empty if document id is extracted by analyzer
doc
plain content of the document to analyze and insert
documentClass
(optional) document class of the document to insert (autodetection if undefined)
Notes
The document is physically inserted with the call of 'commit()'
Result
deleteDocument
Prepare the deletion of a document from the storage
Parameter
docid
the identifier of the document to delete
Notes
The document is physically deleted with the call of 'commit()'
Result
deleteUserAccessRights
Prepare the deletion of all document access rights of a user
Parameter
username
the name of the user whose access rights are to be deleted (in the local collection)
Notes
The user access rights are changed accordingly with the next implicit or explicit call of 'flush'
Result
commit
Commit all insert or delete or user access right change statements of this transaction.
Parameter
Remarks
Result
rollback
Rollback all insert or delete or user access right change statements of this transaction.
Parameter
Result
findSimilar
Find the vectors most similar to a given vector
Parameter
vec
vector to search for (double[])
maxNofResults
maximum number of results to return
Result
the list of most similar vectors (double[])
findSimilarFromSelection
Find the vectors most similar to a given vector within a selection of features addressed by index
Parameter
featidxlist
list of candidate indices (int[])
vec
vector to search for (double[])
maxNofResults
maximum number of results to return
Result
the list of most similar vectors (double[])
close
Controlled close to free resources (forces the release of resources in an interpreter context with a garbage collector)
Parameter
Result
createSearcher
Create a searcher object for scanning the vectors for similarity
Parameter
range_from
start of the range of features for the searcher (allows splitting the search across multiple searcher instances)
range_to
end of the range of features for the searcher (allows splitting the search across multiple searcher instances)
Result
the vector search interface (with ownership)
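createSearcher takes a feature range so that a search can be split across several searcher instances. The following Python sketch shows the idea with a brute-force search: each range is searched separately and the partial results are merged. Cosine similarity is an assumption chosen for illustration (this section does not state which measure the vector storage uses), the range is treated as half-open, results are returned as (index, score) pairs, and `find_similar` and `merge_results` are hypothetical helpers.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equally long vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(vectors, range_from, range_to, vec, max_results):
    """Brute-force search in vectors[range_from:range_to]."""
    scored = ((i, cosine(vectors[i], vec)) for i in range(range_from, range_to))
    return heapq.nlargest(max_results, scored, key=lambda pair: pair[1])


def merge_results(partial_results, max_results):
    """Combine the results of several ranges into one ranked list."""
    merged = [pair for part in partial_results for pair in part]
    return heapq.nlargest(max_results, merged, key=lambda pair: pair[1])


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.0]
first_half = find_similar(vectors, 0, 2, query, 2)
second_half = find_similar(vectors, 2, 4, query, 2)
best = merge_results([first_half, second_half], 2)
best_indices = [index for index, _ in best]
print(best_indices)
```

Because every range returns its own top candidates, merging the partial lists yields the same ranking as a single search over the whole range in this sketch.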
createTransaction
Create a vector storage transaction instance
Parameter
Result
conceptClassNames
Get the list of concept class names defined
Parameter
Result
the list
["flections", "entityrel"]
conceptFeatures
Get the list of indices of features represented by a learnt concept feature
Parameter
conceptClass
name identifying a class of concepts learnt
"flections"
"entityrel"
""
conceptid
index (indices of learnt concepts starting from 1)
Result
the resulting vector indices (index is order of insertion starting from 0)
[2121, 5355, 35356, 214242, 8309732, 32432424]
nofConcepts
Get the number of concept features learnt for a class
Parameter
conceptClass
name identifying a class of concepts learnt.
Result
the number of concept features and also the maximum number assigned to a feature (starting with 1)
featureConcepts
Get the set of learnt concepts of a class for a feature defined
Parameter
conceptClass
name identifying a class of concepts learnt
index
index of vector in the order of insertion starting from 0
Result
the resulting concept feature indices (indices of learnt concepts starting from 1)
[2121, 5355, 35356, 214242, 8309732, 32432424]
featureVector
Get the vector assigned to a feature addressed by index
Parameter
index
index of the feature (starting from 0)
Result
the vector
[0.08721391, 0.01232134, 0.02342453, 0.0011312, 0.0012314, 0.087232243]
featureName
Get the name of a feature by its index starting from 0
Parameter
index
index of the feature (starting from 0)
Result
the name of the feature defined
featureIndex
Get the index starting from 0 of a feature by its name
Parameter
Result
-1 if not found, otherwise the index of the named feature (index is order of insertion starting from 0)
nofFeatures
Get the number of feature vectors defined
Parameter
Result
config
Get the configuration of this vector storage
Parameter
Result
the configuration as structure
[path=>'storage', commit=>10, dim=>300, bit=>64, var=>32, simdist=>340, maxdist=>640, realvecweights=>1]
configstring
Get the configuration of this vector storage as string
Parameter
Result
the configuration as string
"path=storage;commit=10;dim=300;bit=64;var=32;simdist=340;maxdist=640;realvecweights=1"
close
Controlled close to free resources (forces the release of resources in an interpreter context with a garbage collector)
Parameter
Result
addFeature
Add named feature to vector storage
Parameter
name
unique name of the feature added
vec
vector assigned to the feature
[0.08721391, 0.01232134, 0.02342453, 0.0011312, 0.0012314, 0.087232243]
Result
defineFeatureConceptRelation
Assign a concept (index) to a feature referenced by index
Parameter
conceptClass
name of the relation
featidx
index of the feature
conidx
index of the concept
Result
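defineFeatureConceptRelation records a relation that conceptFeatures and featureConcepts later read from opposite directions, with feature indices starting from 0 and concept indices starting from 1. The following Python sketch models that bookkeeping in memory. `SketchConceptRelations` and its method names are hypothetical and only illustrate the index conventions described in this section.

```python
from collections import defaultdict


class SketchConceptRelations:
    """Hypothetical in-memory model of feature/concept relations per class."""

    def __init__(self):
        self._features = defaultdict(set)  # (class, conidx) -> feature indices
        self._concepts = defaultdict(set)  # (class, featidx) -> concept indices

    def define(self, concept_class, featidx, conidx):
        self._features[(concept_class, conidx)].add(featidx)
        self._concepts[(concept_class, featidx)].add(conidx)

    def concept_features(self, concept_class, conidx):
        return sorted(self._features.get((concept_class, conidx), ()))

    def feature_concepts(self, concept_class, featidx):
        return sorted(self._concepts.get((concept_class, featidx), ()))

    def nof_concepts(self, concept_class):
        # concept indices start from 1, so the highest index is also the count
        indices = [con for (cls, con) in self._features if cls == concept_class]
        return max(indices, default=0)


relations = SketchConceptRelations()
relations.define("flections", 0, 1)
relations.define("flections", 1, 1)
relations.define("flections", 1, 2)
print(relations.concept_features("flections", 1))
print(relations.feature_concepts("flections", 1))
print(relations.nof_concepts("flections"))
```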
commit
Commit the transaction
Parameter
Remarks
Result
rollback
Roll back the transaction
Parameter
Result
close
Controlled close to free resources (forces the release of resources in an interpreter context with a garbage collector)
Parameter
Result
addSearchIndexFeature
Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized
Parameter
type
type of the features produced (your choice)
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of option strings, each one of {"content" => feature has its own position, "unique" => feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"
Examples
addSearchIndexFeature( "word", "/doc/elem", "word", ["lc", ["stem", "en"]])
Result
addForwardIndexFeature
Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized
Parameter
type
type of the features produced
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of options, each one of {"content" => feature has its own position, "unique" => feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"
Result
addPatternLexem
Declare an element to be used as lexem by post processing pattern matching but not put into the result of document analysis
Parameter
type
term type name of the lexem to be fed to the pattern matching
selectexpr
an expression that describes what elements are taken from a document for this feature (tag selection in abbreviated syntax of XPath)
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer (ownership passed to this) to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizers (element ownership passed to this) to use for this feature
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
defineMetaData
Define how a feature to insert as meta data is selected, tokenized and normalized
Parameter
fieldname
name of the addressed meta data field.
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
defineAggregatedMetaData
Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.
Parameter
fieldname
name of the addressed meta data field.
function
defining how and from what the value is aggregated
Result
defineAttribute
Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized
Parameter
attribname
name of the addressed attribute.
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
addSearchIndexFeatureFromPatternMatch
Define a result of pattern matching as feature to insert into the search index, normalized
Parameter
type
type name of the feature to produce.
patternTypeName
name of the pattern to select
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of option strings, each one of {"content" => feature has its own position, "unique" => feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"
Result
addForwardIndexFeatureFromPatternMatch
Define a result of pattern matching as feature to insert into the forward index, normalized
Parameter
type
type name of the feature to produce.
patternTypeName
name of the pattern to select
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of options, each one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"
Result
defineMetaDataFromPatternMatch
Define a result of pattern matching to insert as metadata, normalized
Parameter
fieldname
field name of the meta data element to produce.
patternTypeName
name of the pattern to select
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
defineAttributeFromPatternMatch
Define a result of pattern matching to insert as document attribute, normalized
Parameter
attribname
name of the document attribute to produce.
patternTypeName
name of the pattern to select
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
definePatternMatcherPostProc
Declare a pattern matcher on the document features after other document analysis
Parameter
patternTypeName
name of the type to assign to the pattern matching results
patternMatcherModule
module id of pattern matcher to use (empty string for default)
lexems
list of all lexems generated by the feeder (analyzer)
"word"
["word", "number"]
patterns
structure with all patterns
Result
definePatternMatcherPostProcFromFile
Declare a pattern matcher on the document features after other document analysis
Parameter
patternTypeName
name of the type to assign to the pattern matching results
patternMatcherModule
module id of pattern matcher to use (empty string for default)
serializedPatternFile
path to file with serialized (binary) patterns
"/srv/strus/patterns.bin"
Result
defineSubDocument
Declare a sub document for the handling of multi part documents in an analyzed content or documents of different types with one configuration
Parameter
subDocumentTypeName
type name assigned to this sub document
selectexpr
an expression that defines the content of the sub document declared
Notes
Sub documents are defined as the sections selected by the expression plus some data selected not belonging to any sub document.
Result
defineSubContent
Declare a sub part of a document with a different document class, which requires switching the segmenter
Parameter
selectexpr
an expression that defines the area of the sub content
documentClass
document class of the content, determines what segmenter to use for this part
[mimetype=>"application/json", encoding=>"UTF-8"]
Result
analyzeSingle
Analyze a content and return the analyzed document structure (analyzing a single document)
Parameter
content
content string (NOT a file name!) of the document to analyze
"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><doc>...</doc>"
documentClass
(optional) document class of the document to analyze, if not specified the document class is guessed from the content with document class detection
[mimetype=>"application/xml", encoding=>"UTF-8", scheme=>"customer"]
[mimetype=>"application/json", encoding=>"UTF-8"]
Result
structure of the document analyzed (sub document type names, search index terms, forward index terms, metadata, attributes)
analyzeMultiPart
Analyze a content and return the analyzed document structures as iterator (analyzing a multipart document)
Parameter
content
content string (NOT a file name!) with the documents to analyze
"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><doc>...</doc>"
documentClass
(optional) document class of the document set to analyze, if not specified the document class is guessed from the content with document class detection
[mimetype=>"application/xml", encoding=>"UTF-8", scheme=>"customer"]
[mimetype=>"application/json", encoding=>"UTF-8"]
Notes
If you are not sure whether to use analyzeSingle or analyzeMultiPart, use analyzeMultiPart: it covers analyzeSingle by returning an iterator on a set containing only the single document
Result
iterator on structures of the documents analyzed (sub document type names, search index terms, forward index terms, metadata, attributes)
introspection
Introspect a structure starting from a root path
Parameter
path
list of identifiers describing the access path to the element to introspect
Result
the structure to introspect starting from the path
addElement
Defines an element (term, metadata) of query analysis.
Parameter
featureType
element feature type created from this field type
fieldType
name of the field type defined
tokenizer
tokenizer function description to use for the features of this field type
"content"
"word"
["regex", "[A-Za-z]+"]
normalizers
list of normalizer function descriptions to use for the features of this field type in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
Result
addElementFromPatternMatch
Defines an element from a pattern matching result.
Parameter
type
element type created from this pattern match result type
patternTypeName
name of the pattern match result item
normalizers
list of normalizer functions
Result
addPatternLexem
Declare an element to be used as lexem by post processing pattern matching but not put into the result of query analysis
Parameter
termtype
term type name of the lexem to be fed to the pattern matching
fieldtype
type of the field of this element in the query
tokenizer
tokenizer function description to use for the features of this field type
"content"
"word"
["regex", "[A-Za-z]+"]
normalizers
list of normalizer function descriptions to use for the features of this field type in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
Result
definePatternMatcherPostProc
Declare a pattern matcher on the query features after other query analysis
Parameter
patternTypeName
name of the type to assign to the pattern matching results
patternMatcherModule
module id of pattern matcher to use (empty string for default)
lexems
list of all lexems generated by the feeder (analyzer)
["word", "number", "name"]
"word"
patterns
structure with all patterns
Result
definePatternMatcherPostProcFromFile
Declare a pattern matcher on the query features after other query analysis
Parameter
patternTypeName
name of the type to assign to the pattern matching results
patternMatcherModule
module id of pattern matcher to use (empty string for default)
serializedPatternFile
path to file with serialized (binary) patterns
Result
defineImplicitGroupBy
Declare an implicit grouping operation for a query field type. The implicit grouping operation is always applied when analysis of this field results in more than one term, to ensure that the field yields only one node in the query.
Parameter
fieldtype
name of the field type where this grouping operation applies
groupBy
kind of selection of the arguments grouped ("position": elements with the same position get their own group; "all" (or "", the default): all elements of the field go into one group)
opname
query operator name generated as node for grouping
range
(optional) positional range attribute for the node used for grouping (0 for no range)
cardinality
(optional) cardinality attribute for the node used for grouping (0 for all)
Result
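The two groupBy selections can be sketched as a partitioning step. This is a minimal illustration of which elements end up together and not the engine's implementation; the function partition_for_grouping and the (position, type, value) tuple layout are assumptions made for the example.

```python
from itertools import groupby

def partition_for_grouping(elements, group_by=""):
    """Partition the elements of one query field into groups.

    elements: list of (position, type, value) tuples (hypothetical layout).
    group_by: "position", "all" or "" (default, same as "all").
    """
    if not elements:
        return []
    if group_by in ("", "all"):
        # all elements of the field go into one group
        return [list(elements)]
    if group_by == "position":
        # elements with the same position get their own group
        ordered = sorted(elements, key=lambda e: e[0])
        return [list(g) for _, g in groupby(ordered, key=lambda e: e[0])]
    raise ValueError("unknown groupBy value: %r" % (group_by,))

field = [(1, "word", "new"), (1, "stem", "new"), (2, "word", "york")]
print(partition_for_grouping(field, "position"))
print(partition_for_grouping(field, "all"))
```

Each resulting group would then be wrapped into a node with the operator opname; how range and cardinality are attached to that node is left out of the sketch.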
declareElementPriority
Declare that all query elements assigned to a feature type get a priority that causes the elimination of all elements with a lower priority that are completely covered by a single element of this type.
Parameter
priority
priority value assigned to 'type'
Result
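The elimination rule can be sketched as follows. The sketch assumes that "completely covered" means that the position span of the lower-priority element lies inside the span of one higher-priority element. The class Element and the functions covers and eliminate_covered are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    type: str       # feature type of the element
    pos: int        # first ordinal position covered (assumed layout)
    length: int     # number of positions covered
    priority: int   # priority declared for the feature type

def covers(a, b):
    """True if the span of a completely contains the span of b."""
    return a.pos <= b.pos and b.pos + b.length <= a.pos + a.length

def eliminate_covered(elements):
    """Drop every element covered by a single element of higher priority."""
    return [
        e for e in elements
        if not any(o.priority > e.priority and covers(o, e) for o in elements)
    ]

elements = [
    Element("name", 1, 2, 1),   # a two-word name with higher priority
    Element("word", 1, 1, 0),   # covered by the name
    Element("word", 2, 1, 0),   # covered by the name
    Element("word", 3, 1, 0),   # outside of the name, kept
]
print(eliminate_covered(elements))
```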
analyzeTermExpression
Analyze a term expression
Parameter
expression
query term expression tree
["within", 5, ["word", "Worlds"], ["word", "powers"]]
["word", "PUBLIC"]
Result
structure analyzed
["within", 5, ["word", "world"], ["word", "power"]]
["word", "public"]
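The mapping from the input expression to the analyzed structure can be pictured as a walk over the expression tree that normalizes every term leaf and leaves operators and their numeric arguments untouched. The following is a minimal sketch of that idea only. The normalizer toy_normalize (lower case plus a naive plural strip) and the function analyze_tree are hypothetical; the real result depends on the tokenizers and normalizers configured with addElement.

```python
def toy_normalize(value):
    """Toy normalizer (assumption): lower case and strip one trailing 's'."""
    value = value.lower()
    return value[:-1] if value.endswith("s") else value

def analyze_tree(expr, normalize=toy_normalize):
    """Normalize the value of every [type, value] leaf of an expression tree."""
    if len(expr) == 2 and all(isinstance(x, str) for x in expr):
        # leaf: [field type, term value]
        return [expr[0], normalize(expr[1])]
    # operator node: keep operator name and numeric arguments, recurse into sub-expressions
    return [analyze_tree(x, normalize) if isinstance(x, list) else x for x in expr]

print(analyze_tree(["within", 5, ["word", "Worlds"], ["word", "powers"]]))
print(analyze_tree(["word", "PUBLIC"]))
```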
analyzeSingleTermExpression
Analyze a term expression that yields a single, unique result
Parameter
expression
query term expression tree
["within", 5, ["word", "Worlds"], ["word", "powers"]]
["word", "PUBLIC"]
Remarks
Result
structure analyzed
["within", 5, ["word", "world"], ["word", "power"]]
["word", "public"]
analyzeMetaDataExpression
Analyze a metadata expression
Parameter
expression
query metadata expression tree
["<", "year", "26.9.2017"]
Result
introspection
Introspect a structure starting from a root path
Parameter
path
list of identifiers describing the access path to the element to introspect
Result
the structure to introspect starting from the path
addTerm
Declare a term that is used in the query evaluation as a structural element without being part of the query (for example punctuation used for match field summarization)
Parameter
set
identifier of the term set that is used to address the terms
type
feature type of the term
value
feature value of the term
Result
addSelectionFeature
Declare a feature set to be used as selecting feature
Parameter
set
identifier of the term set addressing the terms to use for selection
Result
addRestrictionFeature
Declare a feature set to be used as restriction
Parameter
set
identifier of the term set addressing the terms to use as restriction
Result
addExclusionFeature
Declare a feature set to be used as exclusion
Parameter
set
identifier of the term set addressing the terms to use as exclusion
Result
addSummarizer
Declare a summarizer
Parameter
name
the name of the summarizer to add
"matchphrase"
"matchpos"
"attribute"
parameter
the parameters of the summarizer to add (parameter name 'debug' reserved for declaring the debug info attribute)
[sentencesize=>40, windowsize=>100, cardinality=>5]
resultnames
(optional) the mapping of result names
Examples
addSummarizer( "attribute", [["name", "docid"], ["debug", "debug_attribute"]])
addSummarizer( "metadata", [["name", "cross"], ["debug", "debug_metadata"]])
Result
addWeightingFunction
Add a weighting function to use as summand of the total document weight
Parameter
name
the name of the weighting function to add
parameter
the parameters of the weighting function to add
[b=>0.75, k=>1.2, avgdoclen=>1000, match=>[feature=>"seek"], debug=>"debug_weighting"]
Examples
addWeightingFunction( "tf", [match=>[feature=>"seek"], debug=>"debug_weighting"])
Result
defineWeightingFormula
Define the weighting formula to use for calculating the total weight from the weighting function results (sum of the weighting function results is the default)
Parameter
source
source of the weighting formula
defaultParameter
(optional) default parameter values
Examples
defineWeightingFormula( "_0 / _1")
Result
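The relation between the default (sum) and a custom formula can be sketched in a few lines. The sketch assumes that _0 and _1 in the formula source denote the results of the first and the second weighting function added; this reading is an assumption and not stated above. The function total_weight and the lambda standing in for the parsed formula are hypothetical.

```python
def total_weight(function_results, formula=None):
    """Combine the weighting function results of one document.

    Without a formula the results are summed (the documented default).
    """
    if formula is None:
        return sum(function_results)
    return formula(*function_results)

# Python stand-in for the formula source "_0 / _1" (assumed argument meaning)
ratio = lambda _0, _1: _0 / _1

results = [3.0, 1.5]                 # results of two weighting functions
print(total_weight(results))         # 4.5
print(total_weight(results, ratio))  # 2.0
```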
createQuery
Create a query instance based on this query evaluation scheme
Parameter
storage
storage to execute the query on
Result
addFeature
Create a feature from the query expression passed
Parameter
set
name of the feature set this feature is addressed with
expr
query expression that defines the postings of the feature and the variables attached
["contains", 0, 1, ["word", "hello"], ["word", "world"]]
[from=>"title_start", to=>"title_end"]
weight
(optional) individual weight of the feature in the query
Remarks
Examples
addFeature( "select", ["contains", 0, 1, ["word", "hello"], ["word", "world"]])
addFeature( "titlefield", [from=>"title_start", to=>"title_end"])
Result
addMetaDataRestriction
Define a meta data restriction
Parameter
expression
meta data expression tree interpreted as CNF (conjunctive normal form, an "AND" of "OR"s)
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]
Notes
expression
leaves of the expression tree are 3-tuples of the form {operator,name,operand} with
operator: one of "=","!=",">=","<=","<",">"
name: name of meta data element
operand: numeric value to compare with the meta data field (right side of comparison operator)
if the tree has depth 1 (single node), then it is interpreted as single condition
if the tree has depth 2 (list of nodes), then it is interpreted as intersection "AND" of its leaves
an "OR" of conditions without "AND" is therefore expressed as a list of lists of structures:
'[[["<=","date","1.1.1970"], [">","weight",1.0]]]' <=> 'date <= "1.1.1970" OR weight > 1.0'
'[["<=","date","1.1.1970"], [">","weight",1.0]]' <=> 'date <= "1.1.1970" AND weight > 1.0'
Result
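The CNF interpretation described in the notes of addMetaDataRestriction can be made concrete with a small evaluator. This is a minimal sketch of the interpretation rules and not the engine's implementation. The functions is_leaf, to_cnf and matches are hypothetical names, and the sketch uses plain numeric operands so that Python comparisons apply directly.

```python
import operator

# Comparison operators listed in the notes
OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    "<": operator.lt, ">": operator.gt,
}

def is_leaf(node):
    """A leaf is a 3-tuple [operator, name, operand]."""
    return len(node) == 3 and isinstance(node[0], str) and node[0] in OPS

def to_cnf(expr):
    """Normalize an expression tree to a list of OR-groups that are ANDed."""
    if is_leaf(expr):
        return [[expr]]                  # depth 1: single condition
    # each element of the list is either one condition or a list of ORed conditions
    return [[node] if is_leaf(node) else list(node) for node in expr]

def matches(expr, metadata):
    """True if the metadata of a document fulfills the restriction."""
    return all(
        any(OPS[op](metadata[name], operand) for op, name, operand in group)
        for group in to_cnf(expr)
    )

expr = [[["=", "country", 12], ["=", "country", 17]], ["<", "year", 2007]]
print(matches(expr, {"country": 17, "year": 2001}))  # True
print(matches(expr, {"country": 17, "year": 2010}))  # False
```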
defineTermStatistics
Define term statistics to use for a term for weighting it in this query
Parameter
type
query term type name
stats
the structure with the statistics to set
Examples
defineTermStatistics( "word", "game", [df=>74653])
Result
defineGlobalStatistics
Define the global statistics to use for weighting in this query
Parameter
stats
the structure with the statistics to set
Result
addDocumentEvaluationSet
Define a set of documents the query is evaluated on. By default the query is evaluated on all documents in the storage
Parameter
docnolist
list of documents to evaluate the query on (array of positive integers)
[1, 23, 2345, 3565, 4676, 6456, 8855, 12203]
Result
setMaxNofRanks
Set the number of ranks to evaluate starting with the first rank (the maximum size of the result rank list)
Parameter
maxNofRanks
maximum number of results to return by this query
Result
setMinRank
Set the index of the first rank to be returned
Parameter
minRank
index of the first rank to be returned by this query
Result
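Together, setMinRank and setMaxNofRanks select a window of the ranked list, which is the usual way to page through results. The following minimal sketch assumes that minRank is a zero-based index; the function select_ranks and its default values are hypothetical.

```python
def select_ranks(ranked, min_rank=0, max_nof_ranks=20):
    """Return at most max_nof_ranks results, starting at index min_rank."""
    return ranked[min_rank:min_rank + max_nof_ranks]

ranked = ["doc%d" % i for i in range(1, 8)]   # doc1 .. doc7, best first
print(select_ranks(ranked, min_rank=0, max_nof_ranks=3))  # first page
print(select_ranks(ranked, min_rank=3, max_nof_ranks=3))  # second page
print(select_ranks(ranked, min_rank=6, max_nof_ranks=3))  # last, shorter page
```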
addAccess
Allow read access to documents having a specific ACL tag
Parameter
rolelist
ACL tag or list of ACL tags selecting the documents that are candidates for the result
["public", "devel", "sys"]
Notes
If no ACL tags are specified, then all documents are potential candidates for the result
Result
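The effect of the ACL tags on the candidate set can be sketched as a filter. The sketch assumes that a document becomes a candidate if it carries at least one of the tags passed; the function candidates and the dictionary mapping document numbers to tag sets are hypothetical.

```python
def candidates(doc_acl, roles):
    """Select the documents that are candidates for the result.

    doc_acl: dict mapping document number to its set of ACL tags (assumed layout).
    roles:   ACL tags collected with addAccess; empty means no restriction.
    """
    if not roles:
        # no ACL tags specified: all documents are potential candidates
        return sorted(doc_acl)
    allowed = set(roles)
    return sorted(docno for docno, tags in doc_acl.items() if tags & allowed)

doc_acl = {1: {"public"}, 2: {"devel"}, 3: {"sys", "devel"}, 4: {"private"}}
print(candidates(doc_acl, []))                   # [1, 2, 3, 4]
print(candidates(doc_acl, ["public", "devel"]))  # [1, 2, 3]
```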
setWeightingVariables
Assign values to variables of the weighting formula
Parameter
parameter
parameter values (map of variable names to floats)
Result
setDebugMode
Switch on debug mode that creates debug info of query evaluation methods and summarization as attributes of the query result
Parameter
debug
true if switched on, false if switched off (default off)
Notes
Debug attributes are specified in the declaration of summarizers and weighting functions (3rd parameter of QueryEval::addSummarizer and QueryEval::addWeightingFunction)
Result
evaluate
Evaluate this query and return the result
Parameter
Result
the result (strus::QueryResult)
tostring
Map the contents of the query to a readable string
Parameter
Result