Command line utility programs
This document lists the utility programs with a short description of each. Most of them are implemented in the project strusUtilities.
Languages used by utility programs
Some utility programs read source files written in a domain-specific language. These languages do not add functionality of their own: every construct maps to calls of the strus core and analyzer API, and all programs are loaded through calls of the program loader interface.
Document analyzer program
The grammar of the sources referred to as document analyzer programs by some utility programs is defined here (document analyzer program grammar).
Query analyzer program
The grammar of the sources referred to as query analyzer programs by some utility programs is defined here (query analyzer program grammar).
Query evaluation program
The grammar of the sources referred to as query evaluation programs by some utility programs is defined here (query evaluation program grammar).
Query language
The language used by utility programs for search queries is defined here (query language grammar).
List of utility programs
- strusCreate
Create a strus storage. (implemented in the project strusUtilities)
usage: strusCreate [options]
description: Creates a storage with its key value store database.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            compression=<yes/no>
            acl=<yes/no, yes if users with different access rights exist>
            metadata=<comma separated list of meta data def>
    -S|--configfile <FILENAME>
        Define the storage configuration file as <FILENAME>
        <FILENAME> is a file containing the configuration string
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
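The configuration string passed with -s is easy to get wrong when it is typed by hand, so the following minimal sketch assembles it from named values and prints a shell-quoted strusCreate call. It only builds the command and does not run it. The helper name storage_config, the path /tmp/strus-demo and the form of a meta data definition ("<name> <TYPE>", here "doclen UINT16") are assumptions made for illustration; the usage text above only states that metadata takes a comma separated list of definitions.

```python
import shlex


def storage_config(path, compression=True, acl=False, metadata=()):
    """Build a ';' separated configuration string for the -s option."""
    yes_no = {True: "yes", False: "no"}
    parts = [
        f"path={path}",
        f"compression={yes_no[compression]}",
        f"acl={yes_no[acl]}",
    ]
    if metadata:
        # Assumption: one definition is written as "<name> <TYPE>".
        definitions = ",".join(f"{name} {kind}" for name, kind in metadata)
        parts.append("metadata=" + definitions)
    return ";".join(parts)


# Hypothetical storage path and meta data element, for illustration only.
config = storage_config("/tmp/strus-demo", metadata=[("doclen", "UINT16")])

# shlex.join quotes the configuration so that ';' and spaces survive the shell.
command = shlex.join(["strusCreate", "-s", config])
print(command)
```

The same string can be written to a file and passed with -S instead of -s.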
- strusDestroy
Remove a strus storage and all its files. (implemented in the project strusUtilities)
usage: strusDestroy [options]
description: Removes an existing storage database.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
    -S|--configfile <FILENAME>
        Define the storage configuration file as <FILENAME>
        <FILENAME> is a file containing the configuration string
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusInspect
Inspect elements of items inserted in a strus storage. (implemented in the project strusUtilities)
usage: strusInspect [options] <what...>
<what> : what to inspect:
    "pos" <type> <value> [<doc-id/no>]
        = Get the list of positions for a search index term.
          If document is not specified then dump value for all docs.
    "ff" <type> <value> [<doc-id/no>]
        = Get the feature frequency for a search index feature.
          If document is not specified then dump value for all docs.
    "df" <type> <value>
        = Get the document frequency for a search index feature
    "ttf" <type> [<doc-id/no>]
        = Get the term type frequency in a document.
          If document is not specified then dump value for all docs.
    "ttc" <type> [<doc-id/no>]
        = Get the term type count (distinct) in a document.
          If document is not specified then dump value for all docs.
    "featuretypes"
        = Get list of feature types in the index
    "indexterms" <type> [<doc-id/no>]
        = Get the list of tuples of term value, first position and ff
          for a search index term type.
          If document is not specified then dump value for all docs.
    "nofdocs"
        = Get the number of documents in the storage
    "maxdocno"
        = Get the maximum document number allocated in the storage
    "metadata" <name> [<doc-id/no>]
        = Get the value of a meta data element.
          If document is not specified then dump value for all docs.
    "metatable"
        = Get the schema of the meta data table
    "attribute" <name> [<doc-id/no>]
        = Get the value of a document attribute.
          If document is not specified then dump value for all docs.
    "attrnames"
        = Get the list of all attribute names defined for the storage
    "content" <type> [<doc-id/no>]
        = Get the content of the forward index for a type.
          If document is not specified then dump content for all docs.
    "fwstats" <type> [<doc-id/no>]
        = Get the statistics of the forward index for a type.
          If document is not specified then dump value for all docs.
    "fwmap" <type> [<doc-id/no>]
        = Print a map docno to forward index element for a type.
          If document is not specified then dump value for all docs.
    "token" <type> <doc-id/no>
        = Get the list of terms in the forward index for a type
    "docno" <docid>
        = Get the internal document number for a document id
    "config"
        = Get the configuration the storage was created with
description: Inspect some data in the storage.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
    -A|--attribute <NAME>
        Print attribute with name <NAME> for lists of results instead of docno
- strusAnalyze
Dump the result of the document analysis without feeding the storage. This program can be used to check the result of the document analysis. (implemented in the project strusUtilities)
usage: strusAnalyze [options] <program> <document>
<program> = path of analyzer program
<document> = path of document to analyze ('-' for stdin)
description: Analyzes a document and dumps the result to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -g|--segmenter <NAME>
        Use the document segmenter with name <NAME>
    -C|--contenttype <CT>
        forced definition of the document class of the document analyzed.
    -D|--dump <DUMPCFG>
        Dump output according to <DUMPCFG>.
        <DUMPCFG> is a comma separated list of types or type value assignments.
        A type in <DUMPCFG> specifies the type to dump.
        A value is an optional replacement of the term value.
        This kind of output is suitable for content analysis.
- strusAnalyzePhrase
Call the query analyzer with a phrase to analyze. This program can also be used to check details of the document analyzer as it tokenizes and normalizes a text segment with the tokenizer and normalizer specified. (implemented in the project strusUtilities)
usage: strusAnalyzePhrase [options] <phrase>
<phrase> = phrase to analyze (file name or '-' for stdin if option -F is specified)
description: Tokenizes and normalizes a text segment and prints the result to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -t|--tokenizer <CALL>
        Use the tokenizer <CALL> (default 'content')
    -n|--normalizer <CALL>
        Use the normalizer <CALL> (default 'orig')
    -q|--quot <STR>
        Use the string <STR> as quote for the result (default "'")
    -P|--plain
        Do not print position and define default quotes as empty
    -F|--fileinput
        Interpret phrase argument as a file name containing the input
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusAnalyzeQuery
Call the query analyzer with a query to analyze. (implemented in the project strusUtilities)
usage: strusAnalyzeQuery [options] <program> <query>
<program> = path of analyzer program
<query> = query content to analyze (file name or '-' for stdin if option -F is specified)
description: Analyzes a query and dumps the result to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -F|--fileinput
        Interpret query argument as a file name containing the input
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusSegment
Call the segmenter with a document and one or more expressions to extract with the segmenter. Dump the resulting segments to stdout. (implemented in the project strusUtilities)
usage: strusSegment [options] <document>
<document> = path to document to segment ('-' for stdin)
description: Segments a document with the expressions (-e) specified and dumps the resulting segments to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -g|--segmenter <NAME>
        Use the document segmenter with name <NAME> (default textwolf XML)
    -C|--contenttype <CT>
        forced definition of the document class of the document processed.
    -e|--expression <EXPR>
        Use the expression <EXPR> to select documents (default '//()')
    -i|--index
        Print the indices of the expressions matching as prefix with ':'
    -p|--position
        Print the positions of the expressions matching as prefix
    -q|--quot <STR>
        Use the string <STR> as quote for the result (default "'")
    -P|--prefix <STR>
        Use the string <STR> as prefix for the result
    -E|--esceol
        Escape end of line with space
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusPatternMatcher
Processes documents with a pattern matcher and outputs all matches found. (implemented in the project strusUtilities)
usage: strusPatternMatcher [options] <inputpath>
<inputpath> : input file or directory to process
description: Runs pattern matching on the input documents and dumps the result to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
        The module modstrus_analyzer_pattern is implicitly defined
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -K|--tokens
        Print the tokenization used for pattern matching too
    -t|--threads <N>
        Set <N> as number of inserter threads to use
    -x|--ext <FILEEXT>
        Do only process files with extension <FILEEXT>
    -C|--contenttype <CT>
        forced definition of the document class of all documents processed.
    -e|--expression <EXP>
        Define a selection expression <EXP> for the content to process
        (default if nothing specified is "//()")
    -H|--markup <NAME>
        Output the content with markups of the rules or variables with name <NAME>
    -Z|--marker <MRK>
        Define a character sequence inserted before every result declaration
    -X|--lexer <LX>
        Use pattern lexer named <LX>
        Default is 'std'
    -Y|--matcher <PT>
        Use pattern matcher named <PT>
        Default is 'std'
    -p|--program <PRG>
        Load program <PRG> with patterns to process
    -o|--output <FILE>
        Write output to file <FILE> (thread id is inserted before '.' with threads)
    -g|--segmenter <NAME>
        Use the document segmenter with name <NAME>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusPatternSerialize
Loads a pattern match program and outputs it in a serialized form that can be loaded by the analyzer. (implemented in the project strusUtilities)
usage: strusPatternSerialize [options] <program>
description: Loads a pattern matcher program source in file <program> and outputs its serialization.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
        The module modstrus_analyzer_pattern is implicitly defined
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -F|--feeder
        Assume program with feeder (post analyzer processing)
    -o|--output <FILE>
        Write output to file <FILE>. Do text output to stdout if not specified.
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusInsert
Insert a document or all files in a directory or in any descendant directory of it. (implemented in the project strusUtilities)
usage: strusInsert [options] <program> <docpath>
<program> = path of analyzer program or analyzer map program
<docpath> = path of document or directory to insert
description: Insert a document or a set of documents into a storage.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -S|--configfile <FILENAME>
        Define the storage configuration file as <FILENAME>
        <FILENAME> is a file containing the configuration string
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -g|--segmenter <NAME>
        Use the document segmenter with name <NAME>
    -C|--contenttype <CT>
        forced definition of the document class of all documents inserted.
    -x|--extension <EXT>
        Grab only the files with extension <EXT> (default all files)
    -t|--threads <N>
        Set <N> as number of inserter threads to use
    -c|--commit <N>
        Set <N> as number of documents inserted per transaction (default 1000)
    -f|--fetch <N>
        Set <N> as number of files fetched in each inserter iteration
        Default is the value of option '--commit' (one document/file)
    -L|--logerror <FILE>
        Write the last error occurred to <FILE> in case of an exception
    -V|--verbose
        verbose output
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
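When strusInsert is driven from a script, it helps to keep the argument list as data rather than as one long shell string. The sketch below builds such a list from the options described above (-s, -t, -c, -x) followed by the two positional arguments. The function name insert_argv, the file names analyzer.prg and docs/ and the storage path are hypothetical, and whether <EXT> is written with or without a leading dot is an assumption not settled by the usage text.

```python
def insert_argv(program, docpath, storage_path, threads=None, commit=None, extension=None):
    """Return the argument list for a strusInsert call (nothing is executed)."""
    argv = ["strusInsert", "-s", f"path={storage_path}"]
    if threads is not None:
        argv += ["-t", str(threads)]      # number of inserter threads
    if commit is not None:
        argv += ["-c", str(commit)]       # documents per transaction
    if extension is not None:
        argv += ["-x", extension]         # assumption: extension given without a dot
    # Positional arguments come last: <program> <docpath>
    return argv + [program, docpath]


# Hypothetical file names, for illustration only.
argv = insert_argv("analyzer.prg", "docs/", "/tmp/strus-demo", threads=4, commit=500, extension="xml")
print(argv)
```

A list in this form can be handed to subprocess.run without further quoting once the programs are installed.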
- strusDeleteDocument
Deletes a list of documents referenced by document identifiers. (implemented in the project strusUtilities)
usage: strusDeleteDocument [options] <docid>
<docid> = docid of the document to delete
description: Deletes a document in the storage.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusUpdateStorage
This program updates attributes, meta data and user access rights in a storage from a batch file. (implemented in the project strusUtilities)
usage: strusUpdateStorage [options] <updatefile>
<updatefile> = file with the batch of updates ('-' for stdin)
description: Executes a batch of updates of attributes, meta data or user rights in a storage.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -a|--attribute <NAME>
        The update batch is a list of attribute assignments.
        The name of the updated attribute is <NAME>.
    -m|--metadata <NAME>
        The update batch is a list of meta data assignments.
        The name of the updated meta data element is <NAME>.
    -u|--useraccess
        The update batch is a list of user right assignments.
    -x|--mapattribute <ATTR>
        The update document is selected by the attribute <ATTR> as key,
        instead of the document id or document number.
    -c|--commit <N>
        Set <N> as number of updates per transaction (default 10000)
        If <N> is set to 0 then only one commit is done at the end
    -L|--logerror <FILE>
        Write the last error occurred to <FILE> in case of an exception
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusCheckStorage
This program checks a strus storage for corrupt data. (implemented in the project strusUtilities)
usage: strusCheckStorage [options]
description: Checks a storage for corrupt data.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -e|--exists
        Checks if the database of the storage exists and return 'yes'/'no'
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the commands on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusCheckInsert
Processes the documents the same way as strusInsert. But instead of inserting the documents, it checks if the document representation in the storage is complete compared with the checked documents. (implemented in the project strusUtilities)
usage: strusCheckInsert [options] <program> <docpath>
<program> = path of analyzer program or analyzer map program
<docpath> = path of document or directory to check
description: Checks if a storage contains all data of a document set.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -S|--configfile <FILENAME>
        Define the storage configuration file as <FILENAME>
        <FILENAME> is a file containing the configuration string
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -g|--segmenter <NAME>
        Use the document segmenter with name <NAME>
    -C|--contenttype <CT>
        forced definition of the document class of all documents checked.
    -x|--extension <EXT>
        Grab only the files with extension <EXT> (default all files)
    -t|--threads <N>
        Set <N> as number of inserter threads to use
    -l|--logfile <FILE>
        Set <FILE> as output file (default stdout)
    -n|--notify <N>
        Set <N> as notification interval (number of documents)
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusQuery
Evaluate a query from the command line. (implemented in the project strusUtilities)
usage: strusQuery [options] <anprg> <qeprg> <query>
<anprg> = path of query analyzer program
<qeprg> = path of query eval program
<query> = query string (file name or '-' for stdin if option -F is specified)
description: Executes a query or a list of queries from a file.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -S|--configfile <FILENAME>
        Define the storage configuration file as <FILENAME>
        <FILENAME> is a file containing the configuration string
    -u|--user <NAME>
        Use user name <NAME> for the query
    -N|--nofranks <N>
        Return maximum <N> ranks as query result
    -I|--firstrank <N>
        Return the result starting with rank <N> as first rank
    -Q|--quiet
        No output of results
    -D|--time
        Do print duration of pure query evaluation
    -G|--debug
        Switch debug info of weighting and summarization on
    -F|--fileinput
        Interpret query argument as a file name containing the input
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
    -V|--verbose
        Verbose mode: Print some info like query analysis
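The options -N and -I together allow a result list to be fetched page by page. As a rough sketch of the arithmetic, the helper below turns a page index and a page size into the two option values. It assumes that ranks are counted from 0, which the usage text does not state; if ranks start at 1, the first rank has to be shifted accordingly. The names paging_options, query_analyzer.prg and query_eval.prg are hypothetical.

```python
def paging_options(page, page_size):
    """Map a zero-based page index to the -I and -N options of strusQuery."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    # Assumption: the first rank of the result list has index 0.
    first_rank = page * page_size
    return ["-I", str(first_rank), "-N", str(page_size)]


# Third page (index 2) with 10 results per page; file names are placeholders.
argv = [
    "strusQuery",
    "-s", "path=/tmp/strus-demo",
    *paging_options(2, 10),
    "query_analyzer.prg",
    "query_eval.prg",
    "hello world",
]
print(argv)
```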
- strusAlterMetaData
Alter the table structure for document metadata of a storage. (implemented in the project strusUtilities)
usage: strusAlterMetaData [options] <config> <cmds>
<config> : configuration string of the storage
    semicolon ';' separated list of assignments:
        path=<LevelDB storage path>
        create=<yes/no, yes=do create if database does not exist yet>
        cache=<size of LRU cache for LevelDB>
        compression=<yes/no>
        max_open_files=<maximum number of open files for LevelDB>
        write_buffer_size=<Amount of data to build up in memory per file>
        block_size=<approximate size of user data packed per block>
        cachedterms=<file with list of terms to cache>
<cmds> : semicolon separated list of commands:
    alter <name> <newname> <newtype>
        <name>    :name of the element to change
        <newname> :new name of the element
        <newtype> :new type (*) of the element
    add <name> <type>
        <name>    :name of the element to add
        <type>    :type (*) of the element to add
    delete <name>
        <name>    :name of the element to remove
    rename <name> <newname>
        <name>    :name of the element to rename
        <newname> :new name of the element
    clear <name>
        <name>    :name of the element to clear all values
(*) :type of an element is one of the following:
    INT8     :one byte signed integer value
    UINT8    :one byte unsigned integer value
    INT16    :two bytes signed integer value
    UINT16   :two bytes unsigned integer value
    INT32    :four bytes signed integer value
    UINT32   :four bytes unsigned integer value
    FLOAT16  :two bytes floating point value (IEEE 754 small)
    FLOAT32  :four bytes floating point value (IEEE 754 single)
description: Executes a list of commands that alter the meta data table.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
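The <cmds> argument of strusAlterMetaData is a small command language of its own, so a typo in a type name is only noticed when the program runs. The sketch below composes the semicolon separated command list from helper functions and checks every type against the eight element types listed in the usage text. The helper names and the element names date, doclen, length and rank are made up for illustration; the command does not get executed.

```python
# Element types accepted by strusAlterMetaData according to its usage text.
METADATA_TYPES = frozenset(
    {"INT8", "UINT8", "INT16", "UINT16", "INT32", "UINT32", "FLOAT16", "FLOAT32"}
)


def _checked(kind):
    """Return the type name unchanged or raise if it is not a known type."""
    if kind not in METADATA_TYPES:
        raise ValueError(f"unknown meta data type: {kind}")
    return kind


def add(name, kind):
    return f"add {name} {_checked(kind)}"


def alter(name, newname, newtype):
    return f"alter {name} {newname} {_checked(newtype)}"


def rename(name, newname):
    return f"rename {name} {newname}"


def delete(name):
    return f"delete {name}"


def clear(name):
    return f"clear {name}"


# Hypothetical element names; the commands are joined with ';' as <cmds> requires.
cmds = ";".join([add("date", "UINT32"), rename("doclen", "length"), clear("rank")])
argv = ["strusAlterMetaData", "path=/tmp/strus-demo", cmds]
print(argv)
```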
- strusGenerateKeyMap
Dumps a list of terms as result of document analysis of a file or directory. The dump can be loaded by the storage on startup to create a map of frequently used terms. (implemented in the project strusUtilities)
usage: strusGenerateKeyMap [options] <program> <docpath>
<program> = path of analyzer program or analyzer map program
<docpath> = path of document or directory to insert
description: Dumps a list of terms as result of document analysis of a file or directory. The dump can be loaded by the storage on startup to create a map of frequently used terms.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -R|--resourcedir <DIR>
        Search resource files for analyzer first in <DIR>
    -s|--segmenter <NAME>
        Use the document segmenter with name <NAME>
    -C|--contenttype <CT>
        forced definition of the document class of all documents processed.
    -x|--extension <EXT>
        Grab only the files with extension <EXT> (default all files)
    -t|--threads <N>
        Set <N> as number of threads to use
    -u|--unit <N>
        Set <N> as number of files processed per iteration (default 1000)
    -n|--results <N>
        Set <N> as number of elements in the key map generated
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusDumpStatistics
Dumps the statistics that would be populated in case of a distributed index to a file. (implemented in the project strusUtilities)
usage: strusDumpStatistics [options] <filename>
description: Dumps the statistics that would be populated in case of a distributed index to a file.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusDumpStorage
Dumps the content of a strus storage to stdout. (implemented in the project strusUtilities)
usage: strusDumpStorage [options]
description: Dumps a strus storage to stdout.
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    --license
        Print 3rd party licences requiring reference
    -s|--storage <CONFIG>
        Define the storage configuration string as <CONFIG>
        <CONFIG> is a semicolon ';' separated list of assignments:
            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>
    -m|--module <MOD>
        Load components from module <MOD>
    -M|--moduledir <DIR>
        Search modules to load first in <DIR>
    -r|--rpc <ADDR>
        Execute the command on the RPC server specified by <ADDR>
    -B|--blocksizes
        Dump only block sizes
    -P|--prefix <KEY>
        Dump only the blocks of a certain type with prefix <KEY>
    -T|--trace <CONFIG>
        Print method call traces configured with <CONFIG>
        Example: -T "log=dump;file=stdout"
- strusResizeBlocks
Resize the blocks for a storage based on leveldb (leveldb only!). (implemented in the project strus)
usage: strusResizeBlocks [options] <config> <blocktype> <newsize>
options:
    -h|--help
        Print this usage and do nothing else
    -v|--version
        Print the program version and do nothing else
    -c|--commit <N>
        Set <N> as number of documents inserted per transaction (default 1000)
    -D|--docno <START>:<END>
        Process document number range <START> to <END>
    -T|--termtype <TYPE>
        Set <TYPE> as term type to select for resize
<config> : configuration string of the key/value store database
<blocktype> : storage block type. One of the following:
    forwardindex :forward index block type
<newsize> : new size of the blocks, unit depends on block type.
- strusCreateVectorStorage
Creates a storage for vectors.
(implemented in the project strusUtilities).
usage: strusCreateVectorStorage [options]
description: Creates a vector storage with all vectors inserted.
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  --license
      Print 3rd party licences requiring reference
  -m|--module <MOD>
      Load components from module <MOD>.
      The module modstrus_storage_vector is implicitly defined
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -s|--config <CONFIG>
      Define the vector storage configuration string as <CONFIG>
      <CONFIG> is a semicolon ';' separated list of assignments:
      Select the vector storage type with the parameter 'storage'.
  -S|--configfile <FILENAME>
      Define the vector storage configuration file as <FILENAME>
      <FILENAME> is a file containing the configuration string
  -P|--portable
      Tell the loader that the vector values are stored in a portable way (hton)
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
      Example: -T "log=dump;file=stdout"
  -f|--file <INFILE>
      Declare an input file with the vectors to process as <INFILE>
      Known formats are word2vec binary or text format.
      All files are added, if there are many input files specified.
      No input files lead to an empty storage.
- strusBuildVectorStorage
Build relations describing some structures of the vectors inserted into a storage.
(implemented in the project strusUtilities).
usage: strusBuildVectorStorage [options] { <commands> }
description: Executes a list of vector builder commands.
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  --license
      Print 3rd party licences requiring reference
  -m|--module <MOD>
      Load components from module <MOD>.
      The module modstrus_storage_vector is implicitly defined
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -s|--config <CONFIG>
      Define the vector space model configuration string as <CONFIG>
      <CONFIG> is a semicolon ';' separated list of assignments:
      Select the vector storage type with the parameter 'storage'.
  -S|--configfile <FILENAME>
      Define the vector space model configuration file as <FILENAME>
      <FILENAME> is a file containing the configuration string
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
      Example: -T "log=dump;file=stdout"
  -t|--threads <N>
      Specify the maximum number of threads to use as <N> (default 16)
- strusInspectVectorStorage
Program to introspect a vector storage. Query for near vectors, for relations between vectors, etc.
(implemented in the project strusUtilities).
usage: strusInspectVectorStorage [options] <what...>
<what> : what to inspect:
  "classnames"
      = Return all names of concept classes of the model.
  "featcon" <classname> { <feat> }
      = Take a single or list of feature numbers (with '%' prefix) or names as input.
        Return a sorted list of indices of concepts of the class <classname> assigned to it.
  "featvec" <feat>
      = Take a single feature number (with '%' prefix) or name as input.
        Return the vector assigned to it.
  "featname" { <feat> }
      = Take a single or list of feature numbers as input.
        Return the list of names assigned to it.
  "featidx" { <featname> }
      = Take a single or list of feature names as input.
        Return the list of indices assigned to it.
  "featsim" <feat1> <feat2>
      = Take two feature numbers (with '%' prefix) or names as input.
        Return the cosine similarity, a value between 0.0 and 1.0.
  "confeat" or "confeatidx" "confeatname" <classname> { <conceptno> }
      = Take a single or list of concept numbers of the class <classname> as input.
        Return a sorted list of features assigned to it.
        "confeatidx" prints only the result feature indices.
        "confeatname" prints only the result feature names.
        "confeat" prints both indices and names.
  "nbfeat" or "nbfeatidx" "nbfeatname" <classname> { <feat> }
      = Take a single or list of feature numbers (with '%' prefix) or names as input.
        Return a list of features reachable over any shared concept of the class <classname>.
        "nbfeatidx" prints only the result feature indices.
        "nbfeatname" prints only the result feature names.
        "nbfeat" prints both indices and names.
  "opfeat" or "opfeatname" { <expr> }
      = Take an arithmetic expression of feature numbers (with '%' prefix) or names as input.
        Return a list of features found.
  "opfeatw" or "opfeatwname" { <expr> }
      = Same as "opfeat", resp. "opfeatname", but print also the result weights.
  "nofcon"
      = Get the number of concepts of the class <classname> defined.
  "noffeat"
      = Get the number of features defined.
  "config"
      = Get the configuration of the vector storage.
        Select the vector storage type with the parameter 'storage'.
  "dump" [ <dbprefix> ]
      = Dump the contents of the VSM repository.
        The optional parameter <dbprefix> selects a specific block type.
description: Inspects some data defined in a vector space model build.
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  --license
      Print 3rd party licences requiring reference
  -m|--module <MOD>
      Load components from module <MOD>.
      The module modstrus_storage_vector is implicitly defined
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -s|--config <CONFIG>
      Define the vector space model configuration string as <CONFIG>
      <CONFIG> is a semicolon ';' separated list of assignments:
  -S|--configfile <FILENAME>
      Define the vector space model configuration file as <FILENAME>
      <FILENAME> is a file containing the configuration string
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
      Example: -T "log=dump;file=stdout"
  -D|--time
      Do measure duration of operation (only for search)
  -t|--threads <N>
      Set <N> as number of threads to use (only for search)
      Default is no multithreading (N=0)
  -x|--realmeasure
      Calculate real values of similarities for search and compare of methods 'opfeat','opfeatname','opfeatw' and 'opfeatwname'.
  -N|--nofranks <N>
      Limit the number of results for searches to <N> (default 20)
- strusRpcServer
Start a server processing requests from strus RPC clients
(implemented in the project strusRpc).
strusRpcServer [options]
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  -m|--module <MOD>
      Load components from module <MOD>
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -R|--resourcedir <DIR>
      Define a resource path <DIR> for the analyzer
  -p|--port <PORT>
      Define the port to listen for requests as <PORT> (default 7181)
  -s|--storage <CONFIG>
      Define configuration <CONFIG> of storage hosted by this server
  -S|--configfile <CFGFILE>
      Define storage configuration as content of file <CFGFILE>
  -x|--vecstorage <CONFIG>
      Define configuration <CONFIG> of the vector storage hosted by this server
  -c|--create <CONFIG>
      Implicitly create storage with <CONFIG> if it does not exist yet
  -l|--logfile <FILE>
      Write logs to file <FILE>
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
- strusHelp
Program to print descriptions of functions available to console.
(implemented in the project strusUtilities).
usage: strusHelp [options] <what> <name>
<what> = specifies what type of item to retrieve (default all):
    tokenizer  : Get tokenizer function description
    normalizer : Get normalizer function description
    aggregator : Get aggregator function description
    join       : Get iterator join operator description
    weighting  : Get weighting function description
    summarizer : Get summarizer function description
<name> = name of the item to retrieve (default all)
description: Get the description of a function.
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  --license
      Print 3rd party licences requiring reference
  -m|--module <MOD>
      Load components from module <MOD>
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -R|--resourcedir <DIR>
      Search resource files for analyzer first in <DIR>
  -r|--rpc <ADDR>
      Execute the command on the RPC server specified by <ADDR>
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
      Example: -T "log=dump;file=stdout"
- strusModuleInfo
Program to print some information about a module in the module header (version, identifiers, etc.).
(implemented in the project strusModule).
strusModuleInfo { <modulepath> }
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
<modulepath> : path of module to load.
- strusPageWeight
Calculate the weight of a page derived from its linkage to other pages. The linkage info is fed in a proprietary text format as input. If strusVector has been built with WITH_PAGERANK="YES" then the value calculated will be the pagerank value (invented by Larry Page and patented in the USA as https://www.google.com/patents/US6285999). If strusVector has been built without page rank support or taken from a standard strus package then the calculated value will be derived from the number of links pointing to a document (non transitive).
(implemented in the project strusVector).
usage: strusPageWeight [options] <inputfile>
description: Calculate the weight of a page derived from the linkage of documents with others, with help of the pagerank algorithm.
<inputfile> : text file to process, lines with the following syntax:
    DECLARATION = "*" ITEMID "=" ["->"] { ITEMID } ";"
    ITEMID : document identifier (unicode alpha characters without space)
  Each declaration describes the links of a document (left side) to other documents (right side).
  Redirects are marked with an arrow (->).
options:
  -h        : print this usage
  -V        : verbose output, print all declarations to stdout.
  -g        : logarithmic scale page rank calculation.
  -n <NORM> : normalize result to an integer between 0 and <NORM>.
  -r <PATH> : specify file <PATH> to write redirect definitions to.
  -t <PATH> : specify file <PATH> to write the tokens to.
  -i <ITER> : specify number of iterations to run as <ITER>.
<inputfile> = input file path or '-' for stdin
    file with lines of the form "*" SOURCEID = [->] {<TARGETID>} ";"
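To make the input format and the non-pagerank fallback more concrete, the following Python sketch parses declarations of the form given above and counts the links pointing to each document. It is an illustration under several assumptions and not the strusPageWeight implementation: tokens are assumed to be separated by whitespace, redirect declarations are assumed not to count as links, and the scaling used for the -n style normalization (relative to the largest count) is a guess. The function names and the sample document identifiers are hypothetical.

```python
import re
from collections import Counter

# "*" ITEMID "=" ["->"] { ITEMID } ";"
DECLARATION = re.compile(r"\*\s*([^\s=;]+)\s*=\s*(->)?\s*([^;]*?)\s*;")


def parse_declarations(text):
    """Yield (source, is_redirect, targets) for every declaration in text."""
    for match in DECLARATION.finditer(text):
        source, arrow, targets = match.groups()
        yield source, arrow is not None, targets.split()


def inbound_link_counts(declarations):
    """Count, for each document, the number of links pointing to it."""
    counts = Counter()
    for source, is_redirect, targets in declarations:
        counts[source] += 0  # make sure every declared document is listed
        if is_redirect:
            continue  # assumption: redirects are not counted as links
        for target in targets:
            counts[target] += 1
    return counts


def normalize(counts, norm):
    """Scale counts to integers between 0 and norm (assumed scaling)."""
    top = max(counts.values(), default=0)
    if top == 0:
        return {doc: 0 for doc in counts}
    return {doc: round(norm * value / top) for doc, value in counts.items()}


sample = """
* Home = About News ;
* About = Home ;
* News = Home About ;
* Old = -> Home ;
"""
counts = inbound_link_counts(parse_declarations(sample))
print(normalize(counts, 100))
```

The count is non-transitive: a link from a heavily linked page weighs the same as a link from any other page, which is the difference to pagerank that the description above points out.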
- strusUpdateStorageCalcStatistics
Program to calculate a formula for each document in the storages and update a metadata field with the result.
(implemented in the project strusUtilities).
usage: strusUpdateStorageCalcStatistics [options] <metadata> <feattype> <formula> <sumnorm>
<metadata> = meta data element to store the result
<feattype> = search index feature type to calculate the result with
<formula>  = meta formula to calculate one summand of the result for one document with
<sumnorm>  = formula to normalize the sum of summands for each document (identity is default)
description: Calculate a formula for each document in the storages and update a metadata field with the result.
options:
  -h|--help
      Print this usage and do nothing else
  -v|--version
      Print the program version and do nothing else
  --license
      Print 3rd party licences requiring reference
  -m|--module <MOD>
      Load components from module <MOD>
  -M|--moduledir <DIR>
      Search modules to load first in <DIR>
  -r|--rpc <ADDR>
      Execute the command on the RPC server specified by <ADDR>
  -s|--storage <CONFIG>
      Define a storage configuration string as <CONFIG>
      <CONFIG> is a semicolon ';' separated list of assignments:
        path=<LevelDB storage path>
        create=<yes/no, yes=do create if database does not exist yet>
        cache=<size of LRU cache for LevelDB>
        compression=<yes/no>
        max_open_files=<maximum number of open files for LevelDB>
        write_buffer_size=<Amount of data to build up in memory per file>
        block_size=<approximate size of user data packed per block>
        cachedterms=<file with list of terms to cache>
  -c|--commit <N>
      Set <N> as number of updates per transaction (default 10000)
      If <N> is set to 0 then only one commit is done at the end
  -T|--trace <CONFIG>
      Print method call traces configured with <CONFIG>
      Example: -T "log=dump;file=stdout"
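The -c|--commit option above groups updates into transactions. The following Python sketch shows one way to read that rule: commit after every <N> updates, or only once at the end when <N> is 0. It is an illustration, not the program's code. The function name commit_points is hypothetical, and the final commit of a partially filled last batch is an assumption.

```python
def commit_points(nof_updates, commit_size):
    """Return the update counts after which a commit would take place."""
    if nof_updates <= 0:
        return []
    if commit_size <= 0:
        # <N> = 0: a single commit at the end.
        return [nof_updates]
    points = list(range(commit_size, nof_updates + 1, commit_size))
    if not points or points[-1] != nof_updates:
        # Assumption: remaining updates are committed at the end.
        points.append(nof_updates)
    return points


print(commit_points(25000, 10000))  # [10000, 20000, 25000]
print(commit_points(25000, 0))      # [25000]
```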