hdfsconffs=hadoop uri (sets fs.default.name)
See also Hadoop.
Where to log
logdir=...
Whether to enable downloading
downloader=true
Whether to use authentication
authenticate=true
The admin user has username/password admin/admin, and the regular user has user/user. The admin has access to the control panel and configuration, and sees more search results than the regular user, including results on indexing time usage and error messages from the indexing. If no authentication is used, everything is open. In both cases, access from localhost determines admin rights.
mltcount=...
mltmindf=...
mltmintf=...
These set the More Like This search document count, minimum document frequency and minimum term frequency. The settings may influence how long More Like This queries take.
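As a rough illustration of what the two minimum thresholds do, the sketch below filters candidate terms by term frequency in the source document and by document frequency in the index. It is not the search engine's actual implementation; the function name `select_query_terms` and all sample data are hypothetical, and the mapping of `min_tf` to mltmintf and `min_df` to mltmindf is assumed from the setting names.

```python
from collections import Counter


def select_query_terms(doc_tokens, doc_freq, min_tf, min_df):
    """Keep terms that are frequent enough in the document and in the index.

    doc_tokens: tokens of the source document (hypothetical sample data)
    doc_freq:   term -> number of indexed documents containing the term
    min_tf:     assumed to correspond to mltmintf
    min_df:     assumed to correspond to mltmindf
    """
    term_freq = Counter(doc_tokens)
    return sorted(
        term
        for term, count in term_freq.items()
        if count >= min_tf and doc_freq.get(term, 0) >= min_df
    )


# Hypothetical sample document and index statistics.
tokens = ["solr", "index", "solr", "mahout", "index", "solr"]
doc_freq = {"solr": 40, "index": 3, "mahout": 12}

strict_terms = select_query_terms(tokens, doc_freq, min_tf=2, min_df=5)
loose_terms = select_query_terms(tokens, doc_freq, min_tf=1, min_df=1)
```

Higher thresholds leave fewer candidate terms, which is one plausible reason these settings can affect query time.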
There are two types of classification, Mahout and OpenNLP.
Mahout relevant settings:
For the older MapReduce based:
classify=mahout
mahoutbasepath=.../mahoutc/LANG
mahoutalgorithm=bayes (or cbayes)
mahaoutmodelpath=.../mahout/model
mahoutlabelindexfilepath=.../mahout/labelindex
mahoutdictionarypath=.../mahout/dataset-vectors/dictionary.file-0
mahoutdocumentfrequencypath=.../mahout/dataset-vectors/df-count/part-r-00000
mahaoutconffs=hadoop uri (sets fs.default.name in this specific case)
For the Spark based:
classify=mahoutspark
mahoutbasepath=.../mahoutc/LANG
mahoutalgorithm=bayes (or cbayes)
mahaoutmodelpath=.../mahout/model
mahoutdictionarypath=.../mahout/dataset-vectors/dictionary.file-0
mahoutdocumentfrequencypath=.../mahout/dataset-vectors/df-count/part-r-00000
mahaoutconffs=hadoop uri (sets fs.default.name in this specific case)
mahoutsparkmaster=spark-master
LANG is replaced by each of the configured detected languages, so the corresponding files and directories must exist for every language. If mahoutbasepath is set, it is simply prepended to the other paths, which are then treated as relative paths. For more about Spark, see Spark.
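The substitution and prepending described above can be sketched as follows. This is a minimal illustration under assumptions, not the application's own code: the function name `resolve_mahout_paths`, the `/data` prefix and the language code `en` are hypothetical, and the list of path keys is taken from the settings shown in this section.

```python
# Settings from this section that hold paths (mahoutbasepath handled separately).
PATH_KEYS = (
    "mahaoutmodelpath",
    "mahoutlabelindexfilepath",
    "mahoutdictionarypath",
    "mahoutdocumentfrequencypath",
)


def resolve_mahout_paths(config, language):
    """Replace LANG with the language code and prepend mahoutbasepath if set."""
    base = config.get("mahoutbasepath", "").replace("LANG", language)
    resolved = {}
    for key in PATH_KEYS:
        if key not in config:
            continue
        path = config[key].replace("LANG", language)
        if base:
            # Plain prepend: the configured path is treated as relative to base.
            path = base.rstrip("/") + "/" + path.lstrip("/")
        resolved[key] = path
    return resolved


# Hypothetical configuration values, for illustration only.
example = {
    "mahoutbasepath": "/data/mahoutc/LANG",
    "mahaoutmodelpath": "model",
    "mahoutdictionarypath": "dataset-vectors/dictionary.file-0",
}
paths = resolve_mahout_paths(example, "en")
no_base = resolve_mahout_paths({"mahaoutmodelpath": "/data/LANG/model"}, "en")
```

With one call per configured language, each resulting path can be checked for existence before classification starts.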
Training could be based on the Bayes part of examples/bin/classify-20newsgroups.sh in the Mahout distribution. For more about this, see Mahout.
ZooKeeper is configured by
zookeeper=...
Should be used in a multinode environment (not yet mandatory).
Configured by
highlightmlt=true
If using Solr, also go to server/solr/MYCORE/conf and apply the patch with
patch -p0 < core.store.patch
Beware that this also stores the content, and increases disk space usage.