I modified the examples/bin/classify-20newsgroups.sh script in Mahout, adding the following settings:
export MASTER=spark://127.0.0.1:7077
export SPARK_HOME=... spark install directory ...
WORK_DIR=~/usr/data/mahoutspark
alg=naivebayes
WORK_DIR must match the Mahout directories given in aether.prop. The settings alg=naivebayes and alg=cnaivebayes correspond to mahoutalgorithm=bayes and mahoutalgorithm=cbayes, respectively. The step that downloads and unpacks the newsgroups data can be replaced with your own corpus, for instance the top-level Dewey classes, with one directory per class (philosophy and psychology, religion, social sciences, etc.), each containing sample books or documents used to build the classification model. For more about the settings in aether.prop, see here.
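As a rough sketch of the one-directory-per-class layout described above, a small shell helper could create the class directories before the sample documents are copied in. The function name make_class_dirs, the dewey-all directory name, and the class directory names are illustrative assumptions; they are not part of Mahout or of the original script.

```shell
# Hypothetical helper: create one directory per class under a corpus root.
# Usage: make_class_dirs <corpus_root> <class> [<class> ...]
make_class_dirs() {
  corpus_root="$1"
  shift
  for class in "$@"; do
    # -p also creates the corpus root itself if it does not exist yet
    mkdir -p "${corpus_root}/${class}" || return 1
  done
}

# Example call (commented out so that sourcing this file has no side effects):
#   make_class_dirs "${WORK_DIR}/dewey-all" \
#     philosophy_and_psychology religion social_sciences
# Sample books or documents would then be copied into each class directory.
```

The part of the script that reads the unpacked newsgroups directory would need to point at this corpus root instead; where exactly depends on the version of the script in use.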