Re: search function
Linguistically, there are derivational and inflectional stemmers. Derivational stemmers are more aggressive and can reduce “conduction” to “conduct” and “catty” to “cat”. Some even remove prefixes, so “superconductor” becomes “conduct”. They are allowed to change the part of speech, for example from noun to verb. Inflectional stemmers do not change the part of speech. They might remove endings (“cars” to “car”) or make internal changes (“women” to “woman”). Some also handle irregular plurals, like “people” to “person”.
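To make the distinction concrete, here is a toy sketch in Python. The word lists and suffix rules are hypothetical and only cover the examples above; real stemmers such as Porter or KStem use far larger rule sets and dictionaries.

```python
# Toy illustration only. IRREGULAR, DERIV_PREFIXES and DERIV_SUFFIXES are
# hypothetical lists chosen to cover the examples in the post.

# Inflectional: undo plural endings and look up irregular forms,
# so the part of speech never changes.
IRREGULAR = {"women": "woman", "men": "man", "people": "person"}

def inflectional_stem(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"              # "parties" -> "party"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                    # "cars" -> "car"
    return w

# Derivational: also strip derivational prefixes and suffixes,
# which can change the part of speech.
DERIV_PREFIXES = ("super",)
DERIV_SUFFIXES = ("ion", "or", "y")      # noun, agent noun, adjective

def derivational_stem(word: str) -> str:
    w = inflectional_stem(word)
    for p in DERIV_PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= 4:
            w = w[len(p):]               # "superconductor" -> "conductor"
    for s in DERIV_SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= 3:
            w = w[: -len(s)]             # "conductor" -> "conduct"
            # Undo consonant doubling: "catt" (from "catty") -> "cat".
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
                w = w[:-1]
            break
    return w

for word in ["cars", "women", "people", "conduction", "catty", "superconductor"]:
    print(word, inflectional_stem(word), derivational_stem(word))
```

In this sketch the derivational function runs the inflectional rules first and then strips more, which is one simple way to model it being the more aggressive of the two.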
In Solr, Lucene, and Elasticsearch, the KStem stemmer is one of the least aggressive English options. I’m not a fan of stemming proper nouns like “Holbrooks”: “Steve Jobs” and “Bill Gates” are not a job or a gate.

Because the search includes date sorting, precision becomes really important. When results are sorted by relevance, it is OK to have lots of poor matches, as long as they end up on page 10 of the results. But when results are sorted by date, all those bad matches are mixed in with the good ones, and there is no way to push them aside.

Wildcards, like a trailing asterisk, usually put a heavy load on the search engine. First it must scan the term dictionary to find every matching term, then it must run the search with the hundreds or thousands of terms that matched.
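On the point about not stemming proper nouns such as “Jobs” and “Gates”: one way to keep them out of the stemmer is a protected-word list that the analyzer checks before stemming. Below is a minimal Python sketch, assuming whitespace tokenization, lowercasing, and a hypothetical hand-maintained `PROTECTED` set; `simple_stem` is a stand-in that only strips a plural “s”, not KStem.

```python
# Hypothetical protected-word list; in practice it would be maintained
# alongside the search configuration for this particular collection.
PROTECTED = {"holbrooks", "jobs", "gates"}

def simple_stem(token: str) -> str:
    """Stand-in stemmer (not KStem): strips a plural 's' only."""
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and stem every token not in PROTECTED."""
    tokens = []
    for token in text.lower().split():
        tokens.append(token if token in PROTECTED else simple_stem(token))
    return tokens

print(analyze("Steve Jobs and Bill Gates"))
print(analyze("Holbrooks cars"))
```

The trade-off: once “jobs” is protected, a query for “job” no longer matches a document that only says “jobs”, so a list like this works best when it is short and specific to the collection.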
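The two-step cost of a wildcard query can be sketched with a toy inverted index. This is a simplified model, assuming an in-memory sorted list of terms (`TERMS`) and a small hypothetical `INDEX`; real engines store the term dictionary far more compactly, but the shape of the work is similar: expand the pattern into matching terms, then search with all of them.

```python
import bisect
import fnmatch

# Hypothetical inverted index: term -> set of document ids.
INDEX = {
    "cat": {7},
    "conduct": {1, 4},
    "conducted": {2},
    "conduction": {3},
    "conductor": {3, 5},
    "contract": {6},
}
TERMS = sorted(INDEX)        # the term dictionary, kept sorted
WILDCARD_CHARS = "*?["       # characters fnmatch treats as pattern syntax

def expand_wildcard(pattern: str) -> list[str]:
    """Step 1: find every dictionary term that the pattern matches."""
    cut = next((i for i, ch in enumerate(pattern) if ch in WILDCARD_CHARS),
               len(pattern))
    prefix = pattern[:cut]
    if prefix:
        # A literal prefix lets us jump to a contiguous slice of the sorted terms.
        start = bisect.bisect_left(TERMS, prefix)
        candidates = []
        for term in TERMS[start:]:
            if not term.startswith(prefix):
                break
            candidates.append(term)
    else:
        # A leading wildcard gives nothing to narrow on: check every term.
        candidates = TERMS
    return [t for t in candidates if fnmatch.fnmatchcase(t, pattern)]

def wildcard_search(pattern: str) -> set[int]:
    """Step 2: OR together the postings of every expanded term."""
    docs: set[int] = set()
    for term in expand_wildcard(pattern):
        docs |= INDEX[term]
    return docs

print(expand_wildcard("conduct*"), wildcard_search("conduct*"))
print(expand_wildcard("*tion"), wildcard_search("*tion"))
```

With a literal prefix such as `conduct*`, the scan can jump straight to a narrow slice of the sorted terms. A leading wildcard such as `*tion` has nothing to narrow on, so every term in the dictionary gets checked, which is why leading wildcards tend to be the most expensive kind.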