moderated search function


 

I agree that precision is more important than sensitivity. However, I've noticed that diacritical marks are discriminators, so that (say) "fevrier" will not return results that contain only "février". This is unusual - most search engines ignore accents - and will lead to some gritting of teeth in areas that use characters outside 7-bit ASCII…
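
For illustration only, here is a minimal Python sketch of the accent folding that many search engines apply before comparing terms. It is not how this site's search works; the function names `fold_accents` and `matches` are made up for the example, and the sketch assumes whole-word matching on whitespace-separated words.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks (accents) removed and case folded."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, document: str) -> bool:
    """True if any whitespace-separated word equals the query after folding."""
    folded_query = fold_accents(query)
    return any(fold_accents(word) == folded_query for word in document.split())


print(fold_accents("février"))
print(matches("fevrier", "Réunion du 12 février 2018"))
```

Folding both the query and the indexed text is what lets "fevrier" and "février" find each other in either direction.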
--
______________________________________
 Ian Gillis C Eng MIET


 

On Mon, Feb 12, 2018 at 06:30 pm, Walter Underwood wrote:
Because the search includes date sorting, precision becomes really important. When sorting by relevance, it is OK to have lots of poor matches as long as they are on page 10 of the results. But when sorted by date, all those bad matches are mixed in, with no way to ignore them.
Exactly!
 
--
J

 

Messages are the sole opinion of the author, especially the fishy ones.

I wish I could shut up, but I can't, and I won't. - Desmond Tutu


 

I do agree that it's odd not to return the partial matches that the original post in this thread says are missing. If I search on "cat," I would also expect to get back "cats." I know that doesn't happen now and it does seem unexpected. But that's for exact matches. I would not want to get back instances of every word related to cat - just words that contain the exact match.

--
J

 



 

Whatever happens, *please* let's not go back to the days of stemming, if "stemming" means returning all words having the same root. That's what Search used to do here and it was unacceptable, useless, and a huge PITA!
--
J

 



Walter Underwood
 

Linguistically, there are derivational and inflectional stemmers. Derivational stemmers are more aggressive and can reduce “conduction” to “conduct” and “catty” to “cat”. Some even remove prefixes, so “superconductor” becomes “conduct”. They are allowed to change between parts of speech, like noun to verb. Inflectional stemmers do not change the part of speech. They might remove endings (“cars” to “car”) or make internal changes (“women” to “woman”). Some handle irregular plurals, like “people” to “person”.
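
As a rough illustration of the inflectional kind, here is a toy Python sketch. It is not KStem or any real stemmer: the `IRREGULAR` table and the single plural rule are assumptions made up for the example, and a real stemmer needs a dictionary to avoid mangling words such as "this".

```python
# Hypothetical lookup table for irregular forms; a real stemmer has far more.
IRREGULAR = {"women": "woman", "people": "person"}


def inflectional_stem(word: str) -> str:
    """Toy inflectional stemmer: irregular lookup, then strip a plural 's'."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Strip a trailing 's' but leave words like "glass" alone.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    # No derivational changes: "conduction" and "catty" are left as they are.
    return w


for word in ["cars", "women", "people", "conduction", "catty"]:
    print(word, "->", inflectional_stem(word))
```

A derivational stemmer would go further and map "conduction" to "conduct", which is where the part of speech can change.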

In Solr, Lucene, and Elasticsearch, the KStem stemmer is the least aggressive.

I’m not a fan of stemming proper nouns like “Holbrooks”. “Steve Jobs” and “Bill Gates” are not a job or a gate.

Because the search includes date sorting, precision becomes really important. When sorting by relevance, it is OK to have lots of poor matches as long as they are on page 10 of the results. But when sorted by date, all those bad matches are mixed in, with no way to ignore them.
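
A small Python sketch of that effect, using made-up results: the `Hit` type, the scores, and the dates are all invented for the example and do not come from any real search engine.

```python
from datetime import date
from typing import NamedTuple


class Hit(NamedTuple):
    title: str
    score: float  # hypothetical relevance score, higher is better
    posted: date


hits = [
    Hit("exact match, old", 0.95, date(2017, 3, 1)),
    Hit("weak match, recent", 0.10, date(2018, 2, 11)),
    Hit("good match, recent", 0.80, date(2018, 2, 10)),
    Hit("weak match, older", 0.05, date(2017, 12, 25)),
]

# Relevance order pushes the weak matches to the bottom of the list.
by_relevance = sorted(hits, key=lambda h: h.score, reverse=True)
# Date order ignores the score, so weak matches land among the good ones.
by_date = sorted(hits, key=lambda h: h.posted, reverse=True)

print([h.title for h in by_relevance])
print([h.title for h in by_date])
```

With date sorting, the only way to keep the weak matches out of sight is not to return them at all, which is why precision matters more here.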

Wildcards, like asterisks, usually put a huge load on the search engine. First it must scan all the terms to get matches, then search with the hundreds or thousands of terms that match.
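
The two steps can be sketched in Python with a tiny in-memory inverted index. This is a naive model under stated assumptions: the index contents and the names `expand_wildcard` and `wildcard_search` are invented for the example, and real engines have optimizations that are not modelled here.

```python
from fnmatch import fnmatchcase

# Hypothetical inverted index: term -> set of message ids containing it.
index = {
    "driver": {1},
    "drivers": {2, 3},
    "driving": {3},
    "drivel": {4},
    "cat": {5},
}


def expand_wildcard(pattern: str, terms) -> list[str]:
    """Step 1: scan every term in the dictionary for matches to the pattern."""
    return [term for term in terms if fnmatchcase(term, pattern)]


def wildcard_search(pattern: str, inverted_index: dict) -> set[int]:
    """Step 2: search with each expanded term and merge the results."""
    result: set[int] = set()
    for term in expand_wildcard(pattern, inverted_index):
        result |= inverted_index[term]
    return result


print(expand_wildcard("driv*", index))
print(sorted(wildcard_search("driv*", index)))
```

With five terms this is trivial, but the cost of both steps grows with the size of the term dictionary and with the number of terms the pattern expands to.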

wunder
Walter Underwood
wunder@...
http://observer.wunderwood.org/  (my blog)

On Feb 12, 2018, at 5:14 PM, Shal Farley <shals2nd@...> wrote:



 

J,

I agree with Walter. I'd rather have it this way than the other way.

Ideal, I think, would be the third way: partial but not stemming.

I still think it should be possible to use an asterisk to indicate "contains this string." No?

No.

I'd much rather have partial matches without needing a "syntax" element like an asterisk to mark it. That is, I'd like to have a search for "driver" match "drivers", which would be a match on part of the word, but not "driving", which would be a match based on word stems.

Partial matching would solve Nancy's case of Holbrook versus Holbrooks, and it would allow one to do "pseudo-stemming". For example, searching for "driv" if one wanted to match both "drivers" and "driving". Note the "pseudo", as that would also match "drivel", which a true stemming search would not.
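
To make the distinction concrete, here is a minimal Python sketch. It assumes "partial" means the query is a prefix of the indexed word, which fits the examples above but is only one possible reading; the names `partial_match` and `search` and the word list are made up for the example.

```python
def partial_match(query: str, word: str) -> bool:
    """True if the word starts with the query, ignoring case."""
    return word.lower().startswith(query.lower())


def search(query: str, words: list[str]) -> list[str]:
    """Return the words that partially match the query, in original order."""
    return [word for word in words if partial_match(query, word)]


words = ["driver", "drivers", "driving", "drivel", "Holbrooks"]

print(search("driver", words))    # matches on part of the word only
print(search("driv", words))      # the "pseudo-stemming" case
print(search("Holbrook", words))  # Nancy's case
```

No stemming is involved: "driver" never finds "driving", and "driv" finds "drivel" simply because the letters match.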

Shal


 

I agree with Walter. I'd rather have it this way than the other way. I still think it should be possible to use an asterisk to indicate "contains this string." No?

On Mon, Feb 12, 2018 at 4:07 PM, Walter Underwood <wunder@...> wrote:
I’ve been working in search for over twenty years and you really, really cannot get 100% specific and 100% sensitive. In search evaluation, those are called “precision” and “recall”. You are lucky to get to 50% on either one. I got the Netflix search working better than that, but that was in a very specific domain, just movies and TV.

The most common solution is to try to get the first few results to be very specific (high precision), matching the customer’s query. Weaker matches are farther down and on later pages of results.

I’ll go look over the previous discussion to see if there is anything I can add.

wunder
Walter Underwood
wunder@...
http://observer.wunderwood.org/  (my blog)


--
J

 



Walter Underwood
 

I’ve been working in search for over twenty years and you really, really cannot get 100% specific and 100% sensitive. In search evaluation, those are called “precision” and “recall”. You are lucky to get to 50% on either one. I got the Netflix search working better than that, but that was in a very specific domain, just movies and TV.
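
For readers who have not met the two terms, here is a short Python sketch of how they are usually computed for one query. The function name `precision_recall` and the message ids are invented for the example.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    true_hits = len(retrieved & relevant)
    precision = true_hits / len(retrieved) if retrieved else 0.0
    recall = true_hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 messages returned, 5 messages actually relevant.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 6, 8, 10}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Returning more messages tends to raise recall and lower precision, and returning fewer does the opposite, which is the trade-off being discussed in this thread.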

The most common solution is to try to get the first few results to be very specific (high precision), matching the customer’s query. Weaker matches are farther down and on later pages of results.

I’ll go look over the previous discussion to see if there is anything I can add.

wunder
Walter Underwood
wunder@...
http://observer.wunderwood.org/  (my blog)

On Feb 12, 2018, at 4:02 PM, J_Catlady <j.olivia.catlady@...> wrote:


 

There were some long debates about the search function in the past. This is one side of the issue. Before the newest version, the search function would return too much (it was over-sensitive) - for example, if you searched on "diagnosis," it would return diagnosis, diagnostics, diagnosed, etc. etc. etc. It was a big argument to get it to be more specific. I wanted it to be both 100% sensitive and 100% specific but it seems you can't have both given the search software that Mark is using. I agree that it would be better if you could also do the kind of partial search you are asking for here, perhaps by using an asterisk at the end of the search term. But if I had to pick one way (sensitivity or specificity) I'd pick the latter, i.e., the way it is now. It was useless for me, and some others, to have too much returned every time.
--
J

 



Nancy P
 

I'd posted this earlier in the group managers forum, and it was suggested that I try it here as well, since it was apparently a subject for discussion a few months ago. Any chance the search function will be expanded to include partial words?

"We've just started transferring our groups from Yahoo and have been very pleased so far with both the results and Mark's support. However, I discovered something a bit puzzling today. I was searching for a person whose last name is Holbrooks and, when I entered that entire name, all his records came up. However, if I just enter Holbrook (without the "s" at the end), it doesn't find any of his records. That seems very peculiar to me! What am I missing? Or perhaps I was spoiled in Yahoo, where I could enter a truncated version of a name and it would find all entries containing that sequence of letters?"