"Exact search" does miss entries

Posted: Mon Jan 23, 2012 10:51 pm
by Thomas Lohrum
Example:
Title: Delphi Blogs

Bug:
You can find the note by searching for "delphi blogs" or "blogs delphi" when exact search is off. When exact search is on you will find "delphi blogs", but you won't find "blogs delphi". To be more precise: you can find "Blogs", but as soon as you add a space " ", the note disappears from the search list.

Re: "Exact search" does miss entries

Posted: Tue Jan 24, 2012 6:07 am
by CintaNotes Developer
This is by design actually.
Exact search implies quotes around your search phrase

Re: "Exact search" does miss entries

Posted: Tue Jan 24, 2012 6:40 am
by Noddy330
In which case, maybe "Search also inside of words" isn't a suitable replacement for "Exact Search".
Nod

Re: "Exact search" does miss entries

Posted: Tue Jan 24, 2012 7:17 am
by CintaNotes Developer
Yes, I had actually forgotten about this peculiarity of exact search.
One way to solve this problem would be to require explicit quotes.
But unfortunately it is not quite as trivial to implement as simple renaming.

Re: "Exact search" does miss entries

Posted: Tue Jan 24, 2012 10:50 am
by Thomas Lohrum
Hi Alex,

To avoid a discussion on how the search is actually implemented, I will try to state what I need for searching notes.

  • find the search term at the beginning, in the middle and at the end of a word
  • allow combining search terms

Example

Title "Delphi Blogs"

The following search terms all should find the above note:
delphi blogs
blogs delphi
logs delp
log elph

This is what I actually want the search to find by default. The exact search feature can be used right now to allow searching inside words. It cannot be used to combine search terms, though. That is, it will find "Blogs", but it won't find "Blogs Delphi". This is not satisfying, because the design is not intuitive. In the past I actually had problems finding notes. I never understood why I was sometimes able to do a full search and find notes, but at other times could not.

Thomas

PS: Putting a search phrase in double quotes can be helpful, though personally I use it less than 0.1% of the time.

Re: "Exact search" does miss entries

Posted: Tue Jan 24, 2012 5:52 pm
by Thomas Lohrum
CintaNotes Developer wrote:Yes, I have actually forgot about this peculiarity of exact search.
One way to solve this problem would be to require explicit quotes.
But unfortunately it is not quite as trivial to implement as simple renaming.
Shouldn't it be the other way around? Usually an exact search is implied by surrounding the search phrase with double quotes. There is actually no need for an option that activates exact search explicitly.

Re: "Exact search" does miss entries

Posted: Thu Jan 26, 2012 10:24 am
by CintaNotes Developer
Yes, ideally it should be like you said. But it is not very easy to implement,
and also "exact" search will get slower because of this.

I suggest that search should work as follows in the future:
1) whether full-text search facility is used or not is determined by the 'search inside words' checkbox;
2) otherwise the search functions the same way - find words separated by spaces in any part of the text;
3) if you want to find the exact phrase you put it in quotes.

Would this be fine for you?

Re: "Exact search" does miss entries

Posted: Thu Jan 26, 2012 4:30 pm
by Thomas Lohrum
Hi Alex,

CintaNotes Developer wrote:Yes, ideally it should be like you said. But it is not very easy to implement,
Please explain why it is hard to implement. If I can understand the problem, I might be able to help come up with a solution :)
CintaNotes Developer wrote:and also "exact" search will get slower because of this.
Well, CN is fast, really fast. Part of it might be SQLite, of course. I have created 800 records in the ten months I have been using CN, and my machine uses an SSD. Speed is no concern to me. Speed will surely differ with hardware and database size. My guess is that it is still fast on those machines.

CintaNotes Developer wrote:I suggest the following way how search should work in future:
1) whether full-text search facility is used or not is determined by the 'search inside words' checkbox;
I am confused about the term "full-text search facility". You mentioned a full-text index, which only allows matching at the beginning of a word. Whatever it is called, I want to be able to search for words in any position of the note text. If this is slower, I'll accept that. What matters more to me is actually finding the desired note!
CintaNotes Developer wrote:2) otherwise the search functions the same way - find words separated by spaces in any part of the text;
I don't understand. What does "same way" mean? Searching for words separated by spaces in any part of the text is what I am looking for. However, I want this feature to always work.
CintaNotes Developer wrote:3) if you want to find the exact phrase you put it in quotes.
Yes.

CintaNotes Developer wrote:Would this be fine for you?
If we actually mean the same, as mentioned in my comments above, then Yes. Please also consider Chris' thread on searching tags, as this has (big) influence on the search result also.

Thomas

Re: "Exact search" does miss entries

Posted: Thu Jan 26, 2012 5:06 pm
by CintaNotes Developer
Thomas Lohrum wrote:Please explain, why it is hard to implement. If i can understand the problem, i might be able to help to come up with a solution :)

Well, it will require some basic understanding of SQL, but you asked for it;)

Currently CintaNotes does the search with a single SELECT database query, using the LIKE operator to find the desired text.
However, this operator only does exact-phrase search. In order to do a more "loose" search, CN will have to do some preparation.

Say we are searching for [word1 word3] and want the note with text 'word0 word1 word2 word3 word4' to be displayed. CN will do the following:
1) split the search query into separate words: "word1", "word3"
2) construct a complex query with multiple LIKE operators: SELECT * FROM Notes WHERE text LIKE '%word1%' AND text LIKE '%word3%'.
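
As a rough illustration of steps 1 and 2, here is a minimal Python sketch using the standard sqlite3 module and an in-memory database. The Notes table and text column follow the query above; the function name build_like_query and the sample rows are made up for this example, and this is not CintaNotes' actual code. Note that the sketch does not escape % or _ inside the search words.

```python
import sqlite3

def build_like_query(search):
    """Split the search string on whitespace and AND one LIKE per word.

    Hypothetical helper; table and column names are taken from the post.
    """
    words = search.split()
    if not words:
        return "SELECT text FROM Notes", []
    where = " AND ".join("text LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]  # %word% matches anywhere in the text
    return f"SELECT text FROM Notes WHERE {where}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (text TEXT)")
conn.executemany(
    "INSERT INTO Notes VALUES (?)",
    [("word0 word1 word2 word3 word4",), ("word1 only",)],
)

sql, params = build_like_query("word1 word3")
rows = [r[0] for r in conn.execute(sql, params)]
print(rows)  # ['word0 word1 word2 word3 word4']
```

Passing the patterns as bound parameters rather than pasting them into the SQL string keeps quotes in the user's input from breaking the query.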

Seems simple, but... we have to watch out for those quotes! What if we are searching for ["word0 word1" word2]? So we have to modify step 1:
1) split the search query into separate search phrases, respecting quotations: "word0 word1", "word2".
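
The modified step 1 could be sketched in Python roughly like this. The function name split_phrases is hypothetical, and the regular expression is just one simple way to do it: a double-quoted run becomes one phrase, anything else is split on whitespace. Unbalanced quotes are not handled specially here.

```python
import re

def split_phrases(query):
    """Split a query into phrases, keeping "quoted text" together.

    Hypothetical helper for illustration only.
    """
    # Each match is a (quoted, bare) pair; exactly one of them is non-empty.
    parts = re.findall(r'"([^"]*)"|(\S+)', query)
    return [quoted or bare for quoted, bare in parts if quoted or bare]

print(split_phrases('"word0 word1" word2'))  # ['word0 word1', 'word2']
print(split_phrases('word1 word3'))          # ['word1', 'word3']
```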

But this is still not all! If we do an overhaul of the search system, we need to fix a problem that was there before: supporting the "not" and "or" operators. Now they have the form of "-" for NOT and "|" for OR, and are only supported when the full-text index is used (namely, when the "exact search" checkbox is off).

So, we need to modify the algorithm once again:
1) analyse the query string and build a tree corresponding to the search operators used, where operators become tree nodes, and words or quoted phrases are leaves. This will require building an expression parser. So a query like "word1 word2|word3 -word4" becomes:

              __ AND _
           /      |     \
         word1    OR     NOT
               /    \      \
          word2  word3   word4


2) Transform the tree into SQL:

   SELECT * FROM Notes
   WHERE (text LIKE '%word1%') AND ((text LIKE '%word2%') OR (text LIKE '%word3%')) AND NOT (text LIKE '%word4%')

3) Execute SQL and get the results.
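
The three steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not CintaNotes' parser: terms are separated by spaces (AND), "|" joins alternatives without spaces (OR), a leading "-" negates a single word (NOT), and quoted phrases are left out to keep it short. The function names parse and to_sql, the tuple-based tree and the sample notes are all made up for this example.

```python
import sqlite3

def parse(query):
    """Step 1: build a tree of ('AND'|'OR', [children]), ('NOT', node), ('TERM', word)."""
    children = []
    for chunk in query.split():
        if chunk.startswith("-") and len(chunk) > 1:
            children.append(("NOT", ("TERM", chunk[1:])))
            continue
        alternatives = [("TERM", a) for a in chunk.split("|") if a]
        if len(alternatives) == 1:
            children.append(alternatives[0])
        elif alternatives:
            children.append(("OR", alternatives))
    return ("AND", children)

def to_sql(node, params):
    """Step 2: turn the tree into a WHERE expression, collecting LIKE patterns."""
    kind, payload = node
    if kind == "TERM":
        params.append(f"%{payload}%")
        return "(text LIKE ?)"
    if kind == "NOT":
        return "NOT " + to_sql(payload, params)
    if not payload:  # empty query: match everything
        return "(1)"
    return "(" + f" {kind} ".join(to_sql(child, params) for child in payload) + ")"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (text TEXT)")
conn.executemany(
    "INSERT INTO Notes VALUES (?)",
    [("word1 word2",), ("word1 word3 word4",), ("word1",), ("word0 word1 word3",)],
)

# Step 3: execute the generated SQL.
params = []
where = to_sql(parse("word1 word2|word3 -word4"), params)
rows = sorted(r[0] for r in conn.execute(f"SELECT text FROM Notes WHERE {where}", params))
print(where)
print(rows)  # ['word0 word1 word3', 'word1 word2']
```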

Thomas Lohrum wrote:Well, CN is fast, really fast. Part of it might be SQLite, of course. I have created 800 records in the last ten months since i use CN and my machine uses a SSD. Speed is no concern to me. Speed sure will be different on different hardware and database size. My guess is, that it is still fast on those machines.

You see, since speed is CN's competitive advantage, I really hate to lose it;)

Thomas Lohrum wrote:I am confused about the term "full-text search facility". You mentioned a full-text index, which allows to search words at the beginning of a word only. Whatever it is called, i want to be able to do a search for words in any position of the note text. If this is slower, i'll accept. More important it is to me, to actually find the desired note!

Yes, I did mean the full-text index, as implemented by the FTS3 SQLite extension which CN now uses.
What I meant: if you select "search inside words", the full-text index won't be used and the search will be slower, but will bring you more results. This is the same as now. But unlike the current behavior, the search will split the search string into separate words.

Thomas Lohrum wrote:I don't understand. What means "same way"? Searching for words separated by spaces in any part of the text is what i am looking for. However, i want this feature to always work.

Yes, "same way" meant "same as the full-text index based search does now". And the latter does search for words separated by spaces; it is just limited to matching from the beginning of a word because of the way the full-text index is built.
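
To make the difference between the two modes concrete, here is a small pure-Python model of the matching rules described in this thread. It does not use FTS3 or LIKE at all; it only imitates "match from the beginning of a word" versus "match anywhere in the text", with every search word required in both cases. The function name matches and the inside_words flag are hypothetical.

```python
def matches(note, query, inside_words):
    """Return True if every word of the query is found in the note.

    inside_words=False imitates index-style matching (word beginnings only);
    inside_words=True imitates LIKE '%term%' matching (anywhere in the text).
    """
    text = note.lower()
    note_words = text.split()
    for term in query.lower().split():
        if inside_words:
            found = term in text
        else:
            found = any(word.startswith(term) for word in note_words)
        if not found:
            return False
    return True

title = "Delphi Blogs"
for query in ["delphi blogs", "blogs delphi", "logs delp", "log elph"]:
    print(query, matches(title, query, False), matches(title, query, True))
```

With the title "Delphi Blogs", the first two queries match in both modes, while "logs delp" and "log elph" match only when searching inside words.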

Thomas Lohrum wrote:If we actually mean the same, as mentioned in my comments above, then Yes. Please also consider Chris' thread on searching tags, as this has (big) influence on the search result also.

Yes, I also agree with Chris' arguments.

Re: "Exact search" does miss entries

Posted: Thu Jan 26, 2012 9:05 pm
by Thomas Lohrum
Alex,

again, thanks a lot for sharing CN internals.

CintaNotes Developer wrote:Well, it will require some basic understanding of SQL, but you asked for it;)
I am a developer myself and have a good understanding of SQL (databases). What you describe is exactly what I thought CN was doing already. That's why I thought there was a bug returning wrong/inconsistent results. I have carefully read your description, and imo steps 1 to 3, including building an expression parser, are definitely the way to go. :D

CintaNotes Developer wrote:You see, since speed is CN's competitive advantage, I really hate to lose it;)
Actually no speed will be lost compared to the present search algorithm. As you described, the search engine will be a lot more powerful and produce much better hits. It will be slower only when "also search inside words" is activated. That is the same speed behaviour as with "exact search" today, but it will produce results in the way the user expects.

CintaNotes Developer wrote:if you select "search inside words", full-text index won't be used and the search will be slower, but will bring you more results. This is the same as now. But unlike current behavior, the search will split the search string into separate words.
Wonderful. The label "search inside words" will describe the behaviour correctly. As for speed, a hint such as "(slower)" could be added to make users aware of this, e.g. "also search inside words (slower, more results)".

CintaNotes Developer wrote:Yes, I also agree with Chris' arguments.
Searching tags would require a join, making the query somewhat more complex and again somewhat slower. Searching tags could be an option too, e.g. "also search inside tags (slower, more results)".
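
Just to illustrate what such a join might look like, here is a Python sketch with the standard sqlite3 module. The schema is purely an assumption (a Notes table, a Tags table and a NoteTags link table); CintaNotes' real schema is not described in this thread, and the function name search is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.executescript("""
CREATE TABLE Notes (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE Tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE NoteTags (note_id INTEGER, tag_id INTEGER);
INSERT INTO Notes VALUES (1, 'Delphi Blogs'), (2, 'Shopping list');
INSERT INTO Tags VALUES (1, 'programming'), (2, 'home');
INSERT INTO NoteTags VALUES (1, 1), (2, 2);
""")

def search(term, inside_tags):
    """Return ids of notes whose text (and optionally tag names) contain term."""
    pattern = f"%{term}%"
    sql = "SELECT DISTINCT n.id FROM Notes n"
    params = [pattern]
    if inside_tags:
        # LEFT JOIN so that notes without any tags are still searched by text.
        sql += (" LEFT JOIN NoteTags nt ON nt.note_id = n.id"
                " LEFT JOIN Tags t ON t.id = nt.tag_id"
                " WHERE n.text LIKE ? OR t.name LIKE ?")
        params.append(pattern)
    else:
        sql += " WHERE n.text LIKE ?"
    sql += " ORDER BY n.id"
    return [row[0] for row in conn.execute(sql, params)]

print(search("program", inside_tags=False))  # []
print(search("program", inside_tags=True))   # [1]
```

The join is only added when the option is on, so the plain text search keeps its simpler query.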

Even if slower, imo the two options "also search inside words" and "also search inside tags" should be implemented and should also be the default for a new CN installation. If people want even better speed or fewer search results, the options can be turned off on demand.

Thomas

Re: "Exact search" does miss entries

Posted: Thu Jan 26, 2012 9:13 pm
by Thomas Lohrum
Thomas Lohrum wrote:If people want even better speed or reduce the number of search results, the options can be turned off by demand.
Shortcuts could be added to easily turn the two options on/off. The GUI could optionally reflect the two options in the search bar.

Re: "Exact search" does miss entries

Posted: Fri Jan 27, 2012 7:42 am
by CintaNotes Developer
Thomas,

it's great to meet a fellow developer! ;) I'm lucky here, since to non-developers my explanation would most likely be complete gibberish ;)

I agree with your thoughts, and have created an issue on the roadmap for all the changes suggested here:
Inconsistent search behavior

Since it is registered as a bug (and I reckon it really is one), it will be fixed without having to collect lots of votes first.
But still, it will compete with other bugs;)

Re: "Exact search" does miss entries

Posted: Fri Jan 27, 2012 10:15 pm
by Thomas Lohrum
Hi Alex,

Developers are sometimes confronted with the question of whether something is a bug or works as designed (WAD). I usually accept a WAD as a bug if it does not meet the users' expectations. I am happy the issue is considered a bug, and I am looking forward to using the modified search algorithm sooner or later. The sooner the better. :D I have voted for the issue right away.

Thomas

Re: "Exact search" does miss entries

Posted: Sat Jan 28, 2012 7:30 am
by CintaNotes Developer
Thanks for your help, Thomas.
I also think that the line separating a feature from a bug is sometimes rather blurry and comes down to the expectations of the majority of users. The tricky part here is to make sure that this is really the majority ;)

But in this case, unification will make things much simpler and this is definitely a good thing.