Search for NOT is broken?

Thomas Lohrum
Posts: 1324
Joined: Tue Mar 08, 2011 11:15 am

Postby Thomas Lohrum » Sun Feb 12, 2012 11:20 am

I can search for "cintanotes", but I cannot search for "-cintanotes", which returns no results. This happens whether I search "title + text", "title only", "text only" or "everywhere". "Exact search" is turned off. Either the result list is empty or I get the following error: "class db::DatabaseException SQLite Error 1: malformed MATCH expression: [-cintanotes]"

Thomas
CintaNotes Developer
Site Admin
Posts: 5001
Joined: Fri Dec 12, 2008 4:45 pm

Postby CintaNotes Developer » Sun Feb 12, 2012 3:33 pm

Yep, it has been on the roadmap for some time already, but got postponed due to its low score.
The good news is that the fix is planned for version 1.6 :)
Alex

Postby Thomas Lohrum » Sun Feb 12, 2012 3:59 pm

May I suggest introducing a different identifier for NOT? The current identifier "-" conflicts with searches for common text phrases like "public-roadmap". How else do you search for "-roadmap"? A solution could be to use "!" or "~" as the NOT operator.

Thomas

Postby CintaNotes Developer » Mon Feb 13, 2012 4:39 am

You just need to write it in quotes:
"-roadmap"

BTW, it doesn't quite work right now because the full-text index doesn't include hyphens (it only indexes letters, digits and underscores).
But once this issue is fixed, it will work with the "Search inside words" option.
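
To illustrate why a hyphen never reaches the index, here is a rough Python sketch. It assumes the index splits text roughly like a `\w+` regular expression (letters, digits, underscores); the function name `index_tokens` is made up for this example, and the real SQLite tokenizer used by CintaNotes may differ in details.

```python
import re

def index_tokens(text):
    """Split text into lowercase tokens of letters, digits and underscores.

    Hypothetical stand-in for the full-text tokenizer: everything else,
    including the hyphen, acts as a separator and is thrown away.
    """
    return re.findall(r"\w+", text.lower())

print(index_tokens("See the public-roadmap"))
print(index_tokens("-roadmap"))
```

With a tokenizer like this, "public-roadmap" is stored as the two words "public" and "roadmap", so a literal "-roadmap" cannot be found through the index alone.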

I'm now coming to think that "Exact search" wasn't that bad a name after all.. )
Alex

Postby Thomas Lohrum » Mon Feb 13, 2012 11:14 am

CintaNotes Developer wrote:You just need to write it in quotes:
"-roadmap"
It feels strange to use phrase syntax just because I am searching for "-". I still think it would be smart to use a different symbol like "!", e.g. !CintaNotes would search for all notes NOT containing the word "cintanotes". In sentences the exclamation mark is used after a word, so it should never conflict when used in a search phrase. It also matches C's negation operator. Since negation has not worked for the last two years, changing the symbol now should be no problem.
CintaNotes Developer wrote:I'm now coming to think that "Exact search" wasn't that bad a name after all.. )
I never understood the term. "Search inside words" makes it clearer what the option actually does.

Thomas

Postby CintaNotes Developer » Tue Feb 14, 2012 11:39 am

Thomas Lohrum wrote:It feels strange to use phrase syntax just because I am searching for "-". I still think it would be smart to use a different symbol like "!", e.g. !CintaNotes would search for all notes NOT containing the word "cintanotes". In sentences the exclamation mark is used after a word, so it should never conflict when used in a search phrase. It also matches C's negation operator. Since negation has not worked for the last two years, changing the symbol now should be no problem.

Negation did work when you used it together with a non-negated term, like this: "word1 -word2". It only fails when you use it alone. As for the standalone case: well, you have to sacrifice something when you need a really fast search. Full-text search has its limitations - try searching for "-term" on Google, for example :)

I'd like to keep the "-" symbol because it is in line with Google's search syntax, which is familiar to most people. Using "!" is very unconventional and would confuse many users. I guess it seems natural to you and me because we are developers, but I doubt other people would find it easy to understand.

Now let's think about what we can do:
1) The "-word" search error can be fixed by fiddling with the SQL conditions, and it will be done.
2) When you want to search for "-something", you'll have to use double quotes. This will work the same way regardless of the "search inside words" option.
3) Since the "-" symbol is not included in the full-text index anyway, CintaNotes will silently use LIKE-based search under the hood.
4) Full text search will be used only in the following cases:
- "Search inside words" option is off;
- Large notebook size (> 10MB);
- Searched words contain only letters, digits and underscores (other symbols are not indexed by FTS anyway).
This way the user won't have to know which search is used, and FTS will be used just as a speedup opportunity.
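
A minimal sketch of what points 1) to 3) could look like with a plain LIKE-based search, using Python's built-in sqlite3 module. This is not the actual CintaNotes code: the table `notes`, the column `text` and the helpers `parse_query`, `like_pattern` and `search` are all invented for this illustration.

```python
import re
import sqlite3

# Either an optionally negated "quoted phrase", or a bare run of non-space characters.
TOKEN = re.compile(r'(-?)"([^"]+)"|(\S+)')

def parse_query(query):
    """Split a query into (included, excluded) terms.

    A leading '-' outside double quotes negates a term; a quoted
    term such as "-roadmap" is taken literally.
    """
    included, excluded = [], []
    for neg, phrase, bare in TOKEN.findall(query):
        if phrase:
            (excluded if neg else included).append(phrase)
        elif bare.startswith("-") and len(bare) > 1:
            excluded.append(bare[1:])
        else:
            included.append(bare)
    return included, excluded

def like_pattern(term):
    """Escape LIKE wildcards and wrap the term for substring matching."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def search(conn, query):
    """Return ids of notes matching all included and no excluded terms."""
    included, excluded = parse_query(query)
    clauses, params = [], []
    for term in included:
        clauses.append("text LIKE ? ESCAPE '\\'")
        params.append(like_pattern(term))
    for term in excluded:
        clauses.append("text NOT LIKE ? ESCAPE '\\'")
        params.append(like_pattern(term))
    # A query made only of negated terms still yields a valid WHERE clause.
    where = " AND ".join(clauses) or "1"
    rows = conn.execute(f"SELECT id FROM notes WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO notes (text) VALUES (?)", [
    ("CintaNotes public-roadmap",),
    ("Shopping list",),
    ("cintanotes tips",),
])

print(search(conn, "-cintanotes"))       # a negated term on its own
print(search(conn, "cintanotes -tips"))  # positive term plus negation
print(search(conn, '"-roadmap"'))        # literal hyphen, written in quotes
```

Because every term becomes its own LIKE or NOT LIKE condition, a lone "-word" is no longer a special case, at the price of scanning the note text instead of using the index.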
Alex

Postby Thomas Lohrum » Tue Feb 14, 2012 3:18 pm

CintaNotes Developer wrote:Now let's think about what we can do: 1) to 3)
Fine :) Agreed :)
CintaNotes Developer wrote:4) Full text search will be used only in the following cases:
- "Search inside words" option is off;
- Large notebook size (> 10MB)
Since users usually aren't aware of their notebook size, it is hard to tell whether CN uses FTS or not. Since you want to implement this as an "AND" combination, the "search inside words" option is effectively ignored and has no effect. That sounds confusing. When I turn an option on or off, I expect it to actually have an effect, right?

IMO this needs to be discussed in more detail. For example, it would be possible to drop the "search inside words" option altogether and instead introduce a settings option to "force use of FTS engine". A hint should be added that forcing FTS has the drawback of matching at the beginning of words only. As I said, this is just an idea to show that there are other ways to handle this.

Looking forward to 1.6 and the fixed search engine :D

Thomas

Postby CintaNotes Developer » Wed Feb 15, 2012 2:23 pm

Thomas Lohrum wrote:Since you want to implement this as an "AND" combination, the "search inside words" option is effectively ignored and has no effect. That sounds confusing. When I turn an option on or off, I expect it to actually have an effect, right?

When this option is on, CN will always use the simple LIKE-search.

Thomas Lohrum wrote:IMO this needs to be discussed in more detail. For example, it would be possible to drop the "search inside words" option altogether and instead introduce a settings option to "force use of FTS engine". A hint should be added that forcing FTS has the drawback of matching at the beginning of words only. As I said, this is just an idea to show that there are other ways to handle this.

I wonder why people are not complaining that it is impossible to find things inside words in Google? :)
I don't agree that "search inside words" should be on by default - more often than not, people just want to find notes containing certain words. Also I don't want to force users into making technical decisions, and choosing the search engine is a step in that direction, don't you think? :roll:
Alex

Postby Thomas Lohrum » Wed Feb 15, 2012 7:13 pm

CintaNotes Developer wrote:I wonder why people are not complaining that it is impossible to find things inside words in Google?
I can speak for myself only. In German we often have words that are made up of two basic words, e.g. "Indexdienst" (index service). As a workaround I often add keywords such as "dienst" to the note. This is pretty tedious though, because I am working around the system. When "search inside words" can be used, there is no need for an extra keyword. I can search for "dienst" (service) directly :)

As for Google, I did a test and actually searched for "dienst index". Guess what? It lists a page "Indexdienst" as the first match! I think Google is pretty smart and more powerful than SQLite's FTS feature. Their index engine probably breaks a compound word like "Indexdienst" down into its components "index" and "dienst".

CintaNotes Developer wrote:I don't agree that "search inside words" should be on by default - more often than not, people just want to find notes containing certain words.
Yes, I want finding notes to be really simple. My word is "dienst", but the note will only be found when "search inside words" is activated!

CintaNotes Developer wrote:Also I don't want to force users into making technical decisions, and choosing the search engine is a step in that direction, don't you think?
Absolutely. Maybe I misunderstood your suggestion. I read it as: FTS will only be used when (a) "search inside words" is off and (b) the database size is more than 10 MB. Thus I concluded that FTS is not used, no matter whether "search inside words" is on or off, when my database size is less than 10 MB. In that case notes containing the search text would be found regardless of the option. When I cannot predict whether FTS is used or not, I might get different results at different times, e.g. when my database "suddenly" grows beyond 10 MB.

Whatever the implementation or the GUI looks like:
  • When "search inside words" is on, I want to find words inside words!
  • When "search inside words" is off, I want to be aware that words inside words will not be found!
  • For consistent search results, FTS should be used whenever "search inside words" is deactivated (regardless of the database size).
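
To make the difference between the two modes concrete, here is a small Python sketch. It assumes, as mentioned earlier in this thread, that the index-based search matches at the beginning of words only; both function names are made up for this example and have nothing to do with the real CintaNotes code.

```python
import re

def matches_word_start(note, term):
    """'Search inside words' off: some word in the note starts with the term."""
    term = term.lower()
    return any(word.startswith(term) for word in re.findall(r"\w+", note.lower()))

def matches_inside_words(note, term):
    """'Search inside words' on: the term may appear anywhere in the text."""
    return term.lower() in note.lower()

note = "Indexdienst neu starten"
for term in ("index", "dienst"):
    print(term, matches_word_start(note, term), matches_inside_words(note, term))
```

With these assumptions "index" is found in both modes, while "dienst" is found only when searching inside words.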

Thomas

Postby CintaNotes Developer » Thu Feb 16, 2012 8:06 am

Thomas Lohrum wrote:As for Google, I did a test and actually searched for "dienst index". Guess what? It lists a page "Indexdienst" as the first match! I think Google is pretty smart and more powerful than SQLite's FTS feature. Their index engine probably breaks a compound word like "Indexdienst" down into its components "index" and "dienst".

Well actually this only happens when you use google.de, not google.com ;)
And yes, Google can afford to have a custom tokenizer for each language.. With CintaNotes this is not the case - for example, to properly split a Chinese string into words, a several-megabyte dictionary is required! Thus CintaNotes has to use a simpler approach.

Thomas Lohrum wrote:Yes, I want finding notes to be really simple. My word is "dienst", but the note will only be found when "search inside words" is activated!
But this is specific to the German language. Of course, I could implement pluggable tokenizers for different languages, but that is surely out of scope for a utility of a couple of MB ;)

Thomas Lohrum wrote:Whatever the implementation or the GUI looks like:
  • When "search inside words" is on, I want to find words inside words!
  • When "search inside words" is off, I want to be aware that words inside words will not be found!
  • For consistent search results, FTS should be used whenever "search inside words" is deactivated (regardless of the database size).
I agree that the idea with 10MB was not the brightest one. It would be viable if I could guarantee that the result would always be equivalent (provided the search begins on a word boundary). But I can't guarantee this, so the best option would probably be to keep an option to use FTS in addition to "search inside words". When "search inside words" is selected, the "Use full-text index" option becomes greyed out.

Or there's another alternative. FTS is used automatically, and only if all the conditions below are true. If one or more of them are false, LIKE-based search is used.
  • "Search inside words" is off
  • Search query contains only letters, digits, underscores, the OR symbol (|) and hyphens. Additionally, | and - must not appear inside double quotes.
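
The two conditions above could be checked roughly like this in Python. This is only a sketch: the function `can_use_fts` is invented for illustration, and it additionally assumes that whitespace and the double quotes themselves are allowed in the query, which the list above does not spell out.

```python
import re

QUOTED = re.compile(r'"([^"]*)"')

def can_use_fts(query, search_inside_words):
    """Decide whether the full-text index may be used for this query."""
    if search_inside_words:
        return False
    # The operators | and - must not appear inside a quoted phrase.
    for phrase in QUOTED.findall(query):
        if "|" in phrase or "-" in phrase:
            return False
    # Otherwise only word characters, | and - are allowed
    # (plus whitespace and double quotes, by assumption).
    return re.fullmatch(r'[\w\s"|-]*', query) is not None

print(can_use_fts("word1 -word2", search_inside_words=False))
print(can_use_fts('"-roadmap"', search_inside_words=False))
print(can_use_fts("foo:bar", search_inside_words=False))
```

Whenever the function returns False, the LIKE-based search would be used instead.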
Alex

Postby Thomas Lohrum » Thu Feb 16, 2012 1:30 pm

CintaNotes Developer wrote:And yes, Google can afford to have a custom tokenizer for each language.. With CintaNotes this is not the case - for example, to properly split a Chinese string into words, a several-megabyte dictionary is required! Thus CintaNotes has to use a simpler approach.
I agree with this completely. I don't expect CN to have such capabilities. The "search inside words" option satisfies the need entirely.

CintaNotes Developer wrote:But this is specific to German language.
Different users have different needs :)

CintaNotes Developer wrote:Of course, I could implement pluggable tokenizers for different languages, but that is surely out of scope for a utility of a couple of MB
Definitely no need to do so ;)

CintaNotes Developer wrote:Or there's another alternative. FTS is used automatically, (...)
I don't like this approach, because it has implications the user simply cannot be aware of, and once again they have to study the help.

CintaNotes Developer wrote:the best option would probably be to keep an option to use FTS in addition to "search inside words". When "search inside words" is selected, the "Use full-text index" option becomes greyed out.
Having two entries in the search menu would clarify the matter.

  • "[ ] search inside words"
  • "[ ] use full-text index (faster)"

As for the terms:
  • "search inside words" tells me what it does. It makes me aware that I might get more results, whereas the old term "exact search" makes me wonder what it is actually good for. It tells me "this is a different mode", but I don't know what that mode is good for and need to look it up in the help system.
  • "(faster)": the additional remark tells me what a full-text index is actually good for: it makes the search faster. Otherwise I might wonder what a "full-text index" actually is. Adding this simple remark clarifies that right away and makes the GUI easier to use.

I like this approach a lot. However, there is an issue with representing the information in the GUI. Example: both options are turned off. Now the user activates "search inside words", so "use full-text index" has to be greyed out. On the other hand, when the user activates "use full-text index", the "search inside words" option has to be greyed out. In other words, the GUI needs to make sure that only one option can be active at any time (which is actually radio-button behaviour). To activate the other, one first has to deactivate the currently active option, which would be fine with me.

A slightly similar approach would be to have a single option "search inside words" and have "full-text index" shown as a status hint only, for example:
[ ] search inside words
full-text index will be used (faster)

[X] search inside words
full-text index can not be used

Thomas

Postby CintaNotes Developer » Mon Feb 20, 2012 10:42 am

Thomas Lohrum wrote:The "search inside words" option satisfies the need entirely.

Great!

Thomas Lohrum wrote:I don't like this approach, because it has implications the user simply cannot be aware of, and once again they have to study the help.

Well, I think it depends on the implementation. If the only difference between FTS search and LIKE search is performance, there is no real need to know which one is used, is there?

Thomas Lohrum wrote:I like this approach a lot. However, there is an issue with representing the information in the GUI. Example: both options are turned off. Now the user activates "search inside words", so "use full-text index" has to be greyed out. On the other hand, when the user activates "use full-text index", the "search inside words" option has to be greyed out. In other words, the GUI needs to make sure that only one option can be active at any time (which is actually radio-button behaviour). To activate the other, one first has to deactivate the currently active option, which would be fine with me.

I still doubt whether it would be OK to clutter the UI with yet another option. For each option we should ask ourselves: is the user really in a better position to make this choice than the developer? In this case people just want to find text and they don't really care whether FTS is used or not. They would say "use whatever you need to make the search run faster, but keep it precise enough to be useful".

Thomas Lohrum wrote:A slightly similar approach would be to have a single option "search inside words" and have "full-text index" shown as a status hint only

I like this much better! CintaNotes could display which search is currently being used as additional information.
By the way, it depends not only on "search inside words", but also on the presence of non-alphanumeric characters in the query like ':', '(', '[' etc. - these characters are not indexed by FTS.
Alex

Postby Thomas Lohrum » Mon Feb 20, 2012 11:10 am

CintaNotes Developer wrote:I like this much better! CintaNotes could display which search is currently being used as additional information.
By the way, it depends not only on "search inside words", but also on the presence of non-alphanumeric characters in the query like ':', '(', '[' etc. - these characters are not indexed by FTS.
Sounds good :) May I suggest you release a beta version before the RC? Users could actually test and validate such changes. Discussions like this are theoretical by nature. By getting their hands on the product, users could give feedback on the beta. That would allow features to be adjusted when necessary, and avoid cementing undesired behaviour in RC/public releases.

Thomas

Postby CintaNotes Developer » Tue Feb 21, 2012 1:11 pm

Thomas Lohrum wrote:Sounds good :) May I suggest you release a beta version before the RC? Users could actually test and validate such changes. Discussions like this are theoretical by nature. By getting their hands on the product, users could give feedback on the beta. That would allow features to be adjusted when necessary, and avoid cementing undesired behaviour in RC/public releases.

Of course I'll release a beta version. The problem is that some of the features will be commercial, and we need to find a way to deal with that. I think the best way would be to create a special restricted forum for beta-testers, who will be getting commercial features for free, helping with beta-testing in return. I reckon this would be fair, don't you?
Alex

Postby Thomas Lohrum » Tue Feb 21, 2012 3:47 pm

CintaNotes Developer wrote:The problem is that some of the features will be commercial, and we need to find a way to deal with that.
Add an expiry date to the beta. That would allow everyone interested to help out.
CintaNotes Developer wrote:I think the best way would be to create a special restricted forum for beta-testers
This might be interesting, because it would allow better control over the development and discussion of the beta, without confusing users of the release version with discussions of features not available to them.
CintaNotes Developer wrote:who will be getting commercial features for free
I appreciate your offer. Personally I would help test the beta without expecting any benefit from it. CN is simply one of the best tools I know. It's fun and inspiring to work with and to be part of its progress :D Actually I am happy to hear you're eventually going to release a commercial version. It is crucial to me to know that the development of CintaNotes is assured :D

Thomas

Postby CintaNotes Developer » Sat Feb 25, 2012 7:01 am

Thomas Lohrum wrote:Add an expiry date to the beta. That would allow everyone interested to help out.

Yes, this is a good idea.

Thomas Lohrum wrote:This might be interesting, because it would allow better control over the development and discussion of the beta, without confusing users of the release version with discussions of features not available to them.

I agree, but I think I prefer the time-limited solution because it requires less work on both sides. )

Thomas Lohrum wrote:I appreciate your offer. Personally I would help test the beta without expecting any benefit from it. CN is simply one of the best tools I know. It's fun and inspiring to work with and to be part of its progress :D Actually I am happy to hear you're eventually going to release a commercial version. It is crucial to me to know that the development of CintaNotes is assured :D

You're welcome. This is actually the only thing I can do for people who are so inspired by the program that they take the time to help. It sure as hell keeps me going ;) But it's not enough to justify two devs working full-time, and the commercial version will hopefully allow that )
Alex
