Japanese character search problem?



Postby marky » Mon Dec 27, 2010 9:03 am

Hello! First, I'd like to say thank you for this great application!
I really like it.

By the way, I have a problem with searching.
Exact search works fine for me.
But when I search for some Japanese characters, CN doesn't return the right results.
(CN says "No notes" even though some notes contain the search term.)
I think it happens with notes that contain Japanese (or other double-byte?) characters.

I would really appreciate it if you could fix the problem.
Thank you in advance.
CintaNotes Developer
Site Admin
Posts: 4654
Joined: Fri Dec 12, 2008 4:45 pm

Re: Japanese character search problem?

Postby CintaNotes Developer » Tue Dec 28, 2010 7:42 pm

Hello Marky,

thanks for your report!
I think this is caused by the inability of the SQLite full-text search engine, which CN uses, to split Japanese text into words.
As a result, it treats the whole sentence as a single word.
Since full-text search only matches from the start of a word, it is easy to see why nothing gets found.
Chinese users of CN have the same problem.
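To illustrate why word-start matching fails on unspaced Japanese text, here is a minimal sketch in Python. It assumes a simplified tokenizer modeled loosely on SQLite's "simple" FTS tokenizer (a token is a run of ASCII letters/digits or of any character with code point >= 128); this is an approximation for illustration, not SQLite's actual code. The functions `tokenize`, `word_start_match` and `exact_match` are hypothetical, and `exact_match` assumes that "exact search" behaves like a plain substring search.

```python
def tokenize(text):
    """Split text into lowercase tokens (simplified model, not SQLite's code).

    ASCII letters/digits and every character with code point >= 128 count
    as token characters; anything else (spaces, ASCII punctuation) splits.
    """
    tokens, current = [], []
    for ch in text:
        if ord(ch) >= 128 or (ord(ch) < 128 and ch.isalnum()):
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def word_start_match(note, query):
    """Full-text-style search: some token must START with the query."""
    q = query.lower()
    return any(tok.startswith(q) for tok in tokenize(note))


def exact_match(note, query):
    """Assumed model of 'exact search': a plain substring test."""
    return query.lower() in note.lower()


note = "今日は東京に行きました"  # no spaces, so the whole sentence is one token
print(tokenize(note))
print(word_start_match(note, "東京"))  # the token starts with 今, not 東
print(exact_match(note, "東京"))
```

Because the sentence contains no separators, the only token is the whole sentence, so a query for a word in the middle of it never matches at a word start, while a substring search still finds it.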

Splitting Japanese and Chinese text into words is a complex problem which cannot be solved easily and requires a lot of predefined data which might bloat CN (see e.g. the ICU library for C++, which is 19MB in size).
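A rough idea of why word splitting needs predefined data: one classic technique is greedy longest-match segmentation against a dictionary. The Python sketch below is only an illustration of that technique with a tiny hypothetical dictionary (`DICTIONARY`, `segment` are made-up names); it is not how ICU works, and real segmenters rely on very large dictionaries and statistical models, which is where the size comes from.

```python
# Toy dictionary (hypothetical). A usable segmenter would need a dictionary
# with hundreds of thousands of entries, which is the "bloat" concern.
DICTIONARY = {"今日", "は", "東京", "に", "行き", "まし", "た"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there;
    if no dictionary word matches, fall back to a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words


print(segment("今日は東京に行きました"))
print(segment("大阪"))  # unknown word: falls back to single characters
```

With the words split out, "東京" becomes its own token and word-start search would find it; but the quality of the split depends entirely on how complete the dictionary is.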

The other option would be to treat each character as a separate word. But then a search could only match a set of individual characters, regardless of their order.
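The drawback of the character-per-word approach can be shown with a short Python sketch. It assumes a hypothetical scheme (`char_tokens`, `char_match` are made-up names) where every non-space character is indexed as its own token and a note matches when it contains every character of the query; a substring test is included for comparison.

```python
def char_tokens(text):
    """Hypothetical indexing scheme: each non-space character is one token."""
    return {ch for ch in text if not ch.isspace()}


def char_match(note, query):
    """A note matches if it contains all query characters, in ANY order."""
    return char_tokens(query) <= char_tokens(note)


def substring_match(note, query):
    """For comparison: the characters must appear together, in order."""
    return query in note


wanted = "東京に行きました"  # contains the word 東京 (Tokyo)
unwanted = "京都の東側"      # contains 京 and 東, but not the word 東京

print(char_match(wanted, "東京"), substring_match(wanted, "東京"))
print(char_match(unwanted, "東京"), substring_match(unwanted, "東京"))
```

The per-character index finds the right note, but it also reports the second note as a match, because the order of the characters is lost once each one is stored as a separate token.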

I've added this as a bug here: http://roadmap.cintanotes.com/feedback/12786-/
You can vote for it so it gets higher priority.
For the time being, I suggest using exact search.
