[Goodie] Exporting each note to own TXT file via XML2TXTDIR

CintaNotes Developer
Site Admin
Posts: 4717
Joined: Fri Dec 12, 2008 4:45 pm
Contact:

Postby CintaNotes Developer » Wed Jul 11, 2012 4:36 pm

OK, time to keep promises ;)

Here I present to you a new command-line tool called "xml2txtdir".

Usage is very simple. Say you have an XML file with notes "D:\myexport.xml" (produced by CintaNotes XML export), and want to turn it into a bunch of txt files in the "D:\txt" directory.
You go to the folder where this tool is located and issue the following command:
xml2txtdir D:\myexport.xml D:\txt


By default the TXT files will have UTF-8 encoding, but you can specify a different encoding with the "-e" parameter, as follows:
xml2txtdir D:\myexport.xml D:\txt -e utf-16

You can also use "ascii", "windows-1251" etc. encodings, but any non-representable characters will be turned into "?", so I recommend using either utf-8 or utf-16.
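The "?" substitution comes from the errors="replace" mode that the script passes to str.encode (see the source further down). A quick illustration, with a made-up sample string that mixes ASCII and Cyrillic:

```python
# Sample text is made up for illustration.
sample = "Note: Привет"

# ASCII cannot represent Cyrillic, so each such character becomes "?".
as_ascii = sample.encode("ascii", errors="replace")

# windows-1251 covers Cyrillic; UTF-8 covers every character.
as_cp1251 = sample.encode("windows-1251", errors="replace")
as_utf8 = sample.encode("utf-8", errors="replace")

print(as_ascii)                                     # b'Note: ??????'
print(as_cp1251.decode("windows-1251") == sample)   # True
print(as_utf8.decode("utf-8") == sample)            # True
```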

Known limitations: text formatting tags like <b> and <i> are not removed (though sometimes this can be an advantage).

And for completeness/tweaking, the source code of the Python script:

import argparse as ap
import os
import xml.dom.minidom as xml

VERSION  = "1.0"
GREETING = "CintaNotes TXT folder exporter V%s.\n" % VERSION

def main():
   print(GREETING)
   argsParser = createArgsParser()
   args = argsParser.parse_args()
   print('Processing..')

   count = xmlToTxtFiles(args.inputXML, args.outputFolder, args.encoding)
   print('\n-> Written %d file(s).' % count)


def createArgsParser():
   parser = ap.ArgumentParser(description = "Converts CintaNotes XML file into a set of TXT files, one TXT for each note.")
   parser.add_argument("inputXML", help = 'Source XML file', type = str)
   parser.add_argument("outputFolder", help = 'Folder to write TXT files to', type = str)
   parser.add_argument("-e", "--encoding", dest="encoding",
                        help = 'Encoding of TXT files: utf-8 (default) or utf-16', type = str, default = 'utf-8')
   return parser


def xmlToTxtFiles(inputXML, outputFolder, encoding):
   doc = xml.parse(inputXML)
   notes = doc.getElementsByTagName('note')
   count = 0
   for note in notes:
      xmlToTxtFile(note, outputFolder, encoding)
      count += 1
   return count


def xmlToTxtFile(note, outputFolder, encoding):
   filename = genFileName(note)
   contents = genFileContents(note)
   # Write bytes explicitly; characters the encoding cannot represent become "?"
   with open(os.path.join(outputFolder, filename), "wb") as f:
      f.write(contents.encode(encoding, errors="replace"))


def genFileContents(note):
   title = note.attributes['title'].value
   text = note.firstChild.data if note.firstChild else ''
   return title + '\n\n' + text


def genFileName(note):
   created = note.attributes['created'].value
   tags = makeValidFilePath(note.attributes['tags'].value)
   title = makeValidFilePath(note.attributes['title'].value[:50])
   return '%s [%s] %s.txt' % (created, tags, title)

def makeValidFilePath(s):
   s = s.replace('/', '\u2044')
   return ''.join(x for x in s if x.isalnum() or x in ' -.{}#@$%^&!_()[]\u2044')

if __name__ == '__main__': main()
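
For anyone who wants the formatting tags removed (the known limitation mentioned above), one possible tweak is sketched here. It assumes only simple tags without attributes; the post names just <b> and <i>, so the rest of the tag list is a guess, and stripFormattingTags is a hypothetical helper that is not part of the original script.

```python
import re

# Assumed tag set: the post only names <b> and <i>; <u> and <s> are guesses.
FORMAT_TAGS = ("b", "i", "u", "s")

# Matches an opening or closing tag from FORMAT_TAGS, e.g. <b> or </i>.
_TAG_RE = re.compile(r"</?(?:%s)>" % "|".join(FORMAT_TAGS), re.IGNORECASE)

def stripFormattingTags(text):
   """Remove simple formatting tags, leaving the enclosed text intact."""
   return _TAG_RE.sub("", text)

print(stripFormattingTags("A <b>bold</b> and <i>italic</i> word"))
# A bold and italic word
```

To use it, genFileContents could pass the note text through stripFormattingTags before returning it.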
Attachments
xml2txtdir.zip
(2.52 MiB) Downloaded 3541 times
Alex
Noddy330
Posts: 354
Joined: Thu Jan 22, 2009 11:05 pm
Contact:

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Postby Noddy330 » Tue Mar 05, 2013 7:42 am

Can we have this back please?
I get - The selected attachment does not exist anymore.
Thanks. Nod

Postby CintaNotes Developer » Tue Mar 05, 2013 11:26 am

Thanks, Nod! Strange, where did it go?
Well here they are once again:

XML2TXTDIR
TXTDIR2XML
Alex

Postby Noddy330 » Tue Mar 05, 2013 12:40 pm

CintaNotes Developer wrote:Thanks, Nod! Strange, where did it go?
Well here they are once again:

XML2TXTDIR
TXTDIR2XML


Thanks, Nod.
jimspoon
Posts: 3
Joined: Sat Jun 07, 2014 3:18 pm
Contact:

Postby jimspoon » Sat Jun 07, 2014 3:33 pm

I tried running xml2txtdir but I always get the message "cannot load library msvcr90.dll". Tried reinstalling the runtime, no luck. any help?

Postby CintaNotes Developer » Sat Jun 07, 2014 3:37 pm

Hi Jim,
please try the following:

In the same folder as xml2txtdir.exe, create a file named "xml2txtdir.exe.manifest" and copy the following into it:

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b" ></assemblyIdentity>
    </dependentAssembly>
  </dependency>
</assembly>


Save and after that try running the tool.
Alex

Postby jimspoon » Sat Jun 07, 2014 4:15 pm

tried it, same result ... cannot load library msvcr90.dll. :|

i appreciate your help though!

Postby CintaNotes Developer » Sat Jun 07, 2014 4:21 pm

Probably it'll be easier then to install Python 3 and run the Python script instead of the exe, like this:

python xml2txtdir.py myfile.xml myfolder

The source code of xml2txtdir.py appears in this thread above.
Alex

Postby jimspoon » Sat Jun 07, 2014 4:38 pm

Success! thanks very much. you wouldn't have a similar tool, but to put all the exported notes into a single CSV file?

I tried opening up an exported xml file directly in Excel, but didn't figure out how to get Excel to display the contents of long notes in a word-wrapped cell. Even though i checked the "word wrap" box in Excel cell formatting.

Postby CintaNotes Developer » Mon Jun 09, 2014 8:50 am

jimspoon wrote:Success! thanks very much. you wouldn't have a similar tool, but to put all the exported notes into a single CSV file?

Great! About CSV - I think CSV is a really bad choice for this, as notes can have commas and tabs within text.

jimspoon wrote:I tried opening up an exported xml file directly in Excel, but didn't figure out how to get Excel to display the contents of long notes in a word-wrapped cell. Even though i checked the "word wrap" box in Excel cell formatting.

What version of Excel do you have? My Excel 2010 wraps the text.
It could also help to manually delete all the "extra" stuff from XML first, leaving only the following structure:
<notebook>
<note></note>
<note></note>
....
</notebook>
Alex
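
For anyone who still wants CSV: commas, tabs and line breaks inside note text can be handled as long as the affected fields are quoted, which Python's csv module does by default. A rough sketch, assuming the same title, tags and created attributes that the xml2txtdir script reads; xmlToCsv, noteText and the column order are made up here, and how well Excel displays the multi-line cells has not been verified.

```python
import csv
import xml.dom.minidom as xml

def noteText(note):
   # Concatenate all text and CDATA children of the <note> element.
   return ''.join(n.data for n in note.childNodes
                  if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE))

def xmlToCsv(inputXML, outputCsv):
   doc = xml.parse(inputXML)
   count = 0
   # newline='' lets the csv module control line endings itself;
   # utf-8-sig writes a BOM, which Excel generally uses to detect UTF-8.
   with open(outputCsv, 'w', newline='', encoding='utf-8-sig') as f:
      writer = csv.writer(f)  # default QUOTE_MINIMAL quotes fields as needed
      writer.writerow(['created', 'tags', 'title', 'text'])
      for note in doc.getElementsByTagName('note'):
         # getAttribute returns '' when the attribute is missing
         writer.writerow([note.getAttribute('created'),
                          note.getAttribute('tags'),
                          note.getAttribute('title'),
                          noteText(note)])
         count += 1
   return count
```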
iseemto
Posts: 5
Joined: Wed Jun 24, 2015 7:46 pm
Contact:

XML2TXTDIR Error

Postby iseemto » Wed Jun 24, 2015 8:50 pm

Hello!

I cannot cope with the problem I am having with the XML2TXTDIR script from here.

I used both the EXE and then the Python variants but they both result in error messages (which I attach).

Can someone give me a clue what the cause of the problem is?..

Thank you in advance.

PS. I am currently working with Windows XP SP2, CintaNotes 2.9 Pro.
Attachments
python_error message.jpg (61.32 KiB): Python error message
exe_error message.jpg (55.46 KiB): Exe error message

Postby CintaNotes Developer » Mon Jun 29, 2015 7:23 am

Hi iseemto,

According to the error message there is a problematic character in the XML.
Could you please post full line 12 of the XML file here?

Thanks
Alex

Postby iseemto » Mon Jun 29, 2015 5:58 pm

I guess I have found the cause, it is the MS Word LineBreak symbol.

(!!!) And there should be a certain minimum number of exported fields for the script to work: Title, Text, CreationDate, Tag.

Please, see attached test xml file and screenshots.
Attachments
export.zip
(117.13 KiB) Downloaded 780 times
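
The required-fields limitation follows from the script reading note.attributes['title'], ['tags'] and ['created'] directly, which raises a KeyError when an attribute was not exported. A possible tweak, sketched under the assumption that an empty string is an acceptable fallback, is to use minidom's getAttribute, which returns an empty string for a missing attribute. genFileNameTolerant is a hypothetical replacement for genFileName, and the "untitled" placeholder is likewise made up.

```python
import xml.dom.minidom as xml

def makeValidFilePath(s):
   # Same filtering as in the original script.
   s = s.replace('/', '\u2044')
   return ''.join(x for x in s if x.isalnum() or x in ' -.{}#@$%^&!_()[]\u2044')

def genFileNameTolerant(note):
   # getAttribute returns '' instead of raising when an attribute is absent.
   created = note.getAttribute('created')
   tags = makeValidFilePath(note.getAttribute('tags'))
   title = makeValidFilePath(note.getAttribute('title')[:50]) or 'untitled'
   return '%s [%s] %s.txt' % (created, tags, title)

# A note exported with only a title still gets a usable file name.
doc = xml.parseString('<notebook><note title="Shopping list"/></notebook>')
print(genFileNameTolerant(doc.getElementsByTagName('note')[0]))
```

genFileContents reads the title the same way, so it would need the same getAttribute change.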

Postby CintaNotes Developer » Wed Jul 01, 2015 11:20 am

Thanks for the info!
Did you manage to process the file after removing this symbol?
Alex

Postby iseemto » Thu Jul 02, 2015 6:24 pm

Yes, with the symbol removed it works.
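
The thread does not identify the exact character, but Word's manual line break typically arrives as a vertical tab (U+000B), a control character that XML 1.0 does not allow anywhere in a document, CDATA sections included. A workaround sketch under that assumption: replace the disallowed control characters in the export before running xml2txtdir on it. cleanXmlText, cleanXmlFile and the default encoding are hypothetical choices, not part of the tool.

```python
import re

# Control characters that XML 1.0 forbids: everything below U+0020
# except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
_ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def cleanXmlText(text, replacement='\n'):
   """Replace characters that XML 1.0 parsers reject.

   A newline is used by default because the offending character is
   presumed to be a line break; pass '' to drop it instead.
   """
   return _ILLEGAL_XML_CHARS.sub(replacement, text)

def cleanXmlFile(src, dst, encoding='utf-8'):
   # The encoding is an assumption; it must match the XML declaration.
   # newline='' keeps the file's original line endings untouched.
   with open(src, 'r', encoding=encoding, newline='') as f:
      text = f.read()
   with open(dst, 'w', encoding=encoding, newline='') as f:
      f.write(cleanXmlText(text))
```

For example, cleanXmlFile('myexport.xml', 'myexport_clean.xml') would write a cleaned copy that can then be passed to xml2txtdir.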

Postby CintaNotes Developer » Mon Jul 06, 2015 9:33 am

Good. I think this must be a bug in Python's XML library. This symbol was embedded in a CDATA section after all, wasn't it?
Alex

Postby iseemto » Mon Jul 13, 2015 7:23 am

Mmm... If I understand the question correctly... This symbol in CintaNotes is the result of clipping from MS Word and, especially, SuperMemo. And there are a lot of notes with this symbol, I think. ((

This bug actually made me write an MS VBA macro that splits the CintaNotes exported TXT file into many separate ones. It's completely amateur, but it works for me.

These two programs, CintaNotes and SuperMemo, complement each other: CN is a good clipping utility with a flexible tagging system, and SM is spaced repetition software that does not clip or tag.

SM imports individual files (which CN doesn't provide). That's why this issue is important for me.

Anyway, thank you for your work and attention.

Postby CintaNotes Developer » Mon Jul 13, 2015 9:47 am

Thanks for the info. We'll definitely consider adding exporting to individual files into one of the next CN versions.
Please vote here to increase the priority of this feature:
http://roadmap.cintanotes.com/topic/342 ... ual-files/

Thanks!
Alex
usbpoweredfridge
Posts: 407
Joined: Fri Jan 17, 2014 11:08 pm
Contact:

Postby usbpoweredfridge » Tue Jul 14, 2015 12:28 pm

Would CN still capture the Word linebreak character if clipping.format.rtf.enabled is set to zero? Though if you are dependent on rich text capture, then this would not be an option.

Chris

Postby CintaNotes Developer » Tue Jul 14, 2015 1:53 pm

usbpoweredfridge wrote:Would CN still capture the Word linebreak character if clipping.format.rtf.enabled is set to zero? Though if you are dependent on rich text capture, then this would not be an option.

Chris

I think it would, since this character seems to have its own ASCII code.
Alex
