[Goodie] Exporting each note to own TXT file via XML2TXTDIR

Posted: Wed Jul 11, 2012 4:36 pm
by CintaNotes Developer
OK time to keep promises ;)

Here I present to you a new command-line tool called "xml2txtdir".

Usage is very simple. Say you have an XML file with notes "D:\myexport.xml" (produced by CintaNotes XML export), and want to turn it into a bunch of txt files in the "D:\txt" directory.
You go to the folder where this tool is located and issue the following command:
xml2txtdir D:\myexport.xml D:\txt


By default the TXT files will have UTF-8 encoding, but you can specify a different encoding with the "-e" parameter, as follows:
xml2txtdir D:\myexport.xml D:\txt -e utf-16

You can also use "ascii", "windows-1251" etc. encodings, but any non-representable characters will be turned into "?", so I recommend using either utf-8 or utf-16.
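To see what that replacement looks like in practice, here is a small sketch using the same `encode(..., errors="replace")` call that the script makes. The helper name `roundtrip` and the sample strings are made up for illustration only.

```python
def roundtrip(text, encoding):
    """Encode like the script does, then decode again to show what survives."""
    data = text.encode(encoding, errors="replace")  # unrepresentable chars become b"?"
    return data.decode(encoding)


if __name__ == "__main__":
    sample = "Заметка: café"  # hypothetical note text with Cyrillic and an accented letter
    for enc in ("utf-8", "utf-16", "ascii", "windows-1251"):
        print("%-13s %s" % (enc, roundtrip(sample, enc)))
```

With utf-8 and utf-16 the text comes back unchanged, while the narrower encodings lose whatever they cannot represent.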

Known limitations: text formatting tags like <b> and <i> are not removed (though sometimes that can be an advantage).
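If you would rather have the tags stripped, something along these lines could be applied to the note text before it is written out. This is only a sketch: the function name `strip_format_tags` is hypothetical, and the default tag list covers just the two tags mentioned above, so extend it if your notes contain others.

```python
import re


def strip_format_tags(text, tags=("b", "i")):
    """Remove simple opening/closing formatting tags such as <b> and </b>.

    Only attribute-less tags named in `tags` are removed; everything else,
    including the text between the tags, is left untouched.
    """
    pattern = r"</?(?:%s)>" % "|".join(re.escape(t) for t in tags)
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(strip_format_tags("Some <b>bold</b> and <i>italic</i> text"))
```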

And for completeness/tweaking, the source code of the Python script:

import argparse as ap
import os
import xml.dom.minidom as xml

VERSION  = "1.0"
GREETING = "CintaNotes TXT folder exporter V%s.\n" % VERSION

def main():
   print(GREETING)
   argsParser = createArgsParser()
   args = argsParser.parse_args()
   print('Processing..')

   count = xmlToTxtFiles(args.inputXML, args.outputFolder, args.encoding)
   print('\n-> Written %d file(s).' % count)


def createArgsParser():
   parser = ap.ArgumentParser(description = "Converts CintaNotes XML file into a set of TXT files, one TXT for each note.")
   parser.add_argument("inputXML", help = 'Source XML file', type = str)
   parser.add_argument("outputFolder", help = 'Folder to write TXT files to', type = str)
   parser.add_argument("-e", "--encoding", dest="encoding",
                        help = 'Encoding of TXT files: utf-8 (default) or utf-16', type = str, default = 'utf-8')
   return parser


def xmlToTxtFiles(inputXML, outputFolder, encoding):
   doc = xml.parse(inputXML)
   notes = doc.getElementsByTagName('note')
   count = 0
   for note in notes:
      xmlToTxtFile(note, outputFolder, encoding)
      count += 1
   return count


def xmlToTxtFile(note, outputFolder, encoding):
   filename = genFileName(note)
   contents = genFileContents(note)
   with open(os.path.join(outputFolder, filename), "wb") as f:
      f.write(contents.encode(encoding, errors="replace"))


def genFileContents(note):
   title = note.attributes['title'].value
   text = note.firstChild.data if note.firstChild else ''
   return title + '\n\n' + text


def genFileName(note):
   created = note.attributes['created'].value
   tags = makeValidFilePath(note.attributes['tags'].value)
   title = makeValidFilePath(note.attributes['title'].value[:50])
   return '%s [%s] %s.txt' % (created, tags, title)

def makeValidFilePath(s):
   s = s.replace('/', '\u2044')
   return ''.join(x for x in s if x.isalnum() or x in ' -.{}#@$%^&!_()[]\u2044')

if __name__ == '__main__': main()

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Tue Mar 05, 2013 7:42 am
by Noddy330
Can we have this back please?
I get - The selected attachment does not exist anymore.
Thanks. Nod

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Tue Mar 05, 2013 11:26 am
by CintaNotes Developer
Thanks, Nod! Strange, where did it go?
Well here they are once again:

XML2TXTDIR
TXTDIR2XML

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Tue Mar 05, 2013 12:40 pm
by Noddy330
CintaNotes Developer wrote:Thanks, Nod! Strange, where did it go?
Well here they are once again:

XML2TXTDIR
TXTDIR2XML


Thanks, Nod.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Sat Jun 07, 2014 3:33 pm
by jimspoon
I tried running xml2txtdir but I always get the message "cannot load library msvcr90.dll". Tried reinstalling the runtime, no luck. any help?

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Sat Jun 07, 2014 3:37 pm
by CintaNotes Developer
Hi Jim,
please try the following:

In the same folder as xml2txtdir.exe, create a file named "xml2txtdir.exe.manifest" and copy the following into it:

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b" ></assemblyIdentity>
    </dependentAssembly>
  </dependency>
</assembly>


Save it, then try running the tool again.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Sat Jun 07, 2014 4:15 pm
by jimspoon
tried it, same result ... cannot load library msvcr90.dll. :|

i appreciate your help though!

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Sat Jun 07, 2014 4:21 pm
by CintaNotes Developer
Probably it'll be easier then to install Python 3 and run the Python script instead of the exe, like this:

python xml2txtdir.py myfile.xml myfolder

The source code of xml2txtdir.py appears in this thread above.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Sat Jun 07, 2014 4:38 pm
by jimspoon
Success! thanks very much. you wouldn't have a similar tool, but to put all the exported notes into a single CSV file?

I tried opening up an exported xml file directly in Excel, but didn't figure out how to get Excel to display the contents of long notes in a word-wrapped cell. Even though i checked the "word wrap" box in Excel cell formatting.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jun 09, 2014 8:50 am
by CintaNotes Developer
jimspoon wrote:Success! thanks very much. you wouldn't have a similar tool, but to put all the exported notes into a single CSV file?

Great! About CSV - I think CSV is a really bad choice for this, as notes can have commas and tabs within text.
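That said, a CSV writer that quotes its fields can cope with embedded commas and line breaks, so for anyone who wants to experiment, here is a rough sketch built on Python's standard `csv` module. It is not part of xml2txtdir; the function name `notes_to_csv` and the column order are made up, and it assumes the same `created`, `tags` and `title` attributes that the script above reads.

```python
import csv
import io
import xml.dom.minidom as minidom


def notes_to_csv(xml_text, out):
    """Write one CSV row per <note> element to the file-like object `out`.

    csv.writer quotes any field containing commas, quotes or line breaks,
    so multi-line note text stays inside a single cell.
    """
    doc = minidom.parseString(xml_text)
    writer = csv.writer(out)
    writer.writerow(["created", "tags", "title", "text"])
    count = 0
    for note in doc.getElementsByTagName("note"):
        # Join all text and CDATA children; missing attributes become "".
        text = "".join(
            child.data for child in note.childNodes
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
        )
        writer.writerow([
            note.getAttribute("created"),
            note.getAttribute("tags"),
            note.getAttribute("title"),
            text,
        ])
        count += 1
    return count


if __name__ == "__main__":
    demo = '<notebook><note title="A, B" tags="x" created="c"><![CDATA[one,\ntwo]]></note></notebook>'
    buf = io.StringIO()
    notes_to_csv(demo, buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, otherwise extra blank lines can appear on Windows.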

jimspoon wrote:I tried opening up an exported xml file directly in Excel, but didn't figure out how to get Excel to display the contents of long notes in a word-wrapped cell. Even though i checked the "word wrap" box in Excel cell formatting.

What version of Excel do you have? My Excel 2010 wraps the text.
It could also help to manually delete all the "extra" stuff from XML first, leaving only the following structure:
<notebook>
<note></note>
<note></note>
....
</notebook>

XML2TXTDIR Error

Posted: Wed Jun 24, 2015 8:50 pm
by iseemto
Hello!

I cannot cope with the problem I am having with the XML2TXTDIR script from here.

I used both the EXE and then the Python variants but they both result in error messages (which I attach).

Can someone give me a clue what the cause of the problem is?..

Thank you in advance.

PS. I am currently working with Windows XP SP2, CintaNotes 2.9 Pro.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jun 29, 2015 7:23 am
by CintaNotes Developer
Hi iseemto,

According to the error message there is a problematic character in the XML.
Could you please post the full line 12 of the XML file here?

Thanks

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jun 29, 2015 5:58 pm
by iseemto
I guess I have found the cause: it is the MS Word LineBreak symbol.

(!!!) And there should be a certain minimum number of exported fields for the script to work: Title, Text, CreationDate, Tag.

Please, see attached test xml file and screenshots.
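On the minimum-fields point: the script reads `note.attributes['title']`, `['tags']` and `['created']` directly, so an export without one of them would presumably stop with an error. A more tolerant lookup could be sketched like this; the helper name `safe_attr` and the fallback values are hypothetical and not part of the posted script.

```python
import xml.dom.minidom as minidom


def safe_attr(note, name, default=""):
    """Return the attribute value, or `default` when the export omitted it."""
    return note.getAttribute(name) if note.hasAttribute(name) else default


if __name__ == "__main__":
    # A note exported with only a title; tags and created are missing.
    doc = minidom.parseString('<notebook><note title="Only a title">text</note></notebook>')
    note = doc.getElementsByTagName("note")[0]
    print(safe_attr(note, "title"))
    print(safe_attr(note, "tags", "untagged"))
    print(safe_attr(note, "created", "undated"))
```

Swapping the three `note.attributes[...]` lookups for calls like these should let files be generated from a reduced export, at the cost of less informative file names.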

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Wed Jul 01, 2015 11:20 am
by CintaNotes Developer
Thanks for the info!
Did you manage to process the file after removing this symbol?

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Thu Jul 02, 2015 6:24 pm
by iseemto
Yes, with the symbol removed it works.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jul 06, 2015 9:33 am
by CintaNotes Developer
Good. I think this must be a bug in Python's XML library. This symbol was embedded into a CDATA section after all, wasn't it?
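One possible explanation, offered as an assumption since the attached file is not reproduced in this thread: Word's manual line break is usually copied as a vertical tab (U+000B), and XML 1.0 does not allow that character anywhere in a document, CDATA sections included, so a conforming parser will reject the file. A workaround sketch that removes such characters before parsing follows; the function names are hypothetical and the file is assumed to be UTF-8.

```python
import re
import xml.dom.minidom as minidom

# Control characters that XML 1.0 forbids (everything below U+0020
# except tab, line feed and carriage return).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text):
    """Drop characters that would make an XML 1.0 document ill-formed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


def parse_notes_leniently(path, encoding="utf-8-sig"):
    """Read the export as text, clean it, then hand it to minidom.

    The encoding is an assumption; change it if your export differs.
    """
    with open(path, encoding=encoding) as f:
        cleaned = strip_illegal_xml_chars(f.read())
    return minidom.parseString(cleaned)
```

In the script above, `xml.parse(inputXML)` could be replaced by a call like `parse_notes_leniently(inputXML)`. Substituting `"\n"` instead of the empty string would keep the line break rather than dropping it.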

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jul 13, 2015 7:23 am
by iseemto
Mmm... If I understand the question correctly... This symbol in CintaNotes is the result of clipping from MS Word and, especially, SuperMemo. And there are a lot of notes with this symbol, I think. ((

This bug actually made me write an MS VBA macro that splits the CintaNotes exported TXT file into many separate ones. It's completely amateur, but it works for me.

These two programs, CintaNotes and SuperMemo, complement each other: CN is a good clipping utility with a flexible tagging system, and SM is spaced repetition software that does not clip or tag.

SM imports individual files (which CN doesn't provide). That's why this issue is important for me.

Anyway, thank you for your work and attention.

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Mon Jul 13, 2015 9:47 am
by CintaNotes Developer
Thanks for the info. We'll definitely consider adding export to individual files in one of the next CN versions.
Please vote here to increase the priority of this feature:
http://roadmap.cintanotes.com/topic/342 ... ual-files/

Thanks!

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Tue Jul 14, 2015 12:28 pm
by usbpoweredfridge
Would CN still capture the Word linebreak character if clipping.format.rtf.enabled is set to zero? Though if you are dependent on rich text capture, then this would not be an option.

Chris

Re: [Goodie] Exporting each note to own TXT file via XML2TXT

Posted: Tue Jul 14, 2015 1:53 pm
by CintaNotes Developer
usbpoweredfridge wrote:Would CN still capture the Word linebreak character if clipping.format.rtf.enabled is set to zero? Though if you are dependent on rich text capture, then this would not be an option.

Chris

I think it would, since this character seems to have its own ASCII code.