----- Original Message -----
From: "Richard A. O'Keefe" <ok(a)atlas.otago.ac.nz>
To: <ciao-users(a)clip.dia.fi.upm.es>
Sent: Thursday, December 13, 2001 4:17 PM
Subject: Re: Database and memory limitations
Manuel Carro <boris(a)aaron.ls.fi.upm.es> wrote:
I find it of interest that you are transforming XML datasets into Prolog with XSL. Specifically, your snippet caught my eye because I'm about to try out some previous work on Topic Navigation Maps with Prolog (which is new to me), basically to see what fits well and what doesn't.
I note that SWI Prolog comes with an SGML parser which supports XML, including XML namespaces. This package has particular support for RDF. I don't know whether Ciao's and SWI's licences are compatible, but it might be worth looking into. I'm told that SWI Prolog is being used to process 90MB RDF files.
I also note that Prolog is vastly more convenient for XML processing than XSLT is. Prolog "Document Value Model" data structures for representing XML are pretty much bound to be much cheaper than the "Document Object Model" data structures used by most XSLT processors, if you have a reasonably compact representation for text. (SWI Prolog uses garbage-collected atoms for this.)
TBH, I'm not convinced by SWI's approach of encoding text nodes (CDATA and PCDATA) as atoms. It is a compact representation, but often the content of the text nodes (including attribute values) has some internal structure that needs to be "micro-parsed".
I've been using Prolog to read and write XML quite a lot, and I found that the "text as atoms" approach meant having to convert between atoms and chars far too often. It might be that the XML applications I've been dealing with (XHTML, SVG and SMIL) are especially prone to this, but leaving the text nodes as chars has worked to my advantage (see http://www.binding-time.co.uk/xml.pl.shtml ). The files themselves tend to be quite small (average 10k, maximum 250k; they are intended for transmission, after all), so memory usage hasn't been a problem.
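As a rough illustration of the kind of micro-parsing I mean, here is a minimal sketch that reads an SVG-style point list such as "10,20 30,40" with a DCG. It is not code from xml.pl; the predicate names (points//1, points_from_codes/2, points_from_atom/2) are made up for this example, and I'm assuming the text is held as a list of character codes.

```prolog
% points(-Ps)//: parse "X,Y X,Y ..." into a list of p(X,Y) terms.
points([P|Ps]) --> point(P), more_points(Ps).

% Further points are separated by a single space (code 32).
more_points([P|Ps]) --> [32], !, point(P), more_points(Ps).
more_points([]) --> [].

% A point is two unsigned integers separated by a comma (code 44).
point(p(X,Y)) --> nat(X), [44], nat(Y).

% nat(-N)//: one or more decimal digits, converted to an integer.
nat(N) --> digit(D), digits(Ds), { number_codes(N, [D|Ds]) }.

digits([D|Ds]) --> digit(D), !, digits(Ds).
digits([]) --> [].

digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

% Text node already held as codes: the grammar applies directly.
points_from_codes(Codes, Ps) :-
    phrase(points(Ps), Codes).

% Text node held as an atom: it has to be unpacked first.
points_from_atom(Atom, Ps) :-
    atom_codes(Atom, Codes),
    phrase(points(Ps), Codes).
```

With atoms, every attribute value that needs this treatment pays for the atom_codes/2 step in points_from_atom/2 (and for the reverse conversion when writing the document back out), which is the overhead I kept running into.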
Perhaps RDF is different in this respect. AIUI, from http://www.xml.com/pub/a/2001/04/25/prologrdf/index.html , all of RDF's data values are URIs, so atoms should be a good representation for them, unless you're interested in the URI's structure.
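To make the distinction concrete, a small sketch using only the standard sub_atom/5: comparing whole URIs is just atom equality, whereas looking inside a URI means taking the atom apart. The predicate names uri_in_namespace/2 and uri_fragment/2 are hypothetical, chosen for this example only.

```prolog
% uri_in_namespace(+URI, +Namespace): URI (an atom) begins with Namespace.
uri_in_namespace(URI, Namespace) :-
    sub_atom(URI, 0, _, _, Namespace).

% uri_fragment(+URI, -Fragment): the text after the first '#' in URI.
% Fails if URI contains no '#'.
uri_fragment(URI, Fragment) :-
    sub_atom(URI, _, 1, After, '#'), !,
    sub_atom(URI, _, After, 0, Fragment).
```

If most of the work is matching one URI against another, none of this is needed and atoms are ideal; it is only when predicates like these start to dominate that the representation becomes a nuisance.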
Regards
John Fletcher