Richard,
Thanks for your tips, and thanks to everyone on this list. I am very excited about the resonance I get from the Prolog community; the GCC compiler community proper is not that interested in this project, or in any project like it. It seems that many people here have sympathy for the idea of extracting metadata from C/C++ programs.
To answer your questions:
Is there any redundant information?
Yes, there is much redundant information. For example, I get one output file per input C file from the compiler, plus one file per function compiled in each module, emitted again each time it is declared in a C file (inline declarations appear all over the place).
Could the information be put into a CDB file and Ciao's memory be used as a cache?
I was hoping that would work.
The set of all global information for a C program (types and functions) is not that large, and it should compress down well.
The files I have are around 10-20 MB per source file for the translations of the GCC sources themselves.
My original memory limitations were with GNU Prolog; I must admit I have not tried Ciao yet :(.
Is there information which is seldom needed, so it could be loaded on demand?
The bodies of the functions can be loaded on demand, and the usage information for data types is not always needed.
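As a rough illustration of on-demand loading, here is a minimal Prolog sketch. Note that body_file/2, function_body/2, and the one-file-per-function layout are hypothetical names assumed for the example, not the introspector's actual schema:

```prolog
% Sketch: lazy loading of function bodies.
% body_file/2 (mapping a function to its dump file) and the
% function_body/2 facts are hypothetical names for this example.
:- dynamic body_loaded/1, function_body/2.

% function_body_lazy(+Fn, -Body): consult the per-function file only
% the first time that function's body is requested; afterwards the
% facts stay resident and act as a cache.
function_body_lazy(Fn, Body) :-
    (   body_loaded(Fn)
    ->  true
    ;   body_file(Fn, File),
        consult(File),              % loads function_body/2 facts for Fn
        assertz(body_loaded(Fn))
    ),
    function_body(Fn, Body).
```

The same pattern would let a CDB file play the role of body_file/2's backing store, with Prolog's fact base as the in-memory cache.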
I have switched the processing over to Perl for a while, but I really did like working with Prolog, not least because of the ability to query the data.
This weekend I will send out an update on the project, with all the newer source code and example XML files, to the project page at http://sourceforge.net/projects/introspector/
I will be working from my mdupont777(a)yahoo.com account this weekend.

Mike
-----Original Message-----
From: Richard A. O'Keefe [mailto:ok(a)atlas.otago.ac.nz]
Sent: Thursday, 13 December 2001 17:18
To: ciao-users(a)clip.dia.fi.upm.es
Subject: Re: Database and memory limitations
Manuel Carro <boris(a)aaron.ls.fi.upm.es> wrote:

	I find it of interest that you are transforming XML datasets into Prolog with XSL... Specifically, the reason your snippet caught my eye is that I'm about to try out some previous work with Topic Navigation Maps with Prolog (which is new to me), basically to see what fits well and what doesn't.

I note that SWI Prolog comes with an SGML parser which supports XML, including XML namespaces. This package has particular support for RDF. I don't know whether Ciao's and SWI's licences are compatible, but it might be worth looking into. I'm told that SWI Prolog is being used to process 90 MB RDF files.
I also note that Prolog is vastly more convenient for XML processing than XSLT is. Prolog "Document Value Model" data structures for representing XML are pretty much bound to be much cheaper than the "Document Object Model" data structures used by most XSLT processors, if you have a reasonably compact representation for text. (SWI Prolog uses garbage-collected atoms for this.)
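To make the comparison concrete, here is a small sketch of the kind of term such a "Document Value Model" uses (following the element(Name, Attributes, Content) shape that SWI-Prolog's sgml library documents; the fragment and the arg_type/2 predicate are invented for illustration):

```prolog
% Sketch: a Prolog term for a small XML fragment such as
%   <function name="main"><arg type="int"/></function>
% Elements become element(Name, Attributes, Content) terms and text
% becomes atoms, so repeated strings can share one atom.

example_doc(element(function,
                    [name=main],
                    [element(arg, [type=int], [])])).

% Querying the tree is ordinary Prolog unification: find the type
% of each <arg> child of a <function> element.
arg_type(element(function, _, Children), Type) :-
    member(element(arg, Attrs, _), Children),
    member(type=Type, Attrs).
```

Because the terms are plain immutable data, there is no per-node object overhead as in a DOM, and backtracking search over the tree comes for free.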
My own experience is that having Prolog, Scheme, and Haskell available it'll take a gun pointed at my head or an extremely large bribe to make me use XSLT for anything.
I suspect that the fundamental problem is with the representation that is being generated as the output of the XSLT processing step.
Is there any redundant information? Is there information which is seldom needed, so it could be loaded on demand? Could the information be put into a CDB file and Ciao's memory be used as a cache?