Armin Rigo <arigo(a)tunes.org> wrote: By my definition 'text' streams are understood to be a subset of all 'binary' streams.
Oh, neat trick. Neat trick. If text does worse than binary, that proves binary is better. But if text does better than binary, text is just a special case of binary, so that proves binary is better.
I cannot comment these numbers more without any clue about how the files were actually encoded.

The binary file was encoded as native binary, and written using fwrite(). The text file was encoded as little-endian hex, with spaces between numbers.
By widening the average length of the bit strings, it was possible to push the text time above the binary time, and the text file size above the binary file size. But, and this is the big point, the cost of writing the data was STILL less than the cost of generating it in the first place.
In short, the practical point is that
 - information encoded as printable ASCII can take LESS space than the same information encoded as native binary
 - information encoded as printable ASCII need never take MUCH more space than the same information encoded as native binary
 - writing information as printable ASCII can take LESS time than writing it as native binary
 - writing information as printable ASCII need never take MUCH more time than writing it in native binary
 - whichever you do, if you do it _well_, the major cost is going to be the cost of generating the data
 - so the advantages of native binary encoding over printable ASCII are slight to nonexistent.
This must be interpreted with some care when dealing with floating point. "text" is not necessarily the same as "decimal". Fast text transmission forms for floating point are _not_ decimal.
> If you want C, you know where to find it, and you can call it from
> Prolog.

How then do I efficiently delegate to my C snippet a block of data that my Prolog program should acquire from some external source?

Using the same kind of interface you *got* it from in the first place. Better still, the block of data should never enter the Prolog world in the first place. The C world should give the Prolog world information about _where to find it_, and the Prolog world should pass that information back to the C world. That way you can accept and forward a gigabyte as fast as you can accept and forward a single byte.
> Here, of course, I'm referring to NATIVE binary formats.
> If you talk about things like ASN.1 or XDR, most of the difficulties
> I mentioned go away. But so, of course, does the claimed efficiency
> advantage of binary representations.

I am not particularly talking about 'native' binary, if you mean by that processor- or whatever-dependent format. I am not particularly talking about standards like ASN.1, either. There is plenty of room in between, where you assign a well-defined but custom meaning to a sequence of bytes (as opposed to a sequence of words or lines).

The key point is this: if there is a chunk of data that is to be *processed* in Prolog, then reading and writing it a byte or character at a time is the most convenient way to read and write it, and the cost of doing so is at worst comparable to the cost of fetching elements out of a block by any other means.
BUT if there is a chunk of data that is simply to be *routed* by Prolog, without Prolog actually looking at the contents of the chunk, then the chunk should never enter the Prolog world at all, only a descriptor.
I am talking about 'SWI strings': atom-like objects that are not guaranteed to be unique. These (not atoms) have a minimal allocation overhead which is 'amortized constant' with respect to their length (where 'amortized constant' has the precise complexity theory meaning).
Ah. You're referring to section 4.23 of the SWI Prolog manual. Did you note the bit where it said "new code should ... us[e] atoms"?
I'm having trouble interpreting "'amortized constant' with respect to their length". I understand amortised very well; what I don't understand is the "with respect to their length" bit, which cancels the "constant" bit.
From the SWI Prolog manual it is clear that the cost of creating a SWI string that is N bytes long is proportional to N. I don't see how that can be called "constant" in any sense; it's linear.
I note that the space required for a SWI string is only a constant factor less than the space required for a compound term, or even a list of characters.
But the important thing is that making a SWI string does involve allocating space on the global stack, which will typically have to be checked by the garbage collector. It cannot possibly be as fast as NOT making a Prolog data structure to hold a copy of the data and NOT copying the data. If you have a block of data in the C world, _leave_it_there_.
Here's what I'm getting at. Suppose you have a chunk source, two chunk sinks, and a chunk classifier.
    route_chunks :-
        repeat,
        (   get_chunk(Start, Length),
            (   Length > 0 ->
                classify_chunk(Start, Length, Class),
                (   Class = 1 -> put_chunk_here(Start, Length)
                ;   Class = 2 -> put_chunk_there(Start, Length)
                ;   true
                ),
                hey_source_I_have_finished_with_this_chunk(Start, Length),
                fail
            ;   !
            )
        ).
All that moves between the C world and the Prolog world is the Start and Length of the chunk.
Now, it's not just because one of the layers is Prolog that this is the right thing to do. If you are routing chunks in C, the right thing to do is LEAVE THE CHUNK WHERE IT IS, not copy it around. Some years ago someone made measurements that showed a UNIX kernel spending about 20% of its time just copying bytes from one place to another. There are alternative I/O system designs, like the "container shipping" design, a vague memory of which is at the back of my recommendations here.