Re: "Printing" a term to a string

20 May 2003


      Armin Rigo <arigo(a)tunes.org> wrote:
    By my definition 'text' streams are understood to be a subset of all 'binary'
    streams.

Oh, neat trick.  Neat trick.  If text does worse than binary,
that proves binary is better.  But if text does better than binary,
text is just a special case of binary, so that proves binary is better.

    I cannot comment these numbers more without any clue about how the
    files were actually encoded.
    
The binary file was encoded as native binary, and written using fwrite().
The text file was encoded as little-endian hex, with spaces between
numbers.
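
As a rough sketch of the text encoding described above: the post does not
spell out what "little-endian hex" means, so this assumes it means hex
digits emitted least significant digit first, which lets small values
drop their leading zeros.  The names put_le_hex and encode_le_hex are
hypothetical, and uint64_t stands in for the variable-length bit strings
of the original measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write v as hex digits, least significant digit
 * first, followed by one space.  Returns the number of bytes written.
 * out must have room for at least 17 bytes (16 digits plus the space). */
static size_t put_le_hex(uint64_t v, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = 0;

    assert(out != NULL);
    do {
        out[n++] = digits[v & 0xF];   /* low nibble comes out first */
        v >>= 4;
    } while (v != 0);
    out[n++] = ' ';                   /* separator between numbers */
    return n;
}

/* Encode count values into buf (capacity cap); returns bytes used. */
static size_t encode_le_hex(const uint64_t *vals, size_t count,
                            char *buf, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        assert(cap - used >= 17);     /* worst case for one value */
        used += put_le_hex(vals[i], buf + used);
    }
    return used;
}
```

Under that assumption the three values 5, 0x1f and 0 encode as the
7 bytes "5 f1 0 ", where fwrite() of the same three uint64_t values
would write 24 bytes.  A value needing all 16 digits costs 17 bytes
against 8, so the text form is at worst a little over twice the size.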

By widening the average length of the bit strings, it was possible to
push the text time above the binary time, and the text file size above
the binary file size.  But, and this is the big point, the cost of
writing the data was STILL less than the cost of generating it in the
first place.

In short, the practical point is that
 - information encoded as printable ASCII can take LESS space than the
   same information encoded as native binary
 - information encoded as printable ASCII need never take MUCH more space
   than the same information encoded as native binary
 - writing information as printable ASCII can take LESS time than writing it as
   native binary
 - writing information as printable ASCII need never take MUCH more time
   than writing it in native binary
 - whichever you do, if you do it _well_, the major cost is going to be
   the cost of generating the data
 - so the advantages of native binary encoding over printable ASCII are
   slight to nonexistent.

This must be interpreted with some care when dealing with floating point.
"text" is not necessarily the same as "decimal".  Fast text transmission
forms for floating point are _not_ decimal.
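
One non-decimal text form, offered only as an illustration since the
post does not say which forms it has in mind, is the C99 hexadecimal
floating-point notation produced by the "%a" conversion and accepted
back by strtod().  The wrapper names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format x in C99 hexadecimal floating-point notation.
 * Returns the number of characters written, excluding the NUL. */
static int put_hex_double(double x, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%a", x);

    assert(n > 0 && (size_t)n < cap);   /* output must fit in out */
    return n;
}

/* Parse text produced by put_hex_double back into a double.
 * C99 strtod() accepts the hexadecimal form as well as decimal. */
static double get_hex_double(const char *text)
{
    return strtod(text, NULL);
}
```

Because each hex digit maps directly onto four bits of the significand,
neither direction needs decimal conversion, and a finite double written
this way reads back as exactly the same value.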

    >     If you want C, you know where to find it, and you can call it from
    >     Prolog.
    
    How then do I efficiently delegate to my C snippet a block of
    data that my Prolog program should acquire from some external
    source ?
    
Using the same kind of interface you *got* it from in the first place.
Better still, the block of data should never enter the Prolog world in
the first place.  The C world should give the Prolog world information
about _where to find it_, and the Prolog world should pass that information
back to the C world.  That way you can accept and forward a gigabyte as
fast as you can accept and forward a single byte.

    >     Here, of course, I'm referring to NATIVE binary formats.
    >     If you talk about things like ASN.1 or XDR, most of the difficulties
    >     I mentioned go away.  But so, of course, does the claimed efficiency
    >     advantage of binary representations.
    
    I am not particularly talking about 'native' binary, if you mean
    by that processor- or whatever-dependent format.  I am not
    particularly talking about standards like ASN.1, either.  There
    is plenty of room in between, where you assign a well-defined but
    custom meaning to a sequence of bytes (as opposed to a sequence
    of words or lines).
    
The key point is this:
    if there is a chunk of data that is to be *processed* in Prolog,
    then reading and writing it a byte or character at a time is
    the most convenient way to read and write it, and the cost
    of doing so is at worst comparable to the cost of fetching
    elements out of a block by any other means
BUT if there is a chunk of data that is simply to be *routed* by
    Prolog, without Prolog actually looking at the contents of the
    chunk, then the chunk should never enter the Prolog world at all,
    only a descriptor.

    I am talking about 'SWI strings': atom-like objects that are not
    guaranteed to be unique.  These (not atoms) have a minimal
    allocation overhead which is 'amortized constant' with respect
    to their length (where 'amortized constant' has the precise
    complexity theory meaning).

Ah.  You're referring to section 4.23 of the SWI Prolog manual.
Did you note the bit where it said "new code should ... us[e] atoms"?
I'm having trouble interpreting "'amortized constant' with respect
to their length".  I understand amortised very well; what I don't
understand is the "with respect to their length" bit, which cancels
the "constant" bit.
...
From the SWI Prolog manual it is clear that the cost of creating a
SWI string that is N bytes long is proportional to N.  I don't see
how that can be called "constant" in any sense; it's linear.

I note that the space required for a SWI string is only a constant
factor less than the space required for a compound term, or even a
list of characters.

But the important thing is that making a SWI string does involve
allocating space on the global stack, which will typically have
to be checked by the garbage collector.  It cannot possibly be
as fast as NOT making a Prolog data structure to hold a copy of
the data and NOT copying the data.  If you have a block of data
in the C world, _leave_it_there_.

Here's what I'm getting at.  Suppose you have a chunk source,
two chunk sinks, and a chunk classifier.

route_chunks :-
    repeat,
    get_chunk(Start, Length),           % only a descriptor crosses over
    (   Length > 0 ->
        classify_chunk(Start, Length, Class),
        (   Class = 1 -> put_chunk_here(Start, Length)
        ;   Class = 2 -> put_chunk_there(Start, Length)
        ;   true                        % any other class: forward nowhere
        ),
        hey_source_I_have_finished_with_this_chunk(Start, Length),
        fail                            % back to repeat for the next chunk
    ;   !                               % Length =< 0: stop looping
    ).

All that moves between the C world and the Prolog world is the
Start and Length of the chunk.
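
For illustration only, the C side of those predicates might look
something like the sketch below.  Everything in it is an assumption
rather than part of the original code: the single static pool, the
chunk layout (one length byte followed by the payload), the rule that
the first payload byte is the class, and the shortened name
finished_with_chunk.  The foreign-interface glue is left out because it
differs from one Prolog system to another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C-side state: one buffer owned by the chunk source. */
static unsigned char pool[1 << 16];
static size_t pool_filled = 0;      /* bytes of pool holding received data */
static size_t pool_next = 0;        /* offset of the next chunk header */

static size_t bytes_here = 0, bytes_there = 0, chunks_released = 0;

/* Hand out the next chunk as a (start, length) descriptor.
 * length is set to 0 when no complete chunk remains. */
static void get_chunk(size_t *start, size_t *length)
{
    *start = pool_next;
    *length = 0;
    if (pool_next >= pool_filled) return;
    size_t len = pool[pool_next];               /* assumed length byte */
    if (len == 0 || pool_next + 1 + len > pool_filled) return;
    *start = pool_next + 1;
    *length = len;
    pool_next += 1 + len;
}

/* Assumed convention: the first payload byte is the class. */
static int classify_chunk(size_t start, size_t length)
{
    assert(length > 0 && start + length <= pool_filled);
    return pool[start];
}

/* The sinks read the payload in place; nothing is copied on the way.
 * A real sink would hand pool + start directly to write() or send();
 * here each sink only counts the bytes it was asked to forward. */
static void put_chunk_here(size_t start, size_t length)
{
    assert(start + length <= pool_filled);
    bytes_here += length;
}

static void put_chunk_there(size_t start, size_t length)
{
    assert(start + length <= pool_filled);
    bytes_there += length;
}

/* Tell the source the descriptor is no longer in use. */
static void finished_with_chunk(size_t start, size_t length)
{
    assert(start + length <= pool_filled);
    chunks_released++;
}
```

Whatever the chunk size, the router sees two integers per chunk, so the
per-chunk cost on the Prolog side does not grow with the payload.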

Now, it's not just because one of the layers is Prolog that this is
the right thing to do.  If you are routing chunks in C, the right thing
to do is LEAVE THE CHUNK WHERE IT IS, not copy it around.  Some years
ago someone made measurements that showed a UNIX kernel spending about
20% of its time just copying bytes from one place to another.  There
are alternative I/O system designs, like the "container shipping" design,
a vague memory of which is at the back of my recommendations here.