> I'll focus my answer on the problem of reading/writing binary data. Using plain text to transfer information between programs always strikes me as utterly inefficient anyway (even if it is often greatly appreciated in practice), and reading/writing from a user terminal doesn't need to be efficient.
Claims like "utterly inefficient" are so much marsh gas without actual numbers. Here's one example. I wrote a program that generates 2**22 bit strings of 1 to 32 bits each, stored in an array of unsigned long. The cost of generating them and writing them out 5 times was:

    text version:   3.95 user + 0.13 system = 4.08 total seconds
    binary version: 5.06 user + 2.08 system = 7.14 total seconds

This really doesn't look like a victory for binary, especially when you realise that the binary version was 1.6 times bigger. I don't call nearly twice as fast and nearly twice as small "utterly inefficient", and I don't believe anyone else would either.
I refuse to focus on reading/writing binary data. I have several reasons for this.
(1) That's just not what Prolog is about. Prolog's strengths are in the area of text and symbolic processing; these days the top Prolog systems are also good at some kinds of constraint logic programming.
If you want C, you know where to find it, and you can call it from Prolog.
(2) As it happens, I am supervising a 4th year student who _is_ very VERY interested in processing binary data. MIDI data, in fact. But this has to be grovelled over byte by byte; using a block level interface for reading and/or writing MIDI data would be about as sensible as using hedge clippers to trim your fingernails.
This leads to my major point, which is that if you are DOING something with the binary data, chances are that the overheads of moving one item at a time between the Prolog world and the C world are dwarfed by the processing costs.
(3) I have used binary data transmission between programs. And I have done it both before there was such a thing as the System V ABI (where processor supplements are supposed to spell out record layout for C) and between systems that had no interest in being in any way like System V. I have experienced UNIX systems where three different compilers laid simple structs out three different ways (not realised until _after_ the data failed to read). I have transmitted data from a 5-character-per-word machine to a 2-character-per-word machine via a 6-character-per-word machine. (I never want to do that again, especially if paper tape is involved.) I have spent hours sweating over how to convert between IBM mainframe, VAX, IEEE, and sort-of-like-IEEE-but-not-really floating point formats. I have found systems where one compiler could generate any of six different record layouts depending on compile-time options. I have had to deal with binary data containing addresses where a program was loaded at a different address each time it was run (and I see that some version of BSD is bringing this back in the interests of defeating the stack-smashing bandits). I've heard horror stories from friends about two C++ programs compiled with the same headers being unable to share data in a shared memory region (because the vtables were different). Then there's the fact that the processor-specific System V ABI supplements have nothing to say about record layout in Fortran or Pascal, let alone Ada or Eiffel.
Put it this way: using binary formats to transfer information between programs that differ in even the slightest respect (such as being two different processes running the *same* executable on the *same* machine at the *same* time) may be efficient, but it's crazy.
Here, of course, I'm referring to NATIVE binary formats. If you talk about things like ASN.1 or XDR, most of the difficulties I mentioned go away. But so, of course, does the claimed efficiency advantage of binary representations.
(4) There's a vast tide of information flowing over the net. A lot of it is pictures and movies, which Prolog should be delegating to other processors anyway. Almost all the rest is text.
> All other information sources (disk files, sockets, unix pipes...) can typically be accessed a block at a time. Actually, in Unix, the native OS call is the block-oriented 'read' and 'write' for all types of file descriptors, even for conceptually character-oriented pipes. This is highly relevant to how the bottom layer works.
It is highly IRRELEVANT to how the top layer works. This is, after all, the point of distinguishing layers. I have already shown in some detail how a single-byte-at-a-time interface (to the Prolog world) can be as efficient as anyone could wish for, built on a block-at-a-time C world layer.
There are three activities:

    *generation*              (the top layer's "writing" facilities)
    *transmission*            (the bottom layer's concern)
    *parsing and processing*  (the top layer's "reading" facilities)

It is important to decouple these activities.
I note, by the way, that one of the important points for the Quintus block layer was the fact that some important operating systems distinguish logical blocks and physical blocks, and what they actually _give_ you is physical blocks, which you have to reassemble yourself. If the top layer works in terms of blocks, there isn't the slightest reason to expect _those_ blocks to have any simple relationship to the blocks exchanged with the operating system.
Even in UNIX, it may be necessary to issue several read() or write() requests to transmit a single logical block, and you don't know until you try whether a single read() or write() request will actually suffice.
> OK, what are the type and mode of Block? As you point out, it has to be -, and the only reasonable data structure it can return is some kind of native packed string (like in SWI Prolog).
SWI Prolog programmers routinely use atoms where other Prolog programmers might have used character lists. SWI atoms are compact, and garbage collected. They are well suited to holding text that you are just going to pass on. They are NOT convenient for generation or parsing and processing.
> But the overhead of having to allocate that memory is minimal.
Well, no. On a 250MHz machine, I measured the cost of creating a 5-character atom as 33 microseconds. The cost of calling get_byte/1 5 times was measured as 17.5 microseconds. The cost you dismiss as "minimal" is in fact nearly twice the cost of single-byte input.
> It is an amortized constant.
It is not amortized and it is not constant. A new object has to be created *EACH* time. This is Prolog, not C.
> Actually reading data into that buffer is a single system call. There is no need for any loop in your program, the OS kernel handles all details.
This has never been true in UNIX. If you want to read N bytes, where N > 1, you MUST have a loop, because the operating system is under no obligation whatsoever to deliver all N bytes at once. Indeed, it may be unable to. (The same applies to write().)
> You just cannot make it more efficient on a Unix architecture (disregarding mmap(), which I concede has strong limitations).
I _agree_ that reading and writing blocks is the right way to go IN THE BOTTOM LAYER. That's why Quintus did it that way; layering get0/1 on top of glue code on top of getc() on top of RMS on top of QIO was seriously inefficient (almost, but not quite, as bad as Java). Moving to a block-oriented bottom layer with "universal" mapping above meant that we just had get0/1 on top of glue code on top of QIO. As I've already shown, the fast path for the glue code can be EXTREMELY fast, although the VAX, M68k, and /370 were sufficiently register poor that it wasn't practical at the time to make it _quite_ that fast.
> On Unix applications, implementing block-reading in terms of character-reading is an abstraction inversion (http://cliki.tunes.org/Abstraction%20Inversion) because the kernel natively provides block-reading.

This is argument by alluding to intimidating authority, and as such uncompelling.
It's basically uncompelling for three reasons: (1) I am not at all arguing that block reading should be implemented in terms of character reading. I'm arguing precisely the opposite: that character reading is practically always the right abstraction for Prolog, and that it should be implemented in terms of block reading.
(2) the abstraction that matters is the abstraction your program NEEDS not the abstraction your operating system provides. I keep on trying to make the point that it is a waste of time worrying about the raw transput costs if you don't worry about the generating and processing costs. A program whose cost is dominated by raw transput is a program which isn't _doing_ anything much with its data, and about the only useful program of that kind is 'cat'.
(3) The block abstraction provided by UNIX is ***not*** "the blocks you find convenient in your source code" but "the block fragments that the kernel finds convenient to accept or return". In the plainest possible terms, the block transput abstraction provided by UNIX is *not* the abstraction any program which generates or processes data wants; only a program whose task is simply to *route* data can be satisfied with it directly.
As witness: fread() and fwrite(), which do NOT map onto single read() and write() calls.
>> To wrap up, there is a logic programming language with Prolog-like
>> syntax in which it *is* possible to do byte-block-at-a-time I/O
>> without breaking the language's semantics, and that is Mercury.
>
> You are too concerned about the allocation cost. A lot of high-level
> languages more dynamic than Mercury have immutable string-like buffers
> and very successfully return them as a result of a block-reading
> operation (http://www.python.org).

I am concerned about the allocation cost because I have *measured* it, it is an *avoidable* cost, and because it is the cost of making most Prolog I/O *harder* to write correctly, not easier.

==============================================================================
Message:      Address:                                Action:
help          majordomo(a)clip.dia.fi.upm.es           Info. on useful commands
subscribe     ciao-users-request(a)clip.dia.fi.upm.es  Subscribe to this list
unsubscribe   ciao-users-request(a)clip.dia.fi.upm.es  Unsubscribe from this list
<whatever>    ciao-users(a)clip.dia.fi.upm.es          Send message to list
-----------------------------------------------------------------------------
Archived messages: http://www.clip.dia.fi.upm.es/Mail/ciao-users/
-----------------------------------------------------------------------------
Hello Richard,
On Mon, May 19, 2003 at 10:54:18AM +0200, Richard A. O'Keefe wrote:
> text version:   3.95 user + 0.13 system = 4.08 total seconds
> binary version: 5.06 user + 2.08 system = 7.14 total seconds
By my definition, 'text' streams are understood to be a subset of all 'binary' streams. I cannot comment on these numbers any further without some clue about how the files were actually encoded.
> If you want C, you know where to find it, and you can call it from Prolog.
How then do I efficiently delegate to my C snippet a block of data that my Prolog program should acquire from some external source?
> (3) I have used binary data transmission between programs. (...)
I understand your grief very well. That's why my original statement contained the note "even if [text] is often greatly appreciated in practice". Any stream format is essentially a convention between the producing and the receiving ends, and today's computer architecture and the available tools tend to favour conventions that are immediately human-readable.
> Here, of course, I'm referring to NATIVE binary formats. If you talk about things like ASN.1 or XDR, most of the difficulties I mentioned go away. But so, of course, does the claimed efficiency advantage of binary representations.
I am not particularly talking about 'native' binary, if you mean by that a processor- or otherwise platform-dependent format. I am not particularly talking about standards like ASN.1, either. There is plenty of room in between, where you assign a well-defined but custom meaning to a sequence of bytes (as opposed to a sequence of words or lines).
> SWI Prolog programmers routinely use atoms where other Prolog programmers might have used character lists.
I am talking about 'SWI strings': atom-like objects that are not guaranteed to be unique. These (not atoms) have a minimal allocation overhead which is 'amortized constant' with respect to their length (where 'amortized constant' has the precise complexity theory meaning).
> Actually reading data into that buffer is a single system call. There is no need for any loop in your program, the OS kernel handles all details.
My mistake, there is indeed a loop needed here. However this loop will only ever execute a few iterations; the user time spent reading a block is, again, amortized constant in the length of the block on reasonable Unix implementations.
> abstraction inversion (http://cliki.tunes.org/Abstraction%20Inversion)
The pointer was meant to clarify the precise definition in which I wanted to use the term.
Armin