I wrote:

> You want to work in terms of a BLOCK at a time.

Armin Rigo <arigo(a)tunes.org> continues:

> Yes, exactly. Even on Unix in C nobody uses the character-oriented
> functions of the C API any more; using block-oriented functions is
> much more efficient.
For reading and writing binary information, that's so; fread() and fwrite() have been there a long time. For reading and writing text, you might be surprised. The technique of reading a block at a time and throwing it at an "inverted" parser, as used in expat and SWI SGML, does lead to fast parsers (although expat is not as fast as my XML parser, which doesn't do that). But it also leads to parsers that are very hard to write. A high speed parser that does the wrong thing because it was too complex for its author to understand is not an efficient one.
Something that is not often appreciated is that character I/O in the spirit of C can be implemented so that
    get_char(f, p, c)

expands to

    while ((c = *p++) == '\0')
        if ((p = get_char_check(f, p)) == 0) { c = EOF; break; }
which could compile to something like this on a V9 SPARC:
            ldub [p], c          ! these three instructions are the "fast path"
    L1:     brnz,pt c, L2        !
            inc p                !
            mov f, %o0           ! these instructions are the "slow path"
            call get_char_check
            mov p, %o1           ! p and c must be in registers,
            brnz,pt,a p, L1      ! but f need not be, because accessing f
            ldub [p], c          ! is on the slow path.
            set -1, c
    L2:
The cost is 8 instructions per block plus 3 instructions per character. If you are going to look at the characters at all, once you have read them, it's going to cost you 2 instructions per character minimum. (The EOF test can be folded into branching logic.) The typical C loop for scanning the characters in a string is

    while ((c = *p++) != '\0') ...

which boils down to the same 3 instructions as the fast path of character reading.
For other machines the code would be slightly different, but it would still resemble the C string scanning loop.
So it's true that reading a character at a time using the C stdio library interface can be slow, but it is NOT true that reading a character at a time in C using something *resembling* the C stdio interface has to be slow.
> The most efficient solution is probably even to use mmap() to map
> the whole file into the memory space.
Yes, BUT that only works for disc files. You can't read from a terminal that way; you can't read from a pipe that way; you can't read from a socket that way; you can't read from a message queue that way; ...
I well (actually, painfully) remember the way that TOPS-10 required you to use different system calls (actually, different instructions) to read from a disc file, from a tape, and from a terminal. Given the importance of the internet these days, a minimum requirement on I/O is that it should be easy to write tolerably efficient code which works with data sources and sinks that might be local or might be remote. The box on my desk has a 500MHz UltraSPARC and a 700MHz Pentium of some kind. There are 2.4GHz Pentiums on sale in the local shops at prices so low I have trouble believing them. There really isn't any point in me worrying about a factor of 2 in CPU speed if the cost of a program is dominated by the cost of reading and writing the blocks (network speed and disc speed do NOT increase anywhere near as fast as CPU speed).
This is what I meant when I complained about the old-fashioned character of the standard Prolog predicates of the get_byte family. For the sake of argument, let's suppose that there was a built-in predicate
read_block(Stream, Block, Length)
with the same argument order as POSIX.1's read(2).
OK, what are the type and mode of Block?
The mode can't be +, because Prolog can't _change_ a data structure. (If it could change a block, trailing the changes so they could be undone on backtracking would be hideously expensive.)
So it has to be -. If read_block/3 returns a list of (nauseating ISO botch) single-character atoms, or if it returns a list of (historic and efficient) character codes, or if it returns a packed-array-of-codes string object (some Prologs have such things) or if it returns a single atom, then there is the cost of dynamically allocating this object. Presto chango: you have just squandered far more time on allocating the result than you have saved by reading many bytes at once.
So the type has to be something else. If the stream is actually bound to a memory mapped file, there _is_ something else we can return,
block(StartAddress, ActualLength)
But (a) as noted above, this isn't going to work for sockets, and (b) it introduces a vulnerability: if the stream is closed then a surviving Block counts as a dangling reference, with all that that implies, and (c) if you are going to do anything with the block other than pass it on to a block output predicate, it's going to cost you more to access each character than it would have to read them individually in the first place.
> So it would make sense to me to introduce block-oriented predicates
> in Prolog. With a correct design, user-defined predicates could then
> be used to define any new type of "blockstream" with very reasonable
> overhead.
Every time I've tried to design such a thing, I have run into really nasty problems. That doesn't mean that "a correct design" couldn't exist; I am a bear of very little brain and have to learn from everyone. So if there _is_ such a "correct design" I would be extremely interested to know what it might be.

> For example, I would find it very natural to have a pair of
> predicates get_string and put_string which read or write a string of
> bytes from/to a stream.
Yes, but that requires get_string to CHANGE a term, which may be all very well for C and Lisp, but is just not the way Prolog does things.
> Making them dynamic and defining all other predicates (like
> get_byte, for compatibility) in terms of them would allow
> user-defined streams with little overhead.

It is easy to _say_ this. It's rather harder to actually produce a design where all streams can do 8-bit character and 8-bit byte reading at a cost of 8 instructions of overhead per block + 3 instructions per character. If such a design can in fact be produced, I will be an enthusiastic supporter of it. However, my earlier mention of the Internet raises another issue.
XML is defined in terms of Unicode, which requires 21 bits per character. (The range of Unicode values is 0 to 16'10FFFF.) People have scarcely imaginable volumes of data coded using 8 bit character sets or 16 bit character sets, and aren't terribly interested in quadrupling or doubling their disc requirements or their network traffic. So programs that can deal with XML data (or, for that matter, Windows NT file names) have to be able to convert from other encodings to Unicode on input and convert from Unicode to other encodings on output.
C handles this by defining a set of functions like [f]getwc(), which typically convert one character at a time. Block I/O is considerably complicated by the fact that block boundaries need not coincide with character boundaries: a character may be represented by several bytes some of which are on one block and some of which are on the next. This makes a source-level block approach to character streams rather unattractive.
It is still possible to provide fairly efficient character-at-a-time I/O in a C-like style, provided you don't do it the C way. In fact, you keep the same interface, except that p is a pointer to uint32_t instead of a pointer to uint8_t. The user-supplied low-level code delivers blocks of *bytes* to the middle-level "glue" code, which decodes the block (using any left-over information from the previous block) and fills a buffer of decoded characters, which the high level code then pulls out one by one. I have implemented this scheme for an XML parser in C. The details are sufficiently hairy that I really don't think it is practical to require Prolog programmers to do their own (en,de)coding, but at least it solves the [....CR][LF...] problem neatly.
To wrap up, there is a logic programming language with Prolog-like syntax in which it *is* possible to do byte-block-at-a-time I/O without breaking the language's semantics, and that is Mercury. Anyone seriously worried about the cost of I/O would do well to consider Mercury. The compiler uses type and mode information to generate native code, C code, or .NET code that is about as efficient as typical compiled imperative languages, even though the language is still a pure logic programming language.

==============================================================================
Message:      Address:                                 Action:
help          majordomo(a)clip.dia.fi.upm.es           Info. on useful commands
subscribe     ciao-users-request(a)clip.dia.fi.upm.es  Subscribe to this list
unsubscribe   ciao-users-request(a)clip.dia.fi.upm.es  Unsubscribe from this list
<whatever>    ciao-users(a)clip.dia.fi.upm.es          Send message to list
------------------------------------------------------------------------------
Archived messages: http://www.clip.dia.fi.upm.es/Mail/ciao-users/
------------------------------------------------------------------------------
Hello Richard,
I'll focus my answer on the problem of reading/writing binary data. Using plain text to transfer information between programs always strikes me as utterly inefficient anyway (even if it is often greatly appreciated in practice), and reading/writing from a user terminal doesn't need to be efficient.
All other information sources (disc files, sockets, Unix pipes...) can typically be accessed a block at a time. Actually, in Unix, the native OS calls are the block-oriented read() and write() for all types of file descriptors, even for conceptually character-oriented pipes.
On Wed, May 14, 2003 at 01:33:05PM +0200, Richard A. O'Keefe wrote:
> For the sake of argument, let's suppose that there was a built-in
> predicate
>
>     read_block(Stream, Block, Length)
>
> with the same argument order as POSIX.1's read(2).
>
> OK, what are the type and mode of Block?
As you point out, it has to be -, and the only reasonable data structure it can return is some kind of native packed string (as in SWI-Prolog). But the overhead of having to allocate that memory is minimal: it is an amortized constant. Actually reading data into that buffer is a single system call; there is no need for any loop in your program, because the OS kernel handles all the details. You just cannot make it more efficient on a Unix architecture (disregarding mmap(), which I concede has strong limitations).
In Unix applications, implementing block-reading in terms of character-reading is an abstraction inversion (http://cliki.tunes.org/Abstraction%20Inversion), because the kernel natively provides block-reading.
> To wrap up, there is a logic programming language with Prolog-like
> syntax in which it *is* possible to do byte-block-at-a-time I/O
> without breaking the language's semantics, and that is Mercury.
You are too concerned about the allocation cost. A lot of high-level languages more dynamic than Mercury have immutable string-like buffers and very successfully return them as a result of a block-reading operation (http://www.python.org).
Armin Rigo