I wrote:

> You want to work in terms of a BLOCK at a time.

Armin Rigo <arigo(a)tunes.org> continues:

> Yes, exactly. Even on Unix in C nobody uses the character-oriented
> functions of the C API any more; using block-oriented functions is
> much more efficient.
For reading and writing binary information, that's so; fread() and fwrite() have been there a long time. For reading and writing text, you might be surprised. The technique of reading a block at a time and throwing it at an "inverted" parser, as used in expat and SWI SGML, does lead to fast parsers (although expat is not as fast as my XML parser, which doesn't do that). But it also leads to parsers that are very hard to write. A high speed parser that does the wrong thing because it was too complex for its author to understand is not an efficient one.
Something that is not often appreciated is that character I/O in the spirit of C can be implemented so that
    get_char(f, p, c)

expands to

    while ((c = *p++) == '\0')
        if ((p = get_char_check(f, p)) == 0) { c = EOF; break; }
which could compile to something like this on a V9 SPARC:
            ldub [p], c          ! these three instructions are the "fast path"
    L1:     brnz,pt c, L2        !
            inc p                !
            mov f, %o0           ! these instructions are the "slow path"
            call get_char_check
            mov p, %o1           ! p and c must be in registers,
            brnz,pt,a p, L1      ! but f need not be, because accessing f
            ldub [p], c          ! is on the slow path.
            set -1, c
    L2:
The cost is 8 instructions per block plus 3 instructions per character. If you are going to look at the characters at all, once you have read them, it's going to cost you 2 instructions per character minimum. (The EOF test can be folded into branching logic.) The typical C loop for scanning the characters in a string is

    while ((c = *p++) != '\0') ...

which boils down to the same 3 instructions as the fast path of character reading.
For other machines the code would be slightly different, but it would still resemble the C string scanning loop.
So it's true that reading a character at a time using the C stdio library interface can be slow, but it is NOT true that reading a character at a time in C using something *resembling* the C stdio interface has to be slow.
> The most efficient solution is probably even to use mmap() to map
> the whole file into the memory space.
Yes, BUT that only works for disc files. You can't read from a terminal that way; you can't read from a pipe that way; you can't read from a socket that way; you can't read from a message queue that way; ...
I well (actually, painfully) remember the way that TOPS-10 required you to use different system calls (actually, different instructions) to read from a disc file, from a tape, and from a terminal. Given the importance of the internet these days, a minimum requirement on I/O is that it should be easy to write tolerably efficient code which works with data sources and sinks that might be local or might be remote. The box on my desk has a 500MHz UltraSPARC and a 700MHz Pentium of some kind. There are 2.4GHz Pentiums on sale in the local shops at prices so low I have trouble believing them. There really isn't any point in me worrying about a factor of 2 in CPU speed if the cost of a program is dominated by the cost of reading and writing the blocks (network speed and disc speed do NOT increase anywhere near as fast as CPU speed).
This is what I meant when I complained about the old-fashioned character of the standard Prolog predicates of the get_byte family. For the sake of argument, let's suppose that there was a built-in predicate
read_block(Stream, Block, Length)
with the same argument order as POSIX.1's read(2).
OK, what are the type and mode of Block?
The mode can't be +, because Prolog can't _change_ a data structure. (If it could change a block, trailing the changes so they could be undone on backtracking would be hideously expensive.)
So it has to be -. If read_block/3 returns a list of (nauseating ISO botch) single-character atoms, or if it returns a list of (historic and efficient) character codes, or if it returns a packed-array-of-codes string object (some Prologs have such things) or if it returns a single atom, then there is the cost of dynamically allocating this object. Presto chango: you have just squandered far more time on allocating the result than you have saved by reading many bytes at once.
So the type has to be something else. If the stream is actually bound to a memory mapped file, there _is_ something else we can return,
block(StartAddress, ActualLength)
But (a) as noted above, this isn't going to work for sockets, and (b) it introduces a vulnerability: if the stream is closed then a surviving Block counts as a dangling reference, with all that that implies, and (c) if you are going to do anything with the block other than pass it on to a block output predicate, it's going to cost you more to access each character than it would have to read them individually in the first place.
> So it would make sense to me to introduce block-oriented predicates
> in Prolog. With a correct design, user-defined predicates could then
> be used to define any new type of "blockstream" with very reasonable
> overhead.
Every time I've tried to design such a thing, I have run into really nasty problems. That doesn't mean that "a correct design" couldn't exist; I am a bear of very little brain and have to learn from everyone. So if there _is_ such a "correct design" I would be extremely interested to know what it might be.

> For example, I would find it very natural to have a pair of
> predicates get_string and put_string which read or write a string of
> bytes from/to a stream.
Yes, but that requires get_string to CHANGE a term, which may be all very well for C and Lisp, but is just not the way Prolog does things.
> Making them dynamic and defining all other predicates (like
> get_byte, for compatibility) in terms of them would allow
> user-defined streams with little overhead.

It is easy to _say_ this. It's rather harder to actually produce a design where all streams can do 8-bit character and 8-bit byte reading at a cost of 8 instructions of overhead per block + 3 instructions per character. If such a design can in fact be produced, I will be an enthusiastic supporter of it. However, my earlier mention of the Internet raises another issue.
XML is defined in terms of Unicode, which requires 21 bits per character. (The range of Unicode values is 0 to 16'10FFFF.) People have scarcely imaginable volumes of data coded using 8 bit character sets or 16 bit character sets, and aren't terribly interested in quadrupling or doubling their disc requirements or their network traffic. So programs that can deal with XML data (or, for that matter, Windows NT file names) have to be able to convert from other encodings to Unicode on input and convert from Unicode to other encodings on output.
C handles this by defining a set of functions like [f]getwc(), which typically convert one character at a time. Block I/O is considerably complicated by the fact that block boundaries need not coincide with character boundaries: a character may be represented by several bytes some of which are on one block and some of which are on the next. This makes a source-level block approach to character streams rather unattractive.
It is still possible to provide fairly efficient character-at-a-time I/O in a C-like style, provided you don't do it the C way. In fact, you keep the same interface, except that p is a pointer to uint32_t instead of a pointer to uint8_t. The user-supplied low-level code delivers blocks of *bytes* to the middle-level "glue" code, which decodes the block (using any left-over information from the previous block) and fills a buffer of decoded characters, which the high level code then pulls out one by one. I have implemented this scheme for an XML parser in C. The details are sufficiently hairy that I really don't think it is practical to require Prolog programmers to do their own (en,de)coding, but at least it solves the [....CR][LF...] problem neatly.
To wrap up, there is a logic programming language with Prolog-like syntax in which it *is* possible to do byte-block-at-a-time I/O without breaking the language's semantics, and that is Mercury. Anyone seriously worried about the cost of I/O would do well to consider Mercury. The compiler uses type and mode information to generate native code, C code, or .NET code that is about as efficient as typical compiled imperative languages, even though the language is still a pure logic programming language.

==============================================================================
Message:      Address:                                 Action:
help          majordomo(a)clip.dia.fi.upm.es           Info. on useful commands
subscribe     ciao-users-request(a)clip.dia.fi.upm.es  Subscribe to this list
unsubscribe   ciao-users-request(a)clip.dia.fi.upm.es  Unsubscribe from this list
<whatever>    ciao-users(a)clip.dia.fi.upm.es          Send message to list
------------------------------------------------------------------------------
Archived messages: http://www.clip.dia.fi.upm.es/Mail/ciao-users/
------------------------------------------------------------------------------
Hello Richard,
I'll focus my answer on the problem of reading/writing binary data. Using plain text to transfer information between programs always strikes me as utterly inefficient anyway (even if it is often greatly appreciated in practice), and reading/writing from a user terminal doesn't need to be efficient.
All other information sources (disc files, sockets, Unix pipes...) can typically be accessed a block at a time. Actually, in Unix, the native OS calls are the block-oriented read() and write() for all types of file descriptors, even for conceptually character-oriented pipes.
On Wed, May 14, 2003 at 01:33:05PM +0200, Richard A. O'Keefe wrote:
> For the sake of argument, let's suppose that there was a built-in
> predicate
>
>     read_block(Stream, Block, Length)
>
> with the same argument order as POSIX.1's read(2).
>
> OK, what are the type and mode of Block?
As you point out, it has to be -, and the only reasonable data structure it can return is some kind of native packed string (as in SWI-Prolog). But the overhead of having to allocate that memory is minimal: it is an amortized constant. Actually reading data into that buffer is a single system call; there is no need for any loop in your program, because the OS kernel handles all the details. You just cannot make it more efficient on a Unix architecture (disregarding mmap(), which I concede has strong limitations).
In Unix applications, implementing block-reading in terms of character-reading is an abstraction inversion (http://cliki.tunes.org/Abstraction%20Inversion), because the kernel natively provides block-reading.
> To wrap up, there is a logic programming language with Prolog-like
> syntax in which it *is* possible to do byte-block-at-a-time I/O
> without breaking the language's semantics, and that is Mercury.
You are too concerned about the allocation cost. A lot of high-level languages more dynamic than Mercury have immutable string-like buffers and very successfully return them as a result of a block-reading operation (http://www.python.org).
Armin Rigo