
RE: SrPersist, sql-c-wchar, and unicode/wide-characters



> when i use sql-c-char, i get the english text ok (roman/ascii
> alphabet), but no korean, arabic, etc.  (they come back as question
> marks)

That makes sense, because sql-c-char represents 8-bit C characters.

> when i tried sql-c-wchar, the mzscheme interpreter said "illegal
> instruction" and exited.

I've never tried SrPersist with a Unicode database, so the wide-character code is untested.

Which primitive caused this problem?  Was it make-buffer, read-buffer, or write-buffer?

Even if this code worked perfectly, you might still have problems.  The MzScheme language does not support Unicode.  In SrPersist, if you read from a buffer that contains Unicode characters into a Scheme string, only the least significant 8 bits of each character are stuffed into the resulting Scheme string.  That strategy works (I think) if the Unicode represents ordinary Latin-1 text.  With Korean, Arabic, etc., it probably fails miserably, because the high byte that distinguishes those characters is thrown away.
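
To make the truncation concrete, here is a rough C++ sketch of what keeping only the low 8 bits does.  This is not the actual SrPersist code: the function name truncate_to_low_bytes is made up for illustration, and I'm assuming each wide character is a single 16-bit unit.

```cpp
#include <string>

// Hypothetical helper, not taken from srpbuffer.cxx.
// Assumes each wide character is one 16-bit code unit.
std::string truncate_to_low_bytes(const std::u16string& wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (char16_t c : wide) {
        // Keep the least significant 8 bits; the high byte is discarded.
        narrow.push_back(static_cast<char>(c & 0xFF));
    }
    return narrow;
}
```

A Latin-1 character such as U+00E9 comes through as the byte 0xE9, which is still the right character.  The Hangul syllable U+D55C comes through as 0x5C, which is a backslash, so the original character cannot be recovered.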

You might consider modifying the wide-character code in srpbuffer.cxx to use a different strategy, say, placing the two bytes of each Unicode character in two distinct characters of a Scheme string.  Of course, it will look like garbage in the Scheme REPL.
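
A minimal sketch of that two-bytes-per-character idea, again in C++.  The names split_into_bytes and join_bytes are hypothetical, the high-byte-first order is an arbitrary choice on my part, and I'm again assuming 16-bit wide characters; the real change in srpbuffer.cxx would have to fit whatever the buffer code does there.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, not part of SrPersist.
// Each 16-bit unit becomes two chars, high byte first (assumed order).
std::string split_into_bytes(const std::u16string& wide) {
    std::string bytes;
    bytes.reserve(wide.size() * 2);
    for (char16_t c : wide) {
        bytes.push_back(static_cast<char>((c >> 8) & 0xFF));  // high byte
        bytes.push_back(static_cast<char>(c & 0xFF));         // low byte
    }
    return bytes;
}

// Inverse of split_into_bytes: rebuild the 16-bit units from byte pairs.
std::u16string join_bytes(const std::string& bytes) {
    assert(bytes.size() % 2 == 0);  // must hold whole byte pairs
    std::u16string wide;
    wide.reserve(bytes.size() / 2);
    for (std::string::size_type i = 0; i < bytes.size(); i += 2) {
        unsigned hi = static_cast<unsigned char>(bytes[i]);
        unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        wide.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return wide;
}
```

Nothing is lost this way, so the string can be written back to the database unchanged, but Scheme code that indexes or measures the string has to remember that every character occupies two positions.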

-- Paul