[ABD v8i SS5 Native C] Querying multi-byte languages from database

Hi,

To query a database table's content we are using the mdlDB_openCursorWithID() and mdlDB_fetchRowByID() functions along with the MS_sqlda structure. But it returns (char *) data type as query result which is limited to 255 ASCII characters. For multi-byte languages like Japanese, Korean and Arabic, we need the query result as the MSWChar data type to support Unicode characters. Is there any solution to this?

Kind regards,

Sedat Alis
AEC Technology Inc.

  • But it returns (char *) data type as query result which is limited to 255 ASCII characters.

    Hi ,

    MicroStation (and ABD) V8 can both make the most commonly needed API calls for reading and writing "character data" to a supported target database, typically up to the sizes the database schema permits for column names and supported datatypes, whether the data is ASCII (character byte codes 0-255) or UNICODE (local + multibyte character arrays).

    As with most MicroStation API functions that provide only "char *" parameters and no wide-character (MSWChar) equivalents, the char * data (byte values 0-255, with virtually unlimited length) can typically carry multibyte strings, and with a little encoding and decoding these convert correctly to and from Unicode wide-character (MSWChar) strings.

    e.g. The MicroStation Dialog Manager and Resource Manager process char * data by checking whether the byte stream starts with a UNICODE_INDUCER bytecode (0xfffeu; defined in unicode.h if needed), then converting and reading/writing to a target location.

    If your data has a Unicode inducer, convert the char * (multibyte character data) to Unicode with mdlCnv_convertMultibyteToUnicode (mdlCnv_convertUnicodeToMultibyte goes the other way).  Once you have the Unicode in wide-character form (MSWChar), use the available/standard wide-character C/C++ APIs (MicroStation's or Microsoft's) to manipulate it.  Additionally, your application may need to call setlocale() to ensure proper Unicode interpretation/conversion if the Unicode results do not appear correct at first.
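
For illustration only, here is a minimal sketch of the multibyte-to-wide step using the standard C library rather than the MDL calls named above (the mdlCnv_* signatures are not reproduced here). The helper name multibyte_to_wide is hypothetical. It assumes the char * bytes are encoded in the code page of the current LC_CTYPE locale, which is why a setlocale() call may be needed first.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not an MDL function): convert a NUL-terminated
   multibyte string to a newly allocated wide string, interpreting the
   bytes according to the current LC_CTYPE locale.
   Returns NULL if allocation fails or the bytes are not valid in that
   locale. The caller frees the result. */
static wchar_t *multibyte_to_wide(const char *mb)
{
    /* A wide string never has more characters than the multibyte
       string has bytes, so strlen + 1 is always a large enough buffer. */
    size_t cap = strlen(mb) + 1;
    wchar_t *wide = malloc(cap * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* mbstowcs returns (size_t)-1 when it meets an invalid sequence. */
    if (mbstowcs(wide, mb, cap) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

A caller would typically run setlocale(LC_CTYPE, "") once at start-up, pass the fetched column value to multibyte_to_wide(), and treat a NULL result as a sign that the bytes do not match the active locale's code page.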

    HTH,
    Bob



  • Hi Robert,

    Thank you, your explanation is so clean. I am trying to query Arabic text from database.

    First, I tried without setting locale. Text is returned as "?????".

    Second, I used the function below to set the locale to Arabic before the query.

    setlocale(LC_ALL, "ar");

    Text is returned again as "?????".

    I can't catch UNICODE_INDUCER inside sqlda.value. Maybe Access is returning text as "?????".
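
One way to narrow this down is to look at the raw bytes of the fetched value instead of its displayed form. The sketch below is plain C with hypothetical helper names (is_all_question_marks, hex_dump); the string passed in stands in for whatever the fetch placed in the MS_sqlda value. If every byte is literally 0x3F, the text was already replaced with '?' before it reached the application. If bytes above 0x7F appear, the data survived and only its interpretation or display is wrong.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only
   of literal '?' (0x3F) bytes, otherwise 0. */
static int is_all_question_marks(const char *s)
{
    size_t len = strlen(s);
    return len > 0 && strspn(s, "?") == len;
}

/* Hypothetical helper: write the bytes of s into out as two-digit
   uppercase hex values separated by single spaces. Output is truncated
   at a byte boundary if out is too small. */
static void hex_dump(const char *s, char *out, size_t out_size)
{
    size_t pos = 0;
    const unsigned char *p;

    if (out_size == 0)
        return;
    out[0] = '\0';

    for (p = (const unsigned char *)s; *p != '\0'; ++p) {
        /* Each byte needs at most a space, two hex digits and a NUL. */
        if (pos + 4 > out_size)
            break;
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                pos ? " %02X" : "%02X", (unsigned)*p);
    }
}
```

Calling hex_dump() on the fetched string and printing the result shows at a glance whether the database layer or the display layer is producing the question marks.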

    Do you have a sample code?

    Kind regards,

    Sedat Alis
    AEC Technology Inc.

  • Maybe Access is returning text as "?????"
    • What is the Windows language setting for your computer?
    • Does Microsoft Access display Arabic text correctly?
    I tried without setting locale. Do you have a sample code?

    Investigate locale and Unicode/multibyte characters initially on web sites that deal with Microsoft APIs.  Character conversion is confusing even before you throw MicroStation V8's APIs and MSWideChar into the mix.   I prefer to obtain help from sites, such as Stack Overflow or Code Project, where your peers are offering their experience independently of Microsoft.

     
    Regards, Jon Summers
    LA Solutions

  • What is the Windows language setting for your computer?

    Windows language is English. Windows locale language is Turkish.

    Does Microsoft Access display Arabic text correctly?

    Yes.

    Investigate locale and Unicode/multibyte characters initially on web sites that deal with Microsoft APIs.

    I checked Microsoft standards first but MicroStation API is returning the values of the database field.

    Investigate locale and Unicode/multibyte characters initially on web sites that deal with Microsoft APIs.  Character conversion is confusing even before you throw MicroStation V8's APIs and MSWideChar into the mix.

    I think several Unicode/multibyte character types are also a Dilemma in MDL.

    MSWChar, wchar_t, MSWideChar

    You explained them on MicroStation MDL Text Types. Thank you.

    Kind regards,

    Sedat Alis
    AEC Technology Inc.

  • I think several Unicode/multibyte character types are also a Dilemma in MDL

    I agree: Unicode support in the V8 MicroStation API is confusing.

    MSWChar is a typedef of C++ wchar_t (see MDL header basetypes.h)

    MSWideChar is a Bentley invention for V8 (it is not used in CONNECT).  AFAIK the only place it's used is when creating a text element.

     
    Regards, Jon Summers
    LA Solutions

  • Hi ,

    I am tagging YongAn to see if he has or knows of any code examples (or issues) related to reading and writing UNICODE values between MicroStation and Microsoft Access databases.  I presume he may have his own experience to pass along or some old code snippets that may help.

    Unfortunately I am on an English-only Windows and locale, so I can only provide some quick research feedback along the lines of what Jan and Jon have suggested.  Try to remove MicroStation from the picture first and ensure that straight Microsoft code, SQL statements and options (driver and connection settings) can produce the desired results.  Then attempt to run those same steps in MicroStation (straight SQL calls as a test first), then the more specific MDL database calls.  That helps ensure each incremental step in complexity succeeds, or can be proven to have a problem that needs a recommendation next.

    My quick look for "microsoft access unicode sql code examples" produced these two points that I believe are worth considering when first testing with Microsoft code:

    • Test whether Microsoft Access Unicode Compression is contributing to the problem. (Ref 1, Ref 2)
    • If using a generic ODBC driver for mdb, use or install the Microsoft Access specific/optimized version for better results and/or performance.

    Please let us know your results using Microsoft APIs and SQL (w/options).  Hopefully that (or YongAn's input) can help resolve the issue, or provide a more specific area to identify and troubleshoot.

    HTH,
    Bob