Bentley Communities

    Text Data Types

      Product: MicroStation
      Version: All
      Environment: N/A
      Area: Annotations
      Subarea: N/A

    The MicroStation API uses several different data types to represent string data, each with its own underlying type and semantics. In your own applications, we recommend using MSWChar and/or WString, and converting to other data types only when absolutely necessary.

    Encoding

    The goal of this section is to introduce you to the importance of encodings, not to be a full treatment or history; for more information, you may want to consult other sources, for example:

    • http://en.wikipedia.org/wiki/Unicode
    • http://unicode.org
    • http://en.wikipedia.org/wiki/Universal_Character_Set
    • http://en.wikipedia.org/wiki/ASCII
    • http://en.wikipedia.org/wiki/Windows_code_page
    • http://www.microsoft.com/globaldev/reference/WinCP.mspx
    • http://en.wikipedia.org/wiki/Multi-byte_character_set

    In the distant past, computers operated on English-only text, and 128 characters were sufficient to represent all characters. However, they soon needed to convey data in many more languages, some with thousands of characters each. There are two predominant encodings, locale and Unicode, and each of these can be represented in multi-byte or fixed-byte formats.

    Ultimately, your 'string' is just a series of numbers to the computer. The computer doesn't inherently equate 0x41 with the capital Latin letter A, or 0xB1 with a plus/minus symbol; your application must interpret this information correctly. To do so, you must know the encoding being used, and sometimes you need even more information. For example, in a locale-encoded string, 0x8E could mean a Cyrillic capital Tshe, an Arabic Jeh, a Latin capital Z with caron, or many other characters, depending on the code page. In a multi-byte locale-encoded string, a given byte may even indicate that it must be combined with the following byte to determine the character. If 0x8E were Unicode-encoded, it would be an unprintable control character!
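
    As a rough illustration of the 0x8E example, the sketch below maps that one byte to a Unicode code point under three Windows code pages (1251 Cyrillic, 1256 Arabic, 1252 Western European). The function name decodeByte0x8E is hypothetical and is not part of the MicroStation API; a real conversion would use a complete table for every byte value.

```cpp
#include <cstdint>

// Returns the Unicode code point that the single byte 0x8E stands for under
// the given Windows code page, or 0 if this sketch does not cover that page.
// Hypothetical helper for illustration only; not a MicroStation function.
std::uint32_t decodeByte0x8E(unsigned codePage)
{
    switch (codePage)
    {
    case 1251: return 0x040B; // Cyrillic capital letter Tshe
    case 1256: return 0x0698; // Arabic letter Jeh
    case 1252: return 0x017D; // Latin capital letter Z with caron
    default:   return 0;      // code page not covered by this sketch
    }
}
```

    The same input byte yields three unrelated characters, which is why the code page must travel with the data.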

    One side note worth mentioning: in Unicode and in the Windows code pages discussed here, the values 0x00-0x7F (0-127) represent the same (ASCII, English) characters.

    Locale-Encoding

    * This encoding should be avoided in favor of the Unicode encoding.

    This type of character encoding attempts to minimize the space needed to represent every character in all supported languages by tightly coupling the encoding to a separate numeric identifier called a code page. In the example above, a single number (0x8E) was given many meanings, dictated by the code page. While this encoding minimizes the space needed to represent a string (many Latin-based languages use only a single byte for every character), it requires you to track a code page at all times. A locale-encoded string without a code page cannot be interpreted. Unfortunately, many functions (and developers) assume that the locale of a string is that of the system: the Active Code Page (ACP). Windows tracks this code page at the system level; it affects all running applications and can only be changed with a reboot. This type of encoding can be represented in multi-byte or fixed-byte formats.
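
    To show why the code page matters even for something as simple as counting characters, here is a minimal sketch. It assumes code page 932 (Japanese Shift-JIS) as the only multi-byte code page and treats every other code page as single-byte, which is a simplification; the function names are hypothetical and not part of the MicroStation API.

```cpp
#include <cstddef>
#include <string>

// True if `b` starts a two-byte character in code page 932 (Shift-JIS).
// Other multi-byte code pages use different ranges; only 932 is sketched.
bool isLeadByte932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters (not bytes) in a locale-encoded string.
// Assumption for this sketch: 932 is multi-byte, anything else single-byte.
std::size_t countCharacters(const std::string& bytes, unsigned codePage)
{
    if (codePage != 932)
        return bytes.size();

    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++count)
    {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte consumes the following byte as part of one character.
        i += (isLeadByte932(b) && i + 1 < bytes.size()) ? 2 : 1;
    }
    return count;
}
```

    The five bytes 0x41 0x93 0xFA 0x96 0x7B are three characters under code page 932 but five under a single-byte code page. Note that the final byte, 0x7B, would be an ASCII '{' if it were not preceded by a lead byte.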

    Unicode-Encoding

    * This is the preferred, unambiguous encoding method.

    This type of character encoding reserves a unique number for every character in every language, so there is no need to track a separate 'code page'. Unicode, in a way, is its own code page. This encoding is an industry standard, maintained by a consortium of companies and individuals. While there are technically more than 65,536 unique characters in the world (the number of values 2 bytes can hold), 2 bytes per character covers all of the characters in MicroStation's supported languages (those in the BMP, or Basic Multilingual Plane). When dealing with MicroStation, this type of encoding is only supported in fixed-byte formats. For those who want more detail, see information on UTF-16/UCS-2; because MicroStation only supports characters in the BMP, this is effectively a fixed-byte format, where each character uses a single 16-bit word.
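
    The BMP restriction can be checked mechanically: UTF-16 encodes characters outside the BMP as surrogate pairs, using 16-bit values in the range 0xD800-0xDFFF. The sketch below uses standard C++ std::u16string rather than the MicroStation string types, and its function names are hypothetical.

```cpp
#include <string>

// True if the 16-bit unit is one half of a surrogate pair, which UTF-16
// uses to encode characters outside the Basic Multilingual Plane.
bool isSurrogate(char16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDFFF;
}

// True if every character in `text` lies in the BMP, so that one 16-bit
// word corresponds to exactly one character.
bool isBmpOnly(const std::u16string& text)
{
    for (char16_t unit : text)
    {
        if (isSurrogate(unit))
            return false;
    }
    return true;
}
```

    A string that passes this check can safely be treated as fixed-width, one word per character.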

    This external article explains Unicode and the various encoding methods in more detail: http://www.joelonsoftware.com/articles/Unicode.html

    Data Types

    char*, std::string

    Encoding: Locale-only
    Ordinal: Multi-byte-only

    This data type only supports locale-encoded, multi-byte strings. Older APIs and data structures use this format, but it should not be used in new code. When dealing with this format, you must always keep track of the associated code page.

    MSWChar*, Bentley::WString (also wchar_t* and std::wstring)

    Encoding: Unicode-only
    Ordinal: Fixed-byte-only

    This data type only supports Unicode-encoded, fixed-byte strings. New APIs and data structures use this format, and it should be the only format used going forward. When dealing with this format, you do not need to keep track of any separate information.

    MSWideChar*

    Encoding: Locale or Unicode
    Ordinal: Fixed-byte-only

    This data type is a MicroStation invention. It should never be used in your applications, except where absolutely necessary; there is active and ongoing work to deprecate and replace any APIs that use this data type. When dealing with this format, you must always keep track of the associated code page.

    It is important to realize that although this data type is fixed-byte (and can be cast to wchar_t-based data types), it is generally inappropriate to do so. This format uses the same storage mechanism for both locale and Unicode encoding. If an MSWideChar string is Unicode-encoded, then it will function like an MSWChar*; however, if it is locale-encoded, it functions unlike anything else. It is similar to a multi-byte char*, but reserves two bytes for every character, regardless of whether the character needs them. It is also important to realize that it is wrong to pass an MSWideChar string to any Windows or C runtime functions.
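
    The sketch below illustrates why a plain cast is not enough for a locale-encoded, fixed-width string. It assumes, purely for illustration, that each 16-bit unit holds a single-byte code page 1251 value zero-extended to 16 bits; the actual MSWideChar storage layout is not described here, and LocaleWideUnit and both function names are hypothetical stand-ins rather than MicroStation API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one locale-encoded, fixed-width unit.
// Assumption: a code page 1251 byte value zero-extended to 16 bits.
using LocaleWideUnit = std::uint16_t;

// Converts one code page 1251 unit to a Unicode code unit. This sketch
// covers only ASCII and the Cyrillic range 0xC0-0xFF (U+0410-U+044F);
// anything else becomes U+FFFD, the Unicode replacement character.
char16_t cp1251UnitToUnicode(LocaleWideUnit unit)
{
    if (unit < 0x80)
        return static_cast<char16_t>(unit);
    if (unit >= 0xC0 && unit <= 0xFF)
        return static_cast<char16_t>(0x0410 + (unit - 0xC0));
    return 0xFFFD;
}

// Converts a whole locale-encoded string using the per-unit mapping above.
std::u16string cp1251ToUnicode(const std::vector<LocaleWideUnit>& units)
{
    std::u16string result;
    result.reserve(units.size());
    for (LocaleWideUnit unit : units)
        result.push_back(cp1251UnitToUnicode(unit));
    return result;
}
```

    Under these assumptions, the units 0x00C4 0x00C0 convert to U+0414 U+0410 (Cyrillic letters De and A). Casting the same units directly to a Unicode string would instead yield U+00C4 U+00C0, two accented Latin letters.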

    Converting Between Data Types

    Functions to convert from (left column) to (top row) in MicroStation V8i:

    [Image: table of conversion functions]

    Note that the functions to convert between MSWChar and MSWideChar are not published; you will have to add your own declarations in order to use them. The Font-based methods have the advantage of knowing which code page to use (any MSWideChar strings you provide should always be in the code page of the font).

    To use the C functions, you must provide a code page manually, but you can simply add the following declarations to your source code. These functions return the number of characters (not necessarily bytes) written to the destination buffer.

    int MSWideCharStringToMSWCharString (MSWChar* pOutString, UInt32 nOutChars, MSWideChar const * pInString, UInt32 codePage);
    int MSWCharStringToMSWideCharString (MSWideChar* pOutString, UInt32 nOutChars, MSWChar const * pInString, UInt32 codePage);

    To use the methods on the Font object, you will have to modify your delivered FontManager.h file to include the following declarations in the Font class:

    MSCORE_EXPORT int MSWideCharStringToMSWCharString (MSWCharP outString, UInt32 nOutChars, UInt16 const* inString) const;
    MSCORE_EXPORT int MSWCharStringToMSWideCharString (UInt16* outString, UInt32 nOutChars, MSWCharCP inString) const; 

      Original Author: Bentley Technical Support Group
    • Created by Bentley Colleague Jeff Marker: Mon, Dec 22 2008 11:20 AM
    • Last revision by Bentley Colleague Andrew Bell: Tue, Oct 9 2018 12:12 AM

    © 2021 Bentley Systems, Incorporated  |  Contact Us  |  Privacy |  Terms of Use  |  Cookies