Data Types and Character Sets – Windows Programming

Data Types in Win32 API

The Windows API rarely uses the standard C/C++ type names directly. Instead, it uses a collection of typedef-defined data types made available through the windows.h header file. Most Win32 API programs need only a small core set of these types. The most important Win32 data types are listed below:

Basic Integer Types:
BOOL – Boolean value (TRUE or FALSE)
INT – A 32-bit signed integer. Normal C-style integer. Declared as typedef int INT;
UINT – A 32-bit unsigned integer. Declared as typedef unsigned int UINT;
DWORD – A 32-bit unsigned integer. The name (double word) comes from hardware terminology. Declared as typedef unsigned long DWORD;
LONG – A 32-bit signed integer. Declared as typedef long LONG;
BYTE – The same as unsigned char. Declared as typedef unsigned char BYTE;
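The widths listed above can be checked at compile time. The following is a minimal sketch rather than a definitive reference: on Windows it pulls the real types from windows.h, while on other platforms it falls back to stand-in typedefs that are assumptions made purely so the example compiles. LowByte is a hypothetical helper, not part of the Win32 API.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in typedefs for non-Windows builds (assumption, illustration only).
// On Windows, DWORD and LONG are based on long, which is 32 bits wide there.
// Elsewhere long is often 64 bits, so fixed-width types are used instead.
typedef int           BOOL;
typedef int           INT;
typedef unsigned int  UINT;
typedef std::uint32_t DWORD;
typedef std::int32_t  LONG;
typedef unsigned char BYTE;
#endif

// Compile-time checks of the sizes described in the list above.
static_assert(sizeof(INT) == 4, "INT is expected to be 32 bits");
static_assert(sizeof(UINT) == 4, "UINT is expected to be 32 bits");
static_assert(sizeof(DWORD) == 4, "DWORD is expected to be 32 bits");
static_assert(sizeof(LONG) == 4, "LONG is expected to be 32 bits");
static_assert(sizeof(BYTE) == 1, "BYTE is expected to be 8 bits");
static_assert(std::is_unsigned<DWORD>::value, "DWORD is unsigned");
static_assert(std::is_signed<LONG>::value, "LONG is signed");

// Hypothetical helper: returns the least significant byte of a DWORD.
BYTE LowByte(DWORD value) {
    return static_cast<BYTE>(value & 0xFF);
}
```

For instance, LowByte(0x12345678) yields 0x78. If any static_assert fails, the platform does not match the sizes this section assumes.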

Handles:
A handle is an opaque identifier that refers to an internal Windows object.
HINSTANCE – A handle to the application instance.
HDC – A handle to a device context.
HMENU – A handle to a menu.
HFONT – A handle to a font.
HBITMAP – A handle to a bitmap.
HBRUSH – A handle to a brush.

String Types (Modern Usage):
LPCWSTR – A pointer to a constant null-terminated UTF-16 string.
LPWSTR – A pointer to a null-terminated string of 16-bit Unicode (UTF-16) characters.
LPCTSTR – An LPCWSTR if UNICODE is defined, or an LPCSTR otherwise. Now deprecated, but still available for backward compatibility.
LPTSTR – An LPWSTR if UNICODE is defined, or an LPSTR otherwise. Now deprecated, but still available for backward compatibility.
TCHAR – A WCHAR if UNICODE is defined, or a CHAR otherwise. Now deprecated, but still available for backward compatibility.

For a full list of Windows data types, see:
https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types

Identifier Constants

Every Windows program contains a large number of identifiers. These are constants used to represent numerical values. They are typically written in uppercase and consist of a two- or three-letter prefix denoting the general category, followed by an underscore and the constant name. A selection of prefixes and example constants is listed below.

Prefix	Description	Example
CS	Class style	CS_HREDRAW CS_VREDRAW
CW	Create window	CW_USEDEFAULT
DT	Draw text	DT_CENTER DT_LEFT DT_RIGHT
IDI	Icon identifier	IDI_ASTERISK IDI_ERROR IDI_HAND
IDC	Cursor identifier	IDC_ARROW IDC_HAND
MB	Message box options	MB_HELP MB_OK MB_OKCANCEL
SND	Sound option	SND_ASYNC SND_NODEFAULT
WM	Window message	WM_NULL WM_CREATE WM_DESTROY
WS	Window style	WS_OVERLAPPED WS_SYSMENU WS_BORDER
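Many of these constants are bit flags that are combined with the bitwise OR operator and tested with bitwise AND. The sketch below illustrates the idea with the class style constants from the table. On Windows the values come from windows.h; on other platforms the stand-in definitions are assumptions that mirror the Windows header values so that the example compiles. HasStyle and RedrawOnResizeStyle are hypothetical helpers, not Win32 functions.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in definitions for non-Windows builds (assumption, illustration only).
typedef unsigned int UINT;
#define CS_VREDRAW 0x0001
#define CS_HREDRAW 0x0002
#define CS_DBLCLKS 0x0008
#endif

// Hypothetical helper: true when every bit of `flag` is set in `style`.
bool HasStyle(UINT style, UINT flag) {
    return (style & flag) == flag;
}

// Hypothetical helper: combines two class styles into one value.
UINT RedrawOnResizeStyle() {
    return CS_HREDRAW | CS_VREDRAW;
}
```

Because each flag occupies its own bit, the combined value keeps both settings, and HasStyle(RedrawOnResizeStyle(), CS_DBLCLKS) is false since that bit was never set.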

Naming conventions

The Win32 API follows a naming convention known as Hungarian notation. Hungarian notation uses a short, lowercase prefix to indicate the data type, followed by the variable name, which begins with a capital letter; for example, hInstance for an instance handle or dwSize for a DWORD size. Function names start with a capital letter and carry no type prefix. For further reading on Microsoft coding style conventions, see:

https://docs.microsoft.com/en-us/windows/win32/stg/coding-style-conventions

Character sets

Text and numbers are encoded in a computer as patterns of binary digits known as character codes. For computers to communicate there must be an agreed standard that defines which character code is used for which character. A complete collection of characters is a character set. Two common character sets are ASCII and Unicode.

ASCII

ASCII is a character encoding system that can represent 128 characters. It uses 7 bits to represent each character, so when a character is stored in an 8-bit byte the most significant bit is always 0. The code set contains 95 printable characters and 33 non-printable control characters.
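The figures above can be verified with a few lines of code. This is a minimal sketch; the function names are hypothetical and chosen only for illustration. It assumes the usual definition of printable ASCII as the range 0x20 (space) to 0x7E (tilde).

```cpp
#include <cassert>

// True when the value fits in 7 bits (0..127), i.e. is an ASCII code.
bool IsAscii(unsigned int code) {
    return code <= 0x7F;
}

// Printable ASCII runs from 0x20 (space) to 0x7E (tilde).
bool IsPrintableAscii(unsigned int code) {
    return code >= 0x20 && code <= 0x7E;
}

// Counts the printable codes among the 128 ASCII values.
int CountPrintableAscii() {
    int count = 0;
    for (unsigned int code = 0; code <= 0x7F; ++code) {
        if (IsPrintableAscii(code)) ++count;
    }
    return count;
}

// Counts the control codes: 0x00..0x1F plus 0x7F (DEL).
int CountControlAscii() {
    int count = 0;
    for (unsigned int code = 0; code <= 0x7F; ++code) {
        if (!IsPrintableAscii(code)) ++count;
    }
    return count;
}
```

CountPrintableAscii() returns 95 and CountControlAscii() returns 33, matching the totals given above.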

Extended ASCII

Although the 128 characters supported by standard ASCII are enough to represent all the standard English characters, they cannot represent the special characters used in other languages. Extended ASCII encodings use eight bits per character instead of seven, which doubles the number of available characters to 256. Even so, 256 characters are nowhere near enough to support all languages, so other forms of character encoding, such as Unicode, are now commonly used.

Unicode

The Unicode Standard is a universal character-encoding standard that can represent data in any combination of languages by assigning a unique code, known as a code point, to every character and symbol it covers. A Unicode transformation format (UTF) is an algorithmic mapping of every Unicode code point to a unique byte sequence. The two most common Unicode encodings are UTF-8 and UTF-16.

UTF-8 is a variable-width encoding standard. A character in UTF-8 can be from 1 to 4 bytes long. The first 128 Unicode code points are encoded exactly as in ASCII, which makes UTF-8 backward compatible with it. This backward compatibility is useful for older API functions. UTF-8 is the preferred encoding for e-mail and web pages.
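To make the variable width concrete, here is a minimal sketch of a UTF-8 encoder for a single code point. EncodeUtf8 is a hypothetical helper written for this page, not a Windows or standard library function, and returning an empty string for invalid input is a design choice made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one Unicode code point as a UTF-8 byte sequence (1 to 4 bytes).
// Returns an empty string for surrogates and values above U+10FFFF.
std::string EncodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: identical to ASCII.
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are invalid
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, 'A' (U+0041) encodes to the single byte 0x41, while the euro sign (U+20AC) encodes to the three bytes E2 82 AC.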

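UTF-16 is a variable-width encoding standard that uses either 2 or 4 bytes to represent a character: one 16-bit code unit for code points up to U+FFFF, and a pair of code units (a surrogate pair) for everything above. UTF-16 is not backward compatible with ASCII. In Windows, strings are either ANSI or UTF-16LE (little-endian).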
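The 2-or-4-byte behaviour comes from surrogate pairs. The following minimal sketch encodes a single code point as UTF-16 code units. EncodeUtf16 is a hypothetical helper for illustration only, and it produces code units rather than bytes, so byte order (the LE in UTF-16LE) is not modelled here.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as one or two UTF-16 code units.
// Returns an empty string for surrogates and values above U+10FFFF.
std::u16string EncodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return out;  // surrogate values are not valid code points on their own
    }
    if (cp <= 0xFFFF) {
        // Basic Multilingual Plane: a single 16-bit code unit (2 bytes).
        out += static_cast<char16_t>(cp);
    } else if (cp <= 0x10FFFF) {
        // Supplementary planes: a surrogate pair (4 bytes).
        char32_t v = cp - 0x10000;                           // 20-bit value
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

For example, U+0041 becomes the single code unit 0x0041, while U+1F600 becomes the surrogate pair 0xD83D 0xDE00.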

Unicode in the Windows API

Unicode has been the standard in Windows since Windows NT. Windows API functions that use or return strings are generally implemented in one of three forms: an ANSI version (suffixed with “A”), a wide-character version (suffixed with “W”) to support Unicode, and a generic function prototype.

The generic prototype is resolved at compile time into one of the other two function variants by appending a single-character suffix to the base function name. For example, the generic function CreateWindowEx resolves to CreateWindowExW (Unicode) if UNICODE is defined, or to CreateWindowExA (ANSI) otherwise.
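The resolution step is ordinary preprocessor substitution. The sketch below imitates the pattern with invented names so that it compiles on any platform: GreetingLengthA, GreetingLengthW, GreetingLength, and GREETING_TEXT are all hypothetical and are not part of the Windows API. GREETING_TEXT plays the role that the TEXT() macro, described later on this page, plays for string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical "A" variant: takes a narrow (ANSI) string.
std::size_t GreetingLengthA(const char* text) {
    return std::strlen(text);
}

// Hypothetical "W" variant: takes a wide string.
std::size_t GreetingLengthW(const wchar_t* text) {
    return std::wcslen(text);
}

// The generic name is a macro that resolves to one variant at compile time,
// depending on whether UNICODE is defined.
#ifdef UNICODE
#define GreetingLength GreetingLengthW
#define GREETING_TEXT(s) L##s   // makes the literal wide
#else
#define GreetingLength GreetingLengthA
#define GREETING_TEXT(s) s      // leaves the literal narrow
#endif
```

A call such as GreetingLength(GREETING_TEXT("Hello")) compiles unchanged in both configurations, and either variant can still be called directly by its suffixed name.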

Working with Strings

C++ provides the built-in character types char, wchar_t, char16_t, and char32_t (C++20 adds char8_t). The fixed-size types char16_t and char32_t were introduced in 2011 to hold UTF-16 and UTF-32 code units. The width of wchar_t is compiler-specific (16 bits on Windows, typically 32 bits on Linux and macOS), so any program that needs to be compiler-portable should avoid using wchar_t for storing Unicode text.

A string literal takes the prefix L, u, or U to produce a wchar_t, char16_t, or char32_t string, respectively. A literal with no prefix is a char string.

const char *ascii_example = "This is an ASCII string.";
const wchar_t *Unicode_example = L"This is a wide char string.";
const char16_t *char16_example = u"This is a char16_t Unicode string.";
const char32_t *char32_example = U"This is a char32_t Unicode string.";

TCHAR and the TEXT Macro

To make applications portable between Unicode and non-Unicode systems, Microsoft introduced the TCHAR type. When a developer needs to support both Unicode and earlier operating systems that are not Unicode-compliant, TCHAR allows the same code to compile in either environment by mapping the character type to WCHAR or CHAR. To complement TCHAR, the TEXT() macro (or _T() from tchar.h) makes a string literal Unicode or ANSI to match. For example:

const TCHAR *autostring = TEXT("This message can be either ASCII or UNICODE!");

TCHAR and the TEXT() macro are generally regarded as obsolete for modern Windows application development. Although they remain supported for maintaining legacy codebases, they are no longer required because contemporary Windows development is fully based on Unicode (UTF-16).

For more detail on working with character encodings, see:
https://docs.microsoft.com/en-us/windows/win32/learnwin32/working-with-strings