HNDIT: DATA TYPES IN C++

Primitive Types

C++ is a typed programming language. Typed programming languages use systems of types to distinguish value data, address data, and program code in primary memory. The C++ memory model is byte-addressable at its lowest level: that is, the smallest region of memory that we can identify with a C++ type is a single byte. The type that we assign to the contiguous sequence of bytes starting at a specific address identifies the nature of the information stored in that region of memory as either value data, address data, or program code.

In the type system of any programming language, a type defines:

how to interpret the bit string at a specified location
what operations are valid on the bit string at the specified location

Once we assign a type to a region of memory, the compiler can flag those operations that are not permissible on that region of memory. For example, the compiler will flag any multiplication of two C-style strings as an error because multiplication of a pointer to a char with another pointer to a char is not an admissible operation on pointers to chars.

In this chapter, we describe the primitive types of C++ in detail. These types include both scalar types - integrals, floating-points, and pointers - and a void type - valueless and operationless.

INTEGRAL TYPES

Standard C++ defines four integral types:

char
bool
int
enum

All of these types store their values in equivalent binary form without approximation.

1. char

The char type occupies one byte of memory by definition:

[Figure: char, 1 byte]

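2. bool

The bool type typically also occupies one byte of memory, although the standard leaves its exact size to the implementation:

[Figure: bool, 1 byte]

A variable of bool type can hold one of two values: false and true, which convert to 0 and 1 respectively.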

3. int

The int type occupies one word of memory. One word is typically the size of a CPU register, making the int type the optimally efficient type. On 32-bit platforms, one word occupies 4 bytes:

[Figure: int on 32-bit platforms, 4 bytes]

On 16-bit platforms or emulations of 16-bit platforms, one word occupies 2 bytes:

[Figure: int on 16-bit platforms, 2 bytes]

The ordering of the bytes themselves depends upon the host platform. Big-endian platforms store the highest order byte first. Little-endian platforms store the lowest order byte first. PowerPC platforms are typically big-endian, while Intel platforms are typically little-endian. The ordering of bits within each byte is also platform dependent.

Size Specifiers

Three size specifiers define the minimum number of bits in an int type:

short
long
long long

A short int type, or more concisely a short type, contains at least 16 bits:

[Figure: short, at least 16 bits]

A long int type, or more concisely a long type, contains at least 32 bits:

[Figure: long, at least 32 bits]

A long long int type, or more concisely a long long type, contains at least 64 bits:

[Figure: long long, at least 64 bits]

Size-wise, a short fits between a char and an int, while an int fits between a short and a long.

Range Specifiers

Range specifiers define the range of values associated with an int or a char. The two range specifiers are

unsigned - no negative values
signed - negative and positive values

The default range for int types is signed. The default range for char is platform-dependent.

Unsigned

The unsigned keyword describes a range that extends from zero into the positive domain. All of the bits in an unsigned type store value data.

Type	Size	Min	Max - 32 bit	Max - 16 bit
unsigned char	1 byte	0	255	255
unsigned short	>=16 bits	0	>= 65,535
unsigned int	1 word	0	4,294,967,295	65,535
unsigned long	>=32 bits	0	>= 4,294,967,295
unsigned long long	>=64 bits	0	>= 18,446,744,073,709,551,615

If a variable only holds non-negative values, we add this keyword to its definition. For example,

unsigned char letter;
unsigned short languages;
unsigned int persons;        // or more simply: unsigned persons;
unsigned long students;
unsigned long long citizens;

The range of an unsigned int type depends upon the word size of the host platform.

Signed

The range of a signed int depends upon the word size of the host platform and the encoding scheme for negative values.

The encoding schemes for negative values of integral type include:

two's complement notation - flip the bits and add one
one's complement notation - flip the bits
sign magnitude notation - reserve one bit for the sign

All three schemes represent positive values identically. Two's complement, which is the most popular, renders separate ALU subtraction circuits unnecessary and yields only one representation of 0.

For a two's complement encoding scheme, the ranges are:

32-bit platforms
Type	Size	Min	Max
signed char	1 byte	-128	127
char	1 byte	<=0	>=127
short	>=16 bits	<= -32,768	>= 32,767
int	1 word	-2,147,483,648	2,147,483,647
long	>=32 bits	<= -2,147,483,648	>= 2,147,483,647
long long	>=64 bits	<= -9,223,372,036,854,775,808	>= 9,223,372,036,854,775,807

16-bit platforms
Type	Size	Min	Max
signed char	1 byte	-128	127
char	1 byte	<=0	>=127
short	>=16 bits	<= -32,768	>= 32,767
int	1 word	-32,768	32,767
long	>=32 bits	<= -2,147,483,648	>= 2,147,483,647
long long	>=64 bits	<= -9,223,372,036,854,775,808	>= 9,223,372,036,854,775,807

Note that the ranges for char, short, long, and long long types are independent of the word size of the host platform. Only the range for the int type is platform-dependent.

Signed char

The default range for the char type varies across platforms. The IBM AIX platform treats values of char type as unsigned, while Microsoft treats values of char type as signed.

Since the ASCII collating sequence extends from 0 to 127 inclusive, the char type stores ASCII characters identically on all ASCII platforms. However, if we use a char type to hold EOF (typically, -1), we need to identify the char type as signed:

 signed char c; // for possibly storing EOF

FLOATING-POINT TYPES

The core language defines two floating-point types:

float - single-precision floating-point
double - double-precision floating-point

The standard does not specify the size of a float or double type but leaves it open to the compiler writer.

Typically, a float type occupies 4 bytes of memory:

[Figure: float, 4 bytes]

Typically, a double type occupies 8 bytes of memory:

[Figure: double, 8 bytes]

Size Specifier

The keyword long on the double type maximizes the number of significant digits. Typically, a long double type occupies at least 64 bits of memory:

[Figure: long double, at least 64 bits]

The standard requires that the long double type occupy no fewer bits than the double type. The standard does not specify a minimum number of bits for this type.

Data Representation

The floating-point types store values approximately. The most popular model is IEEE Standard 754 for Binary Floating-Point Arithmetic (IEEE, pronounced Eye-triple-E, is the Institute of Electrical and Electronics Engineers).

Under IEEE 754, a float type occupies 32 bits, has one sign bit, a 23-bit mantissa and an 8-bit exponent. The arrangement of the mantissa and exponent bits is open:

[Figure: two 32-bit float layouts, one with the exponent bits ahead of the mantissa bits and one with the mantissa bits ahead of the exponent bits]

Under IEEE 754, the value stored is determined by the following formula

 value = s * 2^e * { 1 + f₁2^-1 + f₂2^-2 + ... + f₂₃2^-23}

where s is the sign (+1 or -1) given by the sign bit, f_i is the value of bit i (i = 1,2,...,23) of the mantissa, and e is the exponent, which for normalized values lies between -126 and 127 inclusive.

Under IEEE 754, a double type occupies 64 bits, has one sign bit, a 52-bit mantissa and an 11-bit exponent. The arrangement of the mantissa and exponent bits is open:

[Figure: two 64-bit double layouts, one with the exponent bits ahead of the mantissa bits and one with the mantissa bits ahead of the exponent bits]

Under IEEE 754, the value stored is determined by the following formula

 value = s * 2^e * { 1 + f₁2^-1 + f₂2^-2 + ... + f₅₂2^-52}

where f_i is the value of bit i (i = 1,2,...,52) of the mantissa and e is the exponent, which is between -1022 and 1023 inclusive.

Limits and Ranges

The limits on the number of significant digits and the ranges of the exponents for IEEE 754 float and double types are:

Type	Size	Significant Digits	Min Exponent	Max Exponent
float	4 bytes	6	-37	38
double	8 bytes	15	-307	308

The exponent values in this table are decimal (base 10).

SYNONYMS

We can declare synonyms for types to improve the readability of our code. Synonym types are simply aliases for other types. We declare a synonym type using the keyword typedef. The declaration takes the form

 typedef specifiedType Synonym;

where specifiedType is the original type along with its specifiers. Synonym is the alias for that type.

We allocate memory for a variable of synonym type by writing

 Synonym identifier;

For example

 typedef long long int VeryLong; // declaration

 VeryLong x, y;                  // definition

declares the type VeryLong as a long long int. The definition allocates memory for two long long int variables:

 long long int x;
 long long int y;

We may not add specifiers to a synonym type. We must include the specifiers in the original declaration.

 unsigned VeryLong x, y; /* ERROR */

POINTER TYPES

A pointer type exists for each type in an application, including each specified type and each synonym type. The pointer types declared for the core primitive types are:

char*
short*
int*
long*
long long*
float*
double*
long double*

Different pointer types are not assignment compatible. We must use explicit casts:

 int* i;
 char* c;
 i = c;            // ERROR - Different Types 
 i = (int*) c;    // OK

Size of a Pointer Type

The size of a pointer type may vary from type to type and is platform dependent. A common assumption has been that a variable of long type occupies at least as much space as any pointer type would require in any application.

Synonym Pointer Types

We can use a synonym pointer type to simplify pointer definitions. For example, we declare a synonym for a pointer to an int

 typedef int* ptr2int;

We can then define several pointer variables without having to include the * before each identifier

 ptr2int px, py;

Note that this synonym form is more readable than a direct definition

 int *px, *py;

VOID TYPE

In addition to the pointer types associated with the different types in an application, the core language also defines a generic pointer type that is not associated with any particular type:

 void*

We may convert any pointer into a generic pointer (void*) and back to the application type without incurring any loss of information:

 void* v;
 int* i;
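 v = i;                     // OK - implicit conversion to void*
 i = static_cast<int*>(v);  // OK - converting back requires an explicit cast in C++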

HNDIT

Wednesday, January 2, 2013
