Wednesday, January 2, 2013

DATA TYPES IN C++

Primitive Types 
C++ is a typed programming language.  Typed programming languages use systems of types to distinguish value data, address data, and program code in primary memory.  The C++ memory model is byte-addressable at its lowest level: that is, the smallest region of memory that we can identify with a C++ type is a single byte.  The type that we assign to the contiguous sequence of bytes starting at a specific address identifies the nature of the information stored in that region of memory as either value data, address data, or program code. 
In the type system of any programming language, type defines:
  • how to interpret the bit string at a specified location
  • what operations are valid on the bit string at the specified location
Once we assign a type to a region of memory, the compiler can flag those operations that are not permissible on that region of memory.  For example, the compiler will flag any multiplication of two C-style strings as an error because multiplication of a pointer to a char with another pointer to a char is not an admissible operation on pointers to chars. 
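To illustrate how a type determines the interpretation of a bit string, the following sketch copies the bytes of a float into an unsigned integer of the same size.  The helper name bits_of is hypothetical, and the sketch assumes a 32-bit float, which it checks at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float occupies exactly 32 bits on this platform.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Return the bit string stored for `value`, read back as an unsigned integer.
// std::memcpy copies the bytes as they are, without converting the value.
std::uint32_t bits_of(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a platform that follows IEEE 754, bits_of(1.0f) yields 0x3F800000, whereas the integer 1 is stored as 0x00000001: the two regions have the same size but the types interpret them differently.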
In this chapter, we describe the primitive types of C++ in detail.  These types include the scalar types (integral, floating-point, and pointer types) and the void type, which has no values and no operations.

INTEGRAL TYPES

Standard C++ defines four integral types:
  • char
  • bool
  • int
  • enum
All of these types store their values in equivalent binary form without approximation. 
1. char
The char type occupies one byte of memory by definition: 
char
1 Byte
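2. bool
The bool type typically occupies one byte of memory, although the standard leaves its exact size to the implementation: 
bool
1 Byte
A variable of bool type can hold one of two values: false, which converts to 0, and true, which converts to 1. 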
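The following sketch checks the size of a char at compile time and shows the conversions between bool and int.  The function names as_int and as_bool are hypothetical.

```cpp
// sizeof(char) is 1 by definition.  sizeof(bool) is implementation-defined,
// although 1 is the usual value.
static_assert(sizeof(char) == 1, "char occupies one byte by definition");

// Convert a bool to an int: false becomes 0 and true becomes 1.
int as_int(bool flag) {
    return flag;
}

// Convert an int to a bool: zero becomes false, any other value becomes true.
bool as_bool(int number) {
    return number != 0;
}
```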
3. int
The int type typically occupies one word of memory.  One word is typically the size of a CPU register, which makes int the natural and usually the most efficient type for integer arithmetic.  On 32-bit platforms, one word occupies 4 bytes:
int (32-bit platforms)
1 Byte | 1 Byte | 1 Byte | 1 Byte
On 16-bit platforms or emulations of 16-bit platforms, one word occupies 2 bytes:
int (16-bit platforms)
1 Byte | 1 Byte
Most 64-bit platforms keep the int type at 4 bytes, so the one-word rule is a convention rather than a guarantee.
The ordering of the bytes themselves depends upon the host platform.  Big-endian platforms store the highest order byte first.  Little-endian platforms store the lowest order byte first.  PowerPC platforms are typically big-endian, while Intel platforms are typically little-endian.  The ordering of bits within each byte is also platform dependent.
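We can detect the byte ordering of the host platform by inspecting the byte stored at the lowest address of a known value.  The following is a minimal sketch; the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Return the byte stored at the lowest address of a 32-bit value.
unsigned char lowest_address_byte(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

// A platform is little-endian if it stores the lowest order byte first.
bool is_little_endian() {
    return lowest_address_byte(0x01020304u) == 0x04;
}

// A platform is big-endian if it stores the highest order byte first.
bool is_big_endian() {
    return lowest_address_byte(0x01020304u) == 0x01;
}
```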

Size Specifiers

Three size specifiers define the minimum number of bits in an int type:
  • short
  • long
  • long long
A short int type, or more concisely a short type, contains at least 16 bits:
short
1 Byte | 1 Byte
A long int type, or more concisely a long type, contains at least 32 bits:
long
1 Byte | 1 Byte | 1 Byte | 1 Byte
A long long int type, or more concisely a long long type, contains at least 64 bits:
long long
1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte
Size-wise, a short fits between a char and an int, while an int fits between a short and a long.
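The compiler can confirm these minimum sizes and their ordering on the host platform.  The following sketch uses CHAR_BIT from <climits> for the number of bits in a byte; the function name sizes_are_ordered is hypothetical.

```cpp
#include <climits>

// Minimum widths of the size-specified types, checked at compile time.
static_assert(sizeof(short) * CHAR_BIT >= 16, "short holds at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long holds at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64,
              "long long holds at least 64 bits");

// Report whether char <= short <= int <= long <= long long in size.
bool sizes_are_ordered() {
    return sizeof(char) <= sizeof(short) &&
           sizeof(short) <= sizeof(int) &&
           sizeof(int) <= sizeof(long) &&
           sizeof(long) <= sizeof(long long);
}
```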
Range Specifiers
Range specifiers define the range of values associated with an int or a char.  The two range specifiers are
  • unsigned - no negative values
  • signed - negative and positive values
The default range for int types is signed.  The default range for char is platform-dependent. 
Unsigned
The unsigned keyword describes a range that extends from zero into the positive domain.  All of the bits in an unsigned type store value data. 
Type               | Size      | Min | Max (32-bit platforms)        | Max (16-bit platforms)
unsigned char      | 1 byte    | 0   | 255                           | 255
unsigned short     | >=16 bits | 0   | >= 65,535                     | >= 65,535
unsigned int       | 1 word    | 0   | 4,294,967,295                 | 65,535
unsigned long      | >=32 bits | 0   | >= 4,294,967,295              | >= 4,294,967,295
unsigned long long | >=64 bits | 0   | >= 18,446,744,073,709,551,615 | >= 18,446,744,073,709,551,615
If a variable only holds non-negative values, we add this keyword to its definition.  For example,
  • unsigned char letter;
  • unsigned short languages;
  • unsigned int persons; or more simply unsigned persons;
  • unsigned long students;
  • unsigned long long citizens;
The range of an unsigned int type depends upon the word size of the host platform.
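Rather than assuming a word size, we can ask the compiler for the largest unsigned value.  The following sketch also shows that unsigned arithmetic wraps around at the ends of the range.  The function names are hypothetical.

```cpp
#include <limits>

// Largest value that an unsigned int can hold on the host platform.
unsigned largest_unsigned() {
    return std::numeric_limits<unsigned>::max();
}

// Unsigned arithmetic wraps around: adding 1 to the maximum yields 0 and
// subtracting 1 from 0 yields the maximum.
unsigned next(unsigned value) { return value + 1u; }
unsigned previous(unsigned value) { return value - 1u; }
```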
Signed
The range of a signed int depends upon the word size of the host platform and the encoding scheme for negative values. 
The encoding schemes for negative values of integral type include:
  • two's complement notation - flip the bits and add one
  • one's complement notation - flip the bits
  • sign magnitude notation - reserve one bit for the sign
All three schemes represent positive values identically.  Two's complement, which is the most popular, renders separate ALU subtraction circuits unnecessary and yields only one representation of 0. 
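The three encodings can be sketched on 8-bit patterns as follows.  The function names are hypothetical, and each function assumes a magnitude between 1 and 127.

```cpp
#include <cstdint>

// Each function returns the 8-bit pattern that encodes -magnitude.
// Assumption: magnitude lies between 1 and 127 inclusive.

// Two's complement: flip the bits and add one.
std::uint8_t twos_complement(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(~magnitude + 1);
}

// One's complement: flip the bits.
std::uint8_t ones_complement(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(~magnitude);
}

// Sign magnitude: set the highest bit and keep the magnitude bits.
std::uint8_t sign_magnitude(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(0x80 | magnitude);
}
```

For the value 5 (0000 0101), the three functions return 1111 1011, 1111 1010 and 1000 0101 respectively.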
For a two's complement encoding scheme, the ranges are:
32-bit platforms
Type        | Size      | Min                           | Max
signed char | 1 byte    | -128                          | 127
char        | 1 byte    | <= 0                          | >= 127
short       | >=16 bits | <= -32,768                    | >= 32,767
int         | 1 word    | -2,147,483,648                | 2,147,483,647
long        | >=32 bits | <= -2,147,483,648             | >= 2,147,483,647
long long   | >=64 bits | <= -9,223,372,036,854,775,808 | >= 9,223,372,036,854,775,807
16-bit platforms
Type        | Size      | Min                           | Max
signed char | 1 byte    | -128                          | 127
char        | 1 byte    | <= 0                          | >= 127
short       | >=16 bits | <= -32,768                    | >= 32,767
int         | 1 word    | -32,768                       | 32,767
long        | >=32 bits | <= -2,147,483,648             | >= 2,147,483,647
long long   | >=64 bits | <= -9,223,372,036,854,775,808 | >= 9,223,372,036,854,775,807
Note that the ranges for char, short, long, and long long types are independent of the word size of the host platform.  Only the range for the int type is platform-dependent. 
Signed char
The default range for the char type varies across platforms.  The IBM AIX platform treats values of char type as unsigned, while Microsoft treats values of char type as signed.
Since the ASCII collating sequence extends from 0 to 127 inclusive, the char type stores ASCII characters identically on all ASCII platforms.  However, if we use a char type to hold EOF (typically, -1), we need to identify the char type as signed:
 signed char c; // for possibly storing EOF
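We can check how the host platform treats a plain char, and whether a character type preserves the value -1, with a sketch such as the following.  The function names are hypothetical, and keeps_minus_one is intended only for the three character types.

```cpp
#include <limits>

// Report whether plain char is signed on the host platform.
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Store -1 in a variable of character type T and report whether the stored
// value still compares equal to -1 when read back.
template <typename T>
bool keeps_minus_one() {
    T stored = static_cast<T>(-1);
    return stored == -1;
}
```

A signed char keeps the value, an unsigned char does not, and a plain char follows whichever default the platform has chosen.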


FLOATING-POINT TYPES

The core language defines two floating-point types:
  • float - a single-precision floating-point type
  • double - a double-precision floating-point type
The standard does not specify the size of a float or double type but leaves it open to the compiler writer. 
Typically, a float type occupies 4 bytes of memory: 
float
1 Byte | 1 Byte | 1 Byte | 1 Byte
Typically, a double type occupies 8 bytes of memory: 
double
1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte
Size Specifier
The keyword long on the double type maximizes the number of significant digits.  Typically, a long double type occupies at least 64 bits of memory:
long double
1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte
The standard requires that the long double type occupy no fewer bits than the double type.  The standard does not specify a minimum number of bits for this type. 
Data Representation
The floating-point types store values approximately.  The most popular model is IEEE Standard 754 for Binary Floating-Point Arithmetic (IEEE, pronounced "eye-triple-E", stands for the Institute of Electrical and Electronics Engineers). 
Under IEEE 754, a float type occupies 32 bits, has one sign bit, a 23-bit mantissa and an 8-bit exponent.  The arrangement of the mantissa and exponent bits is open:

float
1 Byte | 1 Byte | 1 Byte | 1 Byte
s | exponent | mantissa
or
float
1 Byte | 1 Byte | 1 Byte | 1 Byte
s | mantissa | exponent
Under IEEE 754, the value stored is determined by the following formula
 value = s * 2^e * { 1 + f_1 * 2^-1 + f_2 * 2^-2 + ... + f_23 * 2^-23 }
where s is +1 or -1 according to the sign bit, f_i is the value of bit i (i = 1,2,...,23) of the mantissa and e is the exponent, which for normalized values lies between -126 and 127 inclusive.
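The single-precision formula can be evaluated directly from a bit pattern.  The following is a minimal sketch under stated assumptions: the bits are arranged as sign, exponent, mantissa from the most significant bit down, the exponent is stored with a bias of 127, and the pattern is a normalized value (not zero, a denormal, an infinity or a NaN).  The function name decode_float_bits is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Evaluate s * 2^e * { 1 + f_1 * 2^-1 + ... + f_23 * 2^-23 } for a 32-bit
// pattern.  Assumed layout, most significant bit first: 1 sign bit, 8 exponent
// bits stored with a bias of 127, then the mantissa bits f_1 ... f_23.
double decode_float_bits(std::uint32_t bits) {
    const unsigned exponent_field = (bits >> 23) & 0xFFu;
    assert(exponent_field != 0u && exponent_field != 0xFFu);  // normalized only

    const double s = (bits >> 31) ? -1.0 : 1.0;
    const int e = static_cast<int>(exponent_field) - 127;

    double sum = 1.0;  // the implicit leading 1
    for (int i = 1; i <= 23; ++i) {
        const unsigned f_i = (bits >> (23 - i)) & 1u;  // bit i of the mantissa
        sum += f_i * std::ldexp(1.0, -i);              // f_i * 2^-i
    }
    return s * std::ldexp(sum, e);                     // s * 2^e * { ... }
}
```

For example, the pattern 0xC0200000 has the sign bit set, an exponent of 1 and f_2 = 1, so the formula gives -1 * 2^1 * (1 + 0.25) = -2.5.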
Under IEEE 754, a double type occupies 64 bits, has one sign bit, a 52-bit mantissa and an 11-bit exponent.  The arrangement of the mantissa and exponent bits is open:
double
1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte
s | exponent | mantissa
or
double
1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte | 1 Byte
s | mantissa | exponent
Under IEEE 754, the value stored is determined by the following formula
 value = s * 2^e * { 1 + f_1 * 2^-1 + f_2 * 2^-2 + ... + f_52 * 2^-52 }
where s is +1 or -1 according to the sign bit, f_i is the value of bit i (i = 1,2,...,52) of the mantissa and e is the exponent, which for normalized values lies between -1022 and 1023 inclusive.
Limits and Ranges
The limits on the number of significant digits and the ranges of the exponents for IEEE 754 float and double types are:


Type   | Size    | Significant Digits | Min Exponent | Max Exponent
float  | 4 bytes | 6                  | -37          | 38
double | 8 bytes | 15                 | -307         | 308
The exponent values in this table are decimal (base 10). 
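The compiler reports these limits through std::numeric_limits in <limits>.  The following sketch gathers them into a small structure; the names DecimalLimits and decimal_limits are hypothetical.

```cpp
#include <limits>

// Decimal limits of a floating-point type, as reported by the compiler.
struct DecimalLimits {
    int significant_digits;  // decimal digits that the type preserves
    int min_exponent;        // smallest power of 10 that is a normalized value
    int max_exponent;        // largest power of 10 that is representable
};

template <typename T>
DecimalLimits decimal_limits() {
    typedef std::numeric_limits<T> limits;
    DecimalLimits result = { limits::digits10,
                             limits::min_exponent10,
                             limits::max_exponent10 };
    return result;
}
```

On a platform that follows IEEE 754, decimal_limits<float>() and decimal_limits<double>() return the values listed in the table.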

SYNONYMS
We can declare synonyms for types to improve the readability of our code.  Synonym types are simply aliases for other types.  We declare a synonym type using the keyword typedef.  The declaration takes the form
 typedef specifiedType Synonym;
where specifiedType is the original type along with its specifiers.  Synonym is the alias for that type. 
We allocate memory for a variable of synonym type by writing
 Synonym identifier;
For example
 typedef long long int VeryLong; // declaration

 VeryLong x, y;                  // definition
declares the type VeryLong as a synonym for long long int.  The definition allocates memory for two long long int variables: 
 long long int x;
 long long int y;
We may not add specifiers to a synonym type.  We must include the specifiers in the original declaration.
 unsigned VeryLong x, y; /* ERROR */
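The following sketch shows that a synonym names the same type as the original and that any specifier belongs in the declaration of the synonym itself.  UnsignedVeryLong, AlsoVeryLong and synonyms_match are hypothetical names; the alias declaration with using assumes C++11 or later.

```cpp
#include <type_traits>

typedef long long int VeryLong;                   // synonym from the text
typedef unsigned long long int UnsignedVeryLong;  // specifier in the declaration

// A synonym is the same type as the original, not a new type.
static_assert(std::is_same<VeryLong, long long>::value,
              "VeryLong is another name for long long int");

// Since C++11, an alias declaration is an equivalent spelling.
using AlsoVeryLong = long long int;

// Report whether the two spellings name one type that differs from the
// unsigned synonym.
bool synonyms_match() {
    return std::is_same<VeryLong, AlsoVeryLong>::value &&
           !std::is_same<VeryLong, UnsignedVeryLong>::value;
}
```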

POINTER TYPES
A pointer type exists for each type in an application, including each specified type and each synonym type.  The pointer types declared for the core primitive types are:
  • char*
  • short*
  • int*
  • long*
  • long long*
  • float*
  • double*
  • long double*
Different pointer types are not assignment compatible.  We must use explicit casts:
 int* i;
 char* c;
 i = c;            // ERROR - Different Types 
 i = (int*) c;    // OK
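In C++, we can also write the explicit conversion with reinterpret_cast, which is easier to find in source code than a C-style cast.  The following sketch converts a pointer to a different pointer type and back; the function name is hypothetical.

```cpp
// Convert a pointer to int into a pointer to char and back again.
// The address itself does not change, only the type used to interpret it.
int* round_trip_through_char(int* original) {
    char* as_char = reinterpret_cast<char*>(original);  // int*  -> char*
    return reinterpret_cast<int*>(as_char);             // char* -> int*
}
```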
Size of a Pointer Type
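The size of a pointer type may vary from type to type and is platform dependent.  A common assumption has been that a variable of long type occupies at least as much space as any pointer type would require in any application.  That assumption does not hold everywhere: on 64-bit Windows, for example, a long occupies 32 bits while a pointer occupies 64 bits. 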
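Rather than relying on an assumption about long, we can compare sizes directly or use std::uintptr_t from <cstdint>, an optional type that mainstream platforms provide for holding a pointer value.  The following is a sketch; the function names are hypothetical.

```cpp
#include <cstdint>

// Report whether an object of type T is large enough to hold a void*.
template <typename T>
bool can_hold_pointer() {
    return sizeof(T) >= sizeof(void*);
}

// Convert a pointer to an integer and back to the same pointer.
void* through_integer(void* pointer) {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(pointer);
    return reinterpret_cast<void*>(address);
}
```

Calling can_hold_pointer<long>() reports whether the common assumption holds on the host platform.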
Synonym Pointer Types
We can use a synonym pointer type to simplify pointer definitions.  For example, we declare a synonym for a pointer to an int
 typedef int* ptr2int;
We can then define several pointer variables without having to include the * before each identifier
 ptr2int px, py;
Note that this synonym form is more readable than a direct definition
 int* px,* py;
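The synonym also guards against a common slip: in a direct definition, a forgotten * leaves the second identifier a plain int.  The following sketch checks the resulting types with std::is_pointer, assuming C++11 or later; the function names are hypothetical.

```cpp
#include <type_traits>

typedef int* ptr2int;

// With the synonym, every identifier in the definition is a pointer.
bool synonym_defines_two_pointers() {
    ptr2int px = nullptr, py = nullptr;
    (void)px; (void)py;  // the variables are only inspected by decltype
    return std::is_pointer<decltype(px)>::value &&
           std::is_pointer<decltype(py)>::value;
}

// Without the synonym, a forgotten * makes the second identifier an int.
bool forgotten_star_defines_two_pointers() {
    int* px = nullptr, py = 0;  // py is an int, not an int*
    (void)px; (void)py;
    return std::is_pointer<decltype(px)>::value &&
           std::is_pointer<decltype(py)>::value;
}
```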


VOID TYPE
In addition to the pointer types associated with the different types in an application, the core language also defines a generic pointer type that is not associated with any particular type: 
 void*
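We may convert any object pointer into a generic pointer (void*) and back to the application type without incurring any loss of information.  The conversion to void* is implicit; unlike C, C++ requires an explicit cast for the conversion back: 
 int n = 0;
 int* i = &n;
 void* v;
 v = i;                     // OK - implicit conversion to void*
 i = static_cast<int*>(v);  // OK - explicit cast back to int*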
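The round trip through a generic pointer can be sketched as a function.  In C++, the conversion back from void* is written with an explicit static_cast.  The function name is hypothetical.

```cpp
// Pass a pointer to int through a generic pointer and recover it.
int* round_trip_through_void(int* original) {
    void* generic = original;           // implicit: int* -> void*
    return static_cast<int*>(generic);  // explicit: void* -> int*
}
```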
