Stereotypes
It’s All Numbers #
To the computer hardware, it’s all numbers. Addresses to locations in working memory? Numbers. The values at those locations? Numbers. Machine instructions, operation codes (opcodes), operands, etc.? Numbers.
True and False? One and Zero.
Characters? Encoded as numbers. Graphics? Colors are encoded as numbers. Sounds are encoded as numbers.
Onboard the CPU, the machine has registers and processor support for floating point numbers, integers and booleans. But you need to tell the machine which type of number to process.
Suppose the following value is stored in working memory:
01000010
Is it the integer 66? Is it the floating point number 9.2E-44? Is it the letter ‘B’?
The hardware does not provide the context for the meaning or types of values. The programmer and software must manage the context and type of the numerical value.
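The three readings of 01000010 can be reproduced with a short JavaScript sketch. It assumes the byte sits in the lowest position of a 4-byte, big-endian word (that padding is what makes the 32-bit float reading possible); the variable names are only illustrative.

```javascript
// Store the single byte 01000010 as the lowest byte of a 4-byte buffer.
// Assumption for this sketch: a 4-byte word read in big-endian order.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setUint8(3, 0b01000010);

// Same bits, three different readings. The software picks the context.
const asInteger = view.getUint32(0);                        // read as a 32 bit unsigned integer
const asFloat = view.getFloat32(0);                         // read as a 32 bit IEEE 754 float
const asCharacter = String.fromCharCode(view.getUint8(3));  // read as a character code

console.log(asInteger);                 // 66
console.log(asFloat.toExponential(1));  // 9.2e-44
console.log(asCharacter);               // B
```

Nothing in the buffer changes between the three reads. Only the accessor, which is the type context supplied by the program, changes.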
Types #
A type is a blueprint for a value. A type defines a range of values that can be accepted, and what operations can be performed on those values.
Examples:
// C#
int x = 5; // x is a 32 bit integer type (Int32)
char y = 'A'; // y is a character or char type
string z = " Brave New World"; // z is a sequence of characters (string type)
// C#
x = x + 3; // x was declared in above code example
z = y + z; // z was declared in the above code example, so assign rather than redeclare
// x = 8
// z = "A Brave New World"
// The + operation implicitly converts char y to a string
// and then concatenates the 2 string variables
// In C# "+" works for all number types
// (8 and 16 bit integral operands are first promoted to int,
// so the result of adding them is an int)
// Also "+" can be used to concatenate strings
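The two halves of the definition (a range of accepted values, and a set of operations) can also be sketched in JavaScript. The example below uses an Int8Array only because its 8 bit signed element type has a small, easy to see range; the variable names are illustrative.

```javascript
// Int8Array elements are 8 bit signed integers: the accepted range is -128..127.
const small = new Int8Array(1);

small[0] = 127;
const insideRange = small[0];

small[0] = 128; // outside the range, so the stored bits wrap around
const outsideRange = small[0];

console.log(insideRange);  // 127
console.log(outsideRange); // -128

// The type of the operands also decides what "+" means.
const numericSum = 5 + 3;                     // numeric addition
const joinedText = "A" + " Brave New World";  // string concatenation

console.log(numericSum); // 8
console.log(joinedText); // A Brave New World
```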
So what does a programming language “type system” mean? [1]
// C#
int x = 4 + "2";
// The C# compiler reports an error
// error CS0029: Cannot implicitly convert type 'string' to 'int'
(* F# a "Functional" programming language *)
let x = 4 + "2"
// error FS0001: The type 'string' does not match the type 'int'
N.B. Build Time type inference. More on this in a later post.
// JavaScript
let x = 4 + "2";
console.log(x);
console.log(typeof(x));
/* Output */
// "42"
// "string"
N.B. Runtime type coercion. More on this in a later post.
But
// C#
string x = 4 + "2";
Console.WriteLine(x);
// "42"
// Number type has a ToString() method
// that formats a number to a string
// "+" implicitly converts (coerces)
// 4 to a "4"
More to come on type conversions (explicit or implicit) in a later post, including the concept of type “inference”. But the short story is that all of the above are two sides of the same coin.
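As a small preview of explicit versus implicit conversion, here is a JavaScript sketch. It only uses the built-in Number() and String() conversion functions; the variable names are illustrative.

```javascript
// Implicit conversion (coercion): the runtime picks the conversion.
const implicitConcat = 4 + "2";   // 4 becomes "4", then the strings are concatenated
const implicitProduct = 4 * "2";  // "2" becomes 2, then the numbers are multiplied

// Explicit conversion: the programmer states the conversion.
const explicitSum = 4 + Number("2");     // convert the string to a number first
const explicitConcat = String(4) + "2";  // convert the number to a string first

console.log(implicitConcat, typeof implicitConcat);   // 42 string
console.log(implicitProduct, typeof implicitProduct); // 8 number
console.log(explicitSum, typeof explicitSum);         // 6 number
console.log(explicitConcat, typeof explicitConcat);   // 42 string
```

The same "+" operator produces a string in one line and a number in another, depending only on the types of its operands.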
Variables #
A variable is an identifier for a storage location in working memory. Every variable has a name, a type and usually a value.
// C#
int x = 3;
// Pointers are only allowed in an unsafe context
// Declare a pointer to an int ("int*")
// & returns the address of x, which is stored in the pointer
int* ptrToNbr = &x;
Console.WriteLine($"Address of x (hex) {(long)ptrToNbr:X}");
Console.WriteLine($"Address of x (decimal) {(long)ptrToNbr}");
/* Output */
// 7FFEE1D52738 (x's location in hex)
// 140732687263544 (x's location in decimal)
x is an identifier for 140732687263544 (a location in working memory). Every variable has a name, a type, and content.
Primary Type Taxonomy #
Value Types #
A variable (storage location) that contains a value.
// C#
int x = 3;
int* ptrToNbr = &x; // Pointer
Console.WriteLine($"Value of x is {x}");
Console.WriteLine($"Address of x (hex) {(long)ptrToNbr:X}");
Console.WriteLine($"Address of x (decimal) {(long)ptrToNbr}");
// "*ptrToNbr" the "*" returns the value at the pointed to address
Console.WriteLine($"Value of x via pointer dereferencing {*ptrToNbr}");
/* Output */
// 3 (the content of variable x)
// 7FFEE1D52738 (x's location in hex)
// 140732687263544 (x's location in decimal)
// 3 (via dereferencing the pointer)
Reference Types #
A variable (storage location) that contains a number, and that number is an address that points to (“references”) a separate location holding a value or object.
// C#
// An array of numbers
// numbers stores a reference to the array
// (naively: the address of the first element in the array)
int[] numbers = {10, 20, 30, 50};
// fixed pins the array in memory and copies the address of its first element to the pointer
// We want the address stored in numbers, not the address of the numbers variable itself, so no & is needed
fixed (int* ptrToArray = numbers)
{
Console.WriteLine($"numbers has the value {(long)ptrToArray}");
// Output: 6518340696
// This is the address that numbers and ptrToArray are storing
// "*ptrToArray" returns the value at address 6518340696
Console.WriteLine($"points to position 0 {*ptrToArray}");
// Output: 10
// this also returns the value at address 6518340696
Console.WriteLine(numbers[0]);
// Output 10
// Pointer math
Console.WriteLine($"value at position 1 {*(ptrToArray + 1)}");
// Output: 20
// Same as above
Console.WriteLine(numbers[1]);
// Output 20
// copy the address the second array element to a pointer
int* ptrToNbrsPos1 = (ptrToArray + 1);
Console.WriteLine($"Address of numbers[1] position {(long)ptrToNbrsPos1}");
// Output: 6518340700
// 4 bytes beyond position 0
// or one Int32 beyond position 0
}
// numbers is a name for a storage location (address unknown)
// the numbers storage location contains the value 6518340696
// 6518340696 is the address of the numbers array position zero
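JavaScript does not expose raw addresses, but the "4 bytes per Int32" layout behind the pointer math above can be sketched with typed arrays. This is an analogy under that assumption, not a translation of the C# code; byte offsets within a buffer stand in for memory addresses.

```javascript
// An Int32Array stores 32 bit integers back to back in one buffer.
const packedNumbers = new Int32Array([10, 20, 30, 50]);
const bytesPerElement = Int32Array.BYTES_PER_ELEMENT;

// subarray(1) is a view over the same buffer, starting at element 1.
const fromPosition1 = packedNumbers.subarray(1);
const offsetOfPosition1 = fromPosition1.byteOffset;

console.log(bytesPerElement);    // 4
console.log(offsetOfPosition1);  // 4 (bytes beyond position 0)
console.log(fromPosition1[0]);   // 20

// Typed arrays use the platform byte order, so detect it before reading raw bytes.
const littleEndian = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;
const rawBytes = new DataView(packedNumbers.buffer);
const valueAtOffset4 = rawBytes.getInt32(4, littleEndian);

console.log(valueAtOffset4);     // 20, the same value as packedNumbers[1]
```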
We can naively think of a Reference Type as a managed pointer.
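The practical difference between the two kinds of types shows up when a variable is copied. The JavaScript sketch below uses a number as a stand-in for a value type and an array as a stand-in for a reference type; the variable names are illustrative.

```javascript
// Numbers behave like value types: assignment copies the value itself.
let original = 3;
let copy = original;
copy = 99;
console.log(original); // 3 (unchanged)

// Arrays behave like reference types: assignment copies the reference,
// so both names lead to the same array.
const scores = [10, 20, 30, 50];
const alias = scores;
alias[0] = 99;
console.log(scores[0]);        // 99 (changed through the alias)
console.log(alias === scores); // true (same array, not an equal copy)

// An independent array needs an explicit copy of the elements.
const separate = [...scores];
separate[1] = -1;
console.log(scores[1]);        // 20 (unchanged)
```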
Generic Type Parameters #
We don’t know the type yet. We will figure it out at compile time. [2]
For programming languages that are “fussy” about types (aka strongly typed), Generic Type Parameters enable the creation of functions that are type agnostic and can be used across variables of different types.
A Generic Type Parameter is represented in code in different contexts as a <T>, or T. The T replaces int, long, char, string or some other complex type.
// C#
public class ExampleGenericParameter
{
// The Add method's parameters a and b
// are declared as type T
public static T Add<T>(T a, T b)
{
// the dynamic keyword tells the compiler
// we will not close or resolve a and b's type
// until program execution time aka runtime
// (post build and post compile time)
return (dynamic)a + (dynamic)b;
}
}
int exampleIntX = 3;
int exampleIntY = 4;
// not necessary to close the generic T with Add<int>( ... )
// because C# infers the parameters are of type int
var intAddResult = ExampleGenericParameter.Add(exampleIntX,exampleIntY);
Console.WriteLine(intAddResult);
// Output: 7
double exampleDoubleX = 3.2;
double exampleDoubleY = 4.5;
var doubleAddResult = ExampleGenericParameter.Add(exampleDoubleX, exampleDoubleY);
Console.WriteLine(doubleAddResult);
// Output: 7.7
char exampleCharX = 'A';
char exampleCharY = 'Y';
// C# infers the type of var = char based on the types of the parameters
var charAddResult = ExampleGenericParameter.Add(exampleCharX, exampleCharY);
Console.WriteLine(charAddResult);
// @ runtime (dynamic keyword) throws below error:
// "Unhandled exception. Cannot implicitly convert type 'int' to 'char'."
// "An explicit conversion exists (are you missing a cast?)"
// because "+" promotes char operands to int, so the sum is an int,
// and an int cannot be implicitly converted back to the char return type T
// It makes sense to not combine two characters into one character :-)
string exampleStrX = "X";
string exampleStrY = "Y";
var strAddResult = ExampleGenericParameter.Add(exampleStrX, exampleStrY);
Console.WriteLine(strAddResult);
// Output: "XY"
char exampleCharCastA = 'A';
char exampleCharCastY = 'Y';
int charCastAddResult = ExampleGenericParameter.Add((int)exampleCharCastA,(int)exampleCharCastY);
Console.WriteLine(charCastAddResult);
// Output: 154 (UTF-16 code units: A -> 65, Y -> 89, and 65 + 89 = 154)
char exampleCharZ = 'Z';
string exampleStrB = "B";
string charStrAddResult = ExampleGenericParameter.Add(exampleCharZ.ToString(), exampleStrB);
Console.WriteLine(charStrAddResult);
// Output: "ZB"
There are different approaches we could use in a type “fussy” language to accomplish the above. We could explicitly convert between types (casting), or we could create several Add functions all with the Add name but with parameters of different types (function overloading). None of these approaches is a silver bullet.
The purpose of the above example is to illustrate the “type wrangling” that occurs in strongly typed languages.
A frequent use case for generic type parameters in strongly typed languages is placing objects of different types that share an interface or parent type in lists and other collections.
// JavaScript
function add (a,b) {
return a + b;
}
let addResult = add(3, 4);
console.log(addResult);
// Output is 7
addResult = add(3.2, 4.5);
console.log(addResult);
// Output is 7.7
addResult = add('a', 'b');
console.log(addResult);
// Output is "ab"
JavaScript is not very fussy about types (aka Weakly Typed), and it resolves types at runtime, so it does not need a Generic Type Parameter. There are other tradeoffs, however, with Weakly Typed languages that will be discussed downstream.
As mentioned before, it’s two sides of the same coin. Neither strongly nor weakly typed is a silver bullet.
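One such tradeoff can be sketched with the same add function: mixed argument types do not fail, they quietly produce a different kind of result. The addChecked helper below is hypothetical, written only to show how a programmer might reintroduce some fussiness by hand.

```javascript
function add(a, b) {
  return a + b;
}

// No error, but the result is the string "42", not the number 6.
const surprising = add(4, "2");
console.log(surprising, typeof surprising); // 42 string

// Hypothetical helper: refuse to combine operands of different types.
function addChecked(a, b) {
  if (typeof a !== typeof b) {
    throw new TypeError(`Cannot add ${typeof a} and ${typeof b}`);
  }
  return a + b;
}

console.log(addChecked(3, 4));     // 7
console.log(addChecked("X", "Y")); // XY

let rejectedMessage = null;
try {
  addChecked(4, "2");
} catch (error) {
  rejectedMessage = error.message;
}
console.log(rejectedMessage); // Cannot add number and string
```

The check runs at runtime, so the mistake is still only caught when the offending call actually executes.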
Pointers (Unsafe Code) #
Some managed programming languages, like C#, allow declarations of “unsafe” code where the programmer can directly access memory. For example:
// C# (compile with the AllowUnsafeBlocks option)
unsafe
{
int x = 3;
int* ptrToNbr = &x; // C# Pointer to Variable x
}
New Types and New Semantics #
Over the last few years, C# has evolved with new types and new semantics that combine some of the characteristics of value and reference types.
Some of the new types or new semantics are to enable performance optimization, and lower level control of the machine’s resources while maintaining managed memory safety:
- Ref Locals
- Ref Returns
- Ref Struct and Span<T>
- Function Pointers
Other new types increase C# support for functional programming:
- ValueTuples
- Record Classes
- Record Structs
These new types will be the topics of later posts.
Other Type Taxonomies #
These are not formally defined terms. Across different languages and in different contexts, these terms get applied in different ways. Below is a general summary.
Primitive Types #
Types that map closely to the processor’s operations and data representations: a 32 bit integer, a 64 bit integer, a float, a bool.
Alternatively, types that cannot be decomposed, such as the above types or an atom in Lisp.
Complex Types #
Types constructed using primitive types. Structs, arrays, classes, lists, maps, delegates, etc.
Some are Value Types, and some are Reference Types.
Built-In Types #
The types predefined by your programming language. Int, float, char, object, array, atom, list, etc.
Some are Value Types, and some are Reference Types.
Custom Types #
You can define your own structs, classes, objects, etc.
Some are Value Types, and some are Reference Types.
Programming Language Type Based Taxonomy #
Finally, two terms with informal definitions and variances in usage.
Static Versus Dynamic Typing #
A programming language is considered “statically” typed if the types are resolved at compilation time. C#, Java, C, C++, Rust, Haskell and others are considered statically typed.
A programming language is considered “dynamically” typed if the types are resolved at program execution time (runtime). JavaScript, PHP, Python and Ruby are considered dynamically typed.
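A minimal JavaScript sketch of what "resolved at runtime" looks like in practice: the type belongs to the value a variable currently holds, and a type problem only surfaces when the offending line runs. The shout function is a hypothetical example.

```javascript
// The same variable can hold values of different types over time.
let value = 42;
const typeWhenNumber = typeof value;
value = "forty-two";
const typeWhenString = typeof value;
console.log(typeWhenNumber, typeWhenString); // number string

// Hypothetical function that only works for strings.
function shout(text) {
  return text.toUpperCase();
}

console.log(shout("ok")); // OK

// Nothing rejects this call ahead of time. It fails when it executes.
let failure = null;
try {
  shout(42);
} catch (error) {
  failure = error;
}
console.log(failure instanceof TypeError); // true
```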
Strong Versus Weak Typing #
A strongly typed programming language aggressively enforces its type system with errors, either at compile time or at runtime. Your program does not compile, or it crashes at runtime, if it violates the type system rules.
C#, Java, C++, Ruby, Python, Rust, Haskell and others are considered strongly typed.
A weakly typed programming language does everything possible to prevent an error, usually by implicitly converting your types (without asking :-) ).
JavaScript, C, PHP and others are considered weakly typed.
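The informal nature of these labels shows up even inside one language. In the JavaScript sketch below, most mixed-type arithmetic is quietly converted, yet mixing BigInt and Number values is rejected with an error, which is the behavior usually described as strong typing. The variable names are illustrative.

```javascript
// Weak typing in action: operands are converted instead of rejected.
const product = "5" * "2";           // both strings become numbers
const nullSum = null + 1;            // null becomes 0
const undefinedSum = undefined + 1;  // undefined becomes NaN
const boolSum = true + true;         // true becomes 1

console.log(product, nullSum, undefinedSum, boolSum); // 10 1 NaN 2

// The exception: BigInt and Number operands are never mixed implicitly.
const big = 1n;
let mixError = null;
try {
  const mixed = big + 1;
  console.log(mixed);
} catch (error) {
  mixError = error;
}
console.log(mixError instanceof TypeError); // true
```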
What is better? It depends. All of these languages have contexts where they shine, and contexts where they are not well aligned.
[1] To see how the other half lives, see this post with an example in C.
[2] Compile time for C# is defined here as the point when the Intermediate Language is compiled to machine code (usually by the Just in Time (JIT) compiler at runtime). Generics in C# are resolved (“closed”) at this point. Generics are not resolved at build time, which for C# is compilation from source to Intermediate Language. At build time, however, C# will generate errors that prevent a build if the code does not provide sufficient information for the types to be resolved (“closed”) downstream by the JIT compilation process. The build system is a gatekeeper that prevents downstream errors. In the example above, the dynamic keyword was required to “get around” these build-stopping errors, because the C# type system could not guarantee that every type passed into the function during program execution would be supported by the “+” operation (as my example shows with the char types). This moved the error from build time to runtime. The dynamic keyword told the build time processes to relax, we know what we are doing :-). Whoops.