# TIL#01: Why Pointers Have Data Types

Ever since I started learning programming in high school and got to know C & C++, one thing has always bugged me: why do pointers have data types?

In C & C++, when we need to declare a pointer, we normally do it like this:
```cpp
int b = 10;
int* a = &b;
```
This reads as: `a` is a pointer to the variable `b`, and `b` has `int` as its data type.

That explains all of it, doesn't it?
Well, I'd say we're missing something here. Why does `a` need to have `int` as its data type too? Does that mean `b`'s address is of type `int`? Or does it mean `a` is pointing to an `int`? If that's the case, why write the data type for `a` at all? Why not just infer it automatically from `b`'s data type?

It gets even weirder once you learn that the amount of space a pointer takes up is the same regardless of the type it points to. On mainstream platforms it only depends on whether the system is 32-bit (4 bytes) or 64-bit (8 bytes). If every pointer is the same size no matter what it points to, why don't we have a generic `pointer` data type?
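You can check the size claim on your own machine by comparing `sizeof` across a few pointer types. This is a small sketch: `pointer_sizes_match` and `print_pointer_sizes` are hypothetical helper names, and the matching sizes reflect mainstream 32-bit and 64-bit platforms rather than a strict guarantee of the C++ standard.

```cpp
#include <iostream>

// Hypothetical helper: true when int*, char*, double* and void* all have
// the same size. This holds on mainstream 32-bit and 64-bit platforms,
// although the C++ standard does not strictly require it.
bool pointer_sizes_match() {
  return sizeof(int*) == sizeof(char*) &&
         sizeof(int*) == sizeof(double*) &&
         sizeof(int*) == sizeof(void*);
}

// Hypothetical helper: prints each pointer's size next to the size of the
// type it points to. The pointer sizes match; the pointee sizes do not.
void print_pointer_sizes() {
  std::cout << "int*:    " << sizeof(int*) << " bytes, int:    " << sizeof(int) << " bytes\n";
  std::cout << "char*:   " << sizeof(char*) << " bytes, char:   " << sizeof(char) << " bytes\n";
  std::cout << "double*: " << sizeof(double*) << " bytes, double: " << sizeof(double) << " bytes\n";
}
```

On a typical 64-bit desktop, `print_pointer_sizes()` reports 8 bytes for every pointer while the pointee sizes differ, which is exactly the puzzle above.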

## The Answer
There are several answers online, but most of them come down to one thing: *dereferencing*.

**When we dereference a pointer, the compiler needs to know how many bytes to read from memory, and it determines that by looking at the pointer's data type.**

> If you need a reference for how many bytes each data type in C++ takes up, see [here](https://www.geeksforgeeks.org/c-data-types/). Keep in mind, though, that these sizes can vary between compilers and platforms.
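One way to see that the pointer's declared type fixes how many bytes a dereference reads is to ask the compiler for `sizeof(*p)`. In this sketch `bytes_read_on_deref` is a hypothetical helper; `sizeof` does not evaluate its operand, so `p` is never actually dereferenced (which is why passing a null pointer to it is harmless).

```cpp
#include <cstddef>

// Hypothetical helper: the number of bytes that dereferencing p would read,
// i.e. sizeof(T) for a pointer to T. The operand of sizeof is not
// evaluated, so p is never actually dereferenced.
template <typename T>
constexpr std::size_t bytes_read_on_deref(const T* p) {
  return sizeof(*p);
}

// sizeof(char) is 1 on every conforming compiler; the sizes of int, double,
// etc. are implementation-defined (commonly 4 and 8 bytes).
static_assert(sizeof(char) == 1, "guaranteed by the C++ standard");
```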

Now let's assume `int` takes 4 bytes in memory and `char` takes 1 byte. Let's say we have the following example:
```cpp
int a = 150; // <-- a takes 4 bytes in memory
int* b = &a; // <-- b is a pointer to int
int c = *b; // <-- *b reads 4 bytes starting from a's address, and c gets a copy of that value
std::cout << c; // <-- Prints 150
```

Basically, when the compiler sees `*b` in `c`'s initialization, it looks at `b`'s data type, finds that it is a pointer to `int`, and therefore reads `sizeof(int)` bytes starting from the address held by `b`, which is `a`'s address.
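To make the "read `sizeof(int)` bytes" step explicit, here is a sketch that does by hand what `*b` does for us: copy exactly `sizeof(int)` bytes starting at the address held in the pointer. `read_int_bytes` is a hypothetical name, and this illustrates the idea rather than how compilers actually emit the load.

```cpp
#include <cstring>

// Hypothetical helper: reads an int through a pointer by copying exactly
// sizeof(int) bytes from the address it holds, which is what *p does
// for an int* p.
int read_int_bytes(const int* p) {
  int value;
  std::memcpy(&value, p, sizeof(int));  // sizeof(int) bytes, starting at p
  return value;
}
```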

We can, however, mess around and convert `b` from a pointer to `int` into a pointer to `char` just before we dereference it and assign the result to `c`. In that case, we get a broken character:
```cpp
int a = 1045; // <-- a takes 4 bytes in memory
int* b = &a; // <-- b is a pointer to int
char c = *(char*)b; // <-- b is converted to a pointer to char before dereferencing, so only 1 byte is read starting from a's address
std::cout << c << "\n"; // <-- Prints a broken character (often shown as � or nothing at all)
std::cout << *b << "\n"; // <-- Prints 1045
```
This happens because `c` receives only one of the four bytes that make up `a` (`char` is 1 byte and `int` is 4 bytes), and that byte on its own is not a printable character.
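A sketch that reads just that first byte and prints it as a number instead of a character makes it easier to see what `c` actually got. It goes through an `unsigned char` pointer via `reinterpret_cast` (reading an object's bytes through a `char`/`unsigned char` pointer is allowed by C++'s aliasing rules). `first_byte_of` and `show_first_byte` are hypothetical names, and the value 21 assumes a little-endian machine such as x86, where the least significant byte is stored first.

```cpp
#include <iostream>

// Hypothetical helper: returns only the first byte of an int's storage,
// read through an unsigned char pointer. On a little-endian machine this
// is the least significant byte.
int first_byte_of(const int& value) {
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
  return bytes[0];  // one byte, not sizeof(int) bytes
}

// Hypothetical helper: prints the first byte of 1045 as a number rather
// than as a character.
void show_first_byte() {
  std::cout << first_byte_of(1045) << "\n";  // 21 on a little-endian machine
}
```

1045 is `0x415`, so its lowest byte is `0x15`, i.e. 21, which is a non-printable control character in ASCII.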

---

Another case where this behavior comes into play is during pointer arithmetic operations. Consider the following example:
```cpp
int a[] = {1045, 2021, 3012};
int* ptr = a; // <-- ptr points to the address of the first element in a

// loop through every element of a
for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++) { 
  std::cout << *ptr << " ";
  ptr++; // <-- ptr advances sizeof(int) bytes (4 here) to the next element
}
```

The output is:
```
1045 2021 3012
```
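We can confirm that `ptr++` moves by the size of the pointee rather than by one byte by measuring the distance between `p` and `p + 1` in bytes. The template below is a hypothetical helper; it converts both addresses to `const char*` so that subtracting them counts bytes instead of elements.

```cpp
#include <cstddef>

// Hypothetical helper: how many bytes separate p and p + 1.
// For a pointer to T, adding 1 advances the address by sizeof(T) bytes.
template <typename T>
std::ptrdiff_t bytes_per_step(const T* p) {
  const char* before = reinterpret_cast<const char*>(p);
  const char* after = reinterpret_cast<const char*>(p + 1);
  return after - before;  // difference of char pointers = distance in bytes
}
```

Called with the `int` array from the loop above, it returns `sizeof(int)`; called with a `char` array, it returns 1.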

But what if I mess around with the pointer used in the loop?
```cpp
int a[] = {1045, 2021, 3012};
int* ptr = a; // <-- ptr points to the address of the first element in a
char* ptr2 = (char*) ptr; // <-- ptr (converted to pointer to char) is assigned to ptr2

// loop through every element of a
for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++) {
  std::cout << (int) *ptr2 << " ";
  ptr2++; // <-- ptr2 jumps to the next byte
}
```

The output will be:
```
21 4 0
```

If we look at the binary representation of the first element of `a` (1045), which takes 4 bytes in memory, written with the most significant byte first, we get:
```
00000000 00000000 00000100 00010101
```

Converting each byte to an integer gives:
```
0 0 4 21
```

The output `21 4 0` is the first three of these bytes in reverse order. Most machines today (including x86 and most ARM systems) are little-endian: they store the least significant byte of a multi-byte value at the lowest address, so 1045 sits in memory as `21 4 0 0`. Because `ptr2` is a pointer to `char`, it moves forward only one byte per iteration, and since the loop runs only 3 times, it reads just the first three bytes of `a[0]` and never reaches the next element of `a`.
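Here is a sketch that walks the bytes of an `int` the same way `ptr2` does, but for all `sizeof(int)` bytes, plus a runtime check for byte order. Both function names are hypothetical; on a little-endian machine `bytes_in_memory_order(1045)` yields 21, 4, 0, 0, the reverse of the binary listing above. (C++20 also provides `std::endian` in `<bit>` for the byte-order check.)

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the bytes of an int in the order they are stored in
// memory, read one at a time through an unsigned char pointer.
std::vector<int> bytes_in_memory_order(int value) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
  std::vector<int> out;
  for (std::size_t i = 0; i < sizeof(int); ++i) {
    out.push_back(p[i]);  // p + i is i bytes ahead, since p points to unsigned char
  }
  return out;
}

// Hypothetical helper: true if the least significant byte is stored first.
bool is_little_endian() {
  const int one = 1;
  return *reinterpret_cast<const unsigned char*>(&one) == 1;
}
```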

### But What about `void*`?
According to [cppreference.com](https://cppreference.com), `void` is "an incomplete type that cannot be completed". 

But what is an "incomplete type"?
Well, according to [Microsoft](https://docs.microsoft.com/en-us/cpp/c-language/incomplete-types?view=msvc-170#:~:text=An%20incomplete%20type%20is%20a%20type%20that%20describes%20an%20identifier%20but%20lacks%20information%20needed%20to%20determine%20the%20size%20of%20the%20identifier.), it's a type that describes an identifier but lacks the information needed to determine the identifier's size.

Basically, the following piece of code won't even compile:
```cpp
int a = 1045;
void* b = &a; // <-- any object pointer converts to void* implicitly
std::cout << *b << "\n"; // <-- compile error: a void* cannot be dereferenced
```

This makes sense: if pointers need a data type because dereferencing them requires the compiler to know the size of the data they point to, then a pointer to `void` cannot be dereferenced, because the size of the data it points to cannot be determined.

Similarly, since pointer arithmetic requires the compiler to know how many bytes to jump over in memory, we can't do it with a pointer to `void` either (standard C and C++ reject it, although GCC accepts it in C as a GNU extension that treats `void` as 1 byte):
```cpp
int a[] = {1045, 2021, 3012};
int* ptr = a; // <-- ptr points to the address of the first element in a
void* ptr2 = (void*) ptr; // <-- ptr (converted to pointer to void) is assigned to ptr2
  
// loop through every element of a
for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++) {
  ptr2++; // <-- compile error: void has no size, so there is no step size for ptr2
}
```
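The usual way around both restrictions is to convert the `void*` back to a typed pointer before using it: back to `int*` to read or step through elements, or to `unsigned char*` to move by raw bytes. This sketch uses `static_cast`, the standard conversion from `void*` to an object pointer. The function names are hypothetical, and the code is only correct when the `void*` really does point to `int`s.

```cpp
#include <cstddef>

// Hypothetical helper: dereferencing works again once the type is restored.
int read_int_through_void(const void* p) {
  const int* ip = static_cast<const int*>(p);  // restore the pointee type
  return *ip;                                  // reads sizeof(int) bytes
}

// Hypothetical helper: sums `count` ints that `data` points to. After the
// cast, p[i] steps sizeof(int) bytes per element.
int sum_ints_via_void(const void* data, std::size_t count) {
  const int* p = static_cast<const int*>(data);
  int total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    total += p[i];
  }
  return total;
}

// Hypothetical helper: moves a void* forward by n bytes by doing the
// arithmetic on an unsigned char pointer, whose pointee is exactly 1 byte.
const void* advance_bytes(const void* data, std::size_t n) {
  return static_cast<const unsigned char*>(data) + n;
}
```

With the array from the example above, `sum_ints_via_void(a, 3)` walks all three elements, and `advance_bytes(a, sizeof(int))` lands on `a[1]`.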
