Communication in any format requires a shared understanding between the communicating parties. In some cases, these standards are living standards. Human languages, for example, change over time. In the world of computing, however, standards tend to be more set in stone. That doesn’t mean that every standard agrees with every other, though.
A great example of this is endianness. Data is technically stored at the bit level but typically accessed at the byte level. Endianness describes the order of the bytes within a multi-byte value. There are two main types: big endian and little endian. In a big-endian system, the most significant byte is stored at the lowest memory address, while the least significant byte is stored at the highest memory address. Little-endian systems do the reverse, with the least significant byte at the lowest memory address and the most significant byte at the highest memory address.
Tip: Endianness can also refer to the order of bits being transmitted over a communications channel. The naming convention remains the same. Big-endian communications transmit the most significant bit first.
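As a rough illustration of the two orderings, the sketch below uses Python's built-in `int.to_bytes` to lay out the same number both ways. The value `0x12345678` is an arbitrary example chosen here, not something from a particular system.

```python
value = 0x12345678  # arbitrary 32-bit example value (an assumption for illustration)

# Index 0 of each bytes object stands in for the lowest memory address.
big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
```

The two layouts hold exactly the same bytes, just in opposite order.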
An example
Let’s say we want to store the string “John”, with each character represented by its ASCII code in hexadecimal. J is 4a, o is 6f, h is 68, and n is 6e. In order, that gives 0x4a6f686e.
Note: The 0x at the start just indicates that the value that follows is written in hexadecimal.
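The character codes can be checked with a few lines of Python. This is only a small sketch using the standard `str.encode` and `bytes.hex` methods. The variable names are made up for the example.

```python
text = "John"
encoded = text.encode("ascii")  # one byte per ASCII character

# Print each character next to its two-digit hexadecimal code.
for char, byte in zip(text, encoded):
    print(char, format(byte, "02x"))

as_hex = "0x" + encoded.hex()
print(as_hex)  # 0x4a6f686e
```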
On a big-endian system, the most significant byte is in the lowest memory address. That means that the first memory address stores “4a” representing “J”.
Big-endian (memory addresses increase from left to right)
Hex   | 4a | 6f | 68 | 6e |
ASCII | J  | o  | h  | n  |
This all seems very straightforward and easy to understand, especially to people whose native language is written left to right. Little endian does the exact opposite: the least significant byte is stored in the lowest memory address.
Little-endian (memory addresses increase from left to right)
Hex   | 6e | 68 | 6f | 4a |
ASCII | n  | h  | o  | J  |
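To see both layouts side by side, here is a minimal sketch that treats “John” as a single 32-bit number and asks Python for its bytes in each order. It also shows what happens when bytes written in one order are read back assuming the other. The variable names are illustrative only.

```python
value = 0x4a6f686e  # "John" packed into one 32-bit number

big_layout = value.to_bytes(4, "big")
little_layout = value.to_bytes(4, "little")

print(big_layout)     # b'John'
print(little_layout)  # b'nhoJ'

# Reading the little-endian bytes as if they were big-endian
# produces a different number entirely.
misread = int.from_bytes(little_layout, "big")
print(hex(misread))   # 0x6e686f4a
```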
Again, to native speakers of languages written left to right, this will look wrong. It can make more sense to native speakers of languages written right to left though.
Helpfully, a few systems use a middle-endian scheme, which muddies the waters further. In hexadecimal, each pair of hex characters represents a byte. This means that “John” takes four bytes, or 32 bits, to store. Some systems, like the PDP-11, natively use 16-bit words and store the bytes of each word in little-endian order. To represent a 32-bit value, the PDP-11 combines two 16-bit words, but it orders those words big-endian, with the most significant word first. This means that you get the following.
Middle-endian (memory addresses increase from left to right)
Hex   | 6f | 4a | 6e | 68 |
ASCII | o  | J  | n  | h  |
Now nobody is happy.
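The mixed ordering is easier to follow in code. The function below is a hypothetical helper written for this article, not part of any library. It splits a 32-bit value into two 16-bit words, puts the most significant word first, and stores each word's bytes in little-endian order.

```python
def to_pdp_endian(value: int) -> bytes:
    """Lay out a 32-bit value as two 16-bit words: high word first,
    with the bytes inside each word in little-endian order."""
    high_word = (value >> 16) & 0xFFFF  # most significant 16 bits
    low_word = value & 0xFFFF           # least significant 16 bits
    return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")


layout = to_pdp_endian(0x4a6f686e)
print(layout.hex(" "))  # 6f 4a 6e 68
print(layout)           # b'oJnh'
```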
Why is this an issue
First things first, it’s important that the device storing the data and the device interpreting it agree on the byte order. Even though the little-endian system stores “John” as “nhoJ”, the processor knows this and knows how to interpret it when displaying it on the screen, for example. This actually oversimplifies things though. CPUs are designed to operate in one way or the other, though some can be switched between the two. Storage doesn’t actually care what order data is stored in at all. It’s down to software to ensure that the data is presented to the CPU in the right way.
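In practice, software usually handles this by stating the byte order explicitly whenever data leaves the CPU. The sketch below uses Python's standard `struct` module, where `>` selects big-endian and `<` selects little-endian. The scenario of one writer and two readers is invented for illustration.

```python
import struct
import sys

# The host's native order: "little" on x86-64, "big" on big-endian machines.
print(sys.byteorder)

value = 0x4a6f686e

# The writer chooses big-endian explicitly, regardless of the host CPU.
stored = struct.pack(">I", value)

# A reader that agrees on the order recovers the value.
(read_ok,) = struct.unpack(">I", stored)

# A reader that assumes little-endian gets the bytes reversed.
(read_bad,) = struct.unpack("<I", stored)

print(hex(read_ok))   # 0x4a6f686e
print(hex(read_bad))  # 0x6e686f4a
```

Because the format string fixes the order, the same code gives the same result on both kinds of host.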
Some operations are more favourably performed by starting from the most or least significant bytes. In modern computers with large word sizes where multiple bytes are fetched from RAM at the same time, these issues are less severe as the whole operand can be accessed at the same time. Smaller microprocessors, however, may end up requesting individual bytes and so may see more of a performance impact from endianness.
For example, when adding two numbers together, the computer starts from the least significant bit and carries any overflow up to the next bit. To that end, it is most efficient to request the least significant byte first, as would be done in a little-endian system. The same holds for subtraction and multiplication. Comparison and division, however, start from the most significant bit and work downwards. For these, it is most efficient to request the most significant byte first, as done in a big-endian system. In systems that can’t access the whole operand in one transaction, endianness has a performance impact.
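The following sketch mimics a processor that can only fetch one byte at a time. Both functions are hypothetical helpers written for this example. Addition walks a little-endian value from the lowest address upwards, carrying as it goes, while comparison walks a big-endian value from the lowest address and can stop at the first byte that differs.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian numbers one byte at a time."""
    result = bytearray()
    carry = 0
    for byte_a, byte_b in zip(a, b):  # index 0 is the least significant byte
        total = byte_a + byte_b + carry
        result.append(total & 0xFF)   # keep the low 8 bits
        carry = total >> 8            # pass the overflow to the next byte
    return bytes(result)              # a carry out of the top byte is dropped


def compare_big_endian(a: bytes, b: bytes) -> int:
    """Compare two equal-length big-endian numbers: -1, 0, or 1."""
    for byte_a, byte_b in zip(a, b):  # index 0 is the most significant byte
        if byte_a != byte_b:
            return -1 if byte_a < byte_b else 1  # decided, no more fetches
    return 0


x = (0x01FF).to_bytes(2, "little")
y = (0x0001).to_bytes(2, "little")
print(add_little_endian(x, y).hex(" "))  # 00 02, which is 0x0200
```

In both cases the byte needed first sits at the lowest address, so the work can begin as soon as the first fetch completes.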
Conclusion
Endianness is a description of the ordering of data. In a big-endian system, the most significant byte is stored in the lowest memory address and is thus accessed first. In a little-endian system, the least significant byte is stored in the lowest memory address and is thus accessed first. Different operations suit different byte orders. Addition starts from the least significant bit while division starts from the most significant bit. In modern systems with large word sizes, the performance issues are generally moot, but they can still affect embedded systems. x86-64 CPUs are little-endian.