Everything Is Text

Everything inside a file is a stream of bytes. That's fine for text. Well, it would be fine for text if everyone lived in the USA and only ever needed 7-bit ASCII. Unfortunately, some people live elsewhere or have different character-set requirements, and all of them need some way of recording which character set they're using.
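
To see why a bare byte stream is not enough, here is a small illustrative Python sketch that decodes the same bytes under two different character-set assumptions. UTF-8 and ISO 8859-1 (Latin-1) are simply two common encodings chosen for the demonstration; nothing in the bytes themselves says which one the writer intended.

```python
# The same five bytes: "café" encoded as UTF-8.
data = "café".encode("utf-8")

# A reader that assumes UTF-8 recovers the original text...
as_utf8 = data.decode("utf-8")
# ...while one that assumes Latin-1 silently gets different characters.
as_latin1 = data.decode("latin-1")

# 7-bit ASCII cannot represent "é" at all.
try:
    "café".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

if __name__ == "__main__":
    print(data)       # b'caf\xc3\xa9'
    print(as_utf8)    # café
    print(as_latin1)  # cafÃ©
    print(ascii_ok)   # False
```

Both decodes succeed without any error, which is exactly the problem: the wrong guess produces plausible-looking but incorrect text.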

The assumption that everything is (and should be) text permeates the UNIX design. With very few exceptions, commands produce text and expect to consume text. This is a rather strange design choice, considering the other UNIX philosophy of putting everything into the shell, even if it belongs in a shared library. A more sensible approach would be for commands to produce typed binary data and have the shell display it, if it were intended for display.
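
A minimal Python sketch of the contrast, assuming a hypothetical in-memory directory listing rather than a real filesystem call. In the text-based version, each stage turns numbers into text and the next stage must know the column layout to parse them back; in the typed version, records pass through unchanged and only the final display step produces text. The names Entry, ls_text, bigger_than_text, bigger_than, and show are illustrative only and do not correspond to any real shell or command.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Entry:
    """A hypothetical directory entry: a name and a size in bytes."""
    name: str
    size: int


# Text-based stages: each boundary serializes to text, and the next stage
# has to know the column layout and re-parse the fields it needs.
def ls_text(entries: Iterable[Entry]) -> Iterator[str]:
    for e in entries:
        yield f"{e.size}\t{e.name}"


def bigger_than_text(lines: Iterable[str], limit: int) -> Iterator[str]:
    for line in lines:
        size_field, _name = line.split("\t", 1)  # assumes a "size<TAB>name" layout
        if int(size_field) > limit:              # text converted back to a number
            yield line


# Typed stages: records flow through unchanged; only show() produces text.
def bigger_than(entries: Iterable[Entry], limit: int) -> Iterator[Entry]:
    return (e for e in entries if e.size > limit)


def show(entries: Iterable[Entry]) -> list[str]:
    return [f"{e.name}: {e.size} bytes" for e in entries]


ENTRIES = [Entry("notes.txt", 100), Entry("photo.jpg", 5000)]

if __name__ == "__main__":
    print(list(bigger_than_text(ls_text(ENTRIES), 1000)))  # ['5000\tphoto.jpg']
    print(show(bigger_than(ENTRIES, 1000)))                # ['photo.jpg: 5000 bytes']
```

The typed filter never depends on how sizes are printed: changing show() to print kilobytes would not affect it, whereas changing the layout produced by ls_text would break bigger_than_text.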

Consider the ls command, which lists the contents of a directory. If you wanted the listing sorted case-insensitively, you would pipe the output into sort -f. Now imagine that you want the output sorted by file size. You can make ls display the file size, and then tell sort to sort by that column. This works, except that the file sizes are all in bytes (or sometimes in allocation units, usually of 512 bytes, depending on your UNIX variant). That is not very human-readable, so you tell ls to print the sizes in human-readable form: bytes, kilobytes, megabytes, and so on. Unfortunately, a plain sort doesn't understand that 1MB is bigger than 6KB, so it sorts everything into a silly order. If, on the other hand, ls output a header defining its output as a set of columns with names and types, you could tell sort to sort by the column called size, and tell your shell to translate the size into a human-readable form.
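
The following Python sketch reproduces both halves of the example with made-up file names and sizes. human_readable is a simplified stand-in for ls's human-readable formatting (its exact output is not identical to any real ls), Python's sorted() on strings plays the role of a plain text sort (comparing character by character, much as sort does in the C locale), and the HEADER/ROWS structure is a hypothetical self-describing format, not something any existing ls emits.

```python
from typing import Any, Sequence

# Hypothetical self-describing output: a header of (column name, type) pairs,
# followed by rows whose values keep those types.
Header = Sequence[tuple[str, type]]
Row = tuple[Any, ...]

HEADER: Header = [("name", str), ("size", int)]
ROWS: list[Row] = [("report.pdf", 6144), ("movie.mkv", 1_048_576), ("todo.txt", 512)]


def human_readable(size: int) -> str:
    """Format a byte count with a K/M/G/T suffix (a simplified ls-style format)."""
    if size < 1024:
        return f"{size}B"
    value = size / 1024
    for unit in ("K", "M", "G"):
        if value < 1024:
            return f"{value:.1f}{unit}"
        value /= 1024
    return f"{value:.1f}T"


def sort_by_column(header: Header, rows: Sequence[Row], column: str) -> list[Row]:
    """Sort rows by the typed values in the named column, not by their text form."""
    index = [name for name, _ in header].index(column)
    return sorted(rows, key=lambda row: row[index])


def display(header: Header, rows: Sequence[Row],
            human_columns: Sequence[str] = ("size",)) -> list[str]:
    """The 'shell' step: convert typed values to text only when showing them."""
    lines = []
    for row in rows:
        cells = []
        for (name, col_type), value in zip(header, row):
            if name in human_columns and col_type is int:
                cells.append(human_readable(value))
            else:
                cells.append(str(value))
        lines.append(" ".join(cells))
    return lines


if __name__ == "__main__":
    # Text-only pipeline: format the sizes first, then sort the strings.
    text_lines = [f"{human_readable(size)} {name}" for name, size in ROWS]
    print(sorted(text_lines))  # ['1.0M movie.mkv', '512B todo.txt', '6.0K report.pdf']

    # Typed pipeline: sort on the integer column, then format for display.
    print(display(HEADER, sort_by_column(HEADER, ROWS, "size")))
    # ['todo.txt 512B', 'report.pdf 6.0K', 'movie.mkv 1.0M']
```

The text-only sort places the 1MB file first because "1" sorts before "5" and "6"; the typed version gets the order right because it compares integers and leaves the formatting to the very last step.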
