What will we cover? |
---|
The definition of data, the different types of data from simple characters and numbers through collections and how to define your own data types. |
Data is one of those terms that everyone uses but few really understand.
My dictionary defines it as:
Data: "facts or figures from which conclusions can be inferred; information"
That's not much better but at least gives a starting point.
Data is the stuff that your program manipulates. Without data a program cannot usefully exist. Programs manipulate data in many ways, often depending on the type of the data. And it comes in many types:
We've already seen character strings. They are literally any string or sequence of characters that can be printed on your screen. (In fact there can even be non-printable control characters too.)
In Python, strings can be represented in several ways:
With single quotes:
'Here is a string'
With double quotes:
"Here is a very similar string"
With triple double quotes:
""" Here is a very long string that can
if we wish span several lines and Python will
preserve the lines as we type them..."""
One special use of the latter form is to build in documentation for Python functions that we create ourselves - we'll see this later.
You can access the individual characters in a string by treating it as an array of characters (see arrays below). There are also usually some operations provided by the programming language to help you manipulate strings: find a substring, join two strings, copy one to another, etc.
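Here is a minimal sketch of those string operations in Python. The variable names are my own, chosen just for illustration, and I've used the parenthesised form of print so that it runs on current versions of Python too.

```python
# A hypothetical example string; any text would do.
greeting = "Here is a string"

first_char = greeting[0]             # characters are numbered from 0
position = greeting.find("string")   # where the substring starts, or -1 if absent
joined = greeting + " and another"   # join two strings together
copy = greeting                      # a second name for the same string

print(first_char)
print(position)
print(joined)
```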
Integers are whole numbers from a large negative value through to a large positive value. The largest positive value is known as MAXINT and depends on the number of bits used on your computer to represent a number. On a 32 bit computer, for example, MAXINT is around 2 billion.
You can also get unsigned integers which basically are positive and zero only. Thus there is a bigger maximum number of around 2 * MAXINT or 4 billion on a 32 bit computer.
Because integers are restricted in size to MAXINT, adding two integers together where the total is greater than MAXINT causes the total to be wrong. On some systems/languages the wrong value is just returned as is (usually with some kind of secret flag raised that you can test if you think it might have been set). Normally an error condition is raised and either your program can handle the error or the program will exit. Early versions of Python adopt this latter approach (later versions avoid the problem by letting integers grow beyond MAXINT) while Tcl adopts the former. BASIC throws an error but provides no way to catch it (at least I don't know how!)
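To get a feel for the numbers involved, here is a rough sketch that works out MAXINT for an assumed 32 bit word and imitates the "wrong value returned as is" behaviour. The function wrap_signed is a made-up helper, not part of Python, and the wrap-around it simulates is only one common way that fixed size integers misbehave.

```python
BITS = 32                        # assumed word size
MAXINT = 2 ** (BITS - 1) - 1     # largest signed value, around 2 billion
MAX_UNSIGNED = 2 ** BITS - 1     # largest unsigned value, around 4 billion

def wrap_signed(value, bits=BITS):
    """Return value as a fixed-width signed integer would store it."""
    modulus = 2 ** bits
    value = value % modulus              # keep only the lowest `bits` bits
    if value > 2 ** (bits - 1) - 1:      # top bit set means a negative number
        value = value - modulus
    return value

print(MAXINT)
print(MAX_UNSIGNED)
print(wrap_signed(MAXINT + 1))   # one too many wraps round to a large negative value
```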
Real numbers are fractions. They can represent very large numbers, much bigger than MAXINT, but with less precision. That is to say that 2 real numbers which should be identical may not seem to be when compared by the computer. This is because the computer only approximates some of the lowest details. Thus 4.0 could be represented by the computer as 3.9999999.... or 4.000000....01. These approximations are close enough for most purposes but occasionally they become important! If you get a funny result when using real numbers, bear this in mind.
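A short sketch of the kind of funny result you might meet. The usual workaround, shown here with a tolerance value I picked arbitrarily, is to test whether two real numbers are close enough rather than exactly equal.

```python
total = 0.1 + 0.2

# The stored values are approximations, so this comparison fails.
print(total == 0.3)

# Compare against a small tolerance instead (1e-9 is an arbitrary choice).
tolerance = 1e-9
close_enough = abs(total - 0.3) < tolerance
print(close_enough)
```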
If you have a scientific or mathematical background you may be wondering about complex numbers. If you don't, you may not even have heard of them! Anyhow some programming languages, notably Fortran and Python, provide builtin support for the complex type but most provide a library of functions which can operate on complex numbers. And before you ask, the same applies to matrices too.
A boolean type has only 2 values - either true or false. Some languages support boolean values directly, others use a convention whereby some numeric value (often 0) represents false and another (often 1 or -1) represents true.
Boolean values are sometimes known as "truth values" because they are used to test whether something is true or not. For example if you write a program to back up all the files in a directory you might back up each file then ask the operating system for the name of the next file. If there are no more files to save it will return an empty string. You can then test to see if the name is an empty string and store the result as a boolean value (true if it is empty). You'll see how we would use that result later on in the course.
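The empty string test might be sketched like this in Python. The file name is invented, and a real program would get it from the operating system rather than typing it in. Current versions of Python print the results as True and False; very old ones show 1 and 0 instead.

```python
# Pretend the operating system has just given us a file name.
name = "notes.txt"
empty_before = (name == "")   # there is a name, so the test is false
print(empty_before)

# Now pretend there are no more files left.
name = ""
empty_after = (name == "")    # the name is empty, so the test is true
print(empty_after)
```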
Computer science has built a whole discipline around studying collections and their various behaviours. Some of the names you might see are:
>>> dict = {}
>>> dict['boolean'] = "A data item whose value can be either true or false"
>>> dict['integer'] = "A whole number"
>>> print dict['boolean']
There's a whole bunch of others but these are the main ones that we deal with. (In fact we'll only be dealing with some of these!)
As a computer user you know all about files - the very basis of nearly everything we do with computers. It should be no surprise then, to discover that most programming languages provide a special file type of data. However files and the processing of them are so important that I will defer discussing them till later when they get a whole section to themselves.
Dates and times are often given dedicated types in programming. At other times they are simply represented as a large number (typically the number of seconds from some arbitrary date/time!). In other cases the data type is what is known as a complex type as described in the next section. This usually makes it easier to extract the month, day, hour etc.
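As a sketch of both representations, Python's standard time module can turn a count of seconds into a structure with separate fields. This assumes the usual starting point of 1 January 1970, which is what Python uses on common platforms, and the number of seconds is just a sample value.

```python
import time

seconds = 1000000000            # a sample time stored as one large number
moment = time.gmtime(seconds)   # the same time as a structure with named fields

# The individual parts are now easy to extract.
print(moment.tm_year)
print(moment.tm_mon)
print(moment.tm_mday)
print(moment.tm_hour)
```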
Sometimes the basic types described above are inadequate even when combined in collections. Sometimes what we want to do is group several bits of data together then treat it as a single item. An example might be the description of an address: a house number, a street and a town. Finally there's the post code or zip code. Most languages allow us to group such information together in a record or structure.
In BASIC such a record definition looks like:
Type Address
     Hs_Number AS INTEGER
     Street AS STRING
     Town AS STRING
     Zip_Code AS STRING
End Type
In Python it's a little different:
class Address:
    def __init__(self, Hs, St, Town, Zip):
        self.Hs_Number = Hs
        self.Street = St
        self.Town = Town
        self.Zip_Code = Zip
That may look a little arcane but don't worry it will make sense soon.
We'll look at how to use these structures in the next section on Variables.
Data is stored in the memory of your computer. You can liken this to the big wall full of boxes used in mail rooms to sort the mail. You can put a letter in any box but unless the boxes are labelled with the destination address it's pretty meaningless. Variables are the labels on the boxes in your computer's memory.
Knowing what data is is OK, but what can we do with it? In programming terms we can create instances of data objects and assign them to variables. A variable is a reference to a specific area somewhere in the computer's memory. These areas hold the data. In some computer languages a variable must match the type of data that it points to. For example, in BASIC we declare a string variable by putting a $ at the end of the name:
DIM MYSTRING$
MYSTRING$ = "Here is a string"
Here DIM MYSTRING$ creates the label and specifies that it will hold a string (because of the $ sign). The MYSTRING$ = "Here..." line creates the actual data and puts it in the bit of memory labelled MYSTRING$.
Similarly we declare an integer by putting a % at the end:
DIM MYINT%
MYINT% = 7
In Python and Tcl a variable takes the type of the data assigned to it. It will keep that type and Python or Tcl will warn if you try to mix data in strange ways - like trying to add a string to a number. (Recall the example error message earlier?). We can change the type of data that a Python variable points to by reassigning the variable.
>>> q = 7
>>> print 2*q
14
>>> q = "Seven"
>>> print 2*q
SevenSeven
Note that q was set to point to 7 initially. It maintained that value until we made it point at "Seven". Thus, Python variables maintain the type of whatever they point to, but we can change what they point to simply by reassigning the variable. At that point the original data is 'lost' and Python will erase it from memory (unless another variable points at it too). This is known as garbage collection. (Garbage collection can be likened to the mailroom clerk who comes round once in a while and removes any packets that are in boxes with no labels. If he can't find an owner or address on the packets he throws them in the garbage!)
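The "unless another variable points at it too" case can be sketched as follows. The names first and second are purely illustrative.

```python
first = "Seven"
second = first    # two labels now point at the same string

first = 7         # move one label to a different piece of data

# The string still has a label on it, so it has not been thrown away.
print(second)
print(first)
```

Only when second is also reassigned would the string "Seven" be left with no labels and become a candidate for the garbage.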
BASIC will not allow you to do this. If a variable is a string variable (terminated with a $) you cannot ever assign a number to it. Similarly, if it is an integer variable (ends in %) you cannot assign a string to it. BASIC does allow 'anonymous variables' that don't end in anything. These can only store numbers however, either real or integer numbers but only numbers.
One final gotcha with integer variables in BASIC:
i% = 7
PRINT 2 * i%
i% = 4.5
PRINT 2 * i%
Notice that the assignment of 4.5 to i% seemed to work but only the integer part was actually assigned. This is reminiscent of the way Python dealt with division of integers. All programming languages have their own little idiosyncrasies like this!
We can assign a complex data type to a variable too, but to access the individual fields of the type we must use some special access mechanism (which will be defined by the language). Usually this is a dot.
To consider the case of the address type we defined above we would do this in BASIC:
DIM Add AS Address
Add.Hs_Number = 7
Add.Street = "High St"
Add.Town = "Anytown"
Add.Zip_Code = "123 456"
PRINT Add.Hs_Number," ",Add.Street
And in Python:
Add = Address(7,"High St","Anytown","123 456")
print Add.Hs_Number, Add.Street
Let's see what we can do with variables now that we know what they are and how to create them.
If you have any questions or feedback on this page
send me mail at:
alan_gauld@xoommail.com