Scientific Applications

Use of Excel, a common spreadsheet, under Windows

This exercise demonstrates a number of features of Excel. It assumes some familiarity with the Windows operating system, including file management and storage. If you are not familiar with Windows you should ask a demonstrator.

This handout is for users of Windows NT, Windows 95 or higher.

Some data for this example has already been generated, and is in the file EXPT1.

The first step is to read this information into Excel. There are various ways of doing this. One method is to use the File menu, normally located at the top lefthandside of the Excel worksheet. When you choose to Open the RDATA file, you may well find that the spreadsheet cannot find this file, even if you search the appropriate directory. The reason for this is that the data, in common with most data, is stored in text format. To find this file, you should select All File types. The RDATA file can then be found, and opened.

In order to correctly import the data, you need to recognise that the numbers are delimited, i.e. invisible characters are between each number. The spreadsheet has to interpret this, and only by choosing the correct option will Excel correctly interpret the data.

You are advised to save this file immediately, and regularly save the spreadsheet as you carry on through this exercise. Choose a new name and place it on your disc.

Fitting a straight line

The first task we will perform involves fitting a straight line. If correctly opened, 20 x values should be in the cells A1 to A20, and 20 y values in B1 to B20. By choosing the Tools, then Data Analysis and then Regression menus, you should be given a screen that allows you to perform linear regression. Select the relevant x and y ranges. For output, specify an output range starting at F1. No other options need to be selected.

If correctly performed, the results of regression should be an equation that links y to x of the form y = a + b x.

Look at the output. The values for a and b are represented by the "Intercept" and the "x value" (= slope) respectively.

1) What cells are these numbers in?

2) What is the best fit regression equation?

A common step is to use this equation to predict the values of y from x. In order to study this, a new column, C1 to C20, will be set up. Using the values of a and b, calculate this column as follows. In cell C1 place an equation. In order that the spreadsheet recognises the equation you must precede the entry with an equals sign. Then type the cell address for the intercept (a), a + sign, the cell address for the alope (b) and * sign (indicating multiplication), and the cell address for the first x value (which is A1). The intercept and slope cell addresses must be preceded with $ signs. An address of the form $A$1 means that, if the equation is copied anywhere, row 1 and column A is retained. If the address is specified as A1 rather than $A$1, and the equation is copied, the rows and columns change relative to the first entry. Once the equation is correctly entered into cell C1, you can then copy this equation down the column to C20 by pulling at the cell boundary with the mouse.

3) What are the predicted values of y when x = 1 and when x = 12?

If performed correctly, the contents of columns B and C should be roughly equal.

4) Why are these columns not exactly equal?

The next step is to see how well the straight line is predicted. The residual is the difference between the observed and predicted value of y. Calculate this by subtracting column C from column B and placing this in column D. (Hint: use an equation). If correctly performed, the numbers in column D should be fairly small.

The size of the difference between predicted and observed values of y is a good indicator of how well the regression has worked, and it is common to calculate the root mean square error. This is done by summing the squares of the the 20 values in D, dividing by 20 and taking the square root. In order to do this we need some special functions. Excel, in common with most spreadsheets, has a variety of mathematical functions. The SUMSQ and SQRT functions are useful here. In order to choose a range of numbers, you use the symbol ":". For example, typing "=AVERAGE(A1:A20)", places the average of cells A1 to A20 into the destination cell. Calculate the root mean square error.

5) What is the root mean square error?

Graphical display of results

Plotting a graph of the data is also useful. There are a variety of ways of doing this. The simplest is by selecting the "Chart wizard" in the toolbar. If the icon is not visible, it is necessary to add the appropriate toolbar, which can be done in the View menu, selecting Toolbars.

After selecting the chart wizard, move to any portion of the spreadsheet, and select an area. Then you will be asked a number of questions. Select the data from A1 to C20. The first column will form the x axis, the second column is the observed y data, and the third the predicted y data. Select a scatter plot, and then select a graph type that contains symbols with straight lines connecting the points. Finally, do not add a legend. After leaving the chart wizard you should have produced a graph on the screen.

The graph is not in the exact format required. In order to improve the graph, you can double click on it and its parts. This takes some practice, but it is easy to learn, simply by experimenting by clicking on various portions of the graph. It may take you fifteen minutes to work out how to do this correctly, but once you learn it is quite easy. If you find it hard to select a symbol exactly, the screen magnification can be changed e.g. to 200%; remember to return the screen to normal magnification aterwards. You are asked to make various changes. (a) Change all the symbols and lines to black. (b) Remove the line joining up the points B1 to B20, and change the symbols to crosses for points B1 to B20. (c) Remove the symbols joining up points C1 to C20 but keep the line for these points. (d) Change the background to none rather than grey.

The graph can be printed out by selecting the chart and then choosing the print option.

6) Print out the graph you have produced.

Before you proceed further, save the spreadsheet, in case of any accidental deletion or computer error.

Exchange of graphs between Excel and Word

A graph in Excel can be pasted into another package, such as Word.

In order to do this, open up Word. You should be able to have both Excel and Word open simultaneously. Some familiarity with the Windows operating system will be assumed; if you have difficulties, ask a demonstrator for help at this point.

When in Word, type a first line as follows
   THIS IS AN EXAMPLE OF A CHART OBJECT FROM EXCEL.

Save this document, consisting of one line of text, as a file called "a1" on your disc (if in Windows 3.11 you should save it as a1.doc). From File manager you should be able to find out the size of this file.

7) What is the size of the file?

Now return to Excel, keeping Word open at the same time. Select the graph and Copy it. The graph is stored in a temporary file called the Clipboard, which you do not normally display on the screen. Then return to word, and edit the existing file by Pasting the graph after the first sentence. Save this file as "a2" (or a2.doc in Windows 3.11), using the Save as command in the File menu.

8) What is the size of file a2?

The file size has increased dramatically, because the picture takes up a lot of space. However, we have unnecessarily increased the size of the document. In order to be more economical in space, delete the picture from the currently open Word file. Then, select the Paste Special rather than Paste option. You will be presented with some options. Choose the "Picture" option, and click OK. Now save the new file as "a3" (or a3.doc in Windows 3.11).

9) What is the size of file a3?

The new file is smaller than the previous one because the only information transferred is the actual graph. An Excel chart object is linked to the original spreadsheet, so, if the spreadsheet is changed so is the Word document. This latter facility can be extremely useful. For example it might be possible to prepare a report containing a variety of different types of graph. As the dataset is changed, each graph changes correspondingly both in Word and Excel without the need to repeatedly cut and paste, but this flexibility results in a large increase in disc storage space.

A lot of the commands above are timeconsuming to perform, and can be automated.

Use of Macros

In order to automate commands, you use the Macro facility. In order to enter this you should select the Record Macro facility from the Tools menu. Once you do this, every command on the screen will be recorded and form part of a small program. You should give the macro a name which in effect corresponds to the name of a procedure called by Excel.

An interesting plot is the residual plot, using the numbers in cells D1 to D20. Enter the chart wizard, find a convneient portion of the screen to display the graph, select cells A1 to A20 and D1 to D20 (type a1:a20,d1:d20 when asked for the Range). Then select the scatter plot option, and graph type 1. Do not add a legend, but give the graph a title "Residual plot". An appropriate graph should then be displayed on the screen. Stop recording the macro.

A module, probably called "Module1", will be created. This can be found by selecting the Visual Basic Editor from the Macro submenu in the Tools menu. A small program, written in the language Visual Basic, will be displayed. If you get into a mess, or want to change the macro, it is easy to delete this program simply by entering Module1, and selecting the text and deleting it. You can then start again and recreate the macro.

To return to the worksheet from the module, simply select Sheet1. You can interchange between these two screens whenever you wish.

10) When you are satisfied with the macro, print off the small program.

In order to demonstrate the program in use, delete the residual plot. Then select the Macro facility again, but this time run the macro you have created. The plot should be correctly created.

Spend some time improving the two graphs you have on the screen, changing symbols, background, title, axes, legend etc. as you wish to make the graphs look clear. Remember that the graphs will eventually be printed on a black and white printer (see below), so only use black and white otherwise there may be some unpredictable printing problems.

Writing up the results

Write up your answers to the exercises above in Word. In the answer, paste the final graphs for the regression and residual plots.

Hand in the following: -

  1. Print out of answers to the questions above from Word including the two graphs.
  2. Plot of the first regression chart obtained directly from Excel.
  3. The short macro program.