Scientific Applications

Further Features

This exercise will illustrate some further features of Excel. You will need to explore some of the facilities of Excel using the hints given below. It illustrates some simple approaches for the graphical and stastical analysis of data, and provides some experience in graphics, using functions and add-ins in Excel.

The spreadsheet sales.xls is taken from US economic statistics between 1979 and 1989 and attempts to relate sales to income and employment. The data is arranged in seven columns, A to G, as follows.

```A    TIME: Quarter, from 1st quarter 1979 to 4th quarter 1989
B    WASA: National income wage and salary disbursements (\$ billions)
C    EMPL: Employees on payrolls of non-agricultural establishments (thousands)
D    BLDG: Building material dealer sales (\$ millions)
E    AUTO: Automotive dealer sales (\$ millions)
F    FURN: Furniture and home furnishings dealer sales (\$ millions)
G    GMER: General merchandise dealer sales (\$ millions)
```

1. In column I, place the month and year of the start of each quarter as follows.
1. In cell I4 enter 01/79. If correctly done, this will be represented by Jan-79 in the spreadsheet.
2. Use the EDATE function to fill cell I5 to I47 with successive dates 3 months apart (you will need to work out how to use this function, in cell I47 you should have the date Oct-89).
3. Change the format of the cells I4 to I47 from "Jan-79" to "January 1979" up to "October 1989" (you will need to use the "format" menu item and select the "custom" option and may need to experiment with the result).

2. Copy columns B to G to columns J to O, so that the new dates are to the left of the main data.

3. Using the Data Analysis Add-ins, and the Correlation option, produce a correlation matrix between the 6 variables (WASA to GMER). Place this correlation matrix in a portion of the spreadsheet starting at cell B2, and print out this information (the data should be in cells B4 to H10, you need to select this region and when printing use the "Selection" option).

1. Which two variables have the highest correlation coefficient? These represent the two variables are follow the closest commontrends. Produce a scatter plot of these variables. Produce this on a separate sheet (as a chart), and change the appearance as follows.
1. Change the scales of both axes so that the minimum values are slightly below the true minima of each variable.
2. Remove gridlines, the box around the chart, change the background to white, remove the legend.
3. Change the symbols to blue crosses that are 8 points in size.
4. Produce a title to the graph, explaining what the variables are, making the font 14 point bold italic and Arial.
Printout this graph.

1. Produce a similar graph for the variables with the lowest correlation coefficient.

1. A histogram of the first variable against time is given below.

(click for larger version)

Produce this histogram in Excel from your data, taking care to reproduce all the features above including axes, labelling and thick bars. You may need to spend some while experimenting with the features in Excel to obtain such a histogram.
Printout the graph.

1. Produce similar histograms for all the other 5 variables, remembering to choose sensible scales for the vertical axes. An efficient way to do this is to make a copy of the chart (using "Edit" and "Copy Sheet") then change the data using the Graph Wizard, finally change the vertical scale and the title. Make sure the graphs are saved, you will be asked to print them out in a single Word document below.
2. Open a new document using word, and copy the six graphs obtained in steps 6 and 7 into Word, using "Paste Special" rather than "Paste". Arrange these graphs three to a page (you may need to change the size), and print out the Word document containing all six graphs.

1. Use a linear model to predict the sales of furniture as a function of employment. You have already encountered regression tools in previous exercises.
1. Select the "Regression" option from the "Data Analysis" Tools.
2. Select "Furniture" and the y and "Employment" as the x variables.
3. When performing regression, do not use "constant =0". Select "Residuals" and "Line Fit Plot" which will provide additional information you are asked to use below.
4. Make sure that the results do not overwrite the original data.
5. Improve the appearance of the "Line Fit Plot". You may wish to select it and then using the chart wizard create a new Sheet. You will wish to change axes, symbols etc.
6. What are the following: -
1. Equation for best fit straight line;
2. the root mean square residual [you will need to use the SQRT and SUMSQ functions in Excel]?

1. Calculate the average of each of the 6 variables using the AVERAGE function. Then, in six separate columns, one for each variable, display the word "yes" if a variable is above its average or "no" if below. You will need to use the IF function and also are likely to have to use \$ notation, although there are a number of ways of doing this. Remember not to overwrite the existing information. Label the columns (with the variable names) and the rows with the date as obtained in step 1. Create a final column to the right, which counts the number of variables that are above average (a number between 0 and 6), you will need to use the COUNTIF function.
Print out the portion of the spreadsheet which should consist of 8 columns (1 containing the date, 6 containing the words "yes" or "no" and 1 containing a number between 0 and 6, together with a row that gives a title to each column).