# Scientific Applications

### Further Features

This exercise will illustrate some further features of Excel. You will need to explore some of the facilities of Excel using the hints given below. It illustrates some simple approaches for the graphical and stastical analysis of data, and provides some experience in graphics, using functions and add-ins in Excel.

The spreadsheet sales.xls is taken from US economic statistics between 1979 and 1989 and attempts to relate sales to income and employment. The data is arranged in seven columns, A to G, as follows.

A TIME: Quarter, from 1st quarter 1979 to 4th quarter 1989 B WASA: National income wage and salary disbursements ($ billions) C EMPL: Employees on payrolls of non-agricultural establishments (thousands) D BLDG: Building material dealer sales ($ millions) E AUTO: Automotive dealer sales ($ millions) F FURN: Furniture and home furnishings dealer sales ($ millions) G GMER: General merchandise dealer sales ($ millions)

- In column I, place the month and year of the start of each quarter as
follows.
- In cell I4 enter 01/79. If correctly done, this will be represented by Jan-79 in the spreadsheet.
- Use the EDATE function to fill cell I5 to I47 with successive dates 3 months apart (you will need to work out how to use this function, in cell I47 you should have the date Oct-89).
- Change the format of the cells I4 to I47 from "Jan-79" to "January 1979" up to "October 1989" (you will need to use the "format" menu item and select the "custom" option and may need to experiment with the result).

- Copy columns B to G to columns J to O, so that the new dates are to the left of the main data.
- Using the Data Analysis Add-ins, and the Correlation option, produce a correlation matrix between the 6 variables (WASA to GMER). Place this correlation matrix in a portion of the spreadsheet starting at cell B2, and print out this information (the data should be in cells B4 to H10, you need to select this region and when printing use the "Selection" option).

This printout should be included in your answer.

- Which two variables have the highest correlation coefficient? These represent
the two variables are follow the closest commontrends. Produce a scatter plot of
these variables. Produce this on a separate sheet (as a chart), and change the
appearance as follows.
- Change the scales of both axes so that the minimum values are slightly below the true minima of each variable.
- Remove gridlines, the box around the chart, change the background to white, remove the legend.
- Change the symbols to blue crosses that are 8 points in size.
- Produce a title to the graph, explaining what the variables are, making the font 14 point bold italic and Arial.

This printout should be included in your answer.

- Produce a similar graph for the variables with the
*lowest*correlation coefficient.

This printout should be included in your answer.

- A histogram of the first variable against time is given below.

(click for larger version)

Produce this histogram in Excel from your data, taking care to reproduce all the features above including axes, labelling and thick bars. You may need to spend some while experimenting with the features in Excel to obtain such a histogram.

Printout the graph.

This printout should be included in your answer.

- Produce similar histograms for all the other 5 variables, remembering to choose sensible scales for the vertical axes. An efficient way to do this is to make a copy of the chart (using "Edit" and "Copy Sheet") then change the data using the Graph Wizard, finally change the vertical scale and the title. Make sure the graphs are saved, you will be asked to print them out in a single Word document below.
- Open a new document using word, and copy the six graphs obtained in steps 6 and 7 into Word, using "Paste Special" rather than "Paste". Arrange these graphs three to a page (you may need to change the size), and print out the Word document containing all six graphs.

This printout should be included in your answer.

- Use a linear model to predict the sales of furniture as a function of
employment. You have already encountered regression tools in previous
exercises.
- Select the "Regression" option from the "Data Analysis" Tools.
- Select "Furniture" and the y and "Employment" as the x variables.
- When performing regression, do not use "constant =0". Select "Residuals" and "Line Fit Plot" which will provide additional information you are asked to use below.
- Make sure that the results do not overwrite the original data.
- Improve the appearance of the "Line Fit Plot". You may wish to select it and then using the chart wizard create a new Sheet. You will wish to change axes, symbols etc.
- What are the following: -
- Equation for best fit straight line;
- the root mean square residual [you will need to use the SQRT and SUMSQ functions in Excel]?

A printout from question (e) should be included in your answer. In addition, include also the answers to (f).

- Calculate the average of each of the 6 variables using the AVERAGE
function. Then, in six separate columns, one for each variable, display the
word "yes" if a variable is above its average or "no" if below. You will need
to use the IF function and also are likely to have to use $ notation, although
there are a number of ways of doing this. Remember not to overwrite the existing
information. Label the columns (with the variable names) and the rows with the
date as obtained in step 1. Create a final column to the right, which counts
the number of variables that are above average (a number between 0 and 6), you
will need to use the COUNTIF function.

Print out the portion of the spreadsheet which should consist of 8 columns (1 containing the date, 6 containing the words "yes" or "no" and 1 containing a number between 0 and 6, together with a row that gives a title to each column).

This printout should be included in your answer.

- If you have not already done so, you are advised to copy the original
data into a new worksheet at this point. The aim of this exercise is to look
at seasonal trends.
- Place the data starting in cell A2 of the new sheet, so that A2 corresponds to the date Jan 1979 and B2 should contain the number 1193.3.
- Copy the column labels from cells B1 to G1 to cells I1 to N1, and the dates from cells A2 to A45 to cells H2 to H45.
- In cell I2 calculate the difference between the Jan WASA for 1979 and the annual average of WASA in 1979, in cell I3 the difference between the Apr WASA and the annual average in 1979, in cell I4, the July WASA and the annual average and cell I5 the Oct WASA and the annual average.
- Check that these figures are correct and if so it is simply necessary to extend the range to 6 columns and 44 rows, so that, for example, N45 contains the difference between the Oct GMER for 1989 and the average over that year.

Printout this report and include in your write up.