Hello,
I have written a short draft which I am going to use in my report. I am not a native English speaker and I make grammatical errors in my writing. Could someone please help me check the grammar in this draft? Thanks!
Report Proposal
Topic D: Weather Forecasts
Step 1
Before we carry out the data analysis in this report, we need to be certain that we understand the questions we need to answer. Two recommended questions are provided in the report’s instructions.
Several issues need to be addressed before we answer the questions.
We need data for this analysis. We need to know how we are going to collect the data and for which regions. We also need to know who the authority on providing accurate daily temperature forecasts is. What is the population to which we are going to apply our inferences? Are we going to produce the data samples ourselves, or use data from another source? Can the data samples be selected randomly? What sample size are we going to use? Can the two data samples be obtained from two independent sources? We therefore need to design an experiment to collect the data, making sure that the data are accurate and relevant and do not violate any of the assumptions of our analysis.
The first question we are going to answer asks us to determine whether forecast maximum daily temperatures are significantly different from true maximum temperatures. How do we determine what difference in these temperatures is significant? Do we use a conservative approach, or is a larger difference in temperatures acceptable for this analysis? Similarly, we need to decide what level of significance to use in our hypothesis tests.
Once accurate and relevant data samples have been collected and the calculations carried out, we need to interpret our findings and draw a conclusion. The following questions need to be considered. Can we use our findings for future temperature forecasts in the two regions? If so, for the immediate future only, or for the next 10 years? Can we assume that our findings apply to the period when temperature forecasting was at an early stage in the two regions of interest?
Furthermore, if the proportions of correct forecast temperatures turn out to differ significantly between the two regions, what do we do next? Do we simply accept this fact, or do we investigate further to determine the cause of the difference?
Step 2
In the first hypothesis test I will use a conservative criterion: an average difference of 2 degrees Celsius or more (≥ 2 °C) between forecast maximum daily temperatures and true maximum temperatures will be considered significant.
In this hypothesis test I am going to use a 5% significance level to determine whether the average of the differences between forecast maximum daily temperatures and true daily maximum temperatures recorded for each region is significantly greater than 2 degrees Celsius.
Under the null hypothesis, the average of the differences between the temperatures is equal to or less than 2 degrees Celsius:
H0: µ ≤ 2
H1: µ > 2
where µ is the population average of the (absolute) differences between forecast maximum daily temperatures and true daily maximum temperatures recorded for each region.
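As a possible cross-check on the PHStat2 output, the upper-tailed test for one region can be sketched in Python. This is a minimal sketch, assuming the 30 absolute differences for a region are held in a list; the function name `upper_tailed_t_test` is hypothetical. Because the population standard deviation is unknown, it uses the one-sample t statistic, and the default critical value 1.699 is the tabulated upper 5% point of the t distribution with 29 degrees of freedom, so it only applies when n = 30.

```python
import math
import statistics


def upper_tailed_t_test(differences, mu0=2.0, t_critical=1.699):
    """One-sample t test of H0: mu <= mu0 against H1: mu > mu0.

    t_critical defaults to the upper 5% point of the t distribution with
    29 degrees of freedom (n = 30); look up a new value for other sample sizes.
    Returns (sample mean, sample standard deviation, t statistic, reject H0?).
    """
    n = len(differences)
    mean = statistics.mean(differences)
    s = statistics.stdev(differences)  # sample standard deviation, n - 1 denominator
    t_stat = (mean - mu0) / (s / math.sqrt(n))
    return mean, s, t_stat, t_stat > t_critical
```

The function would be called once per region. Rejecting H0 at the 5% level would mean that the average forecast error for that region is significantly greater than 2 degrees Celsius.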
To determine whether the proportions of correct temperature forecasts (accurate to within 1 degree Celsius) are the same for the two regions, I will use the following hypothesis test:
H0: p1 = p2 => H0: p1 - p2 = 0
H1: p1 ≠ p2 => H1: p1 - p2 ≠ 0
where p1 and p2 are the population proportions of forecast temperatures that are correct to within 1 degree Celsius in the first and second region respectively.
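The two-proportion test can be sketched in a similar way. This is a minimal sketch using the large-sample normal (z) approximation with a pooled proportion; `two_proportion_z_test` is a hypothetical name, and x1 and x2 stand for the number of forecasts in each region's sample that are correct to within 1 degree Celsius. The normal approximation is usually considered adequate only when each sample contains at least about 5 correct and 5 incorrect forecasts.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed test of H0: p1 = p2 against H1: p1 != p2.

    x1, x2: numbers of forecasts correct to within 1 degree Celsius.
    n1, n2: sample sizes for the two regions.
    Returns (z statistic, two-tailed p-value, reject H0 at level alpha?).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("all forecasts correct or all incorrect; z is undefined")
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value, p_value < alpha
```

H0 would be rejected when the p-value is below the chosen significance level (0.05 by default here).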
Furthermore, I will calculate 99% confidence intervals for the average of the differences between forecast maximum daily temperatures and true daily maximum temperatures recorded for each region.
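A 99% confidence interval for the mean difference in one region can be sketched as follows. This assumes the differences are held in a list and uses the t distribution; `mean_confidence_interval` is a hypothetical name, and the default 2.756 is the tabulated two-tailed 99% critical value for 29 degrees of freedom (n = 30), which would have to be changed for other sample sizes or confidence levels.

```python
import math
import statistics


def mean_confidence_interval(differences, t_critical=2.756):
    """t-based confidence interval for the mean of the differences.

    t_critical = 2.756 is the two-tailed 99% value for 29 degrees of
    freedom (n = 30). Returns (lower limit, upper limit).
    """
    n = len(differences)
    mean = statistics.mean(differences)
    half_width = t_critical * statistics.stdev(differences) / math.sqrt(n)
    return mean - half_width, mean + half_width
```

Small discrepancies from the PHStat2 limits would be expected only from rounding of the critical value.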
The assumptions for the hypothesis tests are:
The two data samples are randomly selected;
The two data samples are independent: the two sets of data were produced by two different meteorological stations for two different regions;
The two sampling distributions of the means of the temperature differences are approximately normal. By the Central Limit Theorem this condition will be met if the sample sizes are sufficiently large (n ≥ 30). Our sample sizes will be at least 30.
Step 3
In this report I will use data I have obtained from the Australian Bureau of Meteorology. The Bureau keeps archives of historical data on climate statistics for various locations around Australia and provides it to the public on request.
In Australia there are regional meteorological stations responsible for reporting weather forecasts for their respective regions. I obtained the data for the Adelaide metropolitan area, produced by the Kent Town meteorological station (station number: 23090), and for the Perth metropolitan area, produced by the Perth East meteorological station (station number: 9225), for the period from 01/01/2006 to 24/07/2008.
Each day, the two meteorological stations forecast the highest and lowest temperatures for the next 3 days and update their weather forecast reports several times within 24 hours. The last forecast update of the day is issued in the late afternoon or evening. In this report I will use the last update released by each station on a given day for the following day. This ensures that the latest, most accurate and most consistent temperature forecasts are used in our data analysis.
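Picking the last next-day forecast from several daily updates can be sketched as below. The layout of the updates, (issue time, target date, forecast maximum), is a hypothetical assumption made for illustration; the Bureau's files may be organised differently. The function name `latest_next_day_forecasts` is also hypothetical.

```python
from datetime import date, datetime, timedelta


def latest_next_day_forecasts(updates):
    """For each target date, keep the last forecast issued on the day before.

    updates: iterable of (issued_at, target, max_temp) tuples, where issued_at
    is a datetime, target is a date and max_temp is a float (assumed layout).
    Returns a dict mapping each target date to its forecast maximum temperature.
    """
    latest = {}  # target date -> (issued_at, max_temp) of the newest update seen
    for issued_at, target, max_temp in updates:
        # Only next-day forecasts count: skip same-day and 2- or 3-day-ahead ones.
        if issued_at.date() != target - timedelta(days=1):
            continue
        if target not in latest or issued_at > latest[target][0]:
            latest[target] = (issued_at, max_temp)
    return {target: temp for target, (_, temp) in latest.items()}


if __name__ == "__main__":
    # Made-up updates for illustration only, not real Bureau data.
    example = [
        (datetime(2007, 3, 1, 10, 0), date(2007, 3, 2), 30.0),   # morning update
        (datetime(2007, 3, 1, 17, 30), date(2007, 3, 2), 31.0),  # evening update, kept
        (datetime(2007, 3, 1, 17, 30), date(2007, 3, 3), 28.0),  # 2 days ahead, skipped
    ]
    print(latest_next_day_forecasts(example))
```

Comparing issue times, rather than relying on file order, keeps the result the same however the updates happen to be listed.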
I have also obtained a file of historical records of the actual highest and lowest temperatures recorded in the Adelaide and Perth metropolitan areas for the period from 01/01/2006 to 25/07/2008. In total there are 937 temperature records for each region.
I used a random number generator to select 30 dates at random from the period 01/01/2006 to 24/07/2008, separately for each region. Then, for each region and each selected date, I took a pair of temperature records from the two data sets: the maximum temperature recorded on the selected date and the maximum temperature forecast for that date in the evening of the previous day. Once the random pairs of records for each region had been determined, I calculated the difference within each pair and took its absolute value, because some differences are negative (e.g. when the forecast temperature is an underestimate). I then sorted the list of differences in ascending order.
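The selection and differencing procedure described above can be sketched in Python. This is a minimal sketch, assuming each region's forecast and recorded maxima have already been loaded into dictionaries keyed by date; the function name `sample_abs_differences` and the dictionary layout are hypothetical, and Python's `random` module stands in for whatever random generator was actually used.

```python
import random
from datetime import date, timedelta


def sample_abs_differences(forecast_max, actual_max, k=30, seed=None):
    """Randomly choose k dates and return the sorted absolute differences.

    forecast_max: dict mapping date -> maximum temperature forecast for that
                  date (issued the previous evening).
    actual_max:   dict mapping date -> maximum temperature actually recorded.
    Only dates present in both dicts can be chosen.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    candidates = sorted(set(forecast_max) & set(actual_max))
    chosen = rng.sample(candidates, k)  # k distinct dates, without replacement
    # Absolute value, so under- and over-estimates count the same way.
    diffs = [abs(forecast_max[d] - actual_max[d]) for d in chosen]
    return sorted(diffs)  # ascending order


if __name__ == "__main__":
    # Made-up temperatures for illustration only.
    start = date(2006, 1, 1)
    days = [start + timedelta(days=i) for i in range(90)]
    forecasts = {d: 25.0 + (i % 5) for i, d in enumerate(days)}
    actuals = {d: 24.0 + (i % 7) for i, d in enumerate(days)}
    print(sample_abs_differences(forecasts, actuals, seed=1))
```

Calling the function once per region, with that region's own dictionaries, matches selecting the dates separately for each region.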
The archive of raw data files I obtained from the Australian Bureau of Meteorology is located at
http://www.spunge.org/~alexg/008_markin.zip
The randomly selected dates and the computed temperature differences for those dates are located at
http://www.spunge.org/~alexg/data_forecast.xls
Step 4
Once the two data samples have been randomly selected and put into a form suitable for statistical analysis, Excel 2003 will be used to carry out the necessary calculations.
For the hypothesis tests and the 99% confidence intervals I will use the PHStat2 package (an add-in for Excel), which provides a range of statistical functions and procedures and produces good-quality output. A free version of the package can be obtained from
http://www.prenhall.com/phstat/
The assumptions for the hypothesis tests, as described above, are:
The two data samples are randomly selected;
The two data samples are independent: the two sets of data were produced by two different meteorological stations for two different regions;
The two sampling distributions of the means of the temperature differences are approximately normal. By the Central Limit Theorem this condition will be met if the sample sizes are sufficiently large (n ≥ 30).
In this analysis the population is all the forecast temperatures ever produced by the two meteorological stations for the two regions. The large sample of 937 temperature records I obtained is not really the entire population. However, knowledge and expertise, as well as technology, in meteorology have changed significantly in the last 30 to 40 years. This factor has to be taken into account, and a subset of the entire population (for example, all forecast temperature reports produced for the two regions in the last 10 years) would probably be more appropriate for our analysis, as forecasts in the current period are more accurate and consistent.