This article covers the basics of statistics, the importance of statistical analysis, the use of statistics in data mining, and the value of statistics as a business tool, illustrated through data representations such as graphs.
Statistics is the branch of mathematics concerned with the collection, analysis and interpretation of data or numerical information. Statistical analysis plays an essential role in the efficient and effective functioning of any society, and fields such as education, research, business and government rely on it heavily.
In recent years, the term “data mining” has become closely associated with statistics. In general terms, data mining is the process of analyzing data from different perspectives and then summarizing it into useful information. It is used mainly by companies with a strong computing focus, such as financial, retail and communications organizations.
Data mining software is one of a number of analytical tools for analyzing data. It allows users to examine data from different categories or angles and to summarize the outcomes identified.
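As a minimal sketch of what analyzing data from different perspectives and then summarizing it can mean in practice, the Python below totals a small set of sales records first by region and then by product. The records and the `summarize` helper are hypothetical, invented for illustration; real data mining software goes well beyond simple grouped totals.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount). Made-up values.
sales = [
    ("North", "Tea", 120.0),
    ("North", "Coffee", 80.0),
    ("South", "Tea", 50.0),
    ("South", "Coffee", 150.0),
    ("South", "Tea", 70.0),
]

def summarize(records, key_index):
    """Total the amount field (index 2), grouped by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

# The same data viewed from two angles.
by_region = summarize(sales, 0)
by_product = summarize(sales, 1)
print(by_region)
print(by_product)
```

Grouping by region gives totals of 200.0 (North) and 270.0 (South); grouping the same records by product gives 240.0 (Tea) and 230.0 (Coffee).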
Statistics and Its Basic Concepts
To understand the concepts of statistics, one needs to know the meaning of a “variable”: a characteristic of an item or individual that can differ from one item or individual to another.
The aim of statistics is to obtain information from data, from which interpretations are made and decisions are arrived at. It is important to note where the data came from and what can be learned from them. How are the data processed?
1. The first step is to collate and display the data, usually in graphical form, to make the data easier to visualize, reveal the overall pattern and spot any unusual observations.
2. The second step is to determine numerical measures that aid in summarizing the data.
These graphical displays and numerical summaries are useful means of understanding the data in question.
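The two steps above can be illustrated with Python's standard `statistics` module. The sketch below computes three common numerical summaries (mean, median and sample standard deviation) for a small made-up data set and flags values lying more than two standard deviations from the mean. That cut-off is an assumed rule of thumb chosen for illustration, not a universal definition of an unusual observation.

```python
import statistics

# Hypothetical observations, e.g. daily delivery times in minutes (made-up values).
data = [12, 14, 13, 15, 14, 13, 40]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)  # sample standard deviation

# Flag values more than 2 standard deviations from the mean.
# The factor of 2 is an assumed rule of thumb, not a fixed standard.
unusual = [x for x in data if abs(x - mean) > 2 * sd]

print(f"mean={mean:.2f} median={median} sd={sd:.2f}")
print("unusual:", unusual)
```

Here the value 40 is flagged; note how it also pulls the mean (about 17.29) above the median (14), which is why both measures are worth reporting.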
Statistical Data: Primary and Secondary Data
Data can be obtained in many ways, but they fall into only two source categories: primary and secondary.
Secondary data are usually collected for routine managerial purposes, such as ledger and journal entries, invoices or sales figures. They already exist, having been collected for some purpose other than the one they are now being used for. Before using secondary data, the following questions should be considered:
- Do the data contain the information we require?
- Where did the data come from? That is, is the source reliable?
- How were the data collected? That is, could the collection method make the results inconclusive or biased?
If satisfied with the answers to these questions, then the data can be used.
The main advantage of secondary data is that they already exist. They are usually cheaper to obtain than primary data, and they can save time, particularly when interpretations must be carried out and decisions made quickly. If you are not completely confident in the secondary data, it would be wise to use primary data.
Primary data are collected by you, a colleague or a market research organisation for a specific purpose. Because the data are current and fully meet the requirements, they are more reliable for drawing conclusions and making decisions. However, obtaining primary data can be time-consuming and therefore more expensive than using secondary data.
Word of caution: selection bias exists when there is a systematic tendency to over-represent or under-represent some part of the population. There can also be refusal or non-response bias. One remedy is random sampling, a process in which every possible sample of a given size has the same probability of being selected.
A random sample is not a casual or haphazard sample. The target population must first be identified and its elements fully listed; the sample is then drawn from that list using a table of random numbers or another appropriate random selection method.
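A random sample of this kind can be drawn in Python with `random.sample`, which selects items without replacement so that every subset of the requested size is equally likely. In the sketch below, a seeded pseudo-random number generator stands in for a table of random numbers, and the list of customer IDs is a hypothetical, complete listing of the target population.

```python
import random

# Hypothetical complete list of the target population:
# 500 customer IDs invented for illustration (C001 ... C500).
population = [f"C{i:03d}" for i in range(1, 501)]

rng = random.Random(42)  # fixed seed so the same draw can be reproduced

# random.sample draws without replacement; every subset of size 20
# is equally likely to be chosen.
sample = rng.sample(population, k=20)
print(sample)
```

Fixing the seed is a design choice for auditability: anyone with the same list and seed can reproduce exactly which customers were selected.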
Everyone is exposed to the results of statistical analysis in everyday life: newspapers and magazines, television, radio and all sorts of pamphlets present statistical information on weather forecasts, health, road accidents, stock markets, unemployment figures, birth and death rates, currency fluctuations and so on.
So how are data represented? Data are related numerical information about a specified subject. Two types of data are identified:
Discrete variables – variables obtained by counting, so they take exact values. In mathematical terms, counts are non-negative integers (whole numbers).
Continuous variables – variables usually obtained from some form of measurement scale, so their recorded values are approximate. Data such as heights, weights, distances or times are continuous, and the accuracy of measurement is determined by the instrument used.
Depending on the type of data being collected and/or analysed, statistical information is presented in a variety of ways. The collected data are condensed and represented using whichever graph suits them. The most familiar are:
- line graphs
- bar graphs
- pie or circle graphs
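Behind a pie graph is simple arithmetic: each sector's angle is the category's share of the total multiplied by 360 degrees, while a bar graph draws each bar in proportion to its value. The sketch below computes both for made-up category figures, drawing the bars as plain text; the `pie_angles` helper and the data are hypothetical.

```python
def pie_angles(values):
    """Return each category's pie-sector angle in degrees."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}

# Hypothetical sales by category (made-up figures).
sales = {"Food": 45, "Clothing": 30, "Electronics": 15, "Other": 10}

angles = pie_angles(sales)
for name, angle in angles.items():
    print(f"{name:<12}{angle:6.1f} degrees")

# Text bar graph: one '#' for every 5 units, so bar length tracks the value.
for name, value in sales.items():
    print(f"{name:<12}{'#' * (value // 5)}")
```

With these figures the sectors come to 162, 108, 54 and 36 degrees, which together make up the full 360.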
Applications of Statistics to Business
What is “business statistics”? Business statistics is the application of statistical methods to business data, for example in marketing, quality control and insurance.
Because businesses are concerned with extracting the best information from data to support decision making, statistics is the most widely used quantitative method in business, applied to market research, sales forecasting and quality control.
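Sales forecasting can take many forms. One of the simplest, shown here purely as an illustration rather than as the method any particular business uses, is a moving-average forecast, which predicts the next period as the mean of the most recent observations. The window length of 3, the `moving_average_forecast` helper and the monthly figures are assumptions made for the example.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly sales in units (made-up figures).
monthly_sales = [200, 220, 210, 240, 250, 230]
print(moving_average_forecast(monthly_sales))  # (240 + 250 + 230) / 3 = 240.0
```

A longer window smooths out month-to-month noise but reacts more slowly to genuine changes in demand; a shorter one does the opposite.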
Statistical data used in business include consumer databases, opinion polls, population censuses and sales data. For a given question, a statistician determines the type of data needed, the method of collection and how the data are analysed to provide the best answer to that question.
As large businesses make greater use of data mining in their processes, statistics becomes more vital, playing a part in everything from planning data collection to drawing inferences from the data and presenting the results in graphical form.
Pie Chart of Nigerian Oil Exports (2009 data)