Business Intelligence & general IT stuff

Tuesday, November 18, 2008

Real-time business intelligence

From Wikipedia, the free encyclopedia

Real-time business intelligence is the process of delivering information about business operations with as little latency as possible. In this context, real-time means delivering information within milliseconds to a few seconds of the business event. While traditional business intelligence presents historical information to users for analysis, real-time business intelligence (see also Business Intelligence 2.0) compares current business events with historical patterns to detect problems or opportunities automatically. This automated analysis enables corrective actions to be initiated and/or business rules to be adjusted to optimize business processes.


Latency in real-time systems

All real-time business intelligence systems have some latency, but the goal is to minimize the time from the business event happening to a corrective action or notification being initiated. Analyst Richard Hackathorn describes three types of latency:

  • Data latency: the time taken to collect and store the data
  • Analysis latency: the time taken to analyze the data and turn it into actionable information
  • Action latency: the time taken to react to the information and take action

Real-time business intelligence technologies are designed to reduce all three latencies to as close to zero as possible. Traditional business intelligence and business activity monitoring, by comparison, seek only to reduce data latency. They do not address analysis latency or action latency, since both are governed by manual processes.
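As a rough illustration of how the three latencies add up, the sketch below splits the time between a business event and the resulting action into its data, analysis and action parts. The EventTimeline class, its field names and the timestamps are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class EventTimeline:
    """Hypothetical timestamps, in seconds, for a single business event."""
    occurred: float   # the business event happens
    stored: float     # the data has been collected and stored
    analyzed: float   # the analysis has produced actionable information
    acted: float      # a corrective action or notification is initiated


def latencies(t: EventTimeline) -> dict:
    """Split the end-to-end delay into data, analysis and action latency."""
    data = t.stored - t.occurred
    analysis = t.analyzed - t.stored
    action = t.acted - t.analyzed
    return {
        "data": data,
        "analysis": analysis,
        "action": action,
        "total": data + analysis + action,
    }


# Illustrative values only: 0.5 s to store, 1.5 s to analyze, 2 s to act.
timeline = EventTimeline(occurred=0.0, stored=0.5, analyzed=2.0, acted=4.0)
result = latencies(timeline)
print(result)
```

In this made-up timeline the data latency is the smallest part of the total, which is why reducing data latency alone does not make a system real-time.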

Some commentators have introduced the concept of right-time business intelligence, which proposes that information should be delivered just before it is required, and not necessarily in real time.


Real-time Business Intelligence Architectures

Event-based Real-time Business Intelligence

Real-time Business Intelligence systems are event driven, and use Event Stream Processing techniques so that events can be analysed without first being transformed and stored in a database. These in-memory techniques have the advantage that high rates of events can be monitored, and since data does not have to be written into databases, data latency can be reduced to milliseconds.
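A minimal sketch of the in-memory idea, assuming a stream of numeric event values: each incoming event is compared with a rolling average of recent events held in memory, and nothing is written to a database. The StreamMonitor class, the window size and the tolerance are hypothetical; real Event Stream Processing engines offer far richer pattern matching.

```python
from collections import deque


class StreamMonitor:
    """Flags events that deviate strongly from a rolling in-memory baseline."""

    def __init__(self, window: int = 5, tolerance: float = 0.5):
        # Only the most recent `window` values are kept in memory.
        self.history = deque(maxlen=window)
        # Allowed deviation, as a fraction of the baseline.
        self.tolerance = tolerance

    def observe(self, value: float) -> bool:
        """Return True if the event deviates from the rolling mean by more
        than the tolerance. No alert is raised until the window is full."""
        alert = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alert = abs(value - baseline) > self.tolerance * abs(baseline)
        self.history.append(value)
        return alert


monitor = StreamMonitor(window=3, tolerance=0.5)
events = [100, 102, 98, 101, 180, 99]   # illustrative order values
alerts = [monitor.observe(v) for v in events]
print(alerts)  # only the fifth event (180) is flagged
```

Because the baseline lives in a fixed-size deque, each event is handled in constant memory, which is what allows high event rates to be monitored.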

Real-time data warehouse

An alternative approach to event driven architectures is to shorten the refresh cycle of an existing data warehouse so that the data is updated more frequently. These real-time data warehouse systems can achieve near real-time update of data, where the data is typically minutes to hours out of date. The analysis of the data is still usually manual, so the total latency is significantly higher than with event driven architectural approaches.

Real-time server-less technology

A more recent alternative to "real-time" event driven and "real-time" data warehouse architectures is MSSO Technology (Multiple Source Simple Output), which does away with the need for the data warehouse and intermediary servers altogether, since it accesses live data directly from the source (even from multiple, disparate sources). Because live data is accessed directly by server-less means, it provides the potential for zero-latency, real-time data in the truest sense.

Process-aware Real-time Business Intelligence

Process-aware real-time business intelligence is sometimes considered a subset of Operational intelligence and is also identified with Business Activity Monitoring. It allows entire processes (transactions, steps) to be monitored, and metrics (latency, completion/failed ratios, etc.) to be viewed, compared with warehoused historic data, and trended, all in real time. Advanced implementations allow threshold detection, alerting, and feedback to the process execution systems themselves, thereby 'closing the loop'.

Saturday, November 8, 2008

Some Data mining terminology

Confusion matrix

From Wikipedia, the free encyclopedia

In the field of artificial intelligence, a confusion matrix is a visualization tool typically used in supervised learning (in unsupervised learning it is typically called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. One benefit of a confusion matrix is that it is easy to see if the system is confusing two classes (i.e. commonly mislabelling one as another).

When a data set is unbalanced (when the number of samples in different classes varies greatly), the overall accuracy of a classifier is not representative of its true performance. This is easily understood with an example: if there are 990 samples from class A and only 10 samples from class B, the classifier can easily be biased towards class A. If the classifier classifies all the samples as class A, the accuracy will be 99%. This is not a good indication of the classifier's true performance: it has a 100% recognition rate for class A but a 0% recognition rate for class B.
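The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the 990/10 case described above; the variable and function names are chosen for illustration.

```python
# 990 samples of class A and 10 of class B, all predicted as class A.
actual = ["A"] * 990 + ["B"] * 10
predicted = ["A"] * 1000

# Overall accuracy: fraction of samples whose prediction matches the truth.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recognition_rate(label: str) -> float:
    """Fraction of the samples that truly belong to `label` and were
    predicted as `label`."""
    total = sum(1 for a in actual if a == label)
    correct = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    return correct / total


rate_a = recognition_rate("A")
rate_b = recognition_rate("B")
print(accuracy, rate_a, rate_b)  # 0.99 1.0 0.0
```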

In the example confusion matrix below, of the eight actual cats, the system predicted that three were dogs, and of the six actual dogs, it predicted that one was a rabbit and two were cats. We can see from the matrix that the system in question has trouble distinguishing between cats and dogs, but distinguishes rabbits from the other animals fairly well.

Example confusion matrix (rows: actual class; columns: predicted class)

            Cat    Dog    Rabbit
  Cat         5      3         0
  Dog         2      3         1
  Rabbit      0      2        11
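To make the reading of the example matrix concrete, the sketch below computes, for each actual class, the fraction of its instances that were predicted correctly (the diagonal entry divided by the row total). The counts come from the example above; the variable names are illustrative only.

```python
labels = ["Cat", "Dog", "Rabbit"]

# Rows are actual classes, columns are predicted classes.
matrix = [
    [5, 3, 0],    # actual cats
    [2, 3, 1],    # actual dogs
    [0, 2, 11],   # actual rabbits
]

# Per-class recognition rate: correct predictions / actual instances.
rates = {
    label: matrix[i][i] / sum(matrix[i])
    for i, label in enumerate(labels)
}

# Overall accuracy: sum of the diagonal / total number of instances.
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total

for label in labels:
    print(f"{label}: {rates[label]:.3f}")
print(f"overall: {accuracy:.3f}")
```

Cats (0.625) and dogs (0.500) are recognised much less reliably than rabbits (about 0.846), which matches the observation that the system confuses cats and dogs.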



Table of Confusion

In Predictive Analytics, a Table of Confusion, also known as a confusion matrix, is a table with two rows and two columns that reports the number of True Negatives, False Positives, False Negatives, and True Positives.

                                actual value
                        p                  n                  total
  prediction   p'       True Positive      False Positive     P'
  outcome      n'       False Negative     True Negative      N'
               total    P                  N

Table 1: Table of Confusion.

For example, consider a model which predicts for 10,000 Insurance Claims whether each case is Fraudulent. This model correctly predicts 9,700 non-fraudulent cases, and 100 fraudulent cases. The model also incorrectly predicts 150 cases which are not fraudulent to be fraudulent, and 50 cases which are fraudulent to be non-fraudulent. The resulting Table of Confusion is shown below.

                                actual value
                        p          n          total
  prediction   p'       100        150        P'
  outcome      n'       50         9700       N'
               total    P          N

Table 2: Example Table of Confusion.
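As a hedged illustration, the totals of the example table and a few common summary measures can be derived from its four cells. The text above does not define accuracy, precision or recall; the formulas below are the standard definitions, and the variable names are chosen for this sketch.

```python
# Cell counts from the insurance fraud example (positive = fraudulent).
tp = 100    # fraudulent, predicted fraudulent
fp = 150    # not fraudulent, predicted fraudulent
fn = 50     # fraudulent, predicted not fraudulent
tn = 9700   # not fraudulent, predicted not fraudulent

# Row and column totals of the table.
predicted_positive = tp + fp    # P'
predicted_negative = fn + tn    # N'
actual_positive = tp + fn       # P
actual_negative = fp + tn       # N
total = tp + fp + fn + tn

# Standard summary measures (not defined in the text above).
accuracy = (tp + tn) / total            # share of all cases predicted correctly
precision = tp / predicted_positive     # share of fraud predictions that are right
recall = tp / actual_positive           # share of actual fraud that is caught

print(f"P'={predicted_positive} N'={predicted_negative} "
      f"P={actual_positive} N={actual_negative}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The accuracy is 98%, yet only 40% of the cases flagged as fraudulent really are fraudulent, and a third of the actual fraud cases are missed. This is the same effect as in the unbalanced data set example earlier.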


Mean absolute error


In statistics, the mean absolute error is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error (MAE) is given by

\mbox{MAE} = \frac{1}{n}\sum_{i=1}^n \left| f_i-y_i\right| =\frac{1}{n}\sum_{i=1}^n \left| e_i \right|.

As the name suggests, the mean absolute error is an average of the absolute errors |e_i| = |f_i − y_i|, where f_i is the prediction and y_i the true value. Note that alternative formulations may include relative frequencies as weight factors.
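The formula translates directly into code. The following is a minimal sketch of the unweighted form given above; the function name and the sample forecasts are made up for illustration.

```python
def mean_absolute_error(forecasts, outcomes):
    """Average of |f_i - y_i| over all n forecast/outcome pairs."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum(abs(f - y) for f, y in zip(forecasts, outcomes)) / len(forecasts)


# Illustrative data: the absolute errors are 0.5, 0.0, 1.5 and 1.0.
forecasts = [3.0, 5.0, 2.5, 7.0]
outcomes = [2.5, 5.0, 4.0, 8.0]
mae = mean_absolute_error(forecasts, outcomes)
print(mae)  # 0.75
```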

The mean absolute error is a common measure of forecast error in time series analysis, where the term "mean absolute deviation" is sometimes used for it, causing confusion with the more standard definition of mean absolute deviation. The same confusion exists more generally.

