What is a correlation?
A correlation is a kind of relationship between two variables. It is a statistical value expressed in a correlation...
Random Forests
As mentioned in the post about decision trees, the big challenge to face is overfitting. To address the issue, a concept developed by Tin...
R - Simple Data Preparation
R offers a huge variety of functions for data manipulation, reorganization and separation. Here are some of the commands I consider...
R - Packages and DataTypes
Packages
Functionality is maintained in packages. Some packages are part of the basic functionality and are predefined when you install...
Null-Hypothesis, Z-Scores and Normal Distribution
This is Carl Friedrich Gauß, one of the friendliest-looking German mathematicians, on a banknote. Nowadays Germany forms part of the European...
Null Hypothesis, P-Value and Significance Level
What is the null hypothesis?
That's easy: it is the assumption that an observation is simply due to chance. The contrary assumption,...
R - How to Start
Welcome to a new challenge - an empty green meadow is waiting to be explored and developed!
R is a pretty strange programming language,...
Lift
As its name indicates, lift is a measure of how well your binary classifier model lifts the predictions. In other words it...
The ROC curve
The ROC curve (receiver operating characteristic curve) is a graphical illustration that can be used to visualize and compare the quality...
Classical Decomposition of a time series
A time series is a data set that has a time component. Yes, it is just what you think it is: in the ideal case you have one value in a fixed...
Seasonality and Random Determination
In this post we saw the three components of a time series: trend, seasonality and random. How to extract the trend was shown there; now we...
Trend determination with moving averages
We already saw in the previous post that you can separate a trend from a time series. Here is a classical approach to determining an underlying...
Classical Decomposition - Summary
The classical decomposition of a time series can help to get an overview of the tendencies (trend component), periodic patterns (seasonal...
Sensitivity and Specificity
Apart from accuracy and precision there are other measures for classification models. Today we will focus on another pair of measures,...
Averages
Averages are a bad idea. Averages hide characteristics, individual properties and specialities behind one value. Nobody wants to be...
Decision Trees
Decision trees are one of the most important techniques for data analysis. They are easy to understand, can illustrate complex rule sets...
Accuracy and Precision - How good is your classifier?
After successfully deciding on a classifier, after adjusting and optimizing the parameters on the basis of training data, it is time to evaluate...
How is the weather in the Black Forest? - Nearest Neighbour Approaches
Planning a city or hiking trip on the weekend, a barbecue or a romantic picnic in the mountains requires a pretty good weather forecast....
Where to get data from?
Looking for interesting data sets to train your models on or to create hypotheses?
Here are some well-organized lists of free (most of them) and...
What does a Data Scientist do? - Facts
From the post about CRISP-DM we know which phases a data science project consists of. This post analyzes the time spent on the different...