Robust data analysis in R and Matlab
- Concepts related to Robust Statistics.
- Performance of outlier detection methods.
- Learn to differentiate one method from another.
- Identify the most robust and efficient methods that you should use in practice.
- Application of the methods with handmade examples.
- Application of the methods with R and Matlab.
- Basic statistical knowledge.
Robust data analysis and outlier detection are crucial in Statistics, Data Analysis, Data Mining, Machine Learning, Pattern Recognition, Artificial Intelligence, Classification, Principal Components, Regression, Big Data, and any other field that works with data. Researchers, students, data analysts, and anyone else who deals with real data need to be aware of the problems that outliers cause and know how to handle them.
This course studies the characteristics of the outlier problem and its consequences, and teaches you to recognise it using the existing approaches. We will study in depth the performance and properties of outlier detection methods, both for a single random variable (univariate data) and for more than one (multivariate data). We will cover the theoretical properties of the methods and apply them to examples. In addition, we will look at their practical performance in R and Matlab, and at the packages available in both for outlier detection. The implementation and example code are available in the open Google Drive repository.
You will learn about both classical and recent algorithms for outlier detection:
Univariate space:
- Method SD
- Z score
- Tukey Boxplot
- MADe
- Modified Z score
- Adjusted boxplot
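As a rough illustration of why the robust variants in this list matter, here is a minimal sketch comparing the classical Z score with the modified Z score. It is written in Python purely for illustration (the course materials themselves use R and Matlab and are not reproduced here). The data, the function names and the cutoffs of 3 and 3.5 are assumptions chosen for the example, not taken from the course.

```python
import statistics

def z_scores(xs):
    """Classical Z score: (x - mean) / sample standard deviation."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

def modified_z_scores(xs):
    """Modified Z score: 0.6745 * (x - median) / MAD.

    MAD is the median of the absolute deviations from the median.
    This sketch assumes MAD > 0 (it would divide by zero otherwise).
    """
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return [0.6745 * (x - med) / mad for x in xs]

# Hypothetical sample: six measurements near 10 and one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]

# Assumed cutoffs: |Z| > 3 for the classical score,
# |modified Z| > 3.5 for the robust one.
classical_flags = [abs(z) > 3 for z in z_scores(data)]
robust_flags = [abs(z) > 3.5 for z in modified_z_scores(data)]

print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

In this sample the classical score flags nothing, because the outlier inflates the mean and standard deviation it is compared against, while the median-based score flags only the value 25.0.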
Multivariate space:
- Classical Mahalanobis distance
- Robust Mahalanobis distance
- MCD
- Adjusted MCD
- Stahel-Donoho
- Kurtosis
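To give a feel for the first item in this list, the following is a minimal sketch of the classical Mahalanobis distance for bivariate data, again in Python for illustration only rather than the R or Matlab used in the course. The sample, the helper names and the 0.975 chi-square cutoff are assumptions made for the example. The robust variants (MCD, Stahel-Donoho and so on) are not implemented here.

```python
import math

def mean_and_cov(points):
    """Sample mean and sample covariance (divisor n - 1) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_mahalanobis(points):
    """Squared classical Mahalanobis distance of each point.

    Uses the closed-form inverse of the 2x2 covariance matrix.
    """
    (mx, my), (sxx, sxy, syy) = mean_and_cov(points)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical sample: six points close to the line y = x,
# plus one point (the last) planted well off that line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8),
          (5.0, 5.1), (6.0, 5.9), (3.5, 7.0)]

d2 = squared_mahalanobis(points)

# Assumed cutoff: 0.975 quantile of the chi-square distribution.
# The closed form -2 * ln(1 - p) is valid only for 2 degrees of freedom.
cutoff = -2.0 * math.log(1.0 - 0.975)

suspect = max(range(len(points)), key=lambda i: d2[i])
flagged = [d > cutoff for d in d2]

print("most distant point:", points[suspect])
print("flagged:", flagged)
```

In this toy sample the planted point receives the largest distance but still falls below the cutoff, because it distorts the very mean and covariance it is measured against. This masking effect is the usual motivation for replacing them with robust estimates.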
Linear regression:
- Ordinary least squares (classic method)
- Robust regression: LAD, LMS, LTS
In addition, there are two sections on basic concepts that review the notions you need to understand the methods for outlier detection.
Basics I
- Sample, population, random variable
- Distribution of a random variable
- Normal distribution
- Chi-square, Student's t and Fisher's F distributions
- Estimators
Basics II
- Linear algebra
- Multivariate variable
- Joint and marginal distribution
- Independence, covariance and correlation
- Multivariate Normal
With this course you will master one of the most important problems today in academia, in industry and in data analysis. The examples will help you see why it matters and will serve as a guide for carrying out these analyses yourself.
- Data scientists.
- Data analysts.
- Students.
- Researchers.