- Good knowledge of the fundamentals of statistics (i.e. probability, inferential statistics, the linear regression model).
- Basic knowledge of the R programming language (as covered in module 1, "Coding for Data Science").
The course aims to provide knowledge of cutting-edge statistical tools for modeling complex data. In particular, the methods considered are designed to detect patterns in the data automatically (i.e. to “learn” from data). Analysts can then use the estimated models to make accurate predictions and make decisions under uncertainty.
By the end of the course, the student will be able to:
a) choose and apply the appropriate statistical tool, in the class of statistical learning methods, for the analysis of different types of data coming from real-world problems;
b) use the open-source statistical software R (freely available for download at http://www.r-project.org) for performing data analysis and visualization, implementing statistical models and obtaining predictions;
c) interpret the results from a decision-making perspective.
- Introduction to machine learning: supervised versus unsupervised learning, the bias-variance trade-off.
- Classification methods: K-nearest neighbors classification, logistic regression, naive Bayes, linear and quadratic discriminant analysis, classification trees (including bagging, random forests, boosting), support vector machines.
- Regression methods: K-nearest neighbors regression, ridge and lasso regression, non-linear regression models, regression trees (including bagging, random forests, boosting), support vector machines.
- Resampling methods: cross-validation and bootstrap.
The course consists of theory lectures for a total of 48 hours. Extra hours (usually 12) are dedicated to R lab sessions. The calendar of lectures and labs will be published at the beginning of the course on the course's Moodle page.
The exam consists of:
- a test including open-ended and true/false questions on theoretical topics or short applications of the methods studied;
- exercises to be solved using the R software, to assess the student's ability to analyse data and interpret outputs.
The two parts of the exam (theoretical and practical) are each worth approximately 50% of the total score.
This course is the second module of the “CODING AND MACHINE LEARNING” course (12 CFU). The final score is computed by averaging the grades obtained in the two modules (Coding for Data Science and Machine Learning for Economics). The final scores will be published on the e-learning page of the course.
- Attending lectures and R labs is strongly recommended.
- Should the authorities issue specific directives for the management of epidemiological emergencies, the course may undergo changes with respect to what is stated in this syllabus.