- Good knowledge of the fundamentals of statistics (probability, inferential statistics, and the linear regression model).
- Basic knowledge of the R programming language.
The course aims to provide knowledge of cutting-edge statistical tools for modeling complex data. In particular, the methods considered are designed to detect patterns in the data automatically (i.e. to “learn” from data). Analysts can then use the estimated models to make accurate predictions and to take decisions under uncertainty.
By the end of the course, students will be able to:
a) choose and apply the appropriate statistical tool, in the class of statistical learning methods, for the analysis of different types of data coming from real-world problems;
b) use the open-source statistical software R (freely available for download at http://www.r-project.org) to perform data analysis and visualization, implement statistical models and obtain predictions;
c) interpret the results from a decision-making perspective.
- Introduction to machine learning: supervised versus unsupervised learning, the bias-variance trade-off.
- Classification methods: K-nearest neighbors classification, logistic regression, linear and quadratic discriminant analysis, classification trees (including bagging, random forests, boosting), support vector machines.
- Regression methods: K-nearest neighbors regression, ridge and lasso regression, non-linear regression models, regression trees.
- Resampling methods: cross-validation and bootstrap.
The course consists of theory lectures and R lab sessions (R labs usually account for about one third of the total number of hours). The calendar of lectures and labs will be published on the Moodle page of the course at the beginning of the course.
The exam consists of:
- a test including open-ended and true/false questions on theoretical topics or short applications of the methods studied;
- exercises to be solved using R, to assess the student's ability to analyse data and interpret outputs.
The two parts of the exam (theoretical and practical) are each worth approximately 50% of the total score.
This course is the second module of the “CODING AND MACHINE LEARNING” course (12 CFU). The final score will be computed by averaging the grades obtained in the two modules (Coding for Data Science and Machine Learning for Economics). The final scores will be published on the e-learning page of the course.
- Attending lectures and R labs is strongly recommended.
- If the course is delivered remotely (fully or partially), the program and/or the exam may change in order to adapt the course to online teaching methods.
- More information about the book (An Introduction to Statistical Learning: with Applications in R) is available at the following links:
https://www.springer.com/us/book/9781461471370
https://www-bcf.usc.edu/~gareth/ISL/
- Documentation for R is freely available at the following link: https://www.r-project.org/other-docs.html