ECTS credits: 4.5
ECTS hours (per the degree report): Tutorials: 1, Lectures: 15, Interactive classes: 25, Total: 41
Languages of instruction: Spanish, Galician
Type: Ordinary Degree Subject RD 1393/2007 - 822/2021
Departments: Electronics and Computing
Areas: Computer Architecture and Technology
Center: Higher Technical Engineering School
Call: First Semester
Teaching: With teaching
Enrolment: Enrollable
In the age of digitalisation, we generate data in all our daily actions: interacting on social networks, browsing websites, performing business transactions or simply carrying our mobile phone in our pocket. This huge generation of data constitutes a potentially significant reservoir of valuable information. This is how the term "big data" started.
The term "big data" is often used to refer to the massive use of data for profit. Although there is no standard definition, the term goes far beyond the exploitation of a large amount of data. Big data refers to datasets whose volume, diversity and complexity require new techniques, algorithms and analyses in order to extract hidden knowledge.
The main objective of this subject is to train students in the application of large-scale data analysis and processing techniques. Throughout the subject, we will study the hardware and software infrastructures necessary for the efficient exploitation of this data. We will look at how to distribute data and processing with the aim of obtaining scalable systems. We will analyse programming models and technologies specially designed for handling large datasets with special emphasis on machine learning. The subject will provide a comprehensive understanding of the platforms and tools used in big data processing, addressing both the technical aspects and the theoretical foundations necessary to face the challenges presented by big datasets.
Unit 1. Introduction to Big Data
Unit 2. Infrastructures for Big Data storage and processing: Hadoop
Unit 3. Large-scale data processing: Apache Spark
Unit 4. Machine learning with Apache Spark MLlib
Unit 5. Continuous data processing
In addition to the slides and other material provided by the professors, students have the following basic and complementary bibliography to follow the subject.
Basic bibliography
- A. Polak, Scaling Machine Learning with Spark, O'Reilly, 2023.
- A. Bifet, R. Gavaldà, G. Holmes, and B. Pfahringer, Machine Learning for Data Streams with Practical Examples in MOA, MIT Press, 2018. Freely available at https://moa.cms.waikato.ac.nz/book/.
- I. Triguero, M. Galar, Large-Scale Data Analytics with Python and Spark, Cambridge University Press, 2023.
Complementary bibliography
- T. White, Hadoop: The Definitive Guide, 4th Edition, O'Reilly, 2015.
- G. Maas and F. Garillot. Stream processing with Apache Spark, O'Reilly, 2019.
- J. Damji, B. Wenig, T. Das, and D. Lee. Learning Spark, 2nd Edition, O'Reilly, 2020.
Students will acquire the following basic, transversal and specific competences at different levels of depth:
Basic and general
CG2 - Ability to solve problems with initiative, decision-making, autonomy and creativity.
CG5 - Ability to design new computational systems and/or evaluate the performance of existing systems, integrating artificial intelligence models and techniques.
Transversal
TR3 - Ability to create new models and solutions autonomously and creatively, adapting to new situations. Initiative and entrepreneurial spirit.
Specific
CE5 - Understand and apply the basic principles and techniques of parallel and distributed programming for the development and efficient execution of artificial intelligence techniques.
CE6 - Ability to carry out the analysis, design and implementation of applications that require working with large volumes of data and in the cloud in an efficient manner.
The subject has a theoretical-practical approach through a combination of theoretical and interactive classes.
The theoretical classes will consist of lectures given by the professor, dedicated to the presentation of theoretical content and the resolution of problems or exercises. The lecturers will promote an active attitude, asking questions to clarify specific aspects and leaving open questions for student reflection. Competences covered: CG5, CE5 and CE6.
The interactive classes will consist of classes in small groups with the aim of acquiring practical skills and complementing the contents taught in the theoretical classes by encouraging group work learning. They will be focused on the active work of students in practical tasks or projects necessary to pass the continuous assessment part of the subject. Competences covered: CG2, TR3, CE5 and CE6.
The tutorials will be used to resolve theoretical and/or practical doubts as well as for guidance in carrying out the proposed tasks.
ORDINARY OPPORTUNITY
The assessment of the subject will be divided into two parts, continuous assessment (50% of the mark) and exam (50% of the mark). In order to pass the subject, a mark of 5 or more out of 10 must be obtained in each of the parts.
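The rule above can be sketched in a few lines of Python. This is only an illustration: the names `assess`, `Result` and `PASS_MARK` are hypothetical, and the guide does not say which final mark is recorded when one part falls below 5, so the sketch reports only the weighted average and whether both minimums are met.

```python
from dataclasses import dataclass

# Values taken from the rule above; the names themselves are illustrative.
PASS_MARK = 5.0          # minimum mark (out of 10) required in each part
WEIGHT_CONTINUOUS = 0.5  # continuous assessment: 50% of the mark
WEIGHT_EXAM = 0.5        # exam: 50% of the mark


@dataclass
class Result:
    weighted_mark: float  # 50/50 weighted average of both parts
    passed: bool          # True only if each part reaches PASS_MARK


def assess(continuous: float, exam: float) -> Result:
    """Combine the two parts and check the per-part minimum."""
    for mark in (continuous, exam):
        if not 0.0 <= mark <= 10.0:
            raise ValueError("marks must be between 0 and 10")
    weighted = WEIGHT_CONTINUOUS * continuous + WEIGHT_EXAM * exam
    passed = continuous >= PASS_MARK and exam >= PASS_MARK
    return Result(weighted, passed)


if __name__ == "__main__":
    print(assess(7.0, 6.0))  # both parts reach 5: passed
    print(assess(9.0, 4.0))  # same average, but the exam is below 5: not passed
```

Note that the two example calls give the same weighted average, which shows that the average alone does not decide the outcome: each part must reach 5 on its own.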
Continuous assessment (50% of the final mark).
The solutions provided by the students in relation to the tasks or projects developed within the interactive classes will be assessed. Continuous assessment activities may include, among others, the completion of exercises individually or in small groups, submission of reports or follow-up tests. In this part, the competences CG2, TR3, CE5 and CE6 will be implicitly or explicitly assessed.
The practical work will be assessed through correction by the teacher, the student's defence of the submitted solution before the teacher, or an oral presentation in class. All assignments must be submitted via the Campus Virtual before the dates that will be specified. Late submissions will receive 0 points. The degree of compliance with the specifications, the methodology, the thoroughness and the presentation of results will be assessed.
Attendance at the interactive sessions is compulsory when an activity is planned to be assessed during the session. These sessions will be scheduled and notified in advance.
Continuous assessment is nonrecoverable and therefore cannot be replaced by the completion of a final practical work or exam.
For cases of fraudulent performance of exercises, projects or tests, the "Normativa de avaliación do rendemento académico dos estudantes e de revisión de cualificacións" will apply.
Exam (50% of the final mark).
There will be a theoretical and/or problem-solving exam, on the date officially designated for this purpose, in which all the contents of the subject will be assessed, fundamentally the competences CG2, CG5, TR3 and CE5.
EXTRAORDINARY OPPORTUNITY FOR RECOVERY.
Students may opt to be re-assessed on the theoretical content and/or problem solving (exam). Continuous assessment is nonrecoverable; the mark obtained in the ordinary opportunity will be kept.
NO-SHOW CONDITION.
The mark will be "no-show" when the person does not take the exam at any of the opportunities and does not submit more than one continuous assessment activity.
REPEAT STUDENTS
Repeat students will be assessed under the same conditions as first year students. The mark obtained in the continuous assessment may be retained in subsequent years if the student so requests by email to the professor.
STUDENTS WITH ATTENDANCE EXEMPTION
Students with attendance exemption will be assessed under the same conditions as the rest of the students. In those activities of the continuous assessment whose evaluation requires classroom attendance, the student must attend those sessions.
• Lectures: 15 hours.
• Practical and/or laboratory sessions: 25 hours.
• Tutorials: 1 hour.
• Autonomous student work (study, exercises, practical work, projects): 71.5 hours.
Total: 112.5 hours
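As a quick consistency check, the workload figures above can be added up and compared with the 4.5 ECTS credits of the subject. The sketch below is illustrative only: the dictionary keys are hypothetical labels, and the hours-per-credit value is derived from the figures in this guide rather than stated in it.

```python
# Figures copied from this guide; the key names are illustrative labels.
ECTS_CREDITS = 4.5
hours = {
    "lectures": 15.0,
    "interactive": 25.0,
    "tutorials": 1.0,
    "autonomous": 71.5,
}

total_hours = sum(hours.values())                   # whole student workload
contact_hours = total_hours - hours["autonomous"]   # classroom and tutorial time
hours_per_credit = total_hours / ECTS_CREDITS       # derived, not stated in the guide

if __name__ == "__main__":
    print(f"Total workload: {total_hours} h")
    print(f"Contact hours: {contact_hours} h")
    print(f"Hours per ECTS credit: {hours_per_credit}")
```

The contact hours computed this way match the 41 hours listed at the top of the guide.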
The recommended prerequisites are to have previously passed the subjects Concurrent, Parallel and Distributed Computing; Networks; and Databases, as established in the degree report.
Attendance at lectures and interactive classes is also recommended, as well as continued study of the subject, making an active effort to search for materials, carry out exercises and complete the proposed practicals or projects.
The instructional materials for this subject include the indicated bibliography and the slides prepared by the professors. The Campus Virtual will be used for all communication, file exchange and assignment submission. The practicals will be done using free software such as Hadoop and Apache Spark. Depending on the availability of CESGA, we will use their Big Data cluster for some of the practicals.
The language of this subject will be Galician. However, the bibliography is entirely in English.
In case of doubts about the program, the original source (the Galician one) will be taken into account.
Alvaro Ordoñez Iglesias (Coordinator)
- Department: Electronics and Computing
- Area: Computer Architecture and Technology
- Phone: 881815508
- Email: alvaro.ordonez [at] usc.es
- Category: Professor: LOU (Organic Law for Universities) PhD Assistant Professor
Day | Time | Group | Language | Classroom |
---|---|---|---|---|
Wednesday | 10:30-12:00 | Group /CLE_01 | Galician | IA.01 |
Wednesday | 12:00-14:00 | Group /CLIS_01 | Galician | IA.01 |
Thursday | 09:00-11:00 | Group /CLIS_02 | Galician | IA.01 |

Date | Time | Group | Classroom |
---|---|---|---|
01.14.2025 | 16:00-20:00 | Group /CLE_01 | IA.01 |
01.14.2025 | 16:00-20:00 | Group /CLIS_02 | IA.01 |
01.14.2025 | 16:00-20:00 | Group /CLIS_01 | IA.01 |
01.14.2025 | 16:00-20:00 | Group /CLIS_01 | IA.11 |
01.14.2025 | 16:00-20:00 | Group /CLE_01 | IA.11 |
01.14.2025 | 16:00-20:00 | Group /CLIS_02 | IA.11 |
01.14.2025 | 16:00-20:00 | Group /CLIS_02 | IA.12 |
01.14.2025 | 16:00-20:00 | Group /CLE_01 | IA.12 |
01.14.2025 | 16:00-20:00 | Group /CLIS_01 | IA.12 |
06.24.2025 | 16:00-20:00 | Group /CLE_01 | IA.11 |
06.24.2025 | 16:00-20:00 | Group /CLIS_01 | IA.11 |
06.24.2025 | 16:00-20:00 | Group /CLIS_02 | IA.11 |