Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Process Management and Information Systems

Bachelor and Master Thesis

General Information

Our team offers bachelor and master thesis topics as well as student projects to be written in English. 

Students may apply for a thesis or a study project within two application windows per year, during which new topics are made available. The first window is open from February 1st until April 1st. The second window is open from July 1st until October 1st.

 

Here you can find information on how to write a thesis with us. Slides are available here (part I, part II), and recordings here (part I, part II).

Below you will find a summary of guidelines for working on your thesis with us.

 

Expression of Interest in a Topic (Thesis or Study Project)

If you are interested in one of the topics, please send an email expressing your interest to Dr. Saimir Bala (firstname[dot]lastname[at]hu-berlin.de). Please explain why this topic is interesting to you and how it fits your prior studies. Also describe your strengths in your studies and state which semester you are currently in.

 

Process Overview

  • There are two main time windows in which the team proposes new topics: Feb 1st – Apr 1st and Jul 1st – Oct 1st
  • Within these windows students can apply for an open topic (see list of open topics below)
  • Application is done by sending an email to Dr. Saimir Bala (firstname[dot]lastname[at]hu-berlin.de).
  • We collect your applications and assign topics to students in two rounds: the first round takes place in March, the second after the deadline. For the winter session, the two rounds take place in September and October.
  • Once a student has been matched to a supervisor, a kick-off meeting is scheduled to scope the topic.
  • Then, students must submit a research proposal to the supervisor within a month.
  • If the proposal is graded as passed, the supervision is officially registered.
  • Once the thesis work is concluded, the thesis defense is scheduled within a dedicated defense slot.

 

Important Dates

01.07.2024: New topics released. Students can express their interest.

02.09.2024: Topic assignment (1st round)

01.10.2024: Expression of interest deadline

02.10.2024: Topic assignment (2nd round)

 

Milestones:

- Kick-off: shortly after the assignment round

- Research proposal submission deadline (1 month after the official kick-off)

- Official start (if the proposal is sufficient)

- Thesis delivery 

- Grading & Defense

 

Formatting

Please consider the following hints and guidelines for working on your thesis:

  • Templates for thesis and proposal: https://www.informatik.hu-berlin.de/de/studium/formulare/vorlagen
  • Page limits are as follows:
    • Bachelor Informatik: 40 pages; Kombibachelor Lehramt Informatik: 30 pages
    • Master Informatik: 80 pages; Master Information Systems: 60 pages
  • The limits exclude the cover page, table of contents, references, and appendices.

 

Prerequisites

The candidate is expected to be familiar with the general rules of writing a scientific paper. Some general references are helpful for framing any thesis, no matter which topic:

In preparation for the actual work on the thesis, the student should study an individual list of readings agreed upon with the supervisor.

 

Grading

The grading of the thesis takes various criteria into account, relating both to the thesis as a product and the process of establishing its content. These include, but are not limited to:

  • Correctness of spelling and grammar
  • Aesthetic appeal of documents and figures
  • Compliance with formal rules
  • Appropriateness of thesis structure
  • Coverage of relevant literature
  • Appropriateness of research question and method
  • Diligence of own research work
  • Significance of research results
  • Punctuality of work progress
  • Proactiveness of handling research progress

 

Recent Topics

The following topics are available within the current application window.

 

Topic 1: Process prediction using object-centric event logs (Bachelor/Master)

Business process prediction involves forecasting specific details of an ongoing process instance, such as the next activity to be performed, the time remaining until the instance completes, or key process indicators. Currently, most techniques rely on event logs in the XES format as input data. However, the field of process mining is shifting towards object-centric event logs, which offer a comprehensive, multidimensional view of the data. Despite this advancement, object-centric event logs have been underutilized as input for process prediction.


Research problem:
The core research problem addressed is: How can process prediction benefit from an object-centric event log?
The aim is to propose a method for process prediction that uses object-centric event logs as input.

Requirements:
The candidate must have previous knowledge of process mining and software development. Proactivity and self-organization are further desirable qualities.

 

Initial references

  • An Empirical Investigation of Different Classifiers, Encoding, and Ensemble Schemes for Next Event Prediction Using Business Process Event Logs. ACM Trans. Intell. Syst. Technol. 11(6): 68:1-68:34 (2020)
  • Uncovering Object-Centric Data in Classical Event Logs for the Automated Transformation from XES to OCEL. BPM 2022: 379-396
  • Benedikt Knopp, Wil M. P. van der Aalst: Order Management Object-centric Event Log in OCEL 2.0 Standard. Zenodo, 2023

 

Supervisor: Kate Revoredo

 

Topic 2: Causation discovery for process prediction (Bachelor/Master)

Business process prediction involves forecasting specific details of an ongoing process instance, such as the next activity to be performed, the time remaining until the instance completes, or key process indicators. Currently, most techniques rely on the order in which events happened without considering the cause-effect relations among them.

 

Research problem:
The core research problem addressed is: How can process prediction benefit from the cause-effect relation among the events?
The aim is to propose a method that discovers causal relations among events and uses this information for process prediction.

 

Requirements:
The candidate must have previous knowledge of process mining, statistics, and software development. Proactivity and self-organization are further desirable qualities.

 

Initial references

  • An Empirical Investigation of Different Classifiers, Encoding, and Ensemble Schemes for Next Event Prediction Using Business Process Event Logs. ACM Trans. Intell. Syst. Technol. 11(6): 68:1-68:34 (2020)
  • Jens Brunk, Matthias Stierle, Leon Papke, Kate Revoredo, Martin Matzner, Jörg Becker: Cause vs. effect in context-sensitive prediction of business process instances. Inf. Syst. 95: 101635 (2021)
  • Pearl, J. (2011). Bayesian networks.

 

Supervisor: Kate Revoredo

 

Topic 3: Uses of Models in Agile Software Development (Bachelor/Master)

Motivation & problem: Modeling is a key topic in software engineering. In software development projects, among other aspects, modeling helps developers understand the design by providing an overview and serves as a tool for communicating with fellow developers and other stakeholders. The benefits of models for supporting system analysis and design activities, in particular their cognitive effectiveness, have been highlighted mostly in the context of traditional methodologies. These benefits have also been discussed in the agile community, but it remains unclear to what extent models are actually used in agile software development projects.

Objectives: conduct a systematic review of the literature, identify the uses of models in agile software development, categorize and prioritize them, and propose a framework to support agile software development based on these findings. The findings shall be evaluated from the perspective of practitioners.

Prerequisites: (1) Basic knowledge of agile software development methodologies; (2) Intermediate knowledge of models used in software development; (3) Proactivity, self-organization, attention to detail (desirable).

 

Initial References:

  • Ambler, Scott W. The object primer: Agile model-driven development with UML 2.0. Cambridge University Press, 2004.
  • Alfraihi, Hessa Abdulrahman A., and Kevin Charles Lano. "The integration of agile development and model driven development: A systematic literature review." The 5th International Conference on Model-Driven Engineering and Software Development (2017).
  • Wagner, Stefan, Daniel Méndez Fernández, Michael Felderer, Antonio Vetrò, Marcos Kalinowski, Roel Wieringa, Dietmar Pfahl et al. "Status quo in requirements engineering: A theory and a global family of surveys." ACM Transactions on Software Engineering and Methodology (TOSEM) 28, no. 2 (2019): 1-48.
  • Petre, Marian. "UML in practice." In 2013 35th International Conference on Software Engineering (ICSE), pp. 722-731. IEEE, 2013.

 

Supervisor: Cielo González

 

Topic 4: Collaborative business-model-driven tool for agile software development projects (Bachelor/Master)

 

Motivation & problem: Agile software development methodologies and frameworks have changed the way software is created and are widely supported and used. However, challenges remain that jeopardize the principles of agile methodologies and increase the failure rate of agile software development projects. This situation highlights the need for cohesive solutions. The use of business process models in the agile software development context emerges as a promising option due to their ability to facilitate communication and knowledge sharing.

Objectives: analyze, design, implement and evaluate a collaborative business-model-driven tool for agile software development projects. The objectives will be adapted to align with the student's study goals.

Prerequisites: (1) Knowledge of agile software development methodologies; (2) Knowledge of business process models; (3) Knowledge of frontend development (e.g., JavaScript and TypeScript); (4) Knowledge of Java; (5) Knowledge of databases (e.g., PostgreSQL, MongoDB); (6) Proactivity and self-organization.

 

Initial references:

  • Moyano, Cielo González, et al. "Uses of business process modeling in agile software development projects." Information and Software Technology 152 (2022): 107028.
  • Trkman, Marina, Jan Mendling, and Marjan Krisper. "Using business process models to better understand the dependencies among user stories." Information and Software Technology 71 (2016): 58-76.

 

Supervisor: Cielo González

 

Topic 5: Fair and Diverse Sampling of Event Logs (Bachelor/Master)

The sampling of large event logs, i.e., the selection of subsets of the data, has been proposed as one possible way to reduce the runtime of process analysis tasks and to improve the understandability of data sets that are too complex to analyze as a whole. In this context, a diverse sample is one that properly reflects the diversity or complexity of the event log, while a fair sample is one that ensures that each value is represented properly. Approaches have been proposed for both quality criteria, for instance by optimizing subset selection functions or by employing determinantal point processes.
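The following minimal Python sketch makes the two quality criteria more concrete. It uses one simple heuristic, greedy max-min (farthest-point) selection with per-group quotas, which is not one of the cited approaches. The sketch assumes the following:

* traces are sequences of activity labels;
* diversity is measured by edit distance;
* fairness means selecting a fixed number of traces per value of some case attribute.

The function names and the `groups` and `quotas` parameters are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two activity sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x by y
        prev = cur
    return prev[-1]


def fair_diverse_sample(traces, groups, quotas):
    """Greedy max-min selection subject to per-group quotas (illustrative heuristic).

    traces: list of activity sequences
    groups: groups[i] is the (hypothetical) attribute value of traces[i]
    quotas: dict mapping attribute value -> number of traces to select
    """
    k = sum(quotas.values())
    remaining = dict(quotas)
    chosen = []
    # Distance of every trace to its closest already-chosen trace.
    dist = [float("inf")] * len(traces)
    while len(chosen) < k:
        candidates = [i for i in range(len(traces))
                      if i not in chosen and remaining.get(groups[i], 0) > 0]
        if not candidates:
            break  # quotas cannot be met with the available traces
        # Diversity: pick the candidate farthest from the current sample.
        nxt = max(candidates, key=lambda i: dist[i])
        chosen.append(nxt)
        remaining[groups[nxt]] -= 1  # fairness: respect the group quota
        dist = [min(d, levenshtein(t, traces[nxt])) for d, t in zip(dist, traces)]
    return [traces[i] for i in chosen]
```

Consider a quota of one trace per group. The sketch first takes the first trace of group `g1`. It then picks the trace of group `g2` that differs most from it, here `("x", "y", "z")` rather than the near-duplicate `("a", "b", "d")`.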

 

In this thesis the student will:

* conduct research on existing approaches for diverse and fair sampling
* implement selected approaches for the diverse and fair sampling for event logs
* evaluate the implemented algorithms comparatively in terms of quality and efficiency

The student is expected to have knowledge of optimization, statistics, or sampling.

 

Initial References

  • Kabierski, M., van der Aa, H., and Weidlich, M. (2020). Sampling and approximation techniques for efficient process conformance checking. Information Systems. 104. 101666. http://dx.doi.org/10.1016/j.is.2020.101666
  • Moumoulidou, Z., McGregor, A., and Meliou, A. Diverse Data Selection under Fairness Constraints. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 13:1-13:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021), https://doi.org/10.4230/LIPIcs.ICDT.2021.13
  • Celis, L., Keswani, V., Straszak, D., Deshpande, A., Kathuria, T., and Vishnoi, N. (2018). Fair and Diverse DPP-based Data Summarization. https://arxiv.org/abs/1802.04023

 

Supervisor: Martin Kabierski

 

Topic 6: Estimating Saturation in Qualitative Studies (Master)

 

Grounded theory is a research methodology usually applied in qualitative analysis. It involves collecting data (usually through interviews, surveys, etc.) and deriving concepts, categories, and ultimately theories that emerge from the collected data. A central question in this iterative process of data collection and evaluation is when to stop collecting data: ideally at the point of saturation, i.e., when new interviews yield no new information.
Determining when exactly this point has been reached is an ongoing topic of discussion and research.
Species richness estimators, which estimate the completeness of samples, could be used to provide saturation estimates that are data-driven and grounded in statistics.
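Below is a minimal Python sketch of how one species richness estimator might be applied. It uses the incidence-based Chao2 estimator, one of the estimators discussed by Gotelli & Chao. It rests on two assumptions. First, each interview is treated as a sampling unit and each qualitative code as a "species". Second, the ratio of observed to estimated codes is used as a saturation score. Both are illustrative choices that the thesis would have to assess. The function names are hypothetical.

```python
from collections import Counter


def chao2(interviews):
    """Incidence-based Chao2 estimate of the total number of codes.

    interviews: list of sets, each holding the codes found in one interview.
    """
    m = len(interviews)
    if m == 0:
        return 0.0
    # Number of interviews in which each code occurs.
    incidence = Counter(code for codes in interviews for code in set(codes))
    s_obs = len(incidence)
    q1 = sum(1 for c in incidence.values() if c == 1)  # codes seen in exactly one interview
    q2 = sum(1 for c in incidence.values() if c == 2)  # codes seen in exactly two interviews
    factor = (m - 1) / m
    if q2 > 0:
        return s_obs + factor * q1 * q1 / (2 * q2)
    return s_obs + factor * q1 * (q1 - 1) / 2  # bias-corrected form when q2 == 0


def saturation(interviews):
    """Share of the estimated total number of codes that has already been observed."""
    est = chao2(interviews)
    observed = len(set().union(*interviews))
    return observed / est if est > 0 else 0.0
```

A value close to 1 would suggest that additional interviews are unlikely to surface many new codes. Many codes seen only once (large `q1`) push the estimate up and the saturation score down.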

 

In this thesis, the student will:

- assess the applicability of species richness estimation for determining saturation in qualitative studies
- implement and apply the estimators to qualitative interview data
- evaluate the feasibility of the approach and discuss potential limitations

The student is expected to have an understanding of statistics and, optionally, preliminary experience in the analysis of qualitative data. Note that the student is not expected to collect data for the thesis, as the data will be provided by us.

 

Initial References:

  • Strauss, A., & Corbin, J. (1994). Grounded theory methodology: An overview. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 273–285). Sage Publications, Inc. https://www.depts.ttu.edu/education/our-people/Faculty/additional_pages/duemer/epsy_5382_class_materials/Grounded-theory-methodology.pdf
  • Saunders, Benjamin, et al. (2018). Saturation in qualitative research: exploring its conceptualization and operationalization. In: Qual Quant 52 (pp. 1893-1907). Springer. https://doi.org/10.1007/s11135-017-0574-8
  • Gotelli, Nicholas & Chao, Anne. (2013). Measuring and Estimating Species Richness, Species Diversity, and Biotic Similarity from Sampling Data. 10.1016/B978-0-12-384719-5.00424-X. https://www.uvm.edu/~ngotelli/manuscriptpdfs/Gotelli_Chao_Encyclopedia_2013.pdf

 

Supervisor: Martin Kabierski

 

Topic 7: Runtime Prediction of Alignment Construction Algorithms (Bachelor/Master)

 

Conformance checking relates a process model to recorded executions of the process, typically stored in event logs, to determine where expected and actual behaviour deviate from each other. In this context, alignment algorithms are regarded as the de facto standard method, due to their interpretability and their accuracy in pinpointing problem areas in the process. Yet, the runtime of alignment construction is often prohibitively long, typically because of a handful of traces in the log for which computing an alignment is especially complex.
One possible solution is to predict the expected runtime of aligning a trace to the model, for instance using regression-based methods, and then to skip traces that are expected to take long.
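The following Python sketch illustrates this idea under several assumptions:

* It fits an ordinary least-squares regression on two hypothetical trace features, trace length and number of distinct activities.
* It uses a user-defined runtime budget to decide which traces to skip.
* The measured runtimes used for training are assumed to come from aligning a subset of traces beforehand.

Which features actually drive alignment runtime is exactly what the thesis would need to assess. All names below are illustrative, not an existing API.

```python
import numpy as np


def trace_features(trace):
    # Hypothetical features; the relevant factors are an open research question.
    return [len(trace), len(set(trace))]


def fit_runtime_model(traces, runtimes):
    """Ordinary least squares on trace features plus an intercept term."""
    X = np.array([trace_features(t) + [1.0] for t in traces], dtype=float)
    y = np.asarray(runtimes, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_runtime(coef, trace):
    return float(np.dot(trace_features(trace) + [1.0], coef))


def split_by_budget(coef, traces, budget):
    """Return (traces to align, traces skipped because predicted runtime > budget)."""
    keep, skip = [], []
    for t in traces:
        (skip if predict_runtime(coef, t) > budget else keep).append(t)
    return keep, skip
```

A linear model is only the simplest choice. Alignment runtime may grow non-linearly with trace length, so the thesis might compare other regression methods as well.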

 

In this thesis, the student will:
- assess the factors that influence the runtime of alignments
- derive a methodology for predicting the runtime of alignment construction between event logs and process models
- evaluate the accuracy of the predictor

The student is expected to have knowledge of process mining and conformance checking and a basic knowledge of regression analysis, or the willingness to learn about these topics under the guidance of the supervisor.

 

Initial References:

  • Chapter 1, 2 and 7 in Carmona, J., van Dongen, B., Solti, A., & Weidlich, M. (2018). Conformance checking. Switzerland: Springer. https://doi.org/10.1007/978-3-319-99414-7
  • Backhaus, K., Erichson, B., Weiber, R., Plinke, W. (2016). Regressionsanalyse. In: Multivariate Analysemethoden. Springer Gabler, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-08893-7_1


Supervisor: Martin Kabierski

 

Topic 8: Event Log Privacy Auditing (Master)

Event logs are often anonymized to ensure privacy. Most anonymization algorithms rely on formal privacy guarantees such as differential privacy. The issue with these techniques is that the algorithm or its implementation could contain errors. Consequently, anonymized event logs might not provide the targeted privacy guarantee, and the individuals in the dataset might suffer a higher privacy loss than expected. Differential privacy auditing makes it possible to check whether an algorithm fulfils the differential privacy guarantee. The aim of this thesis is to adapt known differential privacy auditing techniques to event log anonymization and to evaluate anonymization techniques from the process mining domain.
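The basic idea behind auditing can be sketched as follows:

1. Run the mechanism many times on two neighbouring inputs.
2. Count how often a chosen output event occurs for each input.
3. Take the log-ratio of the two frequencies. It estimates a lower bound on the ε the mechanism actually satisfies. A value clearly above the claimed ε points to a violation.

The Python sketch below rests on several assumptions:

* It uses a Laplace mechanism that releases a single count from an event log. Hypothetically, this could be the frequency of a trace variant, where neighbouring logs differ in one case.
* A buggy implementation that adds too little noise is simulated via a hypothetical `scale_factor`.
* The result is a Monte Carlo point estimate only. A rigorous audit would use confidence bounds on the two probabilities.

```python
import numpy as np


def laplace_count(count, epsilon, rng, size, scale_factor=1.0):
    # Laplace mechanism for a counting query with sensitivity 1.
    # scale_factor < 1 simulates a faulty implementation adding too little noise.
    return count + rng.laplace(0.0, scale_factor / epsilon, size=size)


def empirical_epsilon(epsilon, threshold=1.0, n=200_000, scale_factor=1.0, seed=0):
    """Estimate ln(Pr[M(D) > t] / Pr[M(D') > t]) for neighbouring inputs D, D'."""
    rng = np.random.default_rng(seed)
    out_d = laplace_count(1, epsilon, rng, n, scale_factor)   # log D: count 1
    out_dp = laplace_count(0, epsilon, rng, n, scale_factor)  # neighbour D': count 0
    p_d = np.mean(out_d > threshold)
    p_dp = np.mean(out_dp > threshold)
    if p_d == 0 or p_dp == 0:
        return float("nan")  # event too rare to estimate the ratio
    return float(np.log(p_d / p_dp))
```

For a correct implementation with ε = 1, the estimate stays close to 1. Halving the noise scale yields an estimate close to 2, i.e., the released count leaks more than the claimed guarantee allows.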

 

Initial references:

  • Stephan A. Fahrenkrog-Petersen, Martin Kabierski, Han van der Aa, Matthias Weidlich: Semantics-aware mechanisms for control-flow anonymization in process mining. Inf. Syst. 114: 102169 (2023)
  • https://research.google/blog/dp-auditorium-a-flexible-library-for-auditing-differential-privacy/

 

Supervisor: Stephan Fahrenkrog-Petersen

 

Topic 9: Analysis of theoretical explanations and scientific theories on transitioning from dashboards to decision-making in organizational contexts (Bachelor)

This bachelor thesis seeks to analyze theoretical explanations and scientific theories concerning the transition from dashboards to decision-making processes. Dashboards are widely used tools in organizational contexts for decision-making. The study aims to examine the levels of management where dashboards are employed and how they contribute to the decision-making process within organizations.

Literature:

  • Burstein, F., & Holsapple, C. W. (2008). Handbook on Decision Support Systems 2. https://www.academia.edu/83497312/Handbook_on_Decision_Support_Systems_2
  • Maynard, S., Burstein, F., & Arnott, D. (2001). A multi-faceted decision support system evaluation approach. Journal of Decision Systems, 10(3–4), 395–428.
  • Mintzberg, H., Raisinghani, D., & Theoret, A. (1976). The Structure of “Unstructured” Decision Processes. Administrative Science Quarterly, 21(2), 246. https://doi.org/10.2307/2392045

 

Supervisor: Kristina Sahling

 

Topic 10: Visualizing Cyclic Time Arrangements in Process Graphs (Bachelor/Master)

Time is essential to understanding processes, yet most process mining approaches are limited to depicting time within a process graph through textual cues or color schemes. Adapting the visual layout of process graphs to different time arrangements may make it easier to find bottlenecks or delays. One example is aligning process graphs along a linear timeline [1]. For processes with repetitive patterns, such as in chronic health care or crop management, a cyclic arrangement may be useful. However, process mining still lacks an adequate solution for such cyclic arrangements.

This thesis aims to develop and exemplify a design method for a visual solution in process mining that allows for exploring a cyclic time arrangement in a process graph. We will adapt the research objectives to align with the experience and study goals of the student.

Initial References:

  • H. Kaur, J. Mendling, C. Rubensson, and T. Kampik, “Timeline-based Process Discovery,” CoRR, abs/2401.04114, 2024. Available: https://doi.org/10.48550/arXiv.2401.04114
  • A. Yeshchenko and J. Mendling, “A Survey of Approaches for Event Sequence Analysis and Visualization using the ESeVis Framework.,” CoRR, abs/2202.07941, 2022. Available: https://arxiv.org/abs/2202.07941
  • W. Aigner, S. Miksch, H. Schumann, and C. Tominski, Visualization of Time-Oriented Data. in Human-Computer Interaction Series. London: Springer London, 2011. Available: https://doi.org/10.1007/978-0-85729-079-3.

 

Supervisor: Christoffer Rubensson

 

Topic 11: Advanced Resource Analysis in Process Mining (Bachelor/Master)

In the last decade, process mining techniques have been developed to study human behavior in event data, such as the strength of collaboration between co-workers or even stress levels at a workplace. Since measuring human behavior is complex, these techniques are a welcome alternative to more labor-intensive methods like surveys. Still, most techniques are relatively simple and could be improved by grounding them in theoretical frameworks from social science.

This thesis aims to develop a resource analysis approach (e.g., a metric, a concept, or a framework) in process mining grounded in an existing theory from social science. We will adapt the research objectives to align with the experience and study goals of the student.

Initial References:

  • J. Nakatumba and W. M. P. van der Aalst, “Analyzing Resource Behavior Using Process Mining,” in Business Process Management Workshops. BPM 2009. Lecture Notes in Business Information Processing, S. Rinderle-Ma, S. Sadiq, and F. Leymann, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. Available: https://doi.org/10.1007/978-3-642-12186-9_8.
  • A. Pika, M. Leyer, M. T. Wynn, C. J. Fidge, A. H. M. Ter Hofstede, and W. M. P. Van der Aalst, “Mining Resource Profiles from Event Logs,” in ACM Transactions on Management Information Systems, vol. 8, no. 1, 1:1-30, 2017. Available: https://doi.org/10.1145/3041218.
  • Z. Huang, X. Lu, and H. Duan, “Resource behavior measure and application in business process management,” in Expert Systems with Applications, vol. 39, no. 7, 6458–6468, 2012. Available: https://doi.org/10.1016/j.eswa.2011.12.061.

 

Supervisor: Christoffer Rubensson

 

Topic 12: Anthropomorphic Perceptions of Large Language Models: what is the gender of ChatGPT and its Counterparts? (Bachelor/Master)

Description: In today's digital era, Large Language Models (LLMs) like ChatGPT are transforming the way we interact with technology, often blurring the boundaries between machine and human cognition. This thesis explores anthropomorphism, the human tendency to attribute human-like qualities to non-human entities. Specifically, this research aims to uncover laypeople's underlying beliefs and implicit conceptions about ChatGPT and similar models with respect to implicit gender attribution. By designing and conducting a survey, the thesis aims to gain insights into how individuals perceive these cutting-edge technologies. The findings may shed light not only on our relationship with LLMs but also on the broader implications of human-machine interaction in an increasingly AI-driven world.

 

Initial References:

  • Deshpande, A., Rajpurohit, T., Narasimhan, K., & Kalyan, A. (2023). Anthropomorphization of AI: Opportunities and Risks (arXiv:2305.14784). arXiv. https://doi.org/10.48550/arXiv.2305.14784
  • Farina, M., & Lavazza, A. (2023). ChatGPT in society: Emerging issues. Frontiers in Artificial Intelligence, 6. https://www.frontiersin.org/articles/10.3389/frai.2023.1130913
  • Aşkın, G., Saltık, İ., Boz, T. E., & Urgen, B. A. (2023). Gendered Actions with a Genderless Robot: Gender Attribution to Humanoid Robots in Action. International Journal of Social Robotics, 15(11), 1915–1931. https://doi.org/10.1007/s12369-022-00964-0

 

Supervisor: Jennifer Haase