Research

Research Vision

The Computer Architecture and Communication Group focuses its research on all aspects of dependable systems. Dependability is an umbrella term for several system properties such as availability, reliability, safety, integrity, maintainability and real-time behavior. In this context we focus on concepts, methods, algorithms and technologies to analyze and create dependable systems in domains ranging from embedded real-time systems to distributed service-oriented systems and cloud computing.

Our research rests on three basic building blocks of dependable systems. All three can be addressed both at design time (offline) and at run time (online). In more detail, the building blocks comprise the following:

Data acquisition: Our research usually follows an evidence-based approach to dependable system design and analysis, focussing on data-driven algorithms and analysis techniques. Since data is the typical starting point of our activities, data acquisition strategies are a major building block of our work. Typical research topics are efficient runtime monitoring concepts and variable selection techniques at several architectural levels. One example approach is the interweaving of location information with standard monitoring data in order to obtain better reliability properties for mobile and wireless systems. Another example is the quantitative analysis of complex IT infrastructure management processes, which first demands data gathering through a qualitative assessment.
Evaluation: System surveillance data needs to be evaluated in order to draw conclusions for dependable system design. In the offline case, we apply methods such as phase-type distributions or graph theory in order to determine critical paths in the hardware/software setup. In the online case, evaluation comprises statistical methods such as pattern recognition using hidden semi-Markov models, or rank-sum tests.
Action: Dependability of systems can only be improved if the analyses are followed by appropriate actions. In the offline case, this means, for example, developing improved IT management processes. In the online case, we focus on prediction-triggered preventive maintenance techniques that aim to prevent an upcoming failure or to minimize its impact. This can be achieved, e.g., by restart techniques or optimal runtime reconfiguration.
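
As a rough illustration of how the online evaluation and action steps can fit together, the following sketch applies a Wilcoxon rank-sum test (normal approximation, no tie correction of the variance) to two windows of monitoring data and flags preventive maintenance when the recent window is significantly shifted upwards. It is a minimal sketch under stated assumptions and not the group's actual method: the function names, the significance level and the sample values are all hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_z(baseline, recent):
    """z-score of the rank sum of `recent` within the pooled sample.

    Normal approximation; tied values receive average ranks, but the
    variance is not corrected for ties (a simplification).
    """
    n1, n2 = len(recent), len(baseline)
    # Label 0 = baseline observation, label 1 = recent observation.
    pooled = sorted([(v, 0) for v in baseline] + [(v, 1) for v in recent])
    rank_sum_recent = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_recent += avg_rank * sum(label for _, label in pooled[i:j])
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    var = n1 * n2 * (n1 + n2 + 1) / 12.0
    return (rank_sum_recent - mean) / math.sqrt(var)


def should_trigger_maintenance(baseline, recent, alpha=0.01):
    """Hypothetical decision rule: act if `recent` is significantly larger."""
    z = rank_sum_z(baseline, recent)
    p_upper = 1.0 - NormalDist().cdf(z)  # one-sided p-value
    return p_upper < alpha


if __name__ == "__main__":
    # Made-up response times (ms): a healthy window and a degraded one.
    baseline = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
    recent = [21, 24, 19, 23, 22, 25]
    if should_trigger_maintenance(baseline, recent):
        print("shift detected: schedule preventive restart")
```

A rank-based test is used here because it makes no assumption about the distribution of the monitored variable. In practice the window sizes and the significance level would have to be tuned to the system under observation.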

All three building blocks are accompanied by general techniques for the assessment, modeling and design of systems and system components. Examples include the assessment of service level agreements and service level objectives.

Research Topics

Our research activities in the area of dependable systems cover the following topics:

Proactive failure avoidance, recovery and maintenance
Adaptive and self-healing systems
Dependability assessment of IT infrastructures (SHIP model)
Dependability of embedded systems
NOMADS - Network Of Mobile Adaptive Dependable Systems
Dependability in mobile communication and ad-hoc networks
Location-based services
Composability and verification of software systems
Failure prediction for hardware components and complex IT infrastructures
Dependable multi-core computing / GPU systems
Dependability in distributed systems and service-oriented architectures
Stochastic modelling of dependability in complex systems
Planning and implementation of service level agreements

Current Research Projects

Aletheia - Semantic federation of product information. Our group contributes to the system architecture, information retrieval and location (see also the Magic Map project), and information presentation. Please check the Aletheia home page for further information.
Contact person: Peter Ibach
SmartKanban: A Kanban system based on intelligent, networked and ultra cost-efficient sensor nodes
Contact persons: Peter Ibach, Bratislav Milic
Magic Map: Cooperative positioning with WLAN and RFID.
Contact person: Peter Ibach
METRIK: Graduate school on "Model-based development of technologies for self-organizing decentralized information systems for catastrophe management"
People involved: Andreas Dittrich
Proactive Fault Management in the Cloud: Cloud computing provides technology for a new era of dependable computing where redundancy in space can easily be managed dynamically.
Contact person: Felix Salfner

Current research activities are presented and discussed in the weekly research seminar.

Finished Projects

InterVal - Internet and Value Chains (BMBF, 2003-2007)
CERO - CE Robots Community (Microsoft, 2003)
DISCOURSE - The Berlin Distributed Computing Laboratory
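DFG project "QoS Management in heterogenen Rechnernetzen mit variabler Topologie" (QoS management in heterogeneous computer networks with variable topology)
DFG project "Failure prediction in critical infrastructures"
DFG project "Reliability and Performance of Service-Oriented Architectures"
Graduate school "Stochastische Modellierung und quantitative Analyse großer Systeme in den Ingenieurwissenschaften" (Stochastic modelling and quantitative analysis of large systems in the engineering sciences)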
Motherboard Failure Prediction (Intel, 2006-2008)

Finished Theses

Study theses and diploma theses

Publications

Group publication list

Responsible for this page: Felix Salfner