Domain Engineering: A Software Engineering Discipline in Need of Research

D. Bjorner

Before software can be developed, its requirements must be stated. Before requirements can be expressed, the application domain must be understood. In this invited paper we outline some of the basic facets of domain engineering.
Domains seem, in our experience, far more stable than computing requirements, and these again seem more stable than software designs. Thus, almost like the universal laws of physics, it pays off to first develop theories of domains.
But domain engineering, as in fact also requirements engineering, really is in need of thoroughly researched development principles, techniques and tools. The aim of this invited paper is to advocate that researchers study these development method components; that universities base well-nigh any course on the use of formal techniques, specification and verification; and that software engineers take heed and start applying formal techniques. A brief example of describing stake-holder perspectives will be given, against the background of which we then proceed to survey the notions of domain intrinsics, domain support technologies, domain management & organisation, domain rules & regulations, domain human behaviour, etc. We show elsewhere how to "derive" requirements from domain descriptions: domain requirements, by domain projection, instantiation, extension and initialisation; interface requirements: multimedia, dialogue, etc.; and machine requirements: performance, dependability (reliability, availability, accessibility, safety, etc.), and maintainability (adaptability, perfectibility and correctability).
The current paper presents work in progress; its text is therefore rather schematic.


Exhaustive search, combinatorial optimization and enumeration: Exploring the potential of raw computing power

J. Nievergelt

For half a century since computers came into existence, the goal of finding elegant and efficient algorithms to solve "simple" (well-defined and well-structured) problems has dominated algorithm design. Over the same time period, both processing and storage capacity of computers have increased by roughly a factor of a million. The next few decades may well give us a similar rate of growth in raw computing power, due to various factors such as continuing miniaturization, parallel and distributed computing. If a quantitative change of orders of magnitude leads to qualitative advances, where will the latter take place? Only empirical research can answer this question.
Asymptotic complexity theory has emerged as a surprisingly effective tool for predicting run times of polynomial-time algorithms. For NP-hard problems, on the other hand, it yields overly pessimistic bounds. It asserts the non-existence of algorithms that are efficient across an entire problem class, but ignores the fact that many instances, perhaps including those of interest, can be solved efficiently. For such cases we need a complexity measure that applies to problem instances, rather than to over-sized problem classes.
We present case studies of combinatorial enumeration and optimization problems designed as experiments to explore the practical limits of computational feasibility, and ZRAM, a program library of parallel search algorithms used for rapid prototyping of large discrete computations.
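To make the flavour of such exhaustive computations concrete, here is a minimal branch-and-bound sketch in Python (an illustration only, not ZRAM or its interface): it enumerates 0/1 knapsack solutions but prunes subtrees with a fractional upper bound, which is exactly the kind of instance-specific behaviour that worst-case complexity bounds cannot capture.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Exhaustive search over 0/1 knapsack solutions, pruned with a
    fractional (relaxation) upper bound."""
    n = len(values)
    # Sort items by value density so the fractional bound is tight.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = 0

    def bound(i, value, room):
        # Optimistic estimate: fill the remaining room fractionally.
        while i < n and w[i] <= room:
            room -= w[i]
            value += v[i]
            i += 1
        if i < n:
            value += v[i] * room / w[i]
        return value

    def search(i, value, room):
        nonlocal best
        if i == n:
            best = max(best, value)
            return
        if bound(i, value, room) <= best:
            return                                     # prune: subtree cannot improve
        if w[i] <= room:
            search(i + 1, value + v[i], room - w[i])   # take item i
        search(i + 1, value, room)                     # skip item i

    search(0, 0, capacity)
    return best

print(knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # -> 220
```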


Kolmogorov Complexity and Its Applications

P. Vitanyi

Kolmogorov complexity is a modern notion of randomness dealing with the quantity of information in individual objects; that is, pointwise randomness rather than average randomness as produced by a random source. It was proposed by A.N. Kolmogorov in 1965 to quantify the randomness of individual objects in an objective and absolute manner. This is impossible by classical probability theory (a branch of measure theory satisfying the so-called Kolmogorov axioms formulated in 1933). Kolmogorov complexity is known variously as `algorithmic information', `algorithmic entropy', `Kolmogorov-Chaitin complexity', `descriptional complexity', `shortest program length', `algorithmic randomness', and others. Using it, we developed a new mathematical proof technique, now known as the `incompressibility method'. The incompressibility method is a basic general technique, comparable to the `pigeon hole' argument, the `counting method' or the `probabilistic method'. The new method has been quite successful and we present recent examples.

The first example concerns a ``static'' problem in combinatorial geometry. From among the $ {n \choose 3}$ triangles with vertices chosen from among $n$ points in the unit square, $U$, let $T$ be the one with the smallest area, and let $A$ be the area of $T$. Heilbronn's triangle problem asks for the maximum value assumed by $A$ over all choices of $n$ points. We consider the average case: if the $n$ points are chosen independently and at random (uniform distribution), then there exist positive $c$ and $C$ such that $c/n^3 < \mu_n < C/n^3$ for all large enough $n$, where $\mu_n$ is the expectation of $A$. Moreover, with probability close to one, $c/n^3 < A < C/n^3$. Our proof determines the area of the smallest triangle for an arrangement in ``general position.''

Our second example concerns a ``dynamic'' problem: the average-case running time of algorithms. The question of a nontrivial general lower bound (or upper bound) on the average complexity of Shellsort (due to D.L. Shell in 1959) has been open for about four decades. We present such a lower bound for the general case of multipass Shellsort. This is joint work with Tao Jiang and Ming Li. For more information see ``New Scientist'' of November 6, 1999, pp 44-47; ``Die Zeit'' of April 13, 2000 (#16), p. 40; ``Courrier International'' of December 23, 1999 -- January 5, 2000 (#41), p. 477-478; or http://www.cwi.nl/~paulv
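To get a feel for the average-case statement, here is a small Monte Carlo experiment in Python (an illustration of the claim only; the actual proof uses the incompressibility method, not simulation). It estimates the expected smallest triangle area over random point sets and scales it by $n^3$, which should hover around a constant:

```python
# Estimate E[A], the expected area of the smallest triangle among n uniform
# random points in the unit square, and check that n^3 * E[A] stays roughly
# constant, as the Theta(1/n^3) result predicts.

import random
from itertools import combinations

def smallest_triangle_area(points):
    def area(p, q, r):
        return abs((q[0] - p[0]) * (r[1] - p[1])
                   - (r[0] - p[0]) * (q[1] - p[1])) / 2.0
    return min(area(p, q, r) for p, q, r in combinations(points, 3))

def scaled_expectation(n, trials=200):
    total = 0.0
    for _ in range(trials):
        pts = [(random.random(), random.random()) for _ in range(n)]
        total += smallest_triangle_area(pts)
    return n ** 3 * total / trials     # should be roughly the same for all n

for n in (8, 16, 32):
    print(n, round(scaled_expectation(n), 3))
```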


Bio-Informatics, Databases and Data Mining

A. Siebes

Modern molecular biology generates large amounts of data; think of projects like the Human Genome project. This data poses new challenges to Computer Science, e.g., in databases and data mining. The data consists of long strings over a few different (simple) alphabets. Moreover, (parts of) the strings may be annotated, and meta-information is of prime importance for the interpretation of the strings. In other words, the data fits more naturally into a semi-structured or XML database than into a plain relational one. But there is more, for a prime selection method is partial string matching with, perhaps unspecified, gaps, which is rather more complex than the simple select we all know. Querying such biological data is one thing; it is even more useful to be able to mine it. There are many questions, ranging from "where are the genes" to "how are these genes activated", for which no good theoretical models exist. By searching for patterns in the data it is hoped that such questions can be answered. In this talk I will present a survey of current research and outline the design of an integrated biologist's workbench that supports both the database and the data mining aspects of Bio-Informatics.
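As an illustration of what gapped partial matching involves (the workbench itself is not specified at this level of detail; standard Smith-Waterman local alignment is assumed here as the textbook example), a minimal Python sketch:

```python
# Gapped partial string matching over a small alphabet via local alignment:
# the best-scoring region of the long string is matched against the query,
# allowing mismatches and gaps.

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0,                      # local: restart alignment
                              diag,                   # match / mismatch
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
            best = max(best, score[i][j])
    return best

# The query matches a region of the longer sequence despite a gap.
print(local_alignment_score("TTACGATGCAA", "ACGTGC"))
```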


Algorithms for Rational Agents

A. Ronen

Many recent applications of interest involve self-interested participants (agents). As such agents may manipulate the algorithm for their {\em own} benefit, a new challenge emerges: the design of algorithms and protocols that perform well when the agents behave according to their own self-interest. This has led several researchers to consider computational models that are based on a sub-field of game theory and mathematical economics called mechanism design. The main goals of this talk are to introduce this topic mainly through examples, to show that in many cases selfishness can be satisfactorily overcome, to survey some of the recent trends in this area and to present new challenging problems. This talk is mostly based on classic results from mechanism design as well as on recent work by the author and others. Key phrases: algorithms, mechanism design, game theory, strategies, incentive compatible mechanisms.
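One classic example such an introduction usually starts from is the Vickrey second-price auction, where bidding one's true value is a dominant strategy, so a selfish agent gains nothing by manipulation. A minimal Python sketch (an illustration only, not the talk's new results):

```python
# Vickrey second-price auction: the highest bidder wins but pays the
# second-highest bid, which makes truthful bidding a dominant strategy.

def vickrey_auction(bids):
    """bids: dict agent -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else 0
    return winner, price

print(vickrey_auction({"alice": 10, "bob": 7, "carol": 4}))  # ('alice', 7)
```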


On the Power of Two Choices

Angelika Steger

Probably most people are aware of the following strategy for coping with long lines in a supermarket: split up your group, wait in two lines, and move to the quicker line at the last moment. In recent years it has turned out that ideas based on such simple principles as considering two choices instead of just one can give rise to a dramatic increase in the performance of algorithms controlling, for example, data servers, or implementing strategies for load distribution on multiprocessor machines. In this talk we discuss some of these examples in detail.
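A quick simulation conveys the effect (an illustration only): for n balls thrown into n bins, one random choice gives a maximum load of roughly log n / log log n, while placing each ball in the less loaded of two random bins brings it down to about log log n.

```python
# Balls-into-bins with d random choices per ball: each ball joins the least
# loaded of its candidate bins, and we report the maximum load.

import random
from collections import Counter

def max_load(n, choices):
    bins = Counter()
    for _ in range(n):
        candidates = [random.randrange(n) for _ in range(choices)]
        target = min(candidates, key=lambda b: bins[b])  # join the shorter line
        bins[target] += 1
    return max(bins.values())

n = 100_000
print("one choice :", max_load(n, 1))
print("two choices:", max_load(n, 2))
```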


Software Testing & Diagnostics: Theory & Practice

Vladimir Marik

The paper introduces the fundamental concepts and requirements of SW testing, automated SW testing and diagnostics. Further, the basic terminological issues will be discussed, and the relation between quality theory and the testing process will be stressed. The resulting impact of SW diagnostics on the total quality management process will be presented. We introduce some of the practical methods of test evaluation that are an integral part of SW diagnostics. The concept will be demonstrated on case studies from practical software testing and diagnostics projects, based on cooperation with large international companies in the fields of industrial automation and medical instrumentation. The industrial experience will be summarized and compared with theoretical assumptions.


Physical Design of a CMOS Semiconductor Chip in Six Easy Steps

Sidney Benda

This lecture is aimed at an audience unfamiliar with the technologies, design and physical implementation of computer semiconductor components. The lecture opens with a short description of the structure of NMOS and PMOS transistors on a silicon wafer. The rest of it focuses on algorithms used in the process of chip design, which can be segmented into these six steps:

  1. Design capture
    Brief introduction to the concepts of the Verilog RTL language; a sample of Verilog code with an explanation of its intended function.
  2. Logic synthesis
    Explanation of the task of logic synthesis: mapping onto a library of standard cells while meeting timing constraints; a sample of the available tools (Synopsys scripts).
  3. Floorplanning
    Partitioning of a design into blocks; placement of blocks on the die, pads, and the power and ground grid.
  4. Place & route
    Discussion of algorithms for automated placement of standard cells (min-cut, annealing, quadratic programming). Demo of a maze router algorithm (a toy sketch follows this list).
  5. Verification
    Explanation of matching silicon geometries to the input netlist.
  6. Extraction
    Methods of calculating resistance and capacitance, and feedback into the design.
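As a companion to step 4, here is a toy maze-router sketch in Python (the classic Lee wave-propagation approach is assumed here as the example; the grid, costs and design rules are drastically simplified):

```python
# Lee-style maze routing: breadth-first search expands a wave from the source
# over free grid cells until the target is reached, then backtraces the path.

from collections import deque

def maze_route(grid, src, dst):
    """grid: 2D list, 0 = free cell, 1 = blocked. Returns list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    wave = deque([src])
    while wave:
        cell = wave.popleft()
        if cell == dst:
            path = []
            while cell is not None:          # backtrace phase
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                wave.append((nr, nc))
    return None                               # no route exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(maze_route(grid, (0, 0), (2, 0)))
```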

Analysis patterns

Lubor Sesera

Analysis patterns are patterns used in the analysis phase of software systems development. Like other patterns, e.g. Design patterns, Analysis patterns are recurring combinations of meaningful units that occur in some context. Compared to Design patterns, however, Analysis patterns are about semantics, i.e. they consist of entities and relationships that have some meaning in an application domain. Design patterns, by contrast, are mainly about flexibility and can be applied to any application.

In the first part of the invited talk, some data modeling principles and the concept of an Analysis pattern will be introduced. Next, fundamental patterns for modeling an enterprise will be provided. Furthermore, Analysis patterns of several authors will be compared briefly. Finally, some applications of Analysis patterns in software projects of our company will be shown.
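By way of illustration (this particular pattern is assumed here as an example from the literature, following Fowler, and is not necessarily among those discussed in the talk), a minimal sketch of the well-known Party/Accountability enterprise pattern:

```python
# Party generalizes Person and Organisation; an Accountability is a typed
# responsibility relationship between two parties.

from dataclasses import dataclass
from typing import List

@dataclass
class Party:                      # generalizes Person and Organisation
    name: str

@dataclass
class Person(Party):
    pass

@dataclass
class Organisation(Party):
    pass

@dataclass
class Accountability:             # a typed relationship between two parties
    kind: str                     # e.g. "employment", "management"
    commissioner: Party
    responsible: Party

acme = Organisation("ACME Corp")
anna = Person("Anna")
contracts: List[Accountability] = [Accountability("employment", acme, anna)]
print(contracts[0])
```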


Information Society Technologies in Healthcare

Stelios Orphanoudakis

The growing demand for more efficient and effective healthcare services, coupled with an implicit requirement for supporting citizen mobility and continuity of care, is currently setting the stage for the exploitation of Information and Telecommunications Technologies in the healthcare sector. The current vision in the healthcare sector comprises affordable wireless access to healthcare resources and services for all citizens, thus making medical expertise a shared resource wherever and whenever needed. Important areas in which Information Society Technologies are likely to have a significant impact include those of pre-hospital health emergencies, remote monitoring of patients with chronic conditions, and medical collaboration through sharing of health-related information resources. Accessibility to these and other media-rich user-oriented services, in the context of the emerging Global Information Society, will be supported by a Healthcare Information Infrastructure (HII) which can achieve effective horizontal integration of networked information sources.

The great challenge in applying information and telecommunications technologies in the healthcare sector is how to be creative in exploiting the opportunities afforded by these emerging technologies, so that we may improve people's lives, while providing clinically significant and cost-effective added-value services to the healthcare community. At the same time, we need to ensure that the potential benefit to be derived from technological advances also finds its way to the scene of an accident, the (virtual) home, and the (virtual) working place of all citizens.

The healthcare sector can serve as the test-bed for assessing the impact which new information and telecommunication technologies will have on our lives. For the past several years, we have witnessed revolutionary technological developments, which afford us a pragmatic look into our digital future. However, the stringent requirements for quality of service will make the process of change in the health sector evolutionary rather than revolutionary. A plethora of unresolved technical, economic, regulatory, legal, and training issues will have to be settled before the real impact of technological change in this important sector can be fully assessed.


Grammar Induction on Large Corpora and Its Applications

Pieter Adriaans

In my 1992 thesis, 'Language Learning from a categorial point of view', I developed a theoretical approach to grammar induction based on a clustering algorithm, and described two algorithms for grammar induction, EMILE 1.0 and 2.0. The main result was that shallow context-free languages can be learned efficiently provided that the samples are drawn according to a simple distribution. A language is shallow if every syntactical construction has an example sentence with a complexity that is logarithmic in the complexity of the grammar as a whole. My conjecture was that natural languages satisfy this shallowness constraint. In the past decade we improved the EMILE approach, but until recently our implementations did not scale very well and we could not verify the theoretical claims of 1992 empirically. In 1999 Marco Vervoort finished EMILE 4.1, a clustering algorithm that can learn a subset of the class of shallow context-free languages very efficiently. EMILE 4.1 is one of the first efficient grammar induction algorithms that work on free text. Although EMILE is far from perfect, it enables researchers to start empirical grammar induction research on various types of corpora. We are currently applying EMILE to a large body of different texts and messages: a textbook on Dutch for foreigners, biomedical data, vocational profiles and archeological inscriptions. EMILE might be useful as the kernel of a tool that constructs knowledge bases from free text. It can be applied to words instead of sentences to learn morphological structures, and it can be used to extract semantic data from a text. EMILE might also be of help in developing mathematical models of first and second language acquisition. The world wide web is an obvious application domain for EMILE. In my talk I will describe various aspects of this research.
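To convey the clustering idea behind context-based grammar induction (a toy illustration only, not EMILE itself): expressions that can be substituted for one another in the same contexts are grouped into a candidate grammatical type.

```python
# Group words by the (left, right) contexts they fill; words sharing a
# context are substitutable and form a candidate type.

from collections import defaultdict

sentences = [
    "the cat sleeps", "the dog sleeps",
    "the cat eats", "the dog eats",
    "a child sleeps",
]

contexts = defaultdict(set)          # word -> set of contexts it fills
for s in sentences:
    words = s.split()
    for i, word in enumerate(words):
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        contexts[word].add((left, right))

# Cluster words that share a context (i.e. are substitutable in it).
clusters = defaultdict(set)
for word, ctxs in contexts.items():
    for ctx in ctxs:
        clusters[ctx].add(word)

for ctx, words in clusters.items():
    if len(words) > 1:
        print(ctx, "->", sorted(words))
```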


Information Access based on Associative Calculation

Akihiko Takano

The statistical measures for similarity have been widely used in textual information retrieval for many decades. They are the basis for improving the effectiveness of IR systems, including retrieval, categorization, filtering, and summarization. In this talk, I will show that the abstract associative calculation, parametrized with statistical measures, provides a computational basis for various tasks required in IR. I will introduce the practical system we developed for this associative calculation, and outline how it is used for implementing various functions for information access. I will also provide a live demonstration of a document search interface called DualNAVI that we are developing. It is a good example of a new generation of information access methods based on associative calculation.
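A minimal sketch of the kind of associative computation involved (the concrete weighting used by the system is not specified here; tf-idf with cosine similarity is assumed as the standard example):

```python
# Rank documents by association with a query using tf-idf weights and cosine
# similarity.

import math
from collections import Counter

docs = ["information retrieval and filtering",
        "statistical measures for retrieval",
        "live demonstration of a search interface"]
tokenized = [d.split() for d in docs]
df = Counter(w for words in tokenized for w in set(words))
n = len(docs)

def tfidf(words):
    # Add-one smoothing keeps weights finite for unseen query words.
    return {w: tf * math.log((n + 1) / (df[w] + 1))
            for w, tf in Counter(words).items()}

def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vectors = [tfidf(words) for words in tokenized]
query = tfidf("statistical retrieval".split())
for doc, vec in sorted(zip(docs, doc_vectors),
                       key=lambda pair: -cosine(query, pair[1])):
    print(round(cosine(query, vec), 3), doc)
```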


Cheap vision - exploiting ecological niche and morphology

Rolf Pfeifer

On desert ants, insect eyes, and robots

In the course of evolutionary history, the visual system has evolved as part of a complete autonomous agent in the service of motor control. Therefore, the synthetic methodology investigates visual skills in the context of tasks a complete agent has to perform in a particular environment, using autonomous mobile robots as modeling tools. The desert ant Cataglyphis is a solitary forager that employs polarization vision as well as visual landmark navigation in order to return to its nest after extensive foraging trips. The snapshot model for insect landmark navigation, a very simple model, is the currently accepted state-of-the-art model in biology. Recently, a much simpler alternative, the average landmark vector (ALV) model, has been developed. The ALV model has been demonstrated to predict behavior as well as more complex models based on snapshots. In the presentation, polarization vision and both landmark navigation models are introduced. An analog implementation of the ALV model (i.e. a robot without software) and an application to indoor navigation in general are discussed. Further explorations of eye morphology in the context of navigation tasks will also be discussed. In particular, we will present an experiment in which the morphology of a robot eye that can compensate for motion parallax is evolved on a real robot using an evolution strategy.
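A minimal sketch of the ALV idea (simplified; the sign convention and the assumption of an external compass frame follow one common formulation): the agent stores the average of unit vectors pointing to the landmarks as seen from the nest, and later steers along the difference between the ALV measured at its current position and the stored one.

```python
# Average landmark vector homing: store the ALV at the nest, and at any other
# position steer along (current ALV - stored ALV), which points roughly home.

import math

def average_landmark_vector(position, landmarks):
    """Average of unit vectors from `position` to each landmark."""
    vx = vy = 0.0
    for lx, ly in landmarks:
        dx, dy = lx - position[0], ly - position[1]
        d = math.hypot(dx, dy)
        vx += dx / d
        vy += dy / d
    k = len(landmarks)
    return (vx / k, vy / k)

landmarks = [(0.0, 5.0), (4.0, -1.0), (-3.0, -2.0)]
nest = (0.0, 0.0)
stored_alv = average_landmark_vector(nest, landmarks)   # learned at the nest

current = (6.0, 4.0)
current_alv = average_landmark_vector(current, landmarks)
home_vector = (current_alv[0] - stored_alv[0],
               current_alv[1] - stored_alv[1])
print("steer along:", home_vector)   # points roughly from (6, 4) toward the nest
```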

Key phrases: synthetic methodology, polarization vision, visual landmark navigation, average landmark vector model, analog robot, evolution of eye morphology


Hierarchies of sensing and control for visually guided agents

Jana Kosecka

The successful functioning of complex (autonomous) systems in a dynamically changing environment requires rich sensory input and advanced sensory-motor control. Such systems are typically equipped with multiple sensors and multiple actuators and the interactions between the system and the environment can be characterized by both discrete and continuous models. The accomplishment of various tasks is mediated by complex coordination and interaction between individual sensing and control strategies.
In the first part of the talk I will introduce a framework for modeling and verifying a distributed team of autonomous mobile robots, with vision as the prime sensing modality. The framework proposes to model elementary sensing and control strategies by making appropriate finite state machine abstractions which capture the discrete event aspects of the continuous models. The verification of the discrete event interactions between strategies is formulated as a problem in Supervisory Control Theory, where we synthesize a supervisor which serves as a run-time scheduler and monitor of the task execution.
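A toy sketch of this style of abstraction (not the talk's formal framework): two strategies are modelled as small automata over discrete events, and a supervisor disables controllable events that would drive the composed system into a forbidden joint state.

```python
# Two sensing/control strategies sharing a camera, each a finite state
# machine; the supervisor blocks events that would activate both at once.

class FSM:
    def __init__(self, name, transitions, start):
        self.name, self.transitions, self.state = name, transitions, start

    def step(self, event):
        # Move only if the event is enabled in the current state.
        if (self.state, event) in self.transitions:
            self.state = self.transitions[(self.state, event)]

tracker = FSM("tracker", {("idle", "start_track"): "tracking",
                          ("tracking", "lost"): "idle"}, "idle")
relocalizer = FSM("reloc", {("idle", "start_reloc"): "searching",
                            ("searching", "found"): "idle"}, "idle")
plant = [tracker, relocalizer]

def supervisor_allows(event):
    # Forbidden joint state: both strategies active (camera conflict).
    if event == "start_track" and relocalizer.state == "searching":
        return False
    if event == "start_reloc" and tracker.state == "tracking":
        return False
    return True

for event in ["start_reloc", "start_track", "found", "start_track", "lost"]:
    if supervisor_allows(event):
        for machine in plant:
            machine.step(event)
    print(event, "->", {m.name: m.state for m in plant})
```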
A similar modeling paradigm can be applied in the more structured, but also more dynamic, environment of automated highways. In these high-performance applications, I explore the use of dynamic models and demonstrate the use of vision as a sensor for lateral and longitudinal control of vehicles in highway scenarios. Looking at the task of visually guided navigation, I will provide some guidelines on how, by using additional assumptions about the environment and motion constraints, one can systematically approach the design of a system with multiple sensing and control hierarchies.


Recognizing Objects by their Appearance using Eigenimages

Horst Bischof

The appearance-based approaches to vision problems have recently received renewed attention in the vision community due to their ability to deal with the combined effects of shape, reflectance properties, pose in the scene, and illumination conditions. Besides, appearance-based representations can be acquired through an automatic learning phase, which is not the case with traditional shape representations. The approach has led to a variety of successful applications, e.g., visual positioning and tracking of robot manipulators, visual inspection, and human face recognition.
In this talk I will present the basic ideas of eigenspace methods for appearance-based object recognition. Various methods for training and recognizing objects will be outlined. I will also identify the major limitations of the approach and present algorithms showing how these limitations can be handled, leading to an object recognition system which is applicable in real-world situations. Throughout the talk, all concepts introduced will be demonstrated on real images.
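A condensed sketch of the standard eigenspace recipe (generic PCA-based appearance matching with placeholder data, not the specific system discussed in the talk): learn an eigenbasis from training images and recognize a test image by nearest neighbour in the low-dimensional eigenspace.

```python
import numpy as np

def learn_eigenspace(images, k):
    """images: (n_images, n_pixels) array of flattened, same-size images."""
    mean = images.mean(axis=0)
    centered = images - mean
    # SVD of the centered data gives the eigenimages as rows of vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                              # top-k eigenimages
    coords = centered @ basis.T                 # training projections
    return mean, basis, coords

def recognize(image, mean, basis, coords, labels):
    projection = (image - mean) @ basis.T
    distances = np.linalg.norm(coords - projection, axis=1)
    return labels[int(np.argmin(distances))]

# Tiny synthetic example: 8x8 "images" of two object classes plus noise.
rng = np.random.default_rng(0)
proto = {"cup": rng.random(64), "phone": rng.random(64)}
train = np.array([proto[label] + 0.05 * rng.standard_normal(64)
                  for label in ("cup", "phone") for _ in range(5)])
labels = ["cup"] * 5 + ["phone"] * 5
mean, basis, coords = learn_eigenspace(train, k=3)
test = proto["phone"] + 0.05 * rng.standard_normal(64)
print(recognize(test, mean, basis, coords, labels))   # expected: 'phone'
```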


Information Mining with Fuzzy Methods

Rudolf Kruse

Due to modern information technology, which produces ever more powerful computers every year, it is possible today to collect, store, transfer, and combine huge amounts of data at very low costs. However, exploiting the information contained in these archives in an intelligent way turns out to be fairly difficult. In contrast to the abundance of data there is a lack of tools that can transform these data into useful information and knowledge.
In reply to these challenges a new area of research has emerged, which has been named "Knowledge Discovery in Databases" (KDD) or "Data Mining". However, data mining methods often neglect the goal of discovering understandable patterns. Our own experience, gathered in several co-operations with industry, is that modern technologies are accepted more readily if the applied methods are easy to understand and the results can be checked against human intuition or can be used for gaining insight into a domain or for training.
Therefore we suggest concentrating on information mining, which we see as an extension of data mining and which can be defined in analogy to the KDD definition as follows: Information mining is the non-trivial process of identifying valid, novel, potentially useful, and understandable patterns in heterogeneous information sources.
The term information is thus meant to indicate two things: In the first place, it points out that the heterogeneous sources to mine can already provide information, understood as expert background knowledge, textual descriptions, images and sounds etc., and not only raw (numeric) data. Secondly, it emphasises that the results must be comprehensible (must provide a user with information), so that a user can check their plausibility and can get insight into the domain the data comes from.
The goal of fuzzy systems has always been to model human expert knowledge and to produce systems that are easy to understand. Therefore we expect fuzzy systems technology to play a prominent role in the quest to meet these challenges. In this talk we illustrate how fuzzy data analysis techniques can help to do information mining.
In this talk we will especially discuss Neuro-Fuzzy Data Analysis and Possibilistic Graphical Models. Neuro-fuzzy methods are useful to produce understandable solutions for less complex domains while possibilistic graphical models can be used if complex knowledge must be represented and uncertainty and vagueness must be handled. We illustrate these technologies using our own freely available software tools and by describing the results of some of our industrial projects, e.g. a solution developed for Daimler-Chrysler, where large databases of vehicle records had to be analysed for possible causes of faults.
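As a small illustration of the kind of interpretable model meant here (generic fuzzy rules with made-up linguistic terms and ranges, not the actual Daimler-Chrysler solution), linguistic terms are modelled as fuzzy sets and a rule's activation is the degree to which its conditions hold:

```python
# A single fuzzy rule over two vehicle attributes, using triangular
# membership functions and min as the conjunction operator.

def triangular(a, b, c):
    """Membership function of a triangular fuzzy set with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Linguistic terms (illustrative ranges, not calibrated values).
mileage_high = triangular(60_000, 120_000, 200_000)
engine_temp_high = triangular(90, 110, 130)

def rule_fault_likely(mileage, temperature):
    # IF mileage is high AND engine temperature is high THEN fault is likely.
    return min(mileage_high(mileage), engine_temp_high(temperature))

print(rule_fault_likely(mileage=100_000, temperature=105))   # high activation
print(rule_fault_likely(mileage=30_000, temperature=95))     # low activation
```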