Before software can be developed, its requirements must be stated.
Before requirements can be expressed, the application domain must be understood.
In this invited paper we outline some of the basic facets of domain engineering.

Domains seem, in our experience, far more stable than computing requirements, and these in turn seem more stable than software designs.
Thus, almost like the universal laws of physics, it pays off to first develop theories of domains.

But domain engineering, like requirements engineering, is in real need of thoroughly researched development principles, techniques and tools. The aim of this invited paper is to advocate: that researchers study these development method components; that universities base well-nigh any course on the use of formal techniques, i.e. specification and verification; and that software engineers take heed and start applying formal techniques. A brief example of describing stakeholder perspectives will be given, against the background of which we then survey the notions of domain intrinsics, domain support technologies, domain management & organisation, domain rules & regulations, domain human behaviour, etc. We show elsewhere how to "derive" requirements from domain descriptions: domain requirements, by domain projection, instantiation, extension and initialisation; interface requirements, such as multimedia and dialogue; and machine requirements, such as performance, dependability (reliability, availability, accessibility, safety, etc.) and maintainability (adaptability, perfectability and correctability).

The current paper presents work-in-progress.

The text of the present paper is therefore very schematic.

For half a century since computers came into existence,
the goal of finding elegant and efficient algorithms to
solve "simple" (well-defined and well-structured) problems
has dominated algorithm design. Over the same time period,
both processing and storage capacity of computers have increased
by roughly a factor of a million. The next few decades may
well give us a similar rate of growth in raw computing power,
due to various factors such as continuing miniaturization,
parallel and distributed computing. If a quantitative change
of orders of magnitude leads to qualitative advances,
where will the latter take place?
Only empirical research can answer this question.

Asymptotic complexity theory has emerged as a surprisingly
effective tool for predicting run times of polynomial-time
algorithms. For NP-hard problems, on the other hand,
it yields overly pessimistic bounds. It asserts the non-existence
of algorithms that are efficient across an entire problem class,
but ignores the fact that many instances, perhaps including
those of interest, can be solved efficiently. For such cases
we need a complexity measure that applies to problem instances,
rather than to over-sized problem classes.

We present case studies of combinatorial enumeration and
optimization problems designed as experiments to explore
the practical limits of computational feasibility, and ZRAM,
a program library of parallel search algorithms used for
rapid prototyping of large discrete computations.

Kolmogorov complexity is a modern notion of randomness dealing with the quantity of information in individual objects; that is, pointwise randomness rather than average randomness as produced by a random source. It was proposed by A.N. Kolmogorov in 1965 to quantify the randomness of individual objects in an objective and absolute manner. This is impossible by classical probability theory (a branch of measure theory satisfying the so-called Kolmogorov axioms formulated in 1933). Kolmogorov complexity is known variously as `algorithmic information', `algorithmic entropy', `Kolmogorov-Chaitin complexity', `descriptional complexity', `shortest program length', `algorithmic randomness', and others. Using it, we developed a new mathematical proof technique, now known as the `incompressibility method'. The incompressibility method is a basic general technique, like the `pigeon-hole' argument, the `counting method' or the `probabilistic method'. The new method has been quite successful, and we present recent examples. The first example concerns a ``static'' problem in combinatorial geometry. From among the ${n \choose 3}$ triangles with vertices chosen from among $n$ points in the unit square, $U$, let $T$ be the one with the smallest area, and let $A$ be the area of $T$. Heilbronn's triangle problem asks for the maximum value assumed by $A$ over all choices of $n$ points. We consider the average case: if the $n$ points are chosen independently and at random (uniform distribution), then there exist positive $c$ and $C$ such that $c/n^3 < \mu_n < C/n^3$ for all large enough $n$, where $\mu_n$ is the expectation of $A$. Moreover, with probability close to one, $c/n^3 < A < C/n^3$. Our proof determines the area of the smallest triangle for an arrangement in ``general position.'' Our second example concerns a ``dynamic'' problem in the average-case running time of algorithms.
The question of a nontrivial general lower bound (or upper bound) on the average-case complexity of Shellsort (due to D.L. Shell in 1959) has been open for about four decades. We present such a lower bound for the general case of multipass Shellsort. This is joint work with Tao Jiang and Ming Li. For more information see ``New Scientist'' of November 6, 1999, pp. 44-47; ``Die Zeit'' of April 13, 2000 (#16), p. 40; ``Courrier International'' of December 23, 1999 -- January 5, 2000 (#41), pp. 477-478; or http://www.cwi.nl/~paulv
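The average-case bound for Heilbronn's problem stated above is easy to probe numerically. The brute-force sketch below (my own toy illustration, not the authors' proof technique) computes the smallest triangle area over all ${n \choose 3}$ triples of uniformly random points; the normalised value $A \cdot n^3$ should then fluctuate within a bounded range as $n$ grows.

```python
import itertools
import random

def smallest_triangle_area(points):
    """Brute force over all C(n,3) triangles; returns the minimum area."""
    def area(p, q, r):
        # Absolute value of half the cross product of edge vectors.
        return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0
    return min(area(p, q, r) for p, q, r in itertools.combinations(points, 3))

random.seed(1)
for n in (8, 16, 32):
    pts = [(random.random(), random.random()) for _ in range(n)]
    a = smallest_triangle_area(pts)
    print(n, a, a * n**3)  # a * n^3 should stay in a bounded range
```

This is exponentially far from the analytic argument, of course, but it makes the $\Theta(1/n^3)$ scaling visible for small $n$.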

Modern molecular biology generates large amounts of data; think of projects like the Human Genome project. This data poses new challenges to Computer Science, e.g., databases and data mining. The data consists of long strings over a few different (simple) alphabets. Moreover, parts of the string may be annotated, and this meta-information is of prime importance for the interpretation of the string. In other words, the data will fit more naturally in a semi-structured or XML database than in a plain relational one. But there is more, for a prime selection method is partial string matching with, perhaps unspecified, gaps, which is rather more complex than the simple select we all know about. Querying such biological data is one thing; it is even more useful to be able to mine it. There are many questions, ranging from "where are the genes" to "how are these genes activated", for which no good theoretical models exist. By searching for patterns in the data it is hoped that such questions can be answered. In this talk I will present a survey of current research and outline the design of an integrated biologist's workbench that supports both the database and the data mining aspects of Bio-Informatics.
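As a toy illustration of the gapped selection meant above (my own sketch; the workbench's actual query language is not specified here), partial string matching with unspecified or bounded gaps can be expressed with regular expressions:

```python
import re

def gapped_match(pattern_parts, text, min_gap=0, max_gap=None):
    """Find non-overlapping matches of the parts in order, separated by
    gaps of unspecified length (bounded by max_gap if given)."""
    gap = ".{%d,%s}" % (min_gap, "" if max_gap is None else max_gap)
    regex = gap.join(re.escape(p) for p in pattern_parts)
    return [m.span() for m in re.finditer(regex, text)]

dna = "ATGCGTTACGGATCCATG"
print(gapped_match(["ATG", "GATCC"], dna))  # → [(0, 15)]
```

Real biological pattern matching also needs approximate matching and annotation-aware selection, which is exactly why a plain relational select falls short.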

Many recent applications of interest involve self-interested participants (agents). As such agents may manipulate the algorithm for their {\em own} benefit, a new challenge emerges: The design of algorithms and protocols that perform well when the agents behave according to their own self-interest. This led several researchers to consider computational models that are based on a sub-field of game theory and mathematical economics called mechanism design. The main goals of this talk are to introduce this topic mainly through examples, to show that in many cases, selfishness can be satisfactorily overcome, to survey some of the recent trends in this area and to present new challenging problems. This talk is mostly based on classic results from mechanism design as well as on recent work by the author and others. Key Phrases: Algorithms, mechanism design, game theory, strategies, incentive compatible mechanisms.
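A classic mechanism-design example of the kind the talk introduces is the second-price (Vickrey) auction, in which bidding one's true value is a dominant strategy; a minimal sketch:

```python
def second_price_auction(bids):
    """bids: dict mapping agent -> bid. The highest bidder wins but pays
    the second-highest bid, which makes truthful bidding a dominant
    strategy: misreporting can never raise a bidder's utility."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else 0
    return winner, price

print(second_price_auction({"a": 10, "b": 7, "c": 3}))  # → ('a', 7)
```

The design choice is the payment rule: decoupling the price from the winner's own bid removes the incentive to manipulate, which is the "selfishness can be satisfactorily overcome" theme in miniature.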

Probably most people are aware of the following strategy for coping with long lines in a supermarket: split up your group, wait in two lines, and move to the quicker line at the last moment. In recent years it has turned out that ideas based on such simple principles, such as considering two choices instead of just one, can give rise to a dramatic increase in the performance of algorithms controlling, for example, data servers, or implementing strategies for load distribution on multiprocessor machines. In this talk we discuss some of these examples in detail.

The paper introduces the fundamental concepts and requirements of SW testing, automated SW testing and diagnostics. Further, the basic terminological issues will be discussed. Also, the relation between quality theory and the testing process will be stressed. The resulting impact of SW diagnostics on the total quality management process will be presented. We introduce some of the practical methods of test evaluation that are an integral part of SW diagnostics. The concept will be demonstrated on case studies from practical software testing and diagnostics projects, based on cooperation with large international companies in the field of industrial automation and medical instrumentation. The industrial experience will be summarized and compared with theoretical assumptions.

This lecture is aimed at an audience unfamiliar with the technologies, design and physical implementation of computer semiconductor components. The lecture opens with a short description of the structure of NMOS and PMOS transistors on a silicon wafer. The rest of it focuses on algorithms used in the process of chip design, which can be segmented into these six steps:

**Design capture**

Brief introduction to the concepts of the Verilog RTL language; a sample of Verilog code with an explanation of its intended function.

**Logic synthesis**

Explanation of the task of logic synthesis: mapping onto a library of standard cells, meeting timing constraints. Sample of available tools (Synopsys scripts).

**Floorplanning**

Partitioning of a design into blocks; placement of blocks on the die, pads, power and ground grid.

**Place & route**

Discussion of algorithms for automated placement of standard cells (min-cut, annealing, quadratic programming). Demo of a maze router algorithm.

**Verification**

Explanation of matching silicon geometries to the input netlist.

**Extraction**

Methods of calculating resistance and capacitance; feedback into the design.
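The maze router mentioned under Place & route is classically implemented as the Lee algorithm: a breadth-first wavefront expands from the source cell until it reaches the target, then the path is traced back along strictly decreasing wavefront distances. A minimal sketch (grid and coordinates invented for illustration, not the lecture's demo):

```python
from collections import deque

def lee_route(grid, src, dst):
    """Lee maze router. grid: list of strings, '#' marks blocked cells.
    Returns the shortest rectilinear path from src to dst, or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {src: 0}
    q = deque([src])
    while q:                                  # wavefront expansion
        r, c = q.popleft()
        if (r, c) == dst:
            break
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    if dst not in dist:
        return None
    path, cur = [dst], dst                    # backtrace along dist - 1
    while cur != src:
        r, c = cur
        cur = next(p for p in ((r+1, c), (r-1, c), (r, c+1), (r, c-1))
                   if dist.get(p, -1) == dist[cur] - 1)
        path.append(cur)
    return path[::-1]

grid = ["....",
        ".##.",
        "...."]
print(lee_route(grid, (0, 0), (2, 3)))
```

Production routers refine this basic wavefront idea with layers, via costs and rip-up-and-reroute, but the core search is the same.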

Analysis patterns are patterns used in the analysis phase of software systems development. Like any other patterns, e.g. Design patterns, Analysis patterns are recurring combinations of meaningful units that occur in some context. Compared to Design patterns, however, Analysis patterns are about semantics, i.e. they consist of entities and their relationships that have some meaning in an application domain. By contrast, Design patterns are mainly about flexibility and can be applied to any application.

In the first part of the invited talk, some data modeling principles and the concept of an Analysis pattern will be introduced. Next, fundamental patterns for modeling an enterprise will be provided. Furthermore, Analysis patterns of several authors will be compared briefly. Finally, some applications of Analysis patterns in software projects of our company will be shown.

The growing demand for more efficient and effective healthcare services, coupled with an implicit requirement for supporting citizen mobility and continuity of care, is currently setting the stage for the exploitation of Information and Telecommunications Technologies in the healthcare sector. The current vision in the healthcare sector comprises affordable wireless access to healthcare resources and services for all citizens, thus making medical expertise a shared resource wherever and whenever needed. Important areas in which Information Society Technologies are likely to have a significant impact include those of pre-hospital health emergencies, remote monitoring of patients with chronic conditions, and medical collaboration through sharing of health-related information resources. Accessibility to these and other media-rich user-oriented services, in the context of the emerging Global Information Society, will be supported by a Healthcare Information Infrastructure (HII) which can achieve effective horizontal integration of networked information sources.

The great challenge in applying information and telecommunications technologies in the healthcare sector is how to be creative in exploiting the opportunities afforded by these emerging technologies, so that we may improve people's lives, while providing clinically significant and cost-effective added-value services to the healthcare community. At the same time, we need to ensure that the potential benefit to be derived from technological advances also finds its way to the scene of an accident, the (virtual) home, and the (virtual) working place of all citizens.

The healthcare sector can serve as the test-bed for assessing the impact which new information and telecommunication technologies will have on our lives. For the past several years, we have witnessed revolutionary technological developments, which afford us a pragmatic look into our digital future. However, the stringent requirements for quality of service will make the process of change in the health sector evolutionary rather than revolutionary. A plethora of unresolved technical, economic, regulatory, legal, and training issues will have to be settled before the real impact of technological change in this important sector can be fully assessed.

In my 1992 thesis, 'Language Learning from a categorial point of view', I developed a theoretical approach to grammar induction based on a clustering algorithm. I described two algorithms for grammar induction, EMILE 1.0 and 2.0. The main result was that shallow context-free languages can be learned efficiently, provided that the samples are drawn according to a simple distribution. A language is shallow if every syntactical construction has an example sentence with a complexity that is logarithmic in the complexity of the grammar as a whole. My conjecture was that natural languages satisfy this shallowness constraint. In the past decade we improved the EMILE approach, but until recently our implementations did not scale very well and we could not verify the theoretical claims of 1992 empirically. In 1999 Marco Vervoort finished EMILE 4.1, a clustering algorithm that can learn a subset of the class of shallow context-free languages very efficiently. EMILE 4.1 is one of the first efficient grammar induction algorithms that work on free text. Although EMILE is far from perfect, it enables researchers to start empirical grammar induction research on various types of corpora. We are currently applying EMILE to a large body of different texts and messages: a textbook on Dutch for foreigners, biomedical data, vocational profiles and archeological inscriptions. EMILE might be useful as the kernel of a tool that constructs knowledge bases from free text. EMILE can be applied to words instead of sentences to learn morphological structures. It can be used to extract semantic data from a text. EMILE might be of help in developing mathematical models of first and second language acquisition. The world wide web is an obvious application domain for EMILE. In my talk I will describe various aspects of this research.
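The distributional idea underlying the clustering approach can be illustrated with a toy sketch (my own drastic simplification, not Vervoort's EMILE 4.1): words that occur in identical (left, right) contexts are grouped into the same candidate syntactic type.

```python
from collections import defaultdict

def context_types(sentences):
    """Toy distributional clustering: collect each word's set of
    (left, right) contexts, then group words with identical context sets."""
    contexts = defaultdict(set)
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            left = words[i-1] if i > 0 else "<s>"
            right = words[i+1] if i < len(words) - 1 else "</s>"
            contexts[w].add((left, right))
    clusters = defaultdict(set)
    for w, ctx in contexts.items():
        clusters[frozenset(ctx)].add(w)
    return [sorted(c) for c in clusters.values() if len(c) > 1]

corpus = ["john walks home", "mary walks home",
          "john runs home", "mary runs home"]
print(context_types(corpus))  # → [['john', 'mary'], ['runs', 'walks']]
```

The real algorithm clusters on statistically similar (not identical) context distributions and induces grammar rules from the clusters, but the expression/context duality shown here is the core intuition.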

Statistical measures of similarity have been widely used in textual information retrieval for many decades. They are the basis for improving the effectiveness of IR systems, including retrieval, categorization, filtering, and summarization. In this talk, I will show that abstract associative calculation, parametrized with statistical measures, provides a computational basis for various tasks required in IR. I will introduce the practical system we developed for this associative calculation, and outline how it is used for implementing various functions for information access. I will also provide a live demonstration of a document search interface called DualNAVI that we are developing. It is a good example of new-generation information access methods based on associative calculation.
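A minimal sketch of the kind of statistical similarity measure meant here (illustrative only; this is standard tf-idf with cosine similarity, not the DualNAVI implementation) shows how a parametrized weighting turns raw term counts into an associative comparison between documents:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(docs)
    tfs = [Counter(d.split()) for d in docs]
    df = Counter(w for tf in tfs for w in tf)      # document frequency
    return [{w: tf[w] * math.log(n / df[w]) for w in tf} for tf in tfs]

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["information retrieval with statistics",
        "statistical measures for retrieval",
        "desert ants navigate by landmarks"]
v = tfidf_vectors(docs)
print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Swapping in other weighting schemes (e.g. mutual information) is exactly the "parametrized with statistical measures" degree of freedom.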

On desert ants, insect eyes, and robots

In the course of evolutionary history, the visual
system has evolved as part of a complete autonomous agent in the service
of motor control. Therefore, the synthetic methodology investigates
visual skills in the context of tasks a complete agent has to perform in
a particular environment using autonomous mobile robots as modeling
tools. The desert ant *Cataglyphis* is a solitary forager that
employs polarization vision as well as visual landmark navigation in
order to return to its nest after extensive foraging trips. The
snapshot model for insect landmark navigation, a very simple model, is
the currently accepted state-of-the-art model in biology. Recently, a
much simpler alternative, the average landmark vector (ALV) model, has
been developed. The ALV model has been demonstrated to predict behavior
as well as more complex models based on snapshots. In the
presentation, polarization vision, and both landmark navigation models
are introduced. An analog implementation of the ALV model (i.e. a robot
without software) and an application to indoor navigation in general are
discussed. Further explorations of eye morphology in the context of
navigation tasks will be discussed. In particular we will present an
experiment in which the morphology of a robot eye that can compensate
for motion parallax is evolved on a real robot using an evolution
strategy.

*Key phrases:* synthetic
methodology, polarization vision, visual landmark navigation, average
landmark vector model, analog robot, evolution of eye morphology
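The ALV model can be sketched in a few lines (an illustrative reconstruction of its usual formulation, with invented coordinates: the agent stores the average of the unit vectors to the landmarks as seen from the nest, and the home vector is the difference between the current and the stored average landmark vectors):

```python
import math

def average_landmark_vector(position, landmarks):
    """Average of unit vectors from `position` to each landmark."""
    vx = vy = 0.0
    for lx, ly in landmarks:
        dx, dy = lx - position[0], ly - position[1]
        d = math.hypot(dx, dy)
        vx += dx / d
        vy += dy / d
    n = len(landmarks)
    return (vx / n, vy / n)

landmarks = [(0.0, 5.0), (5.0, 0.0), (-4.0, -3.0)]
nest = (0.0, 0.0)
stored = average_landmark_vector(nest, landmarks)   # learned at the nest

agent = (3.0, 2.0)
current = average_landmark_vector(agent, landmarks)
# Home vector: the difference points (approximately) from the
# agent back toward the nest.
home = (current[0] - stored[0], current[1] - stored[1])
print(home)
```

The appeal for analog hardware is visible here: only one two-component vector is stored, and homing needs nothing but a subtraction.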

The successful functioning of complex (autonomous)
systems in a dynamically changing environment requires rich sensory
input and advanced sensory-motor control. Such systems are typically
equipped with multiple sensors and multiple actuators and the
interactions between the system and the environment can be characterized
by both discrete and continuous models. The accomplishment of various
tasks is mediated by complex coordination and interaction between
individual sensing and control strategies.

In the first part of the talk I will introduce a framework for modeling
and verifying a distributed team of autonomous mobile robots, with vision
as a prime sensing modality. The framework proposes to model elementary
sensing and control strategies by making appropriate finite state
machine abstractions which capture the discrete event aspects of
continuous models. The verification of the discrete event interactions
between strategies is formulated as a Supervisory Control Theory
Problem, where we synthesize a supervisor which serves as a run-time
scheduler and monitor of the task execution.
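The supervisory idea can be sketched as follows (a minimal toy, not the talk's synthesis procedure: the supervisor tracks the plant's finite-state abstraction and disables controllable events whose successor state leaves the legal set; states and events are invented for illustration):

```python
class Supervisor:
    """Minimal supervisory-control sketch: track the plant's discrete
    state and disable controllable events leading outside the safe set."""
    def __init__(self, transitions, initial, safe, controllable):
        self.transitions = transitions      # (state, event) -> next state
        self.state = initial
        self.safe = safe
        self.controllable = controllable

    def enabled(self, event):
        nxt = self.transitions.get((self.state, event))
        if nxt is None:
            return False                    # event not defined here
        if event in self.controllable and nxt not in self.safe:
            return False                    # supervisor disables it
        return True

    def step(self, event):
        if not self.enabled(event):
            raise ValueError("event %r disabled in state %r"
                             % (event, self.state))
        self.state = self.transitions[(self.state, event)]

# Toy plant: a robot that must not "move" while a grasp is in progress.
transitions = {
    ("idle", "grasp"): "grasping",
    ("idle", "move"): "moving",
    ("grasping", "move"): "collision",      # illegal successor state
    ("grasping", "release"): "idle",
    ("moving", "stop"): "idle",
}
sup = Supervisor(transitions, "idle",
                 safe={"idle", "grasping", "moving"},
                 controllable={"move", "grasp"})
sup.step("grasp")
print(sup.enabled("move"))      # False: "move" would reach "collision"
sup.step("release")
print(sup.enabled("move"))      # True again from "idle"
```

In full Supervisory Control Theory the disabling policy is synthesized as the maximally permissive supervisor for a legal language, rather than hand-written as here, and uncontrollable events can never be disabled.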

A similar modeling paradigm can be applied in the more structured, but more
dynamic, environment of automated highways. In these high performance
applications, I explore the use of dynamic models and demonstrate the
use of vision as a sensor for lateral and longitudinal control of
vehicles in highway scenarios. Looking at the task of visually guided
navigation, I will provide some guidelines on how, by using additional
assumptions about the environment and motion constraints, one can
systematically approach the design of a system with multiple sensing and
control hierarchies.

The appearance-based approaches to vision problems
have recently received renewed attention in the vision community due
to their ability to deal with the combined effects of shape, reflectance
properties, pose in the scene, and illumination conditions. Besides,
appearance-based representations can be acquired through an automatic
learning phase, which is not the case with traditional shape
representations. The approach has led to a variety of successful
applications, e.g., visual positioning and tracking of robot
manipulators, visual inspection, and human face recognition.

In this
talk I will present the basic ideas of eigenspace methods for
appearance-based object recognition. Various methods for training and
recognizing objects will be outlined. I will also identify the major
limitations of the approach and present algorithms showing how these
limitations can be handled, leading to an object recognition system which is
applicable in real-world situations. Throughout the talk all concepts
introduced will be demonstrated on real images.
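The core of the eigenspace method can be sketched in a few lines (a toy illustration with invented 2x2 "images"; real systems use full images and many eigenvectors): training images are mean-centred, projected onto the leading principal direction, and recognition is nearest neighbour in that low-dimensional space.

```python
import math

def top_eigenvector(vectors, iters=200):
    """Power iteration for the leading eigenvector of the (implicit)
    scatter matrix X^T X of the centred data vectors."""
    dim = len(vectors[0])
    v = [1.0] * dim
    for _ in range(iters):
        w = [0.0] * dim
        for x in vectors:                       # w = sum_i (x_i . v) x_i
            dot = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += dot * x[j]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def project(x, basis):
    return sum(a * b for a, b in zip(x, basis))

# Toy "images": flattened 2x2 intensity patterns of two object classes.
train = {"bright": [0.9, 0.8, 0.9, 1.0], "dark": [0.1, 0.0, 0.2, 0.1]}
mean = [sum(v[j] for v in train.values()) / len(train) for j in range(4)]
centered = {k: [a - m for a, m in zip(v, mean)] for k, v in train.items()}
basis = top_eigenvector(list(centered.values()))
coords = {k: project(v, basis) for k, v in centered.items()}

def recognize(image):
    """Nearest neighbour in the one-dimensional eigenspace."""
    c = project([a - m for a, m in zip(image, mean)], basis)
    return min(coords, key=lambda k: abs(coords[k] - c))

print(recognize([0.8, 0.9, 1.0, 0.9]))  # → bright
```

Practical systems keep enough eigenvectors to capture most of the variance and interpolate a manifold over pose and illumination, but this is the same projection-then-nearest-neighbour pipeline.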

Due to modern information technology, which produces ever more powerful
computers every year, it is possible today to collect, store, transfer, and
combine huge amounts of data at very low cost. However, exploiting the
information contained in these archives in an intelligent way turns out to
be fairly difficult. In contrast to the abundance of data there is a lack
of tools that can transform these data into useful information and knowledge.

In reply to these challenges a new area of research has emerged, which has
been named "Knowledge Discovery in Databases" (KDD) or "Data Mining".
However, data mining methods often neglect the goal of discovering
understandable patterns. Our own experience, gathered in several
co-operations with industry, is that modern technologies are accepted more
readily if the applied methods are easy to understand and the results can
be checked against human intuition or can be used for gaining insight into
a domain or for training.

Therefore we suggest concentrating on information mining, which we see as
an extension of data mining and which can be defined in analogy to the KDD
definition as follows: Information mining is the non-trivial process of
identifying valid, novel, potentially useful, and understandable patterns
in heterogeneous information sources.

The term information is thus meant to indicate two things: In the first
place, it points out that the heterogeneous sources to mine can already
provide information, understood as expert background knowledge, textual
descriptions, images and sounds etc., and not only raw (numeric) data.
Secondly, it emphasises that the results must be comprehensible (must
provide a user with information), so that a user can check their
plausibility and can get insight into the domain the data comes from.

The goal of fuzzy systems has always been to model human expert knowledge
and to produce systems that are easy to understand. Therefore we expect
fuzzy systems technology to play a prominent role in the quest to meet
these challenges. In this talk we illustrate how fuzzy data analysis
techniques can help to do information mining.

In this talk we will especially discuss Neuro-Fuzzy Data Analysis and
Possibilistic Graphical Models. Neuro-fuzzy methods are useful to produce
understandable solutions for less complex domains while possibilistic
graphical models can be used if complex knowledge must be represented and
uncertainty and vagueness must be handled. We illustrate these technologies
using our own freely available software tools and by describing the results
of some of our industrial projects, e.g. a solution developed for
Daimler-Chrysler, where large databases of vehicle records had to be
analysed for possible causes of faults.