Postgraduate Seminar Series - Past Seminars (2000)

Department of Computer Science

The University of Melbourne

February 11th: Tetra Lindarto
The Design and Analysis of PPM Compression Algorithm
PPM (Prediction by Partial Matching) is one example of a context-based text compression. This method is capable of a very efficient compression scheme. PPM uses several orders blended together to obtain a good probability estimate of the next symbol. PPM always predicts the current symbol from the highest order and if the symbol is not found, an escape symbol will be sent and and then uses the next lower order to predict the probability of the current symbol. At the beginning, there will be a lot of escape symbols to be sent to the decoder and this will add some overhead in the compression effectiveness. In this seminar, we will try to reduce the amount of escape symbols sent to the decoder by starting the search from the best performance order in the past instead of from the highest order.
February 18th: Tony Wirth
New approaches to the Burrows-Wheeler Transform
The Burrows-Wheeler Transform permutes text into a form that is easier to compress. It is a one-to-one transform, and after compression and decompression the original text can be retrieved. Traditionally, the transformed text was further encoded using a move-to-front list. The new approaches I have been exploring instead try to exploit the similarity of the transformed text to the action of PPM (cf Tetra's seminar), and also try to do without the move-to-front list.

No familiarity with the Burrows-Wheeler Transform will be expected, nor is it essential to understand PPM, in case you can't attend Tetra's seminar.

June 1st: Xiaoying (Sharon) Gao
Knowledge-Based Approach to Information Extraction from the World Wide Web
This research investigates a knowledge-based approach to the generation of systems that extract information from Web pages with semi-structured data. Knowledge is classified into three categories: General (G), Domain specific (D) and Site specific (S). We introduce methods for developing systems based on {G, D}, {G, S} and {G, D, S}, and successfully demonstrate the methods with prototype systems which extract information from online real estate advertisements and car advertisements. Our approach provides knowledge representation methods, knowledge acquisition algorithms, and information extraction algorithms to support system generation.
June 9th: Xiuzhen (Jenny) Zhang
Exploring Constraints to Efficiently Mine Emerging Patterns from Large High-dimensional Datasets
Emerging patterns (EPs) are knowledge patterns whose support increase sharply from a background dataset to a target dataset; the support ratio of EPs is called growth rate. Interesting long EPs often have low support; mining such EPs from high-dimensional dense datasets is a great challenge due to the combinatorial explosion of the number of candidates. We propose a Constraint-based EP Miner, ConsEPMiner, that utilizes inherent as well as external constraints for effectively pruning the search space. ConsEPMiner can efficiently mine all EPs at low support on large high-dimensional datasets, with low minimums on growth rate and growth-rate improvement. In comparison, the widely known Apriori-like approach is ineffective on high-dimensional data; Dense-Miner, a recent constraint-based association rule miner, only considers external constraints for pruning. Experiments on dense data show that, at low support, ConsEPMiner outperforms the Apriori-like approach by orders of magnitude and is more than twice as fast as the Dense-Miner approach.
June 16th: Emma Norling
Enhancing Multi-Agent Based Simulation with Human-Like Decision Making Strategies
We are exploring the enhancement of models of agent behaviour with more `human-like' decision making strategies than are presently available. Our motivation is to build multi-agent based simulations of human societies that would exhibit more realistic simulation behaviours. This in turn will allow researchers to study more complex issues regarding individual and organisational performance. Forms of a rational choice strategy are commonly-used for agent decision making, but although such strategies may lead to optimal choices, humans rarely make decisions this way. Drawing on studies into naturalistic decision making, which looks at people's decision making in their natural environments, we propose ways of integrating a recognition-primed decision making approach with the popular BDI agent architecture, and report on preliminary empirical studies.
July 11th: Linhui Jia
Classification-Driven Object-Based Image Retrieval
This research investigates a semantic-level object-based approach for content-based image retrieval. A unique characteristic of this approach is that it encodes the information of semantically meaningful object classes in image retrieval. We proposed a framework for this approach and designed a system that instantiates the framework. In this talk, after a brief introduction to the background, motivation, and objective of this research, I will focus on the framework and the key techniques in the system including feature extraction, image representation, classifier learning, object classification, and image similarity calculation. Successful experimental evaluation results of the system on two small and two large image databases will be reported. At the end, I will summarize the contributions of this research, draw conclusions, and discuss some interesting directions for future research.
August 1st: R. Yugo Kartono Isal
Word-Based Block Sorting Text Compression
Block-sorting compression algorithm is a unique lossless approach introduced in 1994 by Burrows and Wheeler. It involves at least three steps: sorts the input one block at a time by using Burrows-Wheeler Transform (BWT), applies Move-To-Front(MTF) to the sorted blocks, and compresses them by using Huffman or arithmetic coder. BWT permutes blocks of data to become more compressible by performing a sorting algorithm on the shifted blocks according to its previous context. MTF is a simple heuristic method which exploits the locality of reference. In character-based, the above steps process blocks containing characters, whereas in word-based, the blocks contain word-numbers or integer tokens. This work will investigate, implement, and improve the performance of the block-sorting compression in a word-based dictionary method by combining ideas suitable for input text files in general. Input text is parsed into a dictionary and a file of tokens/symbols (in which case a 32-bits integer is used to represent a token) using the spaceless-word approach. Experiments show that compression on a file of integers can be achieved after both BWT and MTF were applied. However, simple MTF performs poorly in running time, particularly due to linear search and move to front operations on the large set of alphabets. The use of a splay tree to compute the exact MTF value is implemented. Further modification of MTF using a number of splay trees to partition the set of alphabets is proposed. Different ways of partitioning to approximate MTF is performed while the corresponding time and accuracy is observed. Preliminary results show not only that the modification runs faster, but also an improved compression ratio can be achieved. Observations suggest hints to improve time and accuracy performance in both direction.
August 4th: Jinyan Li
The Space of Jumping Emerging Patterns and Its Incremental Maintenance Algorithms
The usefulness of the landmark concept of version spaces is limited, because frequently in practice version spaces tend to be empty. This paper proposes the space of jumping emerging patterns, or JEP space for short, to generalize version space, by weakening the strict consistency with training data of version spaces; as a result, JEP spaces represent many useful discriminating patterns for situations where version spaces are empty. Patterns represented by JEP spaces have been used to build accurate classifiers elsewhere. Similar to version spaces, JEP spaces satisfy the property of convexity, and thus they can be concisely represented by two bounds: left bound and right bound, consisting respectively of the most general elements and of the most specific elements. By exploiting the convexity, we develop efficient incremental maintenance algorithms for JEP spaces. In response to insertion of new training instances, a JEP space is modified by operating on its boundary elements and the boundary elements of the JEP spaces associated with the new instances. In addition, our maintenance algorithms can also handle other cases such as insertion of new attributes and deletion of attributes. A suite of experiments were conducted to verify the efficiency of our algorithms.

The paper was presented in the Seventeenth International Conference on Machine Learning, Stanford, June 28 - July 2, 2000, as a regular paper.

August 18th: Inga Sitzmann
Improving Temporal Joins Using Histogram
Histograms are used in most commercial database systems to estimate query result sizes and evaluation plan costs. They can also be used to optimize join algorithms. We consider how to use histograms to improve the join processing in temporal databases. We define histograms for temporal data and a temporal join algorithm that makes use of this histogram information. The join algorithm is a temporal partition-join with dynamic buffer allocation. Histogram information is used to determine partition boundaries that maximize overall buffer usage. We compare the performance of this join algorithm to temporal join evaluation strategies that do not use histograms, such as a partition-based algorithm based on sampling and a partition-join using the Time Index, an index structure for temporal data. The results demonstrate that the temporal partition-join is substantially improved through the incorporation of histogram information, showing significantly better performance than the sampling-based algorithm and achieving equivalent performance to the Time Index join without requiring an index.
September 8th: Mike Liddell
Length-Restricted Coding Based on Modified Probability Distributions
The use of data compression has long been a central part of text databases and fast communication protocols. In many contexts, effective compression techniques use a minimum redundancy prefix code. However, if the length of a codeword exceeds the machine word size, the decoding routines must be altered and lose efficiency. To avoid these complications it is desirable to produce a prefix code with the constraint that no codeword should be longer than some constant. Larmore and Hirschberg's Package-Merge Algorithm is a well known method for producing minimum-redundancy length-restricted prefix codes, although other methods exist. We consider an alternative method for length-restricted coding which calculates an approximate code, rather than an optimal code, but which can be implemented to operate in linear time. This approach also has applications to non length-limited coding.
October 20th: Liz Haywood
Software requirements modelling approaches: selection and evaluation
Modelling the requirements of a software system is an important part of the requirements engineering process. Indeed, the ultimate success of a project may depend on how well the requirements are modelled. There are many different requirements modelling approaches, each of which provides differing levels of support for various aspects of the requirements modelling process. Practitioners have the problem of choosing a modelling approach that will best support the modelling activities of their particular project and this talk concentrates on the methods we have developed to enable practitioners to make that choice. Researchers are also supported by our methods in their endeavours to produce useful modelling approaches.

General seminar information

Further information for postgraduate students

Seminar coordinator: Inga Sitzmann