Postgraduate Seminar Series - Past Seminars (2000)
- February 11th: Tetra Lindarto
- The Design and Analysis of PPM Compression Algorithm
-
PPM (Prediction by Partial Matching) is one example of a context-based
text compression. This method is capable of a very efficient compression
scheme. PPM uses several orders blended together to obtain a good
probability estimate of the next symbol. PPM always predicts the current
symbol from the highest order and if the symbol is not found, an escape
symbol will be sent and and then uses the next lower order to predict the
probability of the current symbol. At the beginning, there will be a lot
of escape symbols to be sent to the decoder and this will add some
overhead in the compression effectiveness. In this seminar, we will try to
reduce the amount of escape symbols sent to the decoder by starting the
search from the best performance order in the past instead of from the
highest order.
- February 18th: Tony Wirth
- New approaches to the Burrows-Wheeler Transform
-
The Burrows-Wheeler Transform permutes text into a form that is easier to
compress. It is a one-to-one transform, and after compression and
decompression the original text can be retrieved. Traditionally, the
transformed text was further encoded using a move-to-front list. The new
approaches I have been exploring instead try to exploit the similarity of
the transformed text to the action of PPM (cf Tetra's seminar),
and also try to do without the move-to-front list.
No familiarity with the Burrows-Wheeler Transform will be expected, nor is it
essential to understand PPM, in case you can't attend Tetra's seminar.
- June 1st: Xiaoying (Sharon) Gao
- Knowledge-Based Approach to Information Extraction from the
World Wide Web
-
This research investigates a knowledge-based approach to the
generation of systems that extract information from Web pages
with semi-structured data. Knowledge is classified into three
categories: General (G), Domain specific (D) and Site specific
(S). We introduce methods for developing systems based on
{G, D}, {G, S} and {G, D, S}, and successfully demonstrate the
methods with prototype systems which extract information from
online real estate advertisements and car advertisements. Our
approach provides knowledge representation methods, knowledge
acquisition algorithms, and information extraction algorithms
to support system generation.
- June 9th: Xiuzhen (Jenny) Zhang
- Exploring Constraints to Efficiently Mine Emerging Patterns from
Large High-dimensional Datasets
-
Emerging patterns (EPs) are knowledge patterns
whose support increase sharply from a background dataset to a target dataset;
the support ratio of EPs is called growth rate.
Interesting long EPs often have low support;
mining such EPs from high-dimensional dense datasets
is a great challenge due to the combinatorial explosion of
the number of candidates.
We propose a Constraint-based EP Miner, ConsEPMiner,
that utilizes inherent as well as
external constraints for effectively pruning the search space.
ConsEPMiner can efficiently mine all EPs
at low support on large high-dimensional datasets,
with low minimums on growth rate and growth-rate improvement.
In comparison,
the widely known Apriori-like approach
is ineffective on high-dimensional data;
Dense-Miner,
a recent constraint-based association rule miner,
only considers external constraints for pruning.
Experiments on dense data show that, at low support,
ConsEPMiner outperforms the
Apriori-like approach by orders of magnitude and
is more than twice as fast as the Dense-Miner approach.
- June 16th: Emma Norling
- Enhancing Multi-Agent Based Simulation with Human-Like
Decision Making Strategies
-
We are exploring the enhancement of models of agent behaviour with more
`human-like' decision making strategies than are presently available. Our
motivation is to build multi-agent based simulations of human societies
that would exhibit more realistic simulation behaviours. This in turn
will allow researchers to study more complex issues regarding individual
and organisational performance. Forms of a rational choice strategy are
commonly-used for agent decision making, but although such strategies
may lead to optimal choices, humans rarely make decisions this way. Drawing
on studies into naturalistic decision making, which looks at people's
decision making in their natural environments, we propose ways of integrating
a recognition-primed decision making approach with the popular BDI agent
architecture, and report on preliminary empirical studies.
- July 11th: Linhui Jia
- Classification-Driven Object-Based Image Retrieval
-
This research investigates a semantic-level object-based
approach for content-based image retrieval. A unique
characteristic of this approach is that it encodes the
information of semantically meaningful object classes in
image retrieval. We proposed a framework for this
approach and designed a system that instantiates the
framework. In this talk, after a brief introduction to the
background, motivation, and objective of this research, I
will focus on the framework and the key techniques in the
system including feature extraction, image representation,
classifier learning, object classification, and image
similarity calculation. Successful experimental evaluation
results of the system on two small and two large image
databases will be reported. At the end, I will summarize
the contributions of this research, draw conclusions, and
discuss some interesting directions for future research.
- August 1st: R. Yugo Kartono Isal
- Word-Based Block Sorting Text Compression
-
Block-sorting compression algorithm is a unique lossless approach introduced in
1994 by Burrows and Wheeler. It involves at least three steps: sorts the input
one block at a time by using Burrows-Wheeler Transform (BWT), applies
Move-To-Front(MTF) to the
sorted blocks, and compresses them by using Huffman or arithmetic coder. BWT
permutes blocks of data to become more compressible by performing a sorting
algorithm on the shifted blocks according to its previous context. MTF is a
simple heuristic method which exploits the locality of reference. In character-based, the above steps
process blocks containing characters, whereas in word-based, the blocks contain
word-numbers or integer tokens.
This work will investigate, implement, and improve the performance of the
block-sorting compression in a word-based dictionary method by combining ideas
suitable for input text files in general. Input text is parsed into a
dictionary and a file of tokens/symbols
(in which case a 32-bits integer is used to represent a token) using
the spaceless-word approach.
Experiments show that compression on a file of integers can be achieved after
both BWT and MTF were applied. However, simple MTF performs poorly in running
time, particularly due to linear search and move to front operations on the
large set of alphabets.
The use of a splay tree to compute the exact MTF value is implemented. Further
modification of MTF using a number of splay trees to partition the set of
alphabets is proposed. Different ways of partitioning to approximate MTF is
performed while the corresponding time and accuracy is observed.
Preliminary results show not only that the modification runs faster, but also
an improved compression ratio can be achieved. Observations suggest hints to
improve time and accuracy performance in both direction.
- August 4th: Jinyan Li
- The Space of Jumping Emerging Patterns and Its Incremental Maintenance Algorithms
-
The usefulness of the landmark concept of version spaces is limited,
because frequently in practice version spaces tend to be empty.
This paper proposes the space of jumping emerging patterns,
or JEP space for short, to generalize version space, by
weakening the strict consistency with training data of version spaces;
as a result, JEP spaces represent many useful discriminating patterns
for situations where version spaces are empty.
Patterns represented by JEP spaces have been used to build
accurate classifiers elsewhere.
Similar to version spaces, JEP spaces satisfy the property of
convexity, and thus they can be concisely
represented by two bounds: left bound and
right bound, consisting respectively of
the most general elements and of the most
specific elements.
By exploiting the convexity, we develop efficient
incremental maintenance algorithms for JEP spaces.
In response to insertion of
new training instances, a JEP space is
modified by operating on its boundary elements
and the boundary elements of the JEP spaces associated with
the new instances.
In addition, our maintenance algorithms can also handle
other cases such as insertion
of new attributes and deletion of attributes.
A suite of experiments were conducted to verify
the efficiency of our algorithms.
The paper was presented in the Seventeenth International Conference
on Machine Learning, Stanford, June 28 - July 2, 2000, as a regular
paper.
- August 18th: Inga Sitzmann
- Improving Temporal Joins Using Histogram
-
Histograms are used in most commercial database systems to estimate query
result sizes and evaluation plan costs.
They can also be used to optimize join
algorithms. We consider how to use
histograms to improve the join processing in temporal
databases. We define histograms for temporal data and
a temporal join algorithm
that makes use of this histogram information.
The join algorithm is a temporal partition-join with dynamic buffer allocation.
Histogram information is used to
determine partition boundaries that maximize overall buffer usage.
We compare the performance of this join algorithm
to temporal join evaluation strategies that do not use histograms,
such as a partition-based algorithm based on sampling and
a partition-join using the Time Index, an index structure for temporal data.
The results demonstrate that the temporal partition-join is
substantially improved through the incorporation of histogram information,
showing significantly better performance than the sampling-based algorithm
and achieving equivalent performance to the Time Index join
without requiring an index.
- September 8th: Mike Liddell
- Length-Restricted Coding Based on Modified Probability Distributions
-
The use of data compression has long been a central part
of text databases and fast communication protocols. In many contexts,
effective compression techniques use a minimum redundancy prefix code.
However, if the length of a codeword exceeds the machine word size, the
decoding routines must be altered and lose efficiency. To avoid these
complications it is desirable to produce a prefix code with the constraint
that no codeword should be longer than some constant. Larmore and
Hirschberg's Package-Merge Algorithm is a well known method for producing
minimum-redundancy length-restricted prefix codes, although other methods
exist. We consider an alternative method for length-restricted
coding which calculates an approximate code, rather than an optimal code, but
which can be implemented to operate in linear time. This approach also has
applications to non length-limited coding.
- October 20th: Liz Haywood
- Software requirements modelling approaches: selection and evaluation
-
Modelling the requirements of a software system is an important part of
the requirements engineering process. Indeed, the ultimate success of a project
may depend on how well the requirements are modelled.
There are many different requirements modelling approaches,
each of which provides differing levels of support for various
aspects of the requirements modelling process.
Practitioners have the problem of choosing a modelling approach that will
best support the modelling activities of their particular project and
this talk concentrates on the methods we have developed to enable practitioners
to make that choice. Researchers are also supported by our methods in their
endeavours to produce useful modelling approaches.
General seminar information
Further information for postgraduate students
Seminar coordinator: Inga Sitzmann