@UST.HK

BUILDING SPEED AND EFFICIENCY

Responding to Queries… Fast

Building intelligence, both human and
computational, requires the support of
leading-edge big data infrastructure, such
as databases and data centers, that drives
the fundamental capabilities available for
data analysis, decision-making, and new
ways of understanding.

Within the realm of digital databases,
for example, the keywords are speed,
accuracy, and accessibility. These are
the criteria that count for data analysts
and business users who need to run
analytical queries with complex conditions
over large amounts of data and get back
aggregated results on which decisions can
be based. For example, a sales
director of a company handling millions
of transactions per day might want to
know the total revenue of
all transactions for a specific product
category in a certain period of time,
where the buyer and seller are in specific
countries, and the product has parts
manufactured in yet another country. Yet
long running times for such analytical
queries, even in leading commercial-grade
database systems, remain among the
major challenges to overcome.

Now Prof Ke Yi, an expert in database
systems and algorithms, has solved a
problem that has taxed the community
for over 15 years, enabling responses to
queries to be given in seconds rather than
minutes or hours. Prof Yi and his team’s
novel algorithm allows the database
to return approximate results in a very
short time, and to continue improving their
accuracy as more time is spent. Working on
datasets of up to 100 gigabytes
(and potentially larger), their
“Wander Join” algorithm has achieved
better sampling through random walks,
returning results with the same accuracy
(for example, 95% confidence and 1%
error) in one-hundredth of the time
taken by prior solutions on the
same hardware.

“In the early days, proving a conjecture and
having a theorem with my name on it was my dream.
Now seeing my algorithm used in practice is
more satisfying.”
PROF KE YI
Associate Professor of Computer Science
and Engineering

The random walk is a technique
that has been used to solve
problems in many fields. Google’s
successful search engine is based
on this idea, enabling it to prioritize the
most authoritative pages related to the
keywords being searched with a high
degree of accuracy out of millions of web
pages. In Prof Yi’s research, this theoretical
idea is successfully applied for the first
time to the completely different scenario
involved in approximate query processing.
In addition, the algorithm has been
integrated into an open-source database
to demonstrate its viability in the real
world and not just in academia, bringing
the sought-after goal of interactive data
analysis a step closer.
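
To make the idea of sampling a join through random walks more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the team's implementation: the two tables (`orders`, `items`), the column names (`order_id`, `amount`) and every function name are hypothetical. Each walk picks a random order, then a random matching item, and weights the sampled value by the inverse of the probability of that path, so the average over many walks estimates the true total. The half-width returned alongside it is an approximate 95% confidence bound based on the normal approximation.

```python
import math
import random
from collections import defaultdict


def build_index(rows, key):
    """Group rows by join-key value so matches can be looked up directly."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def exact_join_sum(orders, items_by_order, value):
    """Full join: visit every matching (order, item) pair and sum `value`."""
    return sum(item[value]
               for order in orders
               for item in items_by_order.get(order["order_id"], []))


def random_walk_sum(orders, items_by_order, value, walks, rng):
    """Estimate the same sum from `walks` random walks (walks >= 2).

    Returns (estimate, half_width); half_width is an approximate 95%
    confidence interval half-width from the normal approximation.
    """
    samples = []
    for _ in range(walks):
        order = rng.choice(orders)                 # step 1: uniform over orders
        matches = items_by_order.get(order["order_id"], [])
        if not matches:
            samples.append(0.0)                    # dead end contributes zero
            continue
        item = rng.choice(matches)                 # step 2: uniform over matches
        # This path has probability 1/len(orders) * 1/len(matches);
        # dividing by it keeps the estimator unbiased.
        samples.append(item[value] * len(orders) * len(matches))
    mean = sum(samples) / walks
    variance = sum((s - mean) ** 2 for s in samples) / (walks - 1)
    return mean, 1.96 * math.sqrt(variance / walks)


if __name__ == "__main__":
    # Hypothetical demo data: order i has (i % 5) items worth i + 1 each.
    orders = [{"order_id": i} for i in range(1000)]
    items = [{"order_id": i, "amount": float(i + 1)}
             for i in range(1000) for _ in range(i % 5)]
    index = build_index(items, "order_id")
    print("exact total:", exact_join_sum(orders, index, "amount"))
    rng = random.Random(42)
    for walks in (100, 1000, 10000):
        estimate, half_width = random_walk_sum(orders, index, "amount", walks, rng)
        print(walks, "walks:", round(estimate), "+/-", round(half_width))
```

Running the demo with more walks should narrow the reported interval, which mirrors the behavior described above: a rough answer arrives quickly and its accuracy improves as more time is spent.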
Underpinning such original insight is
Prof Yi’s unusual position as a member of
both the theoretical computer science and
database faculty groups at HKUST,
allowing him to bring knowledge from the
two areas together to address practical
problems. International recognition for
the work of Prof Yi and his team includes
winning the 2015 ACM SIGMOD Best
Demonstration Award and receiving the
2016 ACM SIGMOD Best Paper Award.
The work was carried out in collaboration
with the University of Utah.