CS5095701 Advanced Database Systems

Fall 2025
Department of Computer Science and Information Engineering
National Taiwan University of Science and Technology

 

Announcements.

l   9/15 Reassigned classrooms: TR-409-1.

l   9/5 Lecture time moved to 10:00 ¡V 12:30 starting from Sep. 12.

l   9/5 First class.

 

Lectures:

Friday  09:10 - 12:10 10:00 ¡V 12:30  (TR-515 TR-409-1)

Instructor:

Yi-Leh WU (§d©É¼Ö)
Email:
ywu_at_csie.ntust.edu.tw
Office: RB503-1
Office Hours: TR 10:30-11:30, or by appointment

Teaching Assistant:

Mr. Hsiao  (¿½¹t¿Ä) Email: M11415016_at_mail.ntust.edu.tw  Office: RB-306-3 (Office hours: TBD or by appointment) (O) 27333141 ext. 7322

Course Description:

This course covers ongoing database system trends, illustrating new research directions and current problems. The course pre-requisite is CS 3010301 or an equivalent background understanding of database systems. We will study a collection of papers; some of these are classic papers on relational databases in the early days. More recent papers cover a broad spectrum of research areas in database systems.

Textbook(s):

(optional) Fundamentals of Database Systems, sixth edition, by Elmasri and Navathe

Software Tools:

Required - Email
Optional (for access online lecture notes and slides) - Adobe Acroread

Grading Policy:

Paper presentation in class: 20%
Homework 20%

Midterm Exam 20%
Final report / proposal presentation: 20% / 10%
Class participation: 10%

Schedule:

Week

Topic and Paper

Handouts

1

Introduction/Overview

Syllabus

2

Overview: ER Model, Relational Data Model, Normalization

ER, Relational Data Model, Mapping

3

Overview: Relational Algebra, Calculus, and SQL

Algebra/Calculus,

4

Query Processing and Optimization

SQL, Query Processing

5

Overview: Normalization

Normalization, HW1

6

Overview: Transaction and Recovery

Transaction, Recovery

7

Midterm

8

Proposal Presentations

9

Student Presentations

schedule

10

Student Presentations

11

Student Presentations

 

12

Student Presentations

13

Student Presentations

 

14

Student Presentations

 

15

Final Proposal Presentations

 

16

Final Proposal Presentations

Proposal Due

Online Resources:

Class home page (this page) - http://faculty.csie.ntust.edu.tw/~ywu/cs5095701/index.html

 

Paper list:

1.          Relational Databases for Querying XML Documents: Limitations and Opportunities, J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, J. Naughton, VLDB  1999.

2.          Efficiently Publishing Relational Data as XML Documents, J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, B. Reinwald, VLDB  2000.

3.          R-Trees: A Dynamic Index Structure for Spatial Searching, Antonin Guttman, SIGMOD 1984.

4.          The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles, Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger, SIGMOD 1990.

5.          B^ed-Tree: An All-Purpose Tree Index for String Similarity Search on Edit Distance, Zhenjie Zhang,  Beng chin Ooi, Marios HadjieleftheriouDivesh Srivastava, SIGMOD 2010.

6.          Fast In-Memory Sort on Modern CPUs and GPUs: A Case for Bandwidth-Oblivious SIMD Sort, Nadathur Satish, Intel Corporation; Changkyu Kim, Intel; Jatin Chhugani, Intel; Anthony Nguyen, Intel; Victor Lee, Intel Corporation; Daehyun Kim, Intel; Pradeep Dubey, Intel, SIGMOD 2010. ¡@

7.          Thread Cooperation in Multicore Architectures for Frequency Counting over Multiple Data Streams, Sudipto Das (UC Santa Barbara), Shyam Antony (UC Santa Barbara), Divyakant Agrawal (UC Santa Barbara), Amr El Abbadi (UC Santa Barbara), VLDB 2009.

8.          Models for Studying Concurrency Control Performance: Alternatives and Implications, Rakesh Agrawal, Michael J. Carey, Miron Livny:  SIGMOD 1985.

9.          Hippocratic Databases, Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu, VLDB 2002.

10.      Order Preserving Encryption for Numeric Data, Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu: . SIGMOD 2004.

11.      Probabilistic Ranking of Database Query Results, Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum:  VLDB 2004

12.      Automatic Categorization of Query Results, Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang:  SIGMOD 2004.

13.      A Unified Approach to Ranking in Probabilistic Databases (Best Paper Award), Jian Li (Univ. of Maryland), Barna Saha (Univ. of Maryland), Amol Deshpande (Univ. of Maryland), VLDB 2009.

14.      The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin, Larry Page, Proceedings of the 7th international conference on World Wide Web (WWW), 1998.

15.      Access Path Selection in a Relational Database Management System, Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: SIGMOD 1979.

16.      Improved Histograms for Selectivity Estimation of Range Predicates, by V. Poosala, Y.E. Ioannidis, P. J. Haas, and E. Shekita, Proc. of the 1996 ACM SIGMOD 1996.

17.      A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks, Min-Soo Kim (UIUC), Jiawei Han (UIUC), VLDB 2009.

18.      Mining Graph Patterns Efficiently via Randomized Summaries, Chen Chen (UIUC), Cindy Lin (UIUC), Matt Fredrikson (Univ. of Wisconsin - Madison), Mihai Christodorescu (IBM T.J. Watson Research Center), Xifeng Yan (Univ. of California at Santa Barbara), Jiawei Han (UIUC), VLDB 2009.

19.      Variance aware optimization of parameterized queries, Surajit Chaudhuri, Microsoft Research; Hongrae Lee, University of British Columbia; Vivek Narasayya, Microsoft Research, SIGMOD 2010.

20.      HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads , Azza Abouzeid (Yale Univ.), Kamil Bajda-Pawlikowski (Yale Univ.), Daniel Abadi (Yale Univ.), Alexander Rasin (Brown Univ.), Avi Silberschatz (Yale Univ.), VLDB 2009.

21.      Mining Document Collections to Facilitate Accurate Approximate Entity Matching, Surajit Chaudhuri (Microsoft Research), Venkatesh Ganti (Microsoft Research), Dong Xin (Microsoft Research), VLDB 2009.

22.      Database Compression on Graphics Processors ,Wenbin Fang,The Hong Kong University of Science and Technology;Bingsheng He,Nanyang Technological University;Qiong Luo,The Hong Kong University of Science and Technology,VLDB 2010

23.      GPU-Based Speculative Query Processing for Database Operations,Peter Benjamin Volk,Dirk Habich,Wolfgang Lehner;Dresden University of Technology,VLDB 2010

24.      Accelerating SQL Database Operations on a GPU with CUDA, Peter Bakkum, Kevin Skadron,University of Virginia,2010.

25.      DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation, Junhao Gan and Yufei Tao. 2015.  In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 519-530. DOI: https://doi.org/10.1145/2723372.2737792

26.      DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN, Erich Schubert, Jorg Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. 2017.  ACM Trans. Database Syst. 42, 3, Article 19 (July 2017). DOI: https://doi.org/10.1145/3068335
¡@
Any papers from VLDB, SIGMOD/PODS, ICDE, CIKM, or other database related conferences or journals.

Project proposal:

You are required to propose a database related research project in this course. You are encouraged to have a publication as a goal for your project. A list of project ideas is listed below.

You proposal should include the following aspects:

l   importance of the proposed project

l   your algorithm and ideas

l   main contributions

l   the design of the software you propose to build

l   how you propose to evaluate your ideas

l   the design of the experiments

l   literature survey

 

You will present to the class about your proposal (~ 5 minutes) and write a proposal report (2~4 pages in IEEE format).

 

Some project ideas:

l   high-dimensional indexing structures (R*-tree, etc.) 

l   meta-search engine for Google and Yahoo's results

l   heuristic/algorithm for ranking Google's query results

l   data streams

l   selectivity estimation

l   sensor data management

l   XML (+ relational)

l   OLAP

Resources: CSP project startup documents written by John Wilkes.

Some previous projects can be found here.