Communication
Venkatesh Raghavan, 1220 North Fair Oaks Ave, Sunnyvale, CA 94089.
Email: venky AT cs DOT wpi DOT edu
CONFIDENTIAL: Some of the papers provided on this website are currently under submission and therefore not for public distribution.
Dissertation
TITLE: Supporting Multi-Criteria Decision Support Queries over Disparate Data Sources
DOCUMENT: Thesis document (pdf)
In the era of the "big data revolution," marked by exponential growth of information, extracting value from data enables researchers and businesses to address challenging problems such as drug discovery, fraud detection, and earthquake prediction. Multi-Criteria Decision Support (MCDS) queries are at the core of big-data analytics and span several classes, including OLAP, Top-K, Pareto-optimal, and nearest-neighbor queries. The intuitive nature of specifying multi-dimensional user preferences has made Pareto-optimal queries, also known as skyline queries, popular. Existing skyline algorithms focus primarily on skylines over a single data set, ignoring the crucial fact that applications perform analytics over disparate sources.
In this dissertation, I thoroughly investigate the core topic of skyline-aware query evaluation. To demonstrate the generality of the proposed solutions, I apply the foundational principle of this dissertation to an orthogonal research question -- helping users acquire the desired information from a complex database via recommendation queries. The main contributions of this dissertation include the following:
Part I - SKIN: Skyline and Mapping Aware Query Evaluation
In this dissertation, I propose a novel execution framework called SKIN that treats skylines over joins as first-class citizens during query processing. This is in stark contrast to existing techniques that treat skylines as an "add-on," loosely integrated with query processing by being placed on top of the query plan. SKIN effectively exploits skyline knowledge within individual data sources as well as across disparate sources. This enables SKIN to significantly reduce two primary costs: the cost of generating the join results and the cost of the dominance comparisons needed to compute the final results.
- V. Raghavan, E. A. Rundensteiner, and S. Srivastava, "Skyline and Mapping Aware Join Query Evaluation," Information Systems (an Elsevier Journal) (pdf).
- S. Srivastava, V. Raghavan, and E. A. Rundensteiner, "Adaptive Processing of Multi-Criteria Decision Support Queries," VLDB Workshop 2011 (BIRTE) (pdf).
- N. Park, V. Raghavan, and E. A. Rundensteiner, "Supporting Multi-Criteria Decision Support Queries Over Time-Interval Data Streams," DEXA 2010, (pdf).
- V. Raghavan and E. A. Rundensteiner, "SkyDB: Skyline Aware Query Evaluation Framework," SIGMOD Workshop 2009 (IDAR), (pdf).
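For readers unfamiliar with skylines over joins, the "add-on" baseline that Part I contrasts SKIN with can be sketched as follows. This is a minimal illustration and not SKIN itself: the relations (`flights`, `hotels`), their attributes, the mapping that sums the two prices, and the assumption that smaller values are preferred in every dimension are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def join_then_skyline(flights, hotels) -> List[Point]:
    """Baseline plan: materialize the full join, map each pair to the
    output space, and only then run the skyline on top."""
    joined = [
        (f_price + h_price, f_duration, h_distance)  # mapping: total price
        for (f_city, f_price, f_duration) in flights
        for (h_city, h_price, h_distance) in hotels
        if f_city == h_city  # equi-join on city
    ]
    return skyline(joined)


# Hypothetical data: (city, price, duration) and (city, price, distance).
flights = [("BOS", 300, 5), ("BOS", 250, 8), ("BOS", 320, 9), ("SFO", 400, 6)]
hotels = [("BOS", 120, 2), ("BOS", 90, 6), ("SFO", 150, 1)]

result = join_then_skyline(flights, hotels)
```

In this toy input the third Boston flight is dominated by the first one before the join even happens, yet the baseline still generates and compares its two join results. Avoiding that kind of wasted join and comparison work is the motivation described above.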
Part II - ProgXe: Progressive Result Generation for Multi-Criteria Decision Support Queries
Second, I address a crucial business need: reporting results early, as soon as they are generated, so that users can make competitive decisions in near real time. On top of SKIN, I built ProgXe, a progressive query evaluation framework that makes the execution of queries involving skylines over joins non-blocking, i.e., results are generated progressively, early and often. By processing the query at multiple levels of abstraction, ProgXe is able to: (1) exploit the knowledge gained from both the input and the mapped output spaces, and (2) identify and then analyze abstract-level relationships to guarantee the correctness of early output.
- V. Raghavan and E. A. Rundensteiner, "Progressive Result Generation for Multi-Criteria Decision Support Queries," ICDE 2010, (Acceptance: 12.5%), (pdf), (slides).
- V. Raghavan, and E. A. Rundensteiner, "ProgXe: Progressive Result Generation Framework for Multi-Criteria Decision Support Queries," SIGMOD 2010, (Acceptance Rate: 36.8%), (pdf).
Part III - CAQE: Contract-Driven Processing of Concurrent Decision Queries
Third, real-world applications handle query workloads with diverse quality of service (QoS) requirements. Time-sensitive queries, such as fraud detection, require results to be output progressively with minimal delay, while ad-hoc and reporting queries can tolerate delay. In this dissertation, I handle contract-driven multi-query processing by proposing a novel framework, Contract-Aware Query Execution (CAQE). In CAQE, I employ an adaptive execution strategy that continuously monitors the run-time satisfaction of queries and aggressively takes corrective steps whenever the contracts are not being met.
- V. Raghavan, and E. A. Rundensteiner, "Contract-Driven Processing of Concurrent Decision Support Queries: A Piece of CAQE," in submission (pdf).
Part IV - CAPRI: Cardinality Assurance Via Proximity-driven Refinement
Lastly, to demonstrate the portability of the principles presented in this dissertation, I apply them to an orthogonal research question -- helping users acquire the desired information from a complex database via recommendation queries. User queries are often too strict or too broad, requiring a frustrating trial-and-error refinement approach to meet the desired cardinality while preserving the original query semantics. Based on the principles of SKIN, I propose CAPRI to automatically generate refined queries that (1) attain the desired cardinality and (2) minimize changes to the original query. Comprehensive experimental studies over benchmark and real datasets demonstrate the superiority of the proposed strategies over existing techniques.
- M. Vartak, V. Raghavan, and E. A. Rundensteiner, "QRelX: Generating Meaningful Queries that Provide Cardinality Assurance," SIGMOD 2010, (Acceptance Rate: 36.8%), (pdf).
- V. Raghavan, M. Vartak, and E. A. Rundensteiner, "CAPRI: Cardinality Assurance Via Proximity-driven Refinement," in submission (pdf).
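The two refinement goals in Part IV can be made concrete with a toy sketch. It is not the CAPRI algorithm: it assumes a single predicate of the form `value <= threshold`, handles only the "too strict" case, and measures the change to the query as the increase in the threshold. The function name `relax_threshold` and the sample data are hypothetical.

```python
from typing import List


def relax_threshold(values: List[float], threshold: float, desired: int) -> float:
    """Return the smallest threshold >= the original one such that at
    least `desired` values satisfy `value <= threshold`."""
    if desired > len(values):
        raise ValueError("desired cardinality exceeds the number of rows")
    if desired <= 0:
        return threshold  # nothing to attain, keep the query unchanged
    # The desired-th smallest value is the least threshold admitting that many rows.
    kth_smallest = sorted(values)[desired - 1]
    # Never tighten the query: keep the original threshold if it already suffices.
    return max(threshold, kth_smallest)


# Hypothetical column of prices; the original query asks for price <= 100.
prices = [120, 95, 300, 180, 240, 150]
refined = relax_threshold(prices, threshold=100, desired=3)
matching = [p for p in prices if p <= refined]
```

With one predicate the minimal change can be read directly off the sorted column. With several predicates there are many ways to trade a change in one against a change in another, which is what makes automatic refinement harder than this sketch suggests.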
Publications
Journal:
- J1. V. Raghavan, E. A. Rundensteiner, and S. Srivastava, "Skyline and Mapping Aware Join Query Evaluation," Information Systems (an Elsevier Journal) (pdf).
- J2. Y. Zhu, V. Raghavan and E. A. Rundensteiner, "A New Look At Generating Multi-Join Continuous Query Plans: A Qualified Plan Generation Problem," Data and Knowledge Engineering (DKE Journal), (pdf).
Conference and In-Submission Publications:
- C1. V. Raghavan, and E. A. Rundensteiner, "Contract-Driven Processing of Concurrent Decision Support Queries: A Piece of CAQE," in submission (pdf).
- C2. V. Raghavan, M. Vartak, and E. A. Rundensteiner, "CAPRI: Cardinality Assurance Via Proximity-driven Refinement," in submission (pdf).
- C3. V. Raghavan and E. A. Rundensteiner, "Progressive Result Generation for Multi-Criteria Decision Support Queries," ICDE 2010, (Acceptance: 12.5%), (pdf), (slides).
- C4. N. Park, V. Raghavan, and E. A. Rundensteiner, "Supporting Multi-Criteria Decision Support Queries Over Time-Interval Data Streams," DEXA 2010, (pdf).
- C5. V. Raghavan, Y. Zhu, E. A. Rundensteiner, and D. Dougherty, "Multi-Join Continuous Query Optimization: Covering the Spectrum of Linear, Acyclic and Cyclic Queries," BNCOD 2009, (pdf).
- C6. A. Mukherji, E. A. Rundensteiner, D. Brown and V. Raghavan, "SNIF TOOL: Sniffing for Patterns in Continuous Streams," CIKM 2008 (pdf).
Demonstration:
- D1. V. Raghavan, and E. A. Rundensteiner, "ProgXe: Progressive Result Generation Framework for Multi-Criteria Decision Support Queries," SIGMOD 2010, (Acceptance Rate: 36.8%), (pdf).
- D2. M. Vartak, V. Raghavan, and E. A. Rundensteiner, "QRelX: Generating Meaningful Queries that Provide Cardinality Assurance," SIGMOD 2010, (Acceptance Rate: 36.8%), (pdf).
- D3. V. Raghavan, E. A. Rundensteiner, J.P. Woycheese and A. Mukherji, "FireStream: Sensor Stream Processing for Monitoring Fire Spread," ICDE 2007 (pdf).
Conference Workshop:
- W1. C. Garcia-Alvarado, V. Raghavan, S. Narayanan, F. Waas, "Automatic Data Placement in MPP Databases," ICDE Workshop 2012 (SMDB).
- W2. S. Srivastava, V. Raghavan, and E. A. Rundensteiner, "Adaptive Processing of Multi-Criteria Decision Support Queries," VLDB Workshop 2011 (BIRTE) (pdf).
- W3. V. Raghavan and E. A. Rundensteiner, "SkyDB: Skyline Aware Query Evaluation Framework," SIGMOD Workshop 2009 (IDAR), (pdf).
- W4. V. Raghavan, K.W. Deschler, E. A. Rundensteiner, "VAMANA - A Scalable Cost-Driven XPath Engine," ICDE Workshop 2005 (XSDM) (pdf).