AstroDB

Overview

Astronomy has long been a data-intensive science. Today, the need to analyze massive amounts of data streaming from telescopes and generated by simulations is more pressing than ever.

AstroDB is an interdisciplinary group that brings together faculty from the Department of Astronomy (in particular the Survey Science Group and the N-Body Shop), the Department of Computer Science & Engineering (in particular the Database Group), and the UW eScience Institute to investigate new methods, tools, and techniques for data-driven discovery in astronomy.

People

Graduate Students

Postdocs

Research Scientists and Faculty

 

Projects, Publications, and Software

We are tackling several inter-related challenges:

 

Efficient Processing of Telescope Images

A core component of astronomy research is the efficient processing of images produced by telescopes, in particular by sky surveys such as the SDSS and the LSST.

Our group is developing new methods and tools for processing these images, with the goal of making their analysis both faster and more convenient.

Make sure to visit the UW Survey Science Group (LSST) page for background and important details.

 

Generating Simulated LSST Images from SQL Server Catalogs

In order to test the various LSST image processing pipelines, the UW Astronomy group has built a system for generating realistic LSST images. As part of the AstroDB collaboration, we studied how best to leverage a relational database system such as SQL Server in order to store the catalog data and query it efficiently as part of this image generation process.
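As a rough illustration of the kind of catalog lookup such an image generator might issue, the sketch below stores a toy object catalog in a relational table and retrieves the objects that fall inside one rectangular field of view. Everything in it is an assumption made for illustration: SQLite stands in for SQL Server, the `catalog` table and its columns are hypothetical, and the query ignores spherical geometry and RA wraparound.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory catalog table (hypothetical schema) and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog ("
        "obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)"
    )
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?)", rows)
    # A composite index lets the engine narrow the scan to the requested RA range.
    conn.execute("CREATE INDEX idx_catalog_radec ON catalog (ra, dec)")
    return conn


def objects_in_field(conn, ra_min, ra_max, dec_min, dec_max, mag_limit):
    """Return objects inside a rectangular field that are brighter than mag_limit."""
    cur = conn.execute(
        "SELECT obj_id, ra, dec, mag FROM catalog "
        "WHERE ra BETWEEN ? AND ? AND dec BETWEEN ? AND ? AND mag <= ? "
        "ORDER BY obj_id",
        (ra_min, ra_max, dec_min, dec_max, mag_limit),
    )
    return cur.fetchall()


# Toy rows: (obj_id, ra in degrees, dec in degrees, magnitude).
rows = [
    (1, 10.0, -5.0, 18.2),
    (2, 10.5, -4.8, 24.9),  # inside the field but too faint
    (3, 50.0, 20.0, 17.0),  # bright but outside the field
    (4, 10.2, -5.1, 21.3),
]
conn = build_catalog(rows)
field = objects_in_field(conn, 9.5, 11.0, -5.5, -4.5, mag_limit=22.0)
print(field)
```

Pushing the spatial and brightness filters into the database means only the objects needed for one image leave the catalog store, which is the general motivation for querying rather than scanning flat files.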

Towards Efficient and Precise Queries Over Ten Million Asteroid Trajectory Models (poster). Yusra Alsayyad, K. Simon Krughoff, Bill Howe, Andrew J. Connolly, Magdalena Balazinska, and Lynne Jones. SSDBM 2011.

 

Analyzing Astronomy Simulation Data

We have explored the use of relational database management systems, both single-node and parallel, as well as newer MapReduce-type systems, for the analysis of data generated by N-body simulations.
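To make the contrast between the two styles concrete, here is a minimal sketch that computes the same aggregate, total mass per halo, once as a declarative SQL query and once as explicit map, shuffle, and reduce steps. The data, the `particles` schema, and the assumption that each particle already carries a `halo_id` are all hypothetical; SQLite and plain Python stand in for the actual database and MapReduce systems.

```python
import sqlite3
from collections import defaultdict

# Hypothetical snapshot rows: (particle_id, halo_id, mass).
particles = [
    (1, 100, 2.0),
    (2, 100, 3.0),
    (3, 200, 1.5),
    (4, 200, 0.5),
    (5, 200, 1.0),
]


def mass_per_halo_sql(rows):
    """Relational style: a single declarative GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE particles (particle_id INTEGER, halo_id INTEGER, mass REAL)"
    )
    conn.executemany("INSERT INTO particles VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT halo_id, SUM(mass) FROM particles GROUP BY halo_id"
    )
    return dict(cur.fetchall())


def mass_per_halo_mapreduce(rows):
    """MapReduce style: map emits (key, value) pairs, reduce sums per key."""
    # Map phase: emit one (halo_id, mass) pair per particle.
    mapped = [(halo_id, mass) for _, halo_id, mass in rows]
    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: sum the masses collected for each halo.
    return {key: sum(values) for key, values in groups.items()}


sql_result = mass_per_halo_sql(particles)
mr_result = mass_per_halo_mapreduce(particles)
print(sql_result, mr_result)
```

In the relational version the engine decides how to group and aggregate, while in the MapReduce version the programmer spells out each phase, which is one of the trade-offs such a comparison has to weigh.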

Make sure to visit the N-Body Shop for important details and background information.

The details of the specific project done by the AstroDB collaboration are described on our [Nuage project website](http://nuage.cs.washington.edu). We summarize the key aspects of the project here:

* Basic use-case studying the potential of relational database systems and MapReduce systems for simulation data analysis. The results of the use-case are summarized in the following publication. The data used in the use-case is also available on the Nuage website:

  **[Analyzing Massive Astrophysical Datasets: Can Pig/Hadoop or a Relational DBMS Help?](http://nuage.cs.washington.edu/pubs/iasds09.pdf)**. Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill Howe, Magdalena Balazinska, and Jeffrey P. Gardner. In the Workshop on Interfaces and Abstractions for Scientific Data (IASDS) 2009, New Orleans, LA, August 2009.

* We also worked on more advanced analytics requiring data clustering:

  **[Scalable clustering algorithm for N-body simulations in a shared-nothing cluster](http://nuage.cs.washington.edu/pubs/ssdbm10.pdf)**. YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, Magdalena Balazinska, Bill Howe, and Sarah Loebman. In the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, July, 2010.

* We developed new tools and techniques inspired by the astronomy simulation data analysis needs:

  **[Skew-Resistant Parallel Processing of Feature-Extracting Scientific User-Defined Functions](http://nuage.cs.washington.edu/pubs/socc10.pdf)**. YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia. In the First ACM Symposium on Cloud Computing (SoCC), Indianapolis, IN, June 2010. SkewReduce source code is publicly available at [http://skewreduce.googlecode.com/](http://skewreduce.googlecode.com/)

  **[A Study of Skew in MapReduce Applications](http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf)** _(Best Student Paper)_. YongChul Kwon, Magdalena Balazinska, Bill Howe, and Jerome Rolia. In the 5th Open Cirrus Summit, Moscow, Russia, June, 2011.

  **[Time Travel in a Scientific Array Database](http://scidb.cs.washington.edu/paper/ICDE13_conf_full_422.pdf)**. Emad Soroush and Magdalena Balazinska. ICDE 2013

  **[ArrayStore: A Storage Manager for Complex Parallel Array Processing](http://scidb.cs.washington.edu/paper/sigmod362-soroush.pdf)**. Emad Soroush, Magdalena Balazinska, and Daniel Wang. SIGMOD 2011

  **[Hybrid Merge/Overlap Execution Technique for Parallel Array Processing](http://scidb.cs.washington.edu/paper/soroush-array-workshop.pdf)**. Emad Soroush and Magdalena Balazinska, ArrayDB Workshop to be held in conjunction with EDBT 2011.

## Acknowledgments

Projects related to the AstroDB collaboration are funded from a variety of sources. We acknowledge these sources on the specific project websites.