Astronomy has long been a data-intensive science. Today, the need to analyze massive amounts of data streaming from telescopes and generated by simulations is more pressing than ever.
AstroDB is an interdisciplinary group bringing together faculty from the Department of Astronomy (in particular the Survey Science Group and the N-Body Shop), the Department of Computer Science & Engineering (in particular the Database Group), and the UW eScience Institute to investigate new methods, tools, and techniques for data-driven discovery in astronomy.
Graduate Students
Postdocs
Research Scientists and Faculty
We are tackling several inter-related challenges:
A core component of astronomy research is the efficient processing of images produced by telescopes, in particular by sky surveys such as the SDSS and the LSST.
Our group is developing new methods and tools for processing these images, making their analysis not only faster but also more convenient.
Make sure to visit the UW Survey Science Group (LSST) page for background and important details.
In the context of the AstroDB collaboration, we have worked on image co-addition methods using Hadoop. The results of this work are summarized in the following publication:

**[Astronomy in the Cloud: Using MapReduce for Image Co-Addition](http://homes.cs.washington.edu/%7Emagda/papers/wiley11.pdf)**. Wiley, K., Connolly, A., Gardner, J., Krughoff, S., Balazinska, M., Howe, B., Kwon, Y., Bu, Y. Publications of the Astronomical Society of the Pacific (PASP), Vol. 123, No. 901, pp. 366-380, University of Chicago Press, Mar 2011.
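As a rough illustration of how co-addition can be phrased as a MapReduce job, the sketch below averages overlapping pixels from several images. It is not the pipeline from the paper above: it assumes the images are already registered to a common integer pixel grid (a real pipeline must first project each image into a shared coordinate system), and the names `map_image`, `reduce_pixel`, `coadd`, and the `x0`/`y0`/`pixels` image layout are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_image(image, query):
    """Map step: emit ((x, y), value) for each pixel inside the query region.

    `image` is a dict with hypothetical keys 'x0', 'y0' (grid position of
    the image's first pixel) and 'pixels' (a list of rows of floats).
    `query` is a half-open bounding box (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = query
    for dy, row in enumerate(image["pixels"]):
        for dx, value in enumerate(row):
            x, y = image["x0"] + dx, image["y0"] + dy
            if xmin <= x < xmax and ymin <= y < ymax:
                yield (x, y), value


def reduce_pixel(values):
    """Reduce step: combine all values that landed on one output pixel."""
    return sum(values) / len(values)


def coadd(images, query):
    """Run map, shuffle (group by pixel), and reduce in a single process."""
    groups = defaultdict(list)
    for image in images:
        for key, value in map_image(image, query):
            groups[key].append(value)
    return {key: reduce_pixel(values) for key, values in groups.items()}
```

In a cluster setting, the grouping done here by the `groups` dictionary would be handled by the framework's shuffle phase, and the map calls would run in parallel over the input images. Averaging is only one possible stacking rule; a weighted mean or median could be swapped into `reduce_pixel`.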
We are now working on image co-addition and full image analysis pipelines using SciDB. This work is related to our efforts building the SciDB parallel data processing system. See the [main SciDB page](http://scidb.org) and our [UW CSE SciDB page](http://scidb.cs.washington.edu).

**[Squeezing a Big Orange into Little Boxes: The AscotDB System for Parallel Processing of Data on a Sphere](http://homes.cs.washington.edu/~soroush/papers/vanderplas.pdf)**. J. Vanderplas, E. Soroush, S. Krughoff, M. Balazinska, and A. Connolly. To appear in the IEEE Data Engineering Bulletin, 2013.

**[A Demonstration of Iterative Parallel Array Processing in Support of Telescope Image Analysis](http://homes.cs.washington.edu/~soroush/papers/p807-soroush.pdf)**. Matthew Moyers, Emad Soroush, Spencer C. Wallace, Simon Krughoff, Jake Vanderplas, Magdalena Balazinska, and Andrew Connolly. VLDB 2013.
We are also working on exploring and analyzing sources extracted from telescope images using parallel database management systems including EMC/Greenplum. This work includes leveraging various machine-learning methods to improve the efficiency of the analysis.
SDSS image data is available through a Web-based SQL interface. We used logs from this website to evaluate new techniques that help author SQL queries in the context of Big Data analytics.
SnipSuggest: Context-Aware Autocompletion for SQL. Nodira Khoussainova, YongChul Kwon, Magdalena Balazinska, and Dan Suciu. PVLDB, Vol. 4, No. 1, 2010 (VLDB 2011).
In order to test the various LSST image processing pipelines, the UW Astronomy group has built a system for generating realistic LSST images. As part of the AstroDB collaboration, we studied how best to leverage a relational database system such as SQL Server in order to store the catalog data and query it efficiently as part of this image generation process.
Towards Efficient and Precise Queries Over Ten Million Asteroid Trajectory Models (poster). Yusra Alsayyad, K. Simon Krughoff, Bill Howe, Andrew J. Connolly, Magdalena Balazinska, and Lynne Jones. SSDBM 2011.
We have explored the use of relational database management systems, both single-node and parallel, as well as newer MapReduce-type systems, for the analysis of data generated by N-body simulations.
Make sure to visit the N-Body Shop for important details and background information.
The details of the specific project done by the AstroDB collaboration are described on our [Nuage project website](http://nuage.cs.washington.edu). We summarize the key aspects of the project here:

* Basic use-case studying the potential of relational database systems and MapReduce systems for simulation data analysis. The results of the use-case are summarized in the following publication. The data used in the use-case is also available on the Nuage website:

  **[Analyzing Massive Astrophysical Datasets: Can Pig/Hadoop or a Relational DBMS Help?](http://nuage.cs.washington.edu/pubs/iasds09.pdf)**. Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill Howe, Magdalena Balazinska, and Jeffrey P. Gardner. In the Workshop on Interfaces and Abstractions for Scientific Data (IASDS) 2009, New Orleans, LA, August 2009.

* We also worked on more advanced analytics requiring data clustering:

  **[Scalable clustering algorithm for N-body simulations in a shared-nothing cluster](http://nuage.cs.washington.edu/pubs/ssdbm10.pdf)**. YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, Magdalena Balazinska, Bill Howe, and Sarah Loebman. In the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, July 2010.

* We developed new tools and techniques inspired by the astronomy simulation data analysis needs:

  **[Skew-Resistant Parallel Processing of Feature-Extracting Scientific User-Defined Functions](http://nuage.cs.washington.edu/pubs/socc10.pdf)**. YongChul Kwon, Magdalena Balazinska, Bill Howe, and Jerome Rolia. In the First ACM Symposium on Cloud Computing (SoCC), Indianapolis, IN, June 2010. SkewReduce source code is publicly available at [http://skewreduce.googlecode.com/](http://skewreduce.googlecode.com/).

  **[A Study of Skew in MapReduce Applications](http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf)** _(Best Student Paper)_. YongChul Kwon, Magdalena Balazinska, Bill Howe, and Jerome Rolia. In the 5th Open Cirrus Summit, Moscow, Russia, June 2011.

  **[Time Travel in a Scientific Array Database](http://scidb.cs.washington.edu/paper/ICDE13_conf_full_422.pdf)**. Emad Soroush and Magdalena Balazinska. ICDE 2013.

  **[ArrayStore: A Storage Manager for Complex Parallel Array Processing](http://scidb.cs.washington.edu/paper/sigmod362-soroush.pdf)**. Emad Soroush, Magdalena Balazinska, and Daniel Wang. SIGMOD 2011.

  **[Hybrid Merge/Overlap Execution Technique for Parallel Array Processing](http://scidb.cs.washington.edu/paper/soroush-array-workshop.pdf)**. Emad Soroush and Magdalena Balazinska. ArrayDB Workshop, held in conjunction with EDBT 2011.

## Acknowledgments

Projects related to the AstroDB collaboration are funded from a variety of sources. We acknowledge these sources on the specific project websites.