The three benchmark datasets were generated by the N-Body Shop research group at the University of Washington.
Dataset | Number of Particles | Number of Snapshots | Size of each snapshot | Total size |
---|---|---|---|---|
dbtest128g | 4.2 million | 128 | 169 MB | 21 GB |
cosmo50 | 33.6 million | 9 | 1.4 GB | 12.6 GB |
cosmo25 | 916.8 million | 2 | 36 GB | 72 GB |
The datasets are available only on request because of their size. If you want to use them, please get in touch to arrange a download.
The Friends-of-Friends algorithm (FoF and references therein) is a domain-specific clustering algorithm and a simplified version of the more general, commonly used DBSCAN algorithm. A concrete application that also uses FoF is kernel density estimation (KDE), which involves searching for all points whose kernel can contribute to the density at a given point. In astrophysics, KDE techniques are used to classify objects in a multi-dimensional parameter space of sky survey data.
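To make the clustering idea concrete, here is a minimal single-machine sketch of FoF in Python. It is not the released DryadLINQ implementation. It assumes the usual formulation, in which two particles are "friends" when they lie within a linking length of each other and a group is a connected set of friends. The names `friends_of_friends` and `linking_length` are hypothetical, and the brute-force pair loop is only practical for small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+


def friends_of_friends(points, linking_length):
    """Label each point with the smallest index in its FoF group.

    Illustrative O(n^2) sketch; a practical version would use a
    spatial index instead of comparing every pair of points.
    """
    parent = list(range(len(points)))  # union-find forest, one tree per group

    def find(i):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root to the smaller one so that every
            # group's root is its smallest member index.
            parent[max(ri, rj)] = min(ri, rj)

    # Link every pair of points closer than the linking length.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= linking_length:
                union(i, j)

    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    particles = [
        (0.0, 0.0, 0.0),
        (0.5, 0.0, 0.0),
        (1.0, 0.0, 0.0),  # 1.0 away from the first point, linked through the second
        (5.0, 5.0, 5.0),
        (5.4, 5.0, 5.0),
    ]
    print(friends_of_friends(particles, linking_length=0.6))
```

In this example the first and third particles are farther apart than the linking length, yet they end up in the same group because both are friends of the second particle. That transitive linking is what gives the algorithm its name.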
Distributed Friends-of-Friends (dFoF) is an optimized implementation of the FoF algorithm that runs on shared-nothing computational platforms such as Hadoop and Dryad. Here we release an implementation that uses DryadLINQ.
It is a pleasure to acknowledge the help we have received from Tom Quinn, both during the project and in writing this publication. The "Cosmo25" and "Cosmo50" simulations were graciously supplied by Tom Quinn and Fabio Governato of the University of Washington Department of Astronomy. The simulations were produced using allocations of advanced NSF-supported computing resources operated by the Pittsburgh Supercomputing Center, NCSA, and the TeraGrid.
This work was funded in part by the NASA Advanced Information Systems Research Program grants NNG06GE23G, NNX08AY72G, NSF CAREER award IIS-0845397, NSF CRI grant CNS-0454425, the eScience Institute at the University of Washington, gifts from Microsoft Research, and Balazinska's Microsoft Research New Faculty Fellowship.