Image Analytics
Scientific discoveries are increasingly driven by the analysis of large volumes of data, and increasingly this data takes the form of images. However, system support for large-scale image analytics and machine learning is still scarce. This project leverages decades of DBMS research to build systems that support domain scientists working with image data. The goal is to let them focus on their research rather than on storing, managing, and comparing data and models, or on building visualizations.
Papers
- Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads, VLDB 2017
    In this first investigation, we evaluate five big-data systems for parallel data processing: a domain-specific DBMS for multidimensional array data (SciDB), a general-purpose cluster computing library with persistence capabilities (Spark), a traditional parallel general-purpose DBMS (Myria), and two parallel programming libraries, one general-purpose (Dask) and one domain-specific (TensorFlow). To evaluate these systems, we implement two representative end-to-end image analytics pipelines, one from astronomy and one from neuroscience.
- Slides
- Paper
 
- Multilabel multiclass classification of OCT images augmented with age, gender and visual acuity data, under submission.
    Optical Coherence Tomography (OCT) imaging of the retina is in widespread clinical use to diagnose a wide range of retinal pathologies, and several previous studies have used deep learning to build systems that accurately classify a retinal OCT scan as indicative of one of these pathologies. However, patients often exhibit multiple pathologies concurrently. Here, we implement a novel neural network algorithm that performs multiclass, multilabel classification of retinal OCT images across four common retinal pathologies: epiretinal membrane, diabetic macular edema, dry age-related macular degeneration, and neovascular age-related macular degeneration.
- Paper
 
People
- Parmita Mehta
- Magdalena Balazinska
- Alvin Cheung
- Ariel Rokem
- Andrew Connolly
- Aaron Y Lee
Acknowledgements
This work is supported in part by NSF grant AITF 1535565 and a gift from Intel.
