NeuralArtifactDB

Data Management for Model Explanation and Exploration


Deep neural networks (DNNs) have gained widespread adoption in machine learning (ML) applications. Explaining these DNNs is crucial for practitioners to gain insights into the knowledge acquired by their models. Although numerous methods have been proposed for this purpose, they often scale poorly with dataset and model size, which makes model explanation and exploration tedious. NeuralArtifactDB contains two projects on data management for efficient model explanation and exploration at scale: DeepEverest and MaskSearch.

DeepEverest

DeepEverest accelerates interpretation-by-example queries, which return the inputs (e.g., images) in a dataset that exhibit certain neuron activation patterns. Two examples are "given a group of neurons, find the top-k inputs that produce the highest activation values for this group of neurons" and "for any input and any group of neurons, use the activations of the neurons to identify the nearest neighbors based on the proximity in the space learned by the neurons". These queries help practitioners understand the functionality of neurons and neuron groups by tying that functionality to input examples in a dataset.
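To make the two example queries concrete, the following is a minimal brute-force sketch over a fully materialized activation matrix. It is not DeepEverest's index or query algorithm; it only shows what the queries compute. The function names, the use of a sum to score a neuron group, and the Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def top_k_by_group_activation(activations, neuron_group, k):
    """Indices of the k inputs with the highest group activation.

    activations: array of shape (num_inputs, num_neurons).
    Assumption: a group's score is the sum of its neurons' activations.
    """
    scores = activations[:, neuron_group].sum(axis=1)
    order = np.argsort(-scores, kind="stable")  # descending by score
    return order[:k]

def nearest_neighbors(activations, neuron_group, query_index, k):
    """Indices of the k inputs closest to the query input in the space
    spanned by the chosen neurons (assumption: Euclidean distance)."""
    sub = activations[:, neuron_group]
    dists = np.linalg.norm(sub - sub[query_index], axis=1)
    dists[query_index] = np.inf  # exclude the query input itself
    order = np.argsort(dists, kind="stable")  # ascending by distance
    return order[:k]

# Toy data: 4 inputs, 3 neurons (values are made up).
acts = np.array([
    [0.1, 0.9, 0.0],
    [0.8, 0.7, 0.2],
    [0.3, 0.1, 0.9],
    [0.9, 0.8, 0.1],
])
print(top_k_by_group_activation(acts, [0, 1], k=2))
print(nearest_neighbors(acts, [0, 1], query_index=1, k=2))
```

This baseline needs the activations of every input for the queried neurons, which is the cost that grows with dataset and model size.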

DeepEverest is a system for the efficient execution of interpretation-by-example queries over the activation values of a deep neural network. It consists of an efficient indexing technique and a query execution algorithm with various optimizations. Experiments with our prototype implementation show that DeepEverest, using less than 20% of the storage of full materialization, accelerates individual queries by up to 63x and consistently outperforms other methods on multi-query workloads that simulate DNN interpretation processes.

MaskSearch

Machine learning tasks over image databases often generate masks that annotate image content (e.g., saliency maps, segmentation maps, depth maps) and enable a variety of applications (e.g., determining whether a model is learning spurious correlations or whether an image was maliciously modified to mislead a model). While queries that retrieve examples based on mask properties are valuable to practitioners, existing systems do not support them efficiently.

MaskSearch formalizes this problem and presents a system that accelerates queries over databases of image masks while guaranteeing the correctness of query results. MaskSearch leverages a novel indexing technique and an efficient filter-verification query execution framework. Experiments with our prototype show that MaskSearch, using indexes that occupy approximately 5% of the compressed data size, accelerates individual queries by up to two orders of magnitude and consistently outperforms existing methods on various multi-query workloads that simulate dataset exploration and analysis processes.
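The general filter-verification idea can be sketched as follows: a small index yields lower and upper bounds on a mask property, masks whose bounds already decide the predicate are accepted or pruned without being read, and only the remaining masks are verified against the full data. The sketch below is an illustration under stated assumptions, not MaskSearch's actual index or algorithm. The tile-count index, the fixed index-time threshold, the query form (at least `min_count` pixels above the threshold inside a rectangular region), and all names are hypothetical.

```python
import numpy as np

TILE = 4  # hypothetical tile size; mask height and width must be multiples of it

def build_index(masks, threshold, tile=TILE):
    """For each mask, count the pixels above `threshold` in every tile."""
    n, h, w = masks.shape
    above = masks > threshold
    return above.reshape(n, h // tile, tile, w // tile, tile).sum(axis=(2, 4))

def exact_count(mask, roi, threshold):
    """Verification step: exact count inside roi = (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = roi
    return int((mask[r0:r1, c0:c1] > threshold).sum())

def query(masks, index, roi, threshold, min_count, tile=TILE):
    """Return (matching mask ids, number of masks that needed verification).

    `threshold` must equal the one used in build_index (a simplification).
    """
    r0, r1, c0, c1 = roi
    # Tiles overlapping the region give an upper bound on the count.
    or0, or1 = r0 // tile, -(-r1 // tile)
    oc0, oc1 = c0 // tile, -(-c1 // tile)
    # Tiles fully inside the region give a lower bound (may be an empty range).
    ir0, ir1 = -(-r0 // tile), r1 // tile
    ic0, ic1 = -(-c0 // tile), c1 // tile
    results, verified = [], 0
    for i in range(len(masks)):
        upper = int(index[i, or0:or1, oc0:oc1].sum())
        lower = int(index[i, ir0:ir1, ic0:ic1].sum())
        if upper < min_count:
            continue                      # pruned by the filter
        if lower >= min_count:
            results.append(i)             # accepted by the filter
            continue
        verified += 1                     # bounds inconclusive: read the mask
        if exact_count(masks[i], roi, threshold) >= min_count:
            results.append(i)
    return results, verified

rng = np.random.default_rng(0)
masks = rng.random((20, 8, 8))            # synthetic masks, values in [0, 1)
index = build_index(masks, threshold=0.5)
hits, verified = query(masks, index, roi=(1, 7, 2, 6), threshold=0.5, min_count=12)
print(hits, verified)
```

Because a mask is accepted or pruned only when its bounds settle the predicate, the result always equals that of scanning every mask; the index affects only how many masks must be read.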

DeepEverest: Accelerating Declarative Top-K Queries for Deep Neural Network Interpretation. Dong He, Maureen Daum, Walter Cai, Magdalena Balazinska. PVLDB, 15(1): 98-111, 2021. doi:10.14778/3485450.3485460

@article{DBLP:journals/pvldb/HeDCB21,
  author    = {Dong He and Maureen Daum and Walter Cai and Magdalena Balazinska},
  title     = {DeepEverest: Accelerating Declarative Top-K Queries for Deep Neural Network Interpretation},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {1},
  pages     = {98--111},
  year      = {2021}
}

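MaskSearch: Querying Image Masks at Scale. Dong He, Jieyu Zhang, Maureen Daum, Alexander Ratner, Magdalena Balazinska. arXiv preprint arXiv:2305.02375, 2023.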

@article{he2023masksearch,
  title={MaskSearch: Querying Image Masks at Scale},
  author={He, Dong and Zhang, Jieyu and Daum, Maureen and Ratner, Alexander and Balazinska, Magdalena},
  journal={arXiv preprint arXiv:2305.02375},
  year={2023}
}