OmniQuery: Contextually Augmenting Captured Multimodal Memories to Enable Personal Question Answering

UCLA1, University of Washington2, Stanford University3
ACM UIST 2024 Poster (Full paper under review)

OmniQuery enables free-form question answering over personal memories (i.e., private data such as photo albums) using retrieval-augmented generation (RAG). Specifically, it applies taxonomy-based contextual data augmentation to improve retrieval accuracy, and uses LLMs to generate answers grounded in the retrieved memory instances.
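For illustration, a contextually augmented memory instance can be thought of as the raw captured media plus inferred context fields. The schema below is a hypothetical sketch, not the exact taxonomy from the paper:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for a contextually augmented memory instance.
# Field names are illustrative; see the paper for the actual taxonomy.
@dataclass
class AugmentedMemory:
    media_path: str                      # photo/video/screenshot in the album
    caption: str                         # generated visual description
    timestamp: Optional[str] = None      # capture time from metadata
    location: Optional[str] = None       # place inferred from GPS or visual cues
    event: Optional[str] = None          # higher-level episode, e.g. "friend's wedding"
    activities: List[str] = field(default_factory=list)  # inferred activities

memory = AugmentedMemory(
    media_path="album/IMG_2043.jpg",
    caption="Two people exchanging rings under a floral arch",
    timestamp="2024-06-15T16:30",
    location="Golden Gate Park, San Francisco",
    event="Alice's wedding",
    activities=["wedding ceremony"],
)
```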

Exemplar Personal Questions

OmniQuery enables answering questions that require multi-hop searching and reasoning over personal memories. Here are some examples of the questions that OmniQuery can answer:

Look Up and Locate

Example 1: Look up the memories of a friend's wedding and locate information about the venue; Example 2: Find the last memory of a vet visit.

Exploratory Search

Example 3: Recall a travel experience.

Summarize and Compare

Example 4: Summarize and compare workout frequency.

Implementation

OmniQuery leverages Retrieval-Augmented Generation (RAG) to answer complex personal questions over a large collection of personal memories.
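As a minimal sketch of the retrieval step, the snippet below embeds textual descriptions of augmented memories and ranks them against a query by cosine similarity. The embedding model (sentence-transformers' all-MiniLM-L6-v2) is a stand-in assumption, not necessarily the one OmniQuery uses:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in embedding model

# Embed augmented memory descriptions and the query, then rank memories
# by cosine similarity. Model choice is an assumption for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

memory_texts = [
    "2024-06-15, Golden Gate Park: exchanging rings under a floral arch (Alice's wedding)",
    "2024-03-02, animal clinic: cat on an exam table during a vet visit",
]
memory_vecs = model.encode(memory_texts, normalize_embeddings=True)

query = "Where was Alice's wedding held?"
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = memory_vecs @ query_vec            # cosine similarity (vectors are normalized)
top_k = np.argsort(-scores)[:1]
retrieved = [memory_texts[i] for i in top_k]
print(retrieved)
```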


Accurate retrieval of relevant memories is crucial to the system's performance. To this end, we propose a novel taxonomy-based contextual data augmentation method to improve retrieval accuracy. The taxonomy is derived from a one-month diary study that collected realistic user queries and the contextual information needed to interpret captured memories. Guided by this taxonomy, OmniQuery augments captured memories with contextual information, retrieves the memories relevant to a query, and then uses a large language model (LLM) to generate an answer from the retrieved memories. For more details, please refer to our paper.
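The final answer-generation step can be sketched as a single grounded prompt to a chat LLM. The client and model name below (OpenAI, gpt-4o) are assumptions for illustration; the paper's actual prompting setup may differ:

```python
from openai import OpenAI  # any chat-completion LLM would do; GPT-4o is an assumption

client = OpenAI()

def answer_question(query: str, retrieved_memories: list[str]) -> str:
    """Generate an answer grounded only in the retrieved memory instances."""
    context = "\n".join(f"- {m}" for m in retrieved_memories)
    prompt = (
        "Answer the question using only the personal memories below.\n"
        f"Memories:\n{context}\n\n"
        f"Question: {query}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage with the memories retrieved in the previous sketch:
# print(answer_question("Where was Alice's wedding held?", retrieved))
```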


Full Video

BibTeX

@misc{li2024omniquerycontextuallyaugmentingcaptured,
  title={OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering}, 
  author={Jiahao Nick Li and Zhuohao Jerry Zhang and Jiaju Ma},
  year={2024},
  eprint={2409.08250},
  archivePrefix={arXiv},
  primaryClass={cs.HC},
  url={https://arxiv.org/abs/2409.08250}, 
}