At Ai2, I lead research on data-driven scientific discovery using AI. For the past two years, we have been studying how AI can mine novel insights from large-scale datasets. We started with a position paper (ICML), followed with a benchmark (ICLR), and then moved toward building solutions for long-horizon exploration (NeurIPS).
Our most recent release is a new tool called AutoDiscovery. It’s an AI system designed for open-ended data-driven discovery, capable of autonomously generating hypotheses, running experiments, and interpreting results on large datasets. For the past six months, we have been collaborating with oncologists (Providence Swedish), ecologists (Scripps UC San Diego), and social scientists (UUtah). The system has made notable discoveries, which our partners later verified independently, in the lab or otherwise. Here is a brief account of these findings.
Today’s discussion aims to partly address ongoing speculation about autonomous AI-driven discovery while also raising broader questions about AI–scientist interaction in long-horizon research tasks. We will use data-driven discovery as the setting to examine questions that could reshape the very fabric of how we do science.