GPT-5 (ChatGPT) for Agent Mode Data Science in Visual Studio Code

Name: GPT-5 (ChatGPT) for Agent Mode Data Science in Visual Studio Code
Uploaded: Aug 7, 2025
Duration: 1069 s

Data Science for Public Health
1.58K subscribers

846 views

17:49

Today saw the release of GPT-5 from OpenAI. In this video I test it in agent mode in Visual Studio Code, running the same test that I used for Claude 4. You can find that video at https://youtu.be/xQbPlymrDSs

In the video I instruct the agent in how to populate a Jupyter notebook and then explain the analysis I want in a prompt. The data set contains two columns: one indicates whether a participant has a particular disease, and one indicates whether a new diagnostic test is positive or negative. This allows us to consider the sensitivity, specificity, and the positive and negative predictive values. I also prompt the agent to recalculate the PPV and the NPV given a change in the prevalence of the disease.

The agent instructions are below.

---
applyTo: "notebook"
---

# Notebook Instructions

* Start all notebooks with the Python code `%config InlineBackend.figure_format = 'retina'` to enable high DPI plotting.
* Use a clear and descriptive title for the notebook.
* Use markdown section cells to organize the notebook into sections.
* Import all necessary libraries at the beginning of the notebook. Install any Python packages that are required and not yet installed.
* Use code cells for executable code.
* Ensure that the code is well-commented and easy to understand.
* Do not create excessively long code cells.
* Break code cells into shorter, manageable chunks.
* Do not use `print` statements to comment on the results of code. Use text and LaTeX in markdown cells after each code cell execution instead.
* Use markdown cells with text and LaTeX for results, interpretations, explanations, comments, and documentation. For instance, if the code cell contained the code `df.Data.mean()` and the result is 42, use a markdown cell to write: The sample mean $\bar{X}=42$ beats per minute.
* Use consistent formatting throughout the notebook for a professional appearance.

The prompt is below.
The "Metrics.CSV" file in this folder contains data on an experiment to determine the sensitivity, specificity, and the positive and negative predictive values of a new test for a specific disease. A total of 322 participants were selected for the experiment. To reflect the current prevalence of the disease in the community, which is 18.6%, 60 participants have the disease and 262 do not. The `Disease` column has two classes. `No` indicates the absence of the disease as measured by a gold-standard test, and `Yes` indicates the presence of the disease by the same gold-standard test. The `Test` column has two classes and records the results of the new test. `Negative` indicates a negative test for the disease and `Positive` indicates a positive test for the disease.

Analyze the data using the following plan:

1. Calculate and visualize the frequency and relative frequency of each class in each of the `Disease` and the `Test` columns.
2. Create a contingency table of the two columns and visualize the results.
3. Calculate and interpret the joint probabilities of the contingency table.
4. Calculate and interpret the marginal probabilities of the contingency table.
5. Calculate and interpret the sensitivity and the specificity of the new test.
6. Calculate and interpret the positive and the negative predictive values of the new test.
7. Recalculate the positive predictive value and the negative predictive value for a prevalence of 40% using the law of total probability. Explain why the increased prevalence changes the positive predictive value and the negative predictive value.
8. Write a summary and conclusion of the analysis of the new test.

The CSV file is available at https://github.com/juanklopper/TutorialData/blob/main/Metrics.csv
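
As a rough sketch of what steps 5 to 7 of the plan ask for, the Python below computes the four metrics from a 2x2 contingency table and then recalculates the PPV and the NPV for a new prevalence using the law of total probability. This is not the agent's output from the video. The cell counts are hypothetical placeholders that only respect the row totals stated in the prompt (60 with the disease, 262 without); they are not read from Metrics.csv. The function names `diagnostic_metrics` and `predictive_values` are also my own for illustration.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and prevalence from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # P(Test+ | Disease)
        "specificity": tn / (tn + fp),   # P(Test- | No disease)
        "ppv": tp / (tp + fp),           # P(Disease | Test+)
        "npv": tn / (tn + fn),           # P(No disease | Test-)
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given prevalence via the law of total probability."""
    # P(Test+) = P(Test+ | D) P(D) + P(Test+ | not D) P(not D)
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # P(Test-) = P(Test- | D) P(D) + P(Test- | not D) P(not D)
    p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / p_negative
    return ppv, npv


# HYPOTHETICAL counts: only the row totals (60 and 262) come from the prompt.
tp, fn = 48, 12    # participants with the disease: test positive / negative
fp, tn = 26, 236   # participants without the disease: test positive / negative

metrics = diagnostic_metrics(tp, fn, fp, tn)

# Sanity check: at the sample prevalence the formula reproduces the table values.
ppv_sample, npv_sample = predictive_values(
    metrics["sensitivity"], metrics["specificity"], metrics["prevalence"]
)

# Step 7: the same test applied where the prevalence is 40%.
ppv_40, npv_40 = predictive_values(
    metrics["sensitivity"], metrics["specificity"], 0.40
)

print(f"PPV: {metrics['ppv']:.3f} -> {ppv_40:.3f} at 40% prevalence")
print(f"NPV: {metrics['npv']:.3f} -> {npv_40:.3f} at 40% prevalence")
```

Sensitivity and specificity are conditioned on the true disease status, so they do not depend on prevalence. The PPV and the NPV are conditioned on the test result, so with a higher prevalence a larger share of positive results are true positives (the PPV rises) and a larger share of negative results are false negatives (the NPV falls).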
