{
    "objectID": "/weblog/2026/05/03/simple-data-analysis-setup/",
    "type": "weblog",
    "title": "Simple data analysis setup",
    "file": "weblog/2026/simple-data-analysis-setup.md",
    "url": "/weblog/2026/05/03/simple-data-analysis-setup/",
    "date": 1777800600,
    "date_year": 2026,
    "tags": [
        "python","jupyter","data","duckdb"
    ],
    "content": "Over the last couple of months I’ve had to do a bit more number-crunching than usual in order to write design docs. In a recent example I had to work with a data source that was slow to query (GitHub’s REST API) and eventually needed to work with 100,000+ data points.\nTo demonstrate the setup I used there, I’ll work on a slightly contrived example: get data about all PRs in the grafana/grafana project within the last 6 months, then try to find out which of these include both documentation and code changes (where code is basically anything outside the docs folder).\nCore setup\nFor me, the most convenient way to think about larger amounts of data is using Jupyter notebooks. I have a local JupyterLab set up where I collect all my analyses in separate folders.\ncd notebooks\nuv init\nuv add jupyterlab\nuv add ipywidgets\nuv add pygithub\nuv run jupyter lab\nSetup\nBefore being able to access GitHub data, I need to log in. My gh CLI is already set up, so I’m going to use that instead …"
}
