It’s a cool idea. I’ve wasted a lot of time over the past few months futzing around with beautifulsoup, Playwright and others I forget, or cloning entire repos and trying to figure out exactly which incantations for which build tools are going to get me the built docs I need, all in service of setting them up for retrieval and use by LLMs. Some projects (e.g. Godot, Blender, Django) make it very easy. Others do not (Dagster is giving me headaches at the moment).
I would probably prefer to receive unmodified, plain text/md versions (with the heavy lifting done by, e.g., docling, unstructured) rather than LLM summaries, though, since I’d rather produce my own distillations.
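To illustrate the kind of "plain text, not summaries" extraction I mean, here's a minimal sketch using only the Python standard library's `html.parser` (a toy stand-in for what docling/unstructured do far more robustly, and the tag skip-list is just my guess at what to drop):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping non-content elements."""
    SKIP = {"script", "style", "nav", "footer"}  # illustrative skip-list

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

print(html_to_text("<h1>Docs</h1><script>x=1</script><p>Hello</p>"))
# → Docs
# → Hello
```

The real tools also recover structure (headings, tables, code blocks) into markdown rather than flattening everything to text, which is exactly the part that's hard to do well in-house.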
I would pay for that kind of thing. I think the intersection between ethical scraping and making things machine-readable is fertile ground. For a lot of companies it’s something that can be of great value, but is also non-trivial to do well and unlikely to be a core competency in-house.
Why does it need to cost a lot of tokens, and why can't it incrementally update?
How do the VS Code integrated AI helpers do it?
Oh! Recently I had the experience of working with someone who was using LLMs to build something with my JS canvas library. The code the LLM was producing for this person was ... sub-optimal. Over-complicated. Not a surprise to me, as my library is very niche and the only documentation around it is the docs/lessons I've written myself. So now I'm in the middle of an exercise to write documentation[1] that tries to explain everything (that I can remember) that the library does.
The problem is, I've no idea how useful that documentation will be for LLM consumption. Does anyone know of an "Idiot's Guide to writing documentation for LLM consumption" so I can review my work to date and improve the docs going forward?
[1] - In this branch. I'm writing the documentation in .md files which get converted into .html files (using Sundown) during a build step: https://github.com/KaliedaRik/Scrawl-canvas/pull/119/files