Show HN: Cape API – Keep your sensitive data private while using GPT-4

by gavinuhma on 6/27/23, 1:04 PM with 29 comments
by zb3 on 6/27/23, 2:16 PM

So now, instead of sending the data to OpenAI, we send it to Cape? I know that you "promise to keep it secure", but I can only trust you, right? Something like this should IMO be done on-premises.

by moffkalast on 6/27/23, 2:07 PM

Well, let me ask the obvious question: won't this redact data that is crucial to getting the given task done?

Take financial statements: if it can't read credit card numbers and names, then it can't tell you which days a given card was used and by whom. Maybe that's not the typical use case, but I'd imagine it being very annoying, given the already high typical LLM failure rate.
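For illustration, redaction doesn't have to be a blanket [REDACTED]. Here's a minimal sketch (my own, not Cape's documented behavior) that maps each distinct card number to a stable placeholder, so "which days was this card used" stays answerable without the model ever seeing the real number:

```python
import re

# Map each distinct card number to a stable placeholder (CARD_1, CARD_2, ...)
# instead of a blanket [REDACTED], so correlations across rows survive.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}  # real value -> placeholder, kept locally

    def sub(match: re.Match) -> str:
        card = match.group(0)
        if card not in mapping:
            mapping[card] = f"CARD_{len(mapping) + 1}"
        return mapping[card]

    return CARD_RE.sub(sub, text), mapping

statement = (
    "2023-06-01  4111 1111 1111 1111  $40\n"
    "2023-06-02  5500 0000 0000 0004  $12\n"
    "2023-06-03  4111 1111 1111 1111  $7\n"
)
redacted, mapping = pseudonymize(statement)
print(redacted)  # the 06-01 and 06-03 rows share the same CARD_1 token
```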

by luke-stanley on 6/27/23, 2:51 PM

I checked out GitHub for more info as suggested, and it seems the main ingredient, https://github.com/capeprivacy/private-ai, is forked from udacity/private-ai. Hmmm. I was expecting to find a clever and useful repo that neatly identifies and strips out personal info; that's not what it is.

I do think stripping personal info and adding it back only when needed is in principle a good idea for some situations. But I have big doubts about injecting another party into the mix.
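To make that round trip concrete, here's a sketch reusing the pseudonymize example above; call_llm is a stand-in for whatever hypothetical client actually hits the model, which only ever sees placeholders:

```python
from typing import Callable

def answer_privately(
    question: str,
    document: str,
    call_llm: Callable[[str, str], str],  # hypothetical LLM client
) -> str:
    # Strip PII locally; the third party receives placeholders only.
    redacted_doc, mapping = pseudonymize(document)
    reply = call_llm(question, redacted_doc)
    # Re-hydrate locally before showing the user; substitute the longest
    # placeholders first so CARD_1 doesn't clobber CARD_10.
    for real, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[1])):
        reply = reply.replace(placeholder, real)
    return reply
```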

by luke-stanley on 6/27/23, 3:07 PM

I wonder what SOTA open-source PII-stripping libraries are out there? Something like https://github.com/microsoft/presidio might fill the role I expected https://github.com/capeprivacy/private-ai to play.
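For reference, Presidio's out-of-the-box analyzer/anonymizer pair already does the basic version of this (after pip install presidio-analyzer presidio-anonymizer plus a spaCy model such as en_core_web_lg):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # detects PERSON, CREDIT_CARD, DATE_TIME, etc.
anonymizer = AnonymizerEngine()  # default operator replaces with <ENTITY_TYPE>

text = "Jane Doe paid with card 4111 1111 1111 1111 on 2023-06-01."
results = analyzer.analyze(text=text, language="en")
print(anonymizer.anonymize(text=text, analyzer_results=results).text)
# e.g. "<PERSON> paid with card <CREDIT_CARD> on <DATE_TIME>."
```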

by ebg1223 on 6/27/23, 4:17 PM

Cool concept! I do have a concern about the healthcare aspects of the product as advertised. Do you provide a signed BAA for healthcare organizations? Without one, all the healthcare use cases listed on the site are basically a non-starter. Having said that, if you are using AWS as described, you already have a BAA with them, and providing one to clients should not be a huge deal.

by dingobread on 6/27/23, 2:24 PM

Neat idea, but making this a cloud-based SaaS makes it useless for us. The docs claim that your company wouldn't see the data, but we'd still be sending unencrypted data to your own team's black-box endpoint. We would have to blindly trust your company, which isn't any better than just blindly trusting OpenAI.

by sullivanmatt on 6/27/23, 2:24 PM

If I were a user or integrator, how would I know that the de-identification step is actually working? Is there a way to test (and keep testing) that your regex patterns, or whatever mechanism is used, continue to accurately strip my sensitive information before it goes to OpenAI?
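One concrete shape for that is a canary regression test: seed documents with known fake PII and assert none of it survives. This is hypothetical; redact stands in for whatever de-identification call the vendor exposes, not a real Cape API:

```python
import pytest  # assumption: pytest as the test harness

from my_redaction_client import redact  # hypothetical wrapper around the endpoint

CANARIES = [
    "4111 1111 1111 1111",   # standard Visa test number
    "jane.doe@example.com",
    "078-05-1120",           # well-known sample SSN
]

@pytest.mark.parametrize("canary", CANARIES)
def test_canary_is_stripped(canary):
    document = f"Customer note: please bill {canary} next month."
    # Run continuously in CI against the live endpoint to catch regressions.
    assert canary not in redact(document)
```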

by noqcks on 6/28/23, 12:50 AM

How do you deal with data persistence for storing the documents/vector DB inside a Nitro Enclave? I would assume that you, as the SaaS vendor, are unable to decrypt the sensitive documents inside the enclave or see a user's chat history?
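The usual pattern here (I'm guessing at Cape's design, not describing it) is that documents are persisted only as ciphertext, with the data key wrapped by KMS and released only to an enclave whose signed attestation matches the key policy, so the host and the vendor handle ciphertext only. A shape-only sketch with a locally held key:

```python
from cryptography.fernet import Fernet

durable_store: dict[str, bytes] = {}  # stands in for S3/EBS outside the enclave

data_key = Fernet.generate_key()   # in practice: KMS-wrapped, released only to
enclave_cipher = Fernet(data_key)  # an enclave with a matching attestation

# Everything that leaves the enclave for persistence is ciphertext.
durable_store["doc-1"] = enclave_cipher.encrypt(b"sensitive document ...")

# Decryption is only possible where the data key lives: inside the enclave.
print(enclave_cipher.decrypt(durable_store["doc-1"]))
```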