Looks interesting. We solved this problem with Kinesis Firehose, S3, and Athena. Pricing is cheap, you can run any arbitrary SQL query, and there is zero infrastructure to maintain.
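To give a flavor, here's a rough sketch of driving that setup from Python. The table name, column names, database, and results bucket are all hypothetical; adjust to however Firehose lays out your data in S3.

```python
import textwrap

def daily_event_counts_query(table: str, event_name: str) -> str:
    """Build an Athena SQL query counting one event type per day.

    Assumes the S3 data is exposed as an Athena table with an
    `event` (string) and a `timestamp` (ISO-8601 string) column --
    a made-up schema for illustration.
    """
    return textwrap.dedent(f"""
        SELECT date_trunc('day', from_iso8601_timestamp("timestamp")) AS day,
               count(*) AS events
        FROM {table}
        WHERE event = '{event_name}'
        GROUP BY 1
        ORDER BY 1
    """).strip()

# Running it against Athena (needs AWS credentials; shown for context only):
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=daily_event_counts_query("events", "ConnectedAccount"),
#     QueryExecutionContext={"Database": "analytics"},
#     ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
# )
```

Athena bills per byte scanned, so partitioning the S3 prefix by date keeps those daily queries cheap.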
How does it scale? Can you spin up multiple containers? For upcoming features, auto-archiving old data to cloud storage would be great.
Looks great, but what is missing for me are use cases.
Why should I use it? What are the unique selling points of your project?
Looks good. I'm in the market for something like this and just ran it locally. How do I visualize data? Is Grafana not included by default?
Also, a minor issue in your docs: there is an extra comma in the sample JSON under the sample event. The fragment below:
"properties": {
"totalAccounts": 4,
"country": "Denmark"
},
}]
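A quick Python repro of why it fails to parse (the fragment is reconstructed from the docs; strict JSON parsers reject trailing commas):

```python
import json

broken = """[{
  "properties": {
    "totalAccounts": 4,
    "country": "Denmark"
  },
}]"""

# The trailing comma after the closing brace is invalid JSON.
try:
    json.loads(broken)
except json.JSONDecodeError as e:
    print("rejected:", e.msg)

# Dropping the comma makes it parse.
fixed = broken.replace("},\n}]", "}\n}]")
events = json.loads(fixed)
print(events[0]["properties"]["country"])  # Denmark
```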
I had to remove that comma at the end.

Looks super interesting. Any positioning thoughts on this vs https://jitsu.com ?
I've been exploring open source data analytics software and it's been a game-changer. I mean, the flexibility and cost savings are huge perks. I've been looking into Apache Spark and KNIME, and they both seem like great options.
>LLMs are really good at writing SQL
Unfortunately, that hasn't been my experience. Possibly I'm not prompting it well, but trying to get VS Code Copilot to generate anything involving semi-basic joins falls quite flat.
What is the advantage of this over using a Postgres plugin for ClickHouse with S3 storage of the data to build a kind of data warehouse, which wouldn't require the bloat of Kafka?
If you don't mind me asking, why the name "Trench"?
how is this different from Posthog?
Could this be used to log IoT object events? Or is it more for app analytics?
I _totally_ associate 'trench' with 'analytics'. Oh, perhaps the author associates it with 'infrastructure'? Just stupid.
1) Appreciate the single image to get started, but I'm particularly curious how you handle different events from the same new user landing on different nodes.
2) Any admin interface, or just the REST API?
3) A little bit on the ClickHouse table and engine choices?
4) Stats on ingesting and querying at the same time?
5) Node doesn't support the ClickHouse TCP interface. This was a major bottleneck for us even with batching of 50k events (or 30 seconds, whichever comes first).
6) CH indexes?
7) How are events partitioned to a Kafka partition? By userId? Any assumptions on minimum fields?
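On 7, here's the kind of userId-keyed partitioning I'd assume, so one user's events stay ordered within a partition. This is purely a guess at the design, not taken from the project's code; Kafka's Java client actually uses murmur2 on the message key, so the md5 here is just a stand-in for a stable hash.

```python
import hashlib

NUM_PARTITIONS = 16  # assumed partition count for the events topic

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable userId -> partition mapping: all of one user's events
    land on the same partition, preserving their relative order."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same user always hashes to the same partition:
print(partition_for("user-123") == partition_for("user-123"))  # True
```

Anonymous events with no userId would need a fallback key (e.g. a device or session id), which is why I'm asking about minimum-field assumptions.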
Will try porting our in-house marketing automation backend (posthog frontend compatible) to this and see how it goes (150M+ events per day)
Kudos all around. Love all 3 of your technology choices.