Firstly, congrats :) (Generalized) ingestion is a very hard problem because any abstraction you come up with will always have some limitations where you might need to fall back to writing code with full access to the third-party APIs. But in some cases generalized ingestion is definitely much better than re-writing the same ingestion piece, especially for complex connectors. Take a look at CloudQuery (https://github.com/cloudquery/cloudquery), an open-source, high-performance ELT framework powered by Apache Arrow (so you can write plugins in any language). (Maintainer here)
Hi Burak. I have been testing ingestr with a source and destination Postgres database. What I'm trying to do is copy data from my Prod database to my Test database. I find that when using replace I get additional dlt columns added to the tables as hints. It also does not work with a defined primary key, only natural keys, and composite keys do not work. Can you tell me the basic, minimal functionality it supports? I would love to use it to keep our Prod and Test databases in sync, but it appears that the functionality I need is not there. Thanks very much.
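For reference, the command I'm running is roughly this (flag names as I understand them from the README; hosts, credentials, and table names are placeholders):

    ingestr ingest \
      --source-uri 'postgresql://user:pass@prod-host:5432/proddb' \
      --source-table 'public.customers' \
      --dest-uri 'postgresql://user:pass@test-host:5432/testdb' \
      --dest-table 'public.customers' \
      --incremental-strategy replace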
This looks pretty cool! What was the hardest part about building this?
Do you think you'll add local file support in the future? Also, do you have any plans to make reading from a source parallel? For example, connectorx uses an optional partition column to read chunks of a table concurrently. Cool how it's abstracted.
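Roughly like this, using connectorx's partition_on / partition_num options (the connection string, table, and column here are made up):

    import connectorx as cx

    # Read the table in 8 concurrent chunks, split on the numeric "id" column.
    df = cx.read_sql(
        "postgresql://user:pass@localhost:5432/mydb",
        "SELECT * FROM events",
        partition_on="id",
        partition_num=8,
    )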
Looks interesting. ClickHouse seems to be conspicuously missing as a source and destination. Although I suppose ClickHouse can masquerade as Postgres: https://clickhouse.com/docs/en/interfaces/postgresql
Edit: there's an issue already: https://github.com/bruin-data/ingestr/issues/1
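For the masquerade-as-Postgres route, it would presumably look something like this (per the example in the ClickHouse docs; untested, and the user/port are just the documented defaults): enable the PostgreSQL wire protocol in the server config, then point any Postgres client (or a postgres:// URI) at that port.

    <!-- server config: expose ClickHouse over the PostgreSQL wire protocol -->
    <postgresql_port>9005</postgresql_port>

    # then connect with any Postgres client
    psql -h 127.0.0.1 -p 9005 -U default -d default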
I am very interested in data ingestion. I develop a desktop data wrangling tool in C++ (Easy Data Transform). So far it can import files in various formats (CSV, Excel, JSON, XML, etc.). But I am interested in being able to import from databases, APIs and other sources. Would I be able to ship your CLI as part of my product on Windows and Mac? Or can someone suggest some other approach to importing from lots of data sources without coding them all individually?
I like the idea of encoding complex connector configs into URIs!
This looks awesome. I had this exact problem just last week and had to write my own tool in Go to perform the migration. After creating the tool I thought this must be something others would use; glad to see someone beat me to it!
I think it's clever to keep the tool simple and only copy one table at a time. My solution was to generate code based on an SQL schema, but it was going to be messy and require more user introspection before the tool could be run.
This looks pretty cool. Is there any schema management included or do schema changes need to be in place on both sides first?
Any thoughts on how this compares to Meltano and their Singer SDK? We use it at $DAYJOB because it gives us a great hybrid: standardized enough that we don't have to treat each source differently downstream, while still letting us customize.
If you can add CSV as a source and destination, it will increase the usefulness of this product manifold.
There are many instances where people either have a CSV that they want to load into a database, or want a specific database table exported to CSV.
Looks really interesting and definitely a use case I face over and over again. The name just breaks my brain: I want it to be an R package, but it's Python. Just gives me a mild headache.
Looks great, Burak! Appreciate your contribution to the open-source data ecosystem!
Is there a reason CSV (as a source) isn't supported? I've been looking for exactly this type of tool, but one that supports CSV.
CSV support would be huge.
Please please please provide CSV support. :)
Hi Burak, I saw cx_Oracle in requirements.txt, but the support matrix does not mention it. Does this mean Oracle support is coming, or is it a typo?
I'd love to see support for ODBC; any plans?
No DB2, as if it weren't a database that exists in the real world.
I was surprised to see SQLite listed as a source but not as a destination. Any big reasons for that or is it just something you haven't got around to implementing yet?
I've been getting a huge amount of useful work done over the past few years sucking data from other systems into SQLite files on my own computer. I even have my own small db-to-sqlite tool for this (built on top of SQLAlchemy): https://github.com/simonw/db-to-sqlite
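Usage looks roughly like this (pulling every table from a Postgres database into a local SQLite file; the connection string is a placeholder):

    db-to-sqlite "postgresql://localhost/myblog" blog.db --all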