One of the things I always wonder about with giant relational databases is: how much of the "typical relational stuff" are they actually using?
Do they have constraints on rows? Are they using views or do they just denormalize and duplicate? Do they use joins at all? Are they even doing more than 1 thing in a transaction?
I didn't understand how they avoid downtime during the upgrade - they start with "Stop traffic to the database cluster" and then perform a bunch of steps that each take 5-10 minutes or longer before starting traffic again - so do they just accept the downtime and schedule it in advance, or did I miss a detail?
(2018)
Irony. I think the HN post took down the Slony website.
50TB is not so big these days. I read that in 2008 (!) Yahoo had a 2+ PB PG database. What is the largest you know of, 14 years later?
Thanks for sharing this. Really interesting. But a basic question: why not upgrade to PG 13 instead? I am curious about the reasons for staying on an older version of PostgreSQL.
Upgrading, not updating.
While others are saying a 50TB DB is relatively normal, personally I can't imagine dealing with that sort of data when all of it always has to be online. If I were them, I'd ask myself: could we archive this data into split chunks? Does the archive even need to be online? How big is the subset that is actually current? And so on. Of course they have answers to these questions, and the answers are generally "no", but my imagination just stops here.
Some time ago, when I studied the Bitcoin ledger structure, I was confused about how it could sum up all transactions to get a balance so quickly. In accounting we usually do that with the help of some periodic caching, which is yet another point of failure and maintenance, and Bitcoin is a large enough database that it couldn't do that easily. A few docs later I realized that it doesn't do this:
    mining -> A 1
    A -> B 0.2
    (sum(A) == 0.8)

And instead it does this:

    mining -> A 1
    A -> (B 0.2; A 0.8)
    (last(A) == 0.8)
No sums required; all balances are checked by essentially "look up the latest appearance in the db", where the lookup also involves some Merkle tree optimization that I didn't understand well enough to remember.
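To make the contrast concrete, here is a minimal Python sketch of the two bookkeeping models described above. The account names, transaction ids, and data structures are invented for illustration; this is not Bitcoin's actual on-disk format (real nodes maintain a UTXO set and verify entries with Merkle proofs).

    # Account/ledger model: the balance of A is the sum of every entry
    # that ever touched A, so reads get slower as history grows.
    ledger = [("mining", "A", 1.0), ("A", "B", 0.2)]

    def balance_by_summing(acct):
        credit = sum(amt for _, dst, amt in ledger if dst == acct)
        debit = sum(amt for src, _, amt in ledger if src == acct)
        return credit - debit

    # UTXO-style model: the second transaction spends A's original 1.0
    # output and writes two new outputs, 0.2 to B and 0.8 change back
    # to A. Only the still-unspent outputs matter, so checking A's
    # balance never requires replaying the full history.
    unspent_outputs = {
        ("tx2", 0): ("B", 0.2),
        ("tx2", 1): ("A", 0.8),
    }

    def balance_by_lookup(acct):
        return sum(amt for owner, amt in unspent_outputs.values() if owner == acct)

    print(balance_by_summing("A"), balance_by_lookup("A"))  # 0.8 0.8

In this toy example A happens to have a single unspent output, so balance_by_lookup degenerates to reading one entry, which is the "look up the latest appearance" behaviour described above.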
This article mentions that a key piece of software they use in all this is Slony[0], which replicates one primary/master to many secondaries/slaves, even when those replicas run different PostgreSQL versions. That's pretty cool, but I'd like to instead draw your attention (dear reader) to the phenomenal writing in Slony's description of itself. It describes, in plain but technical language, what Slony is, then immediately jumps into addressing "why should you use this?" and does so very directly. It reiterates the problem it solves, mentions other potential solutions by name to contextualize the current solution space, then explains in a nice bulleted list exactly the situations and niches Slony is tailored for.
Oh my heavens, I wish every piece of technology described itself this way!
[0] - Slony -- https://www.slony.info/