Not exactly the same thing, but along the same lines of "git and s3 are both object stores, why not use s3 as git storage?"
ctoth 17 hours ago [-]
Came here for a five-gallon bucket hooked to Dulwich (archiving rain?). Slightly disappointed :)
Go Git and Dulwich and friends are indeed fun tech.
StarlaAtNight 13 hours ago [-]
I’m not sure I understand what pain point this solves. What’s the value of doing this? Is it at a certain scale?
supriyo-biswas 15 hours ago [-]
Great work, though I wish some of this work could be upstreamed to Gitea instead?
xena 15 hours ago [-]
Author of the post here. I'll talk with someone I know at Gitea. I don't think this is viable to upstream into Gitea, but there's only one way to find out!
Eikon 17 hours ago [-]
Most of the pain here is the typical set of issues people run into trying to make S3 a filesystem as-is, common with S3FS-family approaches.
ZeroFS (https://github.com/Barre/zerofs) is 9P/NFS/NBD over S3 on an LSM. Point stock go-git, or just /usr/bin/git, at a mount and skip the gymnastics. Rename is a metadata op in the keyspace, so you get it atomic on any S3, no Tigris-specific X-Tigris-Rename needed.
Different point on the spectrum, but less square-peg, also most probably much, much faster (it works great on linux-sized repos) :)
znnajdla 15 hours ago [-]
I wouldn’t call it gymnastics. The surprising part of the article was that Git itself is an object store that happens to use a filesystem for persistence, but an S3 bucket might actually be more suitable than a .git directory on POSIX.
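To make the "Git itself is an object store" point concrete, here is a minimal sketch of how a blob's ID is derived from its content in a SHA-1 repository, and how that ID maps to a loose-object path under .git. The `bucket_key` layout and the `myrepo` name are hypothetical, for illustration only; they are not taken from the article or from any particular tool.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a blob in a SHA-1 repository.

    Git hashes a small header ("blob <size>\\0") followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid: str) -> str:
    """Path of a loose object relative to the .git directory."""
    return f"objects/{oid[:2]}/{oid[2:]}"


def bucket_key(repo: str, oid: str) -> str:
    """Hypothetical flat key layout for a bucket (an assumption, not a standard)."""
    return f"{repo}/objects/{oid}"


oid = git_blob_id(b"")
print(oid)
print(loose_object_path(oid))
print(bucket_key("myrepo", oid))
```

Because the key is a pure function of the content, an object never changes once it has been written, which is roughly why a flat keyspace of immutable values is a plausible home for it.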
mikepurvis 14 hours ago [-]
It makes sense, similar to how blob storage is a natural fit for a nix cache, where you have a giant flat space of many hash-addressed immutable directories/archives.
I think most of the pain documented in the article is just that git-the-implementation contains a lot of assumptions about it being a (local) filesystem that it is operating on, hence stuff like calling stat a ton of times, or doing the rename trick to get atomic behavior from a not-normally-atomic operation (updating a file in place).
If it were possible to define a "backing storage" API layer within git, it might be possible to move all the filesystem/posix-centric stuff to the other side of it and leave behind an interface that maps quite nicely to blob storage.
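A rough sketch of what such a "backing storage" layer could look like, assuming a tiny put/get/exists interface. Every name here (`ObjectStore`, `DirStore`, `MemoryStore`, `update_ref`) is hypothetical and is not an API from git, go-git, or Dulwich. The filesystem backend keeps the rename trick on its own side of the interface; the in-memory backend stands in for a blob store where one put replaces the whole value.

```python
import os
import tempfile
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical storage interface; callers never see paths or renames."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def exists(self, key: str) -> bool: ...


class DirStore:
    """Filesystem backend: updates go through a temp file plus rename."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, *key.split("/"))

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write the new content next to the target, then swap it into place.
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # Atomic on POSIX when source and target share a filesystem.
            os.replace(tmp, path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def exists(self, key: str) -> bool:
        return os.path.isfile(self._path(key))


class MemoryStore:
    """Stand-in for a blob store: a single put replaces the whole value."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._data[key]

    def exists(self, key: str) -> bool:
        return key in self._data


def update_ref(store: ObjectStore, branch: str, oid: str) -> None:
    """Point a branch at a commit ID without knowing which backend is in use."""
    store.put(f"refs/heads/{branch}", (oid + "\n").encode("ascii"))


store = MemoryStore()
update_ref(store, "main", "0" * 40)
print(store.get("refs/heads/main"))
```

The sketch leaves out everything hard (locking, compare-and-swap on refs, packfiles); it only shows where the POSIX-specific code would end up.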
xena 16 hours ago [-]
Author of the article here. I'm aware of ZeroFS and other similar approaches (such as something internal at Tigris that will become public at a later date); this was more of an experiment to see how far you can get with stuff I already had "on the shelf". I am going to be improving this a fair bit; I just need to plan out what I'm gonna work on and figure out the best times to stream it, etc.
supriyo-biswas 15 hours ago [-]
Would probably be great to have ZeroFS-backed PVs for the K8s folks :)
I did something similar, though as a full reimplementation of a git and git-lfs library in Elixir. Still a work in progress though, as the S3 backend isn't quite complete and there are performance problems doing some git things through S3.
This was really thought provoking — it made me realize that Git just happens to use a filesystem for persistence, but doesn’t necessarily have to. A POSIX filesystem might not even be the best way to store a git repo. Makes me wonder: what else could speak Git + POSIX? Redis? Postgres? IPFS is a fun one — it’s already content addressed.
TonyStr 12 minutes ago [-]
I've seen a few people on Hacker News swear by Fossil ( https://en.wikipedia.org/wiki/Fossil_(software) ), which uses SQLite as the backend instead of the filesystem. There are a few other approaches as well.
xena 15 hours ago [-]
IPFS would be much better for something like LFS support than for git repos themselves. Git repos are very mutable bits of state and IPFS is best for immutable state. I'm aware of mutable IPFS pointers, but I think the best bet currently is to use immutable things for immutable objects and mutable things for mutable objects.
[1] https://git-annex.branchable.com
https://anvil.fangorn.io/fangorn/ex_git_objectstore
The documentation isn't quite correct, but it's getting there
https://repo.autonoma.ca/repo/treetrek/tree/HEAD/git