GPT-Migrate converts repos from one lang/framework to another

by transitivebs on 7/2/23, 3:46 AM with 170 comments
by sdwolfz on 7/2/23, 7:05 AM

Who/what is the:

1. Author

2. Copyright holder

3. Copyright license

...of the code generated by this tool?

Unless the answer to all of these is unambiguously "the original", you shouldn't be using such a tool on any code, especially your employer's intellectual property.

Sorry to be so negative about it, but this is something I see skipped over in all discussions related to AI. Just because it's AI does not make it immune to copyright law. You're giving your code to a third-party company under their terms and conditions, and receiving some new code back, again under their terms and conditions. The fact that it uses AI under the hood is irrelevant: you're dealing with a business that produces an output for you, and you should know the terms before submitting anything to them, especially if you don't own that thing.

by kingrolo on 7/2/23, 9:59 AM

It feels to me as though LLMs should (eventually?) really shine at these kinds of tasks, where the intent is already defined in code of some sort and the challenge is lots of detailed legwork that humans find hard, more because it's time-consuming and uninteresting (and so hard to focus on) than because it's technically challenging.

So swapping languages, yeah maybe, but I expect a more practical use would be the situation where you inherit a legacy codebase in an ancient version of a language or framework that hasn't been loved in a long time. I saw this so many times when doing dev-team-for-hire work.

Obviously you'd want to do boatloads of testing, and there may well be manual work left to do afterwards, but I think it would be the kind of manual work that feels like polishing something new, clean and beautiful rather than applying bits of sticky tape to something unmaintainable.

I also wonder about eventually being able to say to an LLM "take this codebase and make it look like my code", or maybe like the code of one of your favourite open source developers. Maybe everyone could end up with their own code style vector attached to their GitHub profile describing their style. You could find devs with styles close to yours to work on your team, or maybe find devs with styles different from yours so you could go and argue about tabs vs spaces or something.

by cfn on 7/2/23, 6:32 AM

This would be really useful if it worked with legacy code. For example, you could migrate all that COBOL code to Java or Python, or all that Fortran scientific code to C++ or Python.

I tried to migrate a twenty-year-old Visual Basic 6.0 project to C# by doing it piecemeal with GPT-4, and it failed completely, both in the UI and in the backend. I am keeping my fingers crossed for a GPT-n+1 that can actually do this.

Incidentally, I found that GPT-4 (as in ChatGPT) is very useful if you need to program in VB6, which is nearly absent from search results these days.

by dns_snek on 7/2/23, 10:14 AM

These examples always look so interesting and promising until you try them out with anything more than a "Hello world" application. It would be very interesting if it worked beyond trivial examples, but I'm not holding my breath.

by vessenes on 7/2/23, 12:22 PM

Some good comments in here. I have done a fair amount of work with GPT-4 as a writer of Go code, and there are two categories of difficulty for a "Tier 2" (Tier 1.5?) language with GPT-4.

The first is API hallucination, which hits as soon as you drop down into non-"major" repository packages. There, even GPT-4 acts like 3.5: it will cheerfully make up or use old API interfaces, pretend it knows newer versions that it does not know, and generally loop you around in very, very convincing-looking code that just does not work.
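One cheap guard against hallucinated APIs is to check, before running anything, that every module attribute the generated code references actually exists in the installed package. The commenter works in Go; this is only a minimal Python sketch of the idea, and it assumes the list of (module, attribute) references has already been extracted from the generated code somehow.

```python
import importlib


def missing_symbols(references):
    """Return the (module, attribute) pairs that do not resolve.

    `references` is a hypothetical list of (module_name, attribute_name)
    pairs, assumed to have been extracted from generated code beforehand.
    """
    missing = []
    for module_name, attr in references:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is not installed (or does not exist).
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            # The module exists but has no such attribute.
            missing.append((module_name, attr))
    return missing


# json.loads is real; json.parse is a plausible-looking invention.
print(missing_symbols([("json", "loads"), ("json", "parse")]))
# [('json', 'parse')]
```

This only catches names that do not exist; it will not catch a real function called with the wrong arguments or an outdated calling convention.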

The second is style-related. In particular, Go is picky about its error-return semantics, and GPT-4 doesn't worry too much about this; I'm remembering a particularly subtle and annoying deadlock where it didn't defer closing a database connection inside a goroutine, or alternatively check for an error and close the handle.

On balance, both of these seem super, super solvable, either by a custom LLM, or a next version with updated training. I think of GPT-4 as a reasonable mid-to-senior engineer in terms of output right now, and I think it's reasonable to start trying to port frameworks.

That said, I think I'd want it to do an excellent job at porting tests over first, and I'd inspect those heavily, and then I'd consider how to deliver a style guide for the target language in the prompts. By default, GPT-4 doesn't know exactly how you want things coded.

One last comment: Claude seems appealing to me here, with its longer context window. That said, I haven't been successful at fully using the context window (e.g. "here's a tarball of a repo, please do x/y/z"). I think word on the street is that the Claude folks use ALiBi; regardless, the 100k attention window from Claude feels more like one that can choose to alight on key areas of the input, not one that can take all 100k tokens into context.

by ic4l on 7/2/23, 11:05 AM

How many people even have access to the gpt-4-32k or gpt-4-32k-0613 models?

I think they give everyone access to gpt-3.5-turbo-16k, but I have not found a way to request access to the 32k model.

There does seem to be an option through Azure's OpenAI Service: https://azure.microsoft.com/en-us/products/cognitive-service...

by discordance on 7/2/23, 8:10 AM

Supported languages are in config.py:

Python, JavaScript, Java, Ruby, PHP, C#, Go, Rust, C++, C++, C++, C, Swift, Objective-C, Kotlin, Scala, Perl, Perl, R, Lua, Groovy, TypeScript, TypeScript, JavaScript, Dart, Elm, Erlang, Elixir, F#, Haskell, Julia, Nim, PHP
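The repeated names suggest the list was flattened from a table that maps file extensions to languages, where several extensions share one language. That layout is an assumption about config.py, and the dictionary below is a hypothetical excerpt rather than the real file; an order-preserving de-duplication recovers the distinct languages.

```python
# Hypothetical excerpt of an extension -> language table; not copied
# from GPT-Migrate's actual config.py.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".cpp": "C++",
    ".cc": "C++",
    ".cxx": "C++",
    ".pl": "Perl",
    ".pm": "Perl",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
}


def distinct_languages(mapping):
    """Return each language once, in the order it first appears."""
    # dict.fromkeys preserves insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(mapping.values()))


print(distinct_languages(EXTENSION_TO_LANGUAGE))
# ['Python', 'C++', 'Perl', 'TypeScript', 'JavaScript']
```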

by YeGoblynQueenne on 7/2/23, 12:37 PM

>> GPT-Migrate is currently in development alpha and is not yet ready for production use. For instance, on the relatively simple benchmarks, it gets through "easy" languages like python or javascript without a hitch ~50% of the time, and cannot get through more complex languages like C++ or Rust without some human assistance.

In other words, it doesn't really work.

The current wave of LLM applications still seems to me like someone just invented homeopathy and a whole bunch of people are convinced it's real and are trying to use it to create a cure for cancer. It's just people waving their hands about and intoning magick formulae that don't work and don't produce anything useful at all.

I am curious to see where all this is going to end up. Is someone going to figure out a way to make LLMs work for real-world, er, work? Are we all patiently waiting for the next big LLM version to see if it can do the things that the current best-of-the-best can't?

by complex_exp on 7/2/23, 8:56 AM

Much less impressive, though still useful: ChatGPT is an awesome movie subtitle translator. Only very unusual phrases need to be corrected, and often there are none. There are projects on GitHub that automate the translation, and short SRT files can just be pasted into the chat with appropriate instructions.
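For files too long to paste in one go, one approach is to split the SRT into its numbered blocks and send them in batches, keeping the index and timing lines untouched so that only the dialogue goes to the model. A minimal sketch: the batch size is an arbitrary assumption, and the actual call to the model is left out.

```python
import re


def parse_srt(text):
    """Split SRT text into (index, timing, dialogue) tuples."""
    text = text.replace("\r\n", "\n").strip()
    blocks = []
    # Blocks are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", text):
        lines = chunk.split("\n")
        if len(lines) < 3:
            continue  # skip malformed blocks with no dialogue
        blocks.append((lines[0], lines[1], "\n".join(lines[2:])))
    return blocks


def batch_blocks(blocks, max_chars=2000):
    """Group blocks so each batch's dialogue stays within max_chars.

    A single block longer than max_chars still gets its own batch.
    """
    batches, current, size = [], [], 0
    for block in blocks:
        length = len(block[2])
        if current and size + length > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(block)
        size += length
    if current:
        batches.append(current)
    return batches


def to_srt(blocks):
    """Reassemble (index, timing, dialogue) tuples into SRT text."""
    return "\n\n".join("\n".join(block) for block in blocks) + "\n"


sample = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
See you tomorrow.
"""

blocks = parse_srt(sample)
print(len(batch_blocks(blocks, max_chars=15)))  # 2
print(to_srt(blocks) == sample)  # True
```

Each translated batch would replace only the third element of each tuple before `to_srt` writes the file back out, so the timings cannot drift.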

by isoprophlex on 7/2/23, 5:57 AM

Very interesting concept!

I'm left wondering if you could also use this to document or clean up machine-generated code. E.g., some process generates a huge wad of bytecode, or autogenerated Java; a GPT tool cleans it up so that a regularly skilled human can actually do something with it.

by aardvarkr on 7/2/23, 6:24 AM

I have a community website that I built a while ago in Angular that I wish I had built in React instead. Maybe then I would at least get some help maintaining the open source repo on GitHub. I’ll give this a try and report back.

by shipscode on 7/2/23, 1:36 PM

I really doubt this works reliably, given my own attempts at doing this for extremely small use cases.

by mariuz on 7/2/23, 9:34 AM

We need migration tools between frameworks:

Rails -> Django

FastAPI -> Express

by simion314 on 7/2/23, 6:45 AM

Unfortunately I can't trust GPT not to hallucinate something: I tested it on reviewing my code and it hallucinated issues.

It would be great if you could give it old, ugly code and get something better back.

Maybe the IntelliJ folks can use these tools to increase their productivity, and we can get 100% correct tools that work on the AST, not on tokens, and that can do advanced transformations and reviews you can trust without having to double-check them.
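For contrast, this is the kind of deterministic, AST-level rewrite the commenter has in mind: it transforms the syntax tree rather than predicting tokens, so a rule either applies exactly or not at all. A minimal sketch using Python's standard `ast` module (`ast.unparse` needs Python 3.9+); the rule itself, rewriting `x == None` to `x is None`, is just an illustrative choice.

```python
import ast


class NoneComparisonFixer(ast.NodeTransformer):
    """Rewrite `== None` / `!= None` into `is None` / `is not None`."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # fix any nested comparisons first
        new_ops = []
        for op, comparator in zip(node.ops, node.comparators):
            is_none = isinstance(comparator, ast.Constant) and comparator.value is None
            if is_none and isinstance(op, ast.Eq):
                new_ops.append(ast.Is())
            elif is_none and isinstance(op, ast.NotEq):
                new_ops.append(ast.IsNot())
            else:
                new_ops.append(op)  # leave every other comparison alone
        node.ops = new_ops
        return node


def fix_none_comparisons(source):
    """Parse, transform, and regenerate Python source."""
    tree = NoneComparisonFixer().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fix_none_comparisons("if x == None and y != None:\n    pass"))
```

This sketch only handles `None` on the right-hand side and, because `ast.unparse` regenerates code from the tree, it drops comments and original formatting; production refactoring tools preserve both.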

by swader999 on 7/2/23, 3:58 PM

Someday we'll all be writing code that injects the programming language as a dependency. This migrate magic seems bass-ackward.

by cpursley on 7/2/23, 6:36 AM

Interesting and potentially great use case. How does it figure out the minutiae of selecting adequate dependencies?

by stuaxo on 7/2/23, 8:43 AM

I've used ChatGPT to convert some demo code (which I got it to write) between languages and frameworks; it works to an "ok" level.

Edit: this was demo code I asked ChatGPT to come up with in the first place, so the output had no licensing problems that the input didn't already have.

by revskill on 7/2/23, 7:14 AM

I'd like to see you use gpt-migrate to migrate gpt-migrate itself to JS.

Then from JS back to Python.

Run the tests and compare.

Once that's done, good job!

by dumdumchan on 7/2/23, 2:15 PM

Is there an upper limit on the size of codebase it can handle (LLM context size)?
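Any tool built this way is bounded by the model's context window, so large files have to be split before they are sent. A rough sketch of line-based chunking under a token budget, assuming the common rule of thumb of about four characters per token (a real tokenizer would be more accurate); the budget value is a placeholder, and a real tool would more likely split on function or class boundaries than on raw lines.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers vary by language and content."""
    return max(1, len(text) // chars_per_token)


def chunk_lines(source, budget_tokens):
    """Split source into chunks of whole lines, each within budget_tokens.

    A single line whose estimate exceeds the budget still becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for line in source.splitlines(keepends=True):
        cost = estimate_tokens(line)
        if current and used + cost > budget_tokens:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


source = "".join(f"line {i}\n" for i in range(100))
chunks = chunk_lines(source, budget_tokens=20)
print(len(chunks), "".join(chunks) == source)
```

Joining the chunks reproduces the input exactly, so nothing is lost; the hard part a real migration tool faces is that code in one chunk may depend on definitions in another.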

by 00117 on 7/2/23, 12:19 PM

It would be nice if it could estimate GPT usage costs with a dry-run.
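A dry run could walk the repository, estimate tokens per file, and multiply by a per-token price without calling the API at all. A sketch under loud assumptions: the four-characters-per-token heuristic is rough, the prices are placeholders to be replaced with the provider's current rates, and `output_ratio` is a guess that the migrated code is about as long as the original.

```python
from pathlib import Path


def collect_sources(root, extensions):
    """Read every file under root whose suffix is in extensions."""
    texts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            texts.append(path.read_text(encoding="utf-8", errors="ignore"))
    return texts


def estimate_cost(texts, usd_per_1k_input, usd_per_1k_output,
                  output_ratio=1.0, chars_per_token=4):
    """Return (input_tokens, output_tokens, usd) without calling any API.

    output_ratio guesses how many output tokens each input token produces;
    1.0 assumes the migrated code is about as long as the original.
    """
    input_tokens = sum(len(t) for t in texts) // chars_per_token
    output_tokens = int(input_tokens * output_ratio)
    usd = (input_tokens * usd_per_1k_input + output_tokens * usd_per_1k_output) / 1000
    return input_tokens, output_tokens, usd


# Placeholder prices, not real rates: substitute the provider's current pricing.
texts = ["x" * 4000, "y" * 2000]
print(estimate_cost(texts, usd_per_1k_input=1.0, usd_per_1k_output=2.0))
# (1500, 1500, 4.5)
```

A real migration likely sends more than the raw source (instructions, retries, test-fix loops), so a figure like this is better read as a lower bound; against a repository you would pass `collect_sources("path/to/repo", {".py"})` instead of the toy strings.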

by hkt on 7/2/23, 7:33 AM

Ah, finally a way to turn the output of Node.js programmers into something less hateful! All it took was the early days of the singularity ;)

by villgax on 7/2/23, 6:51 AM

I just want a small model that will ingest a language spec and be capable of understanding external repo code.

by aitchnyu on 7/2/23, 4:58 PM

Has anybody checked whether semantics are translated correctly? For example, [1,2]==[1,2] gives different results in Python and JS.
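The example is a good one: Python's `==` on lists compares element by element, while JavaScript's `==` and `===` on two arrays compare object identity, so `[1,2] == [1,2]` is true in Python and false in JS. A faithful translation has to map the operator by meaning, not by spelling. A small Python illustration, where `js_style_equals` is a hypothetical helper that mimics JavaScript's reference comparison when both operands are arrays:

```python
def python_equals(a, b):
    """Python semantics: lists compare by value, element by element."""
    return a == b


def js_style_equals(a, b):
    """Mimic JavaScript `==`/`===` between two arrays: true only for the same object."""
    return a is b


left = [1, 2]
right = [1, 2]

print(python_equals(left, right))    # True
print(js_style_equals(left, right))  # False
print(js_style_equals(left, left))   # True
```

So a JS `a == b` between arrays should become `a is b` in Python, while a Python `a == b` on lists needs a deep-equality helper in JS (or, for simple arrays of primitives, something like `JSON.stringify(a) === JSON.stringify(b)`).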

by rurban on 7/2/23, 7:25 AM

Whoa, I always wanted to move my old Perl web app to TypeScript. Will report the results.

by acover on 7/2/23, 6:16 AM

How well does this work?

Could it generate tests to confirm behavior in the target and source?
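One way to confirm behavior across source and target is differential testing: run both versions on the same inputs and compare their outputs. A minimal sketch, assuming each version can be launched as a command that reads stdin and writes stdout; the two commands here are stand-ins (Python one-liners run via the current interpreter), not something GPT-Migrate provides.

```python
import subprocess
import sys


def run(cmd, stdin_text):
    """Run a command, feed it stdin, and return (exit code, stdout)."""
    result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                            text=True, timeout=30)
    return result.returncode, result.stdout


def differential_test(source_cmd, target_cmd, inputs):
    """Return the inputs on which the two programs disagree."""
    mismatches = []
    for case in inputs:
        if run(source_cmd, case) != run(target_cmd, case):
            mismatches.append(case)
    return mismatches


# Stand-ins for the original and migrated programs.
original = [sys.executable, "-c", "print(sum(map(int, input().split())))"]
migrated = [sys.executable, "-c", "print(sum(int(x) for x in input().split()))"]

print(differential_test(original, migrated, ["1 2 3", "10 -4", "0"]))  # []
```

The inputs could come from the existing test suite or from a fuzzer; comparing exit codes as well as output catches crashes that print nothing.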

by DaiPlusPlus on 7/2/23, 5:47 AM

I'm tempted to point this at the Linux kernel repo and tell it to convert it "to work on Windows" and see what happens...

by forgingahead on 7/2/23, 4:28 PM

Need all Python ML repos to be migrated to Ruby equivalents ASAP.

by bdhcuidbebe on 7/2/23, 1:02 PM

sure it does. good riddance kids these days

by gailees on 7/2/23, 7:40 AM

I can’t believe the author doesn’t allow conversion to JavaScript.

It’s the most popular language, for heaven’s sake. Developers really need to stop bringing their religion into open source.