GPT-4 architecture: what we can deduce from research literature

by kir-gadjelloon 3/14/23, 11:42 PM with 6 comments

As the discussion of GPT-4 heats up, the absence of details on its technical implementation only becomes more glaring. As an engineer, I learned nothing applicable from the newest OpenAI publication that I didn't already know yesterday!

I have been investigating LLM training and inference for quite some time, and have developed a number of hypotheses about future SoTA models which I believe very likely apply to GPT-4.

by amrbon 3/15/23, 12:26 AM

I'd like to know how it can support a 32k context when all the other models I've seen are 2-4k. Does that mean it has a bigger attention layer, or is it just 4x as many billions of parameters?
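
For a sense of scale: in a standard transformer the attention projection weights depend only on the hidden size, not on the context length, so a longer context does not by itself mean more parameters. What grows with context is the attention score matrix (quadratically) and the KV cache (linearly). Below is a minimal back-of-the-envelope sketch in Python, assuming GPT-3-like dimensions (d_model = 12288, 96 heads, 96 layers) purely for illustration, since the real GPT-4 numbers are not public:

    d_model = 12288    # hidden size, GPT-3-like guess (assumption, not a confirmed GPT-4 value)
    n_heads = 96       # attention heads per layer (assumption)
    n_layers = 96      # transformer layers (assumption)
    bytes_per_val = 2  # fp16 activations

    def attention_params_per_layer(d_model):
        # Q, K, V and output projections: 4 * d_model^2 weights.
        # Note this does not depend on the context length at all.
        return 4 * d_model * d_model

    def score_matrix_bytes_per_layer(ctx, n_heads, bytes_per_val):
        # Naively materialized attention scores: one ctx x ctx matrix per head.
        # (Real implementations avoid storing this in full, but the quadratic
        # growth in compute still applies.)
        return n_heads * ctx * ctx * bytes_per_val

    for ctx in (2048, 32768):
        params = attention_params_per_layer(d_model) * n_layers
        scores = score_matrix_bytes_per_layer(ctx, n_heads, bytes_per_val) * n_layers
        print(f"ctx={ctx:>6}: attention params ~{params/1e9:.1f}B (unchanged), "
              f"naive score matrices ~{scores/2**30:.0f} GiB per sequence")

So going from 2k to 32k does not require 16x the parameters; it mostly raises the activation-memory and compute cost of attention (and whatever positional scheme handles the longer sequence).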

by seydoron 3/14/23, 11:48 PM

Well if the model is so smart, could it be that it is actually aware of its layers and parameters?