
Jiayi Weng on OpenAI & AI Infrastructure

Source: WhynotTV Podcast

About this Transcript This document provides an unofficial English transcript of the podcast interview with Jiayi Weng, focusing specifically on his experiences at OpenAI. The original interview was conducted in Chinese on the WhynotTV podcast.

As official transcripts are currently unavailable on platforms like Apple Podcasts or YouTube, I used OpenAI Whisper to generate this English text, with some manual corrections. The goal is to bridge the language barrier and make these technical insights on AI infrastructure accessible to the broader engineering community.

Disclaimer


Chapter 1: The Interview & The Decision to Join

Interviewer: Okay, let’s talk about OpenAI right away. But before OpenAI, I’m actually very curious. Did you consider reading for a PhD when you were looking for a job?

Jiayi: No.

Interviewer: Why?

Jiayi: Because once you have been in the industry for a while, you realize that if your goal is industry, doing a PhD is a waste of your life. You can use a Master’s as a springboard and still produce work at PhD level in the industry. For example, you can publish papers during your Master’s or through coursework, and do projects that differentiate you from others. Then, when you are competing with PhD candidates, think about what would make the other side choose you, a Master’s student, over another PhD.

I think it’s very important to think clearly about differentiation.

Interviewer: So you knew very early that your future had to be in industry.

Jiayi: Yes, because I think teaching, being a professor in academia, is too boring. It’s not what I want. You have to go chase funding for projects, and there are a lot of restrictions.

Chapter 2: Academia vs. Industry Engineering

Interviewer: Say we’re a company recruiting, and we have one Master’s candidate and one PhD candidate. Do you think the two kinds of training are actually not that different? A PhD mainly trains your academic ability, right? You have to write papers, make the story complete, and polish it beautifully. Do you think that matters to a company?

Jiayi: To some extent, it is useful training.

Interviewer: But compared with extreme engineering ability, how would you weigh those two abilities in the current era of AI? Which is more valuable?

Jiayi: In the current era, of course, engineering ability is better. But if you put it at the time, it’s really hard to say. So my approach at the time was to satisfy both of them as much as possible.

Interviewer: You did publish papers anyway, and your open-source infra work was very popular, so your engineering ability is clearly strong enough. Why do you think it’s now so obvious that engineering ability comes first?

Jiayi: Let me quote a line from my colleague. He was also a PhD in RL (Reinforcement Learning), and he built a very famous RL framework. He said: “It’s harder to teach a researcher how to do engineering than to teach an engineer how to do research.”

That’s because research labs are really competing on the correctness of their infra. If your infra is correct, then what matters is how many experiments you can iterate through per unit of time. You have an idea anyway — you can always grab someone to discuss it with. The idea comes out, then you verify it. As long as you can verify it well, that is your research work. You don’t actually use your brain that much.

The people whose thinking really matters are the ones who have been in this field for a long time — for example, Alec, who has been in it since the beginning of GPT-1. He has a very strong research sense, so his thinking is more useful than an ordinary PhD’s. You can just go discuss with him.

Ideas are very cheap. What you have to do is maximize how many effective ideas you can verify per unit of time. And with correct infra and correct results, you can iterate quickly.

Interviewer: And now the PhDs don’t have this ability or…

Jiayi: I can’t really judge that, because it isn’t important to them. I think the current academic training system is about finding a good academic direction. But companies already have people with that sense of direction, because if you work in this field long enough, you develop research sense: you realize what’s good, what’s bad, what should be done and what shouldn’t.

So ideas are cheap, and what matters is strong engineering skill to build up the infra quickly. Once you have that, you can verify ideas. Maybe an agent can even do the verification.

Interviewer: Yes. In my understanding, every infra has bugs to some degree, and whoever fixes more bugs trains a better model. So Llama can’t catch up with GPT because Llama’s infra has too many bugs?

Jiayi: Maybe. I don’t know. But I might guess so.

Interviewer: So you realized very clearly that the whole pipeline has to work. The key is not algorithmic innovation but correct infra — a good system that lets you iterate quickly.

Jiayi: Yes. And I’m not that interested in doing the research itself. It doesn’t attract me at all. I prefer to sell shovels.

Chapter 3: Selling Shovels at OpenAI

Interviewer: You like building the playground. You lay the foundation and let others play on it, let others publish their papers. You don’t mind.

Jiayi: Yes. And when others publish their papers, they usually include me. You’ll find that many of the model releases at OpenAI have my name on them, because I built the whole post-training RL infra inside OpenAI.

Interviewer: So you are the core contributor to the whole post-training RL infra.

Jiayi: Yes. Because everyone has used this post-training RL infra to train the RLHF models. So every big release, every big model, my name has to go on it.

Interviewer: So you can be considered that every model behind OpenAI has you. Because you like to sell shovels.

Jiayi: Yes. And I’m the shovel-seller closest to the customers, because RL sits at the top of the whole infra stack, where the visibility is highest. If you’re too low in the stack — writing a data loader, or storage — the credit isn’t as good. But RL infra is something everyone wants.

Interviewer: So smart. I also thought about how my career should develop.

Jiayi: And then I set myself another target: to maximize the number of times my name appears on the OpenAI blog.

Interviewer: You’re really good at designing rewards for yourself. And what did you have to do to earn this reward?

Jiayi: Then you must build infra. If you do a single piece of research, it can’t scale. But if you build infra, everyone uses it, so it scales. And I’m good at writing RL infra, so it was a very suitable opportunity.

Interviewer: So that has basically been your main line since Tsinghua: RL infra. You’ve already made this choice, but many viewers of this podcast may still be at that stage, hesitating between industry and academia. How would you help them think it through, especially in 2025, under current conditions? From a long-term perspective, I still think academia will have its renaissance. But young people with ambition who want to make an impact like you — should they do a PhD, or enter industry as soon as possible?

Jiayi: I think it’s better to enter industry as soon as possible. If you do a PhD, you don’t know what the world will look like when you graduate. By then the wave may have already passed, and you may find that what you did isn’t of much use.

If your target function is to enter this AI lab, then you have to figure out what kind of people the AI lab needs first. If they need more people in Infra, then you can do more work in Infra. Even if you don’t have a PhD degree, it doesn’t matter. Because more importantly, it depends on whether your experience matches or not.

Interviewer: So what kind of people do you think the AI lab needs most now?

Jiayi: I think it’s still infra. Infra is the invisible bottleneck. As for people with research sense — after ChatGPT, anyone who has worked in this industry for more than three years has developed it. So the real problem is still infra: how many experiments can you iterate through per unit of time? That directly determines your research productivity.

Interviewer: It sounds like it’s not a particularly friendly environment for PhDs now. Maybe this also reflects a gap that both of us understand deeply: academic RL research is focused on Atari and MuJoCo tasks, overfitting those benchmarks to squeeze out a few more points. But the industry doesn’t care at all. What the industry does is use RL to solve real problems.

Jiayi: Yes. And when I realized this, around August 2022, I gradually stopped developing Tianshou, because Tianshou still targets those toy benchmarks. I decided I should invest my time in more meaningful things — for example, internal development of RL infra. So my main work since has been the maintenance and development of RL infra.

Interviewer: Your perception in 2022 was actually very ahead of its time — absolutely not consensus. Why didn’t you write a blog post to tell everyone, to persuade people like me to stop?

Jiayi: I’m afraid that if I say this, OpenAI will say I’m a secret agent.

Chapter 4: The Birth of ChatGPT & Post-Training

Interviewer: Okay, let’s talk about OpenAI. I’m very curious. You are one of very few people in the world who was there from GPT-3.5 through GPT-4, GPT-4V, GPT-4o, and GPT-4.5 to GPT-5 — a core contributor behind all of them. Some people contributed to the first half, some to the second half, but you were there from start to finish. Your main contribution, I’d say, comes down to three phrases: Reinforcement Learning, Post-training, and Infra.

Jiayi: Yes.

Interviewer: We’ll talk about these techniques and the story behind them later. But I want to ask the first question first. What is reinforcement learning?

Jiayi: If there is feedback — if you can model an environment and get feedback from it — then that cycle is reinforcement learning. Through that feedback, it gets better and better.
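The feedback loop described here can be made concrete with a minimal sketch. Everything below — the number-guessing environment, the score table, the update rule — is invented purely for illustration; it is not OpenAI code, just the act → feedback → update cycle in its simplest form:

```python
import random

# Toy environment: guess a hidden number; feedback (reward) is better the closer you get.
TARGET = 7

def step(action: int) -> float:
    """The environment returns feedback (a reward) for an action."""
    return -abs(action - TARGET)

# A trivial "policy": a running score per action, improved by the feedback it receives.
scores = {a: 0.0 for a in range(10)}

for episode in range(500):
    if random.random() < 0.2:                # explore occasionally
        action = random.randrange(10)
    else:                                    # otherwise exploit the best-known action
        action = max(scores, key=scores.get)
    reward = step(action)
    scores[action] += 0.1 * (reward - scores[action])  # move the score toward the observed reward

best = max(scores, key=scores.get)
print(best)  # the loop settles on the action with the best feedback
```

The loop itself is the whole point: act, receive feedback, update, repeat — which is exactly the cycle Jiayi is describing.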

Interviewer: Then the second key word is post-training. What is post-training of a large language model?

Jiayi: There was no “post-training” at that time. The team was just called RL. There was no distinction yet between pre-training and post-training.

Interviewer: So that distinction didn’t exist when you joined. But when you first entered OpenAI, was ChatGPT already the main line?

Jiayi: No. When I first entered OpenAI, there was only the RL team under John Schulman. My first project was a successor to WebGPT. WebGPT used a 3.5 model to do browsing, but with only 3.5 doing browsing at that time, you can imagine how poor the results were, because browsing requires tool calls.

So we stepped back and decided to nail the user interaction experience first. The only thing to solve was chat, and chat could be solved through instruction following, using RLHF.

Interviewer: So when you went in at that time, the 3.5 model was already available internally?

Jiayi: Yes. But the PPO pipeline at the time was very hard to use. The biggest model we ran through it was the 3.5 SFT model, and it crashed several times. After that came GPT-4, and for GPT-4 we wrote a dedicated infra to adapt the GPT-4 training stack to support reinforcement learning training.

Interviewer: So when you joined, it was in July 2022?

Jiayi: Yes.

Interviewer: At that time, was the whole of OpenAI already all-in on 3.5? There were still a few months before the release, right? Could you imagine then that ChatGPT would succeed at such a scale?

Jiayi: No.

Interviewer: But of course you could use the model internally, right? Did you notice that this was going to be a game-changing thing?

Jiayi: No. Because I could see a lot of its shortcomings. The first time I used it, I thought it was an okay model. Then I used it a few more times and found it could help me solve small problems in my code — that’s it. It couldn’t help me with much more; the range of problems it could solve for me was limited.

Also, you know about the thing in advance; you watch it develop step by step, so it doesn’t feel sudden to you. But when I showed it to people around me, they found it astonishing. I didn’t expect that.

Interviewer: At that time, had OpenAI already gone all-in on ChatGPT?

Jiayi: No. Only our team was doing it. You can look at the ChatGPT blog post — scroll down to the contributor list. Joseph should be ranked first, and the next one is Barret. Barret led our team.

Interviewer: When did you realize that your work at OpenAI had really triggered something?

Jiayi: Maybe around the time we shipped ChatGPT. I was at NeurIPS — it was quite a while ago. We shipped it on November 30, and in early December, after a few days, I found people around me discussing ChatGPT. They thought it was very useful, and they crushed OpenAI’s servers several times.

Interviewer: Just like dropping classes at school — the course add/drop server also got crushed several times. So there was this kind of… self-propagation.

Jiayi: Yes, the effect of self-propagation. Everyone spreads the product for you.

Interviewer: It sounds like your joining OpenAI to do this, and even OpenAI’s launch of ChatGPT itself — none of it was really planned by anyone. It was a chain of half-accidental, half-inevitable causes and effects.

Jiayi: Yes. At the time, shipping ChatGPT was just to see whether we could collect some real user data. We expected that maybe five days later, if nobody was using it, we’d shut it down. At first there were ten or twenty thousand users — and then it just never stopped growing.

Interviewer: What about the curve?

Jiayi: It was exponential.

Interviewer: That was ChatGPT’s sudden explosion. I’m also particularly curious about your first impressions when you joined OpenAI. What impression did the company give you?

Jiayi: It felt like a big laboratory — not as corporate as I had imagined. There were a lot of people with a strong research sense who could point out a direction and then go execute. But when Barret, Luke, and Liam arrived — three people who came from Google and joined the RL team — our team changed and began to adopt Google’s advanced productivity. Then the iteration really started.

Interviewer: Google is very smart about this. There’s a chart of iterations per unit time against success rate, and the relationship is real: the more you iterate per unit time, the higher your success rate goes.

Jiayi: Yes. That’s also the curve of RL itself, because RL is essentially trial and error: you keep trying, and after a certain number of attempts you reach your goal. A lot of things in life are actually RL.

Interviewer: So the advanced productivity introduced at the time was really a philosophy: don’t obsess over genius ideas or genius algorithms. Build the infra well, so the infra takes us from 30 iterations a week to 300 iterations a week.

Jiayi: Yes. Almost.

Chapter 5: Talent Density and Organizational Structure

Interviewer: I saw an interview. Someone asked Sam Altman, what is the reason for OpenAI’s success? He said that the reason why OpenAI can make a breakthrough in technology innovation is because in a small team with a very high talent density, any mediocre performance cannot be tolerated. Do you agree with this statement?

Jiayi: I agree. If the talent density is high, people create unexpected things on their own. But in a different environment — if everyone is mediocre — then everyone just sweeps the snow from their own doorstep, does their own thing, and everything gets delayed.

Interviewer: When you joined OpenAI, it was a company of a few hundred people, right?

Jiayi: I was number 280.

Interviewer: You were number 280. Now OpenAI has more than 3,000 people — ten times, in three years. Do you think OpenAI can still maintain that small, elite, hard-core, innovative style?

Jiayi: I think the probability is declining, but it’s not that bad. You can always carve out a small team to do some research.

Interviewer: Have OpenAI’s leaders made any interesting efforts to counteract that — for example, simplifying the organizational structure, or canceling all the unreasonable meetings?

Jiayi: I think the organizational structure is more important.

Interviewer: What does an effective organizational structure that enables hard-core innovation look like?

Jiayi: The flow of information.

Interviewer: How do you understand it?

Jiayi: For example, a decision made today can be transmitted to the bottom without loss, and progress made below can be transmitted to the top without loss. Otherwise, the top and the bottom end up pushing in two completely different directions, and nothing comes of it.

Interviewer: How does OpenAI make this good information flow?

Jiayi: First of all, people like Sam and Greg… Sam used to have a dedicated research assistant to help him track the latest research progress inside the company. Greg goes without saying — he has participated in most of the infrastructure himself. So they are very familiar with the technology.

Then what you have to do is maintain sensitivity to the technology. You have to know the current state of what’s being done, where it’s improving, where it’s being used. This has to be done by the number one and number two people themselves — they have to be willing to go in and study the details, to understand every little thing about the company. I think managing a company is very similar to managing a codebase: the key is consistency. If you’re inconsistent, it’s like a person whose upper body moves but whose feet don’t. That’s weird.

Chapter 6: Technical Deep Dive: RLHF & Infra Challenges

Interviewer: Okay, let’s talk about post-training. Let’s start from 3.5.

Jiayi: Actually, I didn’t touch the PPO for 3.5 much; the first one I worked with was 4. 3.5 was on the old codebase, and the new codebase was just ready when I joined, around August 2022. I hooked up the first version of PPO on the new codebase and ran it with 4 — that should have been September 2022.

Interviewer: That’s very interesting. So when 3.5 came out, 4 already existed, and RLHF was first set up on 4 and then established on 3.5.

Jiayi: Yes. Other teams had already stepped on a lot of landmines for us, so we inherited some existing PPO code and the knowledge of how to use it. But the main things — how to train the reward model, training it ourselves, collecting the data ourselves, debugging our own infrastructure problems — those we had to figure out.
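On the reward-model training he mentions: in public descriptions of RLHF (e.g. the InstructGPT paper), the reward model is trained on pairwise human comparisons with a Bradley-Terry style loss. Here is a generic sketch of that loss with made-up scalar scores — an illustration of the published technique, not anything from OpenAI’s stack:

```python
import math

def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: push the chosen completion's scalar
    reward above the rejected one's.
    loss = -log(sigmoid(r_chosen - r_rejected))"""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical reward-model scores for two completions of the same prompt.
print(round(pairwise_loss(2.0, 0.5), 4))  # small loss: model already ranks the pair correctly
print(round(pairwise_loss(0.5, 2.0), 4))  # large loss: the ranking is inverted
```

In the real setting `r_chosen` and `r_rejected` come from the reward model’s forward pass over two sampled completions, and the loss is backpropagated through the model; the scalar version above just shows the shape of the objective.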

Interviewer: At that time, what were the key challenges and breakthroughs for trying to get RLHF to work?

Jiayi: I think it was how to measure performance, because no one knew what good performance should look like.

Interviewer: You mean that after you’ve trained a lot of checkpoints, you don’t know if it’s getting better?

Jiayi: Yes. For example, reward hacking happens. The measured reward saturates: it gradually rises and then flattens into a straight line. But the real quality rises first and then slowly drops. That’s reward hacking.
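The two curves he contrasts can be mimicked with synthetic functions (the functional forms below are invented solely to reproduce the shapes he describes): the proxy reward the optimizer sees climbs and saturates, while the “true” quality peaks and then decays, so the two disagree about which checkpoint is best.

```python
import math

# Synthetic training curves; shapes invented purely for illustration.
def proxy_reward(step: int) -> float:
    """What the reward model reports: rises, then saturates into a flat line."""
    return 1.0 - math.exp(-step / 300.0)

def true_quality(step: int) -> float:
    """What humans would actually rate: rises, then slowly degrades."""
    return (1.0 - math.exp(-step / 300.0)) - 0.0005 * max(0, step - 600)

# Scan checkpoints: the proxy keeps climbing, but true quality peaks earlier.
best_proxy = max(range(0, 2001, 100), key=proxy_reward)
best_true = max(range(0, 2001, 100), key=true_quality)
print(best_proxy, best_true)  # the proxy picks the last checkpoint; humans prefer an earlier one
```

The gap between `best_proxy` and `best_true` is exactly why checkpoint selection became a craft: the metric being optimized stops telling you which checkpoint is actually better.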

So you can’t know which checkpoint is really better than the others. Choosing checkpoints is actually a craft in itself, though we didn’t spend too much time on it. We just built a bunch of sampling-based evals and looked at each benchmark. But most benchmarks only give you a number: if it clears a threshold, it’s okay, but you can’t really say good or bad, because every run of the model varies a lot. There’s a lot of noise.

Interviewer: How do you solve it in the end?

Jiayi: In the end, you really just pull the model down and look at it yourself — interact with it a few times, see how the experience feels, then get more people to look and have everyone vote.

Interviewer: So to eval RLHF, you were still using human feedback to eval.

Jiayi: Yes, yes, it could only be that way. There was no alternative — at that time there was no technology for it.

Interviewer: So this is your first time to build RL infra at an industrial level. What do you think is the difference between the large-scale RL infra that large models need and the single-task or toy-task RL infra that you have built before?

Jiayi: The difference is very big. For a toy task, the bottleneck is the environment, because the model is very simple — both training and taking an action are cheap. But in large-model RL infra, the model is huge and the environment is simple: the environment is just a prompt. If you want to sample from the model, you have to think about how to serve it; if you want to train, you have to think about how to train it. The environment takes maybe a few seconds to produce a prompt, but running the inference and the training can take hundreds or thousands of seconds if you have few GPUs.
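A back-of-the-envelope comparison makes the inversion he describes concrete. The numbers below are invented, order-of-magnitude placeholders (not measurements), just to show how the bottleneck moves from the environment to the model:

```python
# Invented, order-of-magnitude timings per RL iteration, for illustration only.
toy_rl = {
    "env_step": 5.0,         # seconds: a physics simulator dominates
    "model_forward": 0.001,  # a tiny policy network is essentially free
    "model_update": 0.01,
}
llm_rl = {
    "env_step": 2.0,         # seconds: serving the prompt is cheap
    "model_forward": 300.0,  # sampling long completions from a huge model
    "model_update": 900.0,   # a distributed training step
}

for name, costs in [("toy RL", toy_rl), ("LLM RL", llm_rl)]:
    total = sum(costs.values())
    bottleneck = max(costs, key=costs.get)
    print(f"{name}: bottleneck = {bottleneck} ({costs[bottleneck] / total:.0%} of time)")
```

In toy RL the environment dominates, so infra effort goes into parallelizing simulators; in LLM RL the model forward pass and update dominate, so the infra problem becomes serving and distributed training — which is exactly the shift he is pointing at.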

Interviewer: Compared with 3.5 and 4, what new challenges do future models bring for RL infra?

Jiayi: I think it’s still about performance or how to scale up. That is, how to use more GPUs. How to store more data.

Interviewer: Right — and that’s not just about RL, but also about model inference.

Jiayi: Yes, it may be more end-to-end. And it will go into some implementation details. And then do some end-to-end optimization.

Interviewer: What you did is really an intersection: you have to understand RL, you have to understand ML systems, and you have to understand how language models do inference. All of it.

Jiayi: Yes. And honestly, students can’t learn this kind of thing at school — only in that position. But that position is exhausting. It’s very tiring. I did it for a while, and it was really draining — so much overtime that at one point my head hurt so badly I couldn’t stand it, and I went to the ER. The doctor said it was nothing.

Interviewer: How hard was your work at that time?

Jiayi: I’d wake up in the morning and debug all day, or deal with whatever had broken, until I went to sleep at night.

Interviewer: How many days a week?

Jiayi: On average six days a week, maybe. But after a while you find that’s not sustainable. So first of all, you need your body — you have to keep yourself healthy. I now have a habit of running 3,000 meters every week. Back at Tsinghua, I only ran 3,000 meters in PE class and would never have run it otherwise. But now I realize this is very important.

Chapter 7: The Future of Infrastructure

Interviewer: I’m so envious of what you did at OpenAI these past few years, because you got to explore things that 99.99% of the world’s researchers and infrastructure engineers have no access to. You had a privileged vantage point: you could do this optimization on the most advanced models, exploring unknown territory every day, knowing that whatever you found, you were the first in human history to find it.

Jiayi: But I don’t think what I do is that unique. A lot of my job is day-to-day maintenance. It doesn’t require that much intelligence.

Interviewer: You don’t need too much intelligence.

Jiayi: Yes, you just have to do things right. The direction is very important — you just have to do what you believe is right, in the right direction.

Interviewer: Do you think RL for large models still needs a breakthrough? Or is the trend already set, and it’s just a matter of scaling up the infrastructure?

Jiayi: I think in the current state you still can’t predict what comes next. There may be new models, new RL methods, new post-training methods — all possible. So you face new challenges every day.

But then again, I think I’m just very lucky to be in this position. If you swapped in anyone else with my context, they should be able to take over completely. So I don’t think this is something only I can do — any normal person could do it.

Interviewer: You’re too modest. Looking ahead five to ten years, what do you think is still waiting to be explored? Where will the biggest challenges be? Where will current model capabilities plateau, and what kind of breakthrough do we need to see?

Jiayi: I think the current approach hasn’t been fully scaled up yet. Let it slowly climb — let it grow out of today’s small-scale RL experiments, see how much performance it can reach, and then see what else needs to be done.

Interviewer: So what you mean is that there is no scale-up at all. It’s not that the compute is not enough, but that the current performance has not yet dried up. Let’s dry up the current method and the current compute first. Let’s see how long we have pushed. And there are still a lot of infrastructure bugs. Even your current infrastructure is very sure that there are bugs.

Jiayi: Yes, you can never say there are 100% no bugs. After all, it’s written by humans, and everyone makes mistakes; you have to fix them. For example, with many people involved, the context becomes inconsistent, and everyone ends up writing some strange things.

Interviewer: In the future, what do you think will be the biggest bottleneck in the post-train pipeline?

Jiayi: I think the bottleneck lies in the throughput of the infrastructure: how many bugs can you fix per unit of time, and how many experiments can you properly iterate through per unit of time? Nothing else matters as much.

Interviewer: And that outweighs everything else — the algorithm, the environment.

Jiayi: Yes. If you fix all the bugs, you may not even need to change the algorithm. It just works.

Interviewer: Then how do you improve the efficiency of the infrastructure? What kind of framework do you need? What kind of people do you need? What kind of resources do you need?

Jiayi: We’re still exploring this. What I’m doing now is actually not the most central position, but I think it’s the more important thing: our team is rebuilding the infrastructure within OpenAI — the next generation of infrastructure. Every generation of infrastructure eventually gets rewritten.

Based on our current understanding, we’ll design a good top-level architecture and keep it lean. It’s being rebuilt now because the previous generation of infrastructure is more than three years old and has accumulated a lot of problems. We hope the new infrastructure will clean up a lot of the old technical debt and give researchers better iteration speed per unit of time.

Interviewer: So researchers won’t participate in building the infrastructure. They may state some requirements, but they’re not responsible for, say, how to do distributed training.

Jiayi: Yes. At most, they flip a flag.

Interviewer: It sounds like OpenAI’s researchers may be the first to be replaced by AI.

Jiayi: Yes, I feel so. In the end, it’s about how many ideas you can experiment with per unit of time.

Interviewer: You mean ideas?

Jiayi: Yes, ideas. Ideas can be generated very cheaply — you can come up with as many as you like yourself, and AI can generate them too. The next step is replacement: I think researchers will be replaced first, then infrastructure engineers, and then everyone. But sales may not go as expected, because you have to convince the other person to pay the bill, and AI may not be able to move people. Human-to-human communication may remain more important.

Chapter 8: Agents, AGI, and Strawberry

Interviewer: OK, we’ve talked a lot about the text-only 3.5 era. What do you think is the difference between agents and RL post-training?

Jiayi: There’s no difference in nature. It’s the same thing — just a few more steps of decoupling added in the middle, and the environment changes more.

Interviewer: Yes, the environment changes. So you don’t think agent plus RL post-training poses new challenges compared with standard LLM plus RL post-training?

Jiayi: Yes, because it’s the same thing in itself.

Interviewer: What is your personal definition of AGI? Do you think we have reached AGI now? If you think you haven’t reached it or you’re still a little short of it, do you think the path called PreTrain and RL PostTrain can lead us to your definition of AGI?

Jiayi: There’s a joke inside OpenAI: grab 15 people and you’ll get 20 definitions of AGI. My earlier definition was: if this thing can complete 80% or 90% of the tasks I consider meaningful, then it may be AGI. I don’t think we’re there at the moment, because at the very least, I still don’t trust it to modify my infra code directly.

Because that kind of work is very out of distribution. AI infra code is a near-zero fraction of the training data, the feedback loop for AI infra is too long, and the cost is too high.

Interviewer: Yes, the cost is too high. For now there’s no way around that. Good for you — it sounds like you won’t be replaced by AI in the short term.

Jiayi: Yes. Actually, before Strawberry came out, we used it internally for a while, and everyone thought our jobs were about to be replaced — we’d written a pile of shit, and Strawberry would clean it up for us. Now a year or two has passed, and things are still the same; it didn’t really change anything. Everyone overreacts.

People think: the technology is here, now everything changes. But in reality it doesn’t — it’s a very slow process.

Chapter 9: Open Source vs. Closed Source

Interviewer: Let’s talk about openness. OpenAI started out with the atmosphere of an academic institution, a research lab. But now a lot of people criticize OpenAI for having nothing to do with “open.” And your own biggest passion used to be open source — breaking down information gaps. Does this create a conflict between your personal goals and preferences and the company’s?

Jiayi: I think this is a trade-off. I actually still love openness. If OpenAI has something to open-source, I might try to participate. But I should do what I think is more important, and openness is a trade-off for OpenAI. You can't just open up your best model, because the company has to survive. If the company can't survive, you can't keep raising money, running experiments, and making breakthroughs. These are all very practical problems.

Interviewer: I agree it's a trade-off. But when OpenAI was first established, its structure was very unusual — it wasn't set up as a for-profit company. At least the early public slogan was to benefit all of humanity. Do you think closing the source brings that goal closer or pushes it farther away?

Jiayi: First, "AGI that benefits all of humanity" splits into two parts. The first is to achieve AGI. The second is to make it benefit all of humanity. Achieving AGI is the easier part to state: pile up the compute, do the pre-training, and scale up. The current approach to benefiting all of humanity is to ship products at as low a price as possible.

For example, there are free ChatGPT users who can easily access this technology — you can even use voice mode for free and try the experience. That may benefit all of humanity more than just throwing out raw model weights that ordinary people don't know how to use.

Interviewer: So you mean OpenAI's "open" is not open toward other big companies and model labs, but toward ordinary people. And if this really is the final sprint and we're about to reach AGI, I think that strategy makes sense.

Jiayi: I agree.

Interviewer: But if AGI is still a marathon, not something solved in a short time, wouldn't being more technically open and transparent also make OpenAI itself stronger and reach AGI sooner? Or do you think OpenAI no longer needs to reveal its technical details or give back to the community — that it doesn't need any help from the community, that OpenAI is already self-sufficient and can achieve AGI on its own?

Jiayi: Let me think. I think there is such a path: you open-source, absorb the community's feedback, and end up with a better AGI. That can be done — it's feasible in principle — but there are many practical difficulties.

For example, if you're in the lead and you open-source, others will take what you trained, build on it, and then close it off themselves — and then they're in the lead. Not everyone shares the same intentions. In the current environment that could leave OpenAI unable to sustain itself, and then nobody can keep doing the research. It's a kind of game. Even if I want AGI to benefit all of humanity, some people don't think that way — some people just want to make money. So to prevent that, OpenAI has to close the source. At least that's how I see it.

Interviewer: This is the company’s survival, right?

Jiayi: Yes.

Interviewer: If OpenAI had infinite resources and never had to worry about dying, would you be happy to open-source the RL infrastructure you've built over the past two or three years? Would that make you happy?

Jiayi: Of course it would. In fact, John Schulman once asked me whether I wanted to open-source it. I didn't think it was a good idea, out of consideration for the company. But he did ask me about it.

Interviewer: So was it DeepSeek — at least its open-weight releases — that made OpenAI re-evaluate this internally?

Jiayi: Yes.

Chapter 10: Sam Altman and The Board Crisis

Interviewer: You just said OpenAI's mission divides into two parts: first achieve AGI, then make it benefit all of humanity. Digging deeper into that mission, what do you think is OpenAI's biggest opportunity and challenge?

Jiayi: Execution. Executing in the right direction. As long as it can keep executing — and doesn't blow itself up again.

Interviewer: Like the company nearly falling apart in November 2023, when Sam Altman was fired?

Jiayi: Yes.

Interviewer: You want the whole organization to be stable and healthy so it can keep moving forward quickly. So from the inside, what information did you actually receive when Sam was fired? Because what we got on the outside was very mysterious — "What did Ilya see?"

Jiayi: Nothing. That was just a rumor, and a lot of people were spinning rumors.

Interviewer: So what’s your internal perspective?

Jiayi: The internal perspective is, well, it came down to distrust. Ilya and some other board members didn't trust Sam, so they voted him out. But the people working below were very surprised — we didn't know what had happened, because the board wasn't transparent to the people below, and we don't know how the decision was made.

Interviewer: So it really was about distrust?

Jiayi: It was distrust. You can go read the official public investigation report.

Interviewer: Yes, that's consistent. But the final outcome was Ilya leaving. How did that reversal happen? He's the one who fired Sam — he didn't trust Sam — yet in the end it was Sam who won people's hearts.

Jiayi: Because many employees think a leader from a purely technical background may not have everything it takes. Getting AGI built isn't only a matter of technology — even if the technology works, there are still many commercial factors involved.

For example, you have to raise money, you have to secure compute, you have to convince people to back you. That's actually a very important part. If all you have is excellent research experience, it may not carry you down such a long road. From a long-term point of view, people like Sam are still needed. Sam may be the hardest person in AI to replace, because he can operate across business, resources, even geopolitics.

You can think of Sam as a persona, an identity. If you swapped that identity out in the short term, people would lose their identification with it. So he can't be cast aside.

Interviewer: You said earlier that you were sad for a whole afternoon the day John Schulman left. But he isn't the only one who has left OpenAI — after OpenAI's huge success, countless team members have departed. Do you think it's inevitable that an extremely successful organization like OpenAI ends up losing a lot of talent?

Jiayi: In a healthy organization, everyone is replaceable. As long as you can keep training new people — keep that regenerative capacity — and keep the organization running normally, it's fine. Even though many people have left, you can still spend the time and energy to train up a new wave and keep regenerating, which is why OpenAI is still running today.

Interviewer: Yes. Does that mean what OpenAI does isn't as hard as outsiders imagine?

Jiayi: You could say that. In fact it's mostly doing the simplest things well. There is no black magic.

Chapter 11: DeepSeek and the Life or Death of Infra

Interviewer: We may now be in the most intense technology competition in human history, and OpenAI is the company at the center of it. I want to know how intense the internal atmosphere is. Are you under a lot of pressure?

Jiayi: It depends on the team, the deadlines, the project timelines. Post-training, for example — the pressure there is still quite big, because they have very clear deadlines. Other teams, like ours now refactoring the infra, feel pressure too, but not as much as post-training; we can take a longer view, because we need to optimize for the long term. We have to get things right.

Interviewer: So, in fact, there is a fierce competition outside. Whether it’s from xAI, Anthropic, or a large model company in China, will it spread to your internal company’s daily development?

Jiayi: Not really. Except for DeepSeek — because they claim, on Twitter, that their iteration speed is very fast, and that was a wake-up call for many people, since our internal iteration is actually a bit slower than theirs. That's why we're refactoring the infra.

Interviewer: So for a large model company like OpenAI, the life-or-death line is the iteration cycle time of the infra.

Jiayi: Yes.

Interviewer: What about the other things, like data or algorithms? There you just add researchers — it scales with headcount. Doesn't AI infra scale with headcount too?

Jiayi: Good question. AI infra needs far more context. But if you're just doing data ablations — running some experiments — you don't need that much context. It's very simple: you come in, write a for loop, configure the data, and that's it. That part can even be automated.
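The "write a for loop, configure the data" workflow can be sketched roughly like this — a minimal, hypothetical example of an ablation sweep (names like `run_experiment` and the specific mixtures are illustrative, not anything from OpenAI's internals):

```python
# Hypothetical sketch of the ablation "for loop": sweep over data-mixture
# configurations and hyperparameters, launch one run per setting, keep scores.
from itertools import product

def run_experiment(mixture: dict, lr: float) -> float:
    # Stand-in for a real training + evaluation job; returns a validation score.
    # Faked here so the sketch runs end to end without any infrastructure.
    return sum(mixture.values()) * lr

mixtures = [
    {"code": 0.5, "web": 0.5},
    {"code": 0.3, "web": 0.7},
]
learning_rates = [1e-4, 3e-4]

results = {}
for mixture, lr in product(mixtures, learning_rates):
    key = (tuple(sorted(mixture.items())), lr)  # hashable config identifier
    results[key] = run_experiment(mixture, lr)

best = max(results, key=results.get)
print(best, results[best])
```

The point of the passage is exactly that this loop needs almost no context about the surrounding system, which is why it is easy to parallelize across people or automate entirely.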

Interviewer: So your real alarm was realizing that DeepSeek's internal infra is very good — their iteration is very fast. That's what caught your attention. You don't actually care which model tops which leaderboard or scores a few points above GPT. That's not something you care about at all.

Jiayi: We haven’t been doing anything specifically for LMSys for a long time.

Interviewer: And what you really care about is the iteration speed and success rate in the unit time.

Jiayi: Yes.

Interviewer: Do you think OpenAI is the best in the world at this level?

Jiayi: No, definitely not. That has a lot to do with organizational structure. If you pulled a team out to do a startup, its iteration speed would be much higher than OpenAI's.

Interviewer: Yes, much higher. First, the codebase is small and communication costs are low. And you only have to focus on your own use case, so you can move fast.

Jiayi: Yes, but OpenAI has to consider many use cases at once and make trade-offs in every direction. An organization this big has that problem.

Interviewer: But if OpenAI isn't number one on this life-and-death line — well, every company at this scale has the problem. It means every big company slows down in the same way.

Jiayi: Yes. Then it just depends on how bad each one gets — whoever is less bad wins.

Interviewer: And that gap is relative to a startup. A startup may beat you on this one metric, but on other metrics it can hardly compete with OpenAI — its user feedback, for example.

Jiayi: Yes, it's all trade-offs. This is how human organizations develop: once an organization grows to this scale, it inevitably faces this problem. You can't avoid it. The hard part isn't maintaining a high talent density — it's maintaining consistent context sharing across the organizational structure.

That leads to your infra being inconsistent. Then the infra starts to sprawl, and the organizational structure sprawls along with it. That's just how humans are.

Interviewer: So in theory there should be an agent with unlimited context to take over all that context sharing. That sounds like the way out.

Jiayi: Yes, because that would solve the problem human organizations have faced throughout history — silos and everything else. The human mind's context is limited; you can't hold that much at once. But AI can. Maybe in the future every company will have an agent with unlimited context like that — and it will be the CEO.

Interviewer: Yes — responsible for all the sharing, all the decisions. There may be no decision-maker better suited than such an agent.

Chapter 12: Philosophy, Determinism, and Personal Future

Interviewer: Then let's talk about the future. If you could have AI solve one world problem, what would it be? Predicting the future, right? The future you're imagining definitely won't be a tragedy?

Jiayi: Yes.

Interviewer: Why do you think this is the most you want to do?

Jiayi: Personally, being able to foresee how the world unfolds is very attractive. But if you look at it from a higher dimension, there would have to be a… a script written in advance.

Interviewer: My understanding is, do you think our fate can be predicted?

Jiayi: Yes. This world is deterministic. We live inside a deterministic Markov process.
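For readers unfamiliar with the term, the standard textbook formulation of that claim — this framing is an editorial addition, not something stated in the conversation — is:

```latex
% Markov property: the next state depends only on the current state.
\Pr(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = \Pr(s_{t+1} \mid s_t)
% Deterministic special case: the transition kernel collapses to a function,
% so the entire trajectory is fixed once the initial state s_0 is fixed.
s_{t+1} = f(s_t)
```

"Determinism" in this sense is the claim that the second, degenerate case holds for the world as a whole.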

Interviewer: Yes. Going deeper — do you think people have no free will?

Jiayi: Yes.

Interviewer: So what I'm thinking right now, what I say next, what I ask next — all determined. I've turned this over countless times. It sounds like a very pessimistic worldview to me.

Jiayi: Yes. Maybe it’s true, but deep in my heart, I don’t want to accept it.

Interviewer: I don't want to accept it either. It's as if I've become a puppet.

Jiayi: Yes.

Interviewer: Why do you think so?

Jiayi: Some of my personal experiences suggest it can be predicted, so in theory AI could compute it. But if you actually got a machine that could predict the future, that would be a disaster for the individual. I think it would lead to the collapse of every value system.

Interviewer: Yes. If there is such an AI model, it may be the best choice for human society to destroy such an AI model, so that it will never come out.

Jiayi: Yes. But some people would be very eager to develop this kind of model — otherwise they feel manipulated by so-called fate.

Interviewer: In other words, some people want such a model because they want to break out of it. So you mean that once such a model exists, you'd have to do something to escape it?

Jiayi: No. It's more about figuring out the rules behind this world. Why is this world deterministic? Why is this world, as they say, fated?

Interviewer: So you think God doesn't play dice?

Jiayi: Yes. At the macro level, God doesn't play dice; at the micro level, He does.

Interviewer: Can those two coexist?

Jiayi: If you look at quantum mechanics… they can, because the world lines can be modified in the background. Although you can treat what I'm saying as nonsense.

Interviewer: No problem, I don't think it's nonsense — I'm trying to reason through it. I've thought about this for a long time too. Is this world determined? Do we have a script? Or are we just carrying out a stochastic process under the dynamics of this world?

Jiayi: I don't think it's random. I think it's determined.

Interviewer: You have no doubt about this.

Jiayi: I've doubted it. I've tried to disprove it, and found that it held — I really wanted someone to disprove it. I've had this question since I was six or seven; I brought it up with my parents: how do you know what I just said wasn't determined? How do you know your reaction wasn't determined? But later I realized that if that's really the case, I wouldn't feel human anymore. So the best way is to forget all of it, pretend you don't know, and go experience the present.

Interviewer: Have you always done this?

Jiayi: I’ve always done this. I can’t help it.

Interviewer: There's another explanation: the timeline isn't linear — it can jump. You could use that theory to say the future you helps the past you complete a certain role. You said earlier that back in elementary or middle school, something suddenly told you, from the future, that you were going to make an impact.

Jiayi: Yes, that thought just popped into my head. But I don't know whether it was the future me telling me.

Interviewer: Then why do you think the timeline is not continuous?

Jiayi: I think three-dimensional creatures have their limitations. In three-dimensional perception, time flows linearly in one direction. But in thought, time doesn't have to flow one way — it can jump at will. That's the most reasonable explanation I've found so far.

Interviewer: So you really think that at countless moments in the past, there was a future you giving you a push from behind.

Jiayi: Yes, this is one of the possibilities. Because this can’t be proved.

Interviewer: And that push causes you, in the future, to give the push again. So the best solution for you now is to forget about all this.

Jiayi: Yes, I still believe that Sisyphus is happy.

Interviewer: Yes. We could talk about this for another half hour — let's set it aside. I believe everyone in AI, especially at OpenAI, has a seed of entrepreneurship in their heart. Have you ever thought about it?

Jiayi: Maybe. But I don’t think I have a good idea yet. And I think OpenAI is a good place for me.

Interviewer: What kind of idea appeals to you? Something more product-oriented, or what?

Jiayi: I think I prefer products. Like Tianshou, like TuiXue. Even research infrastructure has users — your users are the researchers, and you iterate based on what their research needs. I'd like to build things like that: there's a real demand, and you serve it well. What technology you use isn't that important.

TuiXue, for example, is a very simple notification system — not some distributed system, just a few lines of code. You could write it in PHP; you don't even need PHP. The first version of TuiXue updated just twice a day, once in the daytime and once in the evening, and I ran it manually. Even so, there was a lot of demand. The technology isn't important — what's important is catching the demand.

Interviewer: What do you hope Jiayi Weng will be like in ten years? What will he be doing, and where?

Jiayi: I hope he does what he wanted to do at that time. And then have enough resources, enough ability to do what he wants to do.

Interviewer: You don’t interfere with what he thinks at that time.

Jiayi: Right, because ideas change. And maybe what I hope for now doesn't matter anyway — it's all determined; what the present me hopes for isn't important. What I can do now is invest in the me of that time, invest in the future, so that he has the right to choose.

Interviewer: Then why invest at all? Whether you invest or not, you'll end up there anyway.

Jiayi: But I think it's better to invest — just in case it isn't determined.

Interviewer: Aren't you lying to yourself? If everything is determined, investing changes nothing — you could sleep at home every day and still get there; the future in which you invested is determined too, so whether you invest isn't your free will either. Your past "investments in the future" were things like learning high-school math in advance. Besides technology and AI, what else do you want to invest in for the future?

Jiayi: Early retirement — having enough capital to do whatever I want to do.

Interviewer: Say you had unlimited money and could retire right now. What would you do?

Jiayi: Spend some time figuring out what I actually want to do.

Interviewer: Some would say that what you're doing now isn't what you really want — you're just compelled by survival, trying to earn enough capital to retire.

Jiayi: There are things I've wanted to do for a long time — "211", for example. But as time goes by — after things gradually stabilize, or after certain things happen — everyone's center of gravity shifts. I used to be able to figure out what I wanted. Now I can't.

Interviewer: You can’t figure it out now?

Jiayi: Yes, I think this is normal.

Interviewer: So you are now in a period of confusion in your life.

Jiayi: Yes.

Interviewer: Didn't you once feel that you loved RL infrastructure, or doing something impactful?

Jiayi: Yes, but I've already seen where that ends. The rest of it is very certain.

Interviewer: The things on the AGI roadmap?

Jiayi: Yes.

Interviewer: Okay, Jiayi, I have no more questions. At the end of this podcast, I hope the you of this moment in 2025 leaves a message about exploring what you really want.

Jiayi: I said I'd figured out what I really want, but actually I haven't yet. It's a question worth keeping thinking about.