Conversation with Zhang Yaqin: My Preview of the AGI Roadmap
If you're interested in the future of artificial intelligence and what the world may come to look like, Academician Zhang Yaqin of Tsinghua University is well worth hearing out.
Friendly reminder: This article is quite long, with nearly 7,000 words. It is recommended to save it for later.
Everyone Talks About AGI, But What Does Its Future Look Like?

At the end of 2024, we interviewed Dr. Zhang Yaqin, President of the Institute for AI Industry Research (AIR) at Tsinghua University and former President of Baidu, to explore the future of AGI (Artificial General Intelligence). In his vision, AGI will follow a roadmap of "Information Intelligence → Physical Intelligence → Biological Intelligence," progressively approaching and ultimately achieving its full potential.
This concept paints an optimistic picture of AGI's future. According to Dr. Zhang, information-based AGI could be realized within five years. Physical intelligence, such as humanoid robots, may take about ten years to reach maturity. Finally, biological intelligence will likely require 15 to 20 years to achieve. By that time, the human brain could experience comprehensive enhancements, human lifespans might extend significantly, and we may even witness the emergence of "a new species."
"Looking back at humanity 30,000 years ago, would you consider them the same species as modern humans?" Dr. Zhang remarked. "Thirty thousand years ago, early humans in caves were at the ape-man stage. They had begun using tools and fire during the Stone Age, but those tools were extremely primitive. Today, we have smartphones, computers, and the internet. A century from now, future generations might view our current technologies—smartphones, the internet, and personal computers—as we view fire and stone tools from 30,000 years ago. And this evolutionary pace is accelerating."
Dr. Zhang believes that perhaps within 30 years, or at most 100 years, new species could emerge that are far more intelligent than current humans. However, he asserts, "They will still be under human control and guided by our consciousness."
The following is our full interview with Dr. Zhang Yaqin.
Information Intelligence: The IQ of the Brain
Zhang Xiaojun: Previously, you mentioned that over the next 20 years, the journey toward achieving AGI will progressively involve information intelligence, physical intelligence, and biological intelligence. Could you elaborate on your envisioned AGI roadmap?
Zhang Yaqin: I’ve always believed that the new generation of artificial intelligence comprises information intelligence, physical intelligence, and biological intelligence.
- Information intelligence is straightforward—it includes tools like ChatGPT, which handle text, images, and videos, representing intelligence in the information world.
- Physical intelligence applies AI to autonomous vehicles, robots, and embodied intelligence, integrating AI into fundamental physical systems.
- Biological intelligence involves applying AI in areas like brain-computer interfaces, such as Elon Musk's Neuralink, which connects AI to biological organisms, including medical applications, surgical robots, and new drug development.
These domains are interrelated but distinct. I believe that within five years, we will achieve AGI-level performance in information intelligence, passing the Turing Test. ChatGPT has already reached a near-human level in text processing. The next steps are to add reasoning and multimodal capabilities. For example, "Sora" might emerge soon. Over time, in natural language and content generation, AI will create images and videos that match human capabilities. Within five years, information intelligence will reach human levels.
Zhang Xiaojun: Let’s talk about information intelligence first. Can it fundamentally transform the current business ecosystem?
Zhang Yaqin: Artificial intelligence itself doesn’t change the essence of business—it doesn’t alter what needs to be done. Instead, it significantly enhances productivity and gives rise to new business models, but the core principles of business remain the same.
Today, AI makes tasks like writing emails, creating articles, and coding much more efficient. It can even solve mathematical problems better than most people, surpassing mathematicians and physicists in some cases. Its intelligence level can match or even exceed human capabilities. AI primarily embodies intellectual prowess. Imagine someone with an extraordinarily high IQ who can solve problems, invent new formulas, write excellent articles, paint, and compose poetry—AI reflects this kind of intellectual manifestation.
However, as with swimming, no amount of theoretical study can replace the practice required to swim in water. This is a characteristic of the physical world—you must engage in real-world practice. Similarly, no matter how much you read about wine, you won’t truly understand its taste until you experience it.
In the information world, AI can function like an exceptionally knowledgeable and highly intelligent person, and this level of capability can be realized within five years. Physical intelligence will take more time. A high level of physical intelligence, such as in humanoid robots, may take around ten years. Will these robots surpass humans in every aspect by then? They will certainly outperform humans in most tasks.
Physical Intelligence: Rather Than Calling Them Robots, Let’s Call Them “Machine Anything”
Zhang Xiaojun: The first implementation of physical intelligence is autonomous driving.
Zhang Yaqin: Yes, autonomous driving is a relatively controlled domain: it is a closed problem, whereas robotics in general is an open-ended one. Autonomous driving focuses solely on driving and excels at it. But robots are different: they need common sense, must understand human behavior, and must handle complex environments. To put it simply, autonomous driving is essentially a robot that can drive, a vertical application of robotics tailored to specific scenarios.
Take a skilled driver as an example: they don't need to write poetry, possess exceptional intelligence, sing, or know biology—they just need to excel at driving. It's similar to a factory worker specializing in welding; they focus on mastering their task. However, humanoid robots face far greater demands.
In terms of robot applications, I categorize them into three major scenarios: home robots, industrial robots, and social robots.
The concept of home robots is easy to grasp—they primarily assist with elderly care, household chores, and even chatting with people. However, developing home robots is undoubtedly the most challenging.
Social robots, on the other hand, operate in public scenarios, such as performing tasks like policing, security, delivery services, and driving, all of which fall under the realm of social behavior. In the future, police forces may still require some human officers, but a majority of police work could be handled effectively by robots. Robot security guards might even outperform human counterparts, though they would still need to follow human directives and serve as human assistants.
It may not be long before the number of human drivers decreases significantly, and autonomous driving becomes mainstream.
Zhang Xiaojun: So autonomous driving also falls under the category of social robots?
Zhang Yaqin: Exactly. Robots that roam the streets and frequently interact with the public are considered social robots.
In addition, there are industrial robots. These include robots operating in hazardous environments like mines or performing precise tasks within factories. Industrial robots generally have clear objectives and operate within specific scenarios.
However, we aim for the underlying technologies to be universal. Even though the front-end might vary—for instance, in terms of embodied intelligence or edge computing—the foundational technology should ideally be shared.
Zhang Xiaojun: Are you saying that home, industrial, and social robots all share common technology?
Zhang Yaqin: Yes, a universal backend model. I envision that 70%-80% of the backend technology should be shared across applications.
Zhang Xiaojun: Would this massive model resemble today's models, like those developed by OpenAI?
Zhang Yaqin: Exactly. These models must be multimodal—they need to learn from the physical world and then transform that knowledge into intelligence. While the specific implementations may differ in various applications, the backend should remain largely unified, forming a universal platform.
Zhang Xiaojun: Currently, large models and autonomous driving operate on two separate architectures. They cannot yet be integrated, but in the future, they certainly will.
Zhang Yaqin: Exactly. Right now, our backend systems are all powered by large models. On top of this, we build a vertical model specifically for autonomous driving, and all of these are deployed in the cloud. In addition, vehicles are equipped with a smaller edge model. I categorize this system into three levels.
It's similar to an operating system. In the cloud, the operating system runs Super Apps; deployed on a mobile device, it runs the corresponding applications. In the future, whether you call them Apps or Agents, their essence will remain the same: to accomplish specific tasks, though those tasks will vary across domains.
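As a rough illustration of the three-level split Dr. Zhang describes, the sketch below simply names the levels and their roles. The level names and fields are assumptions for illustration, not his or any vendor's actual architecture.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three-level system described above: a shared
# cloud foundation model, a vertical domain model built on top of it, and
# a small edge model deployed on the vehicle or robot. All names and
# fields here are illustrative, not a real product specification.

@dataclass
class ModelLevel:
    name: str
    runs_on: str   # where this level is deployed
    role: str      # what it contributes to the overall system

STACK = [
    ModelLevel("foundation model", "cloud",
               "general multimodal knowledge shared across applications"),
    ModelLevel("vertical model", "cloud",
               "domain-specific skills (e.g. driving) built on the foundation"),
    ModelLevel("edge model", "vehicle / robot",
               "low-latency perception and control close to the sensors"),
]

if __name__ == "__main__":
    for level in STACK:
        print(f"{level.name:>18} @ {level.runs_on}: {level.role}")
```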
For example, industrial robots don’t need to take on a humanoid form—they just need to perform their assigned tasks efficiently. However, for home robots, a humanoid design might be the best choice. That said, even home service robots don’t necessarily have to be humanoid; they could take on forms similar to dogs or cats. For instance, a visually impaired individual might use a robotic dog for guidance in the future.
So, instead of being confined to the term "robot," it might be more fitting to refer to this category of machines as "machine anything."
Zhang Xiaojun: Why are humanoid robots the best choice for home use?
Zhang Yaqin: There is a clear distinction between humanoid and non-humanoid robots. In a home environment, elderly people, for example, would find it much more natural and smooth to communicate with a humanoid robot compared to a computer or other non-humanoid machines. As technology advances, humanoid robots will become highly human-like in appearance, possibly to the point of being indistinguishable from real humans, which will further enhance the naturalness of human-machine interaction. You could confide in it, treat it as a companion, and have it help you with various tasks. It essentially becomes like your personal butler.
Another reason for humanoid robots is that our current societal environment, including all infrastructure, is designed around human needs and characteristics. Tasks like climbing stairs or pressing buttons are relatively straightforward for humanoid robots because these facilities were designed for human use. However, this is ultimately a matter of choice.
For social robots, take police robots as an example. If they are too small, they may lack authority; if they are too large, they could intimidate people and make them feel uncomfortable. A humanoid design, however, conveys a sense of interacting with a fellow human, which aligns with psychological expectations.
Zhang Xiaojun: To avoid the “uncanny valley” effect.
Zhang Yaqin: Exactly. This is more of a social evolution—you see a version of yourself. I've often said that perhaps within 10 years, the number of robots will surpass the number of humans, and each person might have their own copy—a robotic version of themselves. It would act like your double. This robot could handle many tasks for you, essentially serving as your avatar. Ideally, everyone would have one avatar; having too many could create a host of problems.
This avatar would belong to you, be completely obedient, and function as an assistant. This assistant would be incredibly smart, perhaps even more capable than you, but it would always follow your instructions.
Zhang Xiaojun: Would it be classified as a home, industrial, or social robot?
Zhang Yaqin: It could be any of those. At home, it’s a home robot. In a factory, it could be an industrial robot helping you with tasks. These scenarios don’t necessarily require the same robot, but the robot must be adaptable to different settings.
Zhang Xiaojun: Between social and home robots, which is harder to achieve?
Zhang Yaqin: It depends. All humanoid robots are challenging to develop and require significant time. I think in about 5 to 10 years, we’ll begin to see some initial forms of humanoid robots.
In the robotics field, autonomous driving robots are an area where we already have tangible results. I just returned from San Francisco and rode in a Waymo self-driving car. The experience was amazing—it drove smoother and more steadily than a human driver. What’s remarkable is that San Francisco residents are now happy to use these cars; they no longer see them as something novel but instead consider them their preferred mode of transportation.
Zhang Xiaojun: This year, "Luobo Kuaipao" (Baidu's robotaxi service, known internationally as Apollo Go) launched in Wuhan, but the public isn't quite ready to accept it yet.
Zhang Yaqin: Yes, right now, Luobo Kuaipao's safety is at least 10 times higher than human-driven vehicles, and it drives very well. Although it’s a relatively new service, passenger reviews have been very positive. At the moment, the public still sees it as an emerging technology. When comparing China and the U.S., Apollo and Waymo are the pioneers. When you see them, you immediately think, "This is the future of self-driving." But autonomous driving, as we see it now, could already be implemented in cars today or even serve as a type of taxi service. It's really more of a business model issue, as the technology is already in place.
Zhang Xiaojun: What’s your view on the L2+ route represented by Elon Musk and the L4 route represented by Waymo? Musk believes that his L2+ can evolve into L4.
Zhang Yaqin: It can evolve, but it’s still in the development stage and requires a lot of testing and new technological support. As it stands, there are still some limitations. The current Full Self-Driving (FSD) system can’t yet deliver a true Robotaxi service because FSD is designed for a mode where humans are still involved. During the ride, a human can still take control of the vehicle if needed. A true Robotaxi, on the other hand, doesn’t need human intervention at all. In the future, it might not even have a steering wheel. The technological demands for this are extremely high. However, it’s worth mentioning that this seemingly advanced technology has already been achieved by some companies. If Tesla were to get into this business, its scale and influence would be even greater. In the future, other car manufacturers will also have the capability to do the same.
Zhang Xiaojun: So, car manufacturers will be able to do this in the future?
Zhang Yaqin: Yes, they will. Waymo and Baidu's Apollo have proven that autonomous driving technology is feasible, both in China and the U.S. This is really important. Now, it's just a matter of who can move faster and how the business models will land. Two years ago, we still couldn't see when this day would come. I've always had confidence, but when asked when it would happen, I couldn't say. Now, it's clear.
Zhang Xiaojun: If car manufacturers can do it themselves, they already have their own fleets. How will the business models compare to those of products like Waymo or Baidu Apollo? In the end, who will dominate this space?
Zhang Yaqin: There are basically three types of companies in this field. One type is service providers, like Didi today or traditional taxi companies. They focus on providing operational services. Another type is car manufacturers. Then there are companies that specialize in car-related products, like chips—companies like Horizon and Nvidia fall into this category. Some companies may choose to focus only on car manufacturing and not get involved in Robotaxi services. Others may focus solely on service operations, using vehicles made by other companies. This kind of scenario will appear in the industry. For example, Tesla claims it does everything, which is certainly a viable approach. As a result, competition between these companies will form. But in the end, it’s not just one company that will prevail—multiple companies will secure a place in the market.
Zhang Xiaojun: Li Xiang (CEO of Li Auto) and He Xiaopeng (CEO of XPeng Motors) have both said that they are working on L4 technology, but they will not be the platform providers for Robotaxi services.
Zhang Yaqin: Many car manufacturers might choose not to engage in this business. Just like now, they don’t engage in the taxi business but just sell cars. This situation will likely continue in the future. Overall, the entire industry consists of three key parts: first, the service; second, the vehicles; and third, the in-vehicle chips, components, and other parts.
Zhang Xiaojun: Why are autonomous driving and embodied intelligence so popular this year?
Zhang Yaqin: Everyone is now seeing the dawn of industry development, and a key factor driving this change is the emergence of large models, which can be seen as the spark igniting the transformation. Let me first talk about autonomous driving and then about embodied intelligence. Over the years, autonomous driving has faced several major challenges:
- First, a shortage of data: massive amounts of test data are needed.
- Second, numerous extreme situations (corner cases): many safety-critical scenarios are rarely encountered in real driving, yet new problems keep surfacing during testing, exposing the poor generalization ability of autonomous driving technology.
- Third, fragmentation of models: in AI, map creation was one model, visual recognition another, and language processing yet another. These fragmented models had to be integrated one by one to handle perception, fusion, planning, and decision-making, each governed by many rules, a mix of AI algorithms, neural networks, and fixed rules that formed a chaotic patchwork.
With the advent of large models, it’s not that all problems are solved, but they accelerate the resolution of these issues.
- To address the data shortage, generative AI can generate large amounts of data based on real-world data, significantly accelerating the speed of simulation and modeling.
- Regarding the problem of insufficient scenarios, large models themselves have excellent generalization ability, or what we call common sense. These large models not only have common sense but can also learn these common-sense concepts through simulators. So, even when encountering situations that they’ve never faced before, they can solve the problems more easily.
- Large models have also driven the exploration of end-to-end solutions. By integrating all these models, a single input can generate a single output, and some rules can be used to ensure safety. So, these three major challenges are not fully solved, but they have been significantly alleviated, dramatically accelerating the development of autonomous driving.
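To make the end-to-end idea concrete, here is a minimal, hedged sketch in PyTorch: a single toy Transformer maps tokenized sensor features directly to trajectory waypoints, with a simple rule-based safety clamp on the output. The shapes, dimensions, and module choices are illustrative assumptions, not any company's actual driving stack.

```python
import torch
import torch.nn as nn

class ToyEndToEndDriver(nn.Module):
    """Toy end-to-end driving model: one network maps sensor tokens
    directly to trajectory waypoints, instead of separate perception,
    fusion, planning, and decision modules. Purely illustrative."""

    def __init__(self, d_model=128, horizon=8):
        super().__init__()
        self.embed = nn.Linear(32, d_model)   # 32-dim sensor features per token (assumed)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon * 2)  # (x, y) per future waypoint
        self.horizon = horizon

    def forward(self, sensor_tokens):              # (batch, n_tokens, 32)
        h = self.encoder(self.embed(sensor_tokens))
        waypoints = self.head(h.mean(dim=1))       # pool tokens, predict trajectory
        return waypoints.view(-1, self.horizon, 2)

def safety_clamp(waypoints, max_step=2.0):
    """Rule-based guardrail on top of the learned output, as the text
    suggests: clamp per-step displacement to a plausible bound."""
    return waypoints.clamp(-max_step, max_step)

if __name__ == "__main__":
    model = ToyEndToEndDriver()
    tokens = torch.randn(1, 64, 32)            # fake tokenized sensor input
    print(safety_clamp(model(tokens)).shape)   # torch.Size([1, 8, 2])
```

The design point is the one described above: a single network replaces the chain of separate modules, while hand-written rules survive only as a thin safety layer on the output.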
Zhang Xiaojun: Elon Musk’s proposed end-to-end model is an important product that shows us the light at the end of the tunnel.
Zhang Yaqin: Yes, the end-to-end solution is something everyone is researching and advancing, and it will soon be ready for application. We’ve already started adopting it.
The emergence of large models has greatly accelerated the development of related technologies and completely changed the entire industry ecosystem. In the past, although we had similar ideas, we lacked sufficient knowledge and technological support. Now, with the advent of large models, implementing these ideas has become possible.

Now everyone is using the Transformer architecture. It wasn't like this before: sometimes we used convolutional neural networks (CNNs), other times recurrent neural networks (RNNs), with complex and diverse algorithms. This is really the difference between the previous and current generations of deep learning. Back then, different algorithms had to be used for different inputs, generating different outputs, which were then fused together. But now, regardless of the input, we use the same Transformer to handle it, processing everything as tokens. This is true in both the autonomous driving field and the embodied intelligence field.
For robots, the problems they face are essentially similar to those of autonomous driving. However, in the past, robot data was much scarcer. Also, when robots were simulated in a digital space, reinforcement learning was typically used: set up an environment and an agent, let the agent learn a strategy, and apply that strategy to the physical robot. In practice, it was often found that the strategy didn't work well. Currently, AIR (the Institute for AI Industry Research at Tsinghua University) is developing RSR technology, which closely connects the real world with the digital world, accelerating the development of robotics technology. In other words, the results obtained from learning in the simulated world, in the digital space, can be better applied to reality, helping robot technology reach the real world faster.
Zhang Xiaojun: Is this called the world model?
Zhang Yaqin: You could say that it’s a world model. I call it RSR, which stands for Real to Sim to Real. Here, “Real” refers to things in the real world. We start with real scenarios, analyze them, and transform them into content for the digital world. This is the “Real to Sim” process, which is the simulation process. During the simulation phase, we can use various AI generation tools, such as new generative AI tools like Stable Diffusion, to create large amounts of content. With these tools, we tightly connect the digital world and the real world. This way, things learned in the digital space can be directly applied to reality, effectively solving the data shortage issue, just as in the autonomous driving field.
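As a minimal sketch of the Real to Sim to Real loop described here, the skeleton below walks through the three stages. Every function is a hypothetical stub standing in for the real machinery (scene capture, generative augmentation, policy learning), not AIR's actual RSR implementation.

```python
# Hypothetical skeleton of a Real-to-Sim-to-Real (RSR) loop, following the
# description in the text. All functions are illustrative stubs.

def capture_real_scenes():
    """Real: record scenes and trajectories from the physical world."""
    return [{"scene": i, "source": "real"} for i in range(3)]

def build_simulation(real_scenes):
    """Real -> Sim: reconstruct captured scenes in a simulator and expand
    them with generative tools (diffusion-style augmentation, per the text)."""
    generated = [{"scene": s["scene"], "source": "generated"} for s in real_scenes]
    return real_scenes + generated

def train_policy_in_sim(sim_scenes):
    """Learn a control policy in the digital space (reinforcement or
    imitation learning would go here)."""
    return {"policy": "trained", "episodes": len(sim_scenes)}

def deploy_to_robot(policy):
    """Sim -> Real: transfer the learned policy back to the physical robot."""
    print(f"deploying policy trained on {policy['episodes']} sim scenes")

if __name__ == "__main__":
    real = capture_real_scenes()
    sim = build_simulation(real)
    policy = train_policy_in_sim(sim)
    deploy_to_robot(policy)
```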
Zhang Xiaojun: It’s like accumulating a bunch of common sense and then feeding it into the system.
Zhang Yaqin: The third point is crucial: having a large model. In the past, one of the biggest challenges for robots was their inability to understand human intentions. They might be able to recognize words they hear, but they didn’t understand the underlying meaning and lacked basic common sense. Robots had no comprehension and struggled with reasoning. However, with the emergence of large models, the situation has changed drastically. It’s as if the large model acts as a “brain” that can command the robots at the front end. This is a huge leap, thanks to the common sense possessed by large models.
For example, with a home robot, I might give the command: “Take the dirty clothes down, have them cleaned at the dry cleaner’s, and then bring them back.” In the past, this task would have been extremely difficult for a robot. But now, it’s much easier because models like ChatGPT or GPT, after receiving the command, can understand its meaning, break it down into specific actions, and instruct the robot to carry them out.
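A hedged sketch of how such a command might be decomposed: the large model is asked to break the instruction into primitive skills the robot front end can execute. The call_llm function and the skill vocabulary are hypothetical stand-ins; a real system would query an actual model such as GPT.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large model (ChatGPT-style).
    A real system would query an actual model; here a canned decomposition
    is returned so the sketch runs on its own."""
    return json.dumps([
        {"skill": "pick_up", "object": "dirty clothes"},
        {"skill": "navigate", "target": "dry cleaner"},
        {"skill": "hand_over", "object": "dirty clothes"},
        {"skill": "wait_for", "event": "cleaning done"},
        {"skill": "navigate", "target": "home"},
    ])

def decompose(command: str) -> list[dict]:
    """Ask the 'brain' (the large model) to break a natural-language
    command into primitive skills for the robot to carry out."""
    prompt = (
        "Decompose this household command into a JSON list of robot "
        f"skills (pick_up, navigate, hand_over, wait_for): {command}"
    )
    return json.loads(call_llm(prompt))

if __name__ == "__main__":
    for step in decompose("Take the dirty clothes to the dry cleaner and bring them back."):
        print(step)
```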
Zhang Xiaojun: It’s like adding a “brain” to it.
Zhang Yaqin: Yes, now the robot has a "brain," which, in the past, was the hardest thing for a robot to have. Back then, every task was a huge challenge for a robot. Now it's different: we've achieved very natural human-robot interaction, where the large model understands what I'm saying, breaks the task down, and commands the robot to execute it. The applications for robots are incredibly diverse. Take a breakfast-making robot and a floor-cleaning robot: their tasks are completely different, so the task-specific parts don't generalize. But we hope that the robot's "brain," the backend system, is universal, with only the front-end execution part differing.
In the past, a floor-cleaning robot would be a completely separate system from a cooking robot. Robots designed for washing clothes, welding components, quality inspection, or driving were all specialized and had no connection to one another. But now, the backend is essentially the same; it’s just that the front-end differs depending on the task. This is similar to how a person has multiple abilities. Once a person learns basic common sense and intelligence, they can learn to drive and to clean. No matter what they do, they are the same person and don’t need to change “brains” every time.
Zhang Xiaojun: Today, driving is a relatively well-defined scenario, but other tasks are still uncertain.
Zhang Yaqin: For example, I used to joke that a robot could drive but wouldn’t be able to open a door. Even if it could open the door, it wouldn’t know how to open the microwave, and even if it opened it, it wouldn’t know how to close it. These small detail issues, if each required a new algorithm to be set up, would be too cumbersome. In the early stages, people tried to solve these problems with various rules, but now it’s different. With large models, the robot learns instantly by observing. Once it learns, no matter what it encounters, it knows how to open it and how much force to use. For example, if it brings you tea, it knows the tea is hot and will remind you to be careful. It can handle things like this, which is what we call common sense. Where does this common sense come from? It comes from the large model.
There’s no need to teach it every detail. In the past, setting rules one by one was a lot of work! Now, with the large model, once the robot learns, it can generalize, just like a human. Once it learns, it can reason by itself. For example, if you tell the robot you’re hungry and want something to eat, and it knows you love bananas, it might go find a banana for you. If you’re ordering food, and it knows you don’t eat lamb, it won’t order lamb for you. It will have this common sense without needing to remember every detail. It’s like a human—it understands you and knows the basic common sense of life.
Zhang Xiaojun: What are the difficulties in deploying autonomous driving and embodied intelligence today?
Zhang Yaqin: Why do I think the information world is relatively simpler? In the information world, all you need is a phone or a computer to start engaging in related activities. In the physical world, however, deployment cycles are often long, and there are many difficulties. Take autonomous vehicles as an example: for an autonomous vehicle to drive normally on the road, the first hurdle is policy and regulation. Additionally, you need to solve how to coordinate with human-driven vehicles on the road. So, in these physical scenarios, you need to create a complete world model. In the virtual world, you don't have to worry about these issues; you just need to build the model itself, though that's also not easy.
The situation with robots is similar. For example, if you use a wheeled robot to deliver takeout, how does it go up and down stairs? How does it open doors? These specific scenarios require more complex technology, and deploying each one means solving such detailed issues. It's like going to school: in class, books are enough. A large model in the information world is like someone who has read a great many books: very smart, able to summarize all sorts of principles and express them. But when it comes to actual practice, you still have to do the real thing. As the saying goes, reading ten thousand books is no substitute for traveling ten thousand miles: to truly do something, you still need the corresponding hands-on skills.
For robots, the key is how to make the technology truly practical. How can we achieve good interaction between the digital world and the physical world? In different scenarios, the situation varies. Some scenarios might be relatively simple, while others may be more difficult.
Zhang Xiaojun: In the field of information intelligence, it seems like all the right companies are already involved, and the same goes for autonomous driving. Today, no one can just say, "I’m going to start an autonomous driving company."
Zhang Yaqin: It’s not accurate to say that information intelligence is simple. The field of information intelligence has many different directions and vertical subfields. For instance, some are focused on horizontal development, but within this area, there are also vertical scenarios like generating images or helping people write code. Currently, autonomous driving is in an explosive phase. Companies like Horizon Robotics, which specialize in chips, are gradually rising. They’ve already stepped up to the next level, gone public, and not only have products but also offer services, with the public starting to recognize them. In the next five years, these companies will have vast development prospects. However, to succeed in any project, it truly takes a considerable amount of time; achieving goals in 1–2 years is usually not feasible. In the field of robotics, especially humanoid robots, companies that are working on this may take 8 to 10 years to achieve results.
The reason I say that general artificial intelligence (AGI) in the physical world will be achieved within 10 years is that I believe by then, AGI will become the mainstream trend.
Zhang Xiaojun: Is it easier for autonomous vehicle companies to start making robots?
Zhang Yaqin: There are many similarities, but ultimately it depends on how you implement it. Overall, robots are a bit more complex and require more components, while autonomous driving demands more precision. For example, some robots move slowly, and it doesn't matter if they take their time; for a breakfast-making robot, slow or fast is not an issue. But with driving, a mistake of even a millisecond is unacceptable. The actual needs and requirements differ, so the technology for each scenario varies. But the backend elements are essentially the same: you need a model that can perceive the environment, understand it, and act on it, combining vision, language, and action (the VLA model). That model applies across both.
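A toy sketch of the VLA pattern he names, under stated assumptions: a stand-in vision encoder and a stand-in text encoder are fused and mapped to an action vector (for example, joint targets). Shapes and modules are illustrative, not any published VLA architecture.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy vision-language-action (VLA) model: encode an image and an
    instruction, fuse them, and output a low-level action. Illustrative only."""

    def __init__(self, d=64, n_actions=7):
        super().__init__()
        self.vision = nn.Sequential(                 # stand-in vision encoder
            nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, d))
        self.language = nn.Embedding(1000, d)        # stand-in text encoder
        self.action_head = nn.Linear(2 * d, n_actions)  # e.g. joint targets

    def forward(self, image, token_ids):
        v = self.vision(image)                       # (batch, d)
        l = self.language(token_ids).mean(dim=1)     # (batch, d), pooled
        return self.action_head(torch.cat([v, l], dim=-1))

if __name__ == "__main__":
    model = ToyVLA()
    img = torch.randn(1, 3, 64, 64)                  # fake camera frame
    words = torch.randint(0, 1000, (1, 5))           # fake instruction tokens
    print(model(img, words).shape)                   # torch.Size([1, 7])
```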
Biological Intelligence: Expanding the Brain, Extending Lifespan, and Creating New Species
Zhang Xiaojun: What about biological intelligence in the future?
Zhang Yaqin: Achieving biological intelligence will take longer because it requires connecting our brains to machines either through implantation or non-implantation methods. I am more optimistic about non-implantation methods, using better and more sensitive sensors and brain-machine interfaces. New technologies are developing rapidly.
Zhang Xiaojun: Will we see this in our lifetime?
Zhang Yaqin: Yes, I believe we can achieve it within 20 years. By then, the human brain will experience comprehensive expansion, and this expansion will manifest in several dimensions. First, in terms of memory, we will have nearly unlimited storage capacity: we won't need to rely on our limited memory to remember things, because storage devices will record vast amounts of information, greatly reducing the burden on our brains. Second, the development of intelligence will make us smarter.

Initially, this might be applied mostly in healthcare. For the blind, technology could stimulate the corresponding nerve cells to help restore some vision; for the deaf, it could stimulate auditory nerves to improve hearing; for the disabled, it may help connect and repair damaged nerves and central brain functions to restore some physical abilities. Beyond that, diseases like Alzheimer's in the elderly, and conditions such as ADHD and autism in children, may also be treated or improved with these technologies. As the technology develops, its applications will expand beyond healthcare, even improving the intelligence of normal people.

However, achieving this series of advancements will require a long time and continuous technological accumulation. Along the way, we will inevitably face some issues. On one hand, ethical and moral questions will become increasingly prominent, such as how to ensure fairness, safety, and respect for human dignity in applying these technologies. On the other hand, there will be the challenge of how silicon-based life (machines based on silicon) and carbon-based life (humans and other carbon-based life forms) will coexist, because these technologies will no longer be external things unrelated to humans; they will become ever more closely connected to us, profoundly impacting our lives and development.
Zhang Xiaojun: Is it possible for us to leave our lives to the silicon world in our lifetime?
Zhang Yaqin: It is possible. This involves philosophical thinking and questions related to belief. I firmly believe that this possibility exists. As of now, I don’t think artificial intelligence has achieved consciousness, but I believe it is feasible to connect silicon-based and carbon-based life forms. Through this connection, humans will become smarter, healthier, and live longer. Whether we will achieve immortality, I don't know, but our lifespan will definitely be extended.
Zhang Xiaojun: So, you mean the lifespan extension will be for silicon-based life, not carbon-based life?
Zhang Yaqin: It can extend the lifespan of carbon-based life.
Zhang Xiaojun: How is that possible?
Zhang Yaqin: For example, in the future, our organs may be replaced, and the brain could become healthier through various technological means. Currently, many people die from diseases like cancer or age-related mental illnesses. But I firmly believe that, given time, these diseases could be cured, either through drug treatments or other advanced medical technologies. However, I am skeptical about artificial intelligence developing self-awareness or creating a new kind of soul.
Zhang Xiaojun: Why don’t you believe in that? Shouldn’t it be able to develop consciousness once it becomes smart enough?
Zhang Yaqin: Because we still don’t understand how human consciousness is produced, so I think this is more of a belief issue. But humans already have consciousness. Now, if we add something to our brains, it will make us smarter, healthier, and live longer—this is possible.
Zhang Xiaojun: Last year, I interviewed Vitalik Buterin, the co-founder of Ethereum. He shared an interesting perspective: he said that in the future, biological and silicon technologies will merge. He believes this is the only way for humans to participate in creating super-intelligence. Otherwise, it will just be a computer, smarter than us, dominating the world.
Zhang Yaqin: That's another topic. What I want to say is that in the future there will be a fusion of the digital, physical, and biological worlds, and silicon-based and carbon-based life will definitely merge. However, it is still carbon-based life that possesses consciousness; that is our current understanding of carbon-based consciousness. As for silicon-based life, even vast computational power and massive data have no connection to consciousness. We can't create something we don't yet understand.

What will happen to humans in the future? I think a new species will emerge. This new species will still be under human control, but its intelligence will be much higher and its abilities far more powerful. It will be an extension of humans, but ultimately a new species. If we look back at humans from 30,000 years ago, do you think they were the same species as us? I think there's a huge difference. Humans from 30,000 years ago were in the "ape-man" stage; they had begun using tools and had fire, but their tools were primitive. Today, we have smartphones, computers, and the internet. In another 100 years, future generations may look at our current phones, the internet, and personal computers and see them as primitive as the fire and stone tools of 30,000 years ago. And this speed of evolution is accelerating.

The pace of human evolution is fascinating. From the hunting era to agricultural society, the way humans lived changed little for 2,000 to 3,000 years. The real breakthrough happened in the last 300 years, after the industrial revolution. First, humans mastered energy, which gave us great power. Then the arrival of the information society brought another massive transformation. And now, humans are experiencing rapid intellectual expansion. Since the invention of the steam engine, human evolution no longer follows the natural evolutionary model described by Darwin; today's evolution is nonlinear, showing exponential growth and rapid speed. Therefore, I believe that in just 30 years, at most 100 years, future generations will look back and see how simple our current tools were. By then, the new species will be much smarter than humans, but it will still be controlled by humans, governed by our consciousness.
Zhang Xiaojun: It’s a combination of silicon and carbon; it might not just be carbon-based, right? Because relying solely on carbon, I think it would be hard to replace a species within 30 years.
Zhang Yaqin: That depends on how you define "species." For instance, when you think of apes, do you consider them a new species? From one perspective, they could be considered one, yet they are drastically different from us, right? The times are changing so rapidly that we no longer need 30,000 years like in the past. Perhaps in just 30 years, or at most 100 years, a completely new form will emerge.
Zhang Xiaojun: Can you imagine what society will be like 10 years from now?
Zhang Yaqin: Ten years from now, many of the things we imagine today will already be reality. There will be many self-driving cars on the roads; you'll see many autonomous vehicles and robots. The number of people who choose to drive themselves will drastically decrease, and many will give up personal driving altogether. Just as many young people in big cities today choose not to buy cars because services like Uber are more convenient, more people then may choose not to drive themselves.

Thirty years from now, if you see someone driving a car, it will be like seeing someone riding in a horse-drawn carriage today: a novelty. If you go to New York now, you can still see horse-drawn carriages, and taking one is quite a novelty. Seeing someone drive a car will be just as unusual; people might even need special permission to drive one. Thirty years ago, there were still attendants operating elevators. Now, do you see anyone in the elevator? No. Yet I later discovered that in the UK, some places still keep attendants for such services, as a kind of novelty.
Zhang Xiaojun: That’s a high-end service.
Zhang Yaqin: Exactly. So, in the end, driving a car may become a very special scene. It’s foreseeable that robots will truly enter thousands of households. I believe every household will have a robot, just like how refrigerators and televisions are now commonplace. These robots will be responsible for home security. When you’re not at home, they can check if the windows are closed and watch out for any intruders.
Zhang Xiaojun: Maybe in the future, the money spent on buying a car will be used to buy a robot instead?
Zhang Yaqin: Yes. I think all of this will happen, including major changes in fields like education and healthcare. AIR is already working on AI hospitals with no human attendants. Many of the attempts we are making now will become a reality in the future. However, humans themselves won’t undergo fundamental changes. I expect that in 30 years, the human lifespan will be greatly extended. In 30 years, while immortality is unlikely, living to 100 years may become common, and some people might live to 120 or even 150 years.
Zhang Xiaojun: Will this bring about structural changes in society? Can you predict that?
Zhang Yaqin: I can't predict it exactly, but I believe it will inevitably change the social structure. For example, people's working hours will likely become shorter. After the industrial revolution, we used to work 7 days a week, then it became 6. When I was abroad, China still had a 6-day workweek, but by the time I returned, it had shifted to 5 days. In some places in Europe, the 4-day workweek is already being implemented. If this trend continues, perhaps people will eventually only need to work 1 day a week, and could spend the rest of their time on things that interest them. To clarify, this doesn't mean some people will work 7 days while others are left unemployed. What I mean is that as social productivity improves dramatically and the social structure changes, society will age: with increased life expectancy, the natural birth rate will decrease. Since life expectancy will rise, the total population may not decline, but the number of newborns will; this has already become a major trend. If you observe closely, you'll see that populations in more developed countries are already in decline.

What's the current median age of the global population? Around 40 years; in the future, it might reach 80. By then, people who are 80 may still be considered young or middle-aged. Everything will change. Furthermore, the health of the elderly might also be quite good, because both physical and brain diseases will gradually be cured.