Exclusive Interview with Manus Founder Xiao Hong: The Andy-Bill Law of the New Era

Chatbots are evolving into intelligent agents: by leveraging agentic capabilities and advances in the underlying models, they are gradually overcoming old limitations to offer smarter, more efficient task handling and user experiences.

Unlike AI companies such as DeepSeek, which focus on building foundational capabilities starting from large models, Manus AI has been dedicated solely to AI applications since day one.

The founder and CEO, Xiao Hong (English name: Red), was born in 1992. After graduating from Huazhong University of Science and Technology in 2015, his first entrepreneurial venture focused on mobile internet for B2B. In June 2022, he embarked on his second entrepreneurial journey in the AI field, which has now spanned nearly three years.

In the venture capital world, Xiao Hong is widely recognized as a “founder with a strong product sense” rather than a technical-background founder like Liang Wenfeng or Yang Zhilin. So far, Manus' parent company, Butterfly Effect, has completed two rounds of funding, with a total scale exceeding $10 million. The first round was led by ZhenFund, which also backed Xiao Hong’s previous venture. Notably, ZhenFund reinvested all the profits earned from Xiao’s last project into this one. The second round attracted investors such as Sequoia China, Tencent, ZhenFund, and Wang Huiwen.

In 2023, with the help of its investors, the company brought on key team members, including Ji Yichao (former founder and CEO of Peak Labs, now Chief Scientist at Manus AI) and Zhang Tao (former product lead at Lightyear Beyond, now Product Lead at Manus AI).

Since last year, I have conducted multiple interviews with Xiao Hong at different stages, resulting in a “relay-style dialogue.” The capabilities of large models are evolving rapidly, requiring entrepreneurs to adapt with agility and flexibility to the shifting external environment. My goal is to document the thought process of an AI application founder navigating an era of technological disruption, where everything is unstable. What makes this journey so fascinating is its ever-changing nature—and the fact that it will continue to evolve.

The year 2025 may mark the dawn of an explosion in AI applications and intelligent agents (Agents). Manus has fired the first shot in China's Agent revolution. This interview captures cutting-edge insights from the frontline of the emerging "AI application boom" and "Agent boom."

This article primarily focuses on the thought process behind the creation of Manus' Agent product, presenting the founder's complete chain of reasoning.

Over the past two years, Xiao Hong has reflected on and summarized his experiences with large models, product development, and entrepreneurship. Key takeaways include:

  1. "The Andy-Bill Law of the New Era": Model capabilities are spilling over, and AI application companies can harness them.
  2. An open-source entrepreneurial idea: Anticipate the next breakthrough capability, develop applications in that domain, and position yourself ahead of the curve, ready for when model capabilities improve.

In the interview, you can also glimpse the psychological state of an AI application founder navigating a time of technological upheaval, surrounded by industry giants, and operating on an unstable foundation.

The Andy-Bill Law of the New Era

Zhang Xiaojun: Was this past Spring Festival your busiest one yet?
Xiao Hong: Quite busy. Of course, most people were on vacation, but there were still many discussions and work being done.

Zhang Xiaojun: What big plans were brewing during the Spring Festival?
Xiao Hong: We were rushing iterations for the Agent product I mentioned to you before. Unexpectedly, DeepSeek had another wave of viral, phenomenon-level publicity during the Spring Festival, which had a broad impact in various ways. We discussed it a lot and did plenty of related work.

Zhang Xiaojun: Can you elaborate on the Agent product you’re about to release?
Xiao Hong: I’d like to share an observation I recently made. Looking back, before ChatGPT, the most popular AI application was Jasper, followed by ChatGPT, then Monica (a browser plugin), and others like Doubao. Next, Cursor gained traction. What’s interesting is that AI applications are evolving rapidly, with new ones emerging each year.

I’ve been trying to summarize patterns and make predictions. Even though these are based on limited data points and may not be entirely accurate, humans are naturally inclined to identify patterns. When I analyze these applications as data points, some patterns start to emerge.

Take Jasper, for example. Many people may not have used it, but its product design allows users to write marketing content by filling in fields—who the audience is, what the theme is—and then it generates the output. ChatGPT introduced a conversational format, which is more intuitive than filling out forms.

From ChatGPT to Monica, we see applications that come with context. Doubao and Quark also fall into this category—not just chatbots, but chatbots enhanced with context. For instance, these tools can read on-screen content, emails, or other applications (with user authorization) and help users reply to emails. It’s no longer just a simple chatbot but one that brings contextual understanding.

Then we have Cursor. When it became popular, two main user groups emerged: engineers, and product-adjacent people who aren’t necessarily product managers at all. For instance, someone managing a WeChat public account might use Cursor to analyze account data—clearly not an engineer’s need.

Some also use it as a chatbot—the left side displays code while the right side acts as a co-pilot. They don’t look at the code directly; they rely on the chatbot to write code and fix issues, without manually editing anything themselves.

In a way, it’s still being used as a chatbot, but this chatbot is different from others in that it doesn’t just chat or understand context; it solves problems by writing code.

When I first saw Cursor, many teams seemed to be interpreting it as a tool for programming. That’s definitely needed since engineers are a large user group. But I personally see it as meeting general user needs.

So, what’s the pattern I’ve observed?

First, it aligns more closely with ordinary human habits. Moving from forms to conversational interfaces feels more natural. Adding context makes things even easier—for example, you no longer need to copy-paste content into ChatGPT; it already knows your context. Previously, users would copy-paste code into a Python script, run it, encounter bugs, report them back to ChatGPT, and then manually merge the corrected code. That’s tedious. Cursor simplifies this process beautifully.

One main thread is that these tools are becoming more aligned with human habits while becoming more powerful. This increased capability is tied to the spillover of LLM (large language model) capabilities.

Cursor, for instance, was founded quite early, around 2022. Initially, it wasn’t a code editor, and even after it became one, it didn’t gain traction right away. It wasn’t until mid-2024, with the release of Claude 3.5 Sonnet, that it truly became well-known.

Zhang Xiaojun: So its product evolution was still driven by advancements in model capabilities.
Xiao Hong: Exactly. The core point I want to express is this: while model capabilities are evolving rapidly, the “shell” around them also needs to evolve. Each generation of model improvements doesn’t necessarily come from the original developer. Instead, third-party companies often present these advancements in a way that creates perceptible value for users.

Without Cursor, I believe Claude 3.5 Sonnet could still write code, but the experience wouldn’t have been as smooth. I’ve come to define this as “The Andy-Bill Law of the New Era.”

In the PC and semiconductor era, there was a saying—“What Andy giveth, Bill taketh away.” Whatever performance Andy Grove’s Intel delivered, Bill Gates’s Microsoft would absorb. Thanks to Moore’s Law, Intel’s costs dropped and computing power increased roughly every 18 months; on the same rhythm, Windows absorbed those advancements by offering more graphical, more powerful features.

LLMs are continuously evolving. We’re seeing cheaper, more powerful models. Initially, these models excelled at basic writing, answering questions, and information retrieval. Now they’re using tools, writing code, and calling APIs. Recently, OpenAI introduced Operator, allowing its models to use a browser. These are clear signs of model capabilities spilling over.

What about the “shell”? While original developers define it to some extent, entrepreneurs also play a key role. Claude is a great example. Everyone knows Monica didn’t create its own foundational model. During the Spring Festival, I was reading semiconductor-related books, including Morris Chang’s autobiography. AMD founder Jerry Sanders once said, “Real men have fabs!”—a diss against companies with only design capabilities but no manufacturing capabilities.

But as Morris Chang pointed out, TSMC created two industries: professional chip manufacturers (like TSMC itself) and professional chip design companies. Without TSMC, the division between design and manufacturing wouldn’t exist.

Zhang Xiaojun: So, you’re a design company.
Xiao Hong: Exactly. Looking at industry trends, vertical integration usually comes first, followed by segmentation. From the beginning, we believed that models would eventually become commoditized. This is becoming more evident, though it’s still risky for those without this capability to draw conclusions. Models are advancing rapidly. On the surface, it may not seem like commoditization is happening because there are always more advanced players. But from a long-term perspective, I think it will.

Our company’s choice is to focus on applications since models are rapidly advancing, and there are many players. In the long run, rather than stagnation, I see multiple players achieving similar levels of performance. At that point, focusing solely on applications becomes simpler, as we don’t need to invest heavily in training models.

Of course, we have great respect for companies developing models. This wave of progress is largely due to their innovation and efforts. But it’s not a zero-sum game between “doing applications” and “doing models.” Even as model companies progress, there’s still a need for companies focused on user and product perspectives. It’s not an either-or situation.

We maintain good relationships with model companies.

Predict the Next Model Capability and Build Applications Ahead of Time

Zhang Xiaojun: Where do you think product definition stands today compared to model capabilities? Which one is stronger?

Xiao Hong: Based on the narrative so far, all breakthroughs are driven by the models—models are always ahead, leading the way.

You’ll notice that the original developers often can’t anticipate the scale of their own success—OpenAI likely didn’t foresee what ChatGPT would become at release. Similarly, when DeepSeek was launched, I believe its impact exceeded expectations; the original creators were largely unprepared.

On the other hand, specialized or application-focused companies often only find product-market fit (PMF) after the models are released. Take Cursor, for instance. Its product existed earlier but initially relied on OpenAI models.

Zhang Xiaojun: So essentially, they wait for model capabilities to improve?

Xiao Hong: Exactly!

Here’s an open startup idea: predict the next big capability, and build an application for it now. When that capability materializes, you’ll already be ahead of the game.

If you wait until the capability is fully realized to start building, it’s too late. Someone else—whether due to belief in the potential, deeper understanding of foundation models, or a focus on specific domains like programming—will have already taken that step.

This creates an interesting dynamic. It’s also becoming more challenging for venture capitalists to evaluate opportunities. A product built around capabilities that aren’t ready yet might seem clunky or unimpressive today. But as soon as the model catches up, it could suddenly become highly effective—and commercially successful.

Its growth trajectory is more like a leap than a gradual curve.

Zhang Xiaojun: Right now, model capabilities lead the way. When models are released, the original developers might not be fully prepared. As an application company, you can position yourself to capitalize on these advancements, leading to explosive growth.

But original developers control the models. They can quickly follow up and create their own products, similar to what Cursor has done. How do you see this competition playing out between the two?

Xiao Hong: Today, most domestic and international model companies have both their own applications and open platforms. They typically provide a chatbot and offer APIs for third parties to call on their capabilities.

The question here is: how do we understand AGI? It likely has a public good attribute, meaning original developers won’t aim to do everything themselves.

  1. They can’t cover everything. For example, tasks in specific industries usually fall outside their scope.
  2. They may avoid highly labor-intensive tasks. When I visited Google and saw people walking their dogs at 4:00 PM, I thought, “They probably won’t assign 100 engineers to compete on very specific applications.”

For them, it’s better to focus on securing the most valuable part of the ecosystem and leave other tasks to third parties.

Zhang Xiaojun: They can still monetize through APIs.

Xiao Hong: Exactly.

Zhang Xiaojun: So as an AI application founder, you shouldn’t focus on the most lucrative opportunities.

Xiao Hong: Correct. However, there may still be a window of opportunity. While original developers might eventually address these gaps, in the short term, they might not have the bandwidth to do so. For entrepreneurs, this presents a dilemma—some will choose to pursue these opportunities, while others will pass.

Opportunities during these windows require clear strategy. What level of success do you need to achieve within the window? How do you prepare for the next phase? These are much more complex questions than they appear.

To summarize the potential opportunities for API-based businesses:

  1. Vertical or specialized domains: Original developers likely won’t address these areas.
  2. Labor-intensive work: Tasks that require significant engineering effort may be left to others.
  3. Window of opportunity: Areas where original developers might step in later but haven’t yet.

The complexity lies in the uncertainty of whether the original developers will act. If you perform exceptionally well, they might not intervene at all. This isn’t something you can infer logically.

You can’t assume they’ll definitely act—or that they won’t. In some cases, a leading third-party solution may emerge, and the original developers might decide it’s unnecessary to compete.

The current landscape is highly flexible, with no fixed answers.

Zhang Xiaojun: On the flip side, there are areas the original developers are almost certain to tackle—like chatbots, which seem to be universally pursued.

OpenAI’s Missed Opportunity, DeepSeek’s Breakthrough as Humanity’s First Glimpse of Thought Processes

Zhang Xiaojun: Why is everyone rushing to make chatbots? What’s the competition about?

Xiao Hong: I wouldn’t frame it as a competition. It’s more that chatbots align closely with our vision of AGI—a conversational interface capable of handling everything.

It seems insufficient to simply create a model. At the very least, you need a chatbot. Interestingly, among all the players, DeepSeek appears the least driven to build a chatbot. Yet, to date, it has achieved the best results.

Zhang Xiaojun: Why is that?

Xiao Hong: When I say “least driven,” I mean that DeepSeek only launched its own app in December, even though it had a web version earlier. If you look at the app, it’s essentially a “bare-bones” shell—just the simplest possible interface to showcase the model’s capabilities.

Zhang Xiaojun: A shell built around its own model?

Xiao Hong: Exactly. But here’s another perspective: if DeepSeek hadn’t done this, its influence and spread wouldn’t have been so significant. Many users were able to experience its app, see the thought process unfold, and enjoy a massive improvement in user experience. This led to widespread attention—though none of it seemed premeditated.

This is a complex matter, with many contributing factors, including geopolitical contexts between China and the U.S., as well as the open-source versus closed-source debate.

What struck me the most is how this incident has spiritually encouraged people. From an outsider’s perspective, perhaps this wasn’t DeepSeek’s intention, but it’s clear that their approach—being themselves—has resonated deeply.

DeepSeek has always adhered to its own pace. It was committed to open-sourcing even before gaining traction, simply doing things its way.

I recall a conversation we had over WeChat, where you asked me what I would do if I were running another foundation model company. After some thought, I realized that the most important thing is to stay true to yourself, rather than react impulsively to external pressures.

Of course, the founders of these foundation model companies have achieved far more than I have. But hypothetically, if I were managing such a company, my priority would still be sticking to my own rhythm. You’ll notice that simply excelling in technology often brings immense rewards—even though this is not always replicable.

Zhang Xiaojun: Could you share your thoughts on the DeepSeek product?

Xiao Hong: Perplexity’s CEO once tweeted that there are two significant innovations in user experience during the AI era:

  1. Highlighting the source of a statement, which enhances trust in the results.
  2. Displaying the reasoning process of an LLM (large language model).

Setting aside open-source or technical metrics, I’ve chatted with friends back home who feel the standout difference with DeepSeek lies in its ability to show its thought process. This is a breakthrough in user experience.

OpenAI’s o1 model also has reasoning capabilities, but OpenAI missed an opportunity—DeepSeek’s display of thought processes might be the first time humanity has truly seen this. Why? Because OpenAI’s o1 comes with a paywall.

Only recently did OpenAI start displaying its reasoning process fully, and even then, it’s simplified. OpenAI hesitated because they feared others might use the displayed data for their own model training. As a result, their approach seemed uninspiring.

  1. First, OpenAI’s paywall creates a barrier—many people are unaware of these features.
  2. Second, their partial display of reasoning falls short of full transparency, making the experience less impactful.

In contrast, DeepSeek’s full display of reasoning marks a significant leap in user experience. Plus, it’s internet-connected, while OpenAI’s o1 wasn’t.

Additionally, DeepSeek’s model quality is excellent. Users accustomed to average models found themselves encountering top-tier experiences with DeepSeek. Its articles are much better, its conversations more emotionally intelligent, and its overall innovation highly perceptible without being obscured by barriers.

Zhang Xiaojun: OpenAI defined product iterations in two stages:

  1. Chatbot: Through ChatGPT, it established the user paradigm for chatbots.
  2. Reasoner: It developed o1, but users didn’t fully grasp the connection between the technology and the product. Instead, DeepSeek demonstrated the power of a reasoner.

Also, we recently compared DeepSeek-R1 and Kimi K1.5 papers during a podcast. Kimi deliberately shortened its responses for a concise user experience, whereas DeepSeek’s outputs are longer.

People used to think shorter outputs were better from a user experience perspective, but DeepSeek’s approach is counterintuitive—its longer outputs, coupled with visible reasoning, resonate more with users.

Xiao Hong: Exactly. When we recently worked on our own agent product, we noticed a similar trend.

We had to teach it: “Don’t overuse bullet points. Don’t summarize too much. Just write everything out in full.” It’s fascinating. Initially, we thought being concise was preferable. But as users grew accustomed to chatbots’ brief outputs, they started asking for more details and explanations.

Zhang Xiaojun: So, in summary, DeepSeek excels not only in technology but also in product definition, right?

Xiao Hong: I would phrase it this way: DeepSeek effectively translates its technological innovations into user-perceivable features. This goes beyond product design—it includes strategies like offering free access, which might traditionally be seen as a business or operational tactic rather than a product strategy.

OpenAI’s hesitation to fully display o1’s reasoning seems less user-focused—it’s more about concerns over competition. DeepSeek, on the other hand, operates with a refreshing simplicity: “Here’s what we’ve built; try it out.”

Zhang Xiaojun: What are your thoughts on DeepSeek’s longer-term strategic positioning?

Xiao Hong: I’m not sure. DeepSeek might still be deciding whether to focus primarily on open-sourcing its models or transforming itself into a major consumer-facing product like OpenAI.

I’m uncertain about their internal deliberations, but they’re undeniably in a great position today.

Whether DeepSeek chooses to create a super app as its ultimate goal often depends on the decisions of its leadership. For instance, if OpenAI had Ilya instead of Sam Altman as CEO, ChatGPT might not have become what it is today.

These dynamics are fluid.

Regardless, DeepSeek’s model advancements and the global reception of its app—while perhaps not replicable—are undeniably inspiring.

There Should Be a Virtual Machine

Zhang Xiaojun: We were just talking about what the original developers will do. It seems they’re all focusing on chatbots.

Xiao Hong: Of course, chatbots themselves are still rapidly evolving. If you look at how humans imagine the future, the so-called AI Assistant is something where you just say a sentence and it helps you do things or find information. This aligns with human expectations, and the original developers are definitely going to work on it.

Zhang Xiaojun: What about application companies?

Xiao Hong: Well, there’s some debate. Some investors and founders hold this view: “Chatbots are definitely something the original developers will build, so we won’t make them—we’ll avoid it.” I’m not that pessimistic. Technology is still developing quickly, so is it too early to say it can’t be done? Honestly, we’ve recently launched an Agent product, and it might look like just a chatbot, which fits the general imagination. But what it does on the application side is much more complex—and this complexity isn’t like Monica’s long list of “functions.” Using these models well is genuinely complicated, so I think it’s worth trying. It’s like the third type of opportunity I mentioned for API-based businesses: even if it’s only a window of opportunity, it’s worth trying.

Zhang Xiaojun: The definition of an Agent is "a large language model that interacts with the external world." Is it fundamentally different from the products we’ve seen before?

Xiao Hong: The concept of an Agent existed in 2023, but it hasn’t yet fully evolved into a product that people can really experience. Maybe Zhu Xiaohu would say, "I don’t believe in Agents," just like he once said, "I don’t believe in AGI." Maybe by this time next year, he’ll say, "I believe in Agents!" (laughs) I hope so. An Agent can sense the environment and perform tasks autonomously. You give it an abstract sentence, and it can complete the task. In recent years, the main reason Agent products have fallen short of expectations is that the models aren’t smart enough to automatically handle many tasks. Secondly, when we were developing Monica, we said, "Okay, it can perform tasks, so we’ll connect it to some APIs one by one." Just having a language model isn’t enough; it needs to be able to search the web, so we’ll connect it to a search API. It needs to read knowledge bases, so we created a knowledge base where users can upload documents and search it for answers. It needs to create PPTs or draw pictures, so we worked on integrating various APIs. But do you notice? This approach feels like building a feature phone—each feature stacked on top of the other.

Zhang Xiaojun: Stacking features.

Xiao Hong: So when I was developing Monica, honestly, even though it integrated many things and took a lot of time, it felt like we were creating a feature phone, a basic phone—just connecting APIs one by one. But a real Agent should be able to write code, call APIs, and execute tasks autonomously. It can handle many long-tail tasks without requiring developers to write anything—that’s the true vision for an Agent. I remember a senior person, Bai Ya (founder of Youzan), once told me, "Red, perfection isn’t enough. Personalization is what matters. When you achieve perfection, you’re like hao123. When you achieve personalization, you’re like Google." This is a very enlightening statement, and we’ve spent a lot of time thinking about it.

When I see companies like Cursor making it easier for people to write code, coding is becoming more mainstream—not just for engineers. Interview engineers today and they can point out flaws; the more experienced the engineer, the more flaws they’ll find. But for beginners, it’s becoming more enjoyable to use.

I also remember that when Cursor came out, there was a competitor called Windsurf, which had one feature Cursor lacked. Cursor later added a YOLO mode—a very interesting name, Y-O-L-O, the kind of thing young people say. Here it stands for “You Only Look Once”: you only need to see the process once, and it takes care of everything for you. What’s the difference from before? Previously, when you ran Python code and your computer was missing a library, it would throw an error. With YOLO mode, if there’s an error, it automatically sends it to the language model to fix.

One day, I was using Windsurf’s YOLO mode and gave it a task. It said, “Okay, I’ll go to GitHub to download this code and do something,” and then went ahead and wrote the code. At that moment, I felt like I was struck by lightning! It was using tools—human tools, like the code available on GitHub, which in theory contains code for anything. It can create and use all tools. That’s when I truly felt the Agent era had arrived.
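The loop described here—run the code, and on failure hand the traceback back to the model instead of the user—can be sketched in a few lines. Everything below is illustrative: `llm_fix` is a hypothetical stand-in for a real model call, and this is not Cursor’s or Windsurf’s actual implementation.

```python
import subprocess
import sys
import tempfile


def llm_fix(code: str, error: str) -> str:
    """Hypothetical call to a language model that returns repaired code.
    In a real system this would be an API request to an LLM provider."""
    raise NotImplementedError


def run_with_auto_fix(code: str, max_attempts: int = 3) -> str:
    """Run a Python snippet; on failure, feed the traceback back to the
    model and retry -- the loop behind a YOLO-style auto-fix mode."""
    for _ in range(max_attempts):
        # Write the current version of the snippet to a temp file and run it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout      # success: hand the output to the user
        # Failure: the error goes to the model, not to the user.
        code = llm_fix(code, result.stderr)
    raise RuntimeError("could not repair the snippet automatically")
```

The point of the design is that the user only "looks once": every intermediate error is routed back into the model rather than surfaced as a question.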

Zhang Xiaojun: Will the original developers or application companies build Agents?

Xiao Hong: It’s worth it for the original developers to do, but so far they don’t seem to be doing it well enough. (laughs) So, I think an Agent should be able to solve long-tail needs and call various tools. The best tools to call are existing code and code it generates itself, and it uses APIs to get things done. But that’s not enough: humans still have a lot of knowledge and services that can’t be reached through APIs and must be accessed through the web. In overseas markets, I think a browser is still necessary.

There’s another fundamental difference. When I used Windsurf, it ran on my computer, but it would sometimes ask me to confirm—yes or no—whether to install a certain library or run a command-line operation that might mess up my computer or cause a conflict. It felt like it was shifting the responsibility onto me. If I’m a beginner, how would I know whether to answer yes or no? Yet if something goes wrong after I say yes, the responsibility is now mine. A few years ago, someone asked Bill Gates, “Why does Windows ask me whether opening this might harm my computer?” If Microsoft doesn’t know, how can a regular user? So when I saw that, I thought: there’s no “You Only Look Once” here—you still have to confirm with a yes, and a beginner or regular user wouldn’t understand what they’re confirming.

So I think, “Okay, it’s still not enough. There should be a virtual machine.” The chatbot should run on a cloud-based computer where it can write code and browse the web. Because it’s a virtual server, it doesn’t matter if it breaks—it can create a new one, and it can even release the server after completing the task. So I believe the architecture should be a virtual server, a browser, and the ability to write code and call APIs. That would let it handle all kinds of long-tail tasks, and this is what we’re working on.
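The architecture sketched here—a disposable cloud machine the agent can freely break—can be illustrated with a minimal stand-in. A temporary directory plays the role of the virtual server below; the class name and API are assumptions for illustration, not Manus’s actual design.

```python
import shutil
import subprocess
import sys
import tempfile


class DisposableSandbox:
    """Stand-in for the cloud virtual machine described in the interview:
    the agent gets a scratch environment it can freely break. A temp
    directory plays that role here; a real system would use a VM or
    container."""

    def __init__(self):
        self.root = tempfile.mkdtemp(prefix="agent-")

    def run(self, command: list[str]) -> subprocess.CompletedProcess:
        # No yes/no prompts: the agent acts on its own, because the worst
        # case is throwing the sandbox away and creating a new one.
        return subprocess.run(command, cwd=self.root,
                              capture_output=True, text=True)

    def destroy(self):
        # Release the environment once the task is done.
        shutil.rmtree(self.root, ignore_errors=True)


sandbox = DisposableSandbox()
result = sandbox.run([sys.executable, "-c", "print(2 + 2)"])
print(result.stdout)      # the agent inspects the output, then continues
sandbox.destroy()
```

Because the environment is throwaway, the confirmation prompts that burden a beginner on a local machine simply disappear.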

Zhang Xiaojun: Is the model capability ready now?

Xiao Hong: Today, it’s just about right. We only realized it toward the end of last year—everything is starting to connect. Watching Cursor use code from GitHub, and tracing the arc I described—from Jasper to ChatGPT, to Monica, to Cursor—what’s next? You’ll notice each generation consumes more and more tokens and gets more advanced. For example, it can write code beyond what an ordinary person can. It can also run multi-step iterations: you give it a task, it might hit problems, but it keeps trying to solve them. Today’s models are good enough that we’ve built a basic version that works. But they’re still not smart enough. When OpenAI released Operator and similar tools, the completion rate on many tasks was still not high. Models need to be smarter. So I’m still expecting the model makers’ offerings to get smarter and cheaper!

Application companies are ruthlessly consuming tokens—eating them up in huge amounts.

It Should Be Asynchronous

Zhang Xiaojun: Does your product resemble Cursor?

Xiao Hong: No, it shouldn’t resemble Cursor.

Zhang Xiaojun: Is it a programming-related product?

Xiao Hong: No, no, it's not about programming. It should feel like a chatbot to the consumer. The consumer experience is key.

Zhang Xiaojun: Will it evolve from Monica?

Xiao Hong: It will be released as a new product. You can't imagine Hao123 and Baidu being the same product. (laughs)

Zhang Xiaojun: It’s still a chatbot-like product, but with many iterations in the product?

Xiao Hong: Yes.

Zhang Xiaojun: Does it resemble DeepSeek?

Xiao Hong: Reasoning, showing the thought process, is essential. We might call it a "Planner." For example, if you give it a question, it breaks it down into several steps, completing each step before solving the final task. This introduces two key aspects: one is the model, and the other is the experience.

Let me talk about the experience first. It should be asynchronous. Today, all chatbots operate synchronously — you send a message, and it replies instantly. However, during its reply, if you send another message, it may interrupt the previous conversation. This is not how human communication works. When you message someone, they might reply after a while. During that time, you might send two or three more messages, and the person would respond to multiple topics at once. Or if someone is working on something, and they realize they made a mistake, they might tell you they are starting over. Human communication involves many branches, and tasks sometimes take time.

All current chatbots are limited by this synchronous A-B-A-B model. But humans don’t communicate like this. You ask a question, and the other person doesn’t always reply immediately. They might take time to research or think about the answer. So, current chatbots, which try to complete everything in a single step, face many limitations.

Zhang Xiaojun: Your product would allow me to give it a task, and it would reply after a period of time?

Xiao Hong: It will tell you what steps it needs to take. Then, it will carry out those tasks, keeping you updated with progress — this is our ideal of the best intern. If you ask it to do something, it will say, "Okay, I’ll do it this way," and after each step, it will update you. If you realize your original request was wrong, you can ask it to adjust, and it will make the change and keep working until the final result is achieved. It’s more human-like.
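The "best intern" loop just described (announce a plan, execute it step by step, report progress after each step) can be sketched roughly as follows; the hard-coded plan stands in for what a real agent would obtain by asking a model to decompose the task:

```python
from typing import Callable

def plan(task: str) -> list[str]:
    # Hypothetical planner: a real agent would have a model break the
    # task down; fixed steps keep this sketch self-contained.
    return ["search for sources", "read and summarize", "draft the report"]

def run_agent(task: str, on_progress: Callable[[str], None]) -> str:
    steps = plan(task)
    on_progress(f"Okay, I'll do it this way: {steps}")  # announce the plan
    results = []
    for step in steps:
        # ... the actual work for this step would happen here ...
        results.append(f"done: {step}")
        on_progress(results[-1])  # keep the user updated after each step
    return "; ".join(results)

updates: list[str] = []
final = run_agent("analyze a market trend", updates.append)
print(final)
```

The `on_progress` callback is where the user could inject a correction between steps, which is what makes the interaction feel human-like rather than a single blocking request.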

Zhang Xiaojun: Today’s products can respond instantly. This new product must handle more complex demands for me to give it more time. How can it complete these more complex tasks?

Xiao Hong: This is what I want to talk about. Today’s models can already handle complex tasks that require multiple steps to complete. However, no such product has been released yet, so people haven’t experienced these capabilities firsthand. Let me give an example. When ChatGPT launched Deep Research, there was a test set called GAIA, and we were working on the same topic internally. One question asked about a specific moment in a YouTube video: what animals were visible in the frame at that time? We were amazed to find that our agent opened YouTube, watched the video, used keyboard shortcuts to jump to that exact second, and then told us which animals appeared. This process is very different from a traditional chatbot. First, it actually watched the YouTube video instead of reading the subtitles. Second, to our surprise, it used YouTube’s keyboard shortcuts to answer the question.

Zhang Xiaojun: From the perspective of a beginner user, what kind of task can I give it? How long does it take to fulfill?

Xiao Hong: For example, if you want to analyze patterns in Elon Musk's Twitter posts, you can ask it to do so. It might call the Twitter API, retrieve all the data, and perform semantic analysis to give you a reasonable output. These are relatively advanced tasks. People don’t typically use chatbots for such things, but today many tasks can already be completed. Information retrieval and queries will still be the most common tasks. Whether you use a chatbot for search or chatting, those needs won’t disappear. They will coexist with Agent products, which expand the boundaries of chatbot usage, aligning with people’s expectations.

Zhang Xiaojun: If the chatbot evolves further and starts doing things like this, will we need a second or third entry point for it? Should there be a dedicated asynchronous tool?

Xiao Hong: The reason we’re launching a new product is because we felt the need for it. Monica is a product with many users, and it inevitably retains habits from earlier users. You can’t just start from scratch. A newer product without those burdens is better suited to adapt to these new possibilities.

Zhang Xiaojun: What do you expect the usage frequency of this product to be?

Xiao Hong: I don’t know. We haven’t released it yet. (laughs)

Zhang Xiaojun: How did the idea for this product come about?

Xiao Hong: The process was a reflection on the evolution from Jasper to ChatGPT, and from Monica to Cursor to Devin. Devin aligns perfectly with the architecture I just described, but it targets the most hardcore engineers. I prefer a general-purpose approach, not one focused on a specific industry; it’s not meant just for engineers. This architecture fits my vision for an agent: it should be for regular users, and it shouldn’t be priced like Devin, at $500. The pricing should be closer to how OpenAI priced their products. (laughs)

Zhang Xiaojun: Once a company starts charging fees, it can always bring down the price to attract more users.

Xiao Hong: Pricing is part of the positioning. What is the positioning? We believe it’s a consumer-grade, mass-market product, so the pricing should reflect that, at least for the entry tier. As usage grows, users can pay more to cover the increased costs, but the base price determines whether you’re a consumer-grade or enterprise-grade product.

This also matters for model vendors. Right now, only Claude 3.5 Sonnet from Anthropic can run the architecture we just discussed; internally, we call this "Agentic" capability. Traditional chatbots try to resolve everything in one conversation, but we found that only Claude 3.5 Sonnet has the long-term planning and step-by-step problem-solving this architecture needs: given a task, it lays out a plan in steps, completes each step, and moves on to the next. That ability is the result of a different training approach, so model vendors need to train their models specifically for Agentic capabilities.

Zhang Xiaojun: At the beginning of the year, everyone was talking about Agent, Agent, Agent, but now, it seems the hot topic is Reasoner. (laughs)

Xiao Hong: Yes, but it’s too late to chase after it once it’s already popular. You must Be Yourself and set your own pace — reacting too late is not ideal.

Zhang Xiaojun: I imagine that by Q1 this year, we’ll see your product. It will be an app, right?

Xiao Hong: There will be a web version and an app. We might start with some small-scale testing. One of the key roles of a product manager is to manage user expectations. If we claim the product can do everything — like "How can I make $1 million?" — that’s not something an agent should be responsible for. But if we provide more concrete examples, we can align expectations and make the experience smoother for users.

Source: https://mp.weixin.qq.com/s/FI19Sc6GA8eXC4zzbdMTlg