Author: Large Model Task Force | Email: damoxingjidongzu@pingwest.com
Ethan Mollick, a Wharton School professor and leading AI researcher, engaged in a frank and extensive conversation with Joel Hellermark, founder and CEO of Sana, about the rapidly changing world of AI in the workplace. They explored how AI is not just an efficiency tool but a turning point, one that forces companies to choose between incremental optimization and transformational scale. The discussion covered the roots of machine intelligence, the relevance of AGI, and how to build an organization for an AI-native future from the ground up.
Below is the full transcript of the conversation:
Joel Hellermark: I'd like to start from the beginning. What were your thoughts when you were at MIT with Marvin Minsky and others?
Ethan Mollick: That period was a bit like stealing technological glamor, because I wasn't the one programming with Marvin. I was a business school student trying to help people in the AI field explain what AI was to others. So I worked a lot with Marvin and others at the Media Lab on that.
What was particularly interesting was that it was in the middle of an AI winter, so people weren't paying much attention to AI, and researchers were all trying to devise complex solutions for intelligence. There were projects that observed every behavior of babies in the belief that this might let us create AI, and there was Minsky's “Society of Mind,” all sorts of complex, interconnected approaches. It's ironic that the actual solution turned out to be feeding massive amounts of language into learning systems, and then large language models emerged.
Joel Hellermark: That's interesting, because many of the technical ideas eventually proved to be wrong, yet some of the core philosophical ideas are now back in vogue. Engelbart advocated for augmenting human intelligence, while Minsky was more inclined to replace human intelligence with machines and to make machines conscious. What foundational ideas about how to apply AI do you think are still relevant today?
Ethan Mollick: We are all grappling with this now, because we have seen these results and have returned to the question of “augmentation.” Two weeks ago, a new paper reported that GPT-4.5 can already pass a three-party Turing test. In fact, in 70% of cases, people mistook the AI for the human in the room. I don't know what that means, but it's better than chance.
I think we are facing issues that thinkers have long worried about: Will AI replace humans? How should we use it? If the goal is augmenting human intelligence, what exactly does augmentation look like? This has become a big question. I don't think the kind of questions we're discussing here were explored deeply before, because they were still somewhat hypothetical. So how should we treat these very intelligent but also limited machines? What role should humans play in this equation? I feel this question has never been answered, but now it has suddenly become very important.
Joel Hellermark: The Turing test was a great idea at the time. But if we were to design a new “Mollick Test” now, what do you think the “Mollick Test” for AGI should be like?
Ethan Mollick: I've always been confused by the concept of AGI; it's very vaguely defined. The Turing test is interesting for the same reason other tests are: they work well as long as nothing can pass them. The Turing test was brilliant when computers clearly couldn't pass it. We face similar problems elsewhere: AI performs excellently on all our existing creativity tests, but those tests were designed for humans, and they are only mediocre measures even for humans.
Now we are giving AI the tests we use to judge whether someone has empathy. In the social sciences, the best such test is the “Reading the Mind in the Eyes” test, where we show people a set of pictures of eyes and ask them to name the emotion of the person in each picture. None of these tests were designed for AI. So I often ponder this question, and I tend to look at it from a practical application perspective.
First, everyone has their own test for AGI. I'm a business school professor, and for me one of the simplest tests is whether an agent can make money and get things done in the real world. Another useful test: can it discover new knowledge, verify it, and produce results? But I think we're starting to realize that AGI will be a phase we are in rather than a specific point in time, and there won't be fireworks to announce its arrival. Tyler Cowen said o3 is AGI. When asked why, he said it's like pornography: you know it when you see it. So we don't know the answers to these questions, and we're even starting to realize that these questions are meaningless.
Ethan Mollick: Because it turns out, as you've learned, if AI is properly connected to systems and company processes, the results are much better than the sum of their parts. This is completely different from just having conversations, like making strategic decisions.
Joel Hellermark: When these models are released, they are always tested on the most hardcore math and science problems, rarely involving more business applications. If you were to define a benchmark that focuses more on practical company applications, what would it look like?
Ethan Mollick: I think this is one of the most critical issues we face right now. Because the people in labs are scientists, and they think the only meaningful thing in life is programming, plus they want to use AI to develop better AI. So programming and math become important skills, followed by biology, because they all want to be immortal, and that's how this trend formed.
But in other areas there are almost no benchmarks. We know AI companies build their models to score well on benchmarks, sometimes using questionable optimization methods, yet they also rely on those same benchmarks for testing. So the lack of good business benchmarks is a real problem, and I actually advocate that companies build some of their own.
Some can be based on hard data, like how often the model makes mistakes when processing accounting procedures. Others can be based on subjective judgment: you can invite external experts to evaluate the quality of the answers and see whether they are as good as a human's. Set your own Turing test for each important part of your work. Is the analysis report good enough? What is the error rate? If you use it to provide strategic advice, what is the quality of the decision options? These questions are not difficult to measure, nor are they highly specialized, but they do require some effort.
Joel Hellermark: I think products are also very lacking in this regard, especially when deploying agents. The ability to test these agents, understand what knowledge they have and lack, correct them, and run test sets is very limited. When we consider designing an AI-first organization, assuming you have a thousand-person company, how would you restructure it to be fully evaluable?
Ethan Mollick: First, redesigning an organization to be AI-centric is not easy, because it wasn't originally built that way. We are in a very interesting phase. For hundreds of years, organizational development has paralleled the Industrial Revolution and the communication revolution. The first organizational chart was drawn in 1855 for the New York and Erie Railroad, solving an unprecedented problem: how to use the telegraph to coordinate large volumes of traffic on railway lines in real time. Daniel McCallum came up with this organizational chart solution, which we still use today. Later there were many other significant organizational innovations, such as Henry Ford's production line, the time clock system, which we still use, and Agile development models.
All these models are predicated on there being only one form of intelligence: human intelligence. Human capacity is limited; the span of control is usually five to seven people, the same constraint behind the “two pizza rule” for team size. But now the situation is different, and we need to rebuild from scratch. I'm a bit worried that modern Western companies have given up on organizational innovation.
In the past, Dow Chemical or IBM won by coming up with new sales methods or new ways of collaborating with other organizations. But now we outsource all of that. Enterprise software companies tell you how to structure your company: Salesforce sells you its product and teaches you how to sell, and large consulting firms tell you how to run your organization. Now is the time when leaders truly need to innovate.
Returning to the original question, redesigning an organization requires considering a trend where the need for humans in products will decrease. Then you have to choose whether to augment human capabilities or replace humans, and then build the system from that perspective. Is it to have fewer people doing better work, or to have more people doing more work, and together conquer the world?
Joel Hellermark: Does this mean we shrink down to a small number of so-called “super employees”? Or will we double everyone's productivity? Is it about forming small teams that oversee agents and become dramatically more productive, or about deploying AI widely throughout the organization so that everyone gets a more modest improvement?
Ethan Mollick: I think these are critical choices. One thing I'm worried about is that, based on early applications, people view AI as an efficiency-boosting technology. I'm partly responsible for this; our earliest research focused on the productivity gains brought by AI, and I still focus on that because it's important. But I'm very concerned that, on the cusp of a new industrial revolution, companies are treating AI as an ordinary technology.
For example, if they use AI to improve customer service efficiency by 25% or save costs, they'll lay off 25% of their employees. I often hear about this, and it carries many risks. One risk is that, apart from yourself, no one knows how to deploy AI in your organization. You can develop very useful tools and technologies, but ultimately, the people in the company have to judge if they are useful. They have relevant experience and evidence to make that judgment. If they are afraid to try AI because they'll be fired, punished, or replaced for using it, then even if AI can improve efficiency, they won't let you see it.
Another issue is that if we are on the verge of a productivity explosion, it would be unwise to shrink the organization to its minimum size at this time. Imagine the Industrial Revolution in the early 19th century: if a local brewer got steam power, he could choose to lay off most of his employees and increase profit per barrel of ale; or he could emulate Guinness, hire 100,000 people, and expand globally. I am indeed worried that too many people are choosing the small-scale path instead of the large-scale one.
Joel Hellermark: You have always advocated for augmenting human capabilities, in the spirit of what we used to call “bicycles for the mind.” Now we might have “airplanes for the mind” to some extent. How do you think AI will augment human intelligence? This is somewhat different from our previous understanding. We used to think that AI would start with monotonous, repetitive tasks, then move to knowledge work, programming, and finally creative tasks. But in reality, the situation is almost the opposite: AI performs well in creative tasks and knowledge work, but monotonous, repetitive tasks are very difficult to automate. How do you think we should apply AI?
Ethan Mollick: It's very interesting. We used to imagine that if you tried to explain the concept of “love” to an AI, it would “crash,” unable to understand. But now we have these strange systems that are very emotional and need to be persuaded to do things. For example, in prompt engineering, sometimes you have to explain to the AI why it should do a certain step instead of just ordering it to. You have to tell it, “This is important, you should do this.” It's very strange.
Speaking of augmenting human capabilities, our work consists of many different tasks; no one would design a job the way it is now. For example, as a professor, I do many things: I have to be a good teacher, come up with good ideas, communicate with you, do research, manage academic departments. Many of these tasks can be given to AI. I don't mind letting AI grade homework if it helps. I also don't mind providing more consulting support through AI if it helps.
So augmenting human capabilities doesn't mean that because AI can perform creative, knowledge-based tasks, it's better than humans in those areas. At least for now, it hasn't reached expert level in those areas. The things you are best at, you might do better than AI. So, the first step to augmenting human capabilities is to hand over the parts of your job you're not good at to AI. The second step is to use AI to improve what you are already doing. We are starting to have some evidence to support this as well.
Joel Hellermark: What happens when these systems become more proactive rather than passive? Currently, we rely heavily on feeding information to systems, getting feedback from them, and prompting them. At some point, we should have systems that are better at asking questions than we are, and they can actively serve us. Taking your field as an example, has it ever happened that a system completed all your research for you and then said, “Ethan, these align with your research direction, I've written five papers, choose the best one”?
Ethan Mollick: Several points you mentioned are very important. One of them, though relatively minor, is also critical: the situation where the system hands me ten papers. The problem we face now is an abundance of information, even an excess. We are not yet accustomed to obtaining large amounts of information easily and having to filter it. So the ability to filter, to select the right content from many options, becomes very important. This is somewhat like a management skill, and it is one many people will need to develop. The key is how to guide the system in the direction we want it to go.
But ultimately, we are unsure how good these systems can become, and every question depends on your expectations for AI development. If AI can perform all the work in our organization at a high level, such as my work as a professor, then we enter unknown territory, and I don't know the answer. I think actual organizational operations are much more complex than we imagine, and they don't always pursue efficiency.
AI's capabilities also have limitations; it might not be able to complete an entire paper because certain parts will fail. But if I have experience, I can know where it will fail and intervene and adjust, just like advising a Ph.D. student. So I think for a long time, we will still need to provide direction and guidance, and autonomy will still be limited.
Joel Hellermark: I think an understanding of AI's capability limits might be what is most lacking in current organizational applications. Interacting with these systems is confusing; sometimes they perform brilliantly, sometimes very foolishly. This also makes it very difficult to deploy AI autonomously in organizations. It's a bit like autonomous cars: deployment has taken a long time because they perform at a superhuman level in some situations but run into problems in others. What do you think the adoption of autonomous agents will look like? Will it be hindered by capability limitations over the next decade, or will we quickly come to trust these systems?
Ethan Mollick: I think domain-specific agents are already doing quite well. For example, the deep research agents launched by Google, OpenAI, and X, while also confusing, are excellent. They do a great job of finding information and providing answers for specific tasks, which is very valuable work. However, they are not yet perfect; for example, they cannot access the private data people need to fully use these systems. But they have started to perform very well in areas such as legal research, accounting, market research, and financial research, so assigning some complex, specific tasks to domain-specific agents is feasible.
I think some clever methods could allow agents to supervise each other, but no one is strongly pushing for that yet. We're only just beginning to engage with AI, and there are two issues to consider. One is the capability boundary, and my concept of a “jagged frontier” means this boundary is always expanding outwards but unevenly. Some shortcomings will exist for a while, but as AI's overall capabilities improve, even if it performs poorly in some aspects, it will still be stronger than humans. So the question is, do you wait for the frontier to expand before solving problems, or do you improve around these shortcomings now? I think both need to be done. But if you focus too much on solving shortcomings now, as models continue to improve, you might eventually be constrained by systems built on old shortcomings.
Joel Hellermark: That makes a lot of sense. One challenge organizations face is discovering AI application scenarios. Some organizations adopt a bottom-up strategy, where most members of the organization are already using AI tools to some extent but not telling leadership. Other organizations adopt a top-down strategy, for example, by formulating an AI strategy. How do you think these application scenarios should be discovered within an organization? What strategies are there?
Ethan Mollick: I think for AI to work in an organization, three elements are needed: leadership, grassroots adoption, and R&D investment. Let me start with leadership.
Organizations need the CEO and senior management to think through some fundamental questions: What is our organization's business? What do we want it to become? What experiments do we want to run with organizational forms? If these questions are not answered, the incentives for members of the organization cannot be set correctly. Everyone in the company wants to know what their daily work will look like when they work alongside intelligent agents, so leadership must define this clearly. One current problem is that senior leaders are not using these systems enough; you can see that in organizations where leaders use them well, adoption spreads much faster.
For example, JPMorgan Chase publicly states that it is using AI, and the practice is gradually spreading internally, which is one reason JPMorgan Chase performs so well in AI applications. There needs to be a push from leadership, and also a grassroots base in which everyone can use these tools in some way. Furthermore, incentives need to be established for people to share their experiences. People don't share how they use AI for many reasons: they look smart and don't want others to know why; they worry that efficiency improvements will lead to layoffs; their work has become easier and they don't want to hand the extra value to the company; or they have good ideas but don't want to take the risk of sharing them. So members of the organization need to be made willing to share.
Next, to convert these individual experiences into products and intelligent agents, actual R&D work is needed. This doesn't just mean programming; tool development is also important. The key is how to experiment, how to turn simple prompts into intelligent agent systems, and how to benchmark these systems. These three elements are indispensable.
Joel Hellermark: Over the past year, you've done a lot of research involving AI applications in team collaboration, consulting assistance, and more. Which application scenarios do you think are already delivering meaningful value?
Ethan Mollick: The situation is quite clear now. Some tasks, like corporate social responsibility work, are still difficult for AI to handle. In terms of replacing and augmenting direct human interaction, the results are clear. Individual collaboration with AI, especially when people can share information, is very useful for idea generation; it can help you produce better ideas. Different methods have different effects, but this collaborative approach complements various tasks, such as translation, information extraction, and summarization.
But the most interesting thing is accelerating workflows. I've seen many cases of rapid prototype development. For example, once you have an idea, you let AI generate 25 related ideas, test these ideas against creative standards, then simulate user reactions to these ideas, further refine them, and finally create a runnable prototype. This process might only take 25 minutes, achievable through the command line and OpenAI. But organizations often encounter problems in this process; for example, after having many good prototypes, their manufacturing capacity and output can't keep up. So, in the early stages, AI's augmentation is very obvious. Furthermore, research agents and knowledge management agents are also very valuable, as they can provide timely advice.
Joel Hellermark: When everyone can program, do scientific research, and delve into multiple fields, what changes will the economy undergo? For example, if the output of the healthcare industry increases tenfold, will we still be limited by regulations? Or will the system adapt to this change?
Ethan Mollick: Both will happen. System changes take a long time. When we talked to people at DeepMind, they said that drug discovery had achieved great results within a year, which would prompt changes in the system. But the uncertainty of the regulatory environment is a problem. For example, Europe and the United States take different regulatory approaches, which makes it difficult to decide where to invest, and AI has limited ability to act in the real world.
Robotics and organizational structures both lag behind AI. So how to consider these factors becomes very important. One reason people like to use intelligent agents is that they can automate some tasks and save us effort, but they will eventually face real-world problems, and these friction points will slow down progress. On the other hand, if we can overcome these friction points and provide some potential compounds, that would be a huge step forward. So the benefits will gradually appear, but we are not yet sure of the specific situation, and this is also related to the autonomy of the system.
Joel Hellermark: In this scenario, which roles do you think will be more valuable in an organization?
Ethan Mollick: This is a tough question, and it largely depends on the organization's choices. I think management roles and roles that involve thinking about systems will be very valuable, because systems have many problems, and experts will also become very valuable. It turns out that expertise is very important; no system can compare to the top experts in their field. We usually measure AI against the average level of a field, and AI performs excellently in that regard. But if you are a top 2% expert in a certain field, you can outperform AI in that field, so in this area, expertise is crucial. Either deep expertise, or broad knowledge as a system leader, or excellent judgment—these three points will help you.
Joel Hellermark: I've been thinking about a question: on one hand, you can hire more senior developers, as you said, only the top 2%, and they will bring us big changes; on the other hand, now you can also hire more junior developers, because with AI's assistance, they can reach the level of senior developers. Do you think the popularization of professional knowledge will allow you to build teams with more junior talent, while senior talent might benefit less from this technology?
Ethan Mollick: Actually, several influencing factors are at play simultaneously, and they're worth analyzing. Our Boston Consulting Group study was the first to document that low performers gained the most performance improvement from AI in a real-world setting. But people don't talk much about why we found this phenomenon. We measured a metric called “retention rate,” which is the percentage of times consultants ultimately converted AI's answers into their own. For about 80% of consulting tasks, the only way to mess up was to inject your own ideas into the AI's answers. As long as you just submitted the AI's answers, you would achieve excellent results.
Essentially, if you didn't add your own ideas, you could basically reach the top 8%. When you say you want to hire a junior developer and make them better, I think it's necessary to clarify: is the human just filling in for the things we currently can't let AI do autonomously? For example, I paste in requirements and attend meetings, but in reality AI is doing the work. Is that it? Or can AI really bring people up to that level?
At the same time, at the level of truly excellent talent, we see this effect: if you are excellent and use AI correctly, your work efficiency can increase tenfold or even a hundredfold. So I think you have to consider both aspects; this substitution effect exists. I've always believed that many benefits come from having expertise yourself and then using AI to compensate for areas you're not good at.
For example, I've long thought about the problem entrepreneurs face. I am an entrepreneur myself and teach entrepreneurship courses. Being an entrepreneur means you're not good at many things, but you're excellent at one. The reason I teach entrepreneurship courses is to keep you from being tripped up by the 95% of things you're not good at. For example, you didn't know you needed a business plan, or you didn't know how to give a business presentation, but your idea is great, and you know how to implement it in this market. AI can help you solve 80% of these problems, which is really good. That is AI actually substituting for part of your job. But in the area where you are in the 99.9th percentile, you can get a 100x improvement, and I think the principle is the same. The problem is, if you hire junior staff and expect them to always use AI, how can they grow into senior staff? That will be a real challenge.
Joel Hellermark: What do you think is the solution? For example, I've talked to many law firms. For them, a core part of training is doing basic work. Then, as you gain more seniority, you do more complex legal analysis. But looking at the actual work junior staff do, I think most of it doesn't match senior staff's work; it's simple, repetitive, etc. Do you think this will become a problem, meaning people can't grow through the career hierarchy like before, and consequently, we don't have as many people capable of senior positions, or people will move into senior positions faster?
Ethan Mollick: I'm really worried about this problem. I teach at the Wharton School, and as at other universities, the students are very bright. They are generalists; I teach them how to do analysis, but not how to become a Goldman Sachs analyst. Then they go to Goldman Sachs or law firms or places like that, and they learn the same way we've been teaching white-collar knowledge work for the past 4,000 years: apprenticeship.
You're right, they're asked to do repetitive work over and over again. Repeating this repetitive work is how expertise is accumulated. You'll be scolded by your senior manager. At some companies, you might be treated poorly; at others, perhaps well. But basically, you'll constantly be corrected, for example, when writing a deal memo. It's not just about learning to write a deal memo; you're also learning why this method doesn't work. You learn a lot about what the goal is from your mentor, but that's how things happen.
If you have a good mentor, apprenticeship works. We don't spend a lot of time specifically training people. It's like magic; some people learn, others get fired. They might be fired because of poor performance, but it could also be bad luck, encountering a bad mentor, or not learning the right things. That kind of master-apprentice tradition has lasted for thousands of years.
But now, if you're a junior person and you go to a company, you don't want others to know what you don't understand, because you want to keep your job. So you'll use AI to do everything. That way, you don't have to think, because AI is better than you. Every middle manager also realizes that instead of finding an intern who sometimes messes up or cries, it's better to let AI do the work, because it does it better than the intern. I'm really worried that this talent development chain will break.
The problem is, we treat this as something implicit. For example, in law firms, there are almost no specific courses teaching you how to be a good lawyer. You can only hope you have a good mentor and then copy their methods. That's why bankers often work 120 hours a week. Why? Because that's how it's always been, and it teaches you something. So I think we must think more formally about how to teach people expertise and put it into practice. Ironically, we do very well in sports because in that field we have learned how to cultivate specialized skills through repeated practice under a coach's guidance. We need to adopt similar methods in other forms of learning.
Joel Hellermark: If you were to start a new university for the age of intelligence right now, how would you plan it? Assuming AI models continue to improve over the next few decades, how would you design a university around that?
Ethan Mollick: There are several aspects to consider. One is what we should teach, and the second is how we should teach. I'm more concerned about the second question. I think it's important that we teach people AI skills. As someone who interacts with these systems a lot, I'd say learning relevant skills isn't actually that hard.
First, there are only about five classes' worth of skills to learn, unless you want to build large language models, in which case you need a lot of practical experience. So I don't think the focus is on teaching people how to use AI. I think a lot of the disciplinary knowledge and skills we teach are very important. We want people to learn how to write well and to have broad knowledge and deep expertise. I think universities are well suited to do that.
But where we don't do well is in teaching methods. Everyone is cheating now, and AI detectors don't work at all. People have always cheated, but now everyone is cheating openly. There's a great study of students at Rutgers University showing that around 2006 or 2007, when the internet and social media were really taking off, almost all students who did their homework diligently performed better on exams.
But by 2020 that was true of almost no one: only 20% of students improved their exam scores by doing homework, because everyone else was cheating. So you still have to put in the effort. AI doesn't let us skip the step of hard study, but with a one-on-one AI tutor we can teach according to each person's level, and we can accelerate the learning process in some ways. So I'm more interested in how to change teaching methods; I've already tried it in my classroom. How to use AI to change the way we teach is a very interesting question. I don't know whether the teaching content will change. I think we can also scale up teaching and reach more people, but I think some core disciplinary content won't change.
Joel Hellermark: You've done some really cool things. What other ways have you adopted to conduct teaching?
Ethan Mollick: All aspects. My entrepreneurship course is entirely AI-based. Previously, at the end of the course, students would have a business plan and a presentation, and many students raised millions of dollars through my course and the same course taught by my colleagues. But now, at the end of a week-long course, students can create a runnable product.
When I introduced ChatGPT to my entrepreneurship course on the Tuesday after its release, a very easily distracted student came to me after class and said, “I built the entire product while we were chatting.” At that time, it was still shocking that AI could write code, but now the situation is completely different. Now, I have students do AI simulations, and they have to teach AI things.
We have a junior “AI student,” and all course materials are equipped with “AI tutors” whom students use to build cases. In team collaboration, AI observes their performance and gives feedback, or plays the role of an antagonist. So there are many cool things that can be done to assist teaching, but the goal is always to make the classroom experience more active and engaged. So I don't think classrooms will disappear, but what we do in them will change.
Joel Hellermark: One question we've been discussing is how organizational structure design should be built. Should companies hire a Chief AI Officer to oversee all internal deployments? Or should they adopt a model where one person in each team explores application scenarios? What do you think? How would you build your AI department?
Ethan Mollick: Sometimes I worry a bit about setting up a “Chief AI Officer” position, for the same reason that affects everyone: everyone wants answers. I talk to all the AI labs often, and I know you do too; you've been in this field longer than most people in it. Soon you painfully realize that no one really knows what to do. It's not as if the labs have an instruction manual they haven't given you. Regarding this field, the data I share with you, and the data I share online, is pretty much all there is; there are no secrets. Everyone is eager to imitate others, but there's nothing to imitate.
So, when you say you want to hire a “Chief AI Officer,” how much experience would they have from the past two years? How would they be different from others? No one could have predicted how powerful large language models would be. You entered this field earlier than many, which gave you a one-year head start—this is a very special situation. So there's simply no so-called expert to hire.
Furthermore, a major problem with applying AI in enterprises is that the concept of AI was vastly different between 2010 and 2022. Big data is still important and worthwhile for driving various developments, but that's different from current AI. So, hiring a Chief AI Officer is relatively difficult. I firmly believe that enterprises actually have enough internal expertise to succeed, because only true experts know how to use AI.
Someone who has worked in a certain position thousands of times can easily run a model and judge whether it is effective. In fact, in our Boston Consulting Group study, a second paper showed that junior employees' ability to use AI was far inferior to senior employees', which many people did not expect. Everyone always thought that the younger generation should use AI.
But that's not the case, because a junior employee writes a memo for you to see, and it looks pretty good. But you might have worked in this field for 20 years, and you can point out seven deficiencies in that memo, so expertise and experience are very important. I think there's no need to assign a dedicated person for AI to every team. Moreover, we don't even know what kind of person is good at using AI. So I usually recommend connecting the general employee population with the AI lab.
The role of the general employee population is not just to discover AI application scenarios. In fact, in almost all enterprises, at most 20%-30% of employees use the internal AI models; the others either don't use AI or secretly use outside AI tools because they don't want others to know what they are doing. But when 20%-30% of employees start using them, you'll find that 1%-2% of them are very good at it. These are the people who can lead the enterprise's AI development work.
At first, you don't know who they are, and you can't know, but they will stand out. The problem is that they generate large profits for the company, so you may be reluctant to move them away from frontline positions, but they should become the core force of the AI lab and explore how to use AI better. So I think building in-house AI development capabilities is the right approach. While we are still unclear about who is good or bad at AI, it's difficult for me to recommend that companies hire a large number of AI-related personnel, and the organizational context of the company is also very important in this regard.
Joel Hellermark: So how do you think incentive mechanisms should be set? If you bring together experts from various fields and ask them to explore how to deploy AI, even automating their own jobs, how do you incentivize them to do that?
Ethan Mollick: This is why leadership is so important. First, for companies with good corporate culture, this will be easier. If the CEO announces that the company is in growth mode, if employees trust the CEO or founder, and they say, “We won't lay anyone off because of AI. We will expand our business and make AI work for everyone.” Then employees will be more motivated to do this.
This is much easier than for mature, large companies, because large companies often use AI for layoffs, and employees can feel the difference. So you have to be open from the beginning. If this threatens employees' jobs, they have the right to know, and you have to think about what you're going to say first. In this situation, incentives can be very diverse.
I talked to a company that gives a $10,000 cash bonus every week to employees who do the best in automating work. Compared to traditional IT deployments, this is like directly handing employees a big box of cash. I also talked to another company that, before hiring, required applicants to spend two hours with the team trying to complete a task using AI, and then rewrite the job description based on their AI usage; or when submitting project proposals, they had to first try to complete part of the work using AI, and then resubmit the proposal.
So you can incentivize employees in many different ways, but a clear vision is very important. If you say that in four years your job will be to complete a certain task with AI, people will ask, “What does that mean? Will I be sitting at home, sending instructions to an intelligent agent to do things in my room? Will the number of employees decrease?” I find that many executives want to push this question off, saying, “AI will bring many benefits.” But without corresponding compensation, why would employees share their increased productivity with the company? So starting with this vision is very important.
Joel Hellermark: You also did a study on AI embedded in work like a colleague and collaborating. You studied people working alone, people working in teams, people working alone with AI, and people working in teams with AI. What insights does this study offer on how we integrate AI into teams?
Ethan Mollick: My colleagues at MIT, Harvard, and the University of Warwick and I conducted a large-scale study of 776 people from Procter & Gamble, a large consumer goods company. As you said, the study participants were divided into cross-functional two-person teams and individuals working alone, and they worked either with or without AI assistance.
First, we found that in actual work tasks, individuals working alone with AI performed as well as teams. This was a very significant improvement, and because they worked with AI, they were happier. They gained some social benefits from collaborating with these systems, which led to high-quality results.
We also found that teams working with AI were more likely to come up with breakthrough ideas, and the differences in expertise would be reduced. If you measure the technical content of a solution, people with technical backgrounds will propose technically sophisticated solutions, and people with marketing backgrounds will propose marketing-oriented solutions. But once AI is involved, the solutions become more diverse. So AI is a very good complement to human work, though this is still relatively preliminary research. We gave them some prompts and let them operate, but often they were just interacting with these systems. So, it's still the same problem as before: if companies wait for others to provide solutions, the situation will be worse than if they start trying now and figuring out what works and what doesn't.
Joel Hellermark: What do you think the interface for collaboration will look like? Will it be directly embedded in Google Docs and Slack, allowing us to communicate with them like colleagues? Or will there be interfaces specifically designed for AI, allowing us to collaborate with them?
Ethan Mollick: I think interfaces specifically designed for AI make more sense. They should be designed around team collaboration, rather than having an intelligent assistant in every document. Having an interface that can maintain state across different tasks is not far off now. It's like holding my phone and opening ChatGPT's intelligent agent. It can observe our surroundings and provide feedback on what we're doing. I think this is a promising direction, and it's also about redesigning work. I think autonomous intelligent systems are more attractive because they not only automate work but also integrate many work processes.
Joel Hellermark: You mentioned an example earlier where AI fabricated a quote from you, and you even thought it was your own. When do you think we'll be able to get systems to the research level of “Ethan Mollick”? What conditions are needed? Is it about providing them with more context? Do you think we can achieve this soon? What would that mean—would you just need to filter the best papers it generates based on your own standards?
Ethan Mollick: I think a lot of things are already achievable with our current model levels. There's a paper that tested o1-preview, which isn't even the most cutting-edge model right now. In a case study from the New England Journal of Medicine, the previous model had a hallucination rate of about 25%, while this model reduced it to about 0.25%. When you connect to data sources and use smarter models, the hallucination problem starts to decrease.
Problems still exist, but as you mentioned earlier, I've used AI in my classroom. My initial classroom rule was to allow students to use AI in class. For the first three months, it was great, right? When ChatGPT 3.5 was released, my students were smarter than ChatGPT; it made more obvious mistakes. I let them use AI freely, because if they didn't think for themselves, they'd at most get a B-grade score, and AI couldn't do better at that time. Then GPT-4 was released, just like my less diligent students. So I think the situation we face now is similar; these systems are very powerful.
As people build intelligent agent systems, I think you might be realizing what I realized long ago: when you think about these systems from the perspective of an intelligent agent, they can do a lot more. And Google has been building AI labs, and Carnegie Mellon University is doing similar things. I actually think that building a system that can conduct interesting research is more about willpower. In many areas of AI, I will exclaim, “Wow, we have proven that it can play a very good role as a mentor.” So why are there only a few good mentor-style AIs, and not thousands? Where are the thousands of scientific applications? Where are the internal training systems? These are all achievable now; the key is to do them.
Joel Hellermark: What's the most surprising thing you've encountered in your work recently? Among the latest generation of models, what situations that didn't work before are now working really well?
Ethan Mollick: I mean, take the latest version of Gemini for example. For academics, one of the most agonizing things is writing a tenure statement. You probably write it only once in your life. You have to summarize typically 15 years of your academic work, which is very complex, then distill it into a few themes, and write an article about why your research revolves around those themes.
Recently, I was able to use the new Gemini model to input all the academic papers I've written, because it has a huge context capacity. It helped me distill two out of three of those themes, which took me two months to write myself, and its analysis level was quite high. Even more interestingly, I can now input any academic paper and ask it to turn the paper into a video game, and it can output a well-running video game. I've also recently used it to write some 3D games. Keep in mind I'm not good at programming, but I built well-running systems. So I feel that one threshold after another is constantly being broken, and I'm often surprised and can't believe these systems can do so much.
Joel Hellermark: For businesses, how should we view this? Is this equivalent to injecting more IQ into the system? Or investing more labor? As a business, how should I look at this?
Ethan Mollick: There are tactical and philosophical perspectives. From a philosophical perspective, we're actually not clear. Of course, it involves intelligence, but intelligence and labor are just two very simple input factors. But what does getting better advice mean? What does having a better mentor mean? What does having a second opinion mean?
From a tactical perspective, I think the goal should be to adopt aggressive strategies. I think very few organizations adopt such aggressive strategies: to fully utilize the system and let it do everything. If it can't, that's fine, you'll have a benchmark for testing future systems, and it might actually be able to do everything. If it does, you've learned valuable experience. So I really don't agree with incrementalist approaches, like just letting the system summarize documents. Of course, that's fine, but I could do that a long time ago. Why would you only let it summarize documents? We should let it complete the task directly, instead of just doing intermediate steps.
Joel Hellermark: I think that's a very interesting point, because many companies now start with a small proof-of-concept project, and then try to scale it up. But often, after six months, they get stuck in the proof-of-concept phase and can no longer scale. While other companies have adopted a direct, full-scale deployment approach, making it available to everyone, and then investing more in use cases that work well.
Ethan Mollick: But even then it's not radical enough, though it's sufficient, and you're absolutely right. Because those use cases that work well are produced within the system's limitations and people's capabilities at the time. And developing applications is often the worst entry point, because you end up with a semi-successful product that you then have to build around its limitations.
There are also other issues. We can say that one problem IT teams face when deploying AI is that they focus heavily on low latency and low cost. It turns out that in these models, low latency and low cost are the opposite of high intelligence. So sometimes we need low latency and low cost, but sometimes, for a very wise decision or a new chemical substance, I'm willing to pay 15 cents, which is a reasonable price.
So you have to balance that, because people often develop based on cheap, small models and then get stuck. That's why staying neutral and keeping up-to-date is so important. Even when people do that, they often don't find radical approaches. This is where labs come in; you really need people to do what seems impossible.
Joel Hellermark: What's the difference between using it as an “assistant” and using it as an “enhancement” tool? Do you have any advice?
Ethan Mollick: The definition of “assistance” was originally proposed by Garry Kasparov. What I take from it is that it's like a centaur figure, meaning you're essentially dividing labor with AI. I know Kasparov's elaboration of this definition goes deeper; this is a preliminary way of applying it. For example, I hate writing emails but am good at analysis, so I can let AI write the emails while I do the analysis.
The “enhancement tool” application is more integrated. For example, writing my book was an “enhancement tool” task. Since then, the system has improved a lot, but at that time, its writing ability was very poor. I think my writing ability is pretty good; at least I'm proud of it. So AI barely helped me write anything, but the writing process was painful, and it helped me solve all the problems that made writing painful.
For example, when I get stuck on a sentence, it can give me 30 ways to end it, and I can choose one; it will read chapter content and ensure quality. Just like my Substack blog, I often have two or three AI programs read it and give me feedback. I rarely let it do core writing, but I always get feedback from it and make revisions accordingly. Letting it read academic papers and ensuring I've cited references correctly—these use cases truly demonstrate its power.
Joel Hellermark: There's a study that shows people who take AI advice end up being more productive, but it mainly helps senior employees, while lower-performing employees are less able to internalize the advice. What does it mean for society if everyone receives advice on how to deploy AI in organizations?
Ethan Mollick: I don't think it's always the same advice; AI is very good at giving context-specific advice. You might be referring to the study on Kenyan entrepreneurs, which was a great controlled study where entrepreneurs only received advice from GPT-4 and couldn't have it produce products or do other things for them. The results showed that for high-performing entrepreneurs, their profitability increased by 8% to 13%; I can't remember the exact numbers, but just advice alone could bring such an improvement, which is simply amazing.
If I could give advice to students and increase their profitability by that much, that would be great. Everyone has strengths and weaknesses, so even if you get advice from AI, it will focus on your weakest areas, not your strongest. Lower-performing entrepreneurs fared worse because their businesses were already struggling and unable to implement these ideas.
I think in terms of giving advice and offering a second opinion, there is indeed a risk that it might lead us all in the same direction, and we've also found this problem in creative ideation. AI has some fixed themes; if you've used these models, you'll know that, for example, GPT-4 loves to generate ideas related to cryptocurrency, augmented reality, and virtual reality, and also environmentally friendly ideas. I guess this is related to its post-training; it just keeps outputting this content. But in some other work, we've found that if you give more subtle prompts, it can generate ideas as diverse as those of a group of people. So, the question here is, what can an advisor do for you? Maybe you need four or five advisors, and you don't want to rely solely on a generic Mollick advisor; you might also want to consult Adam Grant and Garry Kasparov, which might be more valuable.
Joel Hellermark: Maybe I'll ask you to list 30 good examples of companies deploying AI, such as giving cash rewards to those who deploy it best. What novel ideas have you seen?
Ethan Mollick: I've seen many such examples. Unfortunately, I can't list 30, and I can't even tell you everything I know, because some information I can't disclose. However, a common practice is to have all programmers use AI tools, and then change your reward mechanisms around that. For example, halfway through every creative meeting, you could ask AI how it's going, or if the meeting should continue, or even end it directly. Even in offline meetings, you could pause and talk to AI to think about the current progress.
I've seen people equip everyone with an AI consultant, allowing them to consult on strategic advice at every decision point. There are also some very interesting applications in training. For example, I've seen people use simulated training environments and involve AI in some way, which works very well. I can't give 30 examples in limited time.
Joel Hellermark: But I think “Ethan” (the agent) probably can.
Ethan Mollick: Definitely. You see, I'm not performing very well, which shows I'm real. I'm a little worried you're not satisfied with my performance. Your expectations of me are high, and I'm worried you'll get better answers from someone else.
Joel Hellermark: We'll definitely try using AI to answer. What do you think is the best-case scenario? Assuming everything goes well, and AI is widely adopted in society, what will the best-case scenario be like in the next ten years?
Ethan Mollick: Leaving aside the super AI scenario, where we are protected by loving and benevolent machines, let's return to reality. I think the problem is that the best case also requires policy decision support, because this will clearly impact employment, but we are not yet clear on the specific forms. It's very likely that everyone will have more job opportunities, but they will need to be retrained. I don't know what the future holds.
So currently, policy is lacking. But I think in the future, people's work will be more fulfilling because foundational tasks will be reduced. In such a world, productivity improvements will be more interesting than just measuring how many words you type. For example, if you build an intelligent agent system to work for you, suddenly you'll feel like you're in a completely different world, and your sense of fulfillment will greatly increase. You work less but produce more, and at critical junctures, people's creativity will come into play, and those with unique styles, methods, and perspectives will produce results completely different from others.
It's like AI being five to ten times more powerful than it is now, but without crossing a certain boundary. In some ways, this is a bit of a strange expectation, but it's the easiest to imagine, a result similar to today's world. If these systems become more intelligent, it will become: since the system can automatically generate videos, why do you still need to go to work? I feel that in five years, we can recreate character images, turn them into 3D, put us in volcanic scenes, and let us communicate with them individually using everyone's language and voice. We are already very close to this level, and by then, job roles will change even more dramatically.
Joel Hellermark: What views in this field do you strongly disagree with at the moment?
Ethan Mollick: I think everyone is too focused on safety, although I understand safety is important. There's a paper that says we either focus on existential risks or we don't. Many people do focus on existential risks, which is worth thinking about, but I'm more worried that we currently lack control over decision-making. I'm worried that people treat AI as a pure technology, as in our current discussion, viewing it as a bulldozer, which is actually wrong. We must figure out how to use and shape this technology, and that's important.
Everyone participating in this event has the right to decide how to use and shape AI, and these decisions, in turn, will influence the direction of AI's development. So I'm really worried about this lack of control, as if AI will do whatever it wants to us. We can make choices; we can make choices that uphold values we consider essential as humans, and that meet customer and societal needs. Avoiding such discussions worries me. I also think that many people in the AI technology field don't understand how actual organizations operate. Organizations are actually more complex, and even very intelligent agents may not be able to change how a company operates overnight. When change will happen is unclear; it might take five to ten years, and it will be intermittent.
Sometimes people's ideas are very naive. For example, my sister is a Hollywood producer. Every time I hear someone say AI will replace Hollywood, I think they simply don't understand how much effort goes into making a Hollywood movie. Some jobs will indeed disappear, but in fact, studios are already using AI to improve efficiency, which is an interesting example. She was involved in producing a movie starring Michelle Pfeiffer. For test audio dubbing, they now have a synthetic Michelle Pfeiffer voice that can be used for testing, but they can't use this voice for cinema audiences because actors have strong union protection. So this is just an experimental platform, and Michelle Pfeiffer still needs to personally record what she wants to express. So I think we can build a world that defends humanity, but that requires us to make choices.
Joel Hellermark: If you were to have a model help you make all decisions from now on, how would you set up its prompt?
Ethan Mollick: First, I would provide it with a lot of context. It would need to understand a lot about me and my decision-making habits, possibly taking in millions of words of information. But because I've written some articles, AI has some understanding of me and also has its own opinions of me. So when I ask it to “think like Ethan Mollick,” I get pretty good answers. It's sometimes a bit too enthusiastic and likes to use hashtags, which I don't really recommend, and it also loves emojis, but I'm not really an emoji person; it thinks I'm more like a Gen Z.
Other than that, if I seek decision advice from it, I would say, you need to put yourself in my shoes, knowing that you are working for Ethan Mollick and helping him make decisions. Before making a decision, be clear about the four or five very important things he values. I want you to first identify four or five possible decision options, at least a few of which should be very aggressive. Then compare these decisions, listing two or three simulated outcomes for each option. Next, simulate an impatient version of Ethan and a thoughtful version of Ethan, and have them debate which option is best. Finally, list the pros and cons of each option, and then choose the best one. It needs a bit of a chain of thought, and also a bit of empathy.
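[Editor's sketch] The prompt structure described above can be assembled programmatically. The following is a minimal Python sketch, not the prompt Mollick actually uses: the function name `build_decision_prompt`, its parameters, and the sample values and decision are all hypothetical, and the wording of each step is a paraphrase of the description in the conversation.

```python
def build_decision_prompt(person, values, decision, n_options=5, n_outcomes=3):
    """Assemble a decision-advice prompt following the steps described above.

    All names and wording here are illustrative assumptions, not a
    published prompt.
    """
    # The things the person values, supplied by the caller as context.
    value_lines = "\n".join(f"- {v}" for v in values)

    # The ordered reasoning steps: options, simulated outcomes,
    # a debate between two personas, then pros/cons and a final choice.
    steps = [
        f"Identify {n_options} possible options, at least a few of them very aggressive.",
        f"For each option, list {n_outcomes} simulated outcomes and compare the options.",
        f"Simulate an impatient {person} and a thoughtful {person} debating which option is best.",
        "List the pros and cons of each option, then choose the best one.",
    ]
    step_lines = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))

    return (
        f"You are working for {person} and helping him make a decision.\n"
        f"Put yourself in his shoes. He values:\n{value_lines}\n\n"
        f"Decision: {decision}\n\n"
        f"Work through these steps in order:\n{step_lines}"
    )


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(build_decision_prompt(
        "Ethan Mollick",
        ["time for research", "students' learning"],
        "Should I redesign my course around AI tutors?",
    ))
```

Keeping the steps in a list makes it easy to add, drop, or reorder them when experimenting with how much structured reasoning the model is asked to do.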
Joel Hellermark: That's a great prompt; we should try it. A few years ago, I did something where I trained a model on everything Steve Jobs ever said, starting from his principles, and the answers were very interesting. For example, during the pandemic, I asked it, should we implement remote work? Should we become a remote-first company? Steve's answer to me was no, 95% of communication problems can be solved by having people in the same room, always keeping team members together. If you train a model based on someone's work, you get a specific viewpoint, not an average one like you get on the internet.
Ethan Mollick: This is a very important point when getting advice, and it's also why companies are so important. If a company founder's philosophy can influence AI, if you hand over the company's principles handbook to AI and let it know these are what we believe in, the results will be completely different from when it doesn't have this information. I believe that AI should not be seen as an omniscient brain that always provides the correct answer; it only provides a viewpoint, and this viewpoint can be shaped. If you believe your principles and views of the world are correct, giving these principles to AI and letting it help you implement them is much better than just letting it give you random advice.
Joel Hellermark: I've noticed a very interesting phenomenon: currently, these systems aren't optimized for user engagement. We basically just train them to predict the next word. But if we understand the consumer service sector, we'll know that they will quickly evolve to have deeper conversations with us. One can imagine deploying a chatbot in our organization, and we'd want to maximize interaction with it. It would attract people, ask them interesting questions, and so on. What do you think will happen once these systems are optimized for engagement? This hasn't happened yet.
Ethan Mollick: I have some concerns. I think large labs are starting to realize they can do this. If you look at OpenAI's product development trend, they've become more casual, more chat-like. There's an interesting example: when the new Llama 4 model was released, the version ranked number one on the leaderboard was not the same as the version released to the public. If you look at the chat logs of that leaderboard version, it's full of emojis, it praises you for being great, and it tells some slightly funny little jokes. But that's not the version released to the public. Versions optimized for engagement use more vocabulary to flatter you.
I'm really worried about this. We have some early evidence showing that doing so leads to higher user stickiness, and social media has validated that optimization for engagement makes it a very dangerous place. But I think it's inevitable, so how to deal with this problem becomes very important.
Joel Hellermark: One question I often get asked is, how should we measure the effectiveness of AI applications? If you're a business leader and you want to measure something to prove that deploying AI has increased productivity, what do you think you should measure?
Ethan Mollick: This is a point I strongly insist on: in the early stages of R&D, the worst thing to do is to set a bunch of KPIs. As with focusing only on engagement, if you optimize a single metric, you'll only get improvements in that area, and other areas might not improve.
We don't know what effects these systems can bring. You've invested in R&D, we know there will be performance improvements, and we can see those improvements. But if you're optimizing for performance, does that mean how many documents are generated per day? Or how quickly people submit reports? Is that what you want? Some organizations were not founded with the intention of achieving the KPIs you set.
In the past, people thought producing as much written content as possible was valuable. For example, in a week you could write an excellent report, make four slide presentations, or research six companies. But now, do you want people to research 25 companies and create 300 slides a week? Or do you want to chase the number of lines of code people write? I can imagine that in some cases quickly clearing backlog tasks is important, but is that what we want people to do? So I'm really worried about setting key performance indicators and quantifiable KPIs, especially because these metrics often end up just being about cost savings. And the goal of cost savings is often to cut 30%, which then means layoffs, and that affects everything you do.
People do need to have an R&D mindset. Productivity improvements are obvious, and applying this mindset to programming work is fine, because productivity improvements in programming are significant. But I'm still worried that some people want to improve productivity in document writing. This feels like a risky thing, because the goal you want to optimize is unclear.
Reference: https://www.youtube.com/watch?v=KEQjwE7hDjk