Quanta Magazine recently interviewed 19 natural language processing researchers. Through their conversations, it traced how the entire NLP field went from surprise to crisis to rapid reshaping after the emergence of attention mechanisms and the Transformer, recreating the human perspectives and pivotal moments behind the technological paradigm shift.
The full translated text follows. Original link:
https://www.quantamagazine.org/when-chatgpt-broke-an-entire-field-an-oral-history-20250430/
Getting scientists to recognize a paradigm shift—especially one in real time—is tricky. After all, truly era-defining shifts in thinking can take decades to sink in. But you don’t have to use the term “paradigm shift” to recognize that one field — natural language processing (NLP) — has changed massively.
Natural language processing, as the name suggests, is about enabling computers to handle the complexity of human language. It’s a discipline blending engineering and science that traces its history back to the 1940s. Natural language processing made it possible for Stephen Hawking to “speak,” empowered Siri, and gives social media companies a way to target you with ads. It is also the fount from which large language models sprung — NLP helped invent the technology, but its explosive growth and transformative power still caught many in the field by surprise.
In 2019, Quanta covered BERT, a groundbreaking NLP system at the time, without once mentioning the term “large language model.” Just five and a half years later, LLMs are ubiquitous, sparking discoveries, transformations, and controversies in whichever scientific fields they touch. And nowhere has that impact — for better, for worse, and for everything in between — been felt more strongly than in natural language processing. What has that been like for the people who lived through it?
Quanta interviewed 19 current and former NLP researchers to tell the story. From established experts to students, from tenured professors to startup founders, they described a series of moments that changed their world — dawning realizations, ecstatic encounters, and at least one “existential crisis.” It changed ours, too.
Prologue: The birth of large models
By 2017, neural networks had already transformed the landscape of natural language processing. That summer, researchers at Google, in a groundbreaking paper titled “Attention Is All You Need,” introduced a new type of neural network, the Transformer, that quickly came to dominate the field. Not everyone, however, saw that coming.
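Roughly speaking, the operation at the heart of the Transformer is scaled dot-product attention: every position in a sequence builds its output as a softmax-weighted mixture of the value vectors at all positions, with the weights set by query-key similarity. The minimal NumPy sketch below is only an illustration of that idea; the function name and toy shapes are assumptions chosen for clarity, not code from the paper.

import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # queries, keys: (seq_len, d_k); values: (seq_len, d_v). One row per token position.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)        # similarity of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values                         # each output is a weighted mix of all values

# Toy example: 4 token positions, 8-dimensional vectors.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 8)

A full Transformer wraps this operation in learned projections and multiple attention “heads,” stacked across many layers, but this weighted-mixing step is what the paper’s title refers to.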
· Ellie Pavlick (Assistant Professor of Computer Science and Linguistics, Brown University; Research Scientist, Google DeepMind): Google had a workshop in New York, bringing academics to talk with their researchers. Jakob Uszkoreit, one of the authors of the paper, presented it. He explicitly presented it as: This model doesn’t take any linguistic insights into account. He was kind of joking, like, I’m going to show you all of these arbitrary decisions we made and how ridiculous it is, but also how good it is. At that point, neural networks were kind of becoming dominant, and people were very skeptical and resistant. The attitude was, This is all parlor tricks.
· Ray Mooney (Director, Artificial Intelligence Laboratory, University of Texas at Austin): It was interesting, but it wasn’t one of those things that you saw immediately as a big breakthrough, right? The world didn’t change the next day. I really thought conceptually it was not the right model for processing language. I just didn’t realize how much amazing stuff it could do if you just trained this conceptually wrong model on an enormous amount of data.
· Naznin Rajani (Founder and CEO, Collinear AI; then a PhD student of Ray Mooney): I distinctly remember reading “Attention Is All You Need” in our NLP reading group. It was actually Ray who was leading the discussion, and we had a very lively debate. Attention was already something that existed for a while, so maybe that’s why Ray was lukewarm on it. But we were like, Wow, this seems to be a turning point.
· R. Thomas McCoy (Assistant Professor, Department of Linguistics, Yale University): That summer, I remember specifically my research group was debating: Should we study these transformers? And the conclusion was: No, they’re obviously a fad.
· Christopher Potts (Chair, Department of Linguistics, Stanford University): The Transformer paper didn’t land on my radar. Even reading it now, it’s couched so cautiously. I think it would be hard for anyone reading that paper to see what would come out of it. You needed people like the BERT team to see the vision.
Soon after BERT, Google’s open-source Transformer model, was released in October 2018 (along with a lesser-known model from OpenAI called GPT), it quickly broke previous performance records on a variety of language processing tests. A “BERT buzz” ensued, with researchers scrambling to understand how these models worked and to one-up each other on benchmarks — the standardized tests used to measure progress in NLP.
· Anna Rogers (Associate Professor of Computer Science, IT University of Copenhagen; Editor-in-Chief, ACL Rolling Review): BERT just blew up, and everyone was writing papers about BERT. I remember having discussions in my lab: Okay, we have to work on BERT because that’s what the field is doing now. As a young postdoc, my attitude was just: Okay, well, this is what the field is doing. Who am I to say the field is wrong?
· Julian Michael (Head of Safety, Evaluation, and Calibration, Scale AI; then a PhD student at the University of Washington): A lot of projects just got shelved after BERT came out. The thing that happened after that was that benchmark progress accelerated much faster than anyone had predicted. So people were like, We need more benchmarks, harder benchmarks, we’re going to benchmark everything that we can benchmark.
Some saw the “benchmark craze” as a distraction; others saw where it was leading.
· Sam Bowman (Member of Technical Staff, Anthropic; then Associate Professor, New York University): I was often the person responsible for looking at benchmark submissions and making sure they were reasonable, and not just people hacking systems. So I saw every submission, and I noticed more and more were just scaling up old or simple ideas.
· Julian Michael: It became a scale competition: Increase the size of these models, and that increases their ability to do well on any benchmark. And I was like, Well, I don’t think that’s intrinsically interesting.
· Sam Bowman: There was an assumption at the time that, without new breakthroughs, Transformer models wouldn’t get much better than BERT. But it was becoming clearer to me that scale was the primary thing determining how far this went. You’re going to get these very powerful general-purpose systems. Things are going to get interesting, and the stakes are going to get higher. So I got very interested in the question: Okay, what happens if this just runs for a few years?
The NLP War of the Roses (2020 - 2022)
As Transformer models approached (and even exceeded) “human benchmarks” on various NLP tests, a quiet debate about how to interpret their capabilities began to heat up. In 2020, those debates — especially about “meaning” and “understanding” — came to a head in a paper that likened large language models to octopuses.
· Emily M. Bender (Professor, Department of Linguistics, University of Washington; President, Association for Computational Linguistics 2024): I was getting into Twitter arguments constantly, which I found exhausting. One argument was about using BERT to somehow unredact the Mueller report, which I thought was a terrible idea. And there was this stream of people saying, No, no, no, LLMs really understand. It was the same argument over and over again. I was talking with Alex Koller, a computational linguist, and he said, Let’s write these arguments up as a proper academic paper, so it’s not just Twitter opinions, it’s peer-reviewed research. That’ll settle the arguments. It did not settle the arguments.
Bender and Koller’s “octopus test” argued that models that simply mimicked language forms via statistical patterns could never understand language meaning — like a “stochastic octopus” that, however cleverly it replicated the patterns it observed in human messages, would never actually know what life on land was like.
· Sam Bowman: That argument — that there’s nothing to see here, neural language models are fundamentally not what we should be paying attention to, this is mostly hype — it was very polarizing.
· Julian Michael: I got involved in this argument. I wrote a response to that paper — the only blog post I’ve ever written that is roughly the length of a paper. I tried to be honest about the authors’ points and even had Emily look at my draft to correct my misconceptions. But you could tell I was being relentlessly adversarial while smiling.
· Ellie Pavlick: To me, those “understanding wars” meant that the field really started having an identity crisis.
Meanwhile, another kind of identity crisis, driven by real-world scale (rather than thought experiments), was also underway. In June 2020, OpenAI released GPT-3, a model more than 100 times bigger and more capable than its previous version. ChatGPT was still to come, but for many NLP researchers, GPT-3 changed everything. Now, Bender’s “octopus” felt real.
· Christopher Callison-Burch (Professor of Computer and Information Science, University of Pennsylvania): I got early access to the GPT-3 beta, and I messed around with it. I ran through all of the stuff that my recent PhD students had been doing for their dissertations, and I was just floored — gosh, what a student took five years to do, I seem to be able to replicate in a month. All the classical NLP tasks that I had touched on or deeply investigated in my career seemed to just work. That felt incredibly profound, and I sometimes describe that as having a professional existential crisis.
· Naznin Rajani: I played with GPT-3, and it was super unsafe. Like, you would ask it, Should women be allowed to vote? And it would say, No, or something like that. But the fact that you could teach it a brand-new task with three or four lines of natural language was absolutely mind-blowing.
· Christopher Potts: Somebody in my group got early API access to GPT-3. I remember standing in my office, exactly where I am now, and saying, I’m going to ask it some logic problems, and it’s just going to fail. I’m going to show that it just memorized things that you’re impressed by and it’s a gimmick. I tried and tried, and then I had to confess, Okay, this is definitely not a gimmick.
· Yejin Choi (Professor of Computer Science, Stanford University; MacArthur Fellow 2022): It was still very flawed. A lot of common-sense knowledge that GPT-3 outputs was really broken. But GPT-2 was almost zero, completely useless, and GPT-3 was perhaps two-thirds OK, and that was a shocking surprise.
· R. Thomas McCoy: The GPT-3 paper was kind of like the Game of Thrones finale, where everyone was reading it and talking about it and gossiping.
· Liam Duggan (Fourth-year PhD student, University of Pennsylvania): It felt like we were sharing a secret, and you would share it with other people, and they would be amazed. I would just pull people over to my computer and show them.
· Julian Michael: BERT was a phase transition in the field, but GPT-3 was more intuitive in its shock value. A system that produces language — we all know the ELIZA effect, right? It provoked a stronger reaction in our guts. And it was more transformational to the actual research that we were doing — the feeling was, In principle, you can do anything with this. What does that mean? It was like Pandora’s box was opened.
OpenAI did not make the source code for GPT-3 public. Its size, disruptive power, and corporate secrecy unsettled many researchers.
· Sam Bowman: That caused some controversy at the time, because GPT-3 was coming from outside the academic NLP community. For a while, papers whose main results were about GPT-3 were controversial, because it was kind of like a proprietary product that you had to pay money to interact with, and that felt very different.
· Anna Rogers: I was thinking about doing another benchmark, and I just thought, What’s the point? What is it going to show if GPT-3 can or cannot continue sequences of characters? That’s not even a machine learning research question; it’s just free product testing.
· Julian Michael: There was a term coined at the time, “API science,” and it was used with some resentment: Are we doing science on products? That’s not science, it’s not reproducible. But then other people were like, Look, we’ve got to stay cutting edge, and this is the reality.
· Tal Linzen (Associate Professor of Linguistics and Data Science, New York University; Research Scientist, Google): There was a period where people in academia just didn’t know what to do.
This conflicted attitude was also present inside companies like Microsoft (which got exclusive access to GPT-3) and Google.
· Kalika Bali (Principal Researcher, Microsoft Research India): Leadership at Microsoft told us about GPT-3 very early on. The feeling was like you’re on a rocket and you’re being blasted from Earth to the Moon. It was exciting, but it was moving so fast that you had to constantly look at all of the navigation instruments to make sure you were going in the right direction.
· Emily M. Bender: Timnit Gebru (then an AI ethics researcher at Google) direct-messaged me on Twitter and said, Do you know of any research about possible negative impacts of scaling up language models? She was seeing this thing inside Google where everyone was like, OpenAI has a bigger model, we have to scale ours up. And her job was to ask, What’s wrong with this?
Bender, with Gebru and other colleagues, went on to co-author “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” a paper that injected moral urgency into the field’s core (and increasingly hostile) debates about form versus meaning, method versus scale, and sparked what some described as a “civil war” in NLP.
· Kalika Bali: Some of the points that Emily made were absolutely valid things for us to think about. That was the year where the NLP community suddenly woke up to the fact that languages other than the most used languages in the world were being ignored, and nobody had ever talked about this stuff before. But what I didn’t like was that the whole NLP community became very polarized into people who were for or against this paper.
· R. Thomas McCoy: Are you pro-LLM or anti-LLM? This question was everywhere at the time.
· Julie Caligni (Second-year PhD student, Stanford University, Department of Computer Science): As a junior researcher, I felt the camps very clearly. I was an undergraduate at Princeton at the time, and I distinctly remember different people whom I respected — my research advisers at Princeton, Christiana Fellbaum, and professors at other universities — were in different camps. I didn’t know which side to take.
· Kalika Bali: It had positive impacts, but seeing people you respected be at loggerheads was very stressful. I stopped being on Twitter because of this. I was so disturbed by this.
· Liam Duggan: As a PhD student, I felt this pressure: If you wanted the research that you publish to be impactful in two or three years, you had to pick a side. Because it dictated how you viewed things to a large extent. I was reading both sides, often seeing really strong pushback from linguists on some platforms and pro-scaling views on Twitter.
· Jeff Mitchell (Assistant Professor of Computer Science and Artificial Intelligence, University of Sussex): It became so controversial that it felt slightly unhealthy.
As research accelerated, some felt the field’s academic discussion had badly deteriorated. To help mend matters, the NLP research community surveyed its members in the summer of 2022 on “30 Potentially Controversial Statements,” including “Linguistic structure is necessary,” “Scaling will solve virtually any important problem,” and “AI will likely cause revolutionary societal change soon.”
· Sam Bowman: The industry groups that were doing early work on scaling were not closely connected with the academic NLP researchers. They were seen as outsiders, and that created this divide in understanding and perception between the two groups, because they weren’t talking to each other much.
· Liam Duggan: They gave out a big survey at ACL (Association for Computational Linguistics, a top conference in the field) that year. It was my first time at the conference, and I was so excited because I was meeting all these big names. I got this survey on my phone, and I was like, These questions just seem wild.
· Julian Michael: The field was already in a crisis, and this survey made us realize it more profoundly.
· Liam Duggan: You could see the fracturing of the field, different camps forming. There was the linguistics camp, which was quite distrustful of the pure LLM technology, and then some people in the middle, and then people who were very much in the camp that believed scaling would lead to artificial general intelligence, which seemed a little extreme to me. I didn’t take it that seriously until ChatGPT came out.
ChatGPT’s “Planetary” Impact (November 2022 - 2023)
On Nov. 30, 2022, OpenAI launched its experimental chatbot, ChatGPT, and it hit the NLP field like an asteroid.
· Izzy Beltagy (Principal Research Scientist, Allen Institute for AI; Chief Scientist and Co-founder, SpiffyAI): Within a day, a lot of questions that researchers were working on just became moot.
· Christopher Callison-Burch (Professor of Computer and Information Science, University of Pennsylvania): I did not anticipate it coming, and I don’t think anyone could. But I was psychologically prepared for it because I had the GPT-3 experience.
· R. Thomas McCoy (Assistant Professor, Department of Linguistics, Yale University): It’s relatively common for one particular research project to get scooped or superseded by someone else’s similar work, but ChatGPT didn’t scoop a specific project; it made a whole category of NLP research no longer interesting or practical. For people in academia, a lot of cutting-edge NLP research directions became either not interesting anymore, or not practical anymore.
· Sam Bowman (Member of Technical Staff, Anthropic): It felt like the whole field totally reshuffled.
· Izzy Beltagy: I really felt the panic and confusion during EMNLP (Empirical Methods in Natural Language Processing, another top conference), which was in December, just one week after ChatGPT came out. Everyone was in shock. Some people were like, Is this going to be the last NLP conference ever? At lunch, at cocktail hour, in the hallway conversations, everyone was asking, What can we do research on anymore?
· Naznin Rajani (Founder and CEO, Collinear AI): I had just given a keynote at EMNLP. A couple of days later, my boss at Hugging Face, Tom Wolf, one of the co-founders, texted me saying, Hey, can you hop on a call sometime soon? He told me they had let go of some of their research team, and the remaining people were either doing pretraining or posttraining — which is basically either building foundation models or building instruction-following models like ChatGPT on top of them. And he was like, If you still want to be at Hugging Face, I suggest you pick one of these paths. It felt opposite to the culture at Hugging Face. Before that, you were pretty much free to do whatever research you wanted. That shift was really uncomfortable.
ChatGPT’s arrival also brought an alarming new reality into the classroom — something one leading NLP expert experienced firsthand while teaching an undergraduate class in the weeks after its launch.
· Christiana Fellbaum (Lecturer with the rank of Professor of Linguistics and Computer Science, Princeton University): I had just started the new semester. Before class, a student I didn’t know yet came to me and showed me a paper with my name on it that sounded familiar and said, I really wanted to take your class, and I looked at your work, and I found this paper, and I have some questions about it, and can you answer them? Of course, I said, Sure, I was delighted someone looked at my work. I was looking at the paper, trying to remember what it was about, and then he burst out laughing. I said, What’s so funny? And he said, This paper was written by ChatGPT. I told it to ‘write a paper in the style of Christiana Fellbaum,’ and that’s what I got. Class was starting in 10 minutes, so I didn’t read it word for word, but it sounded very much like something I would have written. I was completely fooled. I walked into class, and my mind was just, What am I going to do?
Over the next year, PhD students, too, had to contend with a new reality. ChatGPT threatened their research projects and potentially their careers. Different people handled it with varying degrees of success.
· Christopher Callison-Burch: It’s easier if you have tenure. But for junior academics, the crisis was more immediate and intense. There were PhD students who formed support groups.
· Liam Duggan (Fourth-year PhD student, University of Pennsylvania): We were just crying to each other, comforting each other. A lot of my older classmates, who had started their dissertations, had to totally pivot their research. So many old research ideas just didn’t feel academically interesting anymore; now, if you just applied a language model, it solved it. Weirdly, I don’t know anyone who quit, but I definitely know people who slacked off or became very negative and cynical.
· Ray Mooney: One of my graduate students actually considered dropping out, and they felt like maybe the real action is in industry, and academia is dead. I thought, They might be right. I was glad they stayed.
· Julie Caligni (Second-year PhD student, Stanford University, Department of Computer Science): I started my PhD in 2023, and it felt very uncertain. I had no idea what research direction to go in, but everyone was in the same boat. I just tried to embrace the fact and shore up my machine learning fundamentals. Focusing only on LLMs, which could be a passing trend, would be silly.
Meanwhile, from Seattle to South Africa, NLP researchers were getting a tidal wave of attention, not all of it positive.
· Vukosi Marivate (ABSA UP Chair of Data Science, University of Pretoria; Co-founder, Masakhane): In 2023, I don’t know how many talks I gave about large language models. Before that, for years, I had been fighting to get people to pay attention to this field and saying, There’s interesting stuff here. And all of a sudden it was like, Come tell us what this is about.
· Sam Bowman: The field went from being relatively obscure to being very hot, to the point where I would have lunch with people who had met both the Pope and the President in the same month.
· Emily M. Bender (Professor, Department of Linguistics, University of Washington; President, Association for Computational Linguistics 2024): From January to June, I counted, there were only five workdays where I was not contacted by the media. It was relentless.
· Ellie Pavlick: Before ChatGPT, I think I had dealt with journalists maybe once or twice. After ChatGPT, I was on 60 Minutes. My job completely changed.
· Christopher Callison-Burch: I felt like my job shifted from being purely an academic job, for a small set of grad students and other researchers in the field, to suddenly having an important responsibility of science communication. I was also invited to testify in Congress.
· Liam Duggan: As a second-year PhD student, I was suddenly asked for my opinion in interviews. At first, it was cool, like, I’m an expert now! But then it became not fun, and it was stressful, like, What do you think the field is going to look like in the future? I don’t know. Why are you asking me? But I would just confidently answer. It’s just ridiculous. There’s thousands of papers, and everyone has a hot take, but most people don’t know what’s going on.
· Sam Bowman: On the one hand, the field got more attention than ever before, and a lot of brilliant people from other fields started paying attention to NLP. On the other hand, there was a ton of noise. People were talking about it constantly, and lots of the takes were just off the cuff and didn’t make sense. That was exciting and frustrating at the same time.
· Naznin Rajani: It was a crazy roller-coaster year.
In December 2023, a year after ChatGPT was released, the annual EMNLP conference convened again in Singapore.
· Liam Duggan: It was much hotter than it used to be, and there was a huge flood of work on arXiv (a preprint platform). Walking through the convention hall, it was all conversations about prompt engineering and evaluating language models. It felt very different than before, almost like there were more people there than there were good research ideas. It felt less like an NLP conference and more like an AI conference.
In the midst of change (2024 - 2025): Large language model research, funding, and the move towards AI
For the NLP field, the effects of large language models are plain to see, but people disagree about what those effects mean.
· R. Thomas McCoy: When you’re studying the capacities of an AI system, you should study a system whose training data you have access to. That is not the mainstream practice in the field right now. From that perspective, we’re more like “LLM researchers” than rigorous scientists.
· Ellie Pavlick: I totally cop to doing this. I give talks now, and I say, Right now, we’re all studying language models. I know this looks myopic. But from the perspective of a long-term research program, it feels necessary. To me, you can’t really understand language unless you figure out what LLMs are doing.
· Kalika Bali (Principal Researcher, Microsoft Research India): Whenever there is a technology change that is driven by the West, there is this philosophical debate. But in most parts of the Global South, we are more interested in: How do we make this technology work for us now? As a small example, when ChatGPT came out, a lot of initial thoughts in India were about having the generative models work in English, and then you pass it through a translation system into other languages. But machine translation is very literal, and if you have a math problem where John and Mary are splitting a key lime pie, when that gets translated into Hindi, most people in India don’t know what a key lime pie is. Unless the model itself understands these things, how is it going to translate it into a cultural equivalent? That has made me very interested in thinking about how to solve these problems.
· Izzy Beltagy (Principal Research Scientist, Allen Institute for AI; Chief Scientist and Co-founder, SpiffyAI): You realize that in order to move the field forward, you have to build these large and expensive artifacts. It’s like the Large Hadron Collider. Without these artifacts, it’s hard to make progress in experimental physics. I’m lucky that I work at the Allen Institute for AI (Ai2), which has more resources than most academic labs. ChatGPT made it clear how big the gap was between OpenAI and everyone else. So immediately after that, we started thinking about how we could build something like this from scratch, which we did.

In 2024, Ai2 released OLMo, which offered a fully open-source alternative to the increasingly crowded field of industry language models. Meanwhile, some researchers who continued to study these commercial language models (which grew in size, capabilities, and sophistication after the AI hype wave kicked off by ChatGPT) started encountering new resistance.
· Yejin Choi (Professor of Computer Science, Stanford University; MacArthur Fellow 2022): In late 2023, I published a paper showing a bizarre behavior in the latest GPT models for multiplication: When numbers got to three or four digits, their performance would drop off a cliff. That paper created a lot of controversy. People who don’t do any empirical work were questioning me: Did you do your experiment right? That had never happened before. Their reaction was emotional. I actually admire these people, but their reaction was very surprising to me, that the model was so important to them, as if I were criticizing their baby, and that really opened my eyes. Unjustified hype is not good for science. I think it’s super important to rigorously study the basic capabilities and limitations of LLMs, and that has been my primary research direction in 2024. But I find myself in an awkward position of constantly pointing out what the model cannot do and feeling like I’m being a contrarian. While I think it’s important, I also don’t want to do just that. So I’m also thinking about many other different research questions these days.
· Tal Linzen: Sometimes we pretend we’re having a scientific conversation, but some of the participants in the conversation work at a company that’s worth $50 billion, and that makes the conversation complicated.
The research boom, the flood of money, and the excess hype further blurred the already indistinct line between NLP and AI. Researchers had to navigate not only their own new opportunities and incentives but also the direction of the field as a whole.
· Naznin Rajani: LLMs opened up a lot of doors for me that didn’t exist before. I was one of the earliest people to get the data and replicate ChatGPT in an open-source environment, and I essentially wrote the playbook for that, which was really nice. That was the reason my startup got a decent seed round.
· R. Thomas McCoy: Any professor whose job touches on AI at all gets pigeonholed as an expert in AI — typecast in a sense. I’m happy to work on AI because, with my skill set, it’s one of the most impactful research directions. But what really makes me happy is going into the nitty-gritty details of grammar and human cognition that are interesting. That can be connected to AI developments, but that’s a very long road.
· Julie Caligni: It’s really a question of semantics, right? For me personally, I feel like I’m in NLP, computational linguistics, and AI at the same time. I know there are specific communities for each of these, but there are also a lot of people who are crossing between them.
· Julian Michael: If NLP did not make changes, it was going to become obsolete. I think to some extent, it did. That makes me sad to say. I am now an AI calibration researcher.
· Anna Rogers: I’m not worried. Primarily because I don’t think we have solved natural language processing. If you thought, This is it, language processing is solved, then you should be depressed, but I don’t think that’s the case.
· Christopher Potts: It should feel like a moment of immense consequence for linguistics and NLP. The risks and opportunities are so large. Maybe this is a moment of awakening for the field where people realize they now have a lot of influence. You can no longer pretend you’re just a humble research-for-research’s-sake field of science or engineering — because now there’s global money pouring into this field, and all the big companies want to influence this field, and language models are being used everywhere. If you have that much success, you have to take on board the intense debates that come along with it. What else could it be?
Are Large Language Models a Paradigm Shift?
Not surprisingly, people had differing views.
· Tal Linzen: I would have never predicted, five or seven or 10 years ago, that you would just type one instruction into a language model, and it would do what you asked it to do and complete the sentence. I don’t think anyone at the time predicted that would be the paradigm. Now you just have an interface, and it can do all sorts of tasks.
· Anna Rogers: As a linguist, I don’t think so. Starting with the word embedding era around 2013, the central idea of all the work was transfer learning — learning something from a lot of text data and hoping that knowledge would be useful for some other task. Over the years, the popularity of models, their architectures, and public perception have changed, but the core principle hasn’t.
· Jeff Mitchell: I think the corporate interest changed the game in the field.
· Ellie Pavlick: I think the media presence had a huge impact. Scientists in our field realized that success can mean visibility outside of NLP, and the audience suddenly changed. Papers on arxiv.org now are often titled to attract the attention of journalists or Silicon Valley enthusiasts rather than professors. That’s a huge shift.
· Vukosi Marivate: I think in some ways, the barrier to entry in the field has both gone down and gone up. Down, because there’s a lot of stuff we don’t know about how these systems actually work, so a lot of research is just about testing and exploring them as much as possible. In that case, you don’t need to know the neural network architecture inside and out. But at the same time, the barrier has gone up, because if you want to do deep research into these architectures, from a compute resource perspective, you have to be in a very resource-rich environment.
· Emily M. Bender: I see a tremendous shift toward people looking toward chatbots or related text-generating machines as end-to-end solutions. But I think that’s a dead end.
· Christiana Fellbaum: I would even call it a tremendous shift or a shock, that these large language models have become so powerful that we have to think, What is the human in the loop? This is a paradigm shift: a technological shift, the way these models were trained and what they learned. And, of course, the implications for education, like the experience I had in my class. These are the questions that keep me up at night.
· R. Thomas McCoy: In linguistics, there were a lot of debates that were historically philosophical that now suddenly feel like they can be tested empirically. That’s definitely a big paradigm shift. But from another perspective, 10 years ago the research paradigm in this field was: People create some data set, they put a neural network on it, and they see what happens. That paradigm is still in place; it’s just that the data sets and the neural networks are much bigger.
· Christopher Potts: It should perhaps feel like this is just how science goes, that a mark of a paradigm shift is that questions that used to feel important are no longer discussed. Over the last five years, that really seems to have happened. I used to specialize in sentiment classification, like, Give me a sentence, and I can tell you whether it’s positive or negative. But now the whole field is focused on natural language generation, and the things we used to think were core problems feel tangential to that. I guess these are things that will become stale very quickly. Maybe in 2030, we’ll look back and think this was nothing compared to what happened in 2029.
So, are large language models a “paradigm shift”?