
Expert Opinions on the Recent Success of DeepSeek


Experts from BIFOLD and TU Berlin discuss how open-source models such as DeepSeek differ from other LLMs, as well as Europe's role in artificial intelligence (AI) development.

The experts:
Dr. Vera Schmitt (research group leader) and Dr. Nils Feldhus (postdoctoral researcher) from the XplaiNLP Group of the Quality and Usability Lab at TU Berlin conduct research into high-risk AI applications and develop AI-based systems for intelligent decision support. Their research explores high-performing, transparent, and explainable AI solutions for applications such as disinformation detection and medical data analysis. In the field of natural language processing, the group focuses on key areas such as explainable AI, the robustness of large language models (LLMs), argumentation structure modeling, and human-machine interaction.

Dr. Oliver Eberle is a postdoctoral researcher in the Machine Learning Group at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at TU Berlin. His research places a particular emphasis on explainable artificial intelligence, natural language processing, and their applications in different scientific disciplines, such as the digital humanities (for example in computer-supported text processing) and cognitive science. He specializes in model interpretability and is developing methods to enhance our understanding of the underlying mechanisms of large language models.

1. How do the concepts behind DeepSeek and ChatGPT differ?
Schmitt and Feldhus: DeepSeek prioritizes open source transparency and efficiency, while ChatGPT relies on massive computing power and scaling. The former allows for customization and lower costs; the latter offers optimized performance but is a proprietary, resource-intensive service. That said, DeepSeek is not 100% open source, as not all of the training data that went into the model has been disclosed. However, the fact that the model parameters are available and that DeepSeek's communication is much more open allows open source community initiatives such as Open-R1 to reproduce the model with far fewer resources than the huge and expensive infrastructures behind OpenAI, Microsoft, and others.

Eberle: DeepSeek is integrated into the "Hugging Face" community, a platform that provides hundreds of open source models and source code and that plays an important role in the availability, accessibility, and transparency of LLMs in both research and industry. DeepSeek has used other open source models (such as the Llama model from Meta) as a basis in the past (for example in "DeepSeek-R1-Distill-Llama-70b"). This saves on computing time because model distillation is much less computationally intensive than training a new model from scratch. DeepSeek publishes detailed descriptions and technical reports of its models and also reports on negative outcomes – a valuable contribution to the open source community, as this will help to improve open LLM systems in the future. In comparison, ChatGPT is a proprietary system and only the interface is accessible; this means that we do not know the exact specification of the model or the trained parameters in detail, nor do we have open access to them. As far as I am aware, neither DeepSeek nor ChatGPT publish the code for training their models or specific data sets.
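To illustrate the accessibility Eberle describes, here is a minimal sketch of loading an openly published model from the Hugging Face hub with the transformers library. The repository ID follows the naming mentioned above but is an assumption and should be verified on the hub; smaller distilled variants also exist.

```python
# A rough sketch of loading an open-weight model from the Hugging Face hub.
# The repository ID is assumed from the naming above; check the hub before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Open weights allow local inspection, evaluation, and fine-tuning,
# which a proprietary API such as ChatGPT does not permit.
inputs = tokenizer("Explain model distillation briefly.", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```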


2. Do you work with other open source large language models (LLMs)?
Schmitt and Feldhus: We work with a number of different LLMs, such as LLaMA, Mistral, Qwen, BLOOM, and Vicuna, and we have started experimenting with DeepSeek. We use these open source models selectively in various application areas. One of our key areas of focus is disinformation detection, which involves using LLMs to analyze narratives in digital media, uncover misinformation, and provide explanations for the misinformation identified. We also use LLMs to anonymize and process medical data in joint projects with Charité.

Eberle: We use various models, including Llama, Mistral, Gemma, Qwen, and Mamba, and focus on their interpretability, developing methods to better understand their underlying mechanisms.
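As a concrete illustration of what open weights make possible for this kind of interpretability research, the following minimal sketch inspects the attention weights of a small open model. It is a generic textbook example, not the group's actual method; the model choice is arbitrary.

```python
# A minimal sketch of one common interpretability technique: inspecting the
# attention weights of an open-weight transformer. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any small open-weight causal LM works for this demo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Berlin is the capital of Germany.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple (one tensor per layer), each of shape
# (batch, heads, seq_len, seq_len); average over heads in the last layer.
last_layer = outputs.attentions[-1].mean(dim=1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for i, tok in enumerate(tokens):
    top = last_layer[i].argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]}")
```

Because the parameters are openly available, such analyses can probe the model's internal mechanisms directly, which is impossible through a closed API.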


3. How does the open source element of large language models support your research in concrete terms? Will DeepSeek help advance it?
Schmitt and Feldhus: Using open source LLMs allows us to customize models specifically for our research. Open access lets us ensure transparency and make specific architectural adjustments. It also allows us to evaluate models, develop them further, and integrate them more effectively into human-AI processes. DeepSeek could advance our research, as it offers more efficient model architectures and new training approaches that we can reproduce on computers here at the university. Particularly exciting are the potential improvements in resource efficiency, as well as in multilingual processing and adaptability to specific domains, which could complement and optimize our existing methods.
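One common way such customization is done in practice is parameter-efficient fine-tuning with LoRA adapters. The sketch below uses the peft library; the base model and hyperparameters are illustrative assumptions, not the group's actual setup.

```python
# A minimal sketch of adapting an open-weight model with LoRA adapters via
# the peft library; model name and hyperparameters are illustrative only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=8,                                  # low-rank dimension of the adapters
    lora_alpha=16,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights trains
```

Because only the small adapter matrices are trained, a domain-specific model (for example, for disinformation detection) can be built on university hardware rather than a hyperscaler cluster.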

Eberle: DeepSeek joins the ranks of other open source model families (Llama, Mistral, Qwen, and so on) and allows us to draw conclusions about a broader set of LLMs. The structure of these models is largely comparable and differs mainly in the training approach and the data sets used. DeepSeek now gives us access to a model with state-of-the-art reasoning capabilities, which could lead to new insights into how LLMs solve complex tasks.


4. Why are chip manufacturers such as NVIDIA linked to the success/failure of AI?
Schmitt and Feldhus: The success or failure of AI is closely linked to chip manufacturers such as NVIDIA because modern AI models require enormous computing power, which is mainly provided by specialized GPUs (graphics processing units) and AI accelerators. NVIDIA is an industry leader with powerful chips like the H100 and A100 series that are specifically designed for training AI models and delivering results quickly. NVIDIA also provides the software to carry out these calculations efficiently through its CUDA platform. Naturally, demand for these chips rises sharply when AI technologies prosper: companies, research institutions, and cloud providers invest huge amounts in GPU clusters, which pushes up NVIDIA's revenue and stock price. And vice versa: a decline in AI demand, or technological shifts towards more efficient architectures such as those we are now seeing with DeepSeek R1/V3, would reduce the dependency on NVIDIA and, to some extent, hurt its business. NVIDIA's dual dominance in hardware and software makes it difficult to decouple AI successes from the company. As long as DeepSeek also uses NVIDIA GPUs or CUDA, the AI discourse will be impossible to imagine without the industry giant. In other words: hardware development and the success of AI are symbiotic. Advances in AI drive the chip industry, while more powerful chips enable new AI models.
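This software lock-in is visible even in everyday deep-learning code: frameworks such as PyTorch address NVIDIA GPUs through CUDA directly, as this small illustrative snippet shows.

```python
# A small illustration of the CUDA dependency described above: PyTorch
# exposes NVIDIA GPUs through the CUDA backend by name.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Running on {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("No CUDA GPU found; training would be orders of magnitude slower.")

x = torch.randn(4096, 4096, device=device)
y = x @ x  # a single large matrix multiply, the core workload GPUs accelerate
```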


5. Was the community anticipating the magnitude of the impact the new Chinese LLM would have?
Schmitt and Feldhus: Yes, it was to be expected that China would invest more in the development of high-performance LLMs. The advances made by DeepSeek and other Chinese models didn't emerge from nowhere: huge investments and strategic initiatives have shaped China's AI sector in recent years. DeepSeek was therefore not a huge surprise but rather a natural step in the development of more resource-efficient LLMs. DeepSeek also builds heavily on existing open source model families such as LLaMA, Mistral, and Qwen, which lets us analyze a wider range of LLMs. Qwen in particular, another product of Chinese research, had already made it very clear that China is a key player in the AI arena and should not be underestimated. What is remarkable about DeepSeek R1 is its significantly improved reasoning, which gives us new insights into the ability of LLMs to solve complex tasks. This is particularly significant for more difficult, complex tasks such as disinformation detection.

Eberle: DeepSeek is a familiar name, and its predecessor DeepSeek-V2 proved quite successful, for example at generating code. That's why I'm a bit surprised by the strong reaction from the market and the media. DeepSeek-V3 is obviously an impressive technical achievement and can help bring open source models up to par with proprietary models such as ChatGPT. However, DeepSeek should be seen in the context of the successful development of other open source LLMs.


6. What is Europe's position in this area?
Schmitt and Feldhus: As things currently stand, the EU is primarily focusing on regulating AI, and not enough resources are being pooled to even remotely counter the advances from the US and China. Especially when we consider investment plans such as the Stargate Project, the EU really cannot hold its own at the moment. Neither can the EU remain competitive, as promising AI start-ups are all too often acquired by US companies and/or relocate their headquarters to the US. Regulations and taxes significantly impact the innovative strength of natural language processing (NLP) companies within the EU. We can see from the innovative spirit of smaller European labs such as Mistral and Black Forest Labs (creators of the Flux image-generation models) that, despite all this, the European research community still wants to participate in global AI development and has quite a lot of influence. With more investment, these ambitions could become reality and turn Europe into a serious player in AI.

Eberle: Europe and Germany are focusing on the development of trustworthy and transparent AI methods. I also have the impression that Europe is specializing in specific applications of LLMs, for example foundation models for medicine (e.g. aignostics' RudolfV model for recognizing pathology data), law (legal LLMs such as LEGAL-BERT for processing and drafting legal texts), and AI methods for quantum chemistry.


7. Of course, DeepSeek is subject to Chinese censorship. To what extent do such restrictions influence the capabilities of large language models?
Eberle: The restrictions are usually imposed after the model has been trained, so they can be seen as a filter that suppresses unwanted output. I would therefore not assume that systems without restricted topics are better across the board. However, if large amounts of data are filtered out before training, this could affect the models' ability to generalize. There is an important difference between not giving the model any data on sensitive topics and instructing the model not to say anything about them.
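A minimal, purely hypothetical sketch of the kind of post-hoc filter Eberle describes might look as follows; the blocklist, refusal message, and helper names are placeholders, not DeepSeek's actual mechanism.

```python
# A hypothetical post-hoc output filter: the model itself is unchanged,
# and a wrapper suppresses answers matching a blocklist after generation.
# Blocklist and refusal text are placeholders, not any vendor's real system.
BLOCKED_TOPICS = ["example_sensitive_topic"]  # placeholder terms

def filtered_generate(generate_fn, prompt: str) -> str:
    """Run the underlying model, then filter the result after the fact."""
    answer = generate_fn(prompt)
    if any(topic in prompt.lower() or topic in answer.lower()
           for topic in BLOCKED_TOPICS):
        return "I cannot discuss this topic."
    return answer

def fake_model(prompt: str) -> str:  # stand-in for a real LLM call
    return f"Answer to: {prompt}"

print(filtered_generate(fake_model, "What is the capital of France?"))
```

The model's internal capabilities are untouched; only what reaches the user is restricted. Filtering the training data instead would change the model itself, which is the case where generalization could suffer.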

Authors: Barbara Halstenberg and Wolfgang Richter
Source: TU Berlin