DeepSeek, China's AI model: News & Discussion

Beijingwalker


China’s cheap, open AI model DeepSeek thrills scientists

DeepSeek-R1 performs reasoning tasks at the same level as OpenAI’s o1 — and is open for researchers to examine.

By Elizabeth Gibney
23 January 2025

DeepSeek website seen on an iPhone screen.

Chinese firm DeepSeek debuted a version of its large language model last year. Credit: Koshiro K/Alamy

A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.

These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 — which wowed researchers when it was released by OpenAI in September.

“This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.

R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.

“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.

DeepSeek hasn’t released the full cost of training R1, but it is charging users around one-thirtieth of what o1 costs to run. The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model. An “experiment that cost more than £300 with o1, cost less than $10 with R1,” says Krenn. “This is a dramatic difference which will certainly play a role in its future adoption.”

Challenge models

R1 is part of a boom in Chinese large language models (LLMs). Spun out of a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals, despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B, which used 11 times the computing resources.

Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing. “The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone,” says François Chollet, an AI researcher in Seattle, Washington.

DeepSeek’s progress suggests that “the perceived lead [the] US once had has narrowed significantly,” wrote Alvin Wang Graylin, a technology expert in Bellevue, Washington, who works at the Taiwan-based immersive technology firm HTC, on X. “The two countries need to pursue a collaborative approach to building advanced AI vs continuing on the current no-win arms race approach.”

Chain of thought

LLMs train on billions of samples of text, snipping them into word-parts called ‘tokens’ and learning patterns in the data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon called ‘hallucination’, and often struggle to reason through problems.
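The idea of learning token associations and then predicting the next token can be illustrated with a toy sketch. This is not how DeepSeek-R1 or any real LLM is built: real models use sub-word tokenizers and neural networks, whereas the example below assumes whitespace tokenization and simple bigram counts. The function names (`tokenize`, `train_bigram`, `predict_next`) and the three-sentence corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real sub-word tokenizer.
    return text.lower().split()


def train_bigram(corpus):
    # For every token, count which tokens follow it in the training text.
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, token):
    # Return the most frequently observed follower, or None if the token is unseen.
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature corpus, for illustration only.
corpus = [
    "the model predicts the next token",
    "the model learns patterns",
    "the model predicts text",
]
counts = train_bigram(corpus)
print(predict_next(counts, "model"))  # prints "predicts"
```

A model like this can only repeat associations it has counted, which gives a rough intuition for why purely pattern-based prediction can produce fluent but unfounded continuations.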
 
To view this content we will need your consent to set third party cookies.
For more detailed information, see our cookies page.
 

DeepSeek AI might be smarter than OpenAI's smartest AI, and you can try it out now

Important: DeepSeek R1 is open source.

By Stan Schroeder on January 22, 2025

A screenshot showing DeepSeek AI's search tool.

This thing is smart - and cheap. Credit: DeepSeek

There's a new AI player in town, and you might want to pay attention to this one.

On Monday, Chinese artificial intelligence company DeepSeek launched a new, open-source large language model called DeepSeek R1.

According to DeepSeek, R1 beats other popular LLMs (large language models), such as those from OpenAI, in several important benchmarks, and it's especially good with mathematical, coding, and reasoning tasks.


DeepSeek R1 is actually a refinement of DeepSeek R1 Zero, which is an LLM that was trained without a conventionally used method called supervised fine-tuning. This made it very capable in certain tasks, but as DeepSeek itself puts it, Zero had "poor readability and language mixing." Enter R1, which fixes these issues by incorporating "multi-stage training and cold-start data" before it was trained with reinforcement learning.

Arcane technical language aside (the details are online if you're interested), there are several key things you should know about DeepSeek R1. First, it's open source, meaning it's up for scrutiny from experts, which should alleviate concerns about privacy and security. Second, it's free to use as a web app, while API access is very cheap ($0.14 for one million input tokens, compared to OpenAI's $7.5 for its most powerful reasoning model, o1).
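To make the price gap concrete, here is a minimal arithmetic sketch using only the two per-million-token input prices quoted above. The 50-million-token workload and the function name `input_cost_usd` are assumptions for illustration; output-token pricing, which the article does not quote, is ignored.

```python
def input_cost_usd(num_tokens, price_per_million_usd):
    # Cost of input tokens at a flat per-million-token price.
    return num_tokens / 1_000_000 * price_per_million_usd


# Prices per one million input tokens, as quoted in the article.
DEEPSEEK_R1_PRICE = 0.14
OPENAI_O1_PRICE = 7.50

# Hypothetical workload size, chosen only for illustration.
tokens = 50_000_000

r1 = input_cost_usd(tokens, DEEPSEEK_R1_PRICE)
o1 = input_cost_usd(tokens, OPENAI_O1_PRICE)
print(f"R1: ${r1:.2f}, o1: ${o1:.2f}, ratio: {o1 / r1:.1f}x")
```

At these quoted prices the ratio stays the same for any workload size, roughly 54 to 1 on input tokens.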

Most importantly, this thing is very, very capable. To test it out, I immediately threw it into deep waters, asking it to code a fairly complex web app which needed to parse publicly available data, and create a dynamic website with travel and weather information for tourists. Amazingly, DeepSeek produced completely acceptable HTML code right away, and was able to further refine the site based on my input while improving and optimizing the code on its own along the way.


DeepSeek AI

I'll do all of that...tomorrow. Credit: Stan Schroeder / Mashable / DeepSeek

I also asked it to improve my chess skills in five minutes, to which it replied with a number of neatly organized and very useful tips (my chess skills did not improve, but only because I was too lazy to actually go through with DeepSeek's suggestions).

I then asked DeepSeek to prove how smart it is in exactly three sentences. Bad move by me, as I, the human, am not nearly smart enough to verify or even fully understand any of the three sentences. Notice, in the screenshot below, that you can see DeepSeek's "thought process" as it figures out the answer, which is perhaps even more fascinating than the answer itself.

DeepSeek AI

We get it, you're smart. Credit: Stan Schroeder / Mashable / DeepSeek

It's impressive to use. But as ZDNET noted, in the background of all this are training costs that are orders of magnitude lower than for some competing models, as well as chips that aren't as powerful as those at the disposal of U.S. AI companies. DeepSeek thus shows that extremely clever AI with reasoning ability doesn't have to be extremely expensive to train, or to use.

 

Scale AI CEO says China has quickly caught the U.S. with the DeepSeek open-source model

Published Thu, Jan 23 2025, 10:36 AM EST; updated Thu, Jan 23 2025, 1:03 PM EST
Hayden Field, @HAYDENFIELD

The U.S. may have led China in the artificial intelligence race for the past decade, according to Alexandr Wang, CEO of Scale AI, but on Christmas Day, everything changed.

Wang, whose company provides training data to key AI players including OpenAI, Google and Meta, said Thursday at the World Economic Forum in Davos, Switzerland, that DeepSeek, the leading Chinese AI lab, released an “earth-shattering model” on Christmas Day, then followed it up with a powerful reasoning-focused AI model, DeepSeek-R1, which competes with OpenAI’s recently released o1 model.

“What we’ve found is that DeepSeek ... is the top performing, or roughly on par with the best American models,” Wang said.

In an interview with CNBC, Wang described the artificial intelligence race between the U.S. and China as an “AI war,” adding that he believes China has significantly more Nvidia H100 GPUs — AI chips that are widely used to build leading powerful AI models — than people may think, especially considering U.S. export controls.

Wang also said he believes the AI sector will reach a trillion dollars, on par with estimates that the generative AI market is poised to top $1 trillion in revenue within a decade.

“The United States is going to need a huge amount of computational capacity, a huge amount of infrastructure,” Wang said, later adding, “We need to unleash U.S. energy to enable this AI boom.”

Earlier this week, President Donald Trump announced a joint venture with OpenAI, Oracle and SoftBank to invest billions of dollars in U.S. AI infrastructure. The project, Stargate, was unveiled at the White House by Trump, SoftBank CEO Masayoshi Son, Oracle co-founder Larry Ellison and OpenAI CEO Sam Altman. Key initial technology partners will include Microsoft, Nvidia and Oracle, as well as semiconductor company Arm. They said they would invest $100 billion to start and up to $500 billion over the next four years.




In the interview Thursday, Wang said he believes that it’ll take two to four years to reach artificial general intelligence, or AGI, a widely cited but vaguely defined benchmark used in the AI sector to denote a branch of AI pursuing technology that equals or surpasses human intellect on a wide range of tasks. AGI is a hotly debated topic, with some leaders saying we’re close to attaining it and some saying it’s not possible at all. Wang said his own definition of AGI is “powerful AI systems that are able to use a computer just like you or I could ... and basically be a remote worker in the most capable way.”

Anthropic, the Amazon-backed AI startup founded by ex-OpenAI research executives, ramped up its technology development throughout the past year, and in October, the startup said that its AI agents were able to use computers like humans can to complete complex tasks. Anthropic’s Computer Use capability allows its technology to interpret what’s on a computer screen, select buttons, enter text, navigate websites and execute tasks through any software and real-time internet browsing, the startup said.

The tool can “use computers in basically the same way that we do,” Jared Kaplan, Anthropic’s chief science officer, told CNBC in an interview at the time. He said it can do tasks with “tens or even hundreds of steps.”

OpenAI reportedly plans to introduce a similar feature soon.

When asked which U.S. artificial intelligence startups are leading the AI race right now, Wang said that models each have their own strengths — for instance, OpenAI’s models are great at reasoning, while Anthropic’s are great at coding.

“The space is becoming more competitive, not less competitive,” he said.

 

Alexandr Wang. A lot of top US AI scientists are ethnic Chinese.
 
Ah, a Chinese AI platform inviting the whole world to hop on and share their data—what could possibly go wrong? It's like asking a fox to guard the henhouse, but hey, it’s global now, so everyone gets a front-row seat! Privacy warriors must be sweating bullets, trying to figure out if their GDPR shields can hold up against the Great Firewall’s data vacuum cleaner.
 
I gathered the courage to visit the site, hesitating like I was about to defuse a bomb. But the moment it asked me to sign in with my email and personal data, I felt like my balls fell off and rolled under the desk—probably safer there than in their database.
 

Tech CEOs sound alarm on ByteDance, DeepSeek breakthroughs

 
Chinese tech always serves the world and prevents US companies from ripping off humanity.
 
I gathered the courage to visit the site, hesitating like I was about to defuse a bomb. But the moment it asked me to sign in with my email and personal data, I felt like my balls fell off and rolled under the desk—probably safer there than in their database.
I believe you might be living under a rock vis a vis US tech companies and their highly intrusive modus operandi.
 
@Musings @Waz

There is already a thread on this, let's merge them together...

 
