DeepSeek, China's AI model: News & Discussion

nahtanbob · Jun 29, 2025

lightning f57 said:
How is the US more advanced in semiconductor manufacturing when it uses Dutch EUV machines? China can make advanced chips as well, but the sanctions on advanced EUV machines are hampering it. On a level playing field, Intel would start to see a rapid decline.

ASML has 20% of its workforce in the USA.

冖_冖 · Jul 15, 2025

Kimi K2 and when "DeepSeek Moments" become normal

One "DeepSeek Moment" wasn't enough for us to wake up, hopefully we don't need a third.

www.interconnects.ai

Alibaba-backed startup Moonshot released late Friday night its Kimi K2 model as a low-cost, open source large language model, with a focus on coding capabilities.

Jul 14, 2025

Kimi K2, described as an "Open-Source Agentic Model" is a sparse mixture of experts (MoE) model with 1T total parameters (~1.5x DeepSeek V3/R1's 671B) and 32B active parameters (similar to DeepSeek V3/R1's 37B). It is a "non-thinking" model with leading performance numbers in coding and related agentic tasks (earning it many comparisons to Claude 3.5 Sonnet), which means it doesn't generate a long reasoning chain before answering, but it was still trained extensively with reinforcement learning. It clearly outperforms DeepSeek V3 on a variety of benchmarks, including SWE-Bench, LiveCodeBench, AIME, or GPQA, and comes with a base model released as well. It is the new best-available open model by a clear margin.

These facts with the points above all have useful parallels for what comes next:

Controlling who can train cutting edge models is extremely difficult. More organizations will join this list of OpenAI, Anthropic, Google, Meta, xAI, Qwen, DeepSeek, Moonshot AI, etc. Where there is a concentration of talent and sufficient compute, excellent models are very possible. This is easier to do somewhere such as China or Europe where there is existing talent, but is not restricted to these localities.
Kimi K2 was trained on 15.5T tokens and has a very similar number of active parameters as DeepSeek V3/R1, which was trained on 14.8T tokens. Better models are being trained without substantial increases in compute — these are referred to as a mix of "algorithmic gains" or "efficiency gains" in training. Compute restrictions will certainly slow this pace of progress on Chinese companies, but they are clearly not a binary on/off bottleneck on training.
The gap between the leading open models from the Western research labs versus their Chinese counterparts is only increasing in magnitude. The best open model from an American company is, maybe, Llama-4-Maverick? Three Chinese organizations have released obviously more useful models with more permissive licenses: DeepSeek, Moonshot AI, and Qwen. A few others such as Tencent, Minimax, Z.ai/THUDM may have Llama-4 beat too but are a half step behind the leading Chinese models on some combination of license and performance.
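
As a rough check on the compute point above, a common back-of-the-envelope estimate puts training compute at about 6 FLOPs per active parameter per training token. That rule of thumb is an assumption added here, not something the article states, and it ignores attention cost and any post-training. The parameter and token counts are the ones quoted above; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope comparison using the figures quoted in the post.
# Assumption (not from the article): training FLOPs ~= 6 * active_params * tokens.

MODELS = {
    "Kimi K2": {"total_params": 1.0e12, "active_params": 32e9, "tokens": 15.5e12},
    "DeepSeek V3": {"total_params": 671e9, "active_params": 37e9, "tokens": 14.8e12},
}


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate pre-training compute with the 6*N*D rule of thumb."""
    return 6.0 * active_params * tokens


def summarize(models: dict) -> dict:
    """Return per-model active-parameter fraction and estimated training FLOPs."""
    out = {}
    for name, m in models.items():
        out[name] = {
            "active_fraction": m["active_params"] / m["total_params"],
            "flops": training_flops(m["active_params"], m["tokens"]),
        }
    return out


summary = summarize(MODELS)
size_ratio = MODELS["Kimi K2"]["total_params"] / MODELS["DeepSeek V3"]["total_params"]
flops_ratio = summary["Kimi K2"]["flops"] / summary["DeepSeek V3"]["flops"]

if __name__ == "__main__":
    print(f"total-parameter ratio (K2 / V3): {size_ratio:.2f}")
    for name, s in summary.items():
        print(f"{name}: {s['active_fraction']:.1%} active, ~{s['flops']:.2e} FLOPs")
    print(f"estimated compute ratio (K2 / V3): {flops_ratio:.2f}")
```

Under that assumption the two training runs land within roughly 10% of each other, which fits the claim that K2 did not need substantially more compute than DeepSeek V3.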

This comes at the same time that new inference-heavy products are coming online that'll benefit from the potential of cheaper, lower margin hosting options on open models relative to API counterparts (which tend to have high profit margins).

Kimi K2 is set up for a much slower style "DeepSeek Moment" than the DeepSeek R1 model that came out in January of this year because it lacks two culturally salient factors:

DeepSeek R1 was revelatory because it was the first model to expose the reasoning trace to the users, causing massive adoption outside of the technical AI community, and
The broader public is already aware that training leading AI models is actually very low cost once the technical expertise is built up (recall the DeepSeek V3 $5M training cost number), i.e. the final training run is cheap, so there should be a smaller reaction to similar cheap training cost numbers in the Kimi K2 report coming soon.

Still, as more noise is created around the K2 release (Moonshot releases a technical report soon), this could evolve very rapidly. We've already seen quick experiments spin up slotting it into the Claude Code application (because Kimi's API is Claude-compatible) and K2 topping many nice "vibe tests" or creativity benchmarks. There are also tons of fun technical details that I don't have time to go into, from using a relatively unproven optimizer, Muon, to scaling up the self-rewarding LLM-as-a-judge pipeline in post-training. A fun tidbit to show how much this matters relative to the noisy Grok 4 release last week is that Kimi K2 has already surpassed Grok 4 in API usage on the popular OpenRouter platform.

Later in the day on the 11th, following the K2 release, OpenAI CEO Sam Altman shared the following message regarding OpenAI's forthcoming open model (which I previously shared more optimistic thoughts on here):

we planned to launch our open-weight model next week.

we are delaying it; we need time to run additional safety tests and review high-risk areas. we are not yet sure how long it will take us.

while we trust the community will build great things with this model, once weights are out, they can’t be pulled back. this is new for us and we want to get it right.

sorry to be the bearer of bad news; we are working super hard!

Many attributed this as a reactive move by OpenAI to get out from the shadow of Kimi K2's wonderful release and another DeepSeek media cycle.

Even though someone at OpenAI shared with me that the rumor that Kimi caused the delay for their open model is very likely not true, this is what being on the back foot looks like. When you're on the back foot, narratives like this are impossible to control.

We need leaders at the closed AI laboratories in the U.S. to rethink some of the long-term dynamics they're battling with R&D adoption. We need to mobilize funding for great, open science projects in the U.S. and Europe. Until then, this is what losing looks like if you want The West to be the long-term foundation of AI research and development. Kimi K2 has shown us that one "DeepSeek Moment" wasn't enough for us to make the changes we need, and hopefully we don't need a third.

chinasun · Aug 13, 2025

China’s Lead in Open-Source AI Jolts Washington and Silicon Valley

China’s ambition to turn its open-source artificial-intelligence models into a global standard has jolted American companies and policymakers, who fear U.S. models could be eclipsed and are mobilizing their responses to the threat.

Chinese advances in AI have come one after another this year, starting with the widely heralded DeepSeek and its R1 reasoning model in January. This was followed by Alibaba’s Qwen and a flurry of others since July, with names such as Moonshot, Z.ai and MiniMax.

American companies that have kept their models proprietary are feeling the pressure. In early August, ChatGPT maker OpenAI released its first open-source model, called gpt-oss.

The history of technology offers many examples where a welter of competitors in an industry’s infancy eventually evolved into a monopoly or oligopoly of a few players. Microsoft’s Windows operating system for desktops, Google’s search engine, and the iOS and Android operating systems for smartphones are just a few of the examples.

History also teaches that the battle to become an industry standard isn’t necessarily won by the most technologically advanced player. Easy availability and flexibility play a role, which is why China’s advances in open-source AI worry many in Washington and Silicon Valley.

In an AI action plan released in July, the Trump administration said open-source models “could become global standards in some areas of business and in academic research.” The report called on the U.S. to build “leading open models founded on American values.”

Photo: President Trump displayed a signed executive order related to his Artificial Intelligence Action Plan last month. © Chip Somodevilla/Getty Images
For now, the rewards to the winners in open-source AI are slim, since they spend hundreds of millions of dollars developing models and get paid nothing directly in return. But those who lock in users may be able to sell other services piggybacking on the free part, just as Google offers search, YouTube and other revenue-generating products bundled with its Android operating system.

Android is itself open source and built on Linux, an open-source operating system still widely used in the industry.

Researchers have long embraced open source as a way of accelerating the development of emerging technology, since it allows every user to see the code and suggest improvements. Chinese officials have encouraged open-source research and development not only in AI but also in operating systems, semiconductor architecture and engineering software.

“Fearing being cut off from American technologies, China is fostering open-source projects as a strategic fallback and emergency resource,” said Lian Jye Su, an analyst at research firm Omdia focusing on AI.

This year’s U.S.-China trade war has shown how each side can leverage its industrial advantages—such as Nvidia chips for the U.S. and rare-earth minerals for China—to extract concessions from the other side. U.S. officials worry that if Chinese AI models dominate the globe, Beijing will figure out a way to exploit it for geopolitical advantage.

Away from politics, open-source AI models are vying for adoption by businesses. Many customers like open-source AI because they can freely adapt it and put it on their computer systems, keeping sensitive information in-house.

Singapore-based Oversea-Chinese Banking, one of Southeast Asia’s biggest banks, has developed around 30 internal tools using open-source models, including Google’s Gemma to summarize documents, Qwen to help write computer code and DeepSeek to analyze market trends.

The bank said it avoided being locked into any one model. It monitors new releases and can switch if it likes a new model. It also prefers models that many developers are familiar with, so it can get technical support.

“At any point in time, we probably have a stable of about 10 open-source models that we’re using,” said Donald MacDonald, an executive at OCBC.

The overall performance of China’s best open-weight model has surpassed the American open-source champion since November, according to research firm Artificial Analysis. The firm, which rates the ability of models in math, coding and other areas, found a version of Alibaba’s Qwen3 beat OpenAI’s gpt-oss.

However, the Chinese model is almost twice as big as OpenAI’s, suggesting that for simpler tasks, Qwen might consume more computing power to do the same job. OpenAI said its open-source model outperformed rivals of similar size on reasoning tasks and delivered strong performance at low cost.

Major U.S. cloud-service providers have started offering gpt-oss to their users. Amazon Web Services said the OpenAI model was more cost-effective than DeepSeek’s R1 running on its infrastructure.

Engineers, especially those in Asia, said they found Chinese models were often more sophisticated in understanding their local languages and catching cultural nuances. Models from China are trained with more data in Chinese, which shares similarities with some other Asian languages.

Shinichi Usami, an engineer in Yokohama, Japan, recently developed a customer-service chatbot for a retail client. He chose Alibaba’s Qwen.

With a leading U.S. model, “we’ve observed instances where the chatbot struggles to grasp the implicit intent from users’ words and the responses can occasionally be not polite enough,” said Usami. “Qwen appears to handle these nuances better.”

Companies in China’s hypercompetitive AI industry at first focused on undercutting each other’s prices for closed-source models. That competition has extended in recent months to open-source models as everyone fights for adoption and public recognition.

“Chinese companies often prioritize user stickiness over immediate revenue,” said Charlie Chai, a Shanghai-based tech analyst at 86Research.

While startups have a window to attract users, it won’t last long, analysts said, and larger tech companies are often best-positioned to cash in on a big user base by offering related services such as specialized apps or cloud services.

“This Darwinian life-or-death struggle will lead to the demise of many of the existing players, but the intense competition breeds strong companies,” wrote Andrew Ng, head of Silicon Valley startup DeepLearning.AI, in a recent blog.

Reference
https://www.msn.com/en-us/news/tech...lts-washington-and-silicon-valley/ar-AA1KpRea

F-22Raptor · Aug 14, 2025

Chinese artificial intelligence company DeepSeek delayed the release of its new model after failing to train it using Huawei’s chips, highlighting the limits of Beijing’s push to replace US technology.

DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than use Nvidia’s systems after releasing its R1 model in January, according to three people familiar with the matter.

But the Chinese start-up encountered persistent technical issues during its R2 training process using Ascend chips, prompting it to use Nvidia chips for training and Huawei’s for inference, said the people.
The issues were the main reason the model’s launch was delayed from May, said a person with knowledge of the situation, causing it to lose ground to rivals.

Training involves the model learning from a large dataset, while inference refers to the step of using a trained model to make predictions or generate a response, such as a chatbot query.
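
The training/inference split described above can be shown with a toy example. This is only a sketch in plain Python: the one-weight model, the data, the learning rate and the function names are all hypothetical and have nothing to do with DeepSeek's actual stack.

```python
# Toy illustration of the training/inference split.
# Everything here (data, learning rate, function names) is hypothetical.

def train(data, lr=0.01, epochs=500):
    """Training: repeatedly adjust the weight to reduce squared error on the dataset."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error of the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def infer(w, x):
    """Inference: apply the frozen weight to a new input; nothing is updated."""
    return w * x


dataset = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated by y = 3x
weight = train(dataset)
prediction = infer(weight, 4.0)

if __name__ == "__main__":
    print(f"learned weight: {weight:.3f}")
    print(f"prediction for x=4: {prediction:.3f}")
```

Training is the long iterative loop over the whole dataset, while inference is a single pass with fixed weights. That difference is one plausible reason the two workloads can end up on different hardware, as in the Nvidia-for-training, Huawei-for-inference arrangement the article reports.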

DeepSeek’s difficulties show how Chinese chips still lag behind their US rivals for critical tasks, highlighting the challenges facing China’s drive to be technologically self-sufficient.

The Financial Times this week reported that Beijing has demanded that Chinese tech companies justify their orders of Nvidia’s H20, in a move to encourage them to promote alternatives made by Huawei and Cambricon.

Industry insiders have said the Chinese chips suffer from stability issues, slower inter-chip connectivity and inferior software compared with Nvidia’s products.

Huawei sent a team of engineers to DeepSeek’s office to help the company use its AI chip to develop the R2 model, according to two people. Yet despite having the team on site, DeepSeek could not conduct a successful training run on the Ascend chip, said the people.

DeepSeek is still working with Huawei to make the model compatible with Ascend for inference, the people said.

Founder Liang Wenfeng has said internally he is dissatisfied with R2’s progress and has been pushing to spend more time to build an advanced model that can sustain the company’s lead in the AI field, they said.

The R2 launch was also delayed because of longer-than-expected data labelling for its updated model, another person added. Chinese media reports have suggested that the model may be released as soon as in the coming weeks.

“Models are commodities that can be easily swapped out,” said Ritwik Gupta, an AI researcher at the University of California, Berkeley. “A lot of developers are using Alibaba’s Qwen3, which is powerful and flexible.”

Gupta noted that Qwen3 adopted DeepSeek’s core concepts, such as its training algorithm that makes the model capable of reasoning, but made them more efficient to use.

Gupta, who tracks Huawei’s AI ecosystem, said the company is facing “growing pains” in using Ascend for training, though he expects the Chinese national champion to adapt eventually.

“Just because we’re not seeing leading models trained on Huawei today doesn’t mean it won’t happen in the future. It’s a matter of time,” he said.

Nvidia, a chipmaker at the centre of a geopolitical battle between Beijing and Washington, recently agreed to give the US government a cut of its revenues in China in order to resume sales of its H20 chips to the country.

“Developers will play a crucial role in building the winning AI ecosystem,” said Nvidia about Chinese companies using its chips.
“Surrendering entire markets and developers would only hurt American economic and national security.”

DeepSeek and Huawei did not respond to a request for comment.

John Smith · Aug 14, 2025

Looks like it'll take a while for Chinese chipmakers and GPU-makers to catch up to the US. It's really cutting-edge technology and it takes quite a lot of investment, technical talent and experience.
Sad that DeepSeek has allegedly denied the August release of R2. I've been waiting eagerly for their model.

Yousafzai_M · Aug 14, 2025

The US is a mafia. It has been for a long time, but now it's out in the open. I like it!

The world is getting more and more clearly divided between those who will lick Americans' shoes and those who won't.

Beijingwalker · Aug 15, 2025

AI experts return from China stunned: The US grid is so weak, the race may already be over

By Eva Roytburg
Fellow, News
August 14, 2025 at 3:55 PM EDT

A drone photo shows staff members of State Grid Bortala Electric Power Supply Company patrolling near Sayram Lake scenic area to ensure power supply in Bortala Mongolian Autonomous Prefecture, northwest China's Xinjiang Uygur Autonomous Region, July 17, 2025.
Yin Tianjie/Xinhua via Getty Images

“Everywhere we went, people treated energy availability as a given,” Rui Ma wrote on X after returning from a recent tour of China’s AI hubs.

For American AI researchers, that’s almost unimaginable. In the U.S., surging AI demand is colliding with a fragile power grid, the kind of extreme bottleneck that Goldman Sachs warns could severely choke the industry’s growth.

In China, Ma continued, it’s considered a “solved problem.”

Ma, a renowned expert in Chinese technology and founder of the media company Tech Buzz China, took her team on the road to get a firsthand look at the country’s AI advancements. She told Fortune that while she isn’t an energy expert, she attended enough meetings and talked to enough insiders to come away with a conclusion that should send chills down the spine of Silicon Valley: in China, building enough power for data centers is no longer up for debate.

“This is a stark contrast to the U.S., where AI growth is increasingly tied to debates over data center power consumption and grid limitations,” she wrote on X.

The stakes are difficult to overstate. Data center building is the foundation of AI advancement, and spending on new centers now displaces consumer spending in terms of impact to U.S. GDP—that’s concerning since consumer spending is generally two-thirds of the pie. McKinsey projects that between 2025 and 2030, companies worldwide will need to invest $6.7 trillion into new data center capacity to keep up with AI’s strain.

In a recent research note, Stifel Nicolaus warned of a looming correction to the S&P 500, since it forecasts this data-center capex boom to be a one-off build-out of infrastructure, while consumer spending is clearly on the wane.

However, the clear limiting factor to the U.S.’s data center infrastructure development, according to a Deloitte industry survey, is stress on the power grid. Cities’ power grids are so weak that some companies are just building their own power plants rather than relying on existing grids. The public is growing increasingly frustrated over increasing energy bills – in Ohio, the electricity bill for a typical household has increased at least $15 this summer from the data centers – while energy companies prepare for a sea-change of surging demand.

Goldman Sachs frames the crisis simply: “AI’s insatiable power demand is outpacing the grid’s decade-long development cycles, creating a critical bottleneck.”

Meanwhile, David Fishman, a Chinese electricity expert who has spent years tracking their energy development, told Fortune that in China, electricity isn’t even a question. On average, China adds more electricity demand than the entire annual consumption of Germany, every single year. Whole rural provinces are blanketed in rooftop solar, with one province matching the entirety of India’s electricity supply.

“U.S. policymakers should be hoping China stays a competitor and not an aggressor,” Fishman said. “Because right now they can’t compete effectively on the energy infrastructure front.”

China has an oversupply of electricity

China’s quiet electricity dominance, Fishman explained, is the result of decades of deliberate overbuilding and investment in every layer of the power sector, from generation to transmission to next-generation nuclear.

The country’s reserve margin has never dipped below 80%–100% nationwide, meaning it has consistently maintained at least twice the capacity it needs, Fishman said. They have so much available space that instead of seeing AI data centers as a threat to grid stability, China treats them as a convenient way to “soak up oversupply,” he added.

That level of cushion is unthinkable in the United States, where regional grids typically operate with a 15% reserve margin and sometimes less, particularly during extreme weather, Fishman said. In places like California or Texas, officials often issue warnings about red-flag conditions when demand is projected to strain the system. This leaves little room to absorb the rapid load increases AI infrastructure requires, Fishman noted.
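
The reserve-margin figures above are easier to compare as capacity multiples. A minimal sketch, assuming the usual definition of reserve margin as spare capacity divided by peak demand; the 100 GW peak is a made-up number for illustration, not a figure from the article, and the function names are hypothetical.

```python
# Reserve margin = (installed capacity - peak demand) / peak demand.
# The 100 GW peak demand below is hypothetical; the margins are the ones quoted above.

def capacity_multiple(reserve_margin: float) -> float:
    """Installed capacity expressed as a multiple of peak demand."""
    return 1.0 + reserve_margin


def spare_capacity_gw(peak_demand_gw: float, reserve_margin: float) -> float:
    """Capacity left over at peak demand, in GW."""
    return peak_demand_gw * reserve_margin


MARGINS = {"US regional grid": 0.15, "China (low end)": 0.80, "China (high end)": 1.00}
PEAK_GW = 100.0  # hypothetical peak demand

multiples = {name: capacity_multiple(m) for name, m in MARGINS.items()}

if __name__ == "__main__":
    for name, m in MARGINS.items():
        print(f"{name}: {multiples[name]:.2f}x peak, "
              f"{spare_capacity_gw(PEAK_GW, m):.0f} GW spare on a {PEAK_GW:.0f} GW peak")
```

Note that "twice the capacity it needs" matches the 100% end of the quoted range; an 80% margin works out to 1.8 times peak demand, against 1.15 times for a grid running a 15% margin.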

The gap in readiness is stark: while the U.S. is already experiencing political and economic fights over whether the grid can keep up, China is operating from a position of abundance.

Even if AI demand in China grows so quickly renewable projects can’t keep pace, Fishman said, the country can tap idle coal plants to bridge the gap while building more sustainable sources. “It’s not preferable,” he admitted, “but it’s doable.”

By contrast, the U.S. would have to scramble to bring on new generation capacity, often facing years-long permitting delays, local opposition, and fragmented market rules, he said.

Structural governance differences

Underpinning the hardware advantage is a difference in governance. In China, energy planning is coordinated by long-term, technocratic policy that defines the market’s rules before investments are made, Fishman said. This model ensures infrastructure buildout happens in anticipation of demand, not in reaction to it.

“They’re set up to hit grand slams,” Fishman noted. “The U.S., at best, can get on base.”

In the U.S., large-scale infrastructure projects depend heavily on private investment, but most investors expect a return within three to five years: far too short for power projects that can take a decade to build and pay off.

“Capital is really biased toward shorter-term returns,” he said, noting Silicon Valley has funneled billions into “the nth iteration of software-as-a-service” while energy projects fight for funding.

In China, by contrast, the state directs money toward strategic sectors in advance of demand, accepting not every project will succeed but ensuring the capacity is in place when it’s needed. Without public financing to de-risk long-term bets, he argued, the U.S. political and economic system is simply not set up to build the grid of the future.

Cultural attitudes reinforce this approach. In China, renewables are framed as a cornerstone of the economy because they make sense economically and strategically, not because they carry moral weight. Coal use isn’t cast as a sign of villainy, as it would be among some circles in the U.S. – it’s simply seen as outdated. This pragmatic framing, Fishman argued, allows policymakers to focus on efficiency and results rather than political battles.

For Fishman, the takeaway is blunt. Without a dramatic shift in how the U.S. builds and funds its energy infrastructure, China’s lead will only widen.

“The gap in capability is only going to continue to become more obvious — and grow in the coming years,” he said.

Rationale · Aug 15, 2025

Yousafzai_M said:
The world is getting more and more clearly divided between those who will lick Americans' shoes and those who won't.

Most countries are in the former camp, except for the BRICS nations and a handful of others.

ChineseTiger1986 · Aug 15, 2025

XYZ123 said:
Looks like it'll take a while for Chinese chipmakers and GPU-makers to catch up to the US. It's really cutting-edge technology and it takes quite a lot of investment, technical talent and experience.
Sad that DeepSeek has allegedly denied the August release of R2. I've been waiting eagerly for their model.

Liang wants his R2 model to blow everything out of the water.

China has already maxed out its DUV and is testing its EUV for mass production.

The chips fabricated by its domestic EUV should be able to sustain the performance of the R2.

The US doesn't need to make everything by itself; it can rely on the supply chain built together with its allies, while China has to rely entirely on a supply chain it builds itself.

神威98 · Aug 15, 2025

I had the same kind of problem with my first WiFi 6 router, which was a Huawei. It didn't work, which is why I went and bought a Xiaomi and have been praising their products ever since! I think China's future is seriously in jeopardy with monopolistic, profit-driven companies like Huawei at the helm...

www.huaweicentral.com

DeepSeek turning to Nvidia for R2 AI model following Huawei chip failure

"
Following Huawei chip hurdles, DeepSeek is now reportedly marching to Nvidia for its R2 AI model. The company has also extended the launch time for its new artificial intelligence model due to training failures and certain other factors.

PatentlyApple noted that DeepSeek is walking away from the Huawei AI chip after seeing constant woes in the R2 model and planning to use Nvidia H20 chipsets.

DeepSeek R2 was expected to debut in May this year, although the Chinese AI startup was dealing with inference training problems that delayed the launch timeline.

A recent report suggested that DeepSeek R2 could debut by the end of this month with the Huawei Ascend 910B/C chips. But it looks like the company is now denying the alleged release window following consistent issues with Ascend processors.

The Financial Times reported that Chinese authorities have been pressuring the AI startup to use Huawei chips over Nvidia. One reason behind this pressure is the strong results of the R1 AI model launch in the consumer market.

Still, it seems things aren’t working as expected. Huawei even sent a team of expert engineers to check out the ongoing problems with the Ascend AI chips. But it didn’t work, and the firms failed to complete a single successful training run!

What are these issues? The Ascend chip reportedly suffers from immature software support, especially with the CANN toolkit, slow interconnect speeds, and unstable performance.

As a result, DeepSeek is now turning towards Nvidia H20 chips for training its R2 AI model. It may continue to use Huawei hardware for inference tasks.

Following all these hassles, Ritwik Gupta, an AI researcher at the University of California, Berkeley, said:

“Models are commodities that can be easily swapped out. Just because we’re not seeing leading models trained on Huawei today doesn’t mean it won’t happen in the future. It’s a matter of time.”
"

Nan Yang · Aug 15, 2025

Trump shut down power projects across the US as China adopts an "all of the above" energy policy

神威98 · Aug 15, 2025

神威98 said:
What are these issues? The Ascend chip reportedly suffers from immature software support, especially with the CANN toolkit, slow interconnect speeds, and unstable performance.

I can imagine how frustrated the guys at DeepSeek must have been. This is a 1:1 mirror of the situation I encountered with their WiFi routers: completely unstable connections.

How can you be the most talked about company in the news in China and produce such junk as non-working WiFi routers and unusable Ascend GPUs for AI training? It's unthinkable!

I love reading up on tech advances from China. BUT if it's Huawei doing it, I'm not too keen or enthusiastic...

52051 · Aug 16, 2025

In Huawei's case, their ecosystem needs to mature. Some of their AI ops have hidden bugs that are not easy to solve (you can't blame them; an NPU is a lot harder to tune and program), and the very fast evolution of the LLM ecosystem itself makes the situation worse.

That's why the company I work for has developed tools for these clients and is writing ops and AI training infra for them. There is a huge need for that, which has fast-tracked my company's IPO progress. Huawei is also about to give up its pure NPU design, and the next-generation Ascend will have a similar SIMT inference just like all the other GPGPU providers in China. Huawei has a very aggressive marketing team; sometimes they bite off more than they can chew, which actually creates opportunities for the alternatives in China.

As for DeepSeek, they are actually now working with Hanwuji (Cambricon), another AI chip vendor, for their next-gen LLM, and the user experience so far is a lot better.

Fatman17 · Aug 16, 2025

To be merged

lightning f57 · Aug 16, 2025

So fundamentally the issue is getting sufficient hardware sorted. The AI model could be the most advanced, but without resolving that first issue it can't be deployed nationally and globally.
