brianolson 6 hours ago [-]
> Why aren’t these AI companies submitting to the TOP500 to show off their computing prowess?
my knowledge is 10+ years out of date, but once upon a time if they'd chosen to, Google could have had _several_ entries in the top 10 of the TOP500 list
It's just poker, they didn't want to tip their hand
davidmr 5 hours ago [-]
I’ve worked on several systems that had enough flop/s to make it in the top 5-10, but for which we never submitted benchmarks. Sometimes their backend network layout technically would make them several smaller clusters for an HPL run, sometimes it’s because the cluster is too heterogeneous to get a good benchmark result, and sometimes it’s because the employer wants to keep a low profile.
Most of the time, it's just that it’s a hassle. It takes a while to prep and tune a big hero run for benchmarking, and if you spend a billion dollars on a cluster, it’s making you a lot more than that. Taking it down for a day or two stops the money printers.
JumpCrisscross 1 hours ago [-]
Is there international value to these designations? As in, would it be worth it for the U.S. to pay a bonus to anyone who qualifies into the TOP500, to offset the cost of the run?
ziofill 5 hours ago [-]
Also, would those 550k Blackwell have good FP64 performance? How would one even compare them?
ls612 3 hours ago [-]
Why would the scientific computing people want to tip their hand? It’s an open secret that the main point of these mammoth FP64 compute machines is to simulate nuclear weapons detonations to comply with the CTBT. You’d think that crowd would really not be fans of broadcasting their capabilities.
kube-system 2 hours ago [-]
In adversarial scenarios, there are varying strategies in communicating one's capabilities, just as one might do in a poker game.
Sometimes you want to show off what you can do to dissuade others from fucking with you. Sometimes you want to undersell your capabilities to hide your true ability. Sometimes you want others to think you are underselling your capabilities when you are actually at a disadvantage.
dopa42365 1 hours ago [-]
What for? You only need to match the performance that existed in the 1950s. In the Soviet Union. Everything else is a lack of knowledge rather than computing power.
Also you should read the second sentence of the CTBT Wikipedia article to find out why it's not even in force (spoiler: US hasn't ratified it).
Onavo 1 hours ago [-]
At some point, you will get diminishing returns, no? I don't think compute is the bottleneck right now for mechanical engineering if you don't count AI.
iberator 6 hours ago [-]
Cloud computing is not a supercomputer. Different architecture, bandwidth, interconnectivity and latencies.
dgacmu 6 hours ago [-]
That's not nearly as true when you look at AI training clusters. They're basically supercomputers but without an FP64 focus.
(These are the systems to which GP was referring at Google.)
cynicalkane 5 hours ago [-]
Even before AI training clusters became important, Google has had an outstanding custom fabric (there's papers about it) together with the ability to tune NICs for their own cases, and "their own cases" meant nearly everything engineered within Google. Ethernet hardware has had low kernel latency and DMA for a long time; it's the rest of the stack that hurts. But as far back as the early 2010s (if not further back, that goes beyond my knowledge horizon), you could just make it not hurt, if you had the software engineers to do it.
jeffbee 5 hours ago [-]
I thought TPUs couldn't reasonably run LINPACK at all because TPUs do not acknowledge that FP64 exists.
I know Google wants to compare their stuff to El Capitan or whatever but the comparison does not seem valid to me.
wmf 5 hours ago [-]
Historically there have been a bunch of clusters on the Top 500 that weren't used for HPC. The tell is that they used Ethernet (this was before RoCE). It's less efficient but you can still get an OK Linpack score.
jandrewrogers 6 hours ago [-]
TOP500 hasn't been a particularly useful measure of practical computing power in modern systems for many years because what it measures isn't a significant bottleneck in most real systems. It has become a measure of how much money someone is willing to spend for bragging rights. (HPCG is better in that it is a bit more bandwidth focused but still pretty narrow.)
Most companies with huge systems don't participate.
bee_rider 5 hours ago [-]
I wonder if there would have been an opportunity to generate some finer-grained benchmarks with something like BiCGStab+ILU (or maybe CG+incomplete Cholesky) instead of CG+Gauss-Seidel. The pitch being: you might have made different memory vs. compute trade-offs when designing your cluster, but you should be able to select a fill-in factor for the preconditioner to suit it.
chrisss395 3 hours ago [-]
I haven't kept up with the latest on supercomputing power, but I recall some years ago there being strong evidence that China had a couple of un-announced supercomputers that would have topped the charts. It makes me wonder what is publicly disclosed vs. actual.
flopsamjetsam 6 hours ago [-]
> We think it is highly likely that these LX2 chiplets are etched using SMIC 7 nanometer processes at the N+3 refinement, and we base that on the fact that the chip only runs at 1.55 GHz. That is nowhere near the 3 GHz that SMIC can push with that process, but it is probably lower to get the memory and core speeds more balanced. [1]
Interesting to see PAC mentioned on the slide, I'd have assumed security features would be a waste of transistors on something so compute-optimized - but maybe they want to isolate workloads from each other?
b33f 4 hours ago [-]
Why are they not using GPUs? Is it use cases that don't suit GPUs, or is it because of the limitation they are imposing on themselves by using domestic SMIC chips?
wmf 2 hours ago [-]
I suspect Chinese GPUs (e.g. Biren) are not mature.
amelius 3 hours ago [-]
GPUs are for graphics (the G in GPU). These systems are used for more general computations.
galaxy_quest 2 hours ago [-]
I’m not sure if I’m missing a joke, but that’s why we have general purpose computing on graphics processing units (GPGPU) which is why 8/10 of the top 10 machines have GPUs.
2OEH8eoCRo0 6 hours ago [-]
Extremely impressive accomplishment considering they did this with Chinese interconnects and Chinese chips. This is a wake up call.
jandrewrogers 6 hours ago [-]
TOP500 can be done with inexpensive silicon. It is more about a willingness to aggregate enough hardware in one place. As a benchmark, it tells you almost nothing about computing power or scalability for other applications because it doesn't exercise the bottlenecks most high-scale applications have.
echelon 6 hours ago [-]
We're too busy regulating the tech, not granting access to US engineers and companies, arguing against power and data centers, stopping skilled immigration.
This is absolutely going to bite us in the face in five to ten years.
2OEH8eoCRo0 6 hours ago [-]
Separate issue that has nothing to do with US manufacturing or HPC. I think our retreat from science funding and offshoring advanced manufacturing is a bigger issue.
ziofill 6 hours ago [-]
> Two cores are disabled per cluster.
I’m sure there is a good reason for this, which is..?
jandrewrogers 5 hours ago [-]
It is likely that those cores are dedicated to unrelated management, monitoring, and administrative tasks. This is common and many workloads are throttled on bandwidth anyway. For the purposes of the benchmark, those cores are not participating in the workload.
brianolson 5 hours ago [-]
Yield. Some fraction of cores had a speck of dust or something, but at 38/40 good cores per chip they got economical yield.
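To put rough numbers on the yield argument, here is a minimal sketch. It assumes each core fails independently with the same probability, which is a simplification (real defects cluster), and the 2% per-core defect rate is a made-up figure for illustration, not anything reported about this chip. The function name `prob_at_least_good` is hypothetical too.

```python
from math import comb


def prob_at_least_good(total_cores: int, needed_good: int, core_defect_prob: float) -> float:
    """Probability that at least `needed_good` of `total_cores` cores are defect-free.

    Assumes independent, identically distributed per-core defects (binomial model).
    """
    p_good = 1.0 - core_defect_prob
    return sum(
        comb(total_cores, k) * p_good**k * core_defect_prob ** (total_cores - k)
        for k in range(needed_good, total_cores + 1)
    )


if __name__ == "__main__":
    defect = 0.02  # hypothetical per-core defect probability, not a measured value
    for needed in (40, 38, 36):
        share = prob_at_least_good(40, needed, defect)
        print(f"chips usable when requiring {needed}/40 good cores: {share:.1%}")
```

Under this toy model, requiring all 40 cores throws away a large share of dies, while allowing two bad cores recovers most of them. That is the usual reason to ship every part as 38-core.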
tjhei 5 hours ago [-]
And then even if some nodes had 40/40 "good" cores, it would make load balancing a lot more complicated if core counts vary. Easier to turn them off at the hardware level.
dist-epoch 5 hours ago [-]
Couldn't some chips have 40 good cores, while others have only 36? Do they all need to be exactly 38?
Based on the ARMv9.2.
[1] https://www.nextplatform.com/hpc/2026/06/25/a-deep-dive-on-c...