I’ve just come back from ARM TechCon 2011, and it was a pretty special time for ARM. On the first day, Simon Segars, EVP and GM of the Physical IP Division at ARM, opened proceedings by talking about the full range of physical IP that ARM now has, and the Processor Optimisation Packages (POPs) that make ARM’s great CPUs even faster and lower power (we’re looking at ways to add the same value for our Mali GPUs in the future). After that, we had Nandan Nayampally’s talks on the ARM Cortex-A7 and the big.LITTLE processing announcement, together with the CCI-400 Cache Coherent Interconnect (CCI). To finish it all off, Mike Muller (ARM’s CTO) gave one of his witty keynotes, in which he announced the ARMv8 64-bit instruction set architecture.
So, where were our graphics? We did have a couple of very good talks at the show, particularly on GPU computing, but we didn’t announce any new Mali graphics products: we felt that a list like the one above might overshadow such an announcement (though keep your eyes and ears open over the next few weeks!). What we showed were a number of consumer products based on Mali GPUs, such as the Samsung Galaxy S II smartphone powered by the ARM® Mali-400MP4. We also had a new tech preview from SiXiTS, exclusively available on Mali-powered devices, and much more. What I really want to talk about, though, is how the joined-up nature of ARM’s technology (across all the areas) became clearer at the show.
A unique advantage of Mali GPUs is that they don’t need the CPU to keep pushing large quantities of data into the GPU: the GPU reads the data autonomously from memory. This lets the GPU do the graphics processing while using less CPU horsepower. In practice, our benchmark-leading Mali-400 GPU can reach its maximum performance with the Cortex-A9 CPU turned down to 50% of its peak frequency. This makes it ideal to pair with a big.LITTLE processing combination of Cortex-A7 and Cortex-A15: we can run for more of the time without needing the full power of the Cortex-A15.
With a big.LITTLE design, there should be a cache-coherent interconnect, and Mali-T604 is designed to use that coherency to reduce the cost of offloading work from the CPU to the GPU (GPU computing) through any of the methods we support, such as Khronos OpenCL, Google Android RenderScript or Microsoft’s DirectCompute. Being cache-coherent means that data held in the CPU cache can be read by the GPU directly from that cache, without going via external memory. This saves two things: the energy and bandwidth of writing the data out to memory and reading it back in, and the overhead on the CPU.
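To make the bandwidth saving concrete, here is a back-of-the-envelope sketch in Python. It is a toy model, not a description of how CCI-400 or Mali-T604 actually behave: the function name `handoff_traffic_bytes` and the `resident_fraction` parameter are made up for illustration, and the model only counts external memory traffic at the moment a CPU-written buffer is handed to the GPU, assuming every cached line of the buffer is dirty.

```python
def handoff_traffic_bytes(buffer_bytes: int, coherent: bool,
                          resident_fraction: float = 1.0) -> int:
    """Estimate external-memory bytes moved when the GPU consumes a CPU-written buffer.

    Hypothetical model: `resident_fraction` is the share of the buffer still
    sitting (dirty) in the CPU cache at hand-off time.
    """
    if buffer_bytes < 0:
        raise ValueError("buffer_bytes must be non-negative")
    if not 0.0 <= resident_fraction <= 1.0:
        raise ValueError("resident_fraction must be between 0 and 1")

    in_cache = int(buffer_bytes * resident_fraction)
    already_in_dram = buffer_bytes - in_cache

    if coherent:
        # Cached lines are read straight from the CPU cache over the
        # interconnect; only lines already evicted are fetched from DRAM.
        return already_in_dram

    # Non-coherent: cached lines are first written back to DRAM,
    # then the GPU reads the whole buffer from DRAM.
    return in_cache + buffer_bytes


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(handoff_traffic_bytes(one_mib, coherent=False))
    print(handoff_traffic_bytes(one_mib, coherent=True))
```

Under these assumptions, a buffer that is fully resident in the CPU cache costs two full passes over external memory without coherency (one write-back, one read) and none with it, which is the "writing it out to memory and reading it back in" saving described above. Real figures depend on cache sizes, eviction behaviour and the interconnect, none of which this sketch models.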
Last year, when we launched the Mali-T604, some commentators were surprised at the Midgard architecture design, which is 64-bit internally, uses coherency, has an MMU that shares the ARM page table formats, and so on. In my blog at the time, I explained why this was going to be necessary, given the requirements for 64-bit ints and addresses, 64-bit floats, Full Profile OpenCL, etc. My colleague Sean Ellis, in his recent blog, also explained why we have a fully-featured MMU in our GPUs, and why Mali-T604 shares the page table formats of the latest ARM CPUs and is ready for the ARMv8 64-bit leap. In his blog, Roberto Mijat wrote about how the requirements of Google’s RenderScript Compute are driving the need for higher precision.
Given ARM’s increasingly clear, joined-up story across CPU, GPU, fabric interconnect and physical IP, perhaps now the reason for some of our design decisions has become a little clearer?
For some light relief, here’s a video interview with Ian Smythe and me, which was taken at ARM TechCon:
*Mali GPUs noted as based on a published Khronos Specification are conformant, or expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at http://www.khronos.org/conformance.
Jem is an ARM Fellow and likes to think of himself as "The Godfather" to technical talent in ARM. After spending some time in his youth writing software for satellites and traffic-lights among other fascinating things, Jem spotted the technical inflection point of the mobile industry: graphics, video and other visual computing. As VP of technology in the Media Processing Division of ARM, Jem is busy with a lot of projects involving the future of cool ARM technology, which will revolutionise how people experience and interact with digital devices.
All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.
2 Comments On This Entry
Techu
10 December 2011 - 08:41 PM
I'm a little late to this thread, but I do have a question regarding this ARMv8 64-bit leap...
Where do your partners, Samsung and the other key device players, stand as regards getting far faster DDR*L RAM on board all the new A15 (or even the quad-core A9/NEON ASAP) and newer Mali gfx cores/PCBs, given that Samsung now have their http://www.xbitlabs...._Bandwidth.html "1Gb wide I/O mobile DRAM can transmit data at 12.8GB/s..."?
How does this 64-bit + wide I/O 512-bit bus RAM interface compare on average BOM costs, and on innovative large speed increases for less power usage, etc.?
It seems a real shame that ARM vendors in the popular press can't/won't seem to take advantage of this really innovative data throughput advantage in a timely manner, while end users are very keen to take up and advocate something new in an ARM device, and to put AMD x86/gfx etc. out to pasture, or at least relegate them to second choice in the mass consumer markets ASAP,
preferably in 2012.