ARM Community: SoC Design

Squaring the circle - Optimizing power efficiency in a Cortex-A15 processor

It is entirely appropriate that ARM will announce technical details of its latest hard macro product, the Cortex™-A15 MP4 Hard Macro for the TSMC 28HPM node, at COOL Chips XV, the IEEE Symposium on Low-Power and High-Speed Chips, being held this week in Yokohama, Japan (18-20 April 2012). This exciting new hard macro not only encapsulates the theme of the symposium, but also brings together the contemporary and divergent design challenges of offering extremely high-performance compute engines within a conservative power budget.

The Cortex-A15 MP4 Hard Macro is a high-performance, power-optimized quad-core hard macro implementation of our flagship Cortex-A15 processor on a leading 28nm process. It delivers three significant firsts for the ARM hard macro portfolio: it is the first quad-core hard macro, the first hard macro based on the Cortex-A15 (the highest-performance ARMv7-architecture processor), and the first hard macro on a 28nm process.

In terms of configuration, the Cortex-A15 MP...

Early Power, Performance, Area Analysis & AMBA Designer: A Winning Combo

Power, performance and area, or “PPA” as it is called, has become a topic of universal interest to system-on-chip (SoC) designers around the world. At last year’s ARM TechCon, Atrenta, an ARM Connected Community Partner and a leading provider of SoC Realization solutions for the semiconductor and electronic systems industries, showcased a new and innovative design flow for early PPA estimation.

Design complexity now demands that all aspects of the design be co-optimized. If you reduce power, you will impact performance and area and so on. A holistic approach that balances all requirements of the chip is needed to deploy SoCs successfully.
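One generic way to make "balancing all requirements" concrete when comparing early estimates is to discard any candidate that another candidate beats on every PPA axis, keeping only the Pareto front. The Python sketch below illustrates that idea only; it is not Atrenta's flow, and the `DesignPoint` fields, the numbers, and the use of maximum frequency as a performance proxy are all hypothetical.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    """A hypothetical early PPA estimate for one candidate configuration."""
    name: str
    power_mw: float   # lower is better
    freq_mhz: float   # higher is better (used here as a performance proxy)
    area_mm2: float   # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (a.power_mw <= b.power_mw
                and a.freq_mhz >= b.freq_mhz
                and a.area_mm2 <= b.area_mm2)
    strictly_better = (a.power_mw < b.power_mw
                       or a.freq_mhz > b.freq_mhz
                       or a.area_mm2 < b.area_mm2)
    return no_worse and strictly_better


def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [
        DesignPoint("A", power_mw=120, freq_mhz=800, area_mm2=1.5),
        DesignPoint("B", power_mw=150, freq_mhz=1000, area_mm2=1.8),
        DesignPoint("C", power_mw=160, freq_mhz=900, area_mm2=1.9),
        DesignPoint("D", power_mw=100, freq_mhz=600, area_mm2=1.2),
    ]
    print([p.name for p in pareto_front(candidates)])  # ['A', 'B', 'D']
```

With these made-up numbers, C is dropped because B uses less power and area at a higher frequency. A, B and D all survive, and choosing among them is the real design trade-off: none can improve one axis without giving up another.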

Improving PPA by analyzing interconnect fabric earlier
Anyone will tell you that the interconnect...

Using Cache Coherency to Verify the AMBA4 Protocol

The Jasper User Group Meeting, held on November 8 and 9, was full of presentations on the diverse ways users are applying formal techniques, some in areas never before thought possible. Paul Martin from ARM was one of the users who presented. ARM's talk discussed how modern multi-core processors require much more sophisticated cache control than before, to ensure that all devices in the system have the same view of shared data, a property known as cache coherency. ARM has created a sophisticated protocol for this, the AXI Coherency Extensions (ACE), under the AMBA 4 umbrella, which it announced at DAC.

The need to move cache management to hardware
In the old days, cache coherency management was largely done in software, invalidating large parts of the cache to ensure no stale data could get accessed, and forcing the cache to gradually be reloaded from main memory. There are several reasons why this is no longer appropr...
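To make the cost of software-managed coherency concrete, here is a toy Python model, assuming a producer core that writes 16 lines and a consumer core that then reads them, with 8 unrelated lines already warm in the consumer's cache. It only counts events; it is not a model of any real ARM cache or of the ACE protocol, and every name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    dram_reads: int = 0
    dram_writes: int = 0
    cache_to_cache: int = 0  # lines supplied by another cache instead of DRAM


@dataclass
class Cache:
    lines: dict[int, bool] = field(default_factory=dict)  # address -> dirty?


def produce(cache: Cache, addrs: range) -> None:
    """The producer writes every line, leaving it dirty in its own cache."""
    for a in addrs:
        cache.lines[a] = True


def consume_software(prod: Cache, cons: Cache, addrs: range, s: Stats) -> None:
    """Software maintenance: clean the producer's dirty lines to DRAM, then
    invalidate the consumer's cache wholesale and reload what it reads."""
    for a in [a for a, dirty in prod.lines.items() if dirty]:
        s.dram_writes += 1
        prod.lines[a] = False           # now clean; memory is up to date
    cons.lines.clear()                  # coarse invalidation also discards unrelated warm lines
    for a in addrs:
        s.dram_reads += 1
        cons.lines[a] = False


def consume_hardware(prod: Cache, cons: Cache, addrs: range, s: Stats) -> None:
    """Hardware coherency: a consumer miss is looked up (snooped) in the
    producer's cache, which supplies the data without a DRAM round trip."""
    for a in addrs:
        if a in cons.lines:
            continue
        if a in prod.lines:
            s.cache_to_cache += 1       # in this toy model the producer keeps its dirty copy
        else:
            s.dram_reads += 1
        cons.lines[a] = False


def run(hardware: bool) -> tuple[Stats, Cache]:
    prod, cons, s = Cache(), Cache(), Stats()
    cons.lines.update({a: False for a in range(100, 108)})  # 8 unrelated warm lines
    shared = range(16)
    produce(prod, shared)
    (consume_hardware if hardware else consume_software)(prod, cons, shared, s)
    return s, cons


if __name__ == "__main__":
    for hw in (False, True):
        s, cons = run(hw)
        print("hardware" if hw else "software", s, "consumer lines:", len(cons.lines))
```

In this toy run the software path costs 16 DRAM writes and 16 DRAM reads and throws away the consumer's 8 unrelated warm lines, while the hardware path moves the 16 lines cache-to-cache and leaves the warm lines alone. Real costs depend on line size, snoop latency and how much of the cache the software actually invalidates.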

big.LITTLE and AMBA 4 ACE keep your cache warm and avoid flushes

High performance and power efficiency are critical to the latest mobile devices, and AMBA® 4 ACE™ is a fundamental technology supporting ARM’s big.LITTLE processing. In case you missed the announcements, big.LITTLE technology offers an innovative way to run ‘always on’ tasks on the highly efficient Cortex™-A7 processor, while high-performance, responsive applications run predominantly on the Cortex™-A15 processor. So what does this have to do with AMBA 4? Well, AMBA 4 ACE and the CoreLink™ CCI-400 Cache Coherent Interconnect provide the critical glue that joins these processors into a big.LITTLE multi-processing (MP) system. Let me explain…

Earlier this year ARM announced the public release of the AMBA 4 phase 2 specification including AC...
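Running ‘always on’ work on the Cortex-A7 and demanding work on the Cortex-A15 implies some policy that decides where a task runs. The Python sketch below shows one common pattern for such a decision, migration with hysteresis. It is not ARM's or any operating system's actual scheduler; the cluster names come from the post, but the thresholds and the normalized "demand" metric are hypothetical.

```python
from enum import Enum


class Cluster(Enum):
    LITTLE = "Cortex-A7"
    BIG = "Cortex-A15"


# Hypothetical thresholds on a normalized demand figure in [0, 1].
UP_THRESHOLD = 0.8    # move to the big cluster above this demand
DOWN_THRESHOLD = 0.3  # move back to the LITTLE cluster below this demand


def place(current: Cluster, demand: float) -> Cluster:
    """Two separate thresholds (hysteresis) stop a task from bouncing
    between clusters when its demand hovers around a single cut-off."""
    if current is Cluster.LITTLE and demand > UP_THRESHOLD:
        return Cluster.BIG
    if current is Cluster.BIG and demand < DOWN_THRESHOLD:
        return Cluster.LITTLE
    return current


def trace(demands: list[float]) -> list[Cluster]:
    """Start on LITTLE (the 'always on' case) and record placement per sample."""
    cluster = Cluster.LITTLE
    placements = []
    for d in demands:
        cluster = place(cluster, d)
        placements.append(cluster)
    return placements


if __name__ == "__main__":
    samples = [0.1, 0.5, 0.9, 0.6, 0.4, 0.2]
    for d, c in zip(samples, trace(samples)):
        print(f"demand {d:.1f} -> {c.value}")
```

Every change of cluster in such a trace is a task migration, which is where the post's title point applies: hardware cache coherency is what lets the destination cluster pick up warm data without first flushing caches to memory.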

Energy Efficiency and Air Conditioning - Part 2: ARM Cortex-A7

ARM Cortex-A7 processor…It's all about right-sized equipment.
In Part 1 of this blog, we saw why right-sizing air conditioning is vitally important: it performs three different functions simultaneously, namely cooling, dehumidification and ventilation. Efficiency could be increased by separating out these three functions and optimizing them independently. As we saw last time, pumping air through ducts is inefficient due to the wasted pumping energy. You could use a hydronic system like the traditional underfloor heating common in northern Europe, but with cooling in the ceiling as well as heating in the floor. Pumping water is a more efficient way to move energy than pumping air: even though a water pipe is physically smaller than an air duct, in terms of thermal energy transfer capacity it's a fatter pipe. But this wouldn't dehumidify or ventilate, so you'd still need a very small A/C system with air ducts to provide those functions. Duct size and pumping losses would be much lower than in a pure ducted central air system, though...

Combining large and small compute engines - ARM Cortex-A7

Today the ARM Cortex-A7 processor was announced…the power of big.LITTLE processing is finally realized!

I drive a Honda Fit, mainly for the fuel efficiency, on a 20-mile city street commute. Sometimes I wish my car had a faster engine, but most of the time I’m happy to drive for high gas mileage. I have to say, though, that I was a reluctant convert to economy cars; I often find myself longing for the performance of a Porsche or BMW, but I only really want that performance a small percentage of the time I’m driving. Wouldn’t it be great to drive a car with the average efficiency of a 4-cylinder engine that could switch to the high performance of a turbocharged V8 for the small percentage of time you actually wanted peak performance? What if the average fuel economy were closer to that of the 4-cylinder and the peak performance closer to that of the turbo V8?

...
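The car analogy is a time-weighted average. As a rough sketch, assume a workload spends a fraction $f$ of its time on the big core, drawing power $P_{\text{big}}$, and the rest on the LITTLE core, drawing $P_{\text{LITTLE}}$ (hypothetical symbols; migration overhead and idle states are ignored). The average power is then:

$$\bar{P} = f\,P_{\text{big}} + (1 - f)\,P_{\text{LITTLE}}$$

Peak performance is still that of the big core whenever it runs, but as $f \to 0$ the average approaches $P_{\text{LITTLE}}$. With purely illustrative numbers $P_{\text{big}} = 1\ \text{W}$, $P_{\text{LITTLE}} = 0.1\ \text{W}$ and $f = 0.1$, this gives $\bar{P} = 0.1 \times 1 + 0.9 \times 0.1 = 0.19\ \text{W}$.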

Avoiding rush hour traffic jams in your SoC design

Thank goodness for the new packetization, virtual networking and clock gating features of the ARM CoreLink NIC-400. I’m fed up with hearing spurious comparisons between a single humongous crossbar switch and NoC solutions. No design of any complexity uses a single crossbar switch! Modern SoC designs integrate large numbers of IP cores supporting the de facto industry-standard AMBA AXI interface (open to all) and connect them through a network of small switches. This allows very high operating frequencies and reduces routing congestion, while keeping latency very low and the design simple and flexible.

Smart customers evaluate the alternatives to compare performance (bandwidths and latencies) vs. cost in terms of silicon area, routing congestion, power and price.

Packetization is great for increasing wire utilization, but it comes with a packetization/depacketization overhead that costs latency, gates and power. It should therefore be used where it helps most: to cross long distances with multi-cycle timings, f...
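The packetization trade-off can be sketched as a simple cost model. The Python below assumes a hypothetical 128-bit payload serialized onto a 32-bit link, one cycle each to pack and unpack, and one flop per bit per pipeline stage along the route. These are illustrative assumptions, not NIC-400 parameters, and the packer/unpacker gates and power are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCost:
    wires: int           # signal wires routed across the die
    pipeline_flops: int  # flops needed to pipeline the route
    latency_cycles: int  # cycles until a full payload is available at the far end


def wide_link(payload_bits: int, stages: int) -> LinkCost:
    """Unpacketized: one wire per payload bit and one flop per bit per pipeline stage."""
    return LinkCost(payload_bits, payload_bits * stages, stages)


def packetized_link(payload_bits: int, link_bits: int, stages: int,
                    pack_cycles: int = 1, unpack_cycles: int = 1) -> LinkCost:
    """Serialize the payload into beats on a narrower link.

    Latency = packing + pipeline depth + extra beats + unpacking.
    The packer/unpacker logic itself (gates, power) is not counted here.
    """
    beats = math.ceil(payload_bits / link_bits)
    latency = pack_cycles + stages + (beats - 1) + unpack_cycles
    return LinkCost(link_bits, link_bits * stages, latency)


if __name__ == "__main__":
    for stages in (0, 4):  # a short local hop vs a long multi-cycle route
        print(stages, wide_link(128, stages), packetized_link(128, 32, stages))
```

With these hypothetical numbers, a 4-stage route drops from 512 to 128 pipeline flops and from 128 to 32 wires, at a cost of 5 extra cycles. On a 0-stage local hop there are no pipeline flops to save, so only the wire saving remains against the same 5-cycle penalty, which matches the advice to packetize long, multi-cycle paths.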

Cache Coherency and Verification Seminar at DAC – Now Online!

If you missed DAC, then you missed the seminar on cache coherency and its verification given by ARM and Jasper. Learn how to overcome the typical challenges: capturing the intent, reviewing possible scenarios, and correcting errors in functional terms as they relate to the specification.

The seminar (online recording) kicked off by discussing how hardware-based coherency in high-performance parallel compute environments is not new. It quickly showed that architects and designers of high-performance, heterogeneous, embedded multi-processing SoCs, particularly those with one or more caches where many masters share a single area of memory, now require robust specifications, design and verification tools, and systems IP to ensure their devices minimize off-chip memory transactions while maximizing performance and power efficiency. The seminar then went on to explain why ARM has chosen to include Coherency Extensions...

Upgrading Your Verification For Cache Coherency With Jasper!

Are you using a mobile device? Chances are you’re reading this blog post on one, with the confidence that your device is doing what you intend it to do. As consumers, we place a lot of demands not only on our mobile devices but also on the rest of our personal electronics. We want them to perform all sorts of tasks efficiently, accurately, and with minimal power consumption. As an engineer, you’re aware of the complex embedded SoCs that make sure your device does what’s intended. Today’s embedded SoCs are high-performance, heterogeneous, multi-processing systems.

In the future, most of these embedded SoCs are likely to contain multiple caches that share a single memory resource. At a high level, cache coherency means that two caches cannot both hold the same cache line in a dirty state, and that if a cache holds a line in a unique state, that line must not be in any other cache. In addition, at least one transaction must always be able to make forward progress (no deadlock). To guarantee these properties, ARM has included coherency extensions in its AMBA 4 protocol specification.

While this is a tremendou...
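The two safety properties described in this post (no line dirty in two caches; a uniquely held line is in no other cache) can be written as a check on a single snapshot of every cache's line states. Below is a minimal Python sketch using the AMBA 4 ACE state names (UniqueClean, UniqueDirty, SharedClean, SharedDirty, Invalid). The snapshot format and function names are hypothetical, and this is not how Jasper or ARM express these properties. Forward progress concerns sequences of states rather than one snapshot, so it is not covered.

```python
from enum import Enum


class State(Enum):
    """Cache line states named as in AMBA 4 ACE."""
    INVALID = "Invalid"
    UNIQUE_CLEAN = "UniqueClean"
    UNIQUE_DIRTY = "UniqueDirty"
    SHARED_CLEAN = "SharedClean"
    SHARED_DIRTY = "SharedDirty"


DIRTY = {State.UNIQUE_DIRTY, State.SHARED_DIRTY}
UNIQUE = {State.UNIQUE_CLEAN, State.UNIQUE_DIRTY}


def violations(snapshot: dict[str, dict[int, State]]) -> list[str]:
    """Check one system snapshot (cache name -> {line address -> state})
    against the two safety properties; return a description of each violation."""
    problems = []
    addresses = sorted({addr for lines in snapshot.values() for addr in lines})
    for addr in addresses:
        # Caches that hold a valid copy of this line.
        holders = {name: lines[addr] for name, lines in snapshot.items()
                   if lines.get(addr, State.INVALID) is not State.INVALID}
        dirty = sorted(n for n, s in holders.items() if s in DIRTY)
        if len(dirty) > 1:
            problems.append(f"{addr:#x}: dirty in {dirty}")
        unique = sorted(n for n, s in holders.items() if s in UNIQUE)
        if unique and len(holders) > 1:
            problems.append(f"{addr:#x}: unique in {unique} but held by {sorted(holders)}")
    return problems


if __name__ == "__main__":
    ok = {"cpu0": {0x40: State.SHARED_DIRTY}, "cpu1": {0x40: State.SHARED_CLEAN}}
    bad = {"cpu0": {0x40: State.UNIQUE_DIRTY}, "cpu1": {0x40: State.SHARED_DIRTY}}
    print(violations(ok))   # []
    print(violations(bad))
```

One dirty owner sharing a line with clean copies is legal, as in `ok`. In `bad`, line 0x40 breaks both rules: it is dirty in two caches, and it is marked unique in `cpu0` while `cpu1` also holds it.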

All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.