For me, the most important aspect of this talk was the public announcement of the availability of a new Cortex™-A5 Hard Macro for the TSMC 40nm Low Power node (40LP), which can achieve a whopping speed of over 1GHz in a tiny footprint of just 1mm².
This Hard Macro is based on a uni-processor implementation of the Cortex-A5, and includes NEON™, Floating Point, and 32KB Instruction and Data Caches. This configuration delivers excellent performance and power consumption while also being completely compatible with high-performance Cortex-A9 solutions, enabling out-of-the-box support of Android™ and other rich OSes. However, being so small and low power, the implementation is also a fantastic replacement for the classic ARM926™ and ARM1176™ processors, which are still shipping in the hundreds of millions, delivering a stepping stone to modern Cortex capabilities and full Android support for a huge raft of devices.
The “PPA” figures (ARM's shorthand for Power, Performance and Area) of this hardened core are impressive in their own right and deliver unrivaled performance per milliwatt and performance per square mm, but there is another story here: why ARM decided we needed to develop this type of hard macro cell, and the cyclical nature of engineering.
The first reality of developing an advanced SoC is that it is really hard, and our partners have to develop the skills and depth of knowledge to undertake a large project. If you are a big silicon developer you probably have developed lots of skills over many years. If you are a small or medium-sized company you may have drafted in a few seasoned engineers, who may be the founders, but largely you will have a group of good, talented but unseasoned engineers. The challenges break down into three groups.
There are challenges around the processor technology. Taking an advanced processor which may have many cores and many optional components, implementing it correctly and validating it is challenging even for many large organizations with big engineering teams and swathes of experience. For many small and medium organizations, implementing a dual- or quad-core design with timing path constraints can represent a herculean task.
There are challenges around the process implementation technology. If silicon designers are taking advanced Cortex-A series processors they are likely to be developing in advanced process geometries, such as 40nm and 28nm, where SPICE models may still be embryonic and where the design rules are “evolving”. Further, the electrical characteristics have altered radically from only a couple of generations ago, such that leakage is starting to dominate some designs, and new mitigation philosophies must come into play, with differing percentages of low-threshold, or high-threshold, gates; with multi-channel length gates; with more aggressive power and clock management; and with DFT/DFM methodologies and requirements.
And then there are commercial challenges: the requirement from marketing to hit emotive frequency targets, such as 1GHz for the Cortex-A5 40LP Hard Macro, to ensure products are considered for application sockets; the need to hit market windows that are shortening even as the technology becomes more complex; and, of course, the need to innovate and deliver a differentiated solution that enables the execution of standardized rich Operating Systems, such as Android, while still delivering the unique capabilities that will move the market forward.
With these growing challenges there is a need for IP suppliers, such as ARM, to evolve our support and make life easier for our silicon partners, reducing configuration confusion, minimizing implementation issues, and removing time to market and business risks.
To better help our partners, we have first evolved our ActiveAssist support program. This program provides support matched against each partner's customer needs, which can vary widely, and enables a level of knowledge transfer which can build confidence and capability for advanced systems.
This solution is ideal for a partner who is looking to educate their engineering staff or who wants to develop something unusual, market specific, or super-challenging. It is often significantly preferable to learning through trial and error, and the extended project cycles this leads to, but it still requires significant time and expertise and unfortunately often does not deliver an exceptional implementation at the end of the development cycle.
Secondly, ARM innovated the Physical IP offering to deliver specific Processor Optimization Packs, commonly known as POPs. These libraries of Physical IP have been carefully tuned to deliver optimal performance, along with implementation guidelines, for specific processors on specific process nodes, such as the Cortex-A9 or Cortex-A5 on 40LP.
Using the POP, silicon designers can develop a wide array of implementations based on different configurations of the processor and different mixtures of the cell libraries, providing unlimited flexibility. However, it can still take significant effort and, of course, there is still some design implementation risk.
The third innovation, and the prompt for this blog, is the reintroduction of hard macro cells.
Not so many years ago all processor IP was delivered in hardened format. The ARM7TDMI® processor was successful as a hard macro, and only later was it extended to the ARM7TDMI-S™ to denote it was synthesisable. In fact right through to the ARM1176 processor there was development of hardened versions of the processors to help smaller organizations get product to market, although innovation in tools reduced some of the need. But in engineering, as in life, most things are cyclical, and with the growing challenge the industry is facing there is an urgent need to provide simple, elegant, and functional solutions.
The beauty of hard macros is that they offer a high performance implementation that is fully benchmarked, while ironing out all the challenges and minimizing the power consumption. Not only is the PPA assured but the design further integrates all of the Design for Manufacture (DFM) and Design for Test (DFT) requirements, making it ready to run out of the box.
Naturally there is a trade-off in any out-of-the-box solution, with the features being largely fixed at design time. However, given the majority of implementations exist in fairly narrow ranges, these hard macros do service a large percentage of the application needs out there. For example, beyond the decision for the number of cores and process node and cell library, many partners will always choose to include NEON and floating point, and decent cache sizes to maintain compatibility for the operating systems.
At present there are three Hard Macros available providing best-in-class performance at challenging process nodes. The first of these is the new Cortex-A5 UP Hard Macro I mentioned at the start of this blog, while the other two are based on the Cortex-A9 and are dual-core solutions based on the TSMC 40G process. One of these has been designed as a truly high-performance solution, achieving in excess of 2GHz at typical operating conditions. The second has been designed as a more power-conservative solution, delivering over 4000 DMIPS for just 500mW.
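To put the "performance per milliwatt" idea in concrete terms, here is a small, illustrative calculation using only the figures quoted above for the power-optimized Cortex-A9 macro. The function name `dmips_per_mw` is my own for this sketch, and because the quoted figure is "over 4000 DMIPS", the result should be read as a lower bound rather than a measured value.

```python
def dmips_per_mw(dmips: float, power_mw: float) -> float:
    """Return performance efficiency in DMIPS per milliwatt."""
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return dmips / power_mw


# Figures quoted in the text: "over 4000 DMIPS for just 500mW".
# 4000 is a floor on performance, so the ratio is a floor on efficiency.
efficiency = dmips_per_mw(4000, 500)
print(f"At least {efficiency:.1f} DMIPS/mW")
```

The same ratio can be applied to any other PPA figures you have to hand when comparing implementations, provided the DMIPS and power numbers were taken under the same operating conditions.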
The requirement for Hard Macros is strong as the industry tackles even tougher challenges, including delivering high-performance quad-core systems, heterogeneous computing, and evolving larger compute sub-systems with GPGPU, while at the same time small and medium organizations are extending to 28nm and beyond.
It is true that there will always be partners who need the full flexibility of RTL and POPs, but there is also a group for whom having a pre-integrated and hardened, ready-to-run solution out of the box is the best route to market.
Haydn Povey is Director of Marketing in the ARM Processor Division. He divides his time between process and implementation technologies for high performance processor cores and driving the need for enhanced security from bank payment cards through to electronic wallets on smart phones.