Simulations of Elba at this point of the program were starting to report some rather noticeable power levels for the processor, especially at the design corner we were most familiar with. The worst case design corner is a statistical point across the variations that you could potentially see from a silicon process, at a temperature that is assumed to not exist. Remember, ARM's primary market was mobile devices, and for these devices a manufacturer wanted to know that every chip delivered from the fab would achieve the defined performance. So speed would always be defined by the statistically slowest piece of silicon, and power would be defined by the statistically fastest and hottest piece of silicon. Neither of these would ever exist in reality, but defining the part this way allowed the manufacturer to maximise device yield without testing each part for its performance. You may know that around 1 billion phones were sold last year, and it would cost a lot to ‘speed-bin’ parts across that market.
There are various ways to speed-bin a SoC, but the two main ones are to split parts by their maximum clock speed, or by the power they consume while achieving a given clock speed. The general microprocessor market is very familiar with the first: for years folk have paid more for the fewer parts that go faster than the rest. So, rather than selling all parts at, say, $25 and stating that all of them will achieve 500MHz, as was typical in the ARM ecosystem, speed-binning would allow the exact same silicon to be sold at, say, $20 for the few parts that can only reach 500MHz, and maybe double that for the fast ones that would typically be able to achieve 1GHz. As vendors expert in binning parts also know, you can sell parts that would typically have been sold as fast parts as low-power parts instead, since these can reach the target speed at a lower voltage. That is a good reason to block such a device from being overclocked, I think.
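To make the first kind of binning concrete, here is a minimal sketch in Python. It assumes each part has already been tested and has a measured maximum clock speed; the `Part` class, the `speed_bin` function and the bin thresholds are all hypothetical names and values chosen for illustration, not anything taken from a real test flow.

```python
from dataclasses import dataclass


@dataclass
class Part:
    part_id: str
    fmax_mhz: float  # hypothetical: highest clock at which the part passed test


def speed_bin(parts, bins_mhz):
    """Put each part in the fastest bin whose threshold it meets.

    Parts that miss every threshold are collected under the key None.
    """
    ordered = sorted(bins_mhz, reverse=True)  # try the fastest bin first
    result = {threshold: [] for threshold in ordered}
    result[None] = []
    for part in parts:
        for threshold in ordered:
            if part.fmax_mhz >= threshold:
                result[threshold].append(part.part_id)
                break
        else:
            # No threshold was met, so the part cannot be sold in any bin.
            result[None].append(part.part_id)
    return result


if __name__ == "__main__":
    tested = [Part("a", 1050.0), Part("b", 620.0), Part("c", 480.0)]
    print(speed_bin(tested, [500, 1000]))
```

The cost mentioned above sits in producing `fmax_mhz` in the first place: every part has to be measured, which is exactly what a single worst-case specification avoids.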
Anyway, back to the power management of the processor macro. The power number we were seeing kicked off two new aspects to the program. The first was the creation of various independent power regions across the macro. The other was at the physical IP layer, the actual transistor level of the design, where we started to look at various different transistor designs that could be used in the “G” process but not cause as much leakage. The design of the actual gates was then defined in collaboration with the processor designers, so that specific logic paths through the RTL design could maximize performance while power was reduced on other, non-time-critical paths. Both these developments are now available as physical IP products: the multi-channel library and the Artisan Processor Optimization Pack (PoP).
Within the macro, there were eight independent power regions, each allowing the power to be removed from that part of the macro. These included each CPU, each NEON unit, each debug trace unit, debug itself, the MBIST controller and finally the L2 controller and processor snoop unit. With so many power domains, a lot of effort was clearly needed to ensure that the current in-rush when these blocks were brought back online didn’t surge higher than the design envelope. The complexity of the problem was further increased by a design goal of ensuring power could be restored within 100ns. This was achieved with a hierarchy of power switches throughout the design and integral logic to restore synchronization.
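The tension between the in-rush limit and the 100ns restore goal can be shown with a deliberately simplified model. The sketch below assumes that a domain's in-rush current is split evenly across a chain of switch stages that turn on one after another, each taking a fixed delay. That even-split model, the function name `plan_wakeup` and all the numbers are assumptions made for illustration; they do not describe the actual Elba switch hierarchy.

```python
import math


def plan_wakeup(total_inrush_ma, envelope_ma, stage_delay_ns, budget_ns=100.0):
    """Find the fewest equal switch stages that respect the in-rush envelope.

    Toy model: enabling the whole domain at once would draw total_inrush_ma;
    splitting it into n sequential stages draws total_inrush_ma / n per stage
    and takes n * stage_delay_ns to complete.

    Returns (stages, wake_ns), or None when the wake-up cannot fit the budget.
    """
    if total_inrush_ma <= 0 or envelope_ma <= 0 or stage_delay_ns <= 0:
        raise ValueError("currents and delays must be positive")
    # Fewest stages such that each stage stays within the envelope.
    stages = math.ceil(total_inrush_ma / envelope_ma)
    wake_ns = stages * stage_delay_ns
    if wake_ns > budget_ns:
        return None  # more stages are needed than the time budget allows
    return stages, wake_ns


if __name__ == "__main__":
    print(plan_wakeup(800, 100, 10))  # envelope loose enough to fit
    print(plan_wakeup(800, 50, 10))   # tighter envelope, budget exceeded
```

In this model, tightening the envelope pushes the stage count up until the time budget is broken. That is the point at which a flat chain of switches stops being enough and the wake-up has to be organised differently.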
Power Optimized Design
The key component to address in the power optimized design was the gate leakage, especially at higher temperatures. We already had all the typical transistor types available, including the HvT transistors that are typically used to reduce leakage, but these were not enough for the power optimized macro to have any commercial interest. So we set ourselves the goal that it must be able to clock faster than the equivalent “LP” process while consuming less power at each temperature/voltage point, a goal that needed something very different. The ‘magic bullet’ was to design cells that had exactly the same dimensions as the standard cells for the process, but with an increased channel length. In our case, this meant having 50nm cells available for the 40nm process. These cells could be used interchangeably with the native 40nm cells, and could also be used in combination with HvT and other transistor speeds too. Together, the result is that the power optimized 40G macro has an active power characteristic that is higher speed and lower power than 40LP, and actually more closely matches the more costly 32LP process, something that has proven to be commercially very interesting.
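Because the longer-channel cells have the same footprint as the native ones, swapping them in is a drop-in decision that can be made per cell. A minimal sketch of that idea follows, assuming a single fixed delay penalty for a 50nm cell and treating each cell's timing slack independently. Both are simplifications: a real flow would re-run timing as cells on shared paths are swapped. The `Cell` class, the `swap_to_long_channel` function and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    slack_ps: float       # timing slack on the worst path through this cell
    channel_nm: int = 40  # native channel length for the process


def swap_to_long_channel(cells, delay_penalty_ps):
    """Swap a 40nm cell for its 50nm twin where the slack can absorb the delay.

    Cells without enough slack stay at 40nm so critical paths keep their speed.
    """
    result = []
    for cell in cells:
        if cell.channel_nm == 40 and cell.slack_ps >= delay_penalty_ps:
            # Same footprint, so only the channel length and slack change.
            result.append(Cell(cell.name, cell.slack_ps - delay_penalty_ps, 50))
        else:
            result.append(cell)
    return result


if __name__ == "__main__":
    netlist = [Cell("u1", 5.0), Cell("u2", 40.0)]
    for cell in swap_to_long_channel(netlist, 20.0):
        print(cell)
```

The intent matches the split described earlier: time-critical paths keep the fast cells, and everything with slack to spare trades some of it for lower leakage.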
In part four of this blog I’ll outline how we brought the complete design together and the conclusions we drew.
- Part 1: “Wouldn’t it be interesting if we…” – Giving Birth to ‘Elba’
- Part 2: Elba – How do we know it works?
- Part 4: Elba - Bringing it all together
John Goodacre, Director of Technology and Systems, ARM. John joined ARM in February 2002 and took responsibility for their platform architecture. Today he is Director of Program Management focused on various programs around the application processor’s technology roadmap including the definition and market development of the ARM MPCore multicore processor technology.
Prior to working at ARM, he specialized in enterprise software, having worked for Microsoft for 5 years, firstly as Group Program Manager in the Exchange Server group and latterly as the manager of a team developing mobile phone software.
Graduating from the University of York with a BSc in Computer Science, John has over 20 years' experience of realizing new technologies in the engineering industry.