ARM Community: Software Enablement - ARM Community

Page Colouring on ARMv6 (and a bit on ARMv7)

Page colouring is a technique for allocating pages for an MMU such that the pages exist in the cache in a particular order. The technique is sometimes used as an optimization (and is not specific to ARM), but as a result of the cache architecture some ARMv6 processors actually require that the allocator uses some page colouring. Some ARMv7 processors also have related (though much less severe) restrictions. This article will explain why the cache architecture imposes this restriction, and what it means in practice.

Note that this restriction only very rarely needs to be considered outside of the physical memory allocator in the kernel (or other privileged code). Typical user-space code probably won't have to deal with this directly, though understanding page colouring can help to explain why some mmap calls work on ARMv7 but fail on ARMv6, for example.

The restriction stems from the fact that many ARMv6 processors use VIPT caches. VIPT means "virtually indexed, physically tagged". If you're not familiar with cache terminology, that probably won't mean a lot, but I will try to explain by way of example.
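As a rough illustration of the idea, the sketch below derives a page's "colour" from the cache geometry. In a VIPT cache the set index is taken from the virtual address, so when one cache way is larger than a page, some index bits lie above the page offset; those bits are the colour. The cache parameters used here (a 32 KB, 4-way cache with 4 KB pages) and all function names are assumptions chosen for the example, not values taken from any particular processor's documentation.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def num_colours(cache_size, ways, page_size=PAGE_SIZE):
    """Number of page colours for a VIPT cache of the given geometry.

    One way covers cache_size // ways bytes. If that is no larger than
    a page, every index bit lies inside the page offset and there is
    only one colour (no restriction).
    """
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def page_colour(addr, colours, page_size=PAGE_SIZE):
    """Colour of the page containing addr (virtual or physical)."""
    return (addr // page_size) % colours


def may_alias(vaddr_a, vaddr_b, colours):
    """True if two virtual mappings of the same physical page have
    different colours and could therefore occupy different cache sets."""
    return page_colour(vaddr_a, colours) != page_colour(vaddr_b, colours)


if __name__ == "__main__":
    colours = num_colours(32 * 1024, 4)  # 8 KB per way -> 2 colours
    print("colours:", colours)
    print("alias risk:", may_alias(0x40001000, 0x50002000, colours))
```

Under these assumptions, an allocator would only hand out a second virtual mapping of a page at an address with the same colour as the first, which is one plausible reason a fixed-address mapping request might be refused.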

In general, ARMv7 is not affected by ARMv6's page colouring restrictions. However, ARMv7 can have VIPT ...

Design West (ESC) Day 1: Optimizing Your Software on ARM

It was a full first day at the ESC Summit of Design West. I spent most of the time doing what I love to do best: talking with ARM Partners. Most of the conversations focused on how engineers can achieve better software on their ARM processor-based SoCs.

Andy Frame had a few meetings in the morning so he passed the baton (microphone) to me so I can share some of the insights with the ARMFlix followers. I’m looking forward to Day 2 and speaking with more of the ARM Partners at ESC (check out our handy map). Don’t miss the ARM Connected Community Theatre at ...

Ne10: A New Open Source Library to Accelerate your Applications with NEON

Over the past three years we have seen explosive growth in the use of the NEON™ SIMD engine by many of our software partners in the open-source community. The engine itself, defined as part of the ARM® Architecture, Version 7 (ARMv7), has shown itself to be extremely flexible and able to accelerate everything from video codecs such as VP8 to elements of the emerging HTML5 standard including <svg> and <canvas> filters. From an application developer's viewpoint, all of this acceleration takes place behind the scenes in upstream open source projects that are harvested to build the latest and greatest open source operating systems and frameworks such as Android™ and Qt. While it is good to kno...

Setting Up Android Mobile Phone to Use ARM Streamline for Profiling

This article describes how to set up your Android phone to run ARM Streamline Performance Analyzer.

ARM Streamline Performance Analyzer is a system-wide visualizer and profiler for targets running ARM Linux or Android native applications and libraries. Combining an ARM Linux kernel module, target daemon, and a graphical user interface, it transforms system trace and sampling data into reports that present the data in both visual and statistical forms.

Streamline supports Cortex™-A8, Cortex-A9, ...

Developing Top Performing Graphics Applications for Android Made Easy

The new DS-5 Community Edition brings CPU and GPU statistics together to speed up Android games and applications

Game Developers Conference (GDC), San Francisco - These are very special days for Android application developers targeting ARM processor-based devices.


On March 2nd the version of the ARM® Development Studio 5 (DS-5™) toolchain dedicated to Android native application developers, the DS-5 Community Edition (CE), was selected as finalist for the Eclipse Community Awards in the ...

Top 2011 ARM Software blogs: Android, NEON, RISC vs CISC & Assembly

2011 was a busy year for developing software on ARM and the activity is reflected in the page views of top Software Enablement blogs. The topics included managing caches, Android (multiple), NEON (multiple), Memory Access Ordering, RISC vs CISC architectures (multiple), and optimizing assembly code (listed by popularity below). In addition, the Software Enablement Community pages (Linux, Solution Center for Android, RTOS, Microsoft, etc) were some of the highest referenced pages on the ARM site. Please let us know if you have more ideas for easing your software development on the ...

x264 on ARM: Bringing a wider application of video conferencing (Part 1)

Video is increasingly becoming an important and essential part of consumer electronics. Video centric features like augmented reality and video conferencing provide enhanced visual user interaction. Such features are now expected across a wide variety of application segments. In the embedded world, intensive video compression is typically done using standard DSP’s or specialized hardware accelerators, as they can provide both the specialized functionality and the high level of performance required. However, now ARM processors with NEON™ technology can be as capable of compressing video as some dedicated hardware, and do so with greater power efficiency.

H.264/MPEG-4 AVC (Advanced Video Coding) is currently one of the most commonly used formats for the recording, compression, and distribution of video content. The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast. With the use of H.264, bit rate savings of 50% or more ar...

Optimizing DirectFB with ARM NEON

DirectFB (Direct Frame Buffer) is a graphics library that is widely used in embedded systems, especially in the home market. More and more applications and libraries, such as Cairo, GDK, Qt, V8, X11 and WebKit, choose DirectFB as a backend. ARM NEON technology is well suited to 2D acceleration. In this blog, I’ll describe how to optimize DirectFB using NEON.

1. Introduction
1.1 DirectFB Introduction
DirectFB (Direct Frame Buffer) is a thin library that provides hardware graphics acceleration, input device handling and abstraction, and an integrated windowing system with support for translucent windows and multiple display layers. It is free software licensed under the terms of the GNU Lesser General Public License (LGPL). Graphics features provided by DirectFB include Rectangle Filling/Drawing; Triangle Filling/Drawing; Line Drawing Blit; Alpha Blending (texture alpha, alpha modulation); Porter/Duff; Colorizing; Source Color Ke...

Memory access ordering part 3 - memory access ordering in the ARM Architecture

In my previous posts, I have introduced the concept of memory access ordering and discussed barriers and their implementation in the Linux kernel. I chose to do it in this order because I wanted to start by communicating the underlying concepts before I went into detail about what the ARM architecture does about memory ordering. This post goes into the juicy bits of what this actually means and how this is handled in the ARM architecture.

Two separate concepts are relevant to memory access ordering in the ARM architecture — memory types and shareability domains. These progressively made their explicit entry into the ARM architecture in versions 6 and 7, implemented by the ARM11 and Cortex family of processors respectively.

Enter the abstract

When describing many of the concepts mentioned in thi...

10 ways to give your customers the DS-5 experience

ARM Development Studio 5 (DS-5) is the software development tool that sets the standard for ARM with its optimising compiler, its extensible and easy to use debugger, and its unique analysis tool, Streamline. But DS-5 is not just for the ARM IP: if you're the designer of an ARM based SoC, an operating system that supports ARM or have productivity tools that support ARM, you can join the DS-5 ecosystem to make sure that your customers also get the DS-5 experience. Here are 10 ways you can do it.

1. Add DS-5 debug support to your ARM based SoC. The DS-5 debugger has a target database that is extensible by you,...

Debug & performance analysis of Linaro images with ARM Development Studio-5

ARM Development Studio 5 (DS-5™) provides a user friendly interface for debugging Linux applications running on ARM platforms. Also built into DS-5 is ARM Streamline, a powerful profiling tool that allows us to measure the performance of Linux applications running on ARM Linux.

The PandaBoard is a compact mobile platform built around the Texas Instruments OMAP 4430 processor. With a dual-core Cortex-A9 processor at its heart, it is ideal for running ARM-Linux, and has good connectivity.

In this article we will go through the steps required to set up Linux on the PandaBoard using files supplied from ...

Using the ARM Profiler with the Cadence Virtual System Platform

One of the most common requests from software engineers running software on a Virtual Platform is to be able to profile the executing software. In this document I will describe how to use the ARM Profiler included in RVDS Professional with the Fast Models from ARM that are commonly used to create SystemC Virtual Platforms with the Cadence Virtual System Platform (VSP). As you can see from the introduction there is a combination of products that all work together to enable non-intrusive profiling of embedded software, but at times the HOW TO details of profiling can be a bit of a mystery and knowing where to look to find the details may not be obvious. In fact, a recent VSP user tried to setup profiling by himself and was not successful and wasn’t really sure where to look. As a result I created the following information that I’m sure would be valuable to many other readers.

...

Porting Linux made easy with DS-5

Here at ARM, a colleague recently wanted to port Linux to a prototype of a new high-performance Cortex-A9 based platform. To develop and debug this port, he needed to be able to set breakpoints, view registers, view memory, single-step at source level, and so on, in fact all the normal facilities provided by a debugger, but he wanted to do these both before the MMU is enabled (with a physical memory map), and after the MMU is enabled (with a virtual memory map).

The DS-5 Debugger has a slick Debug Configuration dialog in Eclipse that makes it easy to configure a debugging session to a target. Predefined debug configuration types include “Bare Metal Debug”, “Linux Application Debug”, and “Linux Kernel and/or Device Driver Debug”. The latter is the topic of this blog. This debug configuration type is primarily designed for post-MMU debug to provide full kernel awareness, but also has some extra features that allow it to be used for pre-MMU debug too. This makes it possible to debug the Linux kernel, all the way from its entry point, t...

From Zero to Boot: Porting Android to your ARM platform

This article describes how to get Android running on your favourite ARM-based System on Chip (SoC) board. We run through the overall procedure and point out potential pitfalls and other things that you may encounter.

Since the Android software stack was primarily designed around the ARM Architecture, there are not many things that need amending to get it to work on another ARM platform.

We assume that your workstation has Ubuntu (10.10 or later) Operating System installed, and that you have already followed the instructions found at [1] to be ready to build Android sources. These instructions have been tested with Ubuntu 10.10, but they should be compatible with other GNU/Linux OSes.

Terminology

For the purposes of this document, we use the following terms.

Mainline kernel ...

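How to Bring Android into the Internet-Connected Digital Home? Part 2

In the previous post, we explored the present and future of digital TV/set-top box software architecture and shared the trends and characteristics of future digital home software platforms (http://bit.ly/jCvlNs). In this post, we will look at why Android can become one of the choices for the future digital home software platform, and at how Android, originally tailored for handheld devices, can be ported to TV/set-top box platforms.

1. The first question we need to answer is:
Why Android?
Why can Android become a strong contender for the future digital home software platform?
Let us first look at Android's own natural advantages:
Android is a complete software solution for consumer electronics devices. It includes:...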

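How to Bring Android into the Internet-Connected Digital Home? Part 1

As an excellent open source software solution, Android has spread from the mobile phone market to tablets, and even to the digital home, where digital TVs and set-top boxes are typical applications. Android was originally tailored for mobile phones; its default supported resolutions, colour depth, multimedia playback architecture, user interaction model, 2D/3D graphics performance and so on are not suited to home applications such as digital TVs and set-top boxes.
Porting Android to a digital TV or set-top box therefore requires extensive customization and modification of Android. These changes touch every layer of the Android software architecture, and I will use four blog posts to describe in turn how to port standard Android to a digital TV or set-top box platform.

Before we begin our discussion, let us briefly look at the present and future of digital TV and set-top box software
The current state of digital TV/set-top box software
1 Differences in software architecture

At present, because digital TV and set-top box software architectures use different operating systems, ...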

Linaro Second Engineering Cycle Highlights

As we come to the end of our second engineering cycle, I thought it would be interesting to highlight 4 of the initiatives happening in Linaro that I believe are having the biggest impact on how we are demonstrating Linaro delivering on its initial mandate.

Linaro Evaluation Builds (LEBs): We’ve had an almost universally positive reaction to the initiative we started this year – to deliver evaluation builds of popular OSS distributions on our Member’s hardware. Our initial targets are Android and Ubuntu. The LEBs provide an integration point for Linaro Working Group developments, delivered on a set of reference platforms for the relevant OS. LEBs were created to make it easier for companies producing distributions or vertically integrated open source stacks to adopt Linaro software, reduce time to market for our Members through streamlined integration and validation of our Landing Team efforts, and mediate the flow of innovation between Linaro and their engineering teams ...

Memory access ordering part 2 - barriers and the Linux kernel

My previous post provided an introduction to the concept of memory access ordering. It did not however provide any solution to the problem, or necessarily specify where such ordering can be significant.

Now, not all software developers need to be deeply aware of memory access ordering or barriers. Unless your code interacts directly with hardware, interacts directly with code executing on other cores or directly loads or generates instructions to be executed, things will mostly Just Work. If your interaction with hardware is completely through a device driver (meaning: no device control registers mapped directly into your application), then it is the responsibility of the driver to enforce ordering. If your communication with software running on a different core makes use of a multithreading API, for example using Pthreads or Java threads, then it is the responsibility of that API to enforce ordering. If your program executes on an operating system that implements demand paging, then clearly it is the responsibility of the operating system to enforce ordering of such operations.

However, if you are writing device drivers, implementing your own thread-communications or creating a JIT compiler, then not being aware of the proper use of barriers can lead to unexpected and difficult to diagnose problems. Where your program requires a ...

How to run LAMP and Drupal on a PandaBoard in seven simple steps

This tutorial explains how to have a LAMP server running Drupal on a PandaBoard. These instructions will apply to any other Cortex-A platform with few or no changes.

The growing variety of inexpensive and easy-to-use ARMv7-based devices, like the PandaBoard, opens the door to leveraging ARM's energy-efficient, small-form-factor performance with server software. The availability of the Ubuntu Linux distribution for ARM makes this a really simple task. The possible applications are many: domestic server, small business server, hobbyist experimentation, web development.

It can also be used to gain experience and become more prepared for the arrival of large-scale, ARM-based server systems.

One of the most well-known server software stacks is LAMP. There are many varieties of LAMP, but the most common is the combination of Linux, Apache, MySQL and PHP.

Drupal is a versatile, well-known, open source platform running on top of this stack. The software is a generic Content Management System, used as the basis for many sites, from blogs to community forums to government web pages. Notable examples are The White House, Ubuntu, FastCompany, ...

Memory access ordering - an introduction

I recently gave a presentation at the Embedded Linux Conference Europe 2010 called Software implications of high-performance memory systems. This title was my sneaky (and fairly successful) way to get people to attend a presentation really about memory access (re)ordering and barriers. I would now like to follow that up with a few posts on the topic. In this post, I will be introducing a few concepts and explain the reasons behind them. In future posts, I will follow up with some practical examples.

The Sequential Execution Model

In the Good Old Days, computer programs behaved in practice pretty much the way you might instinctively expect them to from looking at the source code:
Things happened in the way specified in the program.
Things happened in the order specified in the program.
Things happened the number of times specified in the program (no more, no less).
Things happened one at a time.


In modern computer architecture, this nostalgic fantasy is sometimes referred to as the Sequential Execution Model. In order for existing programs and programming models to remain functional, even the most extreme modern processors will attempt to preserve the illusion of Sequential Execution from within the executing program. However, underneath your feet...

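New Opportunities for Preboot Firmware on ARM Systems - UEFI, Part 2

Last time I introduced UEFI and its history. Now I will discuss its advantages, especially on ARM systems. I will also describe the organizational structure of the UEFI Forum in more detail.

Advantages
Although existing ARM preboot firmware is not bound by the constraints of BIOS, adopting the UEFI standard still offers many advantages for ARM preboot firmware. OEMs/ODMs are always trying to reduce development cost, and code sharing is one way to achieve this in the preboot firmware field.

ARM and x86 both emphasize the computing continuum. UEFI not only enables code sharing among ARM products or among x86 products, it also allows code to be shared between products with different processor architectures. Products can share peripheral devices (network, SATA, USB controllers, etc.) as well as many design feature sets.

Figure 2 shows a port from x86 to ARM in which 99.42% of the code does not need to change.
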
...

UEFI – A New Opportunity for Preboot Firmware on ARM-based Systems, Part 2

Previously I introduced UEFI and its history. Now I will get into its benefits, especially for use on ARM-based systems. I will further explain the organization of the UEFI Forum.

Advantages
Even though existing ARM preboot firmware does not have the BIOS limitations, there are many advantages for ARM preboot firmware in standardizing on UEFI. OEM/ODMs are always looking to reduce development cost. Code sharing among products is one way to achieve that.
With ARM and x86 both in the computing continuum, UEFI not only enables code sharing among ARM products or among x86 products, it also enables code-sharing across processor architectures. Products may share many of the peripheral devices (Network, SATA, USB controllers, etc.) and feature sets across the designs.
Figure 2 shows an ARM port where 99.42% of the lines of code do not need to change from an x86 port.
Figure 2: Lines Added/Change...

An introduction to ARM Development Studio 5 (DS-5)

A couple of weeks before Christmas, ARM released v5.3 of its new software development suite, DS-5. DS-5 is a new product, introduced to the market last year, but it builds on 20 years of software development tools from ARM. I have been personally involved in this development since inception, when we decided to embrace open source frameworks and build around Eclipse, and I’m very proud of what we have achieved. We’ve created a great new development tool chain with very broad applicability, helping to make it even easier to develop for ARM based platforms, and enabling collaboration with our partners and the ecosystem. In this short article, I'll describe what I mean by all this.

Firstly, at the heart of the ARM tools is comprehensive support for the ARM device itself. The tools are used here at ARM during the development and validation of the ARM architecture and ARM CPU, and are designed to make the best use of the features provided by the CPU and associated debug and trace capabilities with technology such as ...

Valgrind 3.6.0 for ARM-Linux

Version 3.6.0 of Valgrind was released a couple of weeks ago. Probably the largest change in this release is the addition of support for Linux running on ARM.

Valgrind is a GPL'd framework for building simulation based debugging and profiling tools, plus a set of "standard" tools. The best known of these is Memcheck, a memory error detector, but in fact it is only one of eight tools in the standard distribution: two memory checkers, two thread checkers, two performance profilers and two space profilers.

You can download the sources from www.valgrind.org. Alternatively, you may be able to get pre-built packages via your Linux distro, or via Linaro, although note that the 3.6.0 upstream release post-dates pre-built packages. 3.6.0 is known to work on Ubuntu 10.04 and 10.10 on ARM, and on the Nokia N900 running Maemo 5.

Also available online is full documentation. For those impatient to get going, the ...

Wealth of knowledge found at ARM Techcon: Linux, Android & development tools

The 2010 ARM Technology Conference (Techcon) is taking place in Santa Clara next week. A large number of companies will be presenting their solutions to support development and optimization of products based on ARM technology, and open source will be discussed in many of these with projects like Linux, Android and development tools. For instance, many of these solutions are using open source to leverage earlier work that ARM has done with the open source community, contributing CPU and architecture support to the upstream Linux kernel and GNU compilation tools ahead of partner silicon platforms being available. One of the most recent illustrations is the contribution of Cortex-A15 CPU support to the Linux kernel as the processor was announced. Linux kernel and GNU development tools are key building blocks to support the development of solutions such as Android, ...

Cortex-A15 to A5: Software compatibility from Superphone to Feature phone

It was always about the code (and where it would be used!)

When I was a software developer I would often find that the project team I was in would try to guess how many devices the code would eventually run on. So at the launch of the Cortex-A15 last week one of the main points that hit home for me was just how wide the spectrum of power and performance points the Cortex-A family of processors could cover - from feature phone to superphone, tablet to DTV, home server to web server etc. This means that a developer could now find their software running across a huge range of devices in the future.

So is it the same software?

Absolutely. Cortex-A15 is based on the same ARMv7A architecture that the other Cortex-A processors use, therefore allowing the exact same application code to run on all of them, from a ...

Using DS-5 with Gumstix Overo

DS-5 Application Edition can be used to debug a Linux application running on pretty much any ARM Linux target, with a network connection, not just the BeagleBoard that is used in the examples. Ronan, a colleague of mine, saw the cute Gumstix Overo COM (Computer-on-Module) and convinced me I needed to get one and give it a try with DS-5.


The tiny Gumstix Overo next to 50p to show a size comparison


First I ordered the Gumstix Overo Water, but any of the Overo models (Earth, Air, Fire) will probably work the same for my purposes here. I also ordered a Gumstix Tobi so that I can easily hook it to Ethernet and/or USB.

The Gumstix developers website has great getting started material. There seem to be at least two other useful Gumstix websites as well: www.gumstix.com, and ...

How do you make Java Fast? Answer: Go down the pub! Part 2

In January 2009 Ed set about the task of rewriting the interpreter. Java byte codes are quite a compact code for representing programs but the Virtual Machine they target has a stack architecture rather than a register architecture. This makes the Java VM somewhat at odds with modern day processor architectures such as ARM which is register based. Ed’s approach was to use a Peephole Optimizer to spot common byte code sequences that loaded items onto the Java Virtual Machine stack, manipulated them and stored them back into memory. These complete sequences could then be executed in optimized ARM assembler. Having done this process for the first version of the optimized interpreter longhand it became clear that this repetitive task could be eased with the creation of a notation to describe the sequences and how they related to the ARM assembler. (Ironically I did the same thing some 20 years ago for a different VM). A tool could then be used to automatically generate the template interpreter from the notation. The tool proved to be extremely useful and was naturally processor agnostic so we contributed it back into open source along with the optimized interpreter it generated last year.
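To make the peephole idea concrete, here is a minimal sketch of a pass that spots one load/load/operate/store sequence in stack-machine code and replaces it with a single fused operation. It is only an illustration of the general technique: the opcode names (`load`, `store`, `add`, `add_mem`) are hypothetical stand-ins rather than real Java byte codes, the fused operation stands in for a hand-optimized native sequence, and it does not represent Ed's actual tool or notation. For simplicity it also assumes no branch targets fall inside a fused sequence.

```python
def peephole(bytecode):
    """Fuse load a, load b, add, store c into one add_mem a, b, c."""
    out = []
    i = 0
    while i < len(bytecode):
        window = bytecode[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "load"
                and window[1][0] == "load"
                and window[2] == ("add",)
                and window[3][0] == "store"):
            # The whole sequence never needs the operand stack.
            out.append(("add_mem", window[0][1], window[1][1], window[3][1]))
            i += 4
        else:
            out.append(bytecode[i])
            i += 1
    return out


def run(program, memory):
    """Tiny interpreter for the hypothetical opcodes; returns memory."""
    stack = []
    for op in program:
        if op[0] == "load":
            stack.append(memory[op[1]])
        elif op[0] == "store":
            memory[op[1]] = stack.pop()
        elif op[0] == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op[0] == "add_mem":
            _, a, b, dst = op
            memory[dst] = memory[a] + memory[b]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return memory


if __name__ == "__main__":
    program = [("load", "x"), ("load", "y"), ("add",), ("store", "z")]
    print(peephole(program))
```

The optimized program produces the same memory contents as the original while executing one dispatch instead of four, which is the kind of saving a register-based target can exploit.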

Ed’s optimized interpreter for OpenJDK increased the performance by around a factor of 4X for a couple of the classic Java benchmarks- Embedded Caffeine Mark and EEMBC Grinderbench ...

Locks, SWPs and two Smoking Barriers (Part 2)

In the last article, I explained how to modify SWP code to make use of compiler intrinsics. Using intrinsics hides the underlying detail needed to use the load and store exclusive instructions (LDREX and STREX) and the use of memory barriers. In this article I look at implementing atomic memory accesses in assembler.

In order to describe memory barriers, what they are and how they should be used, I need to describe two types of memory model, strongly ordered and weakly ordered. The strongly-ordered model is very natural for programmers. In this model, the order that a program writes data to memory is the order in which the data is observed being written into memory, that is, other programs sharing the data will "see" the same ordering regardless of the CPU that they are executing on.

For example, if a CPU writes a new X then writes a new Y, all other CPUs that subsequently read Y then read X will access either the new Y and the new X, the old Y and the new X, or the old Y and the old X. However, because the writes are strongly ordered as write X first then write Y, no CPU will access the new Y and the old X.
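The X/Y example can be checked mechanically. The sketch below is a small model, not real hardware behaviour: it assumes a strongly-ordered system in which each CPU's accesses take effect in program order, enumerates every interleaving of the writer and the reader, and collects the (Y, X) pairs the reader can observe. The function and variable names are made up for the illustration.

```python
from itertools import combinations

WRITER = [("write", "X"), ("write", "Y")]  # write new X, then new Y
READER = [("read", "Y"), ("read", "X")]    # read Y, then read X


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each in order."""
    total = len(a) + len(b)
    for positions in combinations(range(total), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if slot in positions else next(it_b)
               for slot in range(total)]


def observed_outcomes():
    """Set of (value of Y, value of X) pairs the reader can see."""
    outcomes = set()
    for schedule in interleavings(WRITER, READER):
        memory = {"X": "old", "Y": "old"}
        seen = {}
        for kind, var in schedule:
            if kind == "write":
                memory[var] = "new"
            else:
                seen[var] = memory[var]
        outcomes.add((seen["Y"], seen["X"]))
    return outcomes


if __name__ == "__main__":
    for y, x in sorted(observed_outcomes()):
        print("Y is", y, "and X is", x)
```

In this model the pair (new Y, old X) never appears, matching the argument above. A weakly-ordered model would have to allow the writer's two stores to become visible in either order, which is what admits that fourth outcome.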

Modern CPUs, such as ARM, optimize memory acce...

Why is Open Source Important?

Sitting in the airport at the end of a week’s business trip to the US, I reflected back on the week. It turned out that my colleague on this trip has an even worse sense of direction than myself…Potentially disastrous, especially when you’re driving between airports, hotels and meetings in cities that you’ve never visited. This is where Google Maps becomes utterly indispensable. Installed on my Nokia E71 it makes use of the built in GPS and 3G and Edge networks to provide a running view of where we are, driving or walking. Without it we wouldn’t have found the wonderful Boulderado hotel or the Boulder Bookstore with its impressive converted ballroom. Actually, we’d probably still be driving around somewhere near Dallas.

Life changing and mind boggling as the online, always connected life of a sometime digital nomad is, w...

Locks, SWPs and two Smoking Barriers

Before ARMv6, the main synchronisation mechanism was the SWP instruction. SWP has two aspects: in a uniprocessor system it allows the read and write operations to complete without being interrupted between them, and in a multiprocessor system it ensures that multiple masters will do the locking. For multiprocessor systems with complex memory hierarchies and long memory latencies, SWP creates performance bottlenecks.

This was replaced in the ARMv6 architecture by exclusive loads and stores (LDREX and STREX). This works on the principle of a monitor existing for the location in memory. This effectively tags the memory with the identity of the agent(s) trying to access it. In a spinlock implementation, an exclusive load reads data from the memory, tagging it with its identifier. A short number of instructions later, it uses an exclusive store to write data to memory but this only works if the tag is still valid and the tag will only be valid if some other ag...
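The monitor principle can be sketched as a toy model. The class below is a deliberately simplified simulation written for this illustration, not a description of how any real local or global monitor is implemented: it tags an address with the agents that have performed an exclusive load, and lets an exclusive store succeed only if the storing agent's tag is still valid. The class, method and constant names are hypothetical; the only detail borrowed from the architecture is that an exclusive store reports 0 for success and 1 for failure.

```python
class ExclusiveMemory:
    """Toy memory with a per-address exclusive monitor."""

    def __init__(self):
        self.data = {}      # address -> value (unset addresses read as 0)
        self.monitors = {}  # address -> set of agents holding a valid tag

    def load_exclusive(self, agent, addr):
        """Read addr and tag it for this agent (models LDREX)."""
        self.monitors.setdefault(addr, set()).add(agent)
        return self.data.get(addr, 0)

    def store_exclusive(self, agent, addr, value):
        """Write only if the agent's tag is still valid (models STREX).

        Returns 0 on success and 1 on failure.
        """
        if agent not in self.monitors.get(addr, set()):
            return 1
        self.data[addr] = value
        # A successful store invalidates every agent's tag on this address.
        self.monitors[addr] = set()
        return 0


LOCK_ADDR = 0x1000  # arbitrary address chosen for the example
UNLOCKED, LOCKED = 0, 1


def try_lock(mem, agent, addr=LOCK_ADDR):
    """One spinlock attempt: exclusive load, test, exclusive store."""
    if mem.load_exclusive(agent, addr) != UNLOCKED:
        return False
    return mem.store_exclusive(agent, addr, LOCKED) == 0


if __name__ == "__main__":
    mem = ExclusiveMemory()
    mem.load_exclusive("A", LOCK_ADDR)
    mem.load_exclusive("B", LOCK_ADDR)
    print("B store:", mem.store_exclusive("B", LOCK_ADDR, LOCKED))
    print("A store:", mem.store_exclusive("A", LOCK_ADDR, LOCKED))
```

In a real spinlock the caller would retry `try_lock` in a loop until it succeeds, and would pair the lock and unlock paths with the appropriate memory barriers.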

Hello World! SW Development, Optimization and Partnership on ARM

ARM is hiring. OK, so that got some people’s attention and confused others – actually we are hiring, and in particular software developers. What can often come as a surprise to people is that, as well as having a team of people who plan and work alongside some of our different software partners, ARM has a software engineering group that works on key bits of software, particularly on Cortex-A8 and Cortex-A9 projects. In fact it's very likely that some of the code running on your mobile phone was developed by some of the ARM team.

The team cover a wide range of software projects that include:
Web and web runtime optimization (for example JavaScript JIT optimization work on projects such as Tamarin, Webkit and Squirrelfish NitroExtreme), and OpenJDK optimization work.
Operating System development – including Android, Linux kernel hacking and a ...

All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.