ARM Community: Software Enablement - ARM Community

Christmas, Linux and Linaro

’Tis the season when most people start to look back on the year and evaluate how well things have gone and, in some cases, not gone. I just attended a meeting held by Linaro, and I started to look back over what has gone on over the past three years since this organization was formed.

Linaro is a not-for-profit company dedicated to developing a great ARM kernel for Linux. The initial concept for the company came out of a discussion with a mobile OEM who said each of its system-on-chip (SoC) vendors had a separate Linux kernel for ARM. This was clearly very inefficient, not only for SoC companies but also for OEMs, and the key software partners needed to add more resources and time to fix the problem, which was not good for anyone. So ARM decided to get all the key players together (Freescale, IBM, ...

Top 2011 ARM Software blogs: Android, NEON, RISC vs CISC & Assembly

2011 was a busy year for developing software on ARM and the activity is reflected in the page views of top Software Enablement blogs. The topics included managing caches, Android (multiple), NEON (multiple), Memory Access Ordering, RISC vs CISC architectures (multiple), and optimizing assembly code (listed by popularity below). In addition, the Software Enablement Community pages (Linux, Solution Center for Android, RTOS, Microsoft, etc) were some of the highest referenced pages on the ARM site. Please let us know if you have more ideas for easing your software development on the ...

Solving the Challenge of Software Complexity for Today’s Embedded Developer

Just launched today: the Embedded Software Store! As mentioned in my previous post, here is our solution for addressing the challenges of embedded software development.

Management typically has four approaches to addressing the challenges of software complexity.

  • Increase work hours
  • Outsource work
  • Add headcount
  • Raise efficiency
We will examine each of these approaches considering the trade-offs with cost, time to market and scalable engineering resources.

The first three solutions are the most obvious approaches. But there are limitations to these approaches and one major setback: cost implications.

Increase work hours: This is easily the most popular management approach. This can be effective if the duration is short, but looking at the magnitu...

Advances in technology create new problems for today’s embedded developers

Moore’s Law, familiar to embedded developers, states that “the number of transistors that can be placed inexpensively on a semiconductor integrated circuit (IC) doubles approximately every two years”. Likewise, advances in process technology have yielded increased processor bandwidth and higher memory densities as process geometries shrink.

These trends have created significant challenges for embedded developers, as they now have more capability to work with and hence can create more complex products. This increased design intricacy has created an environment where software is poised to grow significantly, as witnessed in the automotive and smart metering markets, where the processing bandwidth has leaped from 3-5 DMIPS to over 150 DMIPS and the memory requirements for software have increased by up to 40x over the past 20 years.

Software engineers are facing many challenges these days including:

1) Complexity: As consumer demand for simplified interfaces and high-performance, low-power electronic and home entertainment devices increases, the complexity of the software expands exponentially. In addition, the integration of a wide array of hard...

Valgrind 3.6.0 for ARM-Linux

Version 3.6.0 of Valgrind was released a couple of weeks ago. Probably the largest change in this release is the addition of support for Linux running on ARM.

Valgrind is a GPL'd framework for building simulation-based debugging and profiling tools, plus a set of "standard" tools. The best known of these is Memcheck, a memory error detector, but in fact it is only one of eight tools in the standard distribution: two memory checkers, two thread checkers, two performance profilers and two space profilers.
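
As a small illustration of what a memory error detector such as Memcheck looks for, here is a sketch of a C function that leaks on purpose. The function name and the test string are invented for this example and are not from the Valgrind documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns the length of a heap copy of text,
 * but deliberately never frees that copy. */
static size_t leaky_length(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);      /* heap block for the copy */
    assert(copy != NULL);
    memcpy(copy, text, len + 1);       /* include the terminating NUL */
    return strlen(copy);               /* copy is lost here: a leak */
}
```

If a program calling leaky_length("valgrind") is built with debug information (for example with -g -O0) and run as "valgrind --leak-check=full ./program", Memcheck should list the lost block in its leak summary along with the allocating call stack.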

You can download the sources from www.valgrind.org. Alternatively, you may be able to get pre-built packages via your Linux distro, or via Linaro, although note that the 3.6.0 upstream release post-dates pre-built packages. 3.6.0 is known to work on Ubuntu 10.04 and 10.10 on ARM, and on the Nokia N900 running Maemo 5.

Also available online is full documentation. For those impatient to get going, the ...

Oracle’s Java SE Embedded for ARM Multicore at Techcon

Just in time for the ARM® Technology Conference (Techcon) - last week Oracle released Java SE Embedded 6u21 with support for ARM. Java SE 6u21e syncs with the latest release of Java SE 6u21 for desktop and servers allowing developers to deploy on their ARM embedded device the same full Java SE version as found on their PC.

One of the key, new features of this release is multi-core support for ARMv7. The multi-core functionality of Java SE such as background JIT compilation and parallel garbage collection is now available for the growing use of ARM multi-core systems in embedded.

Java SE 6u21e release offers the following:
  • latest features and fixes of standard SE 6u21
  • multi-core support for ARMv7
  • up to 20% performance improvements on ARM
  • headless support for ARMv5 soft-float and ARMv6/v7 hard-float
  • headful support for ARMv7
  • optimizations for embedded including small footprint, memory savings, power conservation
Stop by ...

Going Maverick - Ubuntu 10.10 for ARM

Wow, it's that time again; our 4th release of Ubuntu on ARM is upon us. In the past we have provided a Freescale iMX51 image, a Marvell Dove image and a TI OMAP 3 image for Beagle Boards. This cycle we will be releasing images for the Marvell Dove and the Texas Instruments (TI) OMAP series of processors, both OMAP 3 and OMAP 4. Until now we have always provided a "live image" just like the x86 CDs; that is, you could test Ubuntu and then choose to install it to your storage media. Well, for the OMAP series of development boards this did not make sense, so we have introduced a pre-installed image format that we are using...

Condition Codes 3: Conditional Execution in Thumb-2

Thumb-2 can make use of the same conditional execution features that the ARM instruction set provides. For conditionally executing one or two instructions, this mechanism can provide code-size and performance benefits over the (more conventional) conditional branching mechanism.

I noted at the end of the last post in this series that this mechanism is not directly available to Thumb. Instead, Thumb-2 has an instruction (it) which can provide the same functionality as ARM conditional execution. In this article, I will describe the it instruction, and I will also explain a few caveats of condition-setting instructions in Thumb-2. Note that the it instruction is only available to Thumb-2, and so most of this article will not be relevant to the old Thumb instruction set.

...

Using DS-5 with Gumstix Overo

DS-5 Application Edition can be used to debug a Linux application running on pretty much any ARM Linux target with a network connection, not just the BeagleBoard that is used in the examples. Ronan, a colleague of mine, saw the cute Gumstix Overo COM (Computer-on-Module) and convinced me I needed to get one and give it a try with DS-5.

The tiny Gumstix Overo next to 50p to show a size comparison


First I ordered the Gumstix Overo Water, but any of the Overo models (Earth, Air, Fire) will probably work the same for my purposes here. I also ordered a Gumstix Tobi so that I can easily hook it to Ethernet and/or USB.

The Gumstix developers website has great getting started material. There seem to be at least two other useful Gumstix websites as well: www.gumstix.com, and ...

Coding for NEON - Part 4: Shifting Left and Right

This article introduces the shifting operations provided by NEON, and shows how they can be used to convert image data between commonly used color depths. Previous articles in this series: Part 1: Loads and Stores, Part 2: Dealing with Leftovers and Part 3: Matrix Multiplication.

Shifting Vectors

A shift on NEON is very similar to shifts you may have used in scalar ARM code. The shift moves the bits in each element of a vector left or right. Bits that fall off the left or right of each element are discarded; they are not shifted to adjacent elements.

The amount to shift can be specified with a literal encoded in the instruction, or with an additional shift vector. When using a shift vector, the shift applied to each element of the input vector depends on the value of the corresponding element in the shift vector. The elements in the shift vector are treated as signed values, so left, right and zero shifts are possible, on a per-element basis.
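
To make the per-element behaviour concrete, here is a scalar C model of a vector shift driven by a signed shift vector. It is only a sketch: the function names, the choice of eight 16-bit lanes, and the RGB565 helper are assumptions made for illustration, and the model restricts shift amounts to -15..15 rather than reproducing every corner case of the real instructions.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumed: eight 16-bit elements per vector */

/* Shift one unsigned 16-bit lane by a signed amount:
 * positive shifts left, negative shifts right, zero leaves it unchanged.
 * Bits pushed past either end of the lane are simply discarded. */
static uint16_t shift_lane_u16(uint16_t value, int8_t shift)
{
    assert(shift > -16 && shift < 16);   /* simplification of this model */
    if (shift >= 0)
        return (uint16_t)((uint32_t)value << shift);
    return (uint16_t)(value >> -shift);
}

/* Apply a per-lane shift: lane i of the input is shifted by lane i of
 * the shift vector. Nothing carries over into neighbouring lanes. */
static void shift_vector_u16(uint16_t out[LANES], const uint16_t in[LANES],
                             const int8_t shifts[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = shift_lane_u16(in[i], shifts[i]);
}

/* Hypothetical colour-depth use: pull the 5-bit red field out of an
 * RGB565 pixel and widen it to 8 bits with a right then a left shift. */
static uint8_t rgb565_red_to_8bit(uint16_t pixel)
{
    return (uint8_t)(((pixel >> 11) & 0x1F) << 3);
}
```

Shifting the lane value 0x8001 left by one gives 0x0002 in this model, because the top bit falls off the end of the lane instead of moving into the next element.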

A right shift operating on a vector of signed elements, indicated by the type attached to the instruct...

Detecting Overflow from MUL

Detecting Overflow from Arithmetic Operations

I discussed in a previous blog post that it is possible to set some condition flags based on the result of an arithmetic operation. Consider the following code:

    adds    r0, r0, r1
    bvs     some_address

The above code adds r1 to r0, then branches somewhere if a (signed) overflow was detected. This technique is used frequently in JIT-compilers for dynamic languages. In such contexts, the type and size of a variable is often not known when the code is compiled, so the JIT-compiler will test for overflow, and then fall back to a slower implementation in the case where a signed 32-bit integer cannot represent the result of the required operation. This is the approach taken by Mozilla's TraceMonkey JavaScript engine, for example.

Setting the Flags with mul

Those familiar with ARM's mul instruction may realize that although it can take the s suffix to upda...

Coding for NEON - Part 3: Matrix Multiplication

We have seen how to load and store data with NEON, and how to handle the leftovers resulting from vector processing. Let us move on to doing some useful data processing – multiplying matrices.

Matrices

In this post, we will look at how to efficiently multiply four-by-four matrices together, an operation frequently used in the world of 3D graphics. We will assume that the matrices are stored in memory in column-major order – this is the format used by OpenGL-ES.

Algorithm

We start by examining the matrix multiply operation in detail, by expanding the calculation, and identifying sub-operations that can be implemented using NEON instructions.

Attached Image

Notice that in the diagram, we multiply each column of the first matrix (in red) by a corresponding single value in the second matrix (blue) then add together the results for each element to give a column of results. This operation is repeated for each of the four columns in the result matrix.
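
The column-by-scalar formulation described above can be written out as a plain C reference, which is handy for checking a NEON version against. This is a scalar sketch only; the function name and the no-aliasing requirement are choices made for this example.

```c
#include <assert.h>

/* result = a * b for 4x4 matrices stored in column-major order, so the
 * element at row r, column c lives at index c * 4 + r. */
static void mat4_mul_colmajor(float result[16], const float a[16],
                              const float b[16])
{
    assert(result != a && result != b);   /* output must not alias inputs */

    for (int col = 0; col < 4; col++) {
        /* Start this result column at zero. */
        for (int row = 0; row < 4; row++)
            result[col * 4 + row] = 0.0f;

        /* Multiply each column k of a by the single value b[k][col]
         * and accumulate into the result column. */
        for (int k = 0; k < 4; k++) {
            float scalar = b[col * 4 + k];
            for (int row = 0; row < 4; row++)
                result[col * 4 + row] += a[k * 4 + row] * scalar;
        }
    }
}
```

The innermost loop over row is the part that maps naturally onto a four-element vector operation: one column of the first matrix scaled by one value and added to an accumulator.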

...

10 Android NDK Tips

With new devices and new capabilities being exposed by the Android NDK (Native Development Kit) it is now possible to really get the best out of these ARM based devices. Here are a few quick tips to help that along.

1 - Stay on Target

The newest devices are generally ARMv7, meaning that it can pay to use v7 builds and features. The latest version of the NDK adds support for ARMv7 and NEON code, allowing key loops and media operations to be optimized far beyond what would otherwise be possible. The NDK provides a small static library that will allow you to identify what options you have at runtime. For examples of how to use these features, look at the hello-neon example project in the samples directory of the NDK.

The older devices are v6, but the NDK does not specifically support it, leaving you with the choice of building safely for v5TE or taking the risk that there may be v5TE devices out there. If you need every iota of speed, and know what hardware you are targeting, then it may be worth building for v6. The newest devices, supporting Android 2.0 and up, seem generally to be ARMv7 based, although yo...

Computex: Windows Embedded Compact 7 Highlights Investment in ARM

Yesterday at Computex, the Microsoft Windows Embedded team announced the availability of the latest version of Windows Embedded CE – officially known as Windows Embedded Compact 7. The release is a Community Technology Preview (CTP), which is a fancy way to say public beta. The CTP can be downloaded from the Microsoft website.

Windows Embedded Compact 7 includes a list of cool features to help OEMs develop smart, connected, service oriented devices with custom user-interfaces. But, if you take a closer look at the code you’ll notice an engineering investment and significant improvement – Compact 7 now includes support for more ARM architectures including ARMv7, ARMv7 NEON™ and SMP.

The added ARM architectures provide OEMs working with Windows Embedded competitive performance in the segments proliferated by ARM and our ARM Partners – ...

Android Phones, tablets, TV’s… oh my!

I’ve written before about the proliferation of Android as a consumer device platform beyond its humble origins as a handset OS, but I’m continually amazed at the pace of this innovation from consumer electronics companies crafting new and savvy products from Android. I am at Computex this week and there are numerous products on display that fall into this category.

I won’t catalog the litany of devices here; I’m sure you’ll get enough of that via the ARMFlix YouTube channel or your favorite consumer device blog. Instead, I want to talk about why I think Android is able to adapt at such a breakneck pace. While a case can be made for any number of reasons, fundamentally, I believe there are two overwhelming factors. They are: (1) the architecture and versatility of the Android software stack and (2) the size of the IP and services ecosystem that has rapidly ...

Coding for NEON - Part 2: Dealing With Leftovers

In the first post on NEON about loads and stores we looked at transferring data between the NEON processing unit and memory. In this post, we deal with an often encountered problem: input data that is not a multiple of the length of the vectors you want to process. You need to handle the leftover elements at the start or end of the array - what is the best way to do this on NEON?

Leftovers

Using NEON typically involves operating on vectors of data from four to sixteen elements in length. Frequently, you will find that your array is not a multiple of that length, and you have to process those leftover elements separately.

For example, you want to load, process and store eight elements per iteration using NEON, but your array is 21 elements long. The first two iterations go well, but for the third, there are only five elements remaining to be processed. What do you do?

Fixing Up

There are three ways to handle these leftovers. The methods vary in requirements, performance, and code size. They are listed below in order, with the fastest approach first.

Larger Arrays

If you can change the size of the arrays that you are processing, increase the length of the array to the next multiple of the vector size using padding elements. This allows you to read and write beyond the end of your data without corrupting ad...
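
The padding approach can be sketched in C as follows. Everything here is illustrative: the helper names are invented, the eight-element vector length is taken from the 21-element example above, and process_block is a scalar stand-in for whatever NEON kernel you would actually run.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VECTOR_LEN 8  /* elements handled per iteration (assumed) */

/* Round an element count up to the next multiple of VECTOR_LEN. */
static size_t round_up_to_vector(size_t count)
{
    return (count + VECTOR_LEN - 1) / VECTOR_LEN * VECTOR_LEN;
}

/* Copy count bytes into a newly allocated, zero-padded buffer whose
 * length is a whole number of vectors. The caller frees the result. */
static uint8_t *copy_with_padding(const uint8_t *src, size_t count,
                                  size_t *padded_count)
{
    size_t padded = round_up_to_vector(count);
    uint8_t *buf = calloc(padded > 0 ? padded : VECTOR_LEN, 1);
    assert(buf != NULL);
    memcpy(buf, src, count);
    *padded_count = padded;
    return buf;
}

/* Stand-in for a vector kernel: adds one to eight bytes per call. */
static void process_block(uint8_t *block)
{
    for (int i = 0; i < VECTOR_LEN; i++)
        block[i] = (uint8_t)(block[i] + 1);
}

/* Process the whole padded buffer in full vectors, with no tail case. */
static void process_all(uint8_t *data, size_t padded_count)
{
    assert(padded_count % VECTOR_LEN == 0);
    for (size_t i = 0; i < padded_count; i += VECTOR_LEN)
        process_block(data + i);
}
```

For the 21-element array in the example, round_up_to_vector gives 24, so three full iterations run and the last three elements are padding that can be ignored afterwards.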

How do you make Java fast? Answer: Go down the pub!

It all started back in 2008, I’d been looking at what the Software Bill-of-Materials would be for an ARM-based Netbook. I’m a great fan of JEOS (Just-Enough-OS) to support the end users software needs but even taking a JEOS approach the list of software that we had to enable was quite daunting. Back then, the Cloud as a platform for desktop apps like word processing hadn’t quite taken shape. I had converted my family over to Google Docs but I wasn’t sure if the rest of the world would be quite as ready to make that move when ARM-based devices became available. Open Office was quite a popular office suite in the Western world, however in Asia a small company called Haansoft (now Hancom, Inc.) were making headway with an office suite called ThinkFree Office that was small, lightweight and could run across multiple device form factors. The one minor problem was that ThinkFree Office was writt...

Locks, SWPs and two Smoking Barriers

Before ARMv6, the main synchronisation mechanism was the SWP instruction. SWP has two aspects: in a uniprocessor system, it ensures that the read and write operations are not interrupted between them; in a multiprocessor system, it ensures that multiple masters will do the locking. For multiprocessor systems with complex memory hierarchies and long memory latencies, SWP creates performance bottlenecks.

This was replaced in the ARMv6 architecture by exclusive loads and stores (LDREX and STREX). These work on the principle of a monitor existing for the location in memory, which effectively tags the memory with the identity of the agent(s) trying to access it. In a spinlock implementation, an exclusive load reads data from the memory, tagging it with its identifier. A few instructions later, an exclusive store writes data to memory, but this only works if the tag is still valid and the tag will only be valid if some other ag...

Caches and Self-Modifying Code

Ideally, caches act as some magic make-it-go-faster logic, sitting between your processor core (or cores) and your memory bank. Whilst it can be beneficial to consider specific cache features when writing some performance-critical code, it is usually advisable to keep only general cache behaviour in mind. However, there are cases where the cache behaviour must be considered in order to get the result that you want, and self-modifying code is an excellent example.

Cached ARM architectures have a separate cache for data and instruction accesses; these are called the D-cache and the I-cache, respectively. For this reason, the ARM architecture is often considered to be a Modified Harvard Architecture, though I must admit that with most real processors existing somewhere between Harvard and von Neumann architectures, I do not find that label particularly useful. There are a few benefits of this design, but the one I have seen discussed the most often is that with two interfaces to the CPU, the core can load an instruction and some data at the same time.

Whilst employing this Harvard-style memory interface is useful for performance, it does have its own drawbacks. The typical drawback of a pure Harvard architecture is that instruction memory is not directly accessible from the same address space as data memory, though this restriction does not apply to ...

Hello World! SW Development, Optimization and Partnership on ARM

ARM is hiring. OK, so that got some people’s attention and confused others – actually we are hiring, and in particular software developers. What can often come as a surprise to people is that, as well as having a team of people that plan and work alongside some of our different software partners, ARM has a software engineering group that works on key bits of software, particularly on Cortex-A8 and Cortex-A9 projects. In fact it's very likely that some of the code running on your mobile phone was developed by some of the ARM team.

The team cover a wide range of software projects that include:
  • Web and web runtime optimization (for example, JavaScript JIT optimization work on projects such as Tamarin, Webkit and Squirrelfish NitroExtreme), and OpenJDK optimization work.
  • Operating System development – including Android, Linux kernel hacking and a ...

All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.