ARM Community: Software Enablement - ARM Community

Design West (ESC) Day 1: Optimizing Your Software on ARM

It was a full first day at the ESC Summit of Design West. I spent most of the time doing what I love to do best: talking with ARM Partners. Most of the conversations focused on how engineers can achieve better software on their ARM processor-based SoCs.

Andy Frame had a few meetings in the morning, so he passed the baton (microphone) to me so that I could share some of the insights with the ARMFlix followers. I’m looking forward to Day 2 and speaking with more of the ARM Partners at ESC (check out our handy map). Don’t miss the ARM Connected Community Theatre at ...

Using ARM NEON to accelerate Scalable Vector Graphics in WebKit by up to 4x

Introduction
In the information era, with its increased use of mobile devices to communicate and access information, web browsers are the central component for navigating the vast amount of information available, as they fetch and visualize content spread across the worldwide data network known as the Internet. Over the last decade, the visualization capabilities of web browsers have been greatly enhanced by the increase in processing power of general-purpose CPUs and graphics accelerators. Most mobile platforms include a general-purpose SIMD engine, such as ARM NEON, which can be used to process multimedia formats efficiently and help enhance the user experience, with up to a 4x improvement as discussed in this article.

Background
The web browser group at the ...

Setting Up Android Mobile Phone to Use ARM Streamline for Profiling

This article describes how to set up your Android phone to run ARM Streamline Performance Analyzer.

ARM Streamline Performance Analyzer is a system-wide visualizer and profiler for targets running ARM Linux or Android native applications and libraries. Combining an ARM Linux kernel module, target daemon, and a graphical user interface, it transforms system trace and sampling data into reports that present the data in both visual and statistical forms.

Streamline supports Cortex™-A8, Cortex-A9, ...

Developing Top Performing Graphics Applications for Android Made Easy

The new DS-5 Community Edition brings CPU and GPU statistics together to speed up Android games and applications

Game Developers Conference (GDC), San Francisco - These are very special days for Android application developers targeting ARM processor-based devices.

On March 2nd the version of the ARM® Development Studio 5 (DS-5™) toolchain dedicated to Android native application developers, the DS-5 Community Edition (CE), was selected as a finalist for the Eclipse Community Awards in the ...

x264 on ARM: Bringing a wider application of video conferencing (Part 3)

In part one and part two of this blog series, we introduced the video conferencing use case requirements and tuned x264 for an optimal tradeoff between bit rate, frame rate and video quality… In this part, we will test and analyze encode performance for optimal execution on the target ARM platform.

1 Test results on the target ARM platform

Using the results from the previous step, we test the options “default”, “--preset ultrafast”, “--preset superfast” and “--preset veryfast” with our optimal settings on the ARM platform and evaluate the performance against our use case requirements for video conferencing.

1). List of Combinations (settings tested)
Attached Image


2). Result

Attached Image
3). Conclusion
According to the information above, we can conclude:
Attached Image


2 Summary of the optimal settings
According to the test results above, we might conclude: when rc-lookahead is set to 1, the bit rate is the minimum and the...

x264 on ARM: Bringing a wider application of video conferencing (Part 1)

Video is increasingly becoming an important and essential part of consumer electronics. Video-centric features like augmented reality and video conferencing provide enhanced visual user interaction. Such features are now expected across a wide variety of application segments. In the embedded world, intensive video compression is typically done using standard DSPs or specialized hardware accelerators, as they can provide both the specialized functionality and the high level of performance required. However, ARM processors with NEON™ technology can now be as capable of compressing video as some dedicated hardware, and do so with greater power efficiency.

H.264/MPEG-4 AVC (Advanced Video Coding) is currently one of the most commonly used formats for the recording, compression, and distribution of video content. The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast. With the use of H.264, bit rate savings of 50% or more ar...

Optimizing DirectFB with ARM NEON

DirectFB (Direct Frame Buffer) is a graphics library that is widely used in embedded systems, especially in the home market. More and more applications and libraries choose DirectFB as a backend, such as Cairo, GDK, Qt, V8, X11 and WebKit. ARM NEON technology is well suited to 2D acceleration. In this blog, I’ll describe how to optimize DirectFB using NEON.

1. Introduction
1.1 DirectFB Introduction
DirectFB (Direct Frame Buffer) is a thin library that provides hardware graphics acceleration, input device handling and abstraction, and an integrated windowing system with support for translucent windows and multiple display layers. It is free software licensed under the terms of the GNU Lesser General Public License (LGPL). Graphics features provided by DirectFB include Rectangle Filling/Drawing; Triangle Filling/Drawing; Line Drawing Blit; Alpha Blending (texture alpha, alpha modulation); Porter/Duff; Colorizing; Source Color Ke...

ARM Fundamentals: Introduction to understanding ARM processors

Finding one's way through references to ARM processors is not always obvious.
This article is the first of a series on ARM fundamentals that will introduce various topics to help you get more familiar with the ARM architecture. It aims at helping you to better understand ARM processors, starting with explaining how they are named, and then showing how knowing your processor matters by introducing a few of their recent features.

If you are curious about what is in your pretty electronic device or are a developer willing to understand how to start getting the best out of your processor, you may find some useful information here. The second part of the article may be technically a bit more challenging than the first, but don't worry! The few code samples are only concrete examples used to illustrate the explanations. The specific details are not necessary to understand the global picture.

The first step is to understand how ARM processors are referenced: it certainly sounds nice, but what is this "dual Cortex-A9, based on ARMv7" in your super-phone?


Processor fami...

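Positioning, Cooperation, Sharing: Thundersoft's Android Red Ocean Strategy

Introduction: Thundersoft, a provider of core Android technology and complete solutions, helps OEM customers bring high-quality products to market quickly by offering complete Android solutions and services. Thundersoft has extensive experience in low-level Android system technology, middleware and application development, integration, and services, and holds a unique advantage in the industry chain for mobile Internet devices such as smartphones and tablets. Thundersoft...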

10 ways to give your customers the DS-5 experience

ARM Development Studio 5 (DS-5) is the software development tool that sets the standard for ARM with its optimising compiler, its extensible and easy to use debugger, and its unique analysis tool, Streamline. But DS-5 is not just for the ARM IP: if you're the designer of an ARM based SoC, an operating system that supports ARM or have productivity tools that support ARM, you can join the DS-5 ecosystem to make sure that your customers also get the DS-5 experience. Here are 10 ways you can do it.

1. Add DS-5 debug support to your ARM based SoC. The DS-5 debugger has a target database that is extensible by you,...

Debug & performance analysis of Linaro images with ARM Development Studio-5

ARM Development Studio 5 (DS-5™) provides a user friendly interface for debugging Linux applications running on ARM platforms. Also built into DS-5 is ARM Streamline, a powerful profiling tool that allows us to measure the performance of Linux applications running on ARM Linux.

The PandaBoard is a compact mobile platform built around the Texas Instruments OMAP 4430 processor. With a dual-core Cortex-A9 processor at its heart, it is ideal for running ARM-Linux, and has good connectivity.

In this article we will go through the steps required to set up Linux on the PandaBoard using files supplied from ...

ARM technology software newbie? Try the Cortex A-Series Programmer's Guide

The ARM architecture has been used for many years in mobile phones and electronic devices, but it is only relatively recently that the architecture has diversified into being used in laptops, tablets and smartphones. There are now many companies that have adopted the ARM architecture as the basis for their next world-beating technology product. This is great, but the problem is that if you are new to the ARM architecture and want to start writing programs for an ARM processor, where do you start? What document do you need to read first before you dive into the library of technical information that is available on the ARM InfoCenter?

My choice would be the recently released Cortex A-Series Programmer's Guide. This guide provides a gentle in...

How to run LAMP and Drupal on a PandaBoard in seven simple steps

This tutorial explains how to have a LAMP server running Drupal on a PandaBoard. These instructions will apply to any other Cortex-A platform with few or no changes.

The growing variety of inexpensive, easy-to-use ARMv7-based devices, like the PandaBoard, opens the door to leveraging ARM's energy-efficient, small-form-factor performance with server software. The availability of the Ubuntu Linux distribution for ARM makes this a really simple task. The possible applications are many: domestic server, small business server, hobbyist experimentation, web development.

It can also be used to gain experience and become more prepared for the arrival of large-scale, ARM-based server systems.

One of the most well-known server software stacks is LAMP. There are many varieties of LAMP, but the most common is the combination of Linux, Apache, MySQL and PHP.

Drupal is a versatile, well-known, open source platform running on top of this stack. The software is a generic Content Management System, used as the basis for many sites, from blogs to community forums to government web pages. Notable examples are The White House, Ubuntu, FastCompany, ...

Wealth of knowledge found at ARM Techcon: Linux, Android & development tools

The 2010 ARM Technology Conference (Techcon) is taking place in Santa Clara next week. A large number of companies will be presenting their solutions to support development and optimization of products based on ARM technology, and open source will be discussed in many of these with projects like Linux, Android and development tools. For instance, many of these solutions are using open source to leverage earlier work that ARM has done with the open source community, contributing CPU and architecture support to the upstream Linux kernel and GNU compilation tools ahead of partner silicon platforms being available. One of the most recent illustrations is the contribution of Cortex-A15 CPU support to the Linux kernel as the processor was announced. Linux kernel and GNU development tools are key building blocks to support the development of solutions such as Android, ...

Going Maverick - Ubuntu 10.10 for ARM

Wow, it's that time again; our 4th release of Ubuntu on ARM is upon us. In the past we have provided a Freescale iMX51 image, a Marvell Dove image and a TI OMAP 3 image for Beagle Boards. This cycle we will be releasing images for the Marvell Dove and the Texas Instruments (TI) OMAP series of processors, both OMAP 3 and OMAP 4. Until now we have always provided a "live image" just like the x86 CDs; that is, you could test Ubuntu and then choose to install it to your storage media. Well, for the OMAP series of development boards this did not make sense, so we have introduced a pre-installed image format that we are using...

Cortex-A15 to A5: Software compatibility from Superphone to Feature phone

It was always about the code (and where it would be used!)

When I was a software developer I would often find that the project team I was in would try to guess how many devices the code would eventually run on. So at the launch of the Cortex-A15 last week one of the main points that hit home for me was just how wide the spectrum of power and performance points the Cortex-A family of processors could cover - from feature phone to superphone, tablet to DTV, home server to web server etc. This means that a developer could now find their software running across a huge range of devices in the future.

So is it the same software?

Absolutely. Cortex-A15 is based on the same ARMv7-A architecture that the other Cortex-A processors use, therefore allowing the exact same application code to run on all of them, from a ...

Condition Codes 1: Condition Flags and Codes

Every practical general-purpose computing architecture has a mechanism of conditionally executing some code. Such mechanisms are used to implement the if construct in C, for example, in addition to several other cases that are less obvious.

ARM, like many other architectures, implements conditional execution using a set of flags which store state information about a previous operation. I intend, in this post, to shed some light on the operation of these flags. Of course, the Architecture Reference Manual is the definitive source of information, so if you need to know about a specific corner-case that I do not cover here, that is where you need to look.

A Realistic Example

Consider a simple fragment of C code:

    for (i = 10; i != 0; i--) {
        do_something();
    }

A compiler might implement that structure as follows:

        mov     r4, #10
    loop_label:
        bl      do_something
        sub     r4, r4, #1
        cmp     r4, #0
        bne     loop_label

The last two instructions are of particular interest. The cmp (compare) instruction compares r4 with 0, and the bne instruction is simply a b (branch) instruction that executes if the result of the cmp instruction was "not equal". The code works because cmp sets some global f...
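To make the role of the flags more concrete, here is a minimal sketch in C that models what cmp and bne do in the loop above. It is only an illustration, not how a processor is implemented: the flags_t type and the functions model_cmp, cond_ne and model_loop are hypothetical names invented for this example, and the flag definitions follow the usual N, Z, C and V meanings for a subtraction (check the Architecture Reference Manual for the corner cases).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ARM condition flags. */
typedef struct {
    bool n; /* N: result is negative (bit 31 set) */
    bool z; /* Z: result is zero */
    bool c; /* C: the subtraction did not borrow */
    bool v; /* V: the subtraction overflowed as a signed operation */
} flags_t;

/* Model of "cmp a, b": compute a - b, discard the result, keep the flags. */
flags_t model_cmp(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    flags_t f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (a >= b);
    /* Signed overflow: operands differ in sign and the result's sign
     * differs from the first operand's sign. */
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}

/* Model of the "ne" condition tested by bne: true when Z is clear. */
bool cond_ne(flags_t f)
{
    return !f.z;
}

/* Walks through the assembly loop and counts calls to do_something. */
int model_loop(void)
{
    uint32_t r4 = 10;                 /* mov r4, #10       */
    int calls = 0;
    for (;;) {
        calls++;                      /* bl  do_something  */
        r4 = r4 - 1;                  /* sub r4, r4, #1    */
        flags_t f = model_cmp(r4, 0); /* cmp r4, #0        */
        if (!cond_ne(f)) {            /* bne loop_label    */
            break;
        }
    }
    return calls;
}
```

In this model the branch is taken for as long as the comparison leaves Z clear, so the body runs once for each value of r4 from 10 down to 1.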

Coding for NEON - Part 3: Matrix Multiplication

We have seen how to load and store data with NEON, and how to handle the leftovers resulting from vector processing. Let us move on to doing some useful data processing – multiplying matrices.

Matrices

In this post, we will look at how to efficiently multiply four-by-four matrices together, an operation frequently used in the world of 3D graphics. We will assume that the matrices are stored in memory in column-major order – this is the format used by OpenGL ES.

Algorithm

We start by examining the matrix multiply operation in detail, by expanding the calculation, and identifying sub-operations that can be implemented using NEON instructions.

Attached Image

Notice that in the diagram, we multiply each column of the first matrix (in red) by a corresponding single value in the second matrix (blue) then add together the results for each element to give a column of results. This operation is repeated for each of the four columns in the result matrix.
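Before reaching for NEON, it can help to see the same column-at-a-time breakdown in plain C. The sketch below is a scalar reference only, assuming 4x4 float matrices in column-major order; the function name matmul4x4_colmajor is made up for this example and is not taken from the original post. The inner loop over row is the part that maps naturally onto a vector-by-scalar multiply-accumulate.

```c
/* Scalar reference for a 4x4 matrix multiply, out = a * b.
 * All matrices are column-major: element (row r, column c) is at
 * index c * 4 + r. The output must not overlap either input. */
void matmul4x4_colmajor(const float *a, const float *b, float *out)
{
    for (int col = 0; col < 4; col++) {
        float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        for (int k = 0; k < 4; k++) {
            /* Single value from the second matrix: row k, column col. */
            float scalar = b[col * 4 + k];

            /* Multiply column k of the first matrix by that value and
             * accumulate into the result column. */
            for (int row = 0; row < 4; row++) {
                acc[row] += a[k * 4 + row] * scalar;
            }
        }

        for (int row = 0; row < 4; row++) {
            out[col * 4 + row] = acc[row];
        }
    }
}
```

Each pass of the outer loop produces one column of the result, which matches the "repeat for each of the four columns" step described above.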

...

Support for VP8 and WebM on ARM

It continues to be an exciting time for the development of web technologies on the ARM architecture, allowing the Internet to reach the maximum number of devices. Today sees an advancement in video for the web with the WebM project, which has been announced at Google I/O 2010 (Google’s annual developer conference). A key part of this announcement was Google’s contribution of the VP8 video codec, free of royalties.

So why is this good for ARM and our Partners? Well ultimately the delivery of the full web drives the development of great devices, and video in particular makes up an ever increasing proportion of data being consumed: in other words consumers want video, and an efficiently designed, open video codec helps.

There is already a huge amount of video being delivered on the Internet: Cisco’s Visual Networking ...

Coding for NEON - Part 2: Dealing With Leftovers

In the first post on NEON about loads and stores we looked at transferring data between the NEON processing unit and memory. In this post, we deal with an often encountered problem: input data that is not a multiple of the length of the vectors you want to process. You need to handle the leftover elements at the start or end of the array - what is the best way to do this on NEON?

Leftovers

Using NEON typically involves operating on vectors of data from four to sixteen elements in length. Frequently, you will find that your array is not a multiple of that length, and you have to process those leftover elements separately.

For example, you want to load, process and store eight elements per iteration using NEON, but your array is 21 elements long. The first two iterations go well, but for the third, there are only five elements remaining to be processed. What do you do?
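The split between full iterations and leftovers can be sketched in plain C. This is only an illustration of the bookkeeping, not NEON code: process_block stands in for one eight-element NEON iteration, and the names LANES, process_block and process_array are hypothetical. The work done here (adding one to each byte) is an arbitrary placeholder.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 }; /* elements handled per vector iteration (assumed) */

/* Stand-in for one NEON iteration: load, process and store 8 elements. */
static void process_block(uint8_t *p)
{
    for (size_t i = 0; i < LANES; i++) {
        p[i] = (uint8_t)(p[i] + 1);
    }
}

/* Processes count elements and returns how many were leftovers,
 * that is, elements handled one at a time after the last full block. */
size_t process_array(uint8_t *data, size_t count)
{
    size_t i = 0;

    /* Full blocks: runs while at least LANES elements remain. */
    for (; i + LANES <= count; i += LANES) {
        process_block(data + i);
    }

    size_t leftovers = count - i;

    /* Leftover elements, processed separately. */
    for (; i < count; i++) {
        data[i] = (uint8_t)(data[i] + 1);
    }

    return leftovers;
}
```

With a 21-element array this runs two full blocks and then handles the remaining five elements in the second loop.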

Fixing Up

There are three ways to handle these leftovers. The methods vary in requirements, performance, and code size. They are listed below in order, with the fastest approach first.

Larger Arrays

If you can change the size of the arrays that you are processing, increase the length of the array to the next multiple of the vector size using padding elements. This allows you to read and write beyond the end of your data without corrupting ad...
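As a rough sketch of the padding idea, the C helpers below round an element count up to the next multiple of the vector size and allocate a zero-filled buffer of that padded size. The names padded_length and alloc_padded are hypothetical, and whether zero is a safe padding value depends on the operation you are performing, so treat that as an assumption to check for your own algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds count up to the next multiple of lanes (lanes must be > 0). */
size_t padded_length(size_t count, size_t lanes)
{
    return (count + lanes - 1) / lanes * lanes;
}

/* Allocates a zero-filled byte buffer large enough for count elements
 * plus padding up to a multiple of lanes. The caller frees it with
 * free(). Returns NULL if the allocation fails. */
uint8_t *alloc_padded(size_t count, size_t lanes)
{
    return calloc(padded_length(count, lanes), sizeof(uint8_t));
}
```

For the 21-element example with eight elements per iteration, padded_length(21, 8) gives 24, so three full iterations cover the whole buffer.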

All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.