App-V and .Net Performance – Confessions of a Guru

NOTE: This post contains some great information, but it has been updated in a later post.

This post is about the performance impact of combining Microsoft App-V 5 sequenced virtual applications with applications that use Microsoft .Net. I wish this story ended with good news, or at least with reasonable actions that you could take to improve performance, but it doesn’t. Instead it is mostly informational, although at the end I mention a technique that might be useful for a really poorly performing app. In most typical cases the trick is probably too much work for the performance gain you might receive, but it may help if you are stuck with users complaining about runtime performance. So here is the story (so far).

Background on .Net

Like Java, Microsoft .Net programs are built around the concept of “write once, run anywhere”, where “anywhere” means any place that has the supporting runtime. For Java, this means the appropriate Java Runtime (which can be placed just about everywhere), and for .Net it means the appropriate .Net Framework (more limited currently, but it might now be coming to other OSs). While Java may run on more CPU architectures and OS platforms, .Net tends to have far fewer version incompatibilities. Both platforms achieve processor independence by compiling into a processor-independent intermediate language. In “.Net-speak” this is called MSIL (Microsoft Intermediate Language).

The processor-independent language is compiled into binary form at run time by a “just-in-time” compiler, a step known as JITing. Both Java and .Net use JIT compilation to improve speed over interpreted languages like the older Visual Basic (the non-managed variety) or Perl. While binary code will always be faster than interpretation, the overhead of JIT compiling every time the program is run is a drag on performance. Microsoft supports three forms of performance improvement for .Net programs that the developer of the app can take advantage of:

1. Strong Named Assemblies. When a .Net dll is added to a system, it can load faster if it was built with a StrongName. A StrongName is basically a weak public digital signature that doesn’t verify authenticity but does verify that the file hasn’t been tampered with. When dlls built with StrongNames are stored in the Global Assembly Cache, signature verification occurs only once (when the dll is placed into the cache). When the dll is loaded into memory by a program, the OS can perform a quick hash check that the file hasn’t been modified since installation into the cache. For large dlls where you might not need all of the functionality this will improve performance by a small amount. To gain this performance, the StrongNamed dll must be added to the .Net Global Assembly Cache (one of the following folders under C:\Windows\assembly: GAC, GAC_32, GAC_64, GAC_MSIL). The dll placed in the Global Assembly Cache is not compiled, so gains are limited. Some .Net dlls are built without a StrongName, and these will never be added to the Global Assembly Cache. Adding a dll to the cache is typically a post-installation step (sometimes performed by the installer).
2. Native Compilation (post-install). When a .Net dll or exe is installed on a system, NGEN can be used to create a “Native Image” copy via compilation. This compiled copy is specific to a CPU architecture. Back when I attended the “Hailstorm” event in Redmond (when .Net was originally announced more than a dozen years ago), compilation was very processor specific. These days, my sources inside Microsoft inform me that we don’t need to worry about AMD versus Intel, or Xeon versus Core i*. Compilation is really to the bit-ness of the CPU architecture (32 versus 64 bit) and major design. Usually, these native compilations occur as part of the application installer as a post-installation event. Often, the installer designer forgets about this, and the program is just JITed every time it runs. Post-install compilation is performed using the ngen.exe program that is part of the framework. The compilation may be performed synchronously, or placed into a queue for asynchronous compilation by the NGEN service whenever the system is idle for a period of time. When compiled, an executable named foo.exe or bar.dll will have a compiled copy placed in the windows\assembly subfolder for the major .Net version (2 or 4, currently). These compiled copies are named foo.ni.exe or bar.ni.dll. They will almost always be larger than the original file, but usually perform much faster as JIT compilation is no longer needed at runtime. The system loader automatically detects if there is a Native Image available whenever the original component is loaded into memory.
3. Native Image Compilation (by the developer). This is a new option available to developers in Visual Studio 2014 preview for applications being developed for the Windows Store. Because the compilation is performed at build time, the optimizations available are far greater. Among other things, this compilation pulls the needed portions of the framework dlls directly into the native image (sort of like embedding mfc dlls into a Win32 executable), making these Native Images independent of the .Net Framework itself. Of course, on the flip side, these Native Images are processor specific. The result is a real exe, so special handling by the loader is not required.
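The naming rule for post-install native images described above (foo.exe becomes foo.ni.exe, bar.dll becomes bar.ni.dll) can be sketched in a few lines of Python. This is only an illustration of the naming convention; the function name and the example path are hypothetical, and the sketch does not locate the image inside the native image cache.

```python
from pathlib import PureWindowsPath


def native_image_name(assembly_path: str) -> str:
    """Return the file name an NGEN native image of this assembly would carry.

    Only the naming convention is modelled here (foo.exe -> foo.ni.exe);
    the folder inside the native image cache is not computed.
    """
    path = PureWindowsPath(assembly_path)
    if path.suffix.lower() not in (".exe", ".dll"):
        raise ValueError(f"not an exe or dll: {assembly_path}")
    # Insert ".ni" between the base name and the original extension.
    return f"{path.stem}.ni{path.suffix}"


if __name__ == "__main__":
    # Hypothetical install location, used only for illustration.
    print(native_image_name(r"C:\Program Files\Foo\foo.exe"))
```

A helper like this is handy when searching a captured package for native images that correspond to its executables.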

.Net compilation can improve performance significantly. There are no hard numbers on this, as it really depends on the program. Some simple tests I have run using post-installation compilation into Native Image on small programs saw a 15 to 20% improvement in launch time. So the potential performance gains of .Net Native Image compilations are significant enough to be interested in.

Which brings us to the .Net and App-V discussion.

When we install and capture programs in the sequencer, post-installation activities of types 1 and 2 are possible and may get captured. If captured, we are essentially redistributing these components to the clients. I used to think that this would break things, but I never saw that happen. With the latest information from Microsoft, it seems that this is not the case and that redistribution in an App-V sense should be OK.

To confirm when a native image is in use, I use Process Explorer to examine what is actually loaded into the process memory. Using the lower pane display to show loaded modules, you can see that both AppV_Manage.exe and AppV_Manage.ni.exe are loaded in the process shown in the image below:

Process Explorer showing a process using a Native Image

Note: The above capture is from AppV_Manage version 3.8, made when I realized that my own installer wasn’t creating the native image either. You will now see that the compression library has a native image as well.
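The same check can be done on a list of module names copied out of the Process Explorer lower pane. The following is a minimal sketch, assuming you already have the module names as plain strings; the function name is hypothetical and nothing here queries a live process.

```python
def modules_with_native_images(loaded_modules):
    """Return the modules whose matching .ni. native image is also loaded.

    `loaded_modules` is a list of file names, for example as copied from
    the Process Explorer lower pane. Matching is case-insensitive.
    """
    names = {module.lower() for module in loaded_modules}
    matched = []
    for module in loaded_modules:
        stem, _, ext = module.rpartition(".")
        if ext.lower() not in ("exe", "dll"):
            continue
        if stem.lower().endswith(".ni"):
            continue  # this entry is itself a native image
        if f"{stem}.ni.{ext}".lower() in names:
            matched.append(module)
    return matched


if __name__ == "__main__":
    loaded = ["AppV_Manage.exe", "AppV_Manage.ni.exe", "kernel32.dll"]
    print(modules_with_native_images(loaded))
```

Any .Net module that appears in the list without a matching .ni. entry is being JITed at runtime.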

In tests not using App-V, I have proven the ability to redistribute an “AnyCPU” executable that was natively compiled on an x86 system. Outside of App-V, I redistributed the native image to both an x86 system and an x64 system. The x86 target system used the Native Image (as seen in Process Explorer) and saw the expected performance gains. The x64 system did not use the Native Image compiled for 32-bit, because the “AnyCPU” executable ran as 64-bit on that system, so the performance was the same as if it had not been compiled. But it wasn’t any worse! So the bottom line is that capturing the native image and redistributing it might make things better and doesn’t seem to make things worse.
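The outcome of that test comes down to a bit-ness match between the native image and the running process. Here is a simplified sketch of that reasoning; the function names are hypothetical, and it assumes a plain “AnyCPU” build that follows the OS bit-ness (it does not model a “prefer 32-bit” build setting).

```python
def process_bitness(target: str, os_bits: int) -> int:
    """Bit-ness a .Net exe runs at, given its build target and the OS.

    Simplifying assumption: an AnyCPU exe always follows the OS bit-ness.
    """
    if os_bits not in (32, 64):
        raise ValueError("os_bits must be 32 or 64")
    if target == "AnyCPU":
        return os_bits
    if target == "x86":
        return 32
    if target == "x64":
        if os_bits != 64:
            raise ValueError("an x64 exe cannot run on a 32-bit OS")
        return 64
    raise ValueError(f"unknown build target: {target}")


def native_image_used(image_bits: int, target: str, os_bits: int) -> bool:
    """True if a native image of `image_bits` matches the running process."""
    return image_bits == process_bitness(target, os_bits)


if __name__ == "__main__":
    # Native image compiled on an x86 system, redistributed to both targets.
    print("x86 target:", native_image_used(32, "AnyCPU", 32))
    print("x64 target:", native_image_used(32, "AnyCPU", 64))
```

When the bit-ness does not match, the loader simply falls back to JIT compilation, which is why the x64 system was no worse off.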

But it turns out that we don’t often capture native images in our App-V packages. I recently added support to the AppV_Manage Analyzer (version 3.8) to detect the number of modules in the Global Assembly Caches and the Native Image caches. An example display is shown below:

AppV_Manage Analyzer showing a package containing a Native Image

Why don’t we typically capture Native Images in our App-V packages?

There are at least a couple of reasons:

The installer didn’t request the compilation. Depending on the installer, this may require the addition of a custom action.
The installer requested asynchronous compilation. This makes the installer seem faster, but the queued compilation only runs once the system has been idle for a while. Pretty much, unless you are interrupted or go get some coffee or a smoke after the installation completes, the compilation won’t happen before you end monitoring.

So I was thinking that I’d like to force the compilation. Either case can be handled by using ngen.exe which comes with the framework. You might have two copies, one for v2 and one for v4. When in doubt, use the v4 copy (it handles .Net 2 apps also). To solve the second issue, you only need to run the command:

C:\Windows\Microsoft.Net\Framework\v4.0.30319\ngen.exe ExecuteQueuedItems

On a 64-bit system, you should also run the command in the Framework64 folder.
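As a small illustration of the two-folder rule, here is a sketch that builds the command lines to run for a given OS bit-ness. The function name and its defaults are assumptions for illustration; the folder layout and version string are the ones used in the command above, and the sketch only builds strings without running anything.

```python
from pathlib import PureWindowsPath


def queued_items_commands(os_bits, windows_dir=r"C:\Windows", version="v4.0.30319"):
    """Build the ngen ExecuteQueuedItems command lines for this OS bit-ness.

    A 32-bit OS only has the Framework folder; a 64-bit OS also has
    Framework64, and each copy of ngen.exe drains its own queue.
    """
    if os_bits not in (32, 64):
        raise ValueError("os_bits must be 32 or 64")
    folders = ["Framework"]
    if os_bits == 64:
        folders.append("Framework64")
    commands = []
    for folder in folders:
        ngen = PureWindowsPath(windows_dir, "Microsoft.Net", folder, version, "ngen.exe")
        commands.append(f"{ngen} ExecuteQueuedItems")
    return commands


if __name__ == "__main__":
    for command in queued_items_commands(64):
        print(command)
```

Run the printed commands from an elevated prompt while the sequencer is still monitoring.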

To solve the first issue, you identify the major exe programs and run:

C:\Windows\Microsoft.Net\Framework\v4.0.30319\ngen.exe Install "C:\...pathto...exe"

If you compile the exe, it automatically compiles the dependent dlls, so typically you do not need to compile the dll files directly. Do these things while in monitoring mode, and the global and native image caches are automatically captured.
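If a package has several major executables, the install step can be scripted. The following is a minimal sketch, assuming Python is available on the sequencing machine; the helper names and the example exe paths are hypothetical, and by default it only prints the commands rather than running them.

```python
import subprocess

# Path taken from the command shown above; adjust for Framework64 if needed.
NGEN = r"C:\Windows\Microsoft.Net\Framework\v4.0.30319\ngen.exe"


def ngen_install_argv(exe_paths, ngen=NGEN):
    """Build one 'ngen.exe Install <exe>' argument list per executable."""
    return [[ngen, "Install", path] for path in exe_paths]


def run_all(argvs, dry_run=True):
    """Print each command, or run it when dry_run is False (Windows only)."""
    for argv in argvs:
        if dry_run:
            # list2cmdline adds quotes around arguments that contain spaces.
            print(subprocess.list2cmdline(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical executables, used only for illustration.
    exes = [r"C:\Program Files\Foo\foo.exe", r"C:\Apps\Bar\bar.exe"]
    run_all(ngen_install_argv(exes))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, and check=True stops the script at the first failed compilation.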

But does it do any good to do this in your packages?

Unfortunately, NO.

Currently (as of App-V 5.0 SP2 with Hotfix 5 on a Windows 7/8.1 system with .Net 4 through 4.5.2), the loader cannot detect the native images that are inside the App-V package. Nothing is broken; it is just that the VFS caches for .Net are not used. Some simple testing indicates that it is not a timing issue. There is an undocumented index file which might play a role in this, but I think it is a deeper issue.

Potential Workaround?

In theory, if you had a really badly performing virtual application, you could use a Publish-Package script to force an ngen compilation against the major exe(s). You would have to create the package using the forced load for the streaming configuration, and then run this script outside of the virtual environment using the ProgramData reference to the exe. If you had a really horrible app, it might be worth a try. Keep in mind that you might not be able to have multiple versions of the same app with improved performance with different Native Images if done this way – it depends on the app.

I am reporting this to Microsoft for consideration. So the story could have a happy ending one day after all. Maybe.

Addendum

I should mention that while I only tested Microsoft App-V 5.0 SP2 (with Hotfix 5) in preparation for this article, the finding probably applies to any technology used to virtualize or layer applications. Ultimately, I believe any filter driver, or user-mode dll injection, that adds application components to the OS will fail to deliver the native components in a way that the OS will use as intended. This includes other application virtualization tools (like Symantec or VMware ThinApp), and application layering solutions (like Unidesk and VMware AppVolumes, but maybe not FSLogix, which should work as is). At least with App-V I know that we can work around it, but I’m guessing that due to their architectures, some of the others cannot.