Did you ever sequence an app, give it the ol’ smoke test and think it is good to go, just have have it thrown back at you because the although the app works great, it always crashes after about 10 minutes after launch? If so, this is the blog post for you! This post will cover some old ground and provide new insights and tricks.
The cause of the error message above is an intentional decision by the App-V client development team. Well, maybe it could have been averted by a smarter sequencer, but given that it wasn’t the client is doing the what comes naturally to it: catching the problem while it is small and preventing a bigger one, followed by giving you an obscure indication of the problem. So what is going on here with this error?
Technical boring detail of the cause (some folks will want to scroll down to the tricks).
When App-V starts the first app of an application virtual environment, it must place certain hooks into that executable to do it’s job. But it can’t add those hooks until after the executable has started up enough. I’m not sure why, but possibly because the hook would be in a user mode system dll that all exes load in. So the client must wait enough time after starting off the process and then place the hook. Rather than guess how long is long enough, the client looks for system signals (events) and that triggers adding in the hook. Typically, it is waiting for an event referred to as “Input Idle”. In the old days this meant that the app was just looping and looking for user input from “the message pump”, but in modern programing this is now abstracted from the programmer.
By enabling verbose logging we can see that the client first creates the Process to launch, gets the process identifier (PID), and then, according to the log file, it launches an “injector process” (either mavinject32.exe or mavinject64.exe depending on the bitness) passing along that PID.
Later in the verbose log, we see the sftlp process started (we can also see this appear as a task in the task manager as sftlp.exe with the WaitForIdleInput argument). You can then see the client waiting in the verbose log at a line starting with “WaitForIdleApp” which lists the timeout value of “600000 ms”, or 10 minutes. Further down in the log, when this error occurs, we see the 41E error message appear about 50 ms after the 10 minute elapses.
When this works properly, it looks more like this:
To do all of this, the client needs to know what kind of executable the application’s file is. Every exe and dll file for an Intel x86/x64 compatible processor uses a format called the “Portable Executable” format, or PE format for short. In theory, the sequencer will examine the PE header of your executable and add the correct entries into the OSD file so that the client will know. These typically appear in the IMPLEMENTATION section of the OSD, right after the WORKINGDIR element. Here, you will find something like:
<SUBSYSTEM VALUE="windows" />
With the correct settings in the OSD, the client will know how to monitor for the event and how to put in the hooks. Ten minutes after starting the exe if the client hasn’t injected the hooks (because of incorrect OSD settings or other reasons), a timer goes off which is handled by killing the exe and the virtual environment and posting the error shown above. The sequencer is known to get the settings wrong on some apps for some reason! The tricks below show ways to address this error, from any cause.
So how do you know what the values in the OSD should really be for a given app?
Most of the time I just guess at the solution. Whenever this error occurs it is highly likely that the issue is solved by changing the SUBSYSTEM value to “console”. Sometimes it is a bitness thing, so we change the VM value, but usually it is just that pesky SUBSYSTEM. So I just locate the OSD in the client cache (no longer a trivial task thanks to client side GUID generation of the file names) and edit the SUBSYSTEM from windows to console and try launching it to see if that fixes it. If so I then make the change back in the package source. This covers something like 95% of the cases.
But what if you want to know for sure what the settings should be and you don’t really know what that exe is?
One solution is to grab a tool that can examine the PE format of the executable and tell you. One such tool is PEsnoop, which can be found at http://www.prestosoft.com/download/plugins/PESnoop.zip. The zip contains a stand-alone executable and a doc file. The original source of the tool by “Yoda” is unknown by me, but it looks like it was probably a free tool. If this bothers you, google for how to determine if an exe is 32bit and you can find out how to tell using a hexdump utility.
To use PESnoop, copy it to the client machine in a convient place, like C:\Pesnoop.exe. Next, open a cmd prompt inside the virtual environment using the /EXE %COMSPEC% trick on the sfttray line. Then run C:\Pesnoop.exe nameOfYourExe /PE_NH
You’ll get a display like the one in the image below. I have marked the two important fields to look at. The first indicates the bitness (VM value) and the second whether it is a windows app (WINDOWS_GUI) or console app (WINDOWS_CUI). Fix your OSD, which almost always means changing “windows” to “console”.
But what happens when a mismatch isn’t the problem?
In one case, we had a program being launched by the OSD was correctly specified, but we still ran into this error. The app where the main program exe that turned out to be just a launcher program to start Java. We proved this by running the Task Manager (adding the column showing the command line with arguments to the display) and watching processes during the launching. This launcher program would get started, quickly launch Java, and then shut itself down. Because Java takes so much in the way of resources at startup, this caused a timing issue that allowed the original exe to terminate before the client could get its hooks in. 10 minutes later, blam the client steps in and messes everything up claiming the launcher program didn’t run properly. I’m reasonably sure that nothing would really mess up with the client if it just let this “problem” go, but Microsoft doesn’t provide a way to turn this off. So went go rogue and solved the problem another way.
A possible, but rejected solution was to determine the command line arguments used to start Java (with process explorer, for example), and just skip the launcher. But we can’t be sure what exactly what that launcher program does other than start Java. That launcher might be doing something else important that we have no way of detecting that. What we really needed was a program that would start up, start the launcher exe that starts Java. We want something that will stay around long enough, but not forever as that would keep the virtual environment open. It happens that I had written one.
Many years back, before we even had TERMINATE_CHILDREN, I wrote a small launcher program to monitor child (and grand-child) processes and perform process cleanup. The program, originally called Softlaunch and now called LaunchIt, has been enhanced over the years to mitigate a variety of app issues. Version 5.0.2 of the LaunchIt tool can be found in the Tools section of the main website. The solution to our launcher app was to copy this stand-alone Launchit executable into the asset folder of the package during sequencing, and then edit the OSD. Looking at the CODEBASE line, if it included something like:
... FILENAME="Q:\Folder\Program.exe" PARAMETERS="/argument" ...
then I would change it to something like:
... FILENAME="Q:\Folder\LaunchIt32.exe" PARAMETERS="Q:\Folder\Program.exe /argument" ...
This would cause the App-V client to start LaunchIt, which in turn would start the main program (in our case the launcher app that started Java). For normally behaving apps this is all you need to do, but if you wanted the cleanup functionality you should read the documentation with the tool.
We recently saw this timeout error crop up in a different way in an application that Patrick was working on for a customer of our Contract Custom Sequencing Service.
In this case, we first tried TRICK #1. No luck.
So we went to TRICK #2, and verified that indeed the program being launched by the OSD was correctly specified as a Win32 Windows app, but we still had this error.
Moving on to see if TRICK #3 would work, we found that the main program started by the OSD was indeed a launcher program (again, starting Java). But when we traced using Task Manager we could see that the launcher app did not shut itself down. So LaunchIt wasn’t likely to help.
We tried the LaunchIt trick anyway, and found it had no effect. So let’s look at the app a little more closely for clues…
The package is a 64 bit package, sequenced on an x64 OS and being tested on an x64 OS. It has several shortcuts, mostly to 64 bit apps, but this particular shortcut is to a 32-bit app. And the other shortcuts do not exhibit this problem.
We tried a test to remove the launcher program, java, and other vendor components from the mix. We edited the OSD to have LaunchIt just open a cmd prompt. I mean, what could possibly go wrong there if nothing inside the bubble is actually executed? After 10 minutes, the Client killed LaunchIt with the 41E error.
After a few other failed test, we tried something different. We had LaunchIt32 start a 64bit system app from that same OSD. This was not killed! So we ran with the theory that because there were 64bit programs in this package that it needed something to be running 64bit. In the verbose version of the log file, you do see that the client must notice each process that is created and determine if it should be added to the virtual environment — there is a line in the log file showing the process added. Since the launcher and Java were both 32bit, this might explain the difference in results of this test.
So I cracked open Visual Studio and created a 64 bit version of LaunchIt. Popped it in the package, edited the OSD to run the 64-bit version and have it launch a cmd prompt. While there I needed to change the Win32 to Win64 also. And that test failed with the 41E error also, so the theory might not be right.
Although this trick hasn’t fixed an app yet, it might if someone who needs it reads this, so it is now included in the LaunchIt..zip download and stays in the post.
Moving on to something more useful…
We happened to notice that the Blue Bar above the icon tray remains with a “Launching” message until the timeout happens. I guess this is clue that the problem is going to happen earlier than the 10 minute wait. This speeded up our testing a bit since I could now be confident it would fail if the bar remained up for more than, say 60 seconds.
After some other investigations that didn’t help much, we hit on a new idea. What if we moved the problem program a “script”, and let it run to completion before the client gets to the App launching phase? I knew from previous issues that the hooking doesn’t happen to the scripts. If that ran, we could then replace the app being launched with something harmless that would terminate right away.
Back to editing the OSD, we moved the troubled program as a pre launch script, and add a dummy program as the main event, as follows. On the Codebase line, replace
... FILENAME="Q:\Folder\Program.exe" PARAMETERS="/argument" ...
... FILENAME="Q:\Folder\SilentTimedApp.exe" PARAMETERS="" ...
and then add a script in the DEPENDENCY area of the OSD.
<SCRIPT EVENT="LAUNCH" TIMING="PRE" WAIT="TRUE" TIMEOUT="0" PROTECT="TRUE" EXTERN="FALSE">
There are a few tricks to point out in the script.
- The EVENT and TIMING are necessary as specified as the script must run before the codebase line executable starts.
- WAIT and TIMEOUT are necessary as specified, or the script will be terminated after 300 seconds. We need this script to be allowed to run as long as necessary, and we do not want the client starting the FILENAME program of the CODEBASE line until the user exits.
- PROTECT is specified so that the script runs inside the virtual environment.
- EXTERN really isn’t needed as FALSE is the default value. If the program listed in the HREF was 64bit, we would have needed to set EXTERN to TRUE.
- The program inside the HREF is placed in quotes because it might have a space in one of the folder names.
The “SilentTimedApp.exe.exe” program added is a harmless Windows program I wrote to solve the problem (also found at TMurgent Tools). It runs for 10 seconds and then exists. The real trick here is to use the WAIT and TIMEOUT as specified. The WAIT element of the script has been officially deprecated since before 4.0, but it still works so I still use it. If you want to more about these OSD elements and parameters (and more), be sure to check out The OSD Illustrated.
So what happens when this runs? The blue launching bar remains up the entire time, but that is a small price to pay for a working virtual app! When the user finally shuts down the app, a short while later the 41E error appears, but only in the log file. Not really an acceptable fix to me, but the OSD change above was “good enough” for the customer. When I get a chance I am hoping to look at this further. Stay tuned.