GeoExpress is a software package that provides image compression technology for converting raw imagery into the compressed MrSID format. Not all MrSID files are created equal, however, so it is important to understand the differences between them. MrSID Generation 3 (MG3) files are a legacy format still used by many GIS applications. MrSID Generation 4 (MG4) is the newest file format available in GeoExpress. MG4 files offer many benefits when working with imagery that contains unnecessary extra pixels. These pixels are found around the edges of any imagery that is not perfectly rectangular. Most users do not want to see the extra pixels, so GeoExpress offers the option to define these pixels as “nodata” so that other applications will handle them appropriately.
Consider the following two pixels:
Imagine that these pixels are on the western edge of an image. The blue pixel is the ocean and the black pixel is not a part of the imagery – it is an extra pixel around the edge of the image that we don’t care about. We want that pixel to disappear so that we can use this image in a public product. You could try to crop it out exactly without losing the blue pixel, but what if you have an irregular border of blue and black pixels? The fastest way to make the border disappear is to turn off black pixels using nodata.
The term nodata sounds self-explanatory, but the way it is implemented in GIS applications is less clear. Nodata can be defined either by the metadata of the file itself or by using symbology settings within the viewing application. Either way the effect is the same: anything defined as nodata is handled differently than the rest of the image. Most often nodata is labeled as “transparent” or “background” and simply tells the viewing application not to display any pixels that fall under the nodata definition.
Using MrSID Generation 3:
The method that MG3 files use to set nodata is a blunt instrument. The nodata value set in GeoExpress does not change anything in the image itself. It only tells the software how to encode nodata in the metadata. It is a single setting, but it does two things in practice. First, it says that the background is the value of the nodata setting: “anything with a value of (0,0,0) is background”. Second, it says that any pixel with that same background value is nodata and should be transparent: “anything with a value of (0,0,0) is nodata”. Viewing applications use this metadata to determine how pixels should be displayed. So if you set the background value to black then the previous two pixels will look like this against a fuchsia background:
This is great! The viewer recognizes that the nodata value is background and makes it transparent. The edge pixel is now gone and we only see the underlying fuchsia color of our viewer. However, we run into problems when there are other pixels in the image with the same value. Because the MG3 nodata definition applies to the entire image, any other black pixels in the image will also be transparent. So you will have pixels missing from parts of your image where there was valid data before compression. This often happens in shadows or dark trees if nodata is black, and on rooftops or snowfields if nodata is white. There is another knowledgebase article that goes into this in much greater detail.
You must also set a nodata value if you do any cropping, reprojection, or mosaicking that will create new pixels without a value. Imagine that our blue pixel above was originally part of a larger rectangular image that did not contain any nodata areas:
We care about every pixel in this image so we don’t want to set nodata to any value, because there is not any nodata in the image. However we also want to reproject the image when we compress it to the MrSID format:
By reprojecting the image we created triangles of nodata pixels in the NW and SE corners of the image. Even though we started with a square image without any nodata, GeoExpress needs to know what value to give to these newly created pixels. If you don’t choose anything on the Transparency tab then GeoExpress will use the default of 0. The MG3 format simply assigns the nodata value to these new pixels so that they will become transparent in viewers. The unfortunate side effect is that this setting also makes any other pixel with that same value transparent. This means that the black buildings on our islands (which match our nodata color) will vanish and allow the background color to shine through:
In order to resolve this problem you should use the MG4 format explained in the next section.
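The MG3 side effect described above can be sketched in a few lines of Python. This is only a toy model of a viewer that keys transparency on a single metadata value; it is not GeoExpress code, and the tiny 3x3 image, the color constants, and the `mg3_visible_mask` helper are all made up for illustration.

```python
# Toy model of MG3-style nodata: one metadata value applies to the whole image.
# All names and pixel values here are illustrative assumptions.
NODATA = (0, 0, 0)      # nodata/background value stored in the metadata
OCEAN = (0, 0, 255)
BUILDING = (0, 0, 0)    # a legitimately black pixel in the source imagery

# A 3x3 "reprojected" image: the NW and SE corners are newly created pixels
# that were filled with the nodata value.
image = [
    [NODATA, OCEAN,    OCEAN],
    [OCEAN,  BUILDING, OCEAN],
    [OCEAN,  OCEAN,    NODATA],
]

def mg3_visible_mask(image, nodata):
    """Return True wherever a value-keyed viewer would draw the pixel."""
    return [[pixel != nodata for pixel in row] for row in image]

mask = mg3_visible_mask(image, NODATA)
for row in mask:
    print("".join("#" if visible else "." for visible in row))
# .##
# #.#
# ##.
```

The center pixel is valid data, but because it has the same value as the nodata setting it drops out along with the two corner pixels.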
To set transparency using an MG3 file, go to the Transparency tab of Advanced Job Options and check the “Set transparency (no-data) and background values” checkbox. By default the nodata value is set to zero for the red, green, and blue bands. This is appropriate for our image since the pixel value of black is (0,0,0) for RGB. However, some imagery uses white (255,255,255) or another value for nodata, in which case the setting must be changed to match. If you need to change the default then enter the value in the “Edit Value” box and click “Apply to all” to set the new values.
Using MrSID Generation 4:
MrSID Generation 4 handles nodata very differently. Instead of encoding a single nodata value in the metadata it actually creates a completely new band called the alpha band. Almost all GIS applications recognize the alpha band as a determinant for whether a pixel should be transparent or not. In an MG4 image every pixel value is valid so nothing is defined as transparent within the original bands. The new alpha band then gives every pixel a value of either 0 to make it transparent or 255 to render the actual value. There is no such thing as “sort of” transparent: the pixel must be either visible or invisible. Here again is our raw image:
When we convert this raw image to MG4 format GeoExpress will create the alpha band, but every alpha value will be 255, meaning that no pixel will be transparent. Now let’s say we did a reprojection of this image like before. Since we care about every pixel in this image we still don’t want to set nodata to any value – but this time we don’t have to! Leave the transparency settings as opaque:
The MG4 format will assign the new pixels a 0 value for all bands, including the alpha band. So the newly created pixels are actually true black just like the buildings on our islands. However, unlike the MG3 file, the transparency is handled by the alpha band rather than by a metadata setting so only our newly created pixels are transparent. We get to keep our buildings intact.
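Here is a minimal sketch of that idea in Python. It is a toy model rather than the actual MG4 encoder: the reprojection is faked by dropping a small source image onto a larger canvas, and the `place_on_canvas` helper and pixel values are assumptions made up for the example.

```python
# Toy model of MG4-style transparency: a separate alpha band decides visibility.
# Names, sizes, and pixel values are illustrative assumptions.
OCEAN = (0, 0, 255)
BLACK = (0, 0, 0)              # the black buildings on our islands
OPAQUE, TRANSPARENT = 255, 0

source = [
    [OCEAN, OCEAN],
    [BLACK, OCEAN],            # a valid black pixel in the raw image
]

def place_on_canvas(source, width, height, x0, y0):
    """Stand-in for reprojection: put source on a larger canvas.

    Newly created pixels get 0 in every band, including alpha;
    pixels copied from the source get an alpha value of 255.
    """
    rgb = [[BLACK] * width for _ in range(height)]
    alpha = [[TRANSPARENT] * width for _ in range(height)]
    for y, row in enumerate(source):
        for x, pixel in enumerate(row):
            rgb[y0 + y][x0 + x] = pixel
            alpha[y0 + y][x0 + x] = OPAQUE
    return rgb, alpha

rgb, alpha = place_on_canvas(source, width=4, height=3, x0=1, y0=0)

# The building and a new filler pixel have identical color values...
print(rgb[1][1] == rgb[2][0])      # True
# ...but only the filler pixel is transparent.
print(alpha[1][1], alpha[2][0])    # 255 0
```

Because visibility is read from the alpha band instead of being inferred from the color value, two pixels with the same (0,0,0) color can be treated differently.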
You can still run into trouble with MG4 files if you have legitimate unwanted pixels in your original raw file. In this case you should not use the opaque selection because the extra pixels will display as their true value instead of transparent. Unfortunately this means that you will run into the same problem that occurs with the MG3 format in this scenario.
To set transparency using an MG4 file that has no unnecessary pixels – do nothing! Leave the transparency settings at opaque and GeoExpress will sort out the pixel values on its own. If you have a file that already contains areas of nodata you can manually set nodata to this value on the Alpha tab of Advanced Job Options. Select either the “Transparency Value (metadata)” button if the metadata of your file already contains the transparency information, or select the “Specify Value” button if you know the value of the nodata pixels. By default the nodata value is set to zero for the red, green, and blue bands. This is appropriate for our image since the pixel value of black is (0,0,0) for RGB. However, some imagery uses white (255,255,255) or another value for nodata, in which case the setting must be changed to match. If you need to change the default then enter the value in the “Edit Value” box and click “Apply to all” to set the new values.
The above information applies only to lossless compression. If we compress the image at any other ratio (20:1, 50:1, etc.) we are doing lossy compression, meaning that each pixel’s color value in the output image will be close to, but not exactly equal to, the corresponding pixel’s color value in the input image. In other words, any given pixel can “lose” its true value in compression. The compression process will change some pixels by just a few units, such as changing a (0,0,0) pixel to a (0,2,1) pixel. Fortunately, a change this small is imperceptible to the human eye. Let’s say these two pixels have the values of (0,0,0) and (2,4,8):
These pixels are part of a huge file so we have to use an equally large compression ratio to save hard drive space. The higher the compression ratio, the more the data can get skewed, so our two pixels eventually end up at (0,2,1) and (0,0,0) after compression:
They look very similar and the slight color difference would be easily overlooked in the context of the main image. Unfortunately the software that recognizes nodata pixels is not a casual observer. It will only recognize the true value that we defined in the metadata in our MG3 file. Now the pixel on the right will become transparent and the pixel on the left will display its new value, even though in the raw data it was the opposite! This leads to very messy looking edges on our image:
The solution to this problem lies in despeckling, as detailed in the Despeckling Overview knowledgebase article.
Luckily, with MG4 files you will not encounter this problem. The alpha band is always compressed losslessly no matter how high the compression ratio is set. So, even though the pixel on the left changed to not-quite-black in the output, GeoExpress recognized it as nodata before compression, so it will still be transparent. The same is true for the pixel on the right: it wasn’t nodata before, and the alpha band still knows that, meaning it will not be transparent even though the color values changed in the output.
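The difference between the two formats under lossy compression can be illustrated with the two example pixels from above. This is only a sketch: the "compressed" values are simply the ones quoted in the example, not the output of a real encoder, and the variable names are made up for illustration.

```python
# Toy comparison of MG3 and MG4 nodata handling after lossy compression.
# The pixel values come from the two-pixel example; nothing is really compressed.
NODATA = (0, 0, 0)

raw = [(0, 0, 0), (2, 4, 8)]          # left: nodata, right: valid dark pixel
compressed = [(0, 2, 1), (0, 0, 0)]   # the same two pixels after lossy compression

# MG3-style: the viewer compares the decoded colors to the metadata value.
mg3_transparent = [pixel == NODATA for pixel in compressed]

# MG4-style: the alpha band is derived from the raw values before compression
# and is stored losslessly, so it is unaffected by the color drift.
alpha = [0 if pixel == NODATA else 255 for pixel in raw]
mg4_transparent = [a == 0 for a in alpha]

print(mg3_transparent)   # [False, True]  -> the wrong pixel disappears
print(mg4_transparent)   # [True, False]  -> matches the raw data
```

With MG3 the decision is made from the colors after they have drifted, so the result is reversed; with MG4 the decision was already recorded in the alpha band before any drift occurred.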