Here you can download a translation of the 7-Zip archive format dll interfaces. This article was also
posted on Code Project, where it won the Best C# Article award for June 2008.
About 7-Zip
7-Zip is an open-source archive program with a plug-in interface. New archive formats and/or archive codecs can be added as dlls. 7-Zip ships with several archive formats preinstalled:
- 7z - its own format, which offers good compression (LZMA, PPMd) but can be slow at packing and unpacking
- Packing / unpacking: ZIP, GZIP, BZIP2 and TAR
- Unpacking only: RAR, CAB, ISO, ARJ, LZH, CHM, Z, CPIO, RPM, DEB and NSIS
The project is written in C++.
You can find more on the official 7-Zip site: www.7-zip.org.
About this contribution
This contribution allows you to use the 7-Zip archive format dlls in programs written in .net languages.
I created this module for my own project, which needs to work with archives. Currently my project only extracts archives, so only that part of the 7-Zip interface is translated to C#. I plan to translate the compression part later. If you need that functionality right now, you can implement it yourself using this code and the 7-Zip source code.
This translation is tested and already works in my own project.
Implementation details
All communication with the archive dlls is done through COM-like interfaces (why COM-like and not COM is explained in the Known issues section). Callbacks are also implemented as interfaces.
Every dll contains a class that implements one or more interfaces. Some formats allow only extraction, while others also support compression. Public interfaces translated to C#:
- IProgress - basic progress callback
- IArchiveOpenCallback - archive open callback
- ICryptoGetTextPassword - callback that prompts for the archive password
- IArchiveExtractCallback - extract files from archive callback
- IArchiveOpenVolumeCallback - open additional archive volumes callback
- ISequentialInStream - simple read-only stream interface
- ISequentialOutStream - simple write-only stream interface
- IInStream - input stream interface with seek capability
- IOutStream - output stream interface
- IInArchive - main archive interface
Every dll exports a function for creating an archive handler class and a function for getting archive format properties. These functions are translated as .net delegates:
- CreateObject - creates an object with the given class id. Used mostly to create an IInArchive instance.
- GetHandlerProperty - gets the archive format description (implemented class ids, default archive extension, etc.)
Update (1.3): 7-Zip 4.45 changed the dll interface. All archive formats and compression codecs are now implemented in one big dll, so several new exported functions (and matching delegates in this translation) were added to handle several archive handler classes in one dll.
Extracting algorithm
- Load 7z.dll library
- Get CreateObject function (use CreateObjectDelegate)
- Call the CreateObject function with the appropriate format GUID; it returns an interface, which you cast to IInArchive
- Open an existing archive with the IInArchive.Open function (you can optionally provide an IArchiveOpenCallback; note that some formats require it and some do not)
- Examine the archive contents and build a list of the file numbers to extract (the numbers of the files inside the archive)
- Call the IInArchive.Extract function with the file numbers and provide an IArchiveExtractCallback
- For each file to extract, 7z.dll calls IArchiveExtractCallback.GetStream; provide a destination file stream for every file
- Optionally implement the other IArchiveExtractCallback functions to show progress, clean up, etc.
- Close the IInArchive and the archive stream
- Unload 7z.dll library
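The first three steps and the last one can be sketched roughly as follows. This is only an illustration under assumptions, not the code shipped in this translation: the SevenZipLibrarySketch class is hypothetical, the delegate signature is assumed from the exported CreateObject function in the 7-Zip sources (two GUID pointers and an output pointer, returning an HRESULT), and the class and interface GUIDs are left to the caller. It relies on the Windows LoadLibrary, GetProcAddress and FreeLibrary functions, so it is Windows-only.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Assumed shape of the exported function:
// HRESULT CreateObject(const GUID *clsid, const GUID *iid, void **outObject)
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate int CreateObjectDelegate(ref Guid classId, ref Guid interfaceId, out IntPtr outObject);

// Hypothetical helper class, named for illustration only.
public sealed class SevenZipLibrarySketch : IDisposable
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FreeLibrary(IntPtr module);

    private IntPtr module;
    private readonly CreateObjectDelegate createObject;

    public SevenZipLibrarySketch(string dllPath)
    {
        // Step 1: load 7z.dll.
        module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Step 2: look up CreateObject and wrap it in a delegate.
        IntPtr address = GetProcAddress(module, "CreateObject");
        if (address == IntPtr.Zero)
        {
            int error = Marshal.GetLastWin32Error();
            FreeLibrary(module);
            module = IntPtr.Zero;
            throw new Win32Exception(error);
        }
        createObject = (CreateObjectDelegate)Marshal.GetDelegateForFunctionPointer(
            address, typeof(CreateObjectDelegate));
    }

    // Step 3: returns the raw interface pointer; the caller turns it into IInArchive.
    public IntPtr CreateObject(Guid classId, Guid interfaceId)
    {
        IntPtr result;
        int hr = createObject(ref classId, ref interfaceId, out result);
        Marshal.ThrowExceptionForHR(hr);
        return result;
    }

    // Last step: unload 7z.dll.
    public void Dispose()
    {
        if (module != IntPtr.Zero)
        {
            FreeLibrary(module);
            module = IntPtr.Zero;
        }
    }
}
```

Returning the object as an IntPtr rather than as a managed interface is a choice of this sketch; it keeps the decision of when to create the .net wrapper with the caller.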
Packing algorithm
- tbd
Points of interest
7-Zip interfaces use variants (PropVariant) for property values. C# does not support such variants as classes, so all such parameters are declared in C# as IntPtr. This is done for compatibility and because I prefer not to use unsafe code in my projects.
Fortunately, the managed class System.Runtime.InteropServices.Marshal has a GetObjectForNativeVariant method that you can use to convert such "pointers" to objects. However, this method does not handle all PropVariant types (for example VT_FILETIME), so for these cases I added my own GetObjectForNativeVariant method to this translation.
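A minimal sketch of how such a replacement method might treat VT_FILETIME is shown below. It is not the method from this translation: the NativeVariantSketch class is hypothetical, and the code assumes the standard PROPVARIANT layout (a 2-byte type tag, 6 reserved bytes, then the value at offset 8) and the VARENUM value 64 for VT_FILETIME. Returning the time as UTC is a choice made for this sketch.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, named for illustration only.
public static class NativeVariantSketch
{
    private const short VT_FILETIME = 64; // VARENUM value for FILETIME
    private const int ValueOffset = 8;    // 2-byte type tag + 3 reserved 2-byte words

    public static object GetObject(IntPtr variant)
    {
        // Read the type tag without unsafe code.
        short vt = Marshal.ReadInt16(variant);
        if (vt == VT_FILETIME)
        {
            // FILETIME is a 64-bit count of 100 ns ticks since 1601-01-01 UTC.
            long fileTime = Marshal.ReadInt64(variant, ValueOffset);
            return DateTime.FromFileTimeUtc(fileTime);
        }
        // Everything else is left to the standard marshaller.
        return Marshal.GetObjectForNativeVariant(variant);
    }
}
```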
7-Zip works with files through its own interfaces, so if you want to open a file on disk or in memory, you need to provide a class that implements one or more of the necessary interfaces. Several such wrapper classes are also included in this translation (they wrap the standard .net Stream class).
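To illustrate the idea of such a wrapper, here is a simplified sketch. The interface below is a hypothetical, attribute-free stand-in for the read-only stream interface (the real declaration in this translation also carries COM attributes and may differ in its parameters), and InStreamWrapperSketch is an illustrative name, not one of the shipped classes.

```csharp
using System;
using System.IO;

// Hypothetical, simplified stand-in for the read-only stream interface.
public interface ISequentialInStreamSketch
{
    // Reads up to 'size' bytes into 'data' and returns the number of bytes read.
    uint Read(byte[] data, uint size);
}

// Adapts any readable .net Stream to the interface above.
public class InStreamWrapperSketch : ISequentialInStreamSketch, IDisposable
{
    private readonly Stream baseStream;

    public InStreamWrapperSketch(Stream baseStream)
    {
        if (baseStream == null)
            throw new ArgumentNullException("baseStream");
        this.baseStream = baseStream;
    }

    public uint Read(byte[] data, uint size)
    {
        // Never read more than the buffer can hold.
        int count = (int)Math.Min(size, (uint)data.Length);
        return (uint)baseStream.Read(data, 0, count);
    }

    public void Dispose()
    {
        baseStream.Dispose();
    }
}
```

Because the wrapper only depends on Stream, the same class serves a FileStream for a file on disk and a MemoryStream for data in memory.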
Update (1.2): Most of the complexity of PropVariant processing is now hidden in a special PropVariant structure, and interface methods now return PropVariant instead of IntPtr.
Known issues
The first and most disappointing issue is that you cannot use the 7-Zip dlls directly. This means that you cannot simply take the dlls from the 7-Zip distribution and use them in your projects. The reason is the incomplete COM interface implementation in the 7-Zip code. All issues are related to the IUnknown.QueryInterface implementation: 7-Zip's QueryInterface does not return the IUnknown interface when asked for it (this is the most critical part for working with COM interfaces in .net), and some classes do not return any interface at all!
This is because 7-Zip is C++ code that works with pointers, and most functions return direct pointers to the interface implementation, so the 7-Zip code does not use QueryInterface at all. Sadly, .net works differently: the first access to any interface always goes through QueryInterface and IUnknown.
So if we use the dlls directly, we get constant InvalidCastException errors. We therefore need to make several changes to the 7-Zip code and rebuild the dlls. Or ask Igor Pavlov to include such changes in the 7-Zip code itself :)
Important Update: Starting with 7-Zip 4.46 alpha, Igor made the necessary changes in the code. So, from this version forward, you can use the format dlls directly, without applying any patch. Superb!
The second issue is a much smaller one. It is related to multi-threading. If you plan to use the 7-Zip interfaces from only one thread, you have no problem. The problem appears when you try to use one interface from several threads. In this case every thread except the main one (the thread where the interface was created) throws an exception on any interface method call. This is because of RCW behavior. An RCW is an object that wraps a COM interface in .net. When you try to use the interface from a different thread, the RCW tries to marshal the interface and fails (because this implementation does not support ITypeInfo).
Fortunately, I've found a simple solution for this. The main interface (IInArchive) is returned as an IntPtr, not as an RCW object. When you need to access the interface, call System.Runtime.InteropServices.Marshal.GetTypedObjectForIUnknown or any related method to get an RCW object. If you need to use the interface from another thread, simply call System.Runtime.InteropServices.Marshal.FinalReleaseComObject (or ReleaseComObject) and create another RCW wrapper around the returned IntPtr. Of course, in this case you can use the interface from only one thread at a time, but this is better than being limited to a single thread, and any logic can easily be implemented with correct thread locking.
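One possible way to package this pattern is sketched below. The InterfaceGate class is hypothetical and not part of this translation; it assumes the caller holds a valid interface pointer (for example, the one returned by CreateObject) and that T is the COM-imported interface type, such as IInArchive. A lock ensures that only one thread holds an RCW at a time.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: keeps the raw interface pointer and creates a
// short-lived RCW on whichever thread needs the interface.
public sealed class InterfaceGate<T> where T : class
{
    private readonly IntPtr unknown;
    private readonly object sync = new object();

    public InterfaceGate(IntPtr unknown)
    {
        this.unknown = unknown;
    }

    public void Use(Action<T> action)
    {
        // Only one thread at a time may hold an RCW for this pointer.
        lock (sync)
        {
            T rcw = (T)Marshal.GetTypedObjectForIUnknown(unknown, typeof(T));
            try
            {
                action(rcw);
            }
            finally
            {
                // Detach the RCW so the next thread can create its own.
                Marshal.FinalReleaseComObject(rcw);
            }
        }
    }
}
```

With this shape, a call from any thread would look like gate.Use(archive => { ... }), where the lambda body is the only place the interface is touched. Creating and releasing an RCW on every call costs some time, so longer operations are better grouped into a single Use call.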
The third is a well-known issue, but I still think it must be noted here. It appears that the .net runtime does not support inheritance between COM interfaces (interfaces marked with the ComImport attribute). This is definitely a .net bug, but I don't know when Microsoft will fix it, or whether it will be fixed at all.
There is a simple way to avoid this bug. The derived interface must be declared as a standalone interface whose first methods are the methods of the interfaces it inherits from, in their order of appearance. You can see an example of such "inheritance" in the source of this translation.
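As an illustration of this workaround, the sketch below declares a seekable input stream interface without C# inheritance by repeating the base method first. The declarations are assumptions written for this example rather than copies from the translation source: the GUIDs and parameter marshalling are recalled from the 7-Zip headers and should be checked against them, and the "Sketch" suffix marks the names as hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Base interface with a single method.
[ComImport]
[Guid("23170F69-40C1-278A-0000-000300010000")] // assumed IID, verify against 7-Zip headers
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface ISequentialInStreamComSketch
{
    uint Read(
        [Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        uint size);
}

// "Derived" interface, declared standalone: no C# inheritance is used.
[ComImport]
[Guid("23170F69-40C1-278A-0000-000300030000")] // assumed IID, verify against 7-Zip headers
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IInStreamComSketch
{
    // Methods of the base interface come first, in their original order.
    uint Read(
        [Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        uint size);

    // Methods introduced by the derived interface follow.
    void Seek(long offset, uint seekOrigin, IntPtr newPosition);
}
```

The order matters because the runtime maps each method to a slot in the native virtual table by its position in the declaration, so the repeated base methods must occupy the same slots they have in the C++ interface.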
Demo
Due to many requests, I have spent some time writing a little demo program. The demo lacks proper error checking and lacks support for different archive types (the zip format is hardcoded in the source, but can easily be changed). In fact it lacks almost everything, but it has two advantages: it's simple, and it works.
The demo has only two modes: the first lists all files in an archive, and the second extracts a single file from an archive. I think this is enough to understand how to use the 7-Zip interfaces and how to create something more complex.
If you want to run the demo, don't forget to put 7z.dll (available on the official 7-Zip site) in the same folder as the executable.
Update (2.0): The demo can now create simple archives. Some issues found by CodeProject community users were also fixed in this demo.
Version history
2.0 - Packing support introduced, added support for the latest 7-zip version (4.60+), demo updated to show basic packing principles
1.5 - Small demo added
1.3 - Added two new delegates for features added in 7-Zip 4.45
1.2 - Variant type changed from IntPtr to newly created PropVariant structure
1.1 - Stream wrappers added, minor interface translation changes for better usability
1.0 - initial release
Downloads
7zIntf20.zip | mirror (Slow)
Module with 7-Zip interfaces translated to C#
7z445bPatch.zip (temporarily removed due to Google bandwidth limitations)
Patched files from 7-Zip 4.45 beta that correctly return the IUnknown interface and contain several bug fixes (all related to QueryInterface; some archive handlers do not return any interface at all!)
7-Zip 4.46 beta sources on SourceForge