TreeList An AVL-Tree or Doubly Linked List |
||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
||||||||||||||
Download: |
|
|||||||||||||
G.M.Adelson-Velskii and E.M.Landis (AVL) Binary Tree |
The following code is a simplified and yet optimised AVL Tree which can morph to and from a Doubly-Linked List very quickly and efficiently.
As usual the code is written for readability and maintainability - A marginally faster version was written, but it was too ugly to survive.
The normal AVL rebalancing makes lots of status checks which are unnecessary if you just do the rotations at the right time in the code.
The order is determined by your implementation of the Compare function (examples follow).
There must be a function T::Compare(T&) or the function Compare of the Template class CTreeListNode<T> has to be overridden.
In normal use the structure is an AVL Tree (A Binary Search Tree which is optimised (balanced) whenever nodes are Inserted or Deleted).
Once insertion and searching is finished the structure can be turned into a linked list for fast iterations.
The TreeList insertion is faster than linked list insertion for large numbers of nodes (unless the Nodes are inserted in reverse order).
Turning the Tree into a linked list only need happen once all the nodes are inserted.
Then iterating the list is faster than iterating the Tree.
To put some numbers to these ideas:To insert 32768 nodes in descending order into this structure in List mode took 2 units of time To insert 32768 nodes in ascending order into this structure in List mode took 3800 units of time To insert 32768 nodes in ascending order into this structure in Tree mode took 5 units of time To insert 32768 nodes in descending order into this structure in Tree mode took 6 units of timeThere are many ways of implementing AVL Trees and a lot of thought went into choosing the methods used here.
If you think you know a faster or better way then please use the Feedback button at the top of every page, but I've probably tried it already!
Examples being:
- Having each node store a pointer to its parent (makes Insert 1/3 faster but makes the code bulky and less readable (and isn't necessary)).
- Using separate code for double rotations (negligible speed increase and creates duplication of code).
- Cut/Link Restructure instead of rotations (big speed decrease (half as long again)).
- Moving methods from one class to another (They're all in the right places:)
You can use this Structure just in its Tree mode, or its List mode.
You'd use the Tree to search and sort data quickly or to identify or remove duplicates from data.
It is similar to std::map but (particularly for STL beginners) easier to use (and std::map uses "Red Black" Tree instead of AVL Tree (see https://www.sgi.com/tech/stl/stl_tree.h)).
Any tree is slow to iterate in sorted order in comparison to a Linked List.
An AVL tree is not perfectly balanced.
The height of a perfectly balanced tree is approximately lg n, where n is the number of nodes
whereas the worst case height of an AVL tree is approximately 1.44 lg n.
In practice, however, the AVL height approaches lg n as n gets large and so is almost optimal.
So where n is large and performance counts, AVL trees are preferable to random Binary Trees.
This is a template-based class, requiring the nodes to have a Compare function that returns an int value:
<0, if *this < t 0, if *this == t >0, if *this > tFor example:#include <iostream.h> #include "TreeList.h" class CInteger { public: int i; CInteger(int i) : i(i) {} int Compare(const CInteger& j) const {return i-j.i;} }; main() { CTreeList<CInteger> TreeList; TreeList.Insert(new CInteger(1)); TreeList.Insert(new CInteger(2)); TreeList.Insert(new CInteger(2)); // Returns 0 because 2 is already inserted (Duplicates are not stored). TreeList.Insert(new CInteger(3)); for(CTreeListIterator<CInteger> it(TreeList); it; ++it) cout << it.GetData()->i << endl; }Here's an example using the Iterator a little more:#include "TreeList.h" #include <fstream.h> #include <string> class CMyString { public: CMyString(const std::string& S) {String=S;} int Compare(const CMyString& S) const {return String.compare(S.String);} std::string String; }; void SortFile(const char* iPath, const char* oPath) { ifstream iStream(iPath); ofstream oStream(oPath); char Buffer[255]; for(CTreeList<CMyString> Tree; iStream.getline(Buffer, sizeof(Buffer)-1); Tree.Insert(new CMyString(Buffer))); Tree.Morph2List(); // (This is only worth doing if you're going to iterate more than once) for(CTreeListIterator<CMyString> it(Tree); it; ++it) oStream << it->String.c_str() << endl; // Note that it-> returns it->GetData() it's just easier to write! it.Last(); while(it) oStream << it--.GetData()->String.c_str() << endl; }Here's the same thing using MFC:#include "TreeList.h" void SortFile(const char* IPath, const char* OPath) { CFile IFile,OFile; if(!IFile.Open(IPath,CFile::modeRead|CFile::shareDenyNone)) return false; CArchive IAr(&IFile, CArchive::load); if(!OFile.Open(OPath,CFile::modeCreate|CFile::modeWrite|CFile::shareDenyNone)) return false; CArchive OAr(&OFile, CArchive::store); CTreeList<CString> TreeList; for(CString S; IAr.ReadString(S); TreeList.Insert(new CString(S))); TreeList.Morph2List(); // (This is only worth doing if you're going to iterate more than once) for(CTreeListIterator<CString> it(TreeList); it; ++it) OAr.WriteString(*it+"\r\n"); // Note that *it returns *(it->GetData()) it's just easier to write! it.Last(); while(it) OAr.WriteString(*(it--.GetData())+"\r\n"); }If you're new to this sort of thing there are a couple of things you'll need to know:
When using MFC files #include <string> in .cpp files has to be at the top of the file above the static char THIS_FILE[] = __FILE__; section.
Declarations using the Standard Template Library sometimes need a space before the trailing > so this will work:
std::map<std::string, int, std::less<std::string> > UserCode;
and this will give a weird unrelated error (syntax error : missing ',' before identifier 'UserCode') because it sees the >> as the shift-right operator:
std::map<std::string, int, std::less<std::string>> UserCode;
There must be a function T::Compare(T&) or the function Compare of the Template class CTreeListNode<T> has to be overridden.
You must have the word const twice in the Compare function definition!
You can see that the definition must be identical from the Non-MFC example above: std:string has a suitable compare function but the name Comapare is all lower case...
So a whole new class that just provides a 'Compare' interface to the 'compare' function had to be made...
Of course, if you always use the std::string you'd be better off just changing the case of the Compare function in TreeList.h
Here's an example with nodes that contain a CString Key and int data that is used to maintain a count of files with a specific file extension in a given folder.
The CString in each node is the file extension (eg. "cpp") and the int is the count of files with that extension.
This class is going to be used as the node:
class CExtension : public CString { int Count; public: CExtension(CString S) : Count(0) {*((CString*)this)=S;} int& operator++() {return ++Count;} };Ext is a CString currently holding the file extension.
The class hasCTreeList<CExtension> TreeList;in the header and each time a file is found the following code inserts a new node or increments the count for the files extension:Ext.MakeLower(); TreeList.Insert(new CExtension(Ext)); // The new CExtension is deleted if a node with this extension already exists. CTreeListIterator<CExtension> it(Ext); it.Find(CExtension(S)); int CountOfFilesWithThisExtension=++*it;Finally a note on the "Behaviour Class"
If you want something to happen to every Node's Data then derive a class from CTreeListNodeBehaviour and use ForAll:
Here's an example class that goes through a Tree of CStrings and puts '<' and '>' characters on either side of the string (so "hello" becomes "<hello>"):class CTreeListTagger: public CTreeListNodeBehaviour<CString> { private: void Behaviour(CString& S) {S='<'+S+'>';} };Don't try Deleting or Inserting within Behaviour, the restructuring that they induce will mess up the recursion and Behaviour may be missed on some nodes and done twice on others as a result.
Here's a version that counts the nodes in the Tree...
It is only 25% slower than the GetNodeCount function:class CTreeListNodeCounter: public CTreeListNodeBehaviour<CString> { private: int Count; void Behaviour(CString&) {++Count;} public: CTreeListNodeCounter() : Count(0) {} int GetCount() const {return Count;} void Reset() {Count=0;} };It's slower because the ForAll function has to pass the Behaviour as a parameter.
Arsen Gogeshvili has a Java Applet that helps visualise the processes involved in AVL Trees here
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.