Making PIXIE
I just released PIXIE (PIcture eXploration and Inference Engine), a novel tool for visual creatives to intuitively browse and search their saved images for inspiration and reference. Below is an overview of my process, thoughts, and experience with the project.
Here are the various links for the project:
Table of Contents
Intro
Pixie started as a short project that used a vector database to search for images with CLIP. As I was building it, I wanted to add even more search types, and I had a cool idea for the UI, which led to the final version. This post covers the tools I used and the steps I took to get there.
Using PyQt
The entire application is written in Python, with the UI built in PyQt. I chose PyQt5 over PyQt6 because this was my first time using Qt. The Qt docs are C++-only, which isn’t a huge deal, but there are far more tutorials for 5 than for 6, so PyQt5 was a lot easier to get started with. I’d say about a third of the UI code was LLM generated, mostly because LLMs were really good at producing examples and getting things up and running. This was mostly copy-pasting code into GPT, Gemini, or Claude, though I also tried GitHub Copilot, which integrates into VSCode. I’m trying to limit my AI use so I actually learn, so I leaned on LLMs only for trivial stuff I would otherwise have spent time digging through the docs for. I was really tempted to try out the free Gemini CLI, though.
Similarity Search
As the intro said, at first I was just trying to make a simple image search. Vector databases are probably the most standard way to do this, so I started with ChromaDB (this changed later) because it was popular and had a built-in CLIP embedding function. Once that was done, I had the idea of searching for images by color.
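For a rough idea of what this kind of CLIP search looks like, here’s a minimal sketch (not my exact code; I was using ChromaDB’s built-in CLIP embedding function at this point, and the model name and file names below are just examples):

```python
# Rough sketch of CLIP text-to-image search (illustrative, not the exact code).
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def embed_text(query):
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

# Cosine similarity = dot product of normalized vectors.
image_vecs = embed_images(["cat.jpg", "sunset.jpg"])   # hypothetical files
query_vec = embed_text("a warm sunset over the ocean")
scores = image_vecs @ query_vec.T
print(scores.ravel())  # highest score = best match
```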
Searching by Color
This was a pretty unique problem, so I started by researching how others do it. The best method I found was a Shutterstock demo, where they convert an image into a color histogram. It’s a really good solution, which you can see in action in that demo. I could’ve used that approach, but I wanted to do something unique for this project, so I came up with my own method. Also, I don’t think they had an option to search by the colors from another image (I could still try a histogram for that, but nah).

Coming from using vector embeddings for text and images with CLIP, I stayed consistent and tried a vector approach. This has two parts: embedding an image into a color vector, and then computing the distance/similarity between those vectors. The simplest way would be to take the average color of all the pixels and use Euclidean distance on the RGB values, but that would probably give me a gray or brown mush of all the pixels. So instead I first reduce the number of colors in the image. After that I could just take the most frequent color and use RGB Euclidean distance. That works on a basic level, but it doesn’t capture the other colors in the image. To do that, we can take the k most frequent/dominant colors in the reduced palette. If a candidate color isn’t actually that dominant (say, less than 5% as frequent as the most dominant color), it shouldn’t be counted as dominant. I also check that each candidate is significantly different from the dominant colors already picked. While we allow up to k dominant colors, some images will have fewer, so the resulting color vectors won’t always have the same dimensions. The end result is a vector containing the RGB values of the k dominant colors in the image, which is great because it also doubles as a way to get a color palette for the image.
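Here’s a simplified sketch of that dominant-color extraction (using PIL’s palette quantization as one way to reduce the colors; the palette size, thresholds, and distance cutoff are illustrative, not my exact values):

```python
# Simplified sketch of dominant-color extraction (parameter values are illustrative).
from PIL import Image

def dominant_colors(path, palette_size=16, k=5, min_share=0.05, min_dist=40):
    img = Image.open(path).convert("RGB")
    img.thumbnail((256, 256))  # shrink for speed
    # Reduce the image to a small palette, then count how often each color appears.
    reduced = img.quantize(colors=palette_size).convert("RGB")
    counts = sorted(reduced.getcolors(maxcolors=palette_size * 2), reverse=True)

    top_count = counts[0][0]
    picked = []   # list of (frequency, (r, g, b))
    for count, color in counts:
        if len(picked) == k:
            break
        # Skip colors that are too rare relative to the most dominant one...
        if count < min_share * top_count:
            continue
        # ...or too similar to a color that was already picked.
        if any(sum((a - b) ** 2 for a, b in zip(color, c)) ** 0.5 < min_dist
               for _, c in picked):
            continue
        picked.append((count, color))

    total = sum(f for f, _ in picked)
    # Return (weight, RGB) pairs; weights are each color's share of the reduced image.
    return [(f / total, c) for f, c in picked]
```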
Now we have a different problem: with multiple colors per image, we can’t just take the Euclidean distance between every value in the vector like we could with two single RGB vectors. To solve this, I created a custom distance function that computes and sums the distance from each color in one image to every color in the other. I then weight these distances by how dominant the colors are, so a large difference/distance between two dominant colors results in a large total image-level color distance. At first I tried rank weighting, but I ended up using the actual frequencies of each color in the reduced image, which is much more intuitive, especially since these were already picked as the dominant colors.

For the individual color distances, I don’t use plain RGB Euclidean distance. I convert each RGB value into the CIE-LAB color space and take the Euclidean distance there, since it’s supposed to be closer to actual perceptual differences in color, though honestly I don’t see much difference in the results. What did make a significant difference was adding a penalty for hue differences, weighted by saturation and value. My logic is that the more saturated a color is, the more its hue matters, and hue matters most at medium lightness/value. At first I weighted the penalty with a peak at 0.5 lightness, but white-dominant images kept getting scattered through the results, as if they were close to (had low distance from) a lot of images; I think this comes from using CIE-LAB. Then I decided to weight the penalty higher for colors with low-to-medium lightness, and that improved the results. Even though that should reduce the penalty for white colors, supposedly giving them an even closer distance, it made them appear less often in the results. I’m not really sure why this worked.

My final code isn’t exactly elegant, but it gives very visually appealing results on most images and searches. I looked into other options, like Hungarian matching, but honestly they seemed to do worse, and my method is a lot easier to tweak because of how simple it is. A lot of this algorithm was just testing different ideas and parameters on a bunch of queries and deciding whether I liked the results. Right now, I think the single solid color query works great, which I consider the baseline (from that Shutterstock demo and other programs). The color search by image also works well, but I’m still not 100% satisfied with it. That’s probably at the top of the list of things I want to improve in the future.
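In code, the distance boils down to something like this (a simplified sketch that reuses the `dominant_colors()` output from above; the real weighting and penalty curve went through a lot of tweaking, and the constants here are just illustrative):

```python
# Simplified sketch of the custom palette distance (weights and penalty shape are illustrative).
import colorsys
import numpy as np
from skimage.color import rgb2lab

def color_distance(c1, c2, hue_weight=60.0):
    # Perceptual-ish distance: Euclidean distance in CIE-LAB...
    lab = rgb2lab(np.array([[c1, c2]], dtype=float) / 255.0)[0]
    d = np.linalg.norm(lab[0] - lab[1])

    # ...plus a hue penalty scaled by saturation and (low-to-medium) lightness.
    h1, s1, v1 = colorsys.rgb_to_hsv(*(x / 255.0 for x in c1))
    h2, s2, v2 = colorsys.rgb_to_hsv(*(x / 255.0 for x in c2))
    dh = min(abs(h1 - h2), 1 - abs(h1 - h2))   # circular hue difference in [0, 0.5]
    sat = (s1 + s2) / 2
    light = 1 - (v1 + v2) / 2                  # weigh darker colors more heavily
    return d + hue_weight * dh * sat * light

def palette_distance(p1, p2):
    # p1, p2: lists of (weight, (r, g, b)) as returned by dominant_colors().
    total = 0.0
    for w1, c1 in p1:
        for w2, c2 in p2:
            total += w1 * w2 * color_distance(c1, c2)
    return total
```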
I also looked into something called color hashing, a form of image hashing, but I couldn’t interpret the clusters it produced very well, and I didn’t spend much time on it. I did add a class to store color hashes though (similar to the vector store). This is another thing I might try to use in the future.
This custom distance function was really annoying, partly because of the problems above, but also because I couldn’t find an off-the-shelf vector database/store that lets you define your own distance function. I’m not sure they support different-sized vectors (for a varying number of colors) either, though I could’ve just padded the color vectors; the custom distance was the real problem. Maybe this was a sign that the color vector approach was dumb, but I just decided to create my own vector store from scratch lol. Here I just implemented brute-force KNN. I looked into HNSW, a popular algorithm for Approximate Nearest Neighbors (a greedy version of KNN that’s much faster), but I figured exact KNN was good enough, knowing that nobody was going to have more than maybe 10,000 images in a single collection, let alone millions. If I wanted to scale this project up to share images across users like Pinterest, I would use HNSW (and also look into other vector database/store options).
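Stripped down, the store is basically just this (a sketch, not my actual class):

```python
# Stripped-down sketch of a brute-force KNN "vector store" with a pluggable distance.
import heapq

class SimpleVectorStore:
    def __init__(self, distance_fn):
        self.distance_fn = distance_fn   # e.g. palette_distance from above
        self.items = {}                  # id -> vector (here, a color palette)

    def add(self, item_id, vector):
        self.items[item_id] = vector

    def query(self, vector, k=10):
        # Exact KNN: score everything, keep the k smallest distances.
        scored = ((self.distance_fn(vector, v), item_id)
                  for item_id, v in self.items.items())
        return heapq.nsmallest(k, scored)

# store = SimpleVectorStore(palette_distance)
# store.add("img_001.jpg", dominant_colors("img_001.jpg"))
# results = store.query(dominant_colors("query.jpg"), k=20)
```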
ChromaDB -> FAISS
ChromaDB worked pretty well when I started with it. But when I began integrating the database access with the actual application and UI, for some reason the database stopped working. The problem seemed to happen when I accessed the persisted/saved DB from a different Python file than the one I created it in, which didn’t make sense at all. All my code looked correct (most of it was basic code copied from the ChromaDB examples), so I was tweaking out when this happened and just scrapped ChromaDB. My usage wasn’t even complex, so I was really annoyed. Then I switched to FAISS, which I had experimented with before, and I haven’t had any problems since.
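For reference, the FAISS side is only a few lines. Here’s a minimal sketch using exact inner-product search on normalized CLIP vectors, continuing from the CLIP sketch above (the index type and file name are just the simplest choices, not necessarily what I ship):

```python
# Minimal FAISS sketch: cosine similarity via inner product on L2-normalized vectors.
import faiss
import numpy as np

dim = 512                                  # CLIP ViT-B/32 embedding size
index = faiss.IndexFlatIP(dim)             # exact inner-product search

vectors = np.asarray(image_vecs, dtype="float32")   # from the CLIP sketch above
faiss.normalize_L2(vectors)
index.add(vectors)

query = np.asarray(query_vec, dtype="float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k=10)    # ids index into the added vectors

faiss.write_index(index, "collection.index")   # persist to disk
index = faiss.read_index("collection.index")   # ...and load it back later
```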
Creating Mosaics
So I knew I wanted a view with a central image for color and visual search (for CLIP searches I just leave the middle blank). I had the idea of placing images radiating outwards on a 2D plane, and I stuck with that.
Hexagon Layout
I started with the hexagon layout because I thought it would look cool (it does), though I probably should’ve started with the circle one since it’s the simplest. I begin by placing the center image and then creating hexagonal rings around it. Placing the images in hexagonal rings was NOT trivial. I ended up binning each image into a side of a hexagonal ring, then randomly choosing from each side, and only once I had sampled from every side could I go back and sample from a side again. This gave me a way of populating the hexagon layout that’s both structured and random. This page by Red Blob Games was really useful for building it.
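The ring coordinates themselves follow the “ring” algorithm from that guide pretty closely. Here’s a sketch in axial coordinates (the hex size is just a placeholder value):

```python
# Sketch of hexagonal ring placement in axial coordinates (per the Red Blob Games guide).
import math

# The six neighbor directions in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_ring(radius):
    """Return the axial coordinates of every cell in the ring at `radius`."""
    if radius == 0:
        return [(0, 0)]
    q, r = -radius, radius          # start at direction 4 scaled by the radius
    cells = []
    for dq, dr in HEX_DIRECTIONS:
        for _ in range(radius):
            cells.append((q, r))
            q, r = q + dq, r + dr
    return cells

def axial_to_pixel(q, r, size=120):
    """Convert axial coordinates to scene x/y for a pointy-top hex grid."""
    x = size * (math.sqrt(3) * q + math.sqrt(3) / 2 * r)
    y = size * (1.5 * r)
    return x, y

# Ring 1 has 6 cells, ring 2 has 12, ring 3 has 18, ...
for radius in range(1, 4):
    positions = [axial_to_pixel(q, r) for q, r in hex_ring(radius)]
    print(radius, len(positions))
```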
Circular Layouts
Next up were the circle layouts. The default circular layout was pretty easy to make using a polar approach: first I decide how many images go in each ring, then I split up the angular space by that count, place each image at its angle, and increase the radius each time I step up to a new ring. I added an angular offset and some randomness to the number of images per ring so there isn’t a line of images radiating out from a spot where an image is guaranteed to land at θ = 0, or other unnatural patterns in the layout. The final result is pretty cool.

Next was the circles-by-hue layout. This was also pretty easy using the original circle layout logic, just replacing an image’s angle θ with its average hue (extracted using a circular mean of the hues of the dominant colors). At first I sorted the images by hue within each ring and tried reusing the original circle layout logic, but it didn’t produce visually appealing results like what I have now. I do have the problem of images being covered up by others, which I might look into in the future, but I left it as-is because spreading images out might produce a less appealing effect. Plus, the varying density of each hue area looks cool and hints at how many images there are, and a user can fully explore a hue area by searching for a color that represents it in that collection (with a circle or hexagon layout).
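Boiled down, the circular placement looks something like this (a sketch; the ring sizes, spacing, and jitter amounts here are made up, not my exact values):

```python
# Sketch of the circular layouts (ring sizes, spacing, and jitter values are illustrative).
import colorsys
import math
import random

def circle_layout(num_images, base_count=8, ring_spacing=260):
    """Place images on concentric rings around the origin."""
    positions, placed, ring = [], 0, 1
    while placed < num_images:
        # A little randomness in the count plus a random angular offset per ring
        # avoids a visible "spoke" of images at the same angle in every ring.
        count = min(num_images - placed, base_count * ring + random.randint(-2, 2))
        offset = random.uniform(0, 2 * math.pi)
        radius = ring * ring_spacing
        for i in range(count):
            theta = offset + 2 * math.pi * i / count
            positions.append((radius * math.cos(theta), radius * math.sin(theta)))
        placed += count
        ring += 1
    return positions

def average_hue(palette):
    """Circular mean of the dominant-color hues, weighted by dominance."""
    x = y = 0.0
    for weight, (r, g, b) in palette:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        x += weight * math.cos(2 * math.pi * h)
        y += weight * math.sin(2 * math.pi * h)
    return (math.atan2(y, x) / (2 * math.pi)) % 1.0   # back to [0, 1)

# For the circles-by-hue layout, θ comes from the hue instead of an even split:
# theta = 2 * math.pi * average_hue(dominant_colors(path))
```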
Issues
Bugs
There’s a bug with the panning of the view: it’s biased toward the left side, and it only occurs when I drag a considerable distance (not short mouse movements). Also, sometimes when I let go after panning, the view starts drifting linearly (usually left, or diagonally up and to the left), then stops after a second. I tried a lot to fix this but couldn’t figure it out. These panning issues don’t affect the core program, but they do affect UX. I also didn’t fully stress test the app, so there’s probably some invalid input or UI error somewhere, but it should be fine if you just restart the app.
Bottlenecks
The main processes in this program are creating indices, downloading Pinterest boards, and rendering images. All of the search types seem to be working about as well as they can. It does take a little while to build the CLIP and DINO indices for me, but I also don’t have a GPU. My way of using them is pretty standard, so I’m not really focused on improving that, especially because it’s a one-time (or occasional) wait per collection. Downloading a Pinterest board is straightforward and PinterestDL does basically everything, so there’s not much room for improvement there. It would be nice to show a real progress percentage in the download progress bar, but PinterestDL doesn’t have an easy option for that. The main issue I’m not satisfied with is rendering the images. What’s happening is that the QImages, which I need to place on the QGraphicsView, take a significant amount of time to create. I’m not an expert in Qt, and I researched this a lot but couldn’t find a better option. It would be really nice to have short wait times for each query, but I don’t know. I tried caching the QImages, which was really fast, but I ran out of RAM very quickly on my 8GB laptop. I decided to leave it alone for release.
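If I revisit this, one direction is a bounded cache of downscaled QImages instead of caching everything at full size. Here’s an untested sketch of that idea (not what’s in the current release; the size limits are arbitrary):

```python
# Untested sketch: a size-bounded LRU cache of downscaled QImages (not in the current release).
from collections import OrderedDict
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QImage

class BoundedImageCache:
    def __init__(self, max_items=500, thumb_size=512):
        self.max_items = max_items       # cap memory use by capping the entry count
        self.thumb_size = thumb_size     # cache downscaled images, not full-resolution ones
        self._cache = OrderedDict()      # path -> QImage, in least-recently-used order

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)          # mark as recently used
            return self._cache[path]
        image = QImage(path).scaled(
            self.thumb_size, self.thumb_size,
            Qt.KeepAspectRatio, Qt.SmoothTransformation)
        self._cache[path] = image
        if len(self._cache) > self.max_items:
            self._cache.popitem(last=False)        # evict the least-recently-used entry
        return image
```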
Everything else
Most of the other code was UI work and performance work like moving tasks onto QThreads. I’m skipping over it because it’s not super interesting, but it was probably where the bulk of the time in this project went. The stuff above (besides tweaks to the color distance, which I kept making throughout the project) only took maybe 1-2 weeks, but everything I just mentioned plus learning Qt, testing, bug fixes, saving collections, and everything else needed to make the program actually usable pushed it to 4-5 weeks and taught me a lot about project timelines. If you’ve read this far, thank you for reading and feel free to contact me!