About Searx
Searx is an open-source metasearch engine that you can host yourself.
Install Searx on your workstation, and you can begin using your own metasearch engine, locally, and privately.
Alternatively, Searx can be added to a server at your site and other users on that local network can use the metasearch engine.
With some networking changes it can also be served over the internet and made accessible from anywhere, though there are potential security and bandwidth implications associated with doing this. Opening your locally hosted services to the internet requires awareness and reasoned consideration for how it should be safely managed and monitored.
If you are not into self-hosting, there are instances hosted by others who permit public access. Use these with caution though: you would effectively be running your searches through unknown and unregulated third-party providers.
Regardless, Searx is intended to offer a more private way to search, and since Searx is a metasearch engine, you can have your privacy without relinquishing the results you would get from your favourite search engines.
Searx Hosting
My preferred way to run Searx is as a docker container. Using docker, the basic engine and interface can be running in a few minutes (seconds, actually, if you are already familiar with docker).
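As a rough illustration of how little is needed to start a container, the sketch below builds a `docker run` command line for the searx/searx image. The container port 8080, the /etc/searx mount point, and the BASE_URL variable are assumptions that should be checked against the image documentation; the function name and its parameters are hypothetical.

```python
import shlex


def searx_run_command(host_port=8080, config_dir=None, image="searx/searx"):
    """Build a `docker run` argv list for a single Searx container.

    Assumptions (verify against the image docs): the container listens on
    port 8080, reads its settings from /etc/searx, and accepts BASE_URL.
    """
    cmd = ["docker", "run", "--rm", "-d", "-p", f"{host_port}:8080"]
    if config_dir is not None:
        # Optional: keep settings.yml on the host so edits survive restarts.
        cmd += ["-v", f"{config_dir}:/etc/searx"]
    cmd += ["-e", f"BASE_URL=http://localhost:{host_port}/", image]
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(searx_run_command()))
```

Passing the returned list to `subprocess.run(..., check=True)` would start the container; printing it first makes it easy to review before anything runs.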
Searx is also expandable, and there are other projects which are designed to improve various aspects of the metasearch engine. I will look into these later, but for the time being, I am thrilled to have my own private way of searching for things.
I have only had one significant issue with Searx for docker, which is notable given that Searx is still a relatively young project. The current Searx release (as of the latest edit on this post) is only v1.1.0. Despite this, I found nothing unreliable or awkward about using it, and while there are areas for improvement, I am sure those will come with time. Currently, there are about 300 open issues showing on the searx GitHub page; to be fair, many of these are feature requests or usage questions. I have not come across any true showstoppers for my implementation.
From another point of view, the number of open issue tickets for Searx can be an indication of user engagement. Having a good level of community activity can help identify and prioritise issues, problems, and features for the development team, but can also lead to useful solutions coming from that community.
From my own experience, this final point was demonstrated by a recent minor issue I experienced. The configuration for a particular feature was not working for me. It turned out that the instructions in the configuration file were incorrect for that functionality. This caused only a small loss of time during setup without any service interruption. In any case, it was resolved by a quick scan through the issue tickets and a solution was provided by prior comments from others in the community. This was a good result, despite the bad comment in the config file.
After two years of regular use of my own Searx instance, I would not do without it; I am happy with the results it is giving, the features and customizability are impressive, and searches over direct connections are fast. I look forward to seeing future developments.
User Personalisation
Searx comes preconfigured to use multiple search engines, and personalisation on a per-browser level is available. Settings are saved in cookies only read by that searx instance, and their contents are not obfuscated – through the preferences panels you can easily view the contents of those cookies.
The personalisation settings are quite broad, so while the full range of options available and how to navigate to them may take a little time to get used to, that’s fine since the default setup provides a comfortable and generally familiar environment. They also go a little deeper into the technical aspects of search engine operation than on sites like Google, Bing, or DuckDuckGo. I have not yet investigated use cases for all of the options in Searx, but it is comforting to know that there is some built-in flexibility for future needs.
For a general overview of options, they include the ability to:
- Toggle default plugins
- Toggle search engines for each category
- Choose a theme (including a dark theme)
- Select a query submission method
- Select a level of “safe search”
- Choose how results are presented
Privacy
While the Searx metasearch approach is more private than using traditional search engines directly, I feel that it might not be private enough. This is not the fault of Searx; it is inherent in how HTTP requests are made.
Perhaps I am misunderstanding how query IP addresses can be concealed from the individual search engines, but I am not confident that the originating IP of a search can be obscured completely, even using POST requests.
The reason behind this concern is that I wanted to extend my use of this Searx instance beyond the confines of my personal network and use my hosted Searx instance while travelling. The simplest way to do this is to allow access to the service from a broader range of IP addresses. The significant downside of this is that it would enable literally anyone else in that IP range to have access to my search engine, and this could result in who-knows-what being logged as having been searched from my IP, which has the potential to become very awkward, or worse. I think the upside of having external access to my metasearch engine really does not justify the potential downside of strangers using my IP to do searches. Sure, I could use a VPN to gain access remotely, but that involves extra steps while travelling which, to be honest, I probably would not be interested in performing for every offsite search.
So I came to the conclusion that this would be a perfect job for redirecting the secondary searches through a VPN. While directing all the queries and results that Searx makes to search engines like Google and Bing through a VPN cannot actually prevent misuse by third parties, I think this precaution along with passive obfuscation strategies should limit or eliminate potential issues.
In any case, this is when I came across the gluetun docker image. It was very simple to implement, functions as a filtering proxy (though my VPN service already implements filters), and since the searx/searx docker image provides a settings.yml file where you can nominate a proxy server for search connections, it was a simple matter to run the queries that Searx makes through a VPN to a reasonably close (fast) endpoint.
It was not all smooth though. I’ll just say that you need to be aware that the comments in the settings.yml file can result in an incorrect proxy configuration. This was mentioned in an issue back in 2018, but the file I started with still contains that error. It may have been fixed by the time you work with it, but check the link above to be sure.
Secondly, I did have to increase the default search timeout a little to prevent failures due to the delay caused by running the queries through a VPN. I am happy with the speed of my VPN service, but it will never be as fast as a direct connection. How you set the timeout depends on multiple factors: your location, which endpoint you choose, and the level of responsiveness you feel is needed, so a little trial and error may be required.
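One way to make that trial and error a little more systematic is to time a handful of searches through the VPN and derive a timeout from them. The sketch below is only a heuristic of my own choosing (roughly the 95th percentile of the measured times plus some headroom), not anything Searx prescribes; the function name, `headroom`, and `floor_s` are hypothetical, and the result would still need to be entered by hand in the timeout setting of settings.yml.

```python
import math


def suggest_timeout(latencies_s, headroom=1.5, floor_s=2.0):
    """Suggest a timeout in seconds from measured search times.

    latencies_s: durations (seconds) of test searches made through the VPN.
    headroom:    multiplier applied to the slow end of the measurements.
    floor_s:     never suggest less than this (hypothetical lower bound).
    """
    if not latencies_s:
        raise ValueError("need at least one measurement")
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile: ignores only the most extreme outliers.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    p95 = ordered[rank - 1]
    return round(max(floor_s, p95 * headroom), 1)


print(suggest_timeout([1.0, 1.2, 1.4, 2.0]))
```

With only a few samples the percentile is effectively the slowest measurement, so more test searches give a steadier suggestion.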
If you decide to implement search query proxying as I have, do ensure that the searches that Searx makes actually do go through that VPN. Once you have the configuration settled and test searches function correctly, try disabling the VPN proxy to ensure they fail. When you see searches fail with the VPN down, bring it back up and see that they begin functioning again. This provides some confidence that Searx is querying external search engines only via that VPN, and that your IP won’t be identified or flagged if a bad actor finds and uses your Searx instance.
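The up/down check described above could be scripted roughly as follows. This is a minimal sketch, assuming the instance answers `/search?q=...&format=json` with a JSON object containing a `results` list (JSON output may be disabled or shaped differently on your instance). The names `search_succeeds` and `proxy_enforced` and the `opener` parameter are hypothetical, introduced only for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def search_succeeds(base_url, query="test", timeout=10.0,
                    opener=urllib.request.urlopen):
    """Return True if the instance returns at least one result for `query`."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    url = f"{base_url.rstrip('/')}/search?{params}"
    try:
        with opener(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection problems, timeouts, or a non-JSON reply count as failure.
        return False
    return isinstance(payload, dict) and bool(payload.get("results"))


def proxy_enforced(ok_with_vpn_up, ok_with_vpn_down):
    """Searches should work with the VPN up and fail with it down."""
    return ok_with_vpn_up and not ok_with_vpn_down
```

In use, you would call `search_succeeds` once with the VPN container running, stop the VPN, call it again, and pass both outcomes to `proxy_enforced`. A result of False means searches either never worked or still work without the VPN, and the latter is the case to worry about.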
Summary
Searx is good for my use as a traditional search engine replacement and seems to improve privacy. It has a lot of built-in flexibility for customisation and service expansion.
I chose to run two instances of Searx in docker:
- Internal: This instance is only available on the local network. It makes direct queries to traditional search engines, for onsite queries. Access is blocked for any IP outside my local network.
- External: This second instance is available over the internet. It is used for searches while I am away from my local network. For this instance of Searx all communication with external search engines is proxied through a VPN, so there will be no association of my site IP with any query terms submitted to that externally facing instance. Additionally, certain IP ranges have been geo-blocked. I believe this is sufficient risk mitigation.
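The access rule for the internal instance amounts to an allowlist of local address ranges. How that is enforced (firewall, reverse proxy, or docker networking) is not covered here; the sketch below only illustrates the rule itself, and the network ranges and function name are hypothetical placeholders.

```python
import ipaddress

# Hypothetical LAN ranges; replace with the networks actually in use.
LOCAL_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_allowed_internal(client_ip, networks=LOCAL_NETWORKS):
    """Return True only if client_ip falls inside one of the local networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(addr in net for net in networks)


print(is_allowed_internal("192.168.1.20"), is_allowed_internal("8.8.8.8"))
```

The same membership test, inverted, is the idea behind blocking selected ranges on the external instance.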
My reason for proxying the searches of the externally facing instance may just be my old friend paranoia; the VPN may not be strictly necessary, but I intend to keep it this way.
You might want to run Searx (directly, or in docker) on your laptop while travelling. Doing it that way would actually make it a lot simpler for one user to run private and secure searches through a commercial VPN.
Since we do most of our searches on Android devices while travelling, it is actually worth the extra effort to have a private and secure search engine available for all of the family.
I hope this has been helpful to you. All the best!