OpenSearch ML reranking with AWS Personalize Service – PART 2


Hi, this is the continuation of the first part, which you can find here.

So, after beating the AWS permissions “beast”, I was finally able to install and configure the OpenSearch re-ranking plugin. But that turned out to be only half of the puzzle. There are 2 ways to apply the plugin to OpenSearch Service queries:

  • at the index level – which I rejected, as that approach would not allow flexible A/B testing
  • at the individual query level – which is what I chose as the suitable solution

To apply a search pipeline to an individual query, two things are required:

  • pass the special parameter search_pipeline with the pipeline name as a GET parameter of the API request
  • pass the special “ext” part in the body of the API request itself. Here is the example from the official documentation:
body = {
    "query": {
        "multi_match": {
            "query": "Toyota",
            "fields": ["BRAND"]
        }
    },
    "ext": {
        "personalize_request_parameters": {
            "user_id": "USER ID",
            "context": { "DEVICE" : "mobile phone" }
        }
    }
}
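
As a minimal sketch of how the two pieces fit together, the Python snippet below builds the request path with the search_pipeline GET parameter and the JSON body with the “ext” part. It uses only the standard library and sends nothing over the network. The index name `cars`, the pipeline name `personalize-pipeline` and the helper `build_personalized_search` are hypothetical names chosen for illustration.

```python
import json
from urllib.parse import urlencode


def build_personalized_search(index, pipeline, query, user_id, context=None):
    """Return (path, json_body) for a per-query search pipeline request.

    `index` and `pipeline` are whatever names exist in your cluster;
    nothing here is sent over the network.
    """
    # Part 1: the pipeline name goes into the query string.
    path = f"/{index}/_search?" + urlencode({"search_pipeline": pipeline})

    # Part 2: the personalization data goes into the "ext" part of the body.
    personalize = {"user_id": user_id}
    if context:
        personalize["context"] = context
    body = {
        "query": query,
        "ext": {"personalize_request_parameters": personalize},
    }
    return path, json.dumps(body)


path, payload = build_personalized_search(
    index="cars",  # hypothetical index name
    pipeline="personalize-pipeline",  # hypothetical pipeline name
    query={"multi_match": {"query": "Toyota", "fields": ["BRAND"]}},
    user_id="USER ID",
    context={"DEVICE": "mobile phone"},
)
print(path)  # /cars/_search?search_pipeline=personalize-pipeline
```

Any HTTP client can then send `payload` to `path` on the OpenSearch endpoint. The point is that both parts must travel together, which is exactly what a strict client library may refuse to do.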

At first glance, nothing special. But it turned out to be a real problem. The search system that had to be boosted with ML is written in PHP with the Symfony framework. Under the hood it uses an abstract DSL library that provides an object-oriented query builder, plus the low-level official PHP Elasticsearch client. If you are a little confused about why I decided to use OpenSearch as the “backend” for an Elasticsearch application, I recommend reading my article “What is OpenSearch? If OpenSearch and Elasticsearch are compatible?“.

In short, OpenSearch is almost fully compatible with Elasticsearch version 7. The key word here is “almost” 🙂. OpenSearch 1.3 is indeed fully compatible with it. But the re-ranking plugin requires OpenSearch 2.9 or higher, which already has features that are completely absent from Elasticsearch, and search pipelines, as search pre/post processors, are among them. It turned out that the list of allowed query GET parameters in the Elasticsearch PHP client is defined as a fixed dictionary. To override it, I had to fork the official PHP client package. The same happened with the DSL library: I had to add a custom query builder for the “ext” part of the request body.
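
To illustrate why a fork was needed, here is a simplified Python model of the mechanism. It is not the actual PHP client code: the class names, the shortened parameter list and the exception type are all assumptions made for this sketch. The idea is that an endpoint validates query parameters against a fixed set, so the only way to let search_pipeline through is to ship a modified endpoint.

```python
class SearchEndpoint:
    """Toy model of a client endpoint with a fixed parameter whitelist."""

    # Hypothetical, shortened list; the real client defines its own.
    ALLOWED_PARAMS = frozenset({"from", "size", "sort", "timeout"})

    def validate(self, params):
        # Reject anything that is not explicitly whitelisted.
        unknown = sorted(set(params) - self.ALLOWED_PARAMS)
        if unknown:
            raise ValueError(f"Unsupported query parameters: {unknown}")
        return dict(params)


class ForkedSearchEndpoint(SearchEndpoint):
    """What a fork effectively does: extend the whitelist."""

    ALLOWED_PARAMS = SearchEndpoint.ALLOWED_PARAMS | {"search_pipeline"}


params = {"size": 40, "search_pipeline": "personalize-pipeline"}

try:
    SearchEndpoint().validate(params)
    rejected = False
except ValueError:
    # The stock endpoint does not know the OpenSearch-only parameter.
    rejected = True

accepted = ForkedSearchEndpoint().validate(params)
print(rejected, accepted)
```

The change itself is tiny, but it lives inside a third-party package, so every upstream release has to be merged into the fork by hand.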

And that, as you understand, is a problem. The Elasticsearch community will not accept such changes, which is logical. That means I would have to maintain the forked repositories myself, which is mostly not acceptable. On the other hand, the search application had evolved into rather complicated software, and rewriting it with custom raw queries looked like an even more stupid idea than maintaining my own PHP packages for Elasticsearch. But I decided to finish my experiment and only then think about how to deal with this complicated issue.

Finally, after making the forks and forcing the required behavior in the PHP packages, I got the desired result: a working personalized search system using the AWS Personalized-Ranking ML model and the OpenSearch Personalize ranking plugin. I was happy like a child 🙂 But only for 5 minutes. Yep, only 5 minutes, because after the first tests of how the search system worked, I understood that this was not what I had expected to get.

Here is how it worked in the end:

  • The user (me, as a tester) comes to the search UI, sets filters and gets paginated results. Let’s assume that OpenSearch returned the first 40 items. These items are sent to the AWS Personalize ML model, which returns re-ranked results.
  • The OpenSearch plugin operates on the returned results only, NOT ON THE WHOLE OpenSearch engine. The latter would be the logical and useful behavior. That is what I expected to see: a fully operational personalized search system on ML steroids, not a limited, minimalistic re-ranking of partial results.
  • Such behavior, in my opinion, makes the OpenSearch Personalize ranking plugin mostly useless for personalizing any real search system. Let me explain why. At first I thought: OK, maybe I can replace paging with infinite scroll, find all items, re-rank them, and then return them to the user. But after increasing the page size to 1000, I quickly found that something was wrong again, as the last results were not re-ranked. I found the reason in the documentation: the plugin re-ranks only a limited number of results per query.
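
The behavior described in the list above can be reproduced with a toy simulation. This is only a sketch under stated assumptions: the items and scores are made up, `rerank_page` is a hypothetical stand-in for the plugin together with the Personalize model, and `limit` stands for the maximum number of hits the re-ranker looks at.

```python
def rerank_page(hits, personal_score, limit):
    """Re-order only the first `limit` hits; keep the tail untouched."""
    head, tail = hits[:limit], hits[limit:]
    head = sorted(head, key=lambda item: personal_score.get(item, 0.0), reverse=True)
    return head + tail


# Items in the order the search engine ranked them (made-up data).
engine_order = ["a", "b", "c", "d", "e", "f"]
# Made-up personalized scores: "f" is the user's best match.
personal_score = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.05, "e": 0.4, "f": 0.9}

page_size = 3
page_1 = engine_order[:page_size]
page_2 = engine_order[page_size:]

# Each page is re-ranked in isolation, so "f" can never reach page 1.
print(rerank_page(page_1, personal_score, limit=page_size))  # ['b', 'c', 'a']
print(rerank_page(page_2, personal_score, limit=page_size))  # ['f', 'e', 'd']

# A bigger page does not help once it exceeds the re-ranker's limit:
# the tail ("e", "f") stays in the engine's original order.
print(rerank_page(engine_order, personal_score, limit=4))
# ['b', 'c', 'a', 'd', 'e', 'f']
```

In this toy data the best personal match sits at the very end of the engine's ranking, so neither paging nor a large page size ever brings it to the top.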

So, here are my final conclusions:

  • The OpenSearch re-ranking plugin is not a solution for building a personalized search system, and neither is the AWS Personalize service in general.
  • The AWS Personalize service can be a rather good choice for recommending items that do not change often over time, like products in an e-commerce store or films on streaming platforms. Though it still has a lot of limitations.
  • The only possible practical use of the OpenSearch re-ranking plugin, in my opinion, is a narrow area where we need to apply strict filtering first and then present personalized results with < 500 items, which is a really rare case in medium or big search applications.
  • OpenSearch and Elasticsearch have started to diverge. Looking at the latest versions (OpenSearch > 2.9, Elasticsearch > 8.x), it becomes hard to use them as compatible products. I suppose that in the future the difference will only deepen.

After my sad summary I started to look through the AWS documentation for the relevant information. Finally I found the part that confirmed my practical experience. Although I had read it before my experiments, I had missed the main point: the plugin re-ranks only the results that OpenSearch returns.

I hope you found this article interesting, and I hope you will not repeat my mistakes 🙂

If you want to know more about OpenSearch, Elasticsearch or how to build advanced search systems, you are welcome to join my courses.

Best regards
