Elasticsearch: Unexpected interaction between synonym_graph and stop filter in custom analyzer

Description

I'm trying to query with a multi-word synonym that includes a stop word. Let's start with an example to explain.

I have the following documents in an index:

  • foo
  • bar
  • foo bar
  • foo of bar
  • fb

The expected result of the query {"query":{"match":{"test":{"query":"foo of bar"}}}} is to return these documents:

  • foo bar
  • foo of bar
  • fb
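To make the setup reproducible, here is a minimal sketch that builds the request bodies for the five documents and the query above. The field name test comes from the query; sending the bodies (for example to a _bulk and a _search endpoint on an index of your choice) is left out, and the helper names bulk_payload and match_query are hypothetical.

```python
import json

FIELD = "test"  # field name taken from the query above
DOCS = ["foo", "bar", "foo bar", "foo of bar", "fb"]
EXPECTED = {"foo bar", "foo of bar", "fb"}  # documents the query should return


def bulk_payload(docs, field=FIELD):
    """Build an NDJSON body suitable for POST /<index>/_bulk."""
    lines = []
    for text in docs:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps({field: text}))  # document source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def match_query(text, field=FIELD):
    """Build the match query body used in the question."""
    return {"query": {"match": {field: {"query": text}}}}


if __name__ == "__main__":
    print(bulk_payload(DOCS), end="")
    print(json.dumps(match_query("foo of bar")))
```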

Configuration

In this example, I have 2 filters:

  • stop: removes the token of
  • synonym_graph: handles the synonyms fb, foo bar, foo of bar

Mapping

{
  "properties": {
    "test": {
      "type": "text",
      "analyzer": "test_index_analyzer",
      "search_analyzer": "test_search_analyzer"
    }
  }
}

Settings

{
    "settings" : {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "test_index_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "english_stop"
                        ]
                    },
                    "test_search_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "english_stop",
                            "english_syn"
                        ]
                    }
                },
                "filter": { 
                    "english_stop": {
                        "type": "stop",
                        "stopwords": "_english_",
                        "ignore_case": true,
                        "remove_trailing": false
                    },
                    "english_syn": {
                        "type": "synonym_graph",
                        "synonyms": [
                            "fb,foo of bar",
                            "fb,foo bar"
                        ]
                    }
                }
            }
        }
    }
}

Result

Token format: token,start_offset-end_offset,type / position / positionLength

Query: fb
Search result: fb
Index analysis:

  • fb,0-2,word / 0 / 1

Search analysis:

  • foo,0-2,SYNONYM / 0 / 1
  • foo,0-2,SYNONYM / 0 / 3
  • fb,0-2,word / 0 / 4
  • bar,0-2,SYNONYM / 2 / 2
  • bar,0-2,SYNONYM / 3 / 1

Query: foo of bar
Search result: fb
Index analysis:

  • foo,0-3,word / 0 / 1
  • bar,7-10,word / 2 / 1

Search analysis:

  • fb,0-10,SYNONYM / 0 / 3
  • foo,0-3,word / 0 / 1
  • bar,7-10,word / 2 / 1

Query: foo bar
Search result: fb, foo bar
Index analysis:

  • foo,0-3,word / 0 / 1
  • bar,4-7,word / 1 / 1

Search analysis:

  • fb,0-7,SYNONYM / 0 / 2
  • foo,0-3,word / 0 / 1
  • bar,4-7,word / 1 / 1
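Token listings like these can be obtained from the _analyze API. The sketch below only builds the request body and formats a returned token in the notation used here; it does not call a cluster. The helper names are hypothetical, and the sample token is copied by hand from the foo of bar search analysis rather than taken from a live response.

```python
import json


def analyze_request(analyzer, text):
    """Body for GET /<index>/_analyze; explain asks for per-token attributes."""
    return {"analyzer": analyzer, "text": text, "explain": True}


def format_token(tok):
    """Render one token dict as token,start-end,type / position / positionLength."""
    return "{},{}-{},{} / {} / {}".format(
        tok["token"],
        tok["start_offset"],
        tok["end_offset"],
        tok["type"],
        tok["position"],
        tok.get("positionLength", 1),  # treated as 1 when the field is absent
    )


if __name__ == "__main__":
    print(json.dumps(analyze_request("test_search_analyzer", "foo of bar")))
    # Sample token transcribed from the listing above (assumed response shape).
    sample = {
        "token": "fb",
        "start_offset": 0,
        "end_offset": 10,
        "type": "SYNONYM",
        "position": 0,
        "positionLength": 3,
    }
    print(format_token(sample))
```

Running the same request with test_index_analyzer instead should give the index-side listing for comparison.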

Every search is expected to return these 3 documents:

  • fb
  • foo bar
  • foo of bar

Note: foo of bar is never returned.

My guess is that the stop filter indexes foo of bar with the positions [foo, , bar], leaving a gap where of was removed, while the synonym filter is looking for [foo, bar].

Do you have any advice on how to reach my goal?
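To illustrate the position gap behind this guess, here is a simplified model of a whitespace tokenizer followed by a stop filter that keeps original positions. It is not Lucene's implementation, the stop list contains only of, and it neither confirms nor refutes the guess; it just reproduces the positions seen in the index analysis above.

```python
STOP_WORDS = {"of"}  # assumption: only the stop word relevant to this example


def whitespace_stop(text, stop_words=STOP_WORDS):
    """Split on whitespace and drop stop words, keeping original positions."""
    tokens = []
    for position, word in enumerate(text.split()):
        if word.lower() in stop_words:
            continue  # the token is removed but its position stays consumed
        tokens.append((word, position))
    return tokens


def has_gap(tokens):
    """True when two consecutive remaining tokens are more than one position apart."""
    return any(b - a > 1 for (_, a), (_, b) in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    for text in ("foo bar", "foo of bar"):
        toks = whitespace_stop(text)
        print(text, toks, "gap" if has_gap(toks) else "no gap")
```

In this model foo of bar yields foo at position 0 and bar at position 2, whereas foo bar yields positions 0 and 1, which is the difference the guess points at.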



Read more here: https://stackoverflow.com/questions/67000708/elasticsearch-unexpected-interaction-between-synonym-graph-and-stop-filter-in-c

Content Attribution

This content was originally published by Joan at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
