kor_search

Korean text search extension for PostgreSQL.

Description

kor_search is a PostgreSQL extension that provides text search functionality between Korean and English. Developed without reliance on external APIs like translators or morphological analyzers, it is optimized for word-based searches. While it supports sentence-to-sentence searches to some extent, it is primarily designed for word searches. This extension can be used in environments where PostgreSQL is installed, and it also provides functions that can be used in environments where external extensions are restricted, such as RDS.

Key Features

  • LIKE Search: Checks if the input text matches or includes the specified search term.
  • tsvector Search: Converts text to tsvector format to support similar word searches.
  • Regex Search: Provides regex search functionality for complex search conditions.
  • Similarity Search: Evaluates sentence similarity based on a dictionary of synonyms.

Flexible Custom Search

kor_search allows you to modify internal table data to provide search tailored to specific industries or fields. For example, in the case of \xBB\xEA\xBE\xF7\xC0\xC7 \xBF\xAA\xB1\xBA, the extension has been customized for searches related to construction by applying a large amount of construction-related data.

Performance Considerations

Performance analysis is essential when querying large amounts of data, as search speed can be affected by the size of the dictionary. While functions similar to the extension are implemented for environments where external extensions are restricted (such as RDS), they may not perform as well as the extension itself.

Usage Example:

  1. Sentence Similarity Check:

    -- '\xB9\xE4 \xB8Դ\xD9' is semantically similar to 'I eat rice', so TRUE is expected
    SELECT kor_search_similar('I eat rice', '\xB9\xE4 \xB8Դ\xD9');  -- Result: true
    
    -- '\xBC\xAD\xBF\xEF \xBB\xEC\xB4\xD9' is semantically similar to 'She lives in Seoul', so TRUE is expected
    SELECT kor_search_similar('She lives in Seoul', '\xBC\xAD\xBF\xEF \xBB\xEC\xB4\xD9');  -- Result: true
    
    -- '\xC2\xF7\xB0\xA1 \xBA\xFC\xB8\xA3\xB4\xD9' is semantically similar to 'The car is fast', so TRUE is expected
    SELECT kor_search_similar('The car is fast', '\xC2\xF7\xB0\xA1 \xBA\xFC\xB8\xA3\xB4\xD9');  -- Result: true
    

kor_like

  • kor_like(input_text text, search_text text): Checks if the synonyms corresponding to search_text are included in input_text using a LIKE query.

Usage Example:

  1. Word Inclusion Check:

    -- Search for 'lg' keyword with '\xBF\xA4\xC1\xF6', '\xBE\xD9\xC1\xF6'
    SELECT kor_like('\xC0̰\xCD\xC0\xBA \xBF\xA4\xC1\xF6 \xC1\xA6ǰ\xC0Դϴ\xD9', 'lg');  -- Result: true
    SELECT kor_like('\xC0̰\xCD\xC0\xBA LG \xC1\xA6ǰ\xC0Դϴ\xD9', '\xBF\xA4\xC1\xF6');  -- Result: true
    
    -- Search for 'apple' keyword with '\xBE\xD6\xC7\xC3', '\xBB\xE7\xB0\xFA'
    SELECT kor_like('\xBE\xD6\xC7\xC3\xC0\xBA \xC8Ǹ\xA2\xC7\xD1 \xB0\xFA\xC0\xCF\xC0Դϴ\xD9', 'apple');  -- Result: true
    SELECT kor_like('\xBB\xE7\xB0\xFA\xB8\xA6 \xC1\xC1\xBE\xC6\xC7մϴ\xD9', 'apple');  -- Result: true
    SELECT kor_like('Apple\xC0\xBA \xB0\xFA\xC0\xCF\xC0Դϴ\xD9', '\xBB\xE7\xB0\xFA');  -- Result: true
    

kor_search_tsvector

  • kor_search_tsvector(input_text text, search_text text): Checks if the synonyms corresponding to search_text are included in the tsvector of input_text.

Usage Example:

  1. Search for Similar Words Using tsvector:

    -- Search for 'data science' keyword with '\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0', '\xB5\xA5\xC0\xCC\xC5\xCD \xBB\xE7\xC0̾\xF0\xBD\xBA'
    SELECT kor_search_tsvector('\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0\xC0\xBA \xB9\xCC\xB7\xA1\xC0\xC7 \xC0\xAF\xB8\xC1\xC7\xD1 \xBAо\xDF\xC0Դϴ\xD9', 'data science');  -- Result: true
    SELECT kor_search_tsvector('\xB5\xA5\xC0\xCC\xC5\xCD \xBB\xE7\xC0̾𽺴\xC2 \xB8\xB9\xC0\xBA \xB0\xA1\xB4\xC9\xBC\xBA\xC0\xBB \xC1\xA6\xB0\xF8\xC7մϴ\xD9', 'data science');  -- Result: true
    SELECT kor_search_tsvector('Data Science\xB4\xC2 \xB8\xB9\xC0\xBA \xB0\xA1\xB4\xC9\xBC\xBA\xC0\xBB \xC1\xA6\xB0\xF8\xC7մϴ\xD9', '\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0');  -- Result: true
    
    -- Search for 'machine learning' keyword with '\xB8ӽŷ\xAF\xB4\xD7', '\xB1\xE2\xB0\xE8\xC7н\xC0'
    SELECT kor_search_tsvector('\xB8ӽŷ\xAF\xB4\xD7 \xB1\xE2\xBC\xFA\xC0\xCC \xB9\xDF\xC0\xFC\xC7ϰ\xED \xC0ֽ\xC0\xB4ϴ\xD9', 'machine learning');  -- Result: true
    SELECT kor_search_tsvector('\xB1\xE2\xB0\xE8\xC7н\xC0 \xBE˰\xED\xB8\xAE\xC1\xF2\xC0\xBB \xBF\xAC\xB1\xB8\xC7մϴ\xD9', 'machine learning');  -- Result: true
    SELECT kor_search_tsvector('Machine Learning \xBE˰\xED\xB8\xAE\xC1\xF2\xC0\xBB \xBF\xAC\xB1\xB8\xC7մϴ\xD9', '\xB1\xE2\xB0\xE8\xC7н\xC0');  -- Result: true
    
  • kor_regex_search(input_text text, pattern text): Checks if a regex pattern matches the input_text.

Usage Example:

  1. Regex Search:

    -- Search for specific word patterns using regex
    SELECT kor_regex_search('\xC0ڹٴ\xC2 \xB0\xAD\xB7\xC2\xC7\xD1 \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3');  -- Result: true
    SELECT kor_regex_search('\xC6\xC4\xC0̽\xE3\xC0\xBA \xB9\xE8\xBF\xEC\xB1\xE2 \xBD\xAC\xBF\xEE \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3');  -- Result: true
    SELECT kor_regex_search('JAVA\xBF\xCD PYTHON\xC0\xBA \xC0α\xE2\xC0ִ\xC2 \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '(?i)\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3');  -- Result: true
    
    -- Search for 'big data' and '\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD' using regex
    SELECT kor_regex_search('\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD \xBA\xD0\xBC\xAE\xC0\xCC \xC1߿\xE4\xC7մϴ\xD9', '\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD');  -- Result: true
    SELECT kor_regex_search('\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5͸\xA6 ó\xB8\xAE\xC7մϴ\xD9', '\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD');  -- Result: true
    SELECT kor_regex_search('Big Data\xB4\xC2 \xC7\xF6\xB4\xEB \xB1\xE2\xBC\xFA\xC0\xC7 \xC7ٽ\xC9\xC0Դϴ\xD9', '(?i)\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD');  -- Result: true
    

Managing the Word Conversion Table

You can add new keywords and synonyms to the word conversion table. This allows for implementing custom searches tailored to specific industries or business needs. For example, to add a synonym for the ‘apple’ keyword, do the following:

INSERT INTO kor_search_word_transform (keyword)
VALUES ('apple');

INSERT INTO kor_search_word_synonyms (keyword_id, synonym)
VALUES ((SELECT id FROM kor_search_word_transform WHERE keyword = 'apple'), '\xBE\xD6\xC7\xC3');