Contents
kor_search
Korean text search extension for PostgreSQL.
Description
kor_search
is a PostgreSQL extension that provides text search functionality between Korean and English. Developed without reliance on external APIs like translators or morphological analyzers, it is optimized for word-based searches. While it supports sentence-to-sentence searches to some extent, it is primarily designed for word searches. This extension can be used in environments where PostgreSQL is installed, and it also provides functions that can be used in environments where external extensions are restricted, such as RDS.
Key Features
- LIKE Search: Checks if the input text matches or includes the specified search term.
- tsvector Search: Converts text to tsvector format to support similar word searches.
- Regex Search: Provides regex search functionality for complex search conditions.
- Similarity Search: Evaluates sentence similarity based on a dictionary of synonyms.
Flexible Custom Search
kor_search
allows you to modify internal table data to provide search tailored to specific industries or fields. For example, in the case of \xBB\xEA\xBE\xF7\xC0\xC7 \xBF\xAA\xB1\xBA, the extension has been customized for searches related to construction by applying a large amount of construction-related data.
Performance Considerations
Performance analysis is essential when querying large amounts of data, as search speed can be affected by the size of the dictionary. While functions similar to the extension are implemented for environments where external extensions are restricted (such as RDS), they may not perform as well as the extension itself.
Usage Example:
-
Sentence Similarity Check:
-- '\xB9\xE4 \xB8Դ\xD9' is semantically similar to 'I eat rice', so TRUE is expected SELECT kor_search_similar('I eat rice', '\xB9\xE4 \xB8Դ\xD9'); -- Result: true -- '\xBC\xAD\xBF\xEF \xBB\xEC\xB4\xD9' is semantically similar to 'She lives in Seoul', so TRUE is expected SELECT kor_search_similar('She lives in Seoul', '\xBC\xAD\xBF\xEF \xBB\xEC\xB4\xD9'); -- Result: true -- '\xC2\xF7\xB0\xA1 \xBA\xFC\xB8\xA3\xB4\xD9' is semantically similar to 'The car is fast', so TRUE is expected SELECT kor_search_similar('The car is fast', '\xC2\xF7\xB0\xA1 \xBA\xFC\xB8\xA3\xB4\xD9'); -- Result: true
kor_like
kor_like(input_text text, search_text text)
: Checks if the synonyms corresponding tosearch_text
are included ininput_text
using a LIKE query.
Usage Example:
-
Word Inclusion Check:
-- Search for 'lg' keyword with '\xBF\xA4\xC1\xF6', '\xBE\xD9\xC1\xF6' SELECT kor_like('\xC0̰\xCD\xC0\xBA \xBF\xA4\xC1\xF6 \xC1\xA6ǰ\xC0Դϴ\xD9', 'lg'); -- Result: true SELECT kor_like('\xC0̰\xCD\xC0\xBA LG \xC1\xA6ǰ\xC0Դϴ\xD9', '\xBF\xA4\xC1\xF6'); -- Result: true -- Search for 'apple' keyword with '\xBE\xD6\xC7\xC3', '\xBB\xE7\xB0\xFA' SELECT kor_like('\xBE\xD6\xC7\xC3\xC0\xBA \xC8Ǹ\xA2\xC7\xD1 \xB0\xFA\xC0\xCF\xC0Դϴ\xD9', 'apple'); -- Result: true SELECT kor_like('\xBB\xE7\xB0\xFA\xB8\xA6 \xC1\xC1\xBE\xC6\xC7մϴ\xD9', 'apple'); -- Result: true SELECT kor_like('Apple\xC0\xBA \xB0\xFA\xC0\xCF\xC0Դϴ\xD9', '\xBB\xE7\xB0\xFA'); -- Result: true
kor_search_tsvector
kor_search_tsvector(input_text text, search_text text)
: Checks if the synonyms corresponding tosearch_text
are included in the tsvector ofinput_text
.
Usage Example:
-
Search for Similar Words Using tsvector:
-- Search for 'data science' keyword with '\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0', '\xB5\xA5\xC0\xCC\xC5\xCD \xBB\xE7\xC0̾\xF0\xBD\xBA' SELECT kor_search_tsvector('\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0\xC0\xBA \xB9\xCC\xB7\xA1\xC0\xC7 \xC0\xAF\xB8\xC1\xC7\xD1 \xBAо\xDF\xC0Դϴ\xD9', 'data science'); -- Result: true SELECT kor_search_tsvector('\xB5\xA5\xC0\xCC\xC5\xCD \xBB\xE7\xC0̾\xC2 \xB8\xB9\xC0\xBA \xB0\xA1\xB4\xC9\xBC\xBA\xC0\xBB \xC1\xA6\xB0\xF8\xC7մϴ\xD9', 'data science'); -- Result: true SELECT kor_search_tsvector('Data Science\xB4\xC2 \xB8\xB9\xC0\xBA \xB0\xA1\xB4\xC9\xBC\xBA\xC0\xBB \xC1\xA6\xB0\xF8\xC7մϴ\xD9', '\xB5\xA5\xC0\xCC\xC5\xCD \xB0\xFA\xC7\xD0'); -- Result: true -- Search for 'machine learning' keyword with '\xB8ӽŷ\xAF\xB4\xD7', '\xB1\xE2\xB0\xE8\xC7н\xC0' SELECT kor_search_tsvector('\xB8ӽŷ\xAF\xB4\xD7 \xB1\xE2\xBC\xFA\xC0\xCC \xB9\xDF\xC0\xFC\xC7ϰ\xED \xC0ֽ\xC0\xB4ϴ\xD9', 'machine learning'); -- Result: true SELECT kor_search_tsvector('\xB1\xE2\xB0\xE8\xC7н\xC0 \xBE˰\xED\xB8\xAE\xC1\xF2\xC0\xBB \xBF\xAC\xB1\xB8\xC7մϴ\xD9', 'machine learning'); -- Result: true SELECT kor_search_tsvector('Machine Learning \xBE˰\xED\xB8\xAE\xC1\xF2\xC0\xBB \xBF\xAC\xB1\xB8\xC7մϴ\xD9', '\xB1\xE2\xB0\xE8\xC7н\xC0'); -- Result: true
kor_regex_search
kor_regex_search(input_text text, pattern text)
: Checks if a regex pattern matches theinput_text
.
Usage Example:
-
Regex Search:
-- Search for specific word patterns using regex SELECT kor_regex_search('\xC0ڹٴ\xC2 \xB0\xAD\xB7\xC2\xC7\xD1 \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3'); -- Result: true SELECT kor_regex_search('\xC6\xC4\xC0̽\xE3\xC0\xBA \xB9\xE8\xBF\xEC\xB1\xE2 \xBD\xAC\xBF\xEE \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3'); -- Result: true SELECT kor_regex_search('JAVA\xBF\xCD PYTHON\xC0\xBA \xC0α\xE2\xC0ִ\xC2 \xBE\xF0\xBE\xEE\xC0Դϴ\xD9', '(?i)\xC0ڹ\xD9|\xC6\xC4\xC0̽\xE3'); -- Result: true -- Search for 'big data' and '\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD' using regex SELECT kor_regex_search('\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD \xBA\xD0\xBC\xAE\xC0\xCC \xC1߿\xE4\xC7մϴ\xD9', '\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD'); -- Result: true SELECT kor_regex_search('\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xA6 ó\xB8\xAE\xC7մϴ\xD9', '\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD'); -- Result: true SELECT kor_regex_search('Big Data\xB4\xC2 \xC7\xF6\xB4\xEB \xB1\xE2\xBC\xFA\xC0\xC7 \xC7ٽ\xC9\xC0Դϴ\xD9', '(?i)\xBA\xF2\xB5\xA5\xC0\xCC\xC5\xCD|\xB4\xEB\xBF뷮 \xB5\xA5\xC0\xCC\xC5\xCD'); -- Result: true
Managing the Word Conversion Table
You can add new keywords and synonyms to the word conversion table. This allows for implementing custom searches tailored to specific industries or business needs. For example, to add a synonym for the ‘apple’ keyword, do the following:
INSERT INTO kor_search_word_transform (keyword)
VALUES ('apple');
INSERT INTO kor_search_word_synonyms (keyword_id, synonym)
VALUES ((SELECT id FROM kor_search_word_transform WHERE keyword = 'apple'), '\xBE\xD6\xC7\xC3');