[GraphRAG] Building a Recommendation System with a GraphDB and an LLM

miimu 2025. 10. 30. 20:32

https://www.youtube.com/watch?v=dzQZvebTvKc

Recommendation system example: user-based

Recommend other movies liked by users who liked a particular movie

= "People who enjoyed the movie "Toy Story" also enjoyed the movie 000"

MATCH (m:Movie)<-[r:RATED]-(u:User)-[recr:RATED]->(rec:Movie)
WHERE m.title = 'Toy Story' AND r.rating >= 4 AND recr.rating >= 4
RETURN DISTINCT rec LIMIT 10;
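
For readers less familiar with Cypher, the same idea can be sketched in plain Python over an in-memory list. This is only an illustration of the logic, not how Neo4j executes the query; the ratings and the function name `user_based_recommendations` are made up for this sketch.

```python
# Toy ratings as (user, movie, rating). All values are invented for illustration.
ratings = [
    ("u1", "Toy Story", 5), ("u1", "Aladdin", 4), ("u1", "Heat", 2),
    ("u2", "Toy Story", 4), ("u2", "Shrek", 5),
    ("u3", "Toy Story", 2), ("u3", "Casino", 5),
]

def user_based_recommendations(ratings, title, min_rating=4, limit=10):
    # Users who rated the seed movie at or above the threshold (r.rating >= 4)
    fans = {u for u, m, r in ratings if m == title and r >= min_rating}
    # Other movies those users also rated highly (recr.rating >= 4),
    # de-duplicated like RETURN DISTINCT, kept in first-seen order
    recs = []
    for u, m, r in ratings:
        if u in fans and r >= min_rating and m != title and m not in recs:
            recs.append(m)
    return recs[:limit]

recommended = user_based_recommendations(ratings, "Toy Story")
print(recommended)  # ['Aladdin', 'Shrek']
```

User u3 rated Toy Story only 2, so Casino is not recommended even though u3 rated it highly.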

Recommendation system example: item-based

Recommend movies with overlapping genres

MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
WHERE m.title = 'Inception'
WITH rec, collect(g.name) AS genres, count(*) AS commonGenres
RETURN rec.title, genres, commonGenres
ORDER BY commonGenres DESC LIMIT 10;

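For each candidate movie, collect the genres it shares with Inception into a list (genres)
Count the shared genres and store the result in commonGenres
ORDER BY the number of shared genres, so movies with more overlap are recommended first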
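
The three steps above can be sketched in plain Python with sets. The genre assignments below are invented sample data and `genre_based_recommendations` is a hypothetical helper name, so treat this as a rough illustration of the query's logic only.

```python
# Invented sample data: movie title -> set of genres
movie_genres = {
    "Inception": {"Action", "Sci-Fi", "Thriller"},
    "The Matrix": {"Action", "Sci-Fi"},
    "Interstellar": {"Sci-Fi", "Drama"},
    "Notting Hill": {"Romance", "Comedy"},
}

def genre_based_recommendations(movie_genres, title, limit=10):
    seed = movie_genres[title]
    rows = []
    for other, genres in movie_genres.items():
        if other == title:
            continue
        shared = sorted(seed & genres)  # plays the role of collect(g.name)
        if shared:                      # MATCH only keeps movies sharing a genre
            rows.append((other, shared, len(shared)))  # len(shared) ~ commonGenres
    # ORDER BY commonGenres DESC LIMIT n
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]

rows = genre_based_recommendations(movie_genres, "Inception")
for row in rows:
    print(row)
```

Notting Hill shares no genre with Inception, so it never appears in the result, just as it would not match the Cypher pattern.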

Recommend movies with overlapping genres, actors, and directors (scored)

MATCH (m:Movie) WHERE m.title = "Wizard of Oz, The"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)

WITH m, rec, count(*) AS gs // number of shared genres

OPTIONAL MATCH (m)<-[:ACTED_IN]-(a)-[:ACTED_IN]->(rec)
WITH m, rec, gs, count(a) AS as // number of shared actors

OPTIONAL MATCH (m)<-[:DIRECTED]-(d)-[:DIRECTED]->(rec)
WITH m, rec, gs, as, count(d) AS ds // number of shared directors (0 or 1)

RETURN rec.title AS recommendation,
	(5*gs) + (3*as) + (4*ds) AS score
ORDER BY score DESC LIMIT 20;
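
To make the weighting concrete, here is a small Python sketch of the same score, (5 × shared genres) + (3 × shared actors) + (4 × shared directors). The movies, people, and the helper name `rank_similar` are placeholders invented for this example.

```python
def similarity_score(seed, cand, w_genre=5, w_actor=3, w_director=4):
    gs = len(seed["genres"] & cand["genres"])        # shared genres
    as_ = len(seed["actors"] & cand["actors"])       # shared actors
    ds = len(seed["directors"] & cand["directors"])  # shared directors
    return gs, w_genre * gs + w_actor * as_ + w_director * ds

def rank_similar(seed, candidates, limit=20):
    scored = []
    for title, cand in candidates.items():
        gs, score = similarity_score(seed, cand)
        if gs > 0:  # the first MATCH is not OPTIONAL, so a shared genre is required
            scored.append((title, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

# Placeholder data, invented for illustration
seed = {"genres": {"Adventure", "Children", "Fantasy"},
        "actors": {"Actor A", "Actor B"}, "directors": {"Director D"}}
candidates = {
    "Movie X": {"genres": {"Adventure", "Fantasy"},
                "actors": {"Actor A"}, "directors": {"Director D"}},
    "Movie Y": {"genres": {"Fantasy"}, "actors": set(), "directors": set()},
    "Movie Z": {"genres": {"Horror"},
                "actors": {"Actor A", "Actor B"}, "directors": set()},
}

ranking = rank_similar(seed, candidates)
print(ranking)  # [('Movie X', 17), ('Movie Y', 5)]
```

Movie Z shares two actors but no genre, so it is dropped before scoring, which matches the behaviour of the non-optional genre MATCH in the query.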

Implementing a movie recommendation system

1. Setting up the environment and connecting to the GraphDB

!pip install neo4j-graphrag neo4j openai

import json
import os

# The with block closes the file automatically, so no explicit close() is needed
with open('./drive/MyDrive/실습/251010_GraphRAG/openai_api_key.json') as j:
    json_file = json.load(j)

os.environ["OPENAI_API_KEY"] = json_file['OPENAI_API_KEY']

For the GraphDB, create a database from the recommendations dataset in the Neo4j Sandbox, then open "Connect via drivers" to get the driver connection details and auth settings.
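
One possible way to keep those connection details out of the notebook is to read them from environment variables. The sketch below is an assumption on my part, not something from the video: the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` and the helper `neo4j_connection_settings` are hypothetical.

```python
import os

def neo4j_connection_settings(env=os.environ):
    # Hypothetical variable names; set them to the values shown under
    # "Connect via drivers" in the Sandbox.
    uri = env.get("NEO4J_URI", "neo4j://localhost:7687")
    user = env.get("NEO4J_USERNAME", "neo4j")
    password = env["NEO4J_PASSWORD"]  # raises KeyError if the password is missing
    return uri, (user, password)

# Example with an explicit mapping instead of the real environment
uri, auth = neo4j_connection_settings(
    {"NEO4J_URI": "neo4j://example.invalid:7687", "NEO4J_PASSWORD": "secret"}
)
print(uri, auth[0])
```

The returned pair can then be passed on as `GraphDatabase.driver(uri, auth=auth)`.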

https://sandbox.neo4j.com/

2. Implementing GraphRAG

RAG based on the graph query results produced by the Text2Cypher Retriever

from neo4j_graphrag.retrievers import Text2CypherRetriever
from neo4j_graphrag.llm import OpenAILLM

# LLM used both to generate the Cypher query from the query text and to generate the answer after retrieval
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature" : 0})

Information needed for automatic Cypher generation

Neo4j DB Schema
Input / Output (Query) examples

Node properties:
Person {name: STRING, born: INTEGER}
Movie {tagline: STRING, title: STRING, released: INTEGER}
Relationship properties:
ACTED_IN {roles: LIST}
REVIEWED {summary: STRING, rating: INTEGER}
The relationships:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:PRODUCED]->(:Movie)
(:Person)-[:WROTE]->(:Movie)
(:Person)-[:FOLLOWS]->(:Person)
(:Person)-[:REVIEWED]->(:Movie)

Which nodes the graph contains
Each node's properties and the data type of each property
Which relationships exist between nodes
Which properties each relationship has and their data types

Schema generation code

from neo4j import GraphDatabase, basic_auth  # basic_auth is used in get_schema()
from neo4j.time import Date

def get_node_datatype(value):
    """
        Return the data type name for a sampled property value
    """
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    elif isinstance(value, str):
        return "STRING"
    elif isinstance(value, int):
        return "INTEGER"
    elif isinstance(value, float):
        return "FLOAT"
    elif isinstance(value, list):
        return f"LIST[{get_node_datatype(value[0])}]" if value else "LIST"
    elif isinstance(value, Date):
        return "DATE"
    else:
        return "UNKNOWN"

def get_schema(uri, user, password):
    """
        Connect to the Graph DB, extract node and relationship properties, and return a schema dictionary
    """
    driver = GraphDatabase.driver(
        uri,
        auth=basic_auth(user, password))

    with driver.session() as session:
        # Extract node properties and their types
        node_query = """
        MATCH (n)
        WITH DISTINCT labels(n) AS node_labels, keys(n) AS property_keys, n
        UNWIND node_labels AS label
        UNWIND property_keys AS key
        RETURN label, key, n[key] AS sample_value
        """
        nodes = session.run(node_query)

        # Extract relationship properties and their types
        rel_query = """
        MATCH ()-[r]->()
        WITH DISTINCT type(r) AS rel_type, keys(r) AS property_keys, r
        UNWIND property_keys AS key
        RETURN rel_type, key, r[key] AS sample_value
        """
        relationships = session.run(rel_query)

        # Extract relationship types and directions
        rel_direction_query = """
        MATCH (a)-[r]->(b)
        RETURN DISTINCT labels(a) AS start_label, type(r) AS rel_type, labels(b) AS end_label
        ORDER BY start_label, rel_type, end_label
        """
        rel_directions = session.run(rel_direction_query)

        # Build the schema dictionary
        schema = {"nodes": {}, "relationships": {}, "relations": []}

        for record in nodes:
            label = record["label"]
            key = record["key"]
            sample_value = record["sample_value"] # sample value used to infer the data type
            inferred_type = get_node_datatype(sample_value)
            if label not in schema["nodes"]:
                schema["nodes"][label] = {}
            schema["nodes"][label][key] = inferred_type

        for record in relationships:
            rel_type = record["rel_type"]
            key = record["key"]
            sample_value = record["sample_value"] # sample value used to infer the data type
            inferred_type = get_node_datatype(sample_value)
            if rel_type not in schema["relationships"]:
                schema["relationships"][rel_type] = {}
            schema["relationships"][rel_type][key] = inferred_type

        for record in rel_directions:
            start_label = record["start_label"][0]
            rel_type = record["rel_type"]
            end_label = record["end_label"][0]
            schema["relations"].append(f"(:{start_label})-[:{rel_type}]->(:{end_label})")

        return schema

def format_schema(schema):
    """
        Format the schema dictionary into the text layout that is passed to the LLM
    """
    result = []

    # Node properties
    result.append("Node properties:")
    for label, properties in schema["nodes"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        result.append(f"{label} {{{props}}}")

    # Relationship properties
    result.append("Relationship properties:")
    for rel_type, properties in schema["relationships"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        result.append(f"{rel_type} {{{props}}}")

    # Relationship patterns (start label, type, end label)
    result.append("The relationships:")
    for relation in schema["relations"]:
        result.append(relation)

    return "\n".join(result)

# Provide the Neo4j DB schema
schema = get_schema("neo4j://44.204.122.61:7687", "neo4j", "catalog-associates-blows")
neo4j_schema = format_schema(schema)
print(neo4j_schema)
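
To check what the formatted text looks like without a live database, the formatting step can be tried on a hand-written dictionary. The sketch below uses `render_schema`, a hypothetical compact stand-in that follows the same layout as `format_schema`, and a small schema dict I wrote by hand from the schema example earlier in this post.

```python
def render_schema(schema):
    # Same three sections as format_schema: node props, relationship props, patterns
    lines = ["Node properties:"]
    for label, properties in schema["nodes"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        lines.append(f"{label} {{{props}}}")
    lines.append("Relationship properties:")
    for rel_type, properties in schema["relationships"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        lines.append(f"{rel_type} {{{props}}}")
    lines.append("The relationships:")
    lines.extend(schema["relations"])
    return "\n".join(lines)

# Hand-written sample in the shape that get_schema() returns
sample_schema = {
    "nodes": {"Movie": {"title": "STRING", "released": "INTEGER"},
              "Person": {"name": "STRING", "born": "INTEGER"}},
    "relationships": {"ACTED_IN": {"roles": "LIST"}},
    "relations": ["(:Person)-[:ACTED_IN]->(:Movie)"],
}

schema_text = render_schema(sample_schema)
print(schema_text)
```

Running it on a small sample like this is a quick way to confirm the layout before sending the real schema to the LLM.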

Writing a retriever example

User input: Which actors starred in the Toy Story?
Auto-generated Cypher example: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) WHERE m.title = "Toy Story" RETURN a.name

Writing LLM query-generation examples

# Provide LLM INPUT / QUERY examples
examples = [
    "USER INPUT: 'Which actors appear in Toy Story?' QUERY: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Toy Story' RETURN a.name",
    "USER INPUT: 'What is the average rating of Toy Story?' QUERY: MATCH (u:User)-[r:RATED]->(m:Movie) WHERE m.title = 'Toy Story' RETURN AVG(r.rating)",

    """USER INPUT: 'I like the movie Toy Story. What other movies did people who enjoyed Toy Story also enjoy?'
    QUERY: MATCH (m:Movie)<-[r:RATED]-(u:User)-[recr:RATED]->(userBasedRec:Movie)
    WHERE m.title = 'Toy Story' AND r.rating >= 4 AND recr.rating >= 4
    WITH userBasedRec, COUNT(recr) AS recCount, AVG(recr.rating) AS avgRating
    ORDER BY avgRating DESC, recCount DESC
    RETURN DISTINCT userBasedRec.title, avgRating, recCount
    LIMIT 10
    """,

    """USER INPUT: 'I like movies such as 'Wizard of Oz, The'. Can you recommend movies similar to it?'
    QUERY: MATCH (m:Movie) WHERE m.title = 'Wizard of Oz, The'
    MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
    WITH m, rec, count(*) AS gs

    OPTIONAL MATCH (m)<-[:ACTED_IN]-(a)-[:ACTED_IN]->(rec)
    WITH m, rec, gs, count(a) AS as

    OPTIONAL MATCH (m)<-[:DIRECTED]-(d)-[:DIRECTED]->(rec)
    WITH m, rec, gs, as, count(d) AS ds

    RETURN rec.title AS recommendation,
            rec.poster AS rec_poster,
            gs AS genre_similarity,
            as AS actor_similarity,
            ds AS director_similarity,
           (5*gs)+(3*as)+(4*ds) AS score
    ORDER BY score DESC LIMIT 10
    """,

    """USER INPUT: 'Please recommend movies with a genre or mood similar to the movie 'Inception'.'
    QUERY: MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
    WHERE m.title = 'Inception' WITH rec, collect(g.name) AS genres, count(*) AS commonGenres
    RETURN rec.title, genres, commonGenres ORDER BY commonGenres DESC LIMIT 10;"""
]
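
To give a rough picture of why the schema and the examples matter, here is a sketch of how the two could be combined with the user's question into a single prompt. The template text and the function `build_text2cypher_prompt` are my own illustration and are not the actual prompt used inside neo4j_graphrag.

```python
def build_text2cypher_prompt(schema_text, examples, question):
    # Hypothetical template: schema first, then few-shot examples, then the question
    example_block = "\n".join(examples)
    return (
        "Task: Generate a Cypher statement for querying a Neo4j graph database.\n\n"
        f"Schema:\n{schema_text}\n\n"
        f"Examples:\n{example_block}\n\n"
        f"Input:\n{question}\n\n"
        "Cypher query:"
    )

prompt = build_text2cypher_prompt(
    schema_text="(:Actor)-[:ACTED_IN]->(:Movie)",
    examples=[
        "USER INPUT: 'Which actors appear in Toy Story?' "
        "QUERY: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) "
        "WHERE m.title = 'Toy Story' RETURN a.name"
    ],
    question="Which movies did Tom Hanks appear in?",
)
print(prompt)
```

The closer the examples are to the kinds of questions users will ask, the more likely the generated Cypher follows the same query shape.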

Creating the Text2CypherRetriever

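# A module-level driver is needed here; get_schema() only creates one locally.
# The connection details are the same Sandbox values passed to get_schema().
driver = GraphDatabase.driver(
    "neo4j://44.204.122.61:7687",
    auth=("neo4j", "catalog-associates-blows"))

# Text2CypherRetriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,  # type: ignore
    neo4j_schema=neo4j_schema,
    examples=examples,
)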

# Generate a Cypher query with the LLM, send it to the Neo4j DB, and return the result => this result is what the RAG step uses
query_text = "Which movies did Tom Hanks appear in?"
search_result = retriever.search(query_text=query_text)

Creating the retriever-based RAG

from neo4j_graphrag.generation import GraphRAG
# Initialize the RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

retriever: the Text2CypherRetriever defined above
llm: the 'gpt-4o' model chosen earlier

Asking questions

# Ask a question
query_text = "Please recommend movies in a genre similar to Titanic."

response = rag.search(query_text=query_text, return_context = True)
print("==== [Cypher generated automatically by Text2Cypher] ====")
print(response.retriever_result.metadata['cypher'])
print("\n==== [Final answer generated from the Cypher result] ====")
print(response.answer)

# Ask a question
query_text = "What other movies do people who like Toy Story and The Godfather also like?"

response = rag.search(query_text=query_text, return_context = True)
print("==== [Cypher generated automatically by Text2Cypher] ====")
print(response.retriever_result.metadata['cypher'])
print("\n==== [Final answer generated from the Cypher result] ====")
print(response.answer)

When I used the neo4j_genai package from the video, I got a RagInitializationError (or possibly a ValidationError).

Use the neo4j_graphrag package instead (as of 2025-10-30).
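
If it is unclear which of the two packages is present in a given environment, a quick check along these lines may help. This is just a convenience sketch using the standard library; `first_available` is a hypothetical helper name.

```python
import importlib.util

def first_available(candidates):
    # Return the first importable top-level package name, or None if none is installed
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None

# Prefer neo4j_graphrag, and only report neo4j_genai if the newer package is absent
package = first_available(["neo4j_graphrag", "neo4j_genai"])
print(package)
```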
