How to use LLMs as a learning tool for Data Engineers
There are varying opinions on LLMs. I take it as a disrupting technology – a tool that can be used, misused or misapplied. Disruptive technologies tend to be over-hyped in the short-term and under-appreciated over longer horizons (Read How Innovation Works if you want to know more). LLMs in pedagogy are most exciting to me. I attest to its efficacy.
One of the demands of Data Engineering is to learn new tech quickly. I want to show you how I use LLMs towards this end.
A Framework for Skill Acquisition
Imagine you’re a Data Engineer, tasked with deploying a Memgraph graph database for a new client. The database will serve as a model for fraud detection, for an insurance company.
The theme of self-evaluation (mentioned in my previous blogs) continues. The Dreyfus model is one methodology to use:
I’d be a Novice if I’ve only heard about graph databases, or played around with them for exploration; An Advanced Beginner has used graph databases as part of his/her professional life; a Competent person can use graph databases for the correct purpose and efficiently etc.
Here are some key observations:
- There is no shame in being a novice in any field. I am a novice in many areas and view it as an exciting challenge to learn something new
- There’s a difference between being an expert and thinking you’re one. I have to repeat the importance of self-honesty in your evaluation
- Reaching a level of Competent is sufficient for most endeavours
- Try to objectively evaluate your level of expertise in a field if you find self-evaluation challenging.
Enter LLMs
That is where LLMs come into play. Here is a PDF I generated with the prompts in this link:
Dreyfus_Model_Graph_Database_Self_AssessmentYou can download the file here:
The ability to generate a bespoke self-evaluation framework – for what I’m trying to learn – already demonstrates the power of this approach.
Interactive Learning
After completing the self-evaluation framework, I’ve determined I’m at a Novice level. I ask the LLM to base its teaching on my current level. Here is the prompt I used:
I am at a Novice level. Teach me the basics of Graph DBs, using Memgraph. Use examples and links to references, where appropriate.
Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company
Key Concepts
- Nodes: Represent entities such as customers, policies, claims, and agents.
- Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
- Properties: Key-value pairs associated with nodes or edges.
Getting Started with Memgraph
Installation
To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.
Basic Operations
- Starting Memgraph After installation, start Memgraph by running:
mgconsole
- Creating Nodes and Relationships
- Create Nodes
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
- Create Relationships
cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
- Querying Data
- Retrieve Customers and Policies
MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
- Retrieve Claims and Approval Agents
cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;
- Updating Data
- Update a Node
MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
- Update a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;
- Deleting Data
- Delete a Node
MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
- Delete a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;
Examples
Example 1: Detecting Suspicious Claims
- Create Nodes and Relationships
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
(c2:Customer {name: 'Bob', customerID: 'C002'}),
(p1:Policy {policyID: 'P001', type: 'Auto'}),
(p2:Policy {policyID: 'P002', type: 'Home'}),
(cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
(cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
(a1:Agent {name: 'Charlie', agentID: 'A001'}),
(a2:Agent {name: 'Dana', agentID: 'A002'});
MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
(p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
(cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
(a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
CREATE (c1)-[:OWNS]->(p1),
(c2)-[:OWNS]->(p2),
(c1)-[:SUBMITTED]->(cl1),
(c2)-[:SUBMITTED]->(cl2),
(a1)-[:APPROVED]->(cl1),
(a2)-[:APPROVED]->(cl2),
(c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
- Query to Identify Suspicious Claims
MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
(c2:Customer)-[:CONNECTED_TO]->(c1)
RETURN c1.name, cl.claimID, cl.amount, c2.name;
Example 2: Identifying Potential Fraud Rings
- Create Nodes and Relationships
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
(c2:Customer {name: 'Bob', customerID: 'C002'}),
(c3:Customer {name: 'Eve', customerID: 'C003'}),
(p1:Policy {policyID: 'P001', type: 'Auto'}),
(p2:Policy {policyID: 'P002', type: 'Home'}),
(p3:Policy {policyID: 'P003', type: 'Health'}),
(cl1:Claim {claimID: 'CL001', amount: 5000}),
(cl2:Claim {claimID: 'CL002', amount: 15000}),
(cl3:Claim {claimID: 'CL003', amount: 25000}),
(a1:Agent {name: 'Charlie', agentID: 'A001'}),
(a2:Agent {name: 'Dana', agentID: 'A002'});
MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
(c3:Customer {name: 'Eve'}),
(p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
(p3:Policy {policyID: 'P003'}),
(cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
(a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
CREATE (c1)-[:OWNS]->(p1),
(c2)-[:OWNS]->(p2),
(c3)-[:OWNS]->(p3),
(c1)-[:SUBMITTED]->(cl1),
(c2)-[:SUBMITTED]->(cl2),
(c3)-[:SUBMITTED]->(cl3),
(a1)-[:APPROVED]->(cl1),
(a2)-[:APPROVED]->(cl2),
(a1)-[:APPROVED]->(cl3),
(c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
(c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);
- Query to Identify Potential Fraud Rings
MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
RETURN c1.name, c2.name, c3.name, a.name;
References
By understanding and using these examples, you can start working with Memgraph to detect fraud in an insurance company context. As you gain more experience, explore advanced topics such as graph algorithms for anomaly detection, real-time analytics
The response from Chat-GPT is already helpful but I can make it even more powerful by incorporating my frame of reference. Remember we need to use it in an insurance company for fraud detection:
Change the examples to be based on an Insurance company that wants to use Memgraph for fraud detection
Basics of Graph Databases Using Memgraph for Fraud Detection in an Insurance Company
Key Concepts
- Nodes: Represent entities such as customers, policies, claims, and agents.
- Edges: Represent relationships such as ownership, submission, approval, and connections between entities.
- Properties: Key-value pairs associated with nodes or edges.
Getting Started with Memgraph
Installation
To install Memgraph, you can follow the instructions on the official Memgraph Installation Guide.
Basic Operations
- Starting Memgraph After installation, start Memgraph by running:
mgconsole
- Creating Nodes and Relationships
- Create Nodes
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'});
- Create Relationships
cypher MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
- Querying Data
- Retrieve Customers and Policies
MATCH (c:Customer)-[:OWNS]->(p:Policy) RETURN c.name, p.policyID, p.type;
- Retrieve Claims and Approval Agents
cypher MATCH (cl:Claim)<-[:SUBMITTED]-(c:Customer), (a:Agent)-[:APPROVED]->(cl) RETURN c.name, cl.claimID, cl.amount, a.name;
- Updating Data
- Update a Node
MATCH (c:Customer {name: 'Alice'}) SET c.email = 'alice@example.com';
- Update a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) SET r.since = 2022;
- Deleting Data
- Delete a Node
MATCH (cl:Claim {claimID: 'CL002'}) DELETE cl;
- Delete a Relationship
cypher MATCH (c1:Customer {name: 'Alice'})-[r:CONNECTED_TO]->(c2:Customer {name: 'Bob'}) DELETE r;
Examples
Example 1: Detecting Suspicious Claims
- Create Nodes and Relationships
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
(c2:Customer {name: 'Bob', customerID: 'C002'}),
(p1:Policy {policyID: 'P001', type: 'Auto'}),
(p2:Policy {policyID: 'P002', type: 'Home'}),
(cl1:Claim {claimID: 'CL001', amount: 5000, status: 'Pending'}),
(cl2:Claim {claimID: 'CL002', amount: 15000, status: 'Approved'}),
(a1:Agent {name: 'Charlie', agentID: 'A001'}),
(a2:Agent {name: 'Dana', agentID: 'A002'});
MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
(p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
(cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}),
(a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
CREATE (c1)-[:OWNS]->(p1),
(c2)-[:OWNS]->(p2),
(c1)-[:SUBMITTED]->(cl1),
(c2)-[:SUBMITTED]->(cl2),
(a1)-[:APPROVED]->(cl1),
(a2)-[:APPROVED]->(cl2),
(c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
- Query to Identify Suspicious Claims
MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
(c2:Customer)-[:CONNECTED_TO]->(c1)
RETURN c1.name, cl.claimID, cl.amount, c2.name;
Example 2: Identifying Potential Fraud Rings
- Create Nodes and Relationships
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}),
(c2:Customer {name: 'Bob', customerID: 'C002'}),
(c3:Customer {name: 'Eve', customerID: 'C003'}),
(p1:Policy {policyID: 'P001', type: 'Auto'}),
(p2:Policy {policyID: 'P002', type: 'Home'}),
(p3:Policy {policyID: 'P003', type: 'Health'}),
(cl1:Claim {claimID: 'CL001', amount: 5000}),
(cl2:Claim {claimID: 'CL002', amount: 15000}),
(cl3:Claim {claimID: 'CL003', amount: 25000}),
(a1:Agent {name: 'Charlie', agentID: 'A001'}),
(a2:Agent {name: 'Dana', agentID: 'A002'});
MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}),
(c3:Customer {name: 'Eve'}),
(p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}),
(p3:Policy {policyID: 'P003'}),
(cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (cl3:Claim {claimID: 'CL003'}),
(a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'})
CREATE (c1)-[:OWNS]->(p1),
(c2)-[:OWNS]->(p2),
(c3)-[:OWNS]->(p3),
(c1)-[:SUBMITTED]->(cl1),
(c2)-[:SUBMITTED]->(cl2),
(c3)-[:SUBMITTED]->(cl3),
(a1)-[:APPROVED]->(cl1),
(a2)-[:APPROVED]->(cl2),
(a1)-[:APPROVED]->(cl3),
(c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2),
(c2)-[:CONNECTED_TO {type: 'Friend'}]->(c3);
- Query to Identify Potential Fraud Rings
MATCH (c1:Customer)-[:CONNECTED_TO]->(c2:Customer)-[:CONNECTED_TO]->(c3:Customer)
WHERE (c1)-[:SUBMITTED]->(:Claim)<-[:APPROVED]-(a:Agent)-[:APPROVED]->(:Claim)<-[:SUBMITTED]-(c3)
RETURN c1.name, c2.name, c3.name, a.name;
References
By understanding and using these examples, you can start working with Memgraph to detect fraud in an insurance company context. As you gain more experience, explore advanced topics such as graph algorithms for anomaly detection, real-time analytics
The code examples include the creation of a Customer node. If you don’t know what a Node is, look at the conversation and ask more questions to the LLM.
MATCH (c1:Customer)-[:SUBMITTED]->(cl:Claim {status: 'Pending'}),
(c2:Customer)-[:CONNECTED_TO]->(c1)
RETURN c1.name, cl.claimID, cl.amount, c2.name;
Graph databases regard objects and the relationships between objects as equally important. If you read the line of code above, can you understand what it’s asking? Have a conversation with the LLM if you’re not sure.
As a point of departure, be sure to get your hands dirty:
Is there a playground I can use?
Yes, Memgraph provides a Memgraph Playground where you can practice and experiment with graph databases using Cypher queries. This online platform allows you to run queries, visualize data, and explore graph structures without the need to install anything locally.
Here are the steps to get started with Memgraph Playground:
- Access the Playground
- Go to Memgraph Playground.
- Explore the Interface
- Familiarize yourself with the user interface. You’ll see a query editor on the left and a results/visualization panel on the right.
- Run Basic Queries
- You can start by running some basic Cypher queries to create nodes, relationships, and retrieve data. For example:
CREATE (c1:Customer {name: 'Alice', customerID: 'C001'}), (c2:Customer {name: 'Bob', customerID: 'C002'}), (p1:Policy {policyID: 'P001', type: 'Auto'}), (p2:Policy {policyID: 'P002', type: 'Home'}), (cl1:Claim {claimID: 'CL001', amount: 5000}), (cl2:Claim {claimID: 'CL002', amount: 15000}), (a1:Agent {name: 'Charlie', agentID: 'A001'}), (a2:Agent {name: 'Dana', agentID: 'A002'}); MATCH (c1:Customer {name: 'Alice'}), (c2:Customer {name: 'Bob'}), (p1:Policy {policyID: 'P001'}), (p2:Policy {policyID: 'P002'}), (cl1:Claim {claimID: 'CL001'}), (cl2:Claim {claimID: 'CL002'}), (a1:Agent {name: 'Charlie'}), (a2:Agent {name: 'Dana'}) CREATE (c1)-[:OWNS]->(p1), (c2)-[:OWNS]->(p2), (c1)-[:SUBMITTED]->(cl1), (c2)-[:SUBMITTED]->(cl2), (a1)-[:APPROVED]->(cl1), (a2)-[:APPROVED]->(cl2), (c1)-[:CONNECTED_TO {type: 'Friend'}]->(c2);
- Visualize the Graph
- After running the queries, you can visualize the graph structure in the results panel. This will help you see the connections and relationships between different entities.
- Experiment and Learn
- Use the playground to experiment with different queries, explore the dataset, and learn how to manipulate and query graph data using Cypher.
Resources
Feel free to explore and practice your skills in the Memgraph Playground to get hands-on experience with graph databases.
Summary
The landscape of pedagogy is changing. LLMs are empowering people everywhere to learn quicker, better and more productively. You can choose to embrace this change or be left behind. I choose the former. I hope you do too.