ChatGPT has taken the world by storm since its launch on November 30, 2022. The name stands for “Chat Generative Pre-trained Transformer”. It is an AI chatbot built on a language model that is pre-trained on hundreds of billions of words. Once trained, it can generate responses in real time based on that pre-training data. As users interact with the tool, the language model can be further trained and improved to become more accurate.
I have read articles about how ChatGPT helped people in their jobs. So today I decided to give it a try myself. I have been working on adding features to a Data Quality application at my job. One of the features is Duplicates Detection. Duplication is one of the most common data quality concerns. Duplicated data can create all sorts of bugs and it can affect important decision-making processes in any business or organization.
There are two comparison methods implemented in Duplicates Detection: exact match and fuzzy match. So my first ChatGPT question was an easy one.
Question #1: “What is the difference between exact match and fuzzy match?”
Not bad. ChatGPT gave an answer in layman’s terms, with no technical buzzwords. The response was typed out letter by letter, as if a friendly and intelligent robot were responding to me. But unlike search engines, there were no links to the websites where the answers came from.
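To make the distinction concrete, here is a minimal Python sketch of the two comparison styles. It is only an illustration, not the implementation used in the Data Quality application: the function names, the use of Python's standard difflib module, and the 0.8 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Exact match: the two values must be identical, character for character."""
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy match: the two values only need to be similar enough.

    Similarity is difflib's ratio between 0.0 and 1.0, compared
    case-insensitively. The 0.8 threshold is an arbitrary example value.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # Hypothetical sample values, not data from the project.
    print(exact_match("Jon Smith", "John Smith"))  # False: the strings differ
    print(fuzzy_match("Jon Smith", "John Smith"))  # True: close enough
    print(fuzzy_match("Jon Smith", "Mary Jones"))  # False: too different
```

An exact match catches only identical records, while a fuzzy match also catches near-duplicates such as typos, at the cost of having to pick a threshold.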
The next question was more specific: how to do fuzzy matching in the Snowflake data warehouse.
Question #2: “How do I implement fuzzy match in Snowflake”
Good try, but REGEX and SOUNDEX were not exactly what I was looking for. EDITDISTANCE is the function I used in the project, since it suits the software requirements. The answer also mentioned a DIFFERENCE function, and this is when I started to doubt ChatGPT: there is no DIFFERENCE function in Snowflake. So I proceeded with my next question.
Question #3: “Can you show me the usage of DIFFERENCE in Snowflake official documentation?” I wanted to check whether ChatGPT could send me a link showing how to use the DIFFERENCE function, because I could not find it anywhere on the Snowflake website. I was looking for an answer that starts with https://docs.snowflake.com.
Again, ChatGPT failed me. This time, I gave ChatGPT a hint by putting the answer to my second question into the next one.
Question #4: “Can I use EDITDISTANCE for fuzzy matching in Snowflake?”
Great answer! It would have been even better if the references had been cited.
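For context, Snowflake documents EDITDISTANCE as computing the Levenshtein distance between two strings, that is, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below shows that calculation in plain Python. The function name edit_distance and the sample values are made up for illustration, and this is not Snowflake's implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete char_a
                    curr[j - 1] + 1,     # insert char_b
                    prev[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("Jon Smith", "John Smith"))  # 1 (one inserted "h")
    print(edit_distance("kitten", "sitting"))        # 3
```

A duplicates check built on this idea would flag two values when the distance is at or below some chosen limit. Which limit is appropriate depends on the data.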
Lastly, I asked whether ChatGPT could compare a function that exists in Snowflake with one that does not.
Question #5: “What is the difference between EDITDISTANCE and DIFFERENCE function in Snowflake?”
Oops! An error. Maybe ChatGPT got tired of me. So I asked the same question again the next day.
I did more searching and found out that DIFFERENCE is actually a SQL Server function, not a Snowflake one. ChatGPT still needs more training to improve its responses.
Overall, I think ChatGPT is very useful. For a software engineer whose first tool for answering questions has always been a search engine, having an AI chatbot is revolutionary.
So go ahead, give it a try. Check out how ChatGPT can also help you in your job.
References:
Is ChatGPT Really a Game-Changer — or a Race to the Bottom?
ChatGPT explained: everything you need to know about the AI chatbot