pepeyoon/pepeyoon.github.io

Data Scientist

Programming Languages: Python, SQL, R

Technical Expertise: AI Integration, AI APIs, PyTorch, TensorFlow, scikit-learn, pandas, LangChain

Data Visualization: Tableau, Matplotlib, ggplot2

Education

  • M.S. Business Analytics | Carnegie Mellon University (May 2024)
  • B.A. Political Science | University of California, Berkeley

Work Experience

Data Scientist @ Freelance (October 2023 - Present)

  • Used statistical methods and the pandas library to manage and analyze large datasets (IPEDS, PUMS, IRS).
  • Streamlined the integration of GPT-4 Turbo and Llama 2 with large client-provided text datasets, enhancing analysis and data interpretation.
  • Applied Natural Language Processing (NLP) techniques to analyze and interpret textual data, yielding insights into user behavior.

Operations Manager @ Fancy Dream USA INC (March 2022 - May 2024)

  • Designed a comprehensive SQL customer database, integrated with order and sales databases.
  • Used Tableau to turn complex data into sales and analytical reports that informed strategic planning and business growth.
  • Led a team, using data-driven strategies to inform decisions and improve operational efficiency.
  • Established partnerships with major corporations, fostering valuable relationships.

Projects

Built a RAG agent that ingests documents in one of two ways: scraping web pages with the Firecrawl API and splitting them into segments, or extracting content from PDF files with LlamaIndex. A local Llama 3 model analyzes the extracted data, and the Tavily API searches the web for up-to-date information when the retrieved documents are not pertinent to the query.
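The web-search fallback described above can be sketched as a small routing step. This is a minimal illustration, not the project's code: `relevance`, `gather_context`, the keyword-overlap score, and the `0.5` threshold are all hypothetical stand-ins, and `web_search` is an injected callable where a real Tavily client call would go.

```python
from typing import Callable, List, Tuple


def relevance(query: str, doc: str) -> float:
    """Fraction of query words that also appear in the document.

    A crude stand-in for a real relevance grader (for example, an LLM judge).
    """
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def gather_context(
    query: str,
    retrieved: List[str],
    web_search: Callable[[str], List[str]],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> Tuple[str, List[str]]:
    """Return retrieved documents if any are relevant, else fall back to the web."""
    relevant = [doc for doc in retrieved if relevance(query, doc) >= threshold]
    if relevant:
        return "local", relevant
    return "web", web_search(query)


if __name__ == "__main__":
    # Hypothetical search function standing in for a web search API.
    def fake_search(q: str) -> List[str]:
        return [f"web result for: {q}"]

    docs = ["The llama3 context window is 8k tokens"]
    print(gather_context("llama3 context window", docs, fake_search))
    print(gather_context("latest tavily pricing", docs, fake_search))
```

Passing the search function in as a parameter keeps the routing logic testable without network access.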

Developed a project analyzing app user behavior from a large dataset. Cleaned the data with NLP techniques and used GPT-4 Turbo to improve AI responses; the findings supported a user-centric approach to app improvements. Project progress was shared regularly on GitHub. Key tools: Python, NLP, the GPT-4 Turbo API, and pandas.

Developed a solution to distinguish authentic from fake product reviews. It used Multinomial Naive Bayes, Random Forest, and Gradient Boosting classifiers with TF-IDF and Word2Vec vectorization. Performance metrics were used to assess model effectiveness, and an API provided a user-friendly interface.
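To illustrate one of the combinations named above, here is a minimal standard-library sketch of Multinomial Naive Bayes over TF-IDF weights. It is not the project's implementation, which presumably relied on a machine learning library; the function names, the whitespace tokenizer, the smoothing value, and the four toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Raw term count times IDF; terms unseen during training are dropped."""
    counts = Counter(tokens)
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def train_nb(texts, labels, alpha=1.0):
    """Fit class priors and per-class summed TF-IDF weights."""
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    weights = defaultdict(Counter)  # label -> term -> summed TF-IDF weight
    for tokens, label in zip(docs, labels):
        weights[label].update(tfidf(tokens, idf))
    model = {"idf": idf, "alpha": alpha, "classes": {}}
    for label, n_docs in Counter(labels).items():
        model["classes"][label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "weights": weights[label],
            # Laplace-style smoothing over the whole vocabulary.
            "denom": sum(weights[label].values()) + alpha * len(idf),
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    feats = tfidf(tokenize(text), model["idf"])
    scores = {}
    for label, c in model["classes"].items():
        score = c["log_prior"]
        for term, w in feats.items():
            score += w * math.log((c["weights"][term] + model["alpha"]) / c["denom"])
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    texts = [
        "best product ever buy now",
        "amazing amazing buy now best",
        "battery lasted two weeks",
        "screen cracked after two months",
    ]
    labels = ["fake", "fake", "real", "real"]
    model = train_nb(texts, labels)
    print(predict(model, "buy now best deal"))
    print(predict(model, "battery lasted two months"))
```

On real data, the same pipeline would be evaluated on a held-out set with metrics such as precision and recall rather than on hand-picked sentences.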

Explored a dataset of over a million rows. The work involved data cleaning, in-depth analysis with various statistical techniques, and outlier detection for more accurate results. This project demonstrates the ability to extract meaningful insights from large datasets.
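The description does not say which outlier detection methods were applied. As one common possibility, the interquartile range (IQR) rule can be sketched as follows; the function name `iqr_outliers`, the `k = 1.5` multiplier, and the sample values are assumptions for illustration, not details from the project.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles with its default
    'exclusive' method; k = 1.5 is the conventional multiplier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Made-up sample: one value sits far from the rest.
    sample = [10, 12, 11, 13, 12, 11, 10, 12, 95]
    print(iqr_outliers(sample))  # -> [95]
```

For a dataset with a million rows, the same rule would typically be applied column by column with vectorized operations in pandas rather than with Python lists.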
