- M.S. Business Analytics | Carnegie Mellon University (May 2024)
- B.A. Political Science | University of California, Berkeley
Data Scientist @ Freelance (October 2023 - Present)
- Applied statistical methods and the pandas Python library to manage and analyze large public datasets (IPEDS, PUMS, IRS)
- Integrated GPT-4 Turbo and Llama 2 with clients' large textual datasets, streamlining analysis and data interpretation.
- Applied natural language processing (NLP) techniques to analyze and interpret textual data, surfacing insights into user behavior.
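The kind of pandas aggregation used on these datasets can be sketched as follows; the column names ("state", "enrollment") are illustrative stand-ins, not the real IPEDS/PUMS schema.

```python
import pandas as pd

def enrollment_by_state(df: pd.DataFrame) -> pd.Series:
    """Total enrollment per state, sorted descending (hypothetical columns)."""
    return (
        df.dropna(subset=["enrollment"])   # drop rows missing the measure
          .groupby("state")["enrollment"]  # aggregate by state
          .sum()
          .sort_values(ascending=False)
    )

if __name__ == "__main__":
    toy = pd.DataFrame({
        "state": ["CA", "CA", "NY", "NY", "TX"],
        "enrollment": [30000, 25000, 40000, None, 15000],
    })
    print(enrollment_by_state(toy))
```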
Operations Manager @ Fancy Dream USA INC (March 2022 - May 2024)
- Designed a comprehensive SQL customer database integrated with order and sales databases.
- Built Tableau sales and analytical reports that turned complex data into actionable insights, supporting strategic planning and business growth.
- Led a team, employing data-driven strategies for informed decision-making and improved operational efficiency.
- Established and maintained partnerships with major corporations.
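The integrated customer/order schema described above can be illustrated with a minimal sqlite3 sketch; the table and column names are invented for this example, not the production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Customers linked to orders via a foreign key (illustrative schema).
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
""")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0)])

# Per-customer sales summary -- the kind of query that feeds a Tableau report.
rows = cur.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY SUM(o.total) DESC
""").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```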
Built a RAG agent that scrapes documents with the FireCrawl API and chunks them for retrieval, or extracts content from PDF files with LlamaIndex. A local Llama 3 model analyzes the retrieved data, and the Tavily API searches the web for up-to-date information when the retrieved documents are not relevant to the query.
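The agent's fallback routing can be sketched as below. The token-overlap score is a stand-in for the real relevance check, and the threshold is an assumed value; in the actual agent, the "web_search" branch would call the Tavily API.

```python
def relevance(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query tokens present in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def route(query: str, chunks: list[str], threshold: float = 0.5) -> str:
    """Answer locally if any retrieved chunk is relevant enough,
    otherwise fall back to a web search."""
    if any(relevance(query, ch) >= threshold for ch in chunks):
        return "local"
    return "web_search"
```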
Developed a project analyzing app user behavior in a large dataset. Cleaned the data with NLP techniques and used GPT-4 Turbo to improve AI responses, producing insights that supported a user-centric approach to app improvements. Shared progress regularly on GitHub. Key tools: Python, NLP, the GPT-4 Turbo API, and pandas.
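A typical NLP cleaning step from this kind of pipeline is sketched below; the stopword list is a tiny illustrative stand-in for a real one.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "it", "and", "to"}  # toy stopword list

def clean(text: str) -> list[str]:
    """Lowercase, keep only alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(clean("The app IS great, and it crashes to desktop!"))
# -> ['app', 'great', 'crashes', 'desktop']
```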
Developed a solution to distinguish authentic from fake product reviews using machine learning. Trained Multinomial Naive Bayes, Random Forest, and Gradient Boosting classifiers on TF-IDF and Word2Vec features, evaluated the models with standard performance metrics, and exposed the result through an API for a user-friendly interface.
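One model variant from this project (TF-IDF features feeding Multinomial Naive Bayes) can be sketched with scikit-learn; the toy reviews and labels below are invented, whereas the real project trained on a labeled review corpus.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy training data: two "real" and two "fake" reviews.
reviews = [
    "Great product, works exactly as described after a month of use",
    "Sturdy build, shipping was slow but quality is solid",
    "Best product ever buy now amazing amazing deal",
    "Five stars best ever click link for discount",
]
labels = ["real", "real", "fake", "fake"]

# TF-IDF vectorization chained into a Multinomial Naive Bayes classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("nb", MultinomialNB()),
])
model.fit(reviews, labels)

preds = model.predict(["Quality is solid after a month",
                       "amazing deal click link now"])
print(preds)
```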
Performed a detailed exploration of a dataset with over a million rows: cleaned the data, analyzed it with a range of statistical techniques, and applied outlier detection methods for more accurate results, demonstrating the ability to extract meaningful insights from large datasets.
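One common outlier-detection method of the kind mentioned above is the interquartile-range (IQR) rule, sketched here in plain Python with the conventional 1.5*IQR fences (the specific method and cutoff are assumptions, not taken from the project).

```python
import statistics

def iqr_filter(values: list[float]) -> list[float]:
    """Keep only values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]
```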