Summary notes created by Deciphr AI
https://www.youtube.com/watch?v=HiOtQMcI5wg&t=5s

In this video, the host demonstrates a data analyst portfolio project focused on web scraping Amazon using Python. The project, aimed at intermediate Python users, involves using libraries such as BeautifulSoup and Requests to extract data from Amazon product pages. The host explains how to clean and format the scraped data, create a CSV file, and automate the data collection process using Python's time module. Additionally, the host touches on the potential to expand the project by tracking price changes over time and sending email alerts for significant price drops. The project serves as an introduction to web scraping and data automation.
"Do I need to know web scripting to become a data analyst? The answer is no, you absolutely don't need to know it, but it is a very cool skill to learn."
"If you didn't watch the last project, I had people download Anaconda. We use Jupyter Notebooks, and I'll show you how to get to that in just a second."
"The first thing that we need to do or that we should do is upload or import our libraries."
"I do recommend writing it all yourself because you will learn it much better, I promise."
"What we are going to need is something called headers. Now again, you will never ever ever need to know this."
"We are now connecting using our computer using this URL, and then what we want to write is you want to write page."
"We are actually going to start using the Beautiful Soup library."
This concludes the study notes for the initial part of the data analyst portfolio project focused on web scraping. The notes cover the introduction, setup, and initial steps for connecting to the website and setting up the environment. Further details on parsing and extracting data using Beautiful Soup will be covered in subsequent sections of the project.
soup1 = BeautifulSoup(page.content, 'html.parser')

"You guessed it, you're going to say Beautiful Soup and then in parentheses we're going to do page.content."
Uses html.parser to parse the HTML content. "We're just pulling in the content from the page, that's really all we're doing right now, and it comes in as HTML."
"If you come here this is a static page basically written in HTML...I did right-click and inspect or Ctrl+Shift+I whichever one works better for you."
"Let's say we want this title, what I can do is I can click select element, go right here and then we can select like a type the header or the title of the page."
Uses soup1 and soup2 as different stages of parsing. BeautifulSoup is used to re-parse existing soup objects for better formatting. "Let's do soup two, we're just gonna do a very, you know, an upgrade to soup one basically."
Uses the prettify method for better HTML formatting. "We'll do beautiful soup again and then we're going to do soup one...dot prettify...it just makes things look better."
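A sketch of the two parsing stages, assuming page is the response object from the requests call above:

from bs4 import BeautifulSoup

# First pass: parse the raw HTML from the response.
soup1 = BeautifulSoup(page.content, 'html.parser')

# Second pass: re-parse the prettified output for cleaner, more readable HTML.
soup2 = BeautifulSoup(soup1.prettify(), 'html.parser')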
Uses soup.find to locate elements with specific IDs (e.g., productTitle). "Let's say title, that's what we're going to be getting, and we're going to do soup 2...find...we want to find that id where it's equal to product title."
Uses .get_text() to extract the text from the found elements. "We're going to do .get_text and then we'll do open parentheses so now let's print the title and see what we get."
"We don't only want the title, we are also going to be pulling in the price...id equals price block underscore our price."
"Let's print the title and print the price now let's see what we get."
Uses .strip() to clean up whitespace and unwanted characters from the data. "What we want to do is let's start with the price...price.strip and that's just going to take basically the junk off of either side."
"I don't want that dollar sign, I just want the numeric value."
"In a CSV what you want is you want headers and then you want the data...we're gonna do a bracket and let's make the first one a title."
"These are strings and that's important to know...this is a string...what we're going to do is make this a list."
"What I'm going to show you is basically doing it over time and just having it automated in the background."
"We need to create the CSV, insert it into the CSV, and then create a process to append more data into that CSV."
These notes provide a detailed and structured overview of the key points discussed in the transcript, ensuring a comprehensive understanding of using Beautiful Soup for web scraping and data extraction.
"It's really important to remember what's what type, um how do I say this how your data is, is it a list, is it an array, is it a dictionary, um you know what is it these things are important they do play a big impact especially with this type of stuff."
Opens the file in write mode ('w') and specifies the encoding ('utf-8'). Uses csv.writer to write data into the file. "So what we're going to do is we're going to say with and we're going to say open and now we're going to name our file you can name this whatever you want I'm going to call it amazon web scraper data set that's real long dot csv and we're gonna do underscore w and that means write."
The open function with 'w' mode creates or overwrites the file. The csv.writer function is used to write rows into the CSV file, with newline=''. "We're going to do writer is dot sorry dot write row and this is just for the initial um the initial import or or um not import the initial insertion of the data into this csv."
The writerow method is used to insert the header and data rows. "Oh geez this isn't good can't verify my um my subscription uh why does it say 6.99 I'm gonna go back and look but I think I know the issue."
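A sketch of the initial CSV creation, assuming the header and data lists from earlier; the filename is one rendering of the name spoken in the quote:

import csv

# 'w' mode creates (or overwrites) the file; newline='' avoids blank rows between entries.
with open('AmazonWebScraperDataset.csv', 'w', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)  # the initial insertion: column names
    writer.writerow(data)    # the first row of scraped data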
Uses the datetime module to generate the current date. "What you can do is you can do date let me get date time and you do dot date dot today open parenthesis and that is going to give us this right here and so we're just going to do today that's what we'll call it is equal to this and we'll say print today."
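A sketch of the date step:

import datetime

# Capture today's date so each scraped row can be timestamped.
today = datetime.date.today()
print(today)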
The datetime module is used to fetch the current date and add it to the data. Uses the pandas library to read and verify CSV data. "What we can do just to check the data without having to open up the data every single time which is super annoying because we're going to use pandas again I should have imported this at the top."
The pandas.read_csv function reads CSV files into DataFrames for easy manipulation and verification. Opens the file in append mode ('a+') to add data instead of overwriting. "We are ignoring the data and we're now going to the next nearest free row in appending data which means to add on data to that and so if I run this which I'm not going to right now I mean why not I can I can run it."
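A sketch of both steps, assuming pandas is installed and the same filename and data list as before:

import csv
import pandas as pd

# Read the CSV back in to verify its contents without opening the file by hand.
df = pd.read_csv('AmazonWebScraperDataset.csv')
print(df)

# 'a+' mode appends to the next free row instead of overwriting the file.
with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(data)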
"We want a way where it does it while we sleep it does it in the background of our laptop um and is easy to do right I don't want to come in here every single morning with an alarm on my phone every single morning come in here I want to automate this."
"So now what we're going to do is we're going to put this all into uh this check underscore price now you may never have used oh geez what are these things called oh my gosh super used all the time you'll know what I what it is not a function I don't even remember what it's called maybe there's a function."
Using pandas simplifies data verification, and the whole scrape is consolidated into a check_price function. "So now we have our header and our data, and then we want to pull this in right here... Everything that we just wrote out, we are now putting into this check price."
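The construct the host is reaching for in the quote above is a function definition. A sketch of the consolidated function, assuming the URL, headers, and filename from earlier, with the date added to each row as described:

import csv
import datetime
import requests
from bs4 import BeautifulSoup

def check_price():
    # Everything written so far, wrapped so it can be re-run on a schedule.
    page = requests.get(URL, headers=headers)  # URL and headers defined earlier
    soup = BeautifulSoup(page.content, 'html.parser')

    title = soup.find(id='productTitle').get_text().strip()
    price = soup.find(id='priceblock_ourprice').get_text().strip()[1:]
    today = datetime.date.today()

    # 'a+' appends each new reading as a fresh row.
    with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow([title, price, today])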
"This is how we are going to do that... We had something called time, this library time right here, that's what we're going to use right now."
Uses the time library to automate the function execution. "So we're going to say while true... every 5 seconds it is going to run through this entire process."
Uses time.sleep to repeatedly execute the price check function. "I guess I ran for 20 seconds... we can put this as long or as short as you want."
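A sketch of the automation loop, assuming the check_price function above; the 5-second interval matches the quote, though for day-to-day price tracking time.sleep(86400) (one day) would be typical:

import time

# Run the full scrape on repeat; time.sleep() pauses between runs (in seconds).
while True:
    check_price()
    time.sleep(5)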
"This is the entire point of this project... we want to create our own data set."
"You can do this on any item you could ever imagine on Amazon... the code itself will be nice to put in a project."
"You can run it every second if you want... you can do some type of time series with."
"I personally when I did this... I did something similar and I put this in Visual Studio Code."
"If you restart your computer just come back in here and restart running this process."
"If the price is lower than... it would then send an email."
"We're sending a mail, we're connecting to a server, we're using Gmail, we're logging into our account."
"I have used this and I used it and was able to buy a watch that was like... on a Black Friday sale."
"I hope that this was instructional... I hope that this is useful."
"In this next one, it gets quite a bit more difficult... just much more technical or coding heavy."
"Thank you so much for watching... I really appreciate it."