Scrape data from a webpage and store them in a database
I've written a script in Python to parse the names, bubble ratings, and reviews of various restaurants from TripAdvisor and store them in a table named webdata in a database. There are two functions within my script: one for storing the scraped data in MySQL and the other for printing the stored data in the console. The scraper does its job flawlessly. As this is my first attempt at storing items in a database and printing the stored items back to the console, I may not have followed the rules of DRY.

I would greatly appreciate any modification I should pursue to make the script more robust.

This is my attempt so far:
import mysql.connector
from bs4 import BeautifulSoup
import requests

URL = "https://www.tripadvisor.com.au/Restaurants-g255068-c8-Brisbane_Brisbane_Region_Queensland.html"

def store_info(link):
    mydb = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="test123",
        database="mydatabase"
    )
    mycursor = mydb.cursor()
    mycursor.execute("DROP TABLE webdata")  # kick out the table if it already exists
    mycursor.execute("CREATE TABLE webdata (name VARCHAR(255), bubble VARCHAR(255), review VARCHAR(255))")

    response = requests.get(link)
    soup = BeautifulSoup(response.text, "lxml")
    for items in soup.find_all(class_="shortSellDetails"):
        name = items.find(class_="property_title").get_text(strip=True)
        bubble = items.find(class_="ui_bubble_rating").get("alt")
        review = items.find(class_="reviewCount").get_text(strip=True)
        mycursor.execute("INSERT INTO webdata (name,bubble,review) VALUES (%s,%s,%s)", (name, bubble, review))
    mydb.commit()

def get_data():
    mydb = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="test123",
        database="mydatabase"
    )
    mycursor = mydb.cursor()
    mycursor.execute("SELECT * FROM webdata")
    for item in mycursor.fetchall():
        print(item)
    mydb.close()

if __name__ == '__main__':
    store_info(URL)
    get_data()  # retrieve the data from that table
python python-3.x mysql web-scraping
asked Sep 28 at 9:56
SIM
1 Answer
- The database setup (mydb) should be in its own function, or even just a global since this script is rather minimal. There's no reason to duplicate this code.
- If mysql.connector doesn't, I'd look for a database connector that supports the "with ..." statement for connections and transactions (relatively sure e.g. SQLAlchemy supports that). You can always forget to call commit (or rollback), but with "with" you have a little less mental overhead.
- It's a simple script, but long term I'd worry about losing data that way. Also, why is it dropping the whole table instead of, say, "DELETE FROM webdata;" and only recreating the table if it doesn't exist?
- The bs4 part looks okay, and the insert too; I'd perhaps add some checks so that you don't overflow the 255-character column limit with the review (or the name, I suppose). Some warnings around empty values might also help you debug and notice when something's off.
- AFAIK requests is a good choice; you might also want to check the status code, though.
Okay, so long-term:

- To be more flexible, look at argparse or any of the command-line parsing libraries so that you never have to edit the script when one of the parameters changes (database connection, URL). I'd keep the HTML querying as it is; it's usually not wise to put that into configuration unless it changes frequently.
answered Sep 28 at 20:14
ferada
Thanks for contributing an answer to Code Review Stack Exchange!