After spending twelve hours writing this thing, I finally got it working flawlessly. I sent it to the client, who couldn't get it to work, and we were both a little bewildered. It turns out that in order to use Firefox as your scraping browser, you... well, you need to have Firefox installed on your server, headless though it may be.
For my own future reference, I figured I'd note how to configure that here; perhaps this post will help someone else out as well. These instructions are intended particularly for Debian (not everyone uses Ubuntu!).
First, you need Selenium and Firefox ESR:
$ sudo -i
# apt update ; apt install python3-selenium firefox-esr
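If you want a quick sanity check that the Python bindings are importable before going any further, something along these lines should print a version number:
# python3 -c "import selenium; print(selenium.__version__)"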
Next, you may also need geckodriver. I'm not sure whether the packages above pull it in, because I installed it manually; either way, you might as well grab the latest release:
# cd /usr/local/src
# wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz
# tar -xzf geckodriver-v0.24.0-linux64.tar.gz
# cp geckodriver /usr/local/bin
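At this point it's worth confirming that the binary landed on the PATH and runs; geckodriver can report its own version:
# geckodriver --version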
Finally, let's fire up a test webdriver and see if it works:
# exit
$ cat > test_selenium.py <<'EOF'
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Start Firefox via geckodriver and load python.org
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title

# Search for "pycon" and make sure we get results back
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source

print('If you see this, it worked.')
driver.close()
EOF
$ python3 test_selenium.py
If you see this, it worked.
That's all there is to it.
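One last note: on a server with no display, Firefox may refuse to start unless you explicitly ask for headless mode. If the test script errors out complaining about a missing display, a small tweak along these lines (using Selenium's FirefoxOptions; the rest of the script is unchanged) should sort it out:
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")  # run Firefox without needing an X display
driver = webdriver.Firefox(options=options)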