Long time no see. This time I want to refresh my Python knowledge, starting with Scrapy (the latest version when this article was written is 1.0.3). Besides, I have a side project called ayorakit that uses Scrapy heavily.
Here are the steps:
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo "/swapfile swap swap sw 0 0" >> /etc/fstab
echo "vm.swappiness = 10" >> /etc/sysctl.conf
echo "vm.vfs_cache_pressure = 50" >> /etc/sysctl.conf
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
yum update -y
yum install python-pip -y
yum install python-devel -y
yum install gcc -y
yum install libxml2 libxml2-devel -y
yum install libxslt libxslt-devel -y
yum install openssl openssl-devel -y
yum install libffi libffi-devel -y
CFLAGS="-O0" pip install lxml
pip install scrapy
scrapy -v
Scrapy 1.0.3 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
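If you want to check the installed version from a script instead of reading the banner by eye, something like the sketch below might help. The function name `parse_scrapy_version` is made up for this example, and it assumes the banner keeps the `Scrapy X.Y.Z` shape shown above.

```python
import re
import subprocess


def parse_scrapy_version(banner):
    """Extract (major, minor, patch) from text such as 'Scrapy 1.0.3 - no active project'."""
    match = re.search(r"Scrapy (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no Scrapy version found in: %r" % banner)
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    try:
        # "scrapy version" is one of the commands listed in the usage output.
        output = subprocess.check_output(["scrapy", "version"]).decode("utf-8")
    except (OSError, subprocess.CalledProcessError):
        print("scrapy command not found or failed")
    else:
        print("Scrapy version tuple: %s" % (parse_scrapy_version(output),))
```

Comparing tuples, for example `parse_scrapy_version(text) >= (1, 0, 0)`, avoids the usual trap of comparing version strings alphabetically.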
Well, you have finished installing Scrapy. But why did I add swap? Because when I installed lxml, it froze, probably because it ran out of memory. If you have plenty of memory, it won't be a problem. I use the smallest droplet on DigitalOcean: 512 MB RAM, Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz (1 core), 20 GB SSD. In the next post, we will probably still cover Scrapy, and I will update my repository.
Right now I deploy on DigitalOcean (referral link) because it's fast and easy to deploy; my favorite feature is snapshots. I usually use the smallest droplet ($5 / month): if something runs well on the smallest one, it will run better on a bigger one.
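To see whether a machine is likely to hit the same problem before building lxml, you can read the RAM and swap totals from `/proc/meminfo`. This is only a rough sketch: the helper names are hypothetical, and the 1 GB threshold is my own assumption, not a documented lxml requirement.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def has_enough_memory(meminfo, required_kb=1024 * 1024):
    """Return True if RAM plus swap reaches the threshold (1 GB here is an assumption)."""
    total = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total >= required_kb


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
        print("RAM kB: %s, swap kB: %s" % (info.get("MemTotal"), info.get("SwapTotal")))
        print("Above the assumed threshold: %s" % has_enough_memory(info))
    else:
        print("/proc/meminfo not available on this system")
```

On a 512 MB droplet without swap this check should come out false, and after adding the 4 GB swap file it should come out true.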