[How to] Install Scrapy on CentOS 7

Long time no see. This time I want to refresh my Python knowledge, starting with Scrapy (the latest version at the time of writing is 1.0.3). Besides, I have a side project called ayorakit that relies heavily on Scrapy.

Here are the steps:

  1. Set Swap

    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
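    # Persist the swap file across reboots (tee runs as root; a plain ">>" redirect would not)
    echo "/swapfile swap swap sw 0 0" | sudo tee -a /etc/fstab
    # Prefer RAM over swap and keep filesystem caches around longer
    echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf
    echo "vm.vfs_cache_pressure = 50" | sudo tee -a /etc/sysctl.conf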
    
  2. Install Scrapy

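    # Enable the EPEL repository (provides python-pip)
    sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
    sudo yum update -y
    sudo yum install python-pip -y
    sudo yum install python-devel -y
    # Compiler and headers needed to build lxml and the crypto dependencies
    sudo yum install gcc -y
    sudo yum install libxml2 libxml2-devel -y
    sudo yum install libxslt libxslt-devel -y
    sudo yum install openssl openssl-devel -y
    sudo yum install libffi libffi-devel -y
    # Build lxml without optimization to reduce memory use during compilation
    sudo CFLAGS="-O0" pip install lxml
    sudo pip install scrapy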
    
  3. Check Scrapy

    scrapy -v
    Scrapy 1.0.3 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      commands
      fetch         Fetch a URL using the Scrapy downloader
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command

  4. Conclusion

    Well, you have finished installing Scrapy. But why did I add swap? Because when I installed lxml, the build froze, probably because it ran out of memory. With more memory this should not be a problem, but I use the smallest droplet on DigitalOcean: 512 MB RAM, Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz (1 core), 20 GB SSD. In the next post we will probably still cover Scrapy, and I will update my repository.
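
To make the reasoning above a bit more concrete, here is a rough sketch of a check you could run before installing: it reads total RAM and swap from /proc/meminfo and suggests adding swap on a small machine. The helper names `meminfo_kb` and `needs_swap` and the 1 GiB threshold are my own assumptions for illustration; neither Scrapy nor lxml defines such a limit.

```shell
#!/bin/sh
# Sketch: report RAM and swap, and suggest a swap file on small machines.
# The 1 GiB default threshold is an assumption, not an lxml or Scrapy requirement.

# meminfo_kb FIELD [FILE]: print the value (in kB) of a /proc/meminfo field.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# needs_swap RAM_KB SWAP_KB [THRESHOLD_KB]: succeed when RAM is below the
# threshold and no swap is configured.
needs_swap() {
    [ "$1" -lt "${3:-1048576}" ] && [ "$2" -eq 0 ]
}

# Only inspect the live system where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    ram_kb=$(meminfo_kb MemTotal)
    swap_kb=$(meminfo_kb SwapTotal)
    echo "RAM: ${ram_kb:-0} kB, swap: ${swap_kb:-0} kB"

    if needs_swap "${ram_kb:-0}" "${swap_kb:-0}"; then
        echo "Low memory and no swap: consider adding a swap file before building lxml."
    else
        echo "Memory looks sufficient, or swap is already active."
    fi
fi
```

On a 512 MB droplet without swap this should print the "consider adding a swap file" message; after step 1 it should report the swap as active instead.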


Right now I deploy on DigitalOcean (referral link) because it is fast and easy to deploy to; my favorite feature is snapshots. I usually use the smallest droplet ($5/month): if something runs well on the smallest one, it will run even better on a bigger one.

6 thoughts on “[How to] Install Scrapy on CentOS 7”

  1. Great Post, I have been searching for just this, but then found a lot of other great info.

  2. Thanks, works like a charm

  3. Hello everyone, I’m moving my blog to https://blog.fajri.my.id/

    This site is no longer updated.

    Thanks
