An Ubuntu 20.04 EC2 instance launched in AWS runs the server edition, which has no GUI. These days scrapers face tighter security, with CDNs blocking Puppeteer, wget, and curl. I tried replicating all the headers of a real browser in wget to get the same content, but without success.
The workaround I found was to install a real browser and run keyboard macros with PyAutoGUI to scrape the content I needed. PyAutoGUI cannot run headless because it needs a display to send input to, so first you need to set up TightVNC on Ubuntu Server.
sudo apt update
sudo apt install ubuntu-desktop
sudo apt install tightvncserver
sudo apt install gnome-panel gnome-settings-daemon metacity nautilus gnome-terminal
Once the installation has completed, start the VNC server to initialize the config file. On the first run it prompts you to set passwords, and you can set two for different access levels: a full-control password and a view-only password.
vncserver
Edit the config file, ~/.vnc/xstartup.
Change the content to look like the following, then save and exit.
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
Next, restart the VNC server: kill the running session, then start it again.
vncserver -kill :1
vncserver :1
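If the scraper is launched from a script, the restart can be wrapped in a small Python helper. This is only a minimal sketch: it assumes `vncserver` is on the PATH and that display :1 is used as above. The function names `restart_commands` and `restart_vnc` are hypothetical and not part of any library.

```python
import subprocess
from typing import List


def restart_commands(display: int = 1) -> List[List[str]]:
    """Build the two commands that stop and then start a VNC display."""
    target = f":{display}"
    return [
        ["vncserver", "-kill", target],  # stop the running session
        ["vncserver", target],           # start it again so xstartup is re-read
    ]


def restart_vnc(display: int = 1) -> None:
    """Run the restart; a failed kill (nothing was running) is tolerated."""
    kill_cmd, start_cmd = restart_commands(display)
    subprocess.run(kill_cmd, check=False)
    subprocess.run(start_cmd, check=True)
```

Calling `restart_vnc(1)` on the EC2 instance should be equivalent to typing the two commands by hand. Keeping the command list in its own function makes it easy to inspect without touching the server.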
The VNC server listens on port 5901 (5900 plus the display number). Remember to allow that port in the instance's security group, but only from your own IP address. Do not open 5901 to 0.0.0.0/0: the VNC server is weak against brute-force attacks and will crash. To connect to the VNC server from macOS, open Screen Sharing and enter the IP address of your EC2 instance, followed by a colon and 5901. If you are not using macOS, you can install Remmina.
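The security group rule can also be added from code instead of the AWS console. The sketch below assumes boto3 is installed and AWS credentials are already configured, neither of which is covered in this post. The helper names `vnc_ingress_rule` and `allow_vnc` are hypothetical. It restricts the port to a single /32 address rather than 0.0.0.0/0.

```python
def vnc_ingress_rule(my_ip: str, display: int = 1) -> dict:
    """Describe a TCP rule that allows one IP to reach one VNC display."""
    port = 5900 + display  # display :1 -> port 5901
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": f"{my_ip}/32", "Description": "VNC from my workstation"}
        ],
    }


def allow_vnc(group_id: str, my_ip: str, display: int = 1) -> None:
    """Add the rule to an existing security group (assumes AWS credentials)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[vnc_ingress_rule(my_ip, display)],
    )
```

A call would look like `allow_vnc("sg-0123456789abcdef0", "203.0.113.7")`, where both the group ID and the IP address are placeholders to replace with your own values.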
To run Python on display :1, you need to specify the display number
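Inside a script, one way to do this is to set the DISPLAY environment variable before PyAutoGUI is imported, because on Linux the library looks up the X display when it loads. This is a minimal sketch, assuming PyAutoGUI is installed and the VNC session is running on display :1. The names `use_display` and `type_and_submit` are hypothetical.

```python
import os


def use_display(number: int = 1) -> str:
    """Point X11 clients started by this process at the given display."""
    display = f":{number}"
    os.environ["DISPLAY"] = display
    return display


def type_and_submit(text: str) -> None:
    """Type text into whichever window has focus, then press Enter."""
    use_display(1)
    import pyautogui  # imported only after DISPLAY has been set

    pyautogui.write(text, interval=0.1)  # short pause between keystrokes
    pyautogui.press("enter")
```

The same effect can be had without touching the code by setting DISPLAY=:1 in the shell that launches the script. Run the script as the same user that started the VNC server so that it is allowed to connect to the display.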