I'm running Ubuntu 16.04 with a GTX 1070, and I use the machine for TensorFlow with GPU support enabled. After an ordinary reboot the other day, I can no longer log in. I get to the login screen and enter my password, but it sends me straight back to the login screen. I can still get to a text console with Ctrl+Alt+F1.
Whenever I try to install the driver with NVIDIA's .run installer, I get these errors (the driver version doesn't seem to matter; I've tried several):
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
The NVIDIA kernel module was not created.
I've tried uninstalling with sudo ./NVIDIA-Linux-x86_64-367.57-no-compat32.run --uninstall and then reinstalling, and I get the same error. Running sudo ./NVIDIA-Linux-x86_64-367.57-no-compat32.run --update fails the same way.
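The installer's error points to /var/log/nvidia-installer.log, whose contents aren't quoted here. A minimal sketch for pulling the likely-relevant lines out of that log, assuming the log is plain text; the function name show_installer_errors is hypothetical and the grep pattern is a heuristic, not a documented log format.

```shell
#!/usr/bin/env bash
# Sketch: print lines from the NVIDIA installer log that look like errors
# or mention a compiler. The pattern is a heuristic, not a documented format.

show_installer_errors() {
  local log="${1:-/var/log/nvidia-installer.log}"
  if [[ ! -r "$log" ]]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -n: prefix each match with its line number; -i: ignore case;
  # -E: extended regex so the alternation works.
  grep -n -i -E 'error|clang|gcc' "$log"
}

# With no argument, read the installer's default log location.
show_installer_errors "$@"
```

Run it as root (or with sudo) if the log isn't readable by your user.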
I've also tried installing from the graphics-drivers PPA:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
This doesn't fail outright, but it again reports a kernel module build error: Error! Bad return status for module build on kernel: 4.4.0-53-generic
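DKMS normally keeps the full output of a failed build in a make.log under /var/lib/dkms/<module>/<version>/build/. A sketch that finds and prints the end of that log, assuming the module is registered under the package name nvidia-367 (run dkms status to confirm the actual name and version); find_dkms_logs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: locate the DKMS build log(s) for the NVIDIA module and show their tails.
# Assumes the DKMS module name is nvidia-367; check `dkms status` if nothing prints.

find_dkms_logs() {
  local root="${1:-/var/lib/dkms}" module="${2:-nvidia-367}" log
  # Layout: <root>/<module>/<version>/build/make.log, one per built version.
  for log in "$root/$module"/*/build/make.log; do
    # An unmatched glob stays literal, so only print paths that exist.
    [[ -f "$log" ]] && echo "$log"
  done
  return 0
}

find_dkms_logs | while IFS= read -r log; do
  echo "== $log =="
  # The end of the log usually shows why the build stopped.
  tail -n 40 "$log"
done
```

The compiler error near the bottom of make.log would likely show whether clang or gcc was invoked.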
Here's what I get when I check for the driver after the PPA install:
$ nvidia-smi
modprobe: ERROR ../libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='nvidia_367'
modprobe: ERROR could not insert 'nvidia_367': unknown symbol in module, or unknown parameter (see dmesg)
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Could this be a CUDA issue, and if so, how would I fix it?
Or should I just reinstall the OS (keeping my data)?
UPDATE
I think I know what caused the issue, but I'm not sure how to fix it.
About a week ago a co-worker changed my default compiler to clang, and I suspect the NVIDIA driver build needs gcc or g++. I don't know exactly how they made the change, so I'm not sure how to revert it. I tried ln -s /usr/bin/gcc-4.9 ~/.local/bin/gcc, but that didn't help.
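One plausible reason the ~/.local/bin/gcc symlink had no effect: it only changes what your own shell finds first on PATH, and only for the name gcc (not cc or c++). On a default Ubuntu install, sudo replaces PATH with the secure_path from /etc/sudoers, and the DKMS build started by apt-get runs as root, so neither installer is likely to see that symlink. A diagnostic sketch, assuming a bash shell with coreutils' readlink; the helper names compiler_family and resolve_compiler are hypothetical, and the clang/gcc classification is a file-name heuristic.

```shell
#!/usr/bin/env bash
# Sketch: show which real binary each generic compiler name resolves to
# for the current user, and whether it looks like clang or gcc.

compiler_family() {
  # Heuristic: classify a resolved binary by its file name.
  # clang is checked first so that clang++ is not misread.
  case "$(basename "$1")" in
    *clang*) echo clang ;;
    *gcc*|*g++*) echo gcc ;;
    *) echo unknown ;;
  esac
}

resolve_compiler() {
  local name="$1" path real
  if ! path="$(command -v "$name")"; then
    echo "$name: not found"
    return 0
  fi
  # readlink -f follows the whole symlink chain, e.g. through
  # /etc/alternatives/cc when cc is managed by update-alternatives.
  real="$(readlink -f "$path")"
  echo "$name -> $real ($(compiler_family "$real"))"
}

echo "As $(id -un), PATH=$PATH"
for name in cc c++ gcc g++; do
  resolve_compiler "$name"
done
```

Running it once normally and once as sudo bash check-compilers.sh (a hypothetical filename) shows whether root sees a different compiler than your login shell. If cc or c++ resolves to clang and update-alternatives --display cc lists a gcc entry, sudo update-alternatives --config cc (and the same for c++) is the usual way to switch back; whether that is the mechanism your co-worker used is an assumption.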
This bug report describes a config file pointing to clang, but doesn't explain how to point it back. How can I point that config file back to gcc?