When Spark reads from the local file system, the default number of partitions (given by defaultParallelism) is the number of available cores.
sc.textFile uses defaultMinPartitions as its default minimum number of partitions, which is the smaller of defaultParallelism (the available cores in the local FS case) and 2:
def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
Referred from: the Spark source code.
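As a quick sanity check, you can print these values from spark-shell (where sc is already available); the local path below is just a placeholder:

println(sc.defaultParallelism)    // number of cores available to the local master
println(sc.defaultMinPartitions)  // math.min(defaultParallelism, 2), i.e. 2 on any multi-core machine

val rdd = sc.textFile("file:///tmp/sample.txt")  // no minPartitions argument, so defaultMinPartitions is used
println(rdd.getNumPartitions)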
In the 1st case, the file size is 300 KB.
The number of partitions is calculated as 2, since the file is very small.
In the 2nd case, the file size is 500 MB.
The number of partitions equals defaultParallelism, which in your case is 8.
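As a rough way to verify both cases yourself (the paths are placeholders; the commented counts just restate the numbers above):

println(sc.textFile("file:///tmp/small_300kb.txt").getNumPartitions)  // 2  (1st case)
println(sc.textFile("file:///tmp/large_500mb.txt").getNumPartitions)  // 8  (2nd case, on your 8-core setup)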
When reading from HDFS, sc.textFile takes the maximum of minPartitions and the number of splits Hadoop computes, which is roughly the total input size divided by the HDFS block size.
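As an illustration only (the HDFS path, the 1 GB file size, and the 128 MB block size are assumed, not taken from your question): 1024 MB / 128 MB gives about 8 splits, so the RDD ends up with max(minPartitions, 8) partitions.

val fromHdfs = sc.textFile("hdfs:///data/big.txt")  // ~8 partitions with the assumed sizes
println(fromHdfs.getNumPartitions)

// A larger minPartitions hint can raise the split count, but a smaller one cannot lower it:
println(sc.textFile("hdfs:///data/big.txt", 16).getNumPartitions)  // ~16
println(sc.textFile("hdfs:///data/big.txt", 2).getNumPartitions)   // still ~8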
However, when textFile is used with a compressed file (file.txt.gz rather than file.txt, or similar), Spark disables splitting, which results in an RDD with only 1 partition (reads against a gzipped file cannot be parallelized).
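A small sketch of the gzip case (again with a placeholder path); calling repartition() after the read is a common way to get parallelism back for downstream stages:

val gz = sc.textFile("file:///tmp/sample.txt.gz")
println(gz.getNumPartitions)      // 1 - a gzipped file is not splittable

val spread = gz.repartition(8)    // shuffles the data across 8 partitions
println(spread.getNumPartitions)  // 8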
For your 2nd query, about reading data from a local path on a cluster:
The file needs to be available on all machines in the cluster, because Spark may launch executors on any of them, and each executor reads the file through the file:// scheme.
To avoid copying the file to every machine: if your data already lives on a network file system such as NFS, AFS, or MapR's NFS layer, you can use it as input simply by specifying a file:// path. Spark handles this as long as the file system is mounted at the same path on every node.
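For example, assuming /mnt/shared is an NFS (or similar) mount that appears at the same path on the driver and every worker:

val shared = sc.textFile("file:///mnt/shared/data/input.txt")  // each executor resolves this path on its own node
println(shared.count())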
Please refer to: https://community.hortonworks.com/questions/38482/loading-local-file-to-apache-spark.html