Why can't I save a file with PySpark on my system? saveAsTextFile doesn't work

This is the code:

import sys
sys.path.insert(0, '.')
from pyspark import SparkContext, SparkConf
from commons.Utils import Utils

def splitComma(line: str):
    splits = Utils.COMMA_DELIMITER.split(line)
    return "{}, {}".format(splits[1], splits[2])

if __name__ == "__main__":
    conf = SparkConf().setAppName("airports").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    airports = sc.textFile("in/airports.text")
    airportsInUSA = airports.filter(lambda line: Utils.COMMA_DELIMITER.split(line)[3] == "\"United States\"")

    airportsNameAndCityNames = airportsInUSA.map(splitComma)
    airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")

This is the error:

PS C:\Users\User\Documents\Data Engineering Projects\8. Apache Spark\python-spark-tutorial> spark-submit .\rdd\airports\AirportsInUsaSolution.py
Traceback (most recent call last):
  File "C:\Users\User\Documents\Data Engineering Projects\8. Apache Spark\python-spark-tutorial\rdd\airports\AirportsInUsaSolution.py", line 18, in <module>
    airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")
  File "C:\spark\python\lib\pyspark.zip\pyspark\rdd.py", line 1828, in saveAsTextFile
  File "C:\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\java_gateway.py", line 1309, in __call__
  File "C:\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o31.saveAsTextFile.
: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Users/User/Documents/Data Engineering Projects/8. Apache Spark/python-spark-tutorial/out/airports_in_usa.text already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
        at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.assertConf(SparkHadoopWriter.scala:299)
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsHadoopDataset$1(PairRDDFunctions.scala:1090)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1088)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsHadoopFile$4(PairRDDFunctions.scala:1061)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsHadoopFile$3(PairRDDFunctions.scala:1008)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1007)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsHadoopFile$2(PairRDDFunctions.scala:964)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:962)
        at org.apache.spark.rdd.RDD.$anonfun$saveAsTextFile$2(RDD.scala:1578)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1578)
        at org.apache.spark.rdd.RDD.$anonfun$saveAsTextFile$1(RDD.scala:1564)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1564)
        at org.apache.spark.api.java.JavaRDDLike.saveAsTextFile(JavaRDDLike.scala:551)
        at org.apache.spark.api.java.JavaRDDLike.saveAsTextFile$(JavaRDDLike.scala:550)
        at org.apache.spark.api.java.AbstractJavaRDDLike.saveAsTextFile(JavaRDDLike.scala:45)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)

I have already set HADOOP_HOME and the other system environment variables. It looks like Spark cannot write to my system.

I have this Java installed

[screenshot of the installed Java version]

I'm trying to run some tutorial examples that are supposed to work.

What can I do?

Thank you everyone!

Total Answers: 1

Answers 1: Why can't I save a file with PySpark on my system? saveAsTextFile doesn't work

You have run your application more than once, and the output directory out already contains airports_in_usa.text from the previous run.

As airportsNameAndCityNames is an RDD, there is no way to overwrite the output. Unlike the DataFrame API, the RDD API gives you no save mode to control whether existing data at the location is overwritten or appended to.
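
For comparison, here is a minimal sketch of what the DataFrame writer offers (assumptions on my part: a SparkSession named spark and a plain CSV read of the same input; this is not the tutorial's RDD code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("airports").master("local[*]").getOrCreate()

# Hypothetical DataFrame read of the same input file; the point here is the save mode.
df = spark.read.csv("in/airports.text")

# mode("overwrite") replaces the output directory instead of failing when it already exists.
df.write.mode("overwrite").csv("out/airports_in_usa.text")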

However, you can check beforehand whether the files are already there and delete them.

Remove the output directory from the local file system:

import os, shutil
# saveAsTextFile writes a directory, so remove it recursively if it exists
if os.path.exists('/home/somedir/out'):
    shutil.rmtree('/home/somedir/out')

If you're using HDFS, you can remove the files from code, or before submitting your application you can run:

hdfs dfs -rm -R -skipTrash /some/somedir/out
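
If you prefer to do the cleanup from code, one option is to go through the JVM-side Hadoop FileSystem that Spark already has on its classpath. This is only a sketch, assuming a live SparkContext sc; the _jvm and _jsc accessors are internal rather than public PySpark API:

# Delete the output path recursively before writing (works for local paths and HDFS).
hadoop = sc._jvm.org.apache.hadoop.fs
fs = hadoop.FileSystem.get(sc._jsc.hadoopConfiguration())
out_path = hadoop.Path("out/airports_in_usa.text")
if fs.exists(out_path):
    fs.delete(out_path, True)  # True = recursive delete of the whole directory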
