Hi Jan,
I am using an Azure HDInsight Spark cluster to extract data from rosbag files with RosbagInputFormat, and I have followed the README. When I run the code in pyspark, it fails because it cannot read the idx file from the local file system. The session and the error are below:
Using Python version 2.7.12 (default, Jul 2 2016 17:42:40)
SparkSession available as 'spark'.
>>> sc.newAPIHadoopFile(
... path = "/user/spark/HMB_4.bag",
... inputFormatClass = "de.valtech.foss.RosbagMapInputFormat",
... keyClass = "org.apache.hadoop.io.LongWritable",
... valueClass = "org.apache.hadoop.io.MapWritable",
... conf = {"RosbagInputFormat.chunkIdx":"/opt/ros_hadoop/master/dist/HMB_4.bag.idx.bin"})
[Stage 0:> (0 + 1) / 1]
19/04/10 14:16:35 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, wn1-avdp-h.cfzrwlyaxyyuvies4sglc0tsud.cx.internal.cloudapp.net, executor 5): java.io.FileNotFoundException: /opt/ros_hadoop/master/dist/HMB_3.bag.idx.bin (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
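The exception is raised inside a task on an executor (wn1-avdp-h...), so one thing worth checking is whether the index file exists on the worker nodes and not only on the node where the pyspark shell runs. Below is a minimal sketch of such a check, assuming the `sc` from the pyspark shell. `probe_path` and `probe_on_executors` are hypothetical helper names for illustration, not part of ros_hadoop, and `num_tasks=16` is an arbitrary value.

```python
import os
import socket


def probe_path(path):
    """Return (hostname, path, exists) for the machine this call runs on."""
    return (socket.gethostname(), path, os.path.exists(path))


def probe_on_executors(sc, path, num_tasks=16):
    """Run probe_path inside Spark tasks and collect the distinct results.

    Assumption: num_tasks is only a guess at "enough tasks to touch several
    executors"; Spark does not guarantee that every worker node gets a task.
    """
    return (sc.parallelize(range(num_tasks), num_tasks)
              .map(lambda _: probe_path(path))
              .distinct()
              .collect())


# Usage inside the pyspark shell (not executed here):
#   idx = "/opt/ros_hadoop/master/dist/HMB_4.bag.idx.bin"
#   probe_path(idx)               # checks the node running the shell
#   probe_on_executors(sc, idx)   # checks whichever workers ran a task
```

If the worker hostnames come back with `False`, the file is present only where the shell runs, which would be consistent with the `FileNotFoundException` above.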
Could you please help me resolve this?
Thanks,
Sayandeep