Working with large ROS bag files on Hadoop and Spark

Hi Jan,

I am using an Azure HDInsight Spark cluster to extract data from rosbag files with RosbagInputFormat, and I have followed the README. When I run the code in pyspark I get the error below: it is not able to read the idx file from the local file system.

Using Python version 2.7.12 (default, Jul  2 2016 17:42:40)
SparkSession available as 'spark'.
>>> sc.newAPIHadoopFile(
...     path =             "/user/spark/HMB_4.bag",
...     inputFormatClass = "de.valtech.foss.RosbagMapInputFormat",
...     keyClass =         "org.apache.hadoop.io.LongWritable",
...     valueClass =       "org.apache.hadoop.io.MapWritable",
...     conf = {"RosbagInputFormat.chunkIdx":"/opt/ros_hadoop/master/dist/HMB_4.bag.idx.bin"})
[Stage 0:>                                                          (0 + 1) / 1]19/04/10 14:16:35 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, wn1-avdp-h.cfzrwlyaxyyuvies4sglc0tsud.cx.internal.cloudapp.net, executor 5): java.io.FileNotFoundException: /opt/ros_hadoop/master/dist/HMB_3.bag.idx.bin (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
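
In case it helps with diagnosis, here is a minimal sketch of a check that could be run before calling `sc.newAPIHadoopFile`. The helper name `build_rosbag_conf` is hypothetical and not part of ros_hadoop; it only verifies that the index file exists on the node where it runs and then returns the same `conf` dictionary as above. Since the trace shows the exception raised on an executor host, I assume the same local path would also have to exist on each worker node, which this check does not cover.

```python
import os


def build_rosbag_conf(idx_path):
    """Return the conf dict for RosbagMapInputFormat after a local existence check.

    Hypothetical helper: it only checks the node it runs on (e.g. the driver),
    not the worker nodes where the executors open the file.
    """
    if not os.path.isfile(idx_path):
        raise IOError("chunk index not found on this node: %s" % idx_path)
    # Same configuration key as used in the pyspark call above.
    return {"RosbagInputFormat.chunkIdx": idx_path}
```

The returned dictionary would be passed as the `conf` argument, e.g. `conf = build_rosbag_conf("/opt/ros_hadoop/master/dist/HMB_4.bag.idx.bin")`.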

Could you please help me with that?

Thanks,
Sayandeep