The main idea is to launch the Hadoop job with the debugger enabled. We do this by adding some arguments to the HADOOP_OPTS environment variable. We don't want to set these arguments permanently, because every program launched with them pauses until a remote debugger connects, so we'll create a very simple script that launches the job with these options.
IMPORTANT: this worked fine with the standalone configuration. I also tested it in pseudo-distributed mode, but there I had issues placing breakpoints in the Map and Reduce classes.
Create a bash script. You can place it in the same ../bin directory where the ‘hadoop’ script sits. I named it ‘dhadoop’.
#!/bin/bash
# suspend=y makes the JVM wait until a debugger attaches on port 8001
export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8001,server=y,suspend=y"
hadoop "$@"
In this case I am using port 8001, but you can use any value you want. Just remember it so you can connect later =). Don’t forget to change the permissions so you can execute the script; ‘chmod +x dhadoop’ will do the trick.
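If you change the port often, the script can read it from the environment instead of hard-coding it. This is only a minimal sketch: DHADOOP_PORT and the two function names are hypothetical names I made up for illustration, and Hadoop itself knows nothing about them.

```shell
#!/bin/sh
# Sketch of a 'dhadoop' variant with a configurable debug port.
# DHADOOP_PORT is a hypothetical variable name; Hadoop does not read it.

# Print the JDWP agent options for the given port (default 8001).
jdwp_opts() {
    printf '%s' "-Xdebug -Xrunjdwp:transport=dt_socket,address=${1:-8001},server=y,suspend=y"
}

# Export HADOOP_OPTS, then hand every argument to the regular 'hadoop' script.
dhadoop() {
    HADOOP_OPTS="$(jdwp_opts "${DHADOOP_PORT:-8001}")"
    export HADOOP_OPTS
    hadoop "$@"
}
```

To use it as the script itself, add a last line that calls dhadoop "$@", then launch with something like ‘DHADOOP_PORT=9000 dhadoop jar <JAR_FILE> <params>’. Without the variable it falls back to 8001.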
So, now whenever I want to debug a program, I just use ‘dhadoop jar <JAR_FILE> <params>’ to launch the program. I will get a message like this:
Listening for transport dt_socket at address: 8001
It’s waiting for the debugger to connect. Go to your Eclipse IDE and start a remote debugger. This is how to create it:
Go to Debug Configurations and create a new Remote Java Application configuration. Set the port to the one used in the script (8001 here), then start it.
NOTE: It is very important to make sure that you are running the job with the latest build of your source code. If not, the lines where the debugger stops may not match the lines where you put your breakpoints. Remember that the source code you are editing in Eclipse is not automatically deployed, so if you change it, build again and use the latest jar to launch the job you want to debug.
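A quick way to catch a stale jar before launching is to compare file timestamps. The sketch below assumes the sources are plain .java files under a single directory; the function name jar_is_stale and the paths in the usage line are hypothetical, so adjust them to your project.

```shell
#!/bin/sh
# Sketch: detect whether the job jar is older than the sources.
# Arguments: $1 = path to the jar, $2 = source directory.

# Returns 0 (stale) when the jar is missing or when at least one
# .java file under the source directory is newer than the jar.
jar_is_stale() {
    [ -f "$1" ] || return 0
    [ -n "$(find "$2" -name '*.java' -newer "$1" -print 2>/dev/null | head -n 1)" ]
}
```

For example, ‘jar_is_stale myjob.jar src && echo "Rebuild the jar before debugging"’ prints the warning only when a rebuild is needed. It only looks at modification times, so it will not notice a jar that was built from a different branch.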
Hope you find this useful.