- Machine learning material
- Geoff Hinton’s course on Neural Networks for Machine Learning @ Coursera.
- Hugo Larochelle’s videos on neural networks and deep learning.
- Yoshua Bengio’s paper on Practical recommendations for gradient-based training of deep architectures.
- Chapters of the upcoming Deep Learning book (including list of references).
- Yoshua Bengio’s 2009 book on Learning deep architectures for AI (printer-friendly version).
- A more recent review paper on representation learning, by Yoshua Bengio, Aaron Courville & Pascal Vincent.
- Course notes from IFT6266 H12.
- Programming, Computing, and Data
- Python Tutorial
- Numerical computation in python: Numpy Tutorial
- Compiling numerical expressions to C & GPU: Theano (do the tutorial)
- U. Montreal’s machine learning lab (LISA) and its computing infrastructure
- Ian’s slides on LISA resources: ift6266h13_computing_resources
- Launching jobs with Jobman
- Razvan Pascanu’s introductory slides on Theano
- Kaggle site for ML competitions
- Razvan Pascanu’s ipython notebook demos of Theano
- Pylearn2 tutorials
Nice trick suggested by Hugo Larochelle to view YouTube videos 1.5x faster: click on the settings button and select 1.5 (see the image at http://www.iro.umontreal.ca/~bengioy/ift6266/H14/youtube1.5.tiff). Note that this requires the HTML5 version of the player (see more on this at http://www.youtube.com/html5).
You can also go 2x faster, but for some reason they decided to set the limit there :D
As I am not a registered UdeM student, I don’t think I can use the wifi there. Will I need it?
Even if I don’t necessarily need it, is there a way that I can access it anyway? (I like to look things up sometimes in class).
Internet access will be helpful, but not strictly necessary, for following along during tomorrow’s lecture; for subsequent lectures it may be more convenient if you do have it. Signing up for eduroam through your home institution should allow you to access the eduroam wireless at UdeM.
In case anyone else is a fan of IPython and the IPython Notebook, I started to reproduce the Theano tutorials in a collection of notebooks.
https://www.dropbox.com/sh/d663pavvbydkroc/NuDc_KoWNs
I reproduced them as closely as possible to the tutorial, with some minor modifications here and there.
It’s a bit time consuming, so I can’t guarantee I’ll do the whole tutorial, but I’ll keep going as long as it’s reasonable for me to do so. Hope this helps.
P.S.: I started the notebook server with
ipython notebook --pylab inline
Where should I go to get access to the LISA lab? I tried ssh’ing to both elisa1.iro.umontreal.ca and frontal07.iro.umontreal.ca with my DGIT credentials but they’re denied.
It was not possible for me to go to the seminar on LISA last week. Thanks
Did you access this page to generate your DIRO login credentials? https://www.iro.umontreal.ca/cgi-bin/motdepasse/motdepasse.cgi.
Your DGTIC credentials won’t work on the DIRO machines. If you have the code for the link Joao-Felipe posted, give that a try. If you don’t have a code, I believe the person you want to talk to is Bernard Derval, local 3221 at Pavillon André-Aisenstadt.
When you have access to the DIRO computers, you will probably want at
some point to use our cluster. Plan some time to learn how to use it.
The instructions are here:
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/BramsUserGuide
You can use the space in that directory: /data/lisatmp/ift6266h14
Verify that you have access to it. Create a subfolder with your login
and put your personal files in there:
mkdir /data/lisatmp/ift6266h14/$USER
We do not have write permission in that directory. The group permissions are just read and execute. Can you change it?
Don’t write in this directory. Write any preprocessed versions in:
/data/lisatmp/ift6266h14
I fixed the permission.
I want to explore the data with ipython on elisa1 but find myself struggling with speed issues (limited by RAM most likely). Should I make a copy of the data over, say, bart1 and play with it there?
Doing anything too CPU intensive on elisa1 is a good way to get yelled at by the admins. Using the instructions posted by Fred above, you can launch interactive jobs with jobdispatch --interactive. This will work just fine with an IPython terminal session, but things may get dicey with X forwarding and plotting, so I recommend using the notebook.
If you want to run a notebook server this way, you can set it up to accept connections from any host with “ipython notebook --port= --ip=*”. What you would probably want to do is launch that jobdispatch in a screen session, detach and exit, then ssh -L [local port]:[host where condor job is running]:[remote ipython port] [username]@elisa1.iro.umontreal.ca (then ssh to maggie46 or wherever you launched the job from and reattach to keep the log output visible).
I’m struggling a bit here. From maggie46 (in my /data/lisatmp/ift6266h14 directory) I run:
jobdispatch “ipython notebook --port=8765 --ip=*”
This does seem to start a job, but from looking at the log file that jobdispatch tells me to look at, it seems to terminate immediately.
Hi, during the tutorial of pylearn2 I got:
C:\Users\Benj\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\datasets\preprocessing.py:843: UserWarning: This ZCA preprocessor class is known to yield very different results on different platforms. If you plan to conduct experiments with this preprocessing on multiple machines, it is probably a good idea to do the preprocessing on a single machine and copy the preprocessed datasets to the others, rather than preprocessing the data independently in each location.
warnings.warn("This ZCA preprocessor class is known to yield very "
Why does this happen, please?
It looks like you’re just getting one of the warnings printed out. Pylearn2 prints many warnings to make users aware of things that do not actually cause an error, but may affect their results.
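To see concretely that a UserWarning is just a notice and not an error, here is a minimal stdlib sketch mimicking the warnings.warn call from preprocessing.py (message shortened); execution continues normally after the warning:

```python
import warnings

# Record warnings instead of printing them, to show that warning a user
# does not interrupt the program the way an exception would.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("This ZCA preprocessor class is known to yield very "
                  "different results on different platforms.", UserWarning)

# We reach this point normally; the warning was merely recorded.
message = str(caught[0].message)
```

If you want to silence a specific warning, the same module’s filterwarnings function can do it, but it is usually better to just read the message and decide whether it affects your experiment.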
Is there a TIMIT class for pylearn2?
Hi David,
there is no official support, but I wrote some hackish classes that may be useful for the project. All you need to do is make a subclass of DenseDesignMatrix that populates at least the attributes X and y, X being a numpy NxM array (N examples, M features) and y an NxC array (N examples, C outputs).
I will push my code to GitHub sometime this weekend so you’ll be able to use it as a reference.
I cleaned up the code and pushed it to GitHub. You’ll need to install NLTK to use it as one of the classes is just an extension of NLTK’s TIMIT class. The classes in timit_dataset.py are TimitFrameData (for the “predict next sample” problem) and TimitPhoneData (for the “phone classification” problem). I hope somebody else finds them useful.
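For anyone framing the data themselves, the windowing behind the “predict next sample” setup can be sketched in plain Python (make_frames is a hypothetical helper for illustration, not the code on GitHub; you would convert the lists to numpy arrays and hand them to your DenseDesignMatrix subclass as X and y):

```python
def make_frames(signal, frame_len):
    # Build (X, y) for the "predict next sample" task: each row of X is
    # frame_len consecutive samples, and the matching row of y is the
    # single sample that follows the frame.  X ends up N x M and y N x 1,
    # matching the N examples / M features layout DenseDesignMatrix expects.
    X, y = [], []
    for i in range(len(signal) - frame_len):
        X.append(signal[i:i + frame_len])
        y.append([signal[i + frame_len]])
    return X, y

# A toy waveform of 5 samples with frame length 3 yields 2 examples.
X, y = make_frames([1, 2, 3, 4, 5], 3)
```

For the phone-classification setup, y would instead be an N x C array of phone targets, but the framing of X is the same idea.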
Joao, I’m trying to use the TimitFullCorpusReader from your GitHub. I can’t get it to load the data: I point it to timit/raw/TIMIT/TRAIN/DR1 but nothing loads, I think because some regexes aren’t matching. Do you have any tips on getting it to load the data with that class?
Hi David,
sorry for the lack of documentation. You have to point it to the absolute path of the root folder (which would be /timit/raw/TIMIT). The class also needs all the {PHN,TXT,WRD} files and the DOC folder to be under the root. You can get that by copying all the readable .wav files over the raw ones. Let me know if that works for you.
Could you maybe simply upload your modified timit/raw/TIMIT directory that can be read with that class to a globally readable folder on the iro network?
Hi David,
can you check if the folder /data/lisatmp/ift6266h14/santosjf/TIMIT is readable for you? It should be as long as you are part of the gift6266 group.
Are there any computers with GPUs at LISA that we have permission to use for the project?
(I’m doing the pylearn2 MLP tutorial so that I can use it as a model for a first attempt at synthesis. It recommends running the code on a computer with a GPU. I’m expecting the synthesis model will be bigger than the MNIST model from the tutorial.)
Another question: where is pylearn2 installed on the cluster? (i.e. where are train.py etc)
You’ll have to clone your own copies of both Theano and pylearn2. These projects move fast enough that at the lab we leave it up to the users to manage them. This way, unexpected updates don’t cause surprises mid-project (the individual user/student chooses when to update Theano or pylearn2).
I can’t seem to reply to your reply, but as far as jobdispatch and IPython go, you need to use jobdispatch --interactive.
Even if I’m starting a notebook rather than a command line session? OK, I’ll give it a try later, thanks.
I cannot run Theano-based jobs on the cluster. Whenever I try, it fails and I get this message in the error log:
File "/data/lisatmp/ift6266h14/santosjf/lib/python2.7/site-packages/Theano-0.6.0-py2.7.egg/theano/gof/cmodule.py", line 1980, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: Compilation failed (return status=1): g++: error trying to exec 'cc1plus': execvp: No such file or directory.
I installed Theano to a local folder (in /data/lisatmp/ift6266h14/) and set up PYTHONPATH accordingly (I am also passing PYTHONPATH to jobdispatch via the --env parameter). Theano seems to work properly when I run it from maggie46. What may be causing this problem?
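For what it’s worth, a quick stdlib sanity check (the path below is hypothetical) shows whether a PYTHONPATH entry reaches a child interpreter, which is what --env should achieve on the worker node:

```python
import os
import subprocess
import sys

# Launch a child Python with PYTHONPATH set and inspect its sys.path.
# Nonexistent directories still show up in sys.path, so this only checks
# propagation of the variable, not that the directory is valid.
env = dict(os.environ)
env["PYTHONPATH"] = "/data/lisatmp/ift6266h14/example-libs"  # hypothetical
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.path)"], env=env)
```

If the entry appears in the child’s sys.path there, the environment side is fine and the failure is on the compiler toolchain of the worker node.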
This seems like an error on our side. I’ll ask Frédéric to take a look.
Is it normal that you can’t access your home directory when on the cluster?
I am having a different compilation problem when on brams. If I run the pylearn2 train.py script on the MLP tutorial yaml file I get:
Problem occurred during compilation with the command line below:
g++ [snip] -lamdlibm
/tmp/belius/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-fedora-14-Laughlin-x86_64-2.7.0-64/tmpoHsr5V/mod.cpp:6:21: fatal error: amdlibm.h: File does not exist.
On maggie46 this doesn’t happen; the train.py script runs fine.
Maybe the reason it doesn’t happen on maggie46 is that on that machine amdlibm.h exists under /opt/lisa/os/include/, while on the brams cluster machine I’m assigned by jobdispatch it does not.
As far as home directories go, yes, I believe this is to make sure that the home directory server is not brought down by cluster jobs hammering it.
For reference, I asked Fred about this problem, and the reason it wasn’t working was that I hadn’t run
"if [ -e "/opt/lisa/os/.local.bashrc" ]; then source /opt/lisa/os/.local.bashrc; else source /data/lisa/data/local_export/.local.bashrc; fi"
to set the Theano configuration on my cluster interactive instance.
This hadn’t run, in turn, because I didn’t have access to my home directory (the idea is to put the above in your ~/.bashrc file). To make your home directory work on the cluster you may have to run the kinit command (and then "source ~/.bashrc" to configure Theano, if you put the above shell code in that file).
Could someone confirm that /data is empty on elisa and maggie46? I dispatched a job last night on maggie, my script being under /data/lisatmp/ift6266h14/trembal, but now /data is empty on every server I checked. As if everything was deleted.
Just want to make sure it’s not only me.
When using the Condor cluster to run my tasks, I am not able to run tasks on the GPU using Theano. I switched the configuration to CPU, but now my simulation returns a MemoryError every time I try to run it. The same simulation runs perfectly on a computer with 8 GB of RAM. Is it possible that the task is being scheduled to a computer with less RAM and that’s why this is happening?
Hi Joao,
Are you requesting the amount of memory you need? If not, --mem=8G should do it. jobdispatch --help may also tell you other useful things (--env may be particularly useful).
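To pick a sensible value for --mem, a rough back-of-the-envelope estimate of your arrays helps (the shape below is hypothetical; plug in your own dataset’s dimensions):

```python
def design_matrix_bytes(n_examples, n_features, bytes_per_value=8):
    # One dense float64 design matrix: 8 bytes per value.
    return n_examples * n_features * bytes_per_value

# Hypothetical shape: one million 240-dimensional examples is about 1.8 GB,
# before counting copies made by preprocessing or minibatch buffers.
gb = design_matrix_bytes(1000000, 240) / float(1024 ** 3)
```

Request a comfortable margin above the estimate, since preprocessing passes and Theano’s compiled functions make extra copies.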
Thanks David. In the examples I’ve only seen the --gpu option and I didn’t check the help options.