- Machine learning material
- Geoff Hinton’s course on Neural Networks for Machine Learning @ Coursera.
- Hugo Larochelle’s videos on neural networks and deep learning.
- Yoshua Bengio’s paper on Practical recommendations for gradient-based training of deep architectures.
- Chapters of the upcoming Deep Learning book (including list of references).
- Yoshua Bengio’s 2009 book on Learning deep architectures for AI (printer-friendly version).
- A more recent review paper on representation learning, by Yoshua Bengio, Aaron Courville & Pascal Vincent.
- Course notes from IFT6266 H12.
- Programming, Computing, and Data
- Python Tutorial
- Numerical computation in python: Numpy Tutorial
- Compiling numerical expressions to C & GPU: Theano (do the tutorial)
- U. Montreal’s machine learning lab (LISA) and its computing infrastructure
- Ian’s slides on LISA resources: ift6266h13_computing_resources
- Launching jobs with Jobman
- Razvan Pascanu’s introductory slides on Theano
- Kaggle site for ML competitions
- Razvan Pascanu’s ipython notebook demos of Theano
- Pylearn2 tutorials
Nice trick suggested by Hugo Larochelle to view YouTube videos 1.5x faster: click on the settings button and select 1.5 (see the image at http://www.iro.umontreal.ca/~bengioy/ift6266/H14/youtube1.5.tiff). Note that this requires the HTML5 version of the player (see more on this at http://www.youtube.com/html5).
You can also go 2x faster, but for some reason they decided to set the limit there :D
As I am not a registered UdeM student, I don’t think I can use the wifi there. Will I need it?
Even if I don’t necessarily need it, is there a way that I can access it anyway? (I like to look things up sometimes in class).
Internet access will be helpful, but not strictly necessary, for following along during tomorrow’s lecture; for subsequent lectures it may be more convenient if you do have it. Signing up for eduroam through your home institution should allow you to access the eduroam wireless at UdeM.
In case anyone else is a fan of IPython and the IPython Notebook, I started to reproduce the Theano tutorials in a collection of notebooks.
https://www.dropbox.com/sh/d663pavvbydkroc/NuDc_KoWNs
I reproduced them as closely as possible to the tutorial, with some minor modifications here and there.
It’s a bit time consuming, so I can’t guarantee I’ll do the whole tutorial, but I’ll keep going as long as it’s reasonable for me to do so. Hope this helps.
P.S.: I started the notebook server with
ipython notebook --pylab inline
Where should I go to get access to the LISA lab? I tried ssh’ing to both elisa1.iro.umontreal.ca and frontal07.iro.umontreal.ca with my DGIT credentials but they’re denied.
It was not possible for me to go to the seminar on LISA last week. Thanks
Did you access this page to generate your DIRO login credentials? https://www.iro.umontreal.ca/cgi-bin/motdepasse/motdepasse.cgi.
Your DGTIC credentials won’t work on the DIRO machines. If you have the code for the link Joao-Felipe posted, give that a try. If you don’t have a code, I believe the person you want to talk to is Bernard Derval, local 3221 at Pavillon André-Aisenstadt.
When you have access to the DIRO computers, you will probably want at
some point to use our cluster. Plan some time to learn how to use it.
The instructions are here:
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/BramsUserGuide
You can use the space in that directory: /data/lisatmp/ift6266h14
Verify that you have access to it. Create a subfolder with your login
and put your personal files in there:
mkdir /data/lisatmp/ift6266h14/$USER
We do not have write permission in that directory. The group permissions are just read and execute. Can you change it?
Don’t write in this directory. Write any preprocessed versions in:
/data/lisatmp/ift6266h14
I fixed the permission.
I want to explore the data with ipython on elisa1 but find myself struggling with speed issues (limited by RAM most likely). Should I make a copy of the data over, say, bart1 and play with it there?
Doing anything too CPU intensive on elisa1 is a good way to get yelled at by the admins. Using the instructions posted by Fred above, you can launch interactive jobs with jobdispatch --interactive. This will work just fine with an IPython terminal session, but things may get dicey with X forwarding and plotting, so I recommend using the notebook.
If you want to run a notebook server this way, you can set it up to accept connections from any host with “ipython notebook --port= --ip=*”. What you would probably want to do is launch that jobdispatch in a screen session, detach and exit, then ssh -L [local port]:[host where condor job is running]:[remote ipython port] [username]@elisa1.iro.umontreal.ca (then ssh to maggie46 or wherever you launched the job from and reattach to keep the log output visible).
I’m struggling a bit here. From maggie46 (in my /data/lisatmp/ift6266h14 directory) I run:
jobdispatch “ipython notebook --port=8765 --ip=*”
This does seem to start a job, but from looking at the log file that jobdispatch tells me to look at, it seems to terminate immediately.
Hi, during the tutorial of pylearn2 I got:
C:\Users\Benj\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\datasets\preprocessing.py:843: UserWarning: This ZCA preprocessor class is known to yield very different results on different platforms. If you plan to conduct experiments with this preprocessing on multiple machines, it is probably a good idea to do the preprocessing on a single machine and copy the preprocessed datasets to the others, rather than preprocessing the data independently in each location.
warnings.warn("This ZCA preprocessor class is known to yield very "
Why does this happen, please?
It looks like you’re just getting one of the warnings printed out. Pylearn2 prints many warnings to make users aware of things that do not actually cause an error, but may affect their results.
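To see concretely that a UserWarning is just a notice and not an error, here is a minimal stdlib sketch mimicking the warnings.warn call from preprocessing.py (message shortened); execution continues normally after the warning:

```python
import warnings

# Record warnings instead of printing them, to show that warning a user
# does not interrupt the program the way an exception would.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("This ZCA preprocessor class is known to yield very "
                  "different results on different platforms.", UserWarning)

# We reach this point normally; the warning was merely recorded.
message = str(caught[0].message)
```

If you want to silence a specific warning, the same module’s filterwarnings function can do it, but it is usually better to just read the message and decide whether it affects your experiment.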
Is there a TIMIT class for pylearn2?
Hi David,
there is no official support, but I wrote some hackish classes that may be useful for the project. All you need to do is make a subclass of DenseDesignMatrix that populates at least the attributes X and y, X being a numpy NxM array (N examples, M features) and y an NxC array (N examples, C outputs).
I will push my code to GitHub sometime this weekend so you’ll be able to use it as a reference.
I cleaned up the code and pushed it to GitHub. You’ll need to install NLTK to use it as one of the classes is just an extension of NLTK’s TIMIT class. The classes in timit_dataset.py are TimitFrameData (for the “predict next sample” problem) and TimitPhoneData (for the “phone classification” problem). I hope somebody else finds them useful.
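For anyone framing the data themselves, the windowing behind the “predict next sample” setup can be sketched in plain Python (make_frames is a hypothetical helper for illustration, not the code on GitHub; you would convert the lists to numpy arrays and hand them to your DenseDesignMatrix subclass as X and y):

```python
def make_frames(signal, frame_len):
    # Build (X, y) for the "predict next sample" task: each row of X is
    # frame_len consecutive samples, and the matching row of y is the
    # single sample that follows the frame.  X ends up N x M and y N x 1,
    # matching the N examples / M features layout DenseDesignMatrix expects.
    X, y = [], []
    for i in range(len(signal) - frame_len):
        X.append(signal[i:i + frame_len])
        y.append([signal[i + frame_len]])
    return X, y

# A toy waveform of 5 samples with frame length 3 yields 2 examples.
X, y = make_frames([1, 2, 3, 4, 5], 3)
```

For the phone-classification setup, y would instead be an N x C array of phone targets, but the framing of X is the same idea.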
Joao, I’m trying to use the TimitFullCorpusReader from your GitHub. I can’t get it to load the data: I point it to timit/raw/TIMIT/TRAIN/DR1 but nothing loads, I think because some regexes aren’t matching. Do you have any tips on getting it to load the data with that class?
Hi David,
sorry for the lack of documentation. You have to point it to the absolute path of the root folder (which would be /timit/raw/TIMIT). The class also needs all the {PHN,TXT,WRD} files and the DOC folder to be under the root. You can get that by copying all the readable .wav files over the raw ones. Let me know if that works for you.
Could you maybe simply upload your modified timit/raw/TIMIT directory that can be read with that class to a globally readable folder on the iro network?
Hi David,
can you check if the folder /data/lisatmp/ift6266h14/santosjf/TIMIT is readable for you? It should be as long as you are part of the gift6266 group.
Are there any computers with GPUs at LISA that we have permission to use for the project?
(I’m doing the pylearn2 MLP tutorial so that I can use it as a model for a first attempt at synthesis. It recommends running the code on a computer with a GPU. I’m expecting the synthesis model will be bigger than the MNIST model from the tutorial.)
Another question: where is pylearn2 installed on the cluster? (i.e. where are train.py etc)
You’ll have to clone your own copies of both Theano and pylearn2. These projects move fast enough that at the lab we leave it up to the users to manage them. This way, unexpected updates don’t cause surprises mid-project (the individual user/student chooses when to update Theano or pylearn2).
I can’t seem to reply to your reply, but as far as jobdispatch and IPython go, you need to use jobdispatch --interactive.
Even if I’m starting a notebook rather than a command line session? OK, I’ll give it a try later, thanks.
I cannot run Theano-based jobs on the cluster. Whenever I try, it fails and I get this message in the error log:
File "/data/lisatmp/ift6266h14/santosjf/lib/python2.7/site-packages/Theano-0.6.0-py2.7.egg/theano/gof/cmodule.py", line 1980, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: Compilation failed (return status=1): g++: error trying to exec 'cc1plus': execvp: No such file or directory.
I installed Theano to a local folder (in /data/lisatmp/ift6266h14/) and set up PYTHONPATH accordingly (I am also passing PYTHONPATH to jobdispatch via the --env parameter). Theano seems to work properly when I run it from maggie46. What may be causing this problem?
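For what it’s worth, a quick stdlib sanity check (the path below is hypothetical) shows whether a PYTHONPATH entry reaches a child interpreter, which is what --env should achieve on the worker node:

```python
import os
import subprocess
import sys

# Launch a child Python with PYTHONPATH set and inspect its sys.path.
# Nonexistent directories still show up in sys.path, so this only checks
# propagation of the variable, not that the directory is valid.
env = dict(os.environ)
env["PYTHONPATH"] = "/data/lisatmp/ift6266h14/example-libs"  # hypothetical
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.path)"], env=env)
```

If the entry appears in the child’s sys.path there, the environment side is fine and the failure is on the compiler toolchain of the worker node.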
This seems like an error on our side. I’ll ask Frédéric to take a look.
Is it normal that you can’t access your home directory when on the cluster?
I am having a different compilation problem when on brams. If I run the pylearn2 train.py script on the MLP tutorial yaml file I get:
Problem occurred during compilation with the command line below:
g++ [snip] -lamdlibm
/tmp/belius/theano.NOBACKUP/compiledir_Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-fedora-14-Laughlin-x86_64-2.7.0-64/tmpoHsr5V/mod.cpp:6:21: fatal error: amdlibm.h: File does not exist.
On maggie46 this doesn’t happen; the train.py script runs fine.
Maybe the reason it doesn’t happen on maggie46 is that on that machine amdlibm.h exists under /opt/lisa/os/include/, while on the brams cluster machine I’m assigned by jobdispatch it does not.
As far as home directories go, yes, I believe this is to make sure that the home directory server is not brought down by cluster jobs hammering it.
For reference, I asked Fred about this problem, and the reason it wasn’t working was that I hadn’t run
"if [ -e "/opt/lisa/os/.local.bashrc" ]; then source /opt/lisa/os/.local.bashrc; else source /data/lisa/data/local_export/.local.bashrc; fi"
to set the Theano configuration on my cluster interactive instance.
This hadn’t run, in turn, because I didn’t have access to my home directory (the idea is to put the above in your ~/.bashrc file). To make your home directory work on the cluster you may have to run the kinit command (and then "source ~/.bashrc" to configure Theano, if you put the above shell code in that file).
Could someone confirm that /data is empty on elisa and maggie46? I dispatched a job last night on maggie, my script being under /data/lisatmp/ift6266h14/trembal, but now /data is empty on every server I checked. As if everything was deleted.
Just want to make sure it’s not only me.
When using the Condor cluster to run my tasks, I am not able to run tasks on the GPU using Theano. I switched the configuration to CPU, but now my simulation returns a MemoryError every time I try to run it. The same simulation runs perfectly on a computer with 8 GB of RAM. Is it possible that the task is being scheduled to a computer with less RAM and that’s why this is happening?
Hi Joao,
Are you requesting the amount of memory you need? If not, --mem=8G should do it. jobdispatch --help may also tell you other useful things (--env may be particularly useful).
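To pick a sensible value for --mem, a rough back-of-the-envelope estimate of your arrays helps (the shape below is hypothetical; plug in your own dataset’s dimensions):

```python
def design_matrix_bytes(n_examples, n_features, bytes_per_value=8):
    # One dense float64 design matrix: 8 bytes per value.
    return n_examples * n_features * bytes_per_value

# Hypothetical shape: one million 240-dimensional examples is about 1.8 GB,
# before counting copies made by preprocessing or minibatch buffers.
gb = design_matrix_bytes(1000000, 240) / float(1024 ** 3)
```

Request a comfortable margin above the estimate, since preprocessing passes and Theano’s compiled functions make extra copies.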
Thanks David. In the examples I’ve only seen the --gpu option and I didn’t check the help options.