[lug] Question about Machine Learning

Davide Del Vento davide.del.vento at gmail.com
Fri Mar 12 17:16:48 MST 2010


Folks,
yesterday Tom was asking me (should I say pressing me?) about my claim
that learning and non-learning systems are theoretically equivalent,
but have different implications in practice. Since he wasn't
convinced, after the talk he asked me whether I could find a paper or
other publication on the subject. I tried a little, starting from my
own references, but I couldn't find one.
Now, this argument clearly has two sides:

1 - proof that any learning system can be coded as a non-learning
algorithm. This is trivial: you can simply hard-code into an algorithm
whatever the machine has learnt, and you are done (see the sketch
below).
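
To make that concrete, here is a minimal sketch in Python (the names
and the toy "learner" are all hypothetical, purely for illustration):
a learning phase fits a decision threshold from labelled examples, and
the learnt value is then frozen into an ordinary non-learning function
with the same behaviour.

    def learn_threshold(examples):
        """Toy learner: pick the midpoint between the largest value
        labelled 0 and the smallest value labelled 1."""
        top_zero = max(x for x, label in examples if label == 0)
        low_one = min(x for x, label in examples if label == 1)
        return (top_zero + low_one) / 2.0

    print(learn_threshold([(1.0, 0), (2.0, 0), (4.0, 1), (5.0, 1)]))  # 3.0

    # The same behaviour with the learnt value hard-coded: an
    # equivalent algorithm with no learning machinery left in it.
    LEARNT_THRESHOLD = 3.0

    def classify(x):
        return 1 if x >= LEARNT_THRESHOLD else 0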

2 - the vice versa is more tricky, and Tom's prime-number example is a
good one: an algorithm that is guaranteed to produce the right result
even for unbounded input. The point here is that ML provides a
framework in which the "target functions" can be discovered
automatically (by target function one means the function that, given
the input, provides the output; for example, given 234 it provides the
234th prime number. Of course such a function may have no analytical
expression). A concrete sketch of Tom's example follows below.
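
Here is what I mean, as plain Python: trial division is slow but
provably correct for any n, which is exactly the kind of guarantee
that a finite set of training examples alone cannot give you.

    def nth_prime(n):
        """Return the n-th prime (1-indexed) by trial division:
        slow, but guaranteed correct for unbounded input."""
        primes = []
        candidate = 2
        while len(primes) < n:
            if all(candidate % p != 0 for p in primes):
                primes.append(candidate)
            candidate += 1
        return primes[-1]

    print(nth_prime(234))  # the 234th prime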
It is possible to demonstrate that, given the particular problem one
is interested in and the desired accuracy, there is a number of
examples needed to learn the target function to that accuracy (and
this number of examples can be infinite for certain problems).
This is a theorem about PAC learning:
http://en.wikipedia.org/wiki/Probably_approximately_correct_learning
(the wikipedia page is pretty thin and does not say enough on the
subject). The relevant property of the problem is its
Vapnik-Chervonenkis dimension, and the number of required examples is
infinite if and only if the VC dimension is infinite; a rough form of
the bound is sketched below.
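
To get a feel for the numbers, here is a sketch of the classic
sufficient-sample-size bound, on the order of
(1/eps) * (d * ln(1/eps) + ln(1/delta)) for VC dimension d, accuracy
eps, and confidence 1 - delta. The exact constants vary between
textbooks, so the ones below are illustrative only, an
order-of-magnitude estimate rather than a sharp formula.

    import math

    def pac_sample_bound(vc_dim, epsilon, delta):
        """Rough sufficient number of examples to PAC-learn a class
        of VC dimension vc_dim to error epsilon with probability at
        least 1 - delta (constants are illustrative only)."""
        if math.isinf(vc_dim):
            return math.inf  # infinite VC-dim: no finite sample suffices
        return math.ceil((1.0 / epsilon) *
                         (4 * vc_dim * math.log(1.0 / epsilon) +
                          2 * math.log(1.0 / delta)))

    # e.g. VC dimension 10, 5% error, 95% confidence:
    print(pac_sample_bound(10, 0.05, 0.05))
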
What I don't know (at this time) is whether or not an infinite VC
dimension implies something like NP-hardness or some other form of
practically-you-can't-do-that-with-an-algorithm-either. If it does, my
statement of theoretical equivalence is correct. Otherwise, my
statement is incorrect, that is, some algorithms cannot be learnt from
examples.
I'm stopping here, but I encourage whoever is interested to do some
homework with these keywords and then get back to me for further
discussion, on-list or off-list.

Thanks everybody for listening and asking interesting questions. The
slides should be posted online soon, and they include several links
for practical uses of ML.

Davide
PS: Jeffrey, or somebody else who has access to the list of
subscribers: can you please make sure that Tom receives this message?
If not, can you please forward it to him (I don't have his email
address)? Thanks!

