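Recognizing the Simplest Image Digit CAPTCHAs (Entry Level)
Uses of CAPTCHA recognition
1. Scraping data from other people's websites (JD price information, Dianping user reviews) for less than honorable purposes.
2. 12306.cn: automated train-ticket booking tools need it as well.
This article only covers CAPTCHA images of the following kind: the font and font color never change and the character coordinates are fixed, although the background color varies.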
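Steps:
1. Fetch the CAPTCHA images
2. Convert the images to grayscale and binarize them
3. Build a feature library (here, the string of 0s and 1s obtained by binarizing each pixel)
4. Segment the image
5. Use the feature library to recognize the image
The code below uses Java + HttpClient.
1. Fetch the CAPTCHA images (grab 100 images first)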
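import java.io.File;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
// Company intranet proxy settings; the proxy only lets browsers through
client.getHostConfiguration().setProxy("192.168.2.96", 3128);
List<Header> headers = new ArrayList<Header>();
headers.add(new Header("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"));
client.getHostConfiguration().getParams()
        .setParameter("http.default-headers", headers);
for (int i = 0; i < 100; i++) {
    GetMethod get = new GetMethod(
            "http://www.****.com/getCheckImage");
    try {
        client.executeMethod(get);
        File storeFile = new File("D:\\workspace\\LoadImage\\images\\"
                + String.valueOf(i) + ".jpg");
        FileOutputStream output = new FileOutputStream(storeFile);
        // Read the response body as a byte array and write it to the file
        output.write(get.getResponseBody());
        output.close();
    } finally {
        // Return the connection so the next request can reuse it
        get.releaseConnection();
    }
}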
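2. Convert the images to grayscale and binarize them
Color image: made up of individual pixels, each of which has three values (R, G, B) that define its color.
Grayscale image: every pixel has R = G = B, that is, the three values are equal.
Two methods work well for converting a color image to grayscale: the weighted average and the plain average.
Because the human eye is most sensitive to green and least sensitive to blue, taking a weighted average of the three RGB components gives a more reasonable grayscale image.
The weighted average is f(i,j) = 0.299R(i,j) + 0.587G(i,j) + 0.114B(i,j).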
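As a small worked comparison of the two methods, the sketch below evaluates both formulas for a single pixel. The class and method names (GrayscaleDemo, weightedGray, meanGray) are made up for this illustration and do not appear in the original code.

```java
final class GrayscaleDemo {
    // Weighted average with the coefficients quoted above.
    static double weightedGray(int r, int g, int b) {
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Plain arithmetic mean of the three channels.
    static double meanGray(int r, int g, int b) {
        return (r + g + b) / 3.0;
    }

    public static void main(String[] args) {
        // Under the weighted formula pure green comes out brighter than
        // pure blue; the plain mean rates both colors the same.
        System.out.println("green weighted: " + weightedGray(0, 255, 0));
        System.out.println("blue  weighted: " + weightedGray(0, 0, 255));
        System.out.println("green mean:     " + meanGray(0, 255, 0));
        System.out.println("blue  mean:     " + meanGray(0, 0, 255));
    }
}
```

The weights sum to 1, so a pure white pixel still maps to 255 and the result always stays within the 0-255 range.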
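The CAPTCHA image is 18*45: 18 pixels high and 45 pixels wide.
Read the RGB value of every pixel and convert it to grayscale with the weighted method: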
// The backslash in the Windows path must be escaped
BufferedImage image = toBufferedImage(new ImageIcon("c:\\a.jpg").getImage());
int height = image.getHeight();
int width = image.getWidth();
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        Color color = new Color(image.getRGB(x, y));
        // Grayscale conversion (weighted average)
        double greyvalue = 0.299 * color.getRed() + 0.587 * color.getGreen() + 0.114 * color.getBlue();
    }
}
// Return the contents of the image as a BufferedImage
public static BufferedImage toBufferedImage(Image image) {
    if (image instanceof BufferedImage) {
        return (BufferedImage) image;
    }
    // Determine if the image has transparent pixels; for this method's
    // implementation, see e661 Determining If an Image Has Transparent Pixels
    boolean hasAlpha = hasAlpha(image);
    // Create a buffered image with a format that's compatible with the screen
    BufferedImage bimage = null;
    GraphicsEnvironment ge = GraphicsEnvironment
            .getLocalGraphicsEnvironment();
    try {
        // Determine the type of transparency of the new buffered image
        int transparency = Transparency.OPAQUE;
        if (hasAlpha) {
            transparency = Transparency.BITMASK;
        }
        // Create the buffered image
        GraphicsDevice gs = ge.getDefaultScreenDevice();
        GraphicsConfiguration gc = gs.getDefaultConfiguration();
        bimage = gc.createCompatibleImage(image.getWidth(null), image.getHeight(null), transparency);
    } catch (HeadlessException e) {
        // The system does not have a screen
    }
    if (bimage == null) {
        // Create a buffered image using the default color model
        int type = BufferedImage.TYPE_INT_RGB;
        if (hasAlpha) {
            type = BufferedImage.TYPE_INT_ARGB;
        }
        bimage = new BufferedImage(image.getWidth(null), image.getHeight(null), type);
    }
    // Copy image to buffered image
    Graphics g = bimage.createGraphics();
    // Paint the image onto the buffered image
    g.drawImage(image, 0, 0, null);
    g.dispose();
    return bimage;
}
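toBufferedImage calls a hasAlpha helper that the article does not list. The following is a minimal sketch of one way such a helper could be written with the standard java.awt.image.PixelGrabber class. The wrapper class name AlphaCheck is hypothetical, and this is not necessarily the helper the author used.

```java
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.PixelGrabber;

final class AlphaCheck {
    // Returns true if the image's color model carries an alpha channel.
    static boolean hasAlpha(Image image) {
        // A BufferedImage exposes its color model directly.
        if (image instanceof BufferedImage) {
            return ((BufferedImage) image).getColorModel().hasAlpha();
        }
        // Otherwise grab a single pixel; that is enough to obtain the color model.
        PixelGrabber pg = new PixelGrabber(image, 0, 0, 1, 1, false);
        try {
            pg.grabPixels();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        ColorModel cm = pg.getColorModel();
        // Assumption: if no color model could be obtained, treat the image as opaque.
        return cm != null && cm.hasAlpha();
    }
}
```

For the JPEG CAPTCHAs in this article the result would normally be false, since JPEG has no alpha channel.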
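Part of the grayscale data for the CAPTCHA image above; the values shown in red make up the digit "1" in the image. A pattern is visible: pixels that belong to a digit have grayscale values below 150, while all other pixels are above 150. Binarization is therefore very simple: pixels above 150 are set to 0 and pixels below 150 are set to 1.
With the 0s left blank, the binarized result is:
[Figure: binarized CAPTCHA with the 0s hidden]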
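The thresholding rule can be sketched as follows. The class Binarizer and its methods are illustrative names, not part of the original code. The text does not say what happens to a value of exactly 150, so this sketch assumes it counts as background (0).

```java
final class Binarizer {
    // Threshold taken from the sample data described in the text.
    static final int THRESHOLD = 150;

    // Maps each grayscale value to 1 (digit pixel) or 0 (background).
    // Assumption: a value of exactly 150 is treated as background.
    static int[][] binarize(double[][] gray) {
        int[][] bits = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            bits[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                bits[y][x] = gray[y][x] < THRESHOLD ? 1 : 0;
            }
        }
        return bits;
    }

    // Renders the matrix with the 0s left blank, one text line per pixel row.
    static String render(int[][] bits) {
        StringBuilder sb = new StringBuilder();
        for (int[] row : bits) {
            for (int bit : row) {
                sb.append(bit == 1 ? '1' : ' ');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Printing render(binarize(gray)) for a grayscale matrix gives a quick visual check that the threshold separates the digits from the background.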
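3. Build the feature library (here, the string of 0s and 1s obtained by binarizing each pixel)
Because the font and size in this CAPTCHA never change, the four digits always sit at the same positions in the image. Once the image width and height are known, the pixels can be read directly starting from fixed coordinates.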
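for (int i = 0; i < 4; i++) {                  // the four digits
    for (int row = 4; row < 14; row++) {       // 10 pixel rows per digit
        for (int col = 5 + i * 10; col < 12 + i * 10; col++) { /* read the binarized pixel at (col, row); 7 columns per digit */ }
    }
}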
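Building the feature library from the binarized results of the ten digits 0-9 gives the following (each digit occupies 10*7 pixels, that is, 10 rows by 7 columns).
static int zimo[][] = { // 10 rows of 70 values each; from top to bottom the digits 0, 1, 2, ...; only the first rows are shown, the rest are omitted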
{0,0,1,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,1,1,
0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,1,1,0,0},
{0,0,1,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,
0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,1,1,1,1,0},
{0,1,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,
0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,1,0},
};
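One row of such a table could be produced from a binarized sample image as sketched below. The offsets (rows 4 to 13, 7 columns starting at column 5, 10 pixels between digits) are the ones used in the loop above. The class GlyphExtractor, its methods, and the [row][col] layout of the input matrix are assumptions made for this illustration.

```java
final class GlyphExtractor {
    static final int TOP = 4;          // first pixel row of every digit
    static final int BOTTOM = 14;      // one past the last pixel row
    static final int LEFT = 5;         // first pixel column of digit 0
    static final int GLYPH_WIDTH = 7;  // columns per digit
    static final int STRIDE = 10;      // horizontal distance between digits

    // Reads digit number i (0..3) from a binarized image indexed as
    // binary[row][col] and flattens it row by row into 70 values.
    static int[] extract(int[][] binary, int i) {
        int[] pix = new int[(BOTTOM - TOP) * GLYPH_WIDTH];
        int n = 0;
        for (int row = TOP; row < BOTTOM; row++) {
            int start = LEFT + i * STRIDE;
            for (int col = start; col < start + GLYPH_WIDTH; col++) {
                pix[n++] = binary[row][col];
            }
        }
        return pix;
    }

    // Formats the values as one source-code row, e.g. "{0,1},".
    static String toRow(int[] pix) {
        StringBuilder sb = new StringBuilder("{");
        for (int k = 0; k < pix.length; k++) {
            if (k > 0) {
                sb.append(',');
            }
            sb.append(pix[k]);
        }
        return sb.append("},").toString();
    }
}
```

Running toRow(extract(binary, i)) on a sample whose digits are known yields a line that can be pasted into the table at the index of that digit.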
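4. Segment the image
Omitted. For the simplest CAPTCHAs of this kind the step can be skipped: the digit positions never change, so the pixels can be read directly from fixed coordinates.
5. Use the feature library to recognize the image
For a new CAPTCHA, read the RGB values of each digit (7*10 pixels), convert them to grayscale and binarize them, then compare the result with every entry in the feature library. The digit whose entry differs the least is the recognized digit.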
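// Returns the digit whose template differs least from pix (70 binarized pixels)
private static int getMatchNum(int[] pix) {
    int result = -1;
    // Smallest difference so far; 100 is above the maximum possible value of 70
    int temp = 100;
    int x;
    for (int k = 0; k <= 9; k++) {
        // Count the pixels that differ from the template of digit k
        x = 0;
        for (int i = 0; i < 70; i++) {
            x = x + Math.abs(pix[i] - zimo[k][i]);
        }
        if (x == 0) {
            // Exact match, no need to look further
            result = k;
            break;
        } else if (x < temp) {
            temp = x;
            result = k;
        }
    }
    return result;
}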
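Putting the steps together, reading all four digits of a binarized CAPTCHA might look like the sketch below. It takes the template table as a parameter instead of using the static zimo array, so that it stays self-contained. The class CaptchaReader, its method names, and the [row][col] matrix layout are assumptions for this illustration rather than the author's code.

```java
final class CaptchaReader {
    static final int DIGITS = 4;   // digits per CAPTCHA
    static final int TOP = 4;      // first pixel row of every digit
    static final int HEIGHT = 10;  // pixel rows per digit
    static final int LEFT = 5;     // first pixel column of digit 0
    static final int WIDTH = 7;    // pixel columns per digit
    static final int STRIDE = 10;  // horizontal distance between digits

    // Index of the template with the fewest differing pixels, or -1 if
    // the template table is empty. Ties go to the lower index.
    static int match(int[] pix, int[][] templates) {
        int best = -1;
        int bestDiff = Integer.MAX_VALUE;
        for (int k = 0; k < templates.length; k++) {
            int diff = 0;
            for (int i = 0; i < pix.length; i++) {
                diff += Math.abs(pix[i] - templates[k][i]);
            }
            if (diff < bestDiff) {
                bestDiff = diff;
                best = k;
            }
        }
        return best;
    }

    // Reads the four fixed-position digits from a binarized image
    // indexed as binary[row][col] and returns them as a string.
    static String recognize(int[][] binary, int[][] templates) {
        StringBuilder sb = new StringBuilder();
        for (int d = 0; d < DIGITS; d++) {
            int[] pix = new int[HEIGHT * WIDTH];
            int n = 0;
            for (int row = TOP; row < TOP + HEIGHT; row++) {
                int start = LEFT + d * STRIDE;
                for (int col = start; col < start + WIDTH; col++) {
                    pix[n++] = binary[row][col];
                }
            }
            sb.append(match(pix, templates));
        }
        return sb.toString();
    }
}
```

With the complete ten-row feature library passed as templates, recognize would return the four-digit string for one image. Choosing the smallest difference rather than requiring an exact match lets a few noisy pixels pass without breaking recognition.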



