123
 123

Tip: 看不到本站引用 Flickr 的图片? 下载 Firefox Access Flickr 插件 | AD: 订阅 DBA notes --

2010-03-05 Fri

18:15 简化SQLULDR2的命令行选项设置 (5435 Bytes) » AnySQL.net

    看到Kamus对SQLULDR2的留言后, 破有感触. 人们应当比较关注, 他们想要的功能用起来方便是否, 关键并不在于功能的多少. 而SQLULDR2的众多的命令行选项, 也确实有些让人发晕, 包括我自已.

    为了方便大多数人使用, 简化了SQLULDR2的命令行帮助, 简化到如下所示.

SQL*UnLoader: Fast Oracle Text Unloader (GZIP), Release 3.0.1
(@) Copyright Lou Fangxin (AnySQL.net) 2004 - 2010, all rights reserved.

Usage: SQLULDR2 keyword=value [,keyword=value,...]

Valid Keywords:
  user    = username/password@tnsname
  sql    = SQL file name
  query  = select statement
  field  = separator string between fields
  record  = separator string between records
  read    = set DB_FILE_MULTIBLOCK_READ_COUNT at session level
  sort    = set SORT_AREA_SIZE at session level (UNIT:MB)
  hash    = set HASH_AREA_SIZE at session level (UNIT:MB)
  array  = array fetch size
  rows    = print progress for every given rows (default, 1000000)
  file    = output file name(default: uldrdata.txt)
  log    = log file name, prefix with + to append mode
  text    = output type (MYSQL, CSV, MYSQLINS, ORACLEINS, FORM, SEARCH).
  parfile = read command option from parameter file

  for field and record, you can use '0x' to specify hex character code,
  \r=0x0d \n=0x0a |=0x7c ,=0x2c, \t=0x09, :=0x3a, #=0x23, "=0x22 '=0x27

    对于专家而言, 可以用如下方式得到以前全部的命令行选项.

sqluldr2 help=yes

    通过引入一个TEXT选项, 来针对不同格式的导出进行相关选项的设置, 不仅方便了大家使用, 也可以对SQLULDR2的功能有一个很直接的了解, 例如SQLULDR2可以导出数据给MySQL用, 或导出成Excel可以打开的标准CSV文件, 或是生成MySQL和ORACLE上的INSERT语句, 也可以按列显示记录, 或为一些特殊的搜索程序生成数据源.

    再次感谢Kamus的好建议, 今年是支付宝的用户体验年, 应当从用户角度进行反思.

Relative Posts:

12:16 Blogrotate #18: The Weekly Roundup of News for System Administrators (7433 Bytes) » The Pythian Blog

Is it Friday already? Where does the time go? Lots of stuff going on this week–here’s a few of the things that I found interesting.

Operating Systems

Russia Today-TV announced the existence of “Red Star”, the new OS developed in North Korea and based on Linux. I found this by way of Slashdot of course, citing the source as The Korea herald. According to the article it looks very much like the Windows UI, and features a “My Country” icon that allows connection to Korea’s closed internet-like network and the Woori office application. Slightly more information can be found there in the article N. Korea develops own OS.

The upcoming Ubuntu 10.04 (Lucid Lynx) has had the third alpha version released. The alpha versions are not for the faint of heart, and I have had to put in several bug reports myself already. It’s just part of the process though and will hopefully result in a rock solid release next month. The full release announcement can be found on the mailing list: see Lucid Alpha 3 released. A rundown of some of the things you can expect is on The H Online in Canonical releases Ubuntu 10.04 LTS Alpha 3.

Ars Technica has more on an item mentioned last week about the future of Open Solaris. OpenSolaris not dead; might not get all new Solaris features The exact details of which features may be excluded are not clear from the article but I did not get a chance to dig into some of the source material referenced so you may find more by clicking through them.

Internet

Opera 10.5 is out and Ars Technica has a detailed review of the new features, performance improvements and UI changes. There’s a lot of detail in the article, I suggest you read it for yourself. Hands on: Opera 10.50 makes impressive performance gains is the place to go. I think I’ll give the new Opera a try myself.

Local company makes good! I mentioned last week the takedown of the Waledac botnet, this week the Mariposa botnet suffered a similar fate. Ottawa’s own Defence Intelligence detected this botnet in 2009 and started an investigation that ended with the dismantling of the botnet by Spanish authorities. This is a much more important takedown than Waledac because Mariposa actually put sensitive and financial information at risk. You can read more of this story in Spanish arrests mark the end of dangerous botnet. Ottawa, represent!

Virtualization

Microsoft has released new drivers to improve the performance of Red Hat Enterprise Linux systems running under Hyper-V. The new drivers include new network and storage drivers, but still only support a single virtual CPU. These drivers are not officially included in, or supported by Red Hat, but they are fully support by Microsoft. Read Microsoft releases Hyper-V drivers for RHEL for more, and you can check out Microsoft’s side of the story on the technet blog in Red Hat Enterprise Linux and Hyper-V.

How Server Virtualization Impacts Storage. The title says it all. Penned by Heidi Biggar from Hitachi it has a good perspective on current industry trends and growth. We all know that consolidation methods, like virtualization, add bottlenecks on resources like CPU, memory and storage because they are all shared by many instances. Find out what Heidi has to say on the subject.

Data Center

Data Center Knowledge is reporting that the demand for data centers has been growing continually and no change is in sight for 2010. We’re certainly seeing the trend here with many companies making the push for full, real disaster recovery sites. There seems to be a real shortage of data center space and new ones are popping up slowly, so if you have lots of cash this may be a good business to get into. But I digress. The full story is in Strong Data Center Demand Seen for 2010.

Security

The H Online has a good article about the US government’s new cyber security policy. I haven’t had a chance to review the whole document as yet, but I have it bookmarked for later reading. US government publishes parts of its cyber security directive is where to go, which also links the source document.

Hardware

Tom’s Hardware has an interesting Op-Ed piece on the past, present and future of the 3D graphics industry. See Opinion: AMD, Intel, And Nvidia In The Next Ten Years for 10 pages of goodness.

Allyn Malventano has a very detailed review of Western Digital’s first foray into the SSD market. The new 256GB offering performed well through the tests but is still priced a bit too high for the market. The full review is at PC Perspective, see Western Digital SiliconEdge Blue 256GB SSD Review – WD enters the SSD market. You can find the Western Digital product information page on their site.

Well that about wraps it up for this edition. Comments are always welcome. Until next time, may your days be productive and the levels of administratium be non-toxic.

09:26 Advert – ODTUG (1 Bytes) » Oracle Scratchpad
09:22 Log Buffer #181: a Carnival of the Vanities for DBAs (1060 Bytes) » The Pythian Blog

The 181st edition of Log Buffer has been published by Gary Myers on his Sydney Oracle Lab.

Having recently moved his blog, Gary approached the Log Buffer coordinator to volunteer for an edition because he knows that, with LB being a popular and established destination in the database blogoshphere, it would help him broadcast his new blog and welcome readers to it. You can do it too–simply send an email to the Log Buffer coordinator.

Here’s Gary’s Log Buffer #181.

08:52 On the Perils of Importing Remote Tags in Git (3738 Bytes) » The Pythian Blog

So mothers keep your hackers at home
Don’t let them journey all alone
Tell them this world is full of danger
And to shun the repositories of strangers

– The Tag Set of Strangers,
(with apologies to) Nick Cave and the Bad Seeds

One of the things I love about Git is how I can add branches from remote repositories in mine at will without fearing messing up anything. The remote branches will not clash with mine, even if they share the same names, because they are referenced as repository/branch. However, as for anything else, you can still poke yourself in the eye if you try hard enough:

$ git remote
bob

$ git checkout -b bob/baroque
Switched to a new branch 'bob/baroque'

$ git fetch bob
From ../bob
* [new branch]      baroque    -> bob/baroque

$ git checkout bob/baroque
warning: refname 'bob/baroque' is ambiguous.
Already on 'bob/baroque'

Here, I created a local branch called bob/baroque, which will end up having the same name as the branch baroque imported from Bob’s repository. Confusing, but not the end of the world. I can still see the different branches with gitk and access the local and remote branches via git checkout remotes/bob/baroque and git checkout remotes/bob/baroque.[1] The lesson to take from this, of course, is simply not to use slashes in branch names and sidestep the whole issue.

[1] The fiendish-minded reader probably wonders at this point what would happen if I were to create another local branch called remotes/bob/baroque. I would deserve to be shot, that’s what would happen.

With tags, surprisingly, the matter is much more prickly. Not only are tags not kept in the ”namespace” of their repository of origin, but git fetch has a very dangerous default behavior:

$ git log -n 1 somework
commit 483d008c6207554236232fef4e8cd22cfb4b9bb8
Author: Yanick Champoux <yanick@babyl.dyndns.org>
Date:   Wed Mar 3 21:14:43 2010 -0500

    some work on my repo

$ git fetch --tags bob
From ../b
- [tag update]      somework   -> somework

$ git log -n 1 somework
commit 5f7f8eddd2d44e359fe8bc0d1a2f1642d073cad9
Author: Yanick Champoux <yanick@babyl.dyndns.org>
Date:   Wed Mar 3 21:15:25 2010 -0500

    some work from Bob

Yup, that’s right, if there is a conflict, fetch --tags will silently clobber the local tags with their remote counterparts. Hope you remember all the commits you painstakingly tagged in the last six months… This behavior is so mind-bogglingly dangerous that, to this day, I wonder if I’m not missing something obvious.

Now, granted, it’s fairly rare to import tags from remote repositories. But there are instances, like when adding the gitpan history of your module to your repository, where it’s relevant. In those cases, for your own sake, make sure that the remote tags won’t clash with yours. Even better, don’t use --tags at all. Instead, do something akin to:

$ git ls-remote --tags bob | \
   perl -nae'$F[1]=~s#refs/tags#bob#; `git tag $F[1] $F[0]`'

This will name remote tags using the repository/tag convention. Also, if the tag already exist, Git will complain and it won’t be clobbered by the new version.

08:37 Offline and Production Notes on MongoDB (6085 Bytes) » MyNoSQL

Last week has featured two of the most interesting posts about MongoDB: first coming from Mathias Meyer (@roidrage) ☞ offline investigation of MongoDB and the second, a set of notes from running MongoDB in production published on ☞ Boxed Iced blog.

If you are interested in getting started with MongoDB, I’d encourage you take the time to go through Mathias’ post which covers the following aspects (I’ve also included a couple of comments)

  • collections and capped collections

    Note: I couldn’t really understand the usage of namespaces and the implication on indexes

  • data format
  • references

    Note: I’d also strongly suggest taking a look at MongoDB documentation on ☞ schema design for more details

  • indexes
  • updates
  • querying
  • durability

    Note: we have covered before the MongoDB durability tradeoff in much detail

  • replication
  • caching
  • backup
  • storage
  • concurrency

    Note: I’d really appreciate more details on this topic as it is not completely clear if all access (both read and writes) is serialized or just writes are serialized (or not?); also the impact on indexes is not clear either.

  • memory
  • GridFS
  • protocol access

    Note: We argued before that access protocols are extremely important. MongoDB is one of the NoSQL solutions using a proprietary protocols and tries to “compensate” for that with tons of libraries

  • sharding

    Note: probably biased, but I still wait for the moment MongoDB sharding would become at least beta.

MapReduce support seems to be missing from Mathias notes, but luckily we have that covered for you: MongoDB MapReduce tutorial.

While keeping in mind that some of these features are not unique to MongoDB and can be found in other systems, you should be ready to cross check your app requirements with the lessons learned by the guys at Boxed Ice:

  • namespace limits

    We split our customers across (currently) 3 MongoDB databases because there is a namespace limit of 24,000 per database. This is essentially the number of collections + number of indexes.

  • initial sync/replication of large databases

    Our databases are very large and it takes about 48-72 hours to fully sync all our current data onto a new slave in a different DC (via a site-to-site VPN for security). During this time you’re at risk because the slave is not up to date.

  • initial sync “slows” things

    When doing a fresh sync from a master to a slave, we have observed a “slowdown” in our application response times.

  • index creation blocks

    However, if you have an existing collection and create a new index on it then that process will block the database until the index is created.

  • efficiency of reclaiming diskspace

    We have found that there is a massive discrepancy between a master and a freshly copied slave.

Even if not every application will have to deal with the size Boxed Ice is dealing, I couldn’t stop noticing that parts of the process of scaling MongoDB were really painful. Or as Sergio Bossa put it in ☞ one of the comments:

Anyways, it seems indeed you had almost the same problems you would had with a MySQL solution:

  • Huge data to deal with.
  • Manual sharding.
  • Sync/replication delays.

So why didn’t you evaluate to switch to a more “large-scale” nosql solution like Cassandra or Riak?



Last but not least, drop me a note if you are planning to use or already using MongoDB in production and you’d like to share your experience with the NoSQL community

07:55 日志文件切换 (3286 Bytes) » dbthink

本文主要翻译自Jonathan Lewis的Blog Entry Log File Switch, 最后关于checkpoint部分,我添加了部分我自己的注释(以斜体字显示).

日志文件切换

几天前,我发布一个简单的查询语句,它通过查询v$log_history视图展示日志文件切换之间的时间间隔.如果你运行这个脚本,并且认为你的系统的日志文件切换频率不合适,那么你该如何处理呢?
如果时间间隔太短(由于日志文件频繁切换引发的checkpoint动作可能会导致DBWR进程过于活跃),可以通过新增新的更大的日志文件并删除旧的日志文件来解决.
如果切换的时间间隔太久,或者是时间间隔波动太大,而你希望使日志切换更加有规律,那么可以使用参数archive_lag_target(在Oracle 9i中引入)来为日志文件切换设置一个超时值(单位为秒).如果日志文件没有在指定的时间间隔内发生切换,那么系统会强制其切换日志,并且归档对应的已经使用的在线重做日志.
在此之前,人们一般使用类似于cron或者dbms_job一类的工具来实现(通过执行一条”alter system switch logfile“命令).从Oracle 9i开始,利用这个参数来实现就简单多了.

附录

  • Instance Recovery Checkpoint
  • 实例恢复触发的检查点

  • Media Recovery Checkpoint
  • 介质恢复触发的检查点

  • Thread Checkpoint
  • 单个instance关闭以及log file switch触发的检查点

  • Interval Checkpoint
  • log_checkpoint_timeout参数超时触发的检查点

  • Tablespace Checkpoint
  • 表空间online/offline/read only/read write操作对应的检查点

  • PQ Tablespace Checkpoint
  • 不清楚具体原因

  • Close Database Checkpoint
  • Incremental Checkpoint
  • 常规的检查点动作

  • Local Database Checkpoint
  • 对应于alter system checkpoint

  • Global Database Checkpoint
  • 对应于alter system checkpoint all命令的检查点.

  • Object Reuse Checkpoint
  • 对应于truncate table操作的检查点

  • Object Checkpoint
  • 对应于drop table的检查点操作

Oracle支持多种不同类型的checkpoint,这个列表会随着Oracle版本的不同而有所不同.Oracle 10.2.0.3的对应的checkpoint看似有如下这些:
在这些中间,我认为日志文件切换时引发的checkpoint可能是ThreadCheckpoint.

No related posts.

07:14 Video: Will Leinweber: Relaxing with CouchDB (1418 Bytes) » MyNoSQL

If we never have enough intro presentations to MongoDB, why would we have enough CouchDB videos?

Embedded below is a video of Will Leinweber presenting Relaxing with CouchDB (38 minutes)

05:27 Presentation: Intro to MongoDB by Alex Sharp (11048 Bytes) » MyNoSQL

We’ve never got enough introductions to NoSQL systems. Embedded below are the slides from Alex Sharp’s (@ajsharp): Intro to MongoDB presentation. Just to allow you quick overview, you can find below also the text only version.

Text-only version of Intro to MongoDB

  • Slide: 1

    Intro to MongoDB

    Alex Sharp

    twitter: @ajsharp

  • Slide: 2

    So what is MongoDB?

  • Slide: 3

    First and foremost…

  • Slide: 4

    IT’S THE NEW HOTNESS!!!

  • Slide: 5

    omgomgomg

    SHINY OBJECTS

    omgomgomg

  • Slide: 6

    MongoDB (from “humongous”) is a scalable, high-performance, open source, schema-free, document-oriented database.

    - mongodb.org

  • Slide: 7

    Philosophy

  • Slide: 8

    Philosophy

    “One size fits all” approach no longer applies

  • Slide: 9

    Philosophy

    Non-relational DBs scale more easily, especially horizontally

  • Slide: 10

    Philosophy

    Focus on speed, performance, flexibility and scalability

  • Slide: 11

    Philosophy

    Not concerned with transactional stuff and relational semantics

  • Slide: 12

    Philosophy

    DBs should be an on-demand commodity, in a cloud-like fashion

  • Slide: 13

    Philosophy

    Mongo tries to achieve the performance of traditional key-value stores while maintaining functionality of traditional RDBMS

  • Slide: 14

    Features

  • Slide: 15

    Features

    Standard database stuff

  • Slide: 16

    Features

    Standard database stuff

    Indexing

  • Slide: 17

    Features

    Standard database stuff

    Indexing

    replication/failover support

  • Slide: 18

    Features: Document Storage

    Documents are stored in BSON (binary JSON)

  • Slide: 19

    BSON is a binary serialization of JSON-like objects

    Features: Document Storage

  • Slide: 20

    Features: Document Storage

    This is extremely powerful, b/c it means mongo understands JSON natively

  • Slide: 21

    Features: Document Storage

    Any valid JSON can be easily imported and queried

  • Slide: 22

    Features

    Schema-less; very flexible

  • Slide: 23

    Features

    Schema-less; very flexible

    no more blocking ALTER TABLE

  • Slide: 24

    Features

    Auto-sharding (alpha)

  • Slide: 25

    Features

    Makes for easy horizontal scaling

  • Slide: 26

    Features

    Map/Reduce

  • Slide: 27

    Features

    Very, very fast

  • Slide: 28

    Features

    Super easy to install

  • Slide: 29

    Features

    Strong with major languages

  • Slide: 30

    Features

    Document-oriented = flexible

  • Slide: 31

    Features: Querying

    Rich, javascript-based query syntax

  • Slide: 32

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to deep, nested queries

  • Slide: 33

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to do deep, nested queries

    db.order.find( { shipping: { carrier: "usps" } } );

  • Slide: 34

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to deep, nested queries

    db.order.find( { shipping: { carrier: "usps" } } );

    shipping is an embedded document (object)

  • Slide: 35

    Features: Binary Object Store

    Efficient binary large object store via GridFS

  • Slide: 36

    Features: Binary Object Store

    Efficient binary large object store via GridFS

    i.e. store images, videos, anything

  • Slide: 37

    Concepts

  • Slide: 38

    Concepts: Document-oriented

    Think of “documents” as database records

  • Slide: 39

    Concepts: Document-oriented

    Think of “documents” as database records

    Documents are basically just JSON objects that Mongo stores in binary

  • Slide: 40

    Concepts: Document-oriented

    Think of “collections” as database tables

  • Slide: 44

    Concept Mapping

    RDBMS (mysql, postgres)

    Tables

    Records/rows

    Queries return record(s)

    MongoDB

    Collections

    Documents/objects

    Queries return a cursor

     ???

  • Slide: 45

    Concepts: Cursors

    Queries return “cursors” instead of collections

  • Slide: 46

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

  • Slide: 47

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

    A big reason for this is performance

  • Slide: 48

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

    A big reason for this is performance

    Much more efficient than loading all objects into memory

  • Slide: 49

    Concepts: Cursors

    The find() function returns a cursor object

  • Slide: 50

    Concepts: Cursors

    The find() function returns a cursor object

    var cursor = db.logged_requests.find({ 'status_code' : 200 })

    cursor.hasNext() // "true"

    cursor.forEach( function (item) {

    print(tojson(item))

    });

    cursor.hasNext() // "false"

  • Slide: 51

    Cool Features

  • Slide: 52

    Cool Features

    Capped collections

  • Slide: 53

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

  • Slide: 54

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

  • Slide: 55

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

    Super fast

  • Slide: 56

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

    Super fast

    Ideal for logging and caching

  • Slide: 57

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

  • Slide: 58

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

  • Slide: 59

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

  • Slide: 60

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

    Import into mongo (mongoimport -f filename.json)

  • Slide: 61

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

    Import into mongo (mongoimport -f filename.json)

    Analyze to your heart’s content

  • Slide: 62

    Cool Uses

    Harmonyapp.com

    Large rails app for building websites (kind of a CMS)

  • Slide: 63

    Cool Uses

    Hardcore debugging

    Spit out large amounts of data

  • Slide: 64

    Limitations

    Transaction support

  • Slide: 65

    Limitations

    Transaction support

    Relational integrity

  • Slide: 66

    Resources

03:30 NoSQL Ecosystem News 2010-03-04 (1242 Bytes) » MyNoSQL

  1. Would you like to play with a tapir on your couch? If yes, then you have TapirWiki, a basic wiki built on CouchDB
  2. I almost always fall for these geeky things: Tokyo Cabinet API for Clojure
03:19 Cassandra As An Embedded Service (1642 Bytes) » MyNoSQL
Cassandra As An Embedded Service:

With the help of the community I’ve built an embedded cassandra service ideal for unit testing and perhaps other uses. I’ve also built a cleanup utility that helps wipe out all data before the service starts running so the combination of both provides isolation etc. Now each test process runs an in-process, embedded instance of cassandra.

While I do agree with the main goal behind this initiative (i.e. testability), I do think that a stub solution (take for example RockingChair a Ruby stub library for CouchDB) would be a much better fit and would perform a lot better. As the original post mentions, the embedded Cassandra service requires each test to be run in a separate JVM and as you can imagine that will be far from being quick.

01:48 French Translations (1 Bytes) » Oracle Scratchpad

2010-03-04 Thu